CSS 343: Notes from Lecture 11 (DRAFT)

Administrivia

midterm Thursday
- office hours today after class
- no office hours Wednesday
assignment 1
- marks distributed via dropbox
- will post rubric some time after midterm

Recap

a graph is a set of vertices and a set of edges: G = (V, E).
informally, graphs can be thought of as a diagram of bubbles and arrows
vertices (nodes/bubbles) can be represented by a class
edges (links/arrows) can be represented by a class or by a vertex pair (if undecorated)
edges may be stored by an adjacency matrix or by a per-vertex list
- generally, algorithms that iterate over the edges of a vertex (e.g. traversals) work better with a per-vertex list than with an adjacency matrix, since the list holds only the edges that exist
- adjacency matrix is faster for determining whether a given pair of vertices share an edge
graphs may be directed or undirected
a graph has a cycle if there exists a non-empty path from some vertex back to itself
- tree: directed or undirected graph with at most one path between any two vertices
- DAG (Directed Acyclic Graph): directed graph (without cycles) which may have multiple paths between some pair of vertices
many interesting graph problems/algorithms
- some are computationally expensive
- graph coloring (such that no two vertices sharing an edge have the same color) is one of many problems that are NP-complete
depth-first & breadth-first search/traversal
- depth-first: stack (or recursive)
- breadth-first: queue
search vs. traversal
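
A minimal sketch of the recap points above, assuming vertices are labelled with small integers. The example graph and the names dfs_order and bfs_order are illustrative only, not from the lecture. The two traversals differ mainly in the container that holds pending vertices.

```python
from collections import deque

# Hypothetical directed graph stored as a per-vertex adjacency list.
graph = {
    0: [1, 2],
    1: [3],
    2: [3],
    3: [],
}

# The same edges as an adjacency matrix: matrix[u][v] is True if edge (u, v) exists.
n = len(graph)
matrix = [[v in graph[u] for v in range(n)] for u in range(n)]

def dfs_order(graph, start):
    """Depth-first traversal using an explicit stack."""
    visited, order = set(), []
    stack = [start]
    while stack:
        v = stack.pop()
        if v in visited:
            continue
        visited.add(v)
        order.append(v)
        # Push in reverse so neighbours are explored in their listed order.
        for nxt in reversed(graph[v]):
            if nxt not in visited:
                stack.append(nxt)
    return order

def bfs_order(graph, start):
    """Breadth-first traversal using a queue."""
    visited, order = {start}, []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for nxt in graph[v]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return order
```

Checking whether (u, v) is an edge is a single lookup in matrix, but requires scanning graph[u] in the list form.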

Finding Connected Components

To find the connected components, start with a set, unvisited, of all vertices. Remove a vertex from the set and add it to a new set representing the next connected component. Perform a depth-first or breadth-first search from that vertex. Every vertex reached is removed from unvisited and added to the current connected component, until no more vertices can be reached. Repeat until unvisited is empty. Run time is O(|V| + |E|).
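
The procedure above might be sketched as follows. This assumes an undirected graph stored as a dictionary of neighbour lists (every edge listed in both directions); the function name and example graph are made up for illustration.

```python
def connected_components(graph):
    """Return a list of vertex sets, one per connected component.

    `graph` maps each vertex to a list of neighbours and is assumed
    to be undirected (every edge appears in both directions).
    """
    unvisited = set(graph)
    components = []
    while unvisited:
        start = unvisited.pop()          # remove an arbitrary vertex
        component = {start}
        stack = [start]                  # depth-first search from start
        while stack:
            v = stack.pop()
            for nxt in graph[v]:
                if nxt in unvisited:
                    unvisited.remove(nxt)
                    component.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components

# Hypothetical graph with three components: {0, 1, 2}, {3, 4} and {5}.
graph = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3], 5: []}
```

Each vertex leaves unvisited exactly once and each edge is examined a constant number of times, which is where the O(|V| + |E|) bound comes from.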

Removal from the unvisited set can be made O(1). The set can be represented as a doubly linked list or an array/vector. If representing the set as an array, each vertex requires a field to hold its array index, and the removed element is swapped with the last element before the array is shrunk by one.
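
One possible shape for the array-based variant, as a sketch only. The class names Vertex and UnvisitedSet and the field name index are assumptions made for this example.

```python
class Vertex:
    def __init__(self, name):
        self.name = name
        self.index = None   # position in the unvisited array, or None

class UnvisitedSet:
    def __init__(self, vertices):
        self.items = list(vertices)
        for i, v in enumerate(self.items):
            v.index = i

    def remove(self, v):
        """Remove v in O(1): move the last element into v's slot, then pop."""
        last = self.items[-1]
        self.items[v.index] = last
        last.index = v.index
        self.items.pop()
        v.index = None

    def __len__(self):
        return len(self.items)
```

The order of the remaining elements is not preserved, which is fine for a set.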

Network Flow Problem

We just scratch the surface of graph theory. There are numerous interesting graph problems and algorithms. Flow is a problem useful for computer-network data traffic, power transmission, pipelines, and shipping logistics.

In a network flow problem, there are two distinguished nodes, source and sink. Each edge has a numeric value, its capacity, which the flow along that edge may not exceed. For every node other than the source and sink, the flow out must equal the flow in. The objective is to achieve the maximum flow from source to sink.
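
To make the constraints concrete, here is a small sketch that checks whether a proposed flow is feasible and reports its value. It does not compute a maximum flow. The function name, the edge-dictionary representation and the example network are all assumptions for illustration.

```python
def flow_value(capacity, flow, source, sink):
    """Check that `flow` is feasible and return the total flow into the sink.

    `capacity` and `flow` map directed edges (u, v) to numbers.
    Raises ValueError if a capacity or conservation constraint fails.
    """
    net = {}  # net[v] = flow into v minus flow out of v
    for (u, v), f in flow.items():
        if not 0 <= f <= capacity[(u, v)]:
            raise ValueError(f"edge {(u, v)} violates its capacity")
        net[u] = net.get(u, 0) - f
        net[v] = net.get(v, 0) + f
    for v, balance in net.items():
        if v not in (source, sink) and balance != 0:
            raise ValueError(f"flow not conserved at node {v}")
    return net.get(sink, 0)

# Hypothetical network: source "s", sink "t", interior nodes "a" and "b".
capacity = {("s", "a"): 3, ("s", "b"): 2, ("a", "t"): 2, ("b", "t"): 3, ("a", "b"): 1}
flow     = {("s", "a"): 3, ("s", "b"): 2, ("a", "t"): 2, ("b", "t"): 3, ("a", "b"): 1}
```

In this example the flow saturates both edges leaving the source, so its value of 5 cannot be improved.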

TODO: example diagram

Shortest Path Problem

In an unweighted graph, the shortest path between two vertices can be found via breadth-first search.
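
A sketch of the breadth-first approach, assuming the graph is a dictionary of successor lists. The function name and example graph are illustrative. Because breadth-first search reaches vertices in order of increasing edge count, the first time the goal is dequeued its predecessor chain is a fewest-edges path.

```python
from collections import deque

def bfs_shortest_path(graph, start, goal):
    """Return a fewest-edges path from start to goal, or None if unreachable."""
    predecessor = {start: None}   # doubles as the visited set
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if v == goal:
            path = []
            while v is not None:          # walk back to the start
                path.append(v)
                v = predecessor[v]
            return path[::-1]
        for nxt in graph[v]:
            if nxt not in predecessor:
                predecessor[nxt] = v
                queue.append(nxt)
    return None

# Hypothetical directed graph.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
```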

A weighted graph associates a (non-negative) cost, or weight, with each edge. The weighted-graph shortest path problem sums the weights along the edges of the path and seeks the path between a pair of nodes with the minimum total weight. It is not necessarily the most direct path.

If we need to disambiguate, we can distinguish between shortest path and lowest-cost path.
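
A tiny worked example of the distinction, using a made-up graph in which the direct edge is more expensive than a three-edge detour. The weights and the helper path_cost are assumptions for illustration.

```python
# Hypothetical weighted directed graph: weights[(u, v)] is the cost of edge (u, v).
weights = {
    ("a", "d"): 10,
    ("a", "b"): 1,
    ("b", "c"): 2,
    ("c", "d"): 3,
}

def path_cost(weights, path):
    """Sum the edge weights along a path given as a list of vertices."""
    return sum(weights[(u, v)] for u, v in zip(path, path[1:]))

direct = ["a", "d"]             # shortest path: one edge
detour = ["a", "b", "c", "d"]   # lowest-cost path: three edges
```

Here path_cost(weights, direct) is 10 while path_cost(weights, detour) is 1 + 2 + 3 = 6.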

TODO: diagram

Dijkstra's Algorithm

Dijkstra's algorithm is one of many shortest-path algorithms and is an example of a hill-climbing, or greedy, algorithm.

This algorithm requires non-negative weights. Other algorithms may work in the presence of negative weights, but no algorithm will work on a graph that has a negative cycle reachable on the way to the goal (the min-cost path would be undefined, since each trip around the cycle lowers the cost further).

Two fields must be added to the vertex class: current_cost and predecessor (or come_from).

Initially, all nodes except the start node have infinite cost (INT_MAX is a useful proxy, but take care that adding a weight to it does not overflow). The start node has initial cost 0. All nodes start with NULL predecessor fields.

If there exists a path from the start node to some node u and an edge (u,v) with weight w, then there exists a path from the start node to v with cost cost(u) + w. This cost may be cheaper than the currently known cost to v. Define a function that takes two vertices connected by an edge and the weight of that edge, relax(u, v, w):


    if v.current_cost > u.current_cost + w then
        v.current_cost = u.current_cost + w
        v.predecessor = u // or v.predecessor = (u,v)
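
The relax step could look like this in Python, as a sketch only. The Vertex class is a minimal stand-in with just the two fields named in these notes, and the boolean return value is an addition for convenience. math.inf is used instead of INT_MAX so that adding a weight to an "infinite" cost cannot overflow.

```python
import math

class Vertex:
    def __init__(self, name):
        self.name = name
        self.current_cost = math.inf   # stands in for INT_MAX
        self.predecessor = None

def relax(u, v, w):
    """Lower v's cost if reaching it through u over an edge of weight w is cheaper.

    Returns True if v was updated (an addition for this sketch).
    """
    if v.current_cost > u.current_cost + w:
        v.current_cost = u.current_cost + w
        v.predecessor = u
        return True
    return False
```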

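The algorithm works with a set of unvisited vertices (initially, all vertices are unvisited). We greedily select the lowest-cost unvisited vertex and remove it from the set. Each edge from that vertex to a successor node is relaxed. Proceed until the goal node is visited (or until every remaining unvisited vertex has infinite cost, in which case the goal is unreachable).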
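
A compact sketch of the loop just described, using the naive linear scan to pick the lowest-cost unvisited vertex. For brevity it keeps cost and predecessor in dictionaries rather than as vertex fields; that choice, the function name and the example graph are assumptions for illustration.

```python
import math

def dijkstra(graph, start, goal):
    """Naive O(N^2) Dijkstra. `graph[u]` is a list of (v, w) pairs with w >= 0.

    Returns (cost, predecessor) dictionaries covering every vertex.
    """
    cost = {v: math.inf for v in graph}
    predecessor = {v: None for v in graph}
    cost[start] = 0
    unvisited = set(graph)
    while unvisited:
        # Greedy choice: linear scan for the lowest-cost unvisited vertex.
        u = min(unvisited, key=lambda v: cost[v])
        unvisited.remove(u)
        if u == goal or cost[u] == math.inf:
            break   # goal reached, or the remaining vertices are unreachable
        for v, w in graph[u]:
            if cost[v] > cost[u] + w:      # relax(u, v, w)
                cost[v] = cost[u] + w
                predecessor[v] = u
    return cost, predecessor

# Hypothetical graph: the direct edge a -> d costs more than the detour via b and c.
graph = {
    "a": [("b", 1), ("d", 10)],
    "b": [("c", 2)],
    "c": [("d", 3)],
    "d": [],
}
```

Calling dijkstra(graph, "a", "d") first sets d's cost to 10 via the direct edge, then lowers it to 6 once c has been visited.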

The predecessor field may be followed from the goal node back to the start node. This gives the path in reverse order; to get the path in forward order, reverse it.
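
Path recovery might be sketched like this, assuming the predecessor links are available as a dictionary with None for the start node. The table below is a made-up example.

```python
def reconstruct_path(predecessor, goal):
    """Follow predecessor links from goal back to the start, then reverse."""
    path = []
    v = goal
    while v is not None:
        path.append(v)
        v = predecessor[v]
    path.reverse()
    return path

# Hypothetical predecessor table, as Dijkstra's algorithm might leave it.
predecessor = {"a": None, "b": "a", "c": "b", "d": "c"}
```

A caller should first check that the goal's cost is finite; for an unreachable goal this sketch would return a one-element list containing only the goal.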

Note that the predecessor fields form a tree rooted at the start node: following them from any reached node leads back to the root. The predecessor field is a parent pointer (there are no child pointers).

TODO: argue why this works (greedy choice property).

Analysis:

let N = |V| (number of vertices)
let M = |E| (number of edges)
- a graph may be dense (many edges) or sparse (few edges)
- a complete graph has every possible connection
  - (N-1) + (N-2) + ... + 2 + 1 = N(N-1)/2 = O(N²) for undirected graphs
  - N * (N-1) = O(N²) for directed graphs
- a dense graph may have O(N²) edges
- a sparse but connected graph would have at least N-1 edges, so M is still at least linear in N.
  - a planar graph (i.e. can be drawn without any crossing edges) must be sparse: at most 3N-6 edges for N >= 3
every edge must be examined: O(M)
picking the least-cost vertex (naive implementation): O(N) operations performed O(N) times
total cost: O(N²) + O(M) which is O(N²) + O(N²) = O(N²) for dense graphs
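if the graph is sparse, a cheaper find-min operation pays off
- a binary-heap priority queue gives O((N + M) log N) total cost, which is O(N log N) when M = O(N)
- a Fibonacci heap gives O(M + N log N) total cost, which is O(N log N) even when M = O(N log N)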
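
The priority-queue idea can be sketched with Python's heapq module (a binary heap). heapq has no decrease-key operation, so this version pushes a fresh entry whenever a cost drops and skips stale entries when they are popped. That workaround, the function name and the example graph are choices made for this sketch, not necessarily the lecture's approach.

```python
import heapq
import math

def dijkstra_heap(graph, start):
    """Dijkstra with a binary heap. `graph[u]` is a list of (v, w) pairs with w >= 0."""
    cost = {v: math.inf for v in graph}
    cost[start] = 0
    heap = [(0, start)]
    visited = set()
    while heap:
        c, u = heapq.heappop(heap)       # find-min in O(log N)
        if u in visited:
            continue                     # stale entry; u was already finalized
        visited.add(u)
        for v, w in graph[u]:
            if cost[v] > c + w:          # relax(u, v, w)
                cost[v] = c + w
                heapq.heappush(heap, (cost[v], v))
    return cost

# Hypothetical graph; "e" is unreachable from "a".
graph = {
    "a": [("b", 1), ("d", 10)],
    "b": [("c", 2)],
    "c": [("d", 3)],
    "d": [],
    "e": [],
}
```

The heap holds at most one entry per relaxed edge plus the start, so each push or pop costs O(log N) and the total is O((N + M) log N).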

TODO: animation