CSS 343: Notes from Lecture 3 (DRAFT)

Administrivia

Our Story So Far

Casting

Sequential Processing

Sequential processing is essentially a first and next operation. Additional operations include last, prev, insert (first, last, before, after), delete (current, next, prev, first, last), find. Special cases include the stack, queue, and deque.
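
To make the special cases concrete, here is a minimal sketch using the standard adaptors std::stack and std::queue and the container std::deque. The function names (stack_order, queue_order, deque_order) are made up for illustration and are not from the lecture.

```cpp
#include <deque>
#include <queue>
#include <stack>
#include <vector>

// Stack: last in, first out. Push 1, 2, 3 and record the order they pop.
std::vector<int> stack_order() {
    std::stack<int> s;
    for (int i = 1; i <= 3; ++i) s.push(i);
    std::vector<int> out;
    while (!s.empty()) {
        out.push_back(s.top());
        s.pop();
    }
    return out;  // 3, 2, 1
}

// Queue: first in, first out. Same pushes, opposite removal order.
std::vector<int> queue_order() {
    std::queue<int> q;
    for (int i = 1; i <= 3; ++i) q.push(i);
    std::vector<int> out;
    while (!q.empty()) {
        out.push_back(q.front());
        q.pop();
    }
    return out;  // 1, 2, 3
}

// Deque: insertion (and removal) is allowed at both ends.
std::vector<int> deque_order() {
    std::deque<int> d;
    d.push_back(2);
    d.push_front(1);
    d.push_back(3);
    return std::vector<int>(d.begin(), d.end());  // 1, 2, 3
}
```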

Sequential operations may be implemented using a linked list (singly or doubly linked) or a vector. The choice determines the asymptotic performance of the operations (e.g. insert into a linked list at a known position is O(1), but insert into an array is O(N) because later elements must shift). In practice one uses the STL container classes std::list or std::vector.
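
As a rough illustration of that trade-off, the sketch below inserts at the front of each container. The calls look identical, but the cost differs. The helper names push_front_list and push_front_vector are assumptions made up for this example.

```cpp
#include <list>
#include <vector>

// Linked list: inserting at a known position only relinks nodes, so it is O(1).
void push_front_list(std::list<int>& xs, int value) {
    xs.insert(xs.begin(), value);
}

// Vector: inserting at the front shifts every existing element, so it is O(N).
void push_front_vector(std::vector<int>& xs, int value) {
    xs.insert(xs.begin(), value);
}
```

Both functions leave the new value first in the sequence; only the amount of work behind the call changes.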

Sequential processing is so important that the 2011 language standard (C++11) introduced the range-based for statement (other languages already had an equivalent).

Iterators use operator overloading to mimic pointer arithmetic. The following idiom works for several different standard container types:

for(ContainerType::iterator it = container.begin(); it != container.end(); ++it) {
  do_something_with(*it);
}
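
For comparison, here is a small sketch that sums a container of ints twice: once with the explicit iterator idiom and once with the C++11 range-based for statement. The function names are hypothetical. Both templates should accept any standard sequence container of int, such as std::vector<int> or std::list<int>.

```cpp
#include <list>
#include <vector>

// Explicit iterator loop: begin(), end(), ++ and * mimic pointer arithmetic.
template <typename Container>
int sum_with_iterators(const Container& c) {
    int total = 0;
    for (typename Container::const_iterator it = c.begin(); it != c.end(); ++it) {
        total += *it;
    }
    return total;
}

// Range-based for: the compiler generates the begin()/end() loop for us.
template <typename Container>
int sum_with_range_for(const Container& c) {
    int total = 0;
    for (int x : c) {
        total += x;
    }
    return total;
}
```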
      

Random Access

Dictionaries

The dictionary abstraction is extremely important. Lookup by data is not the same as finding the nth entry. Typically, we look up a value based on a key.
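
The difference between the two kinds of lookup can be sketched as follows. std::map is used here only as one standard container that offers key-based lookup. The function names and the out-parameter convention are assumptions for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Positional access: "give me the nth entry". Throws std::out_of_range if n is too large.
int nth_value(const std::vector<int>& values, std::size_t n) {
    return values.at(n);
}

// Key-based access: "give me the value stored under this name".
// Returns false (and leaves value_out untouched) if the key is absent.
bool lookup_by_name(const std::map<std::string, int>& dict,
                    const std::string& name, int& value_out) {
    std::map<std::string, int>::const_iterator it = dict.find(name);
    if (it == dict.end()) return false;
    value_out = it->second;
    return true;
}
```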

The main operations are insert and lookup. Additional operations may include:

The dictionary abstraction is so important that, naturally, it has many names:

A dictionary can be implemented using a linked list, with O(N) lookup time. This is acceptable if N is small or very few lookups are being performed. Naturally, we can do better.
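
A minimal sketch of such a list-based dictionary, assuming string keys and int values. The names ListDict, dict_insert, and dict_lookup are made up for this example.

```cpp
#include <list>
#include <string>
#include <utility>

typedef std::list<std::pair<std::string, int> > ListDict;

// Insert a key/value pair, overwriting the value if the key already exists.
// O(N): we must scan the list to check for an existing key.
void dict_insert(ListDict& d, const std::string& key, int value) {
    for (ListDict::iterator it = d.begin(); it != d.end(); ++it) {
        if (it->first == key) {
            it->second = value;
            return;
        }
    }
    d.push_back(std::make_pair(key, value));
}

// Look up a key by linear scan. O(N).
// Returns a pointer to the stored value, or nullptr if the key is absent.
const int* dict_lookup(const ListDict& d, const std::string& key) {
    for (ListDict::const_iterator it = d.begin(); it != d.end(); ++it) {
        if (it->first == key) return &it->second;
    }
    return nullptr;
}
```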

Binary Search (Review)

If we can sort the data by key, which costs O(N log N), each lookup can then be performed in O(log N) time.

Note that sorting is more expensive than an O(N) one-time lookup, but sorting is justified if performing a large number of lookups or if the data can be sorted "offline". In some cases, the data may arrive pre-sorted.
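
The sort-once, look-up-many-times pattern might look like the sketch below, which uses std::sort and std::lower_bound from the standard library. The Entry typedef and the function names are assumptions for illustration.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> Entry;  // (key, value)

// Comparison used by the binary search: is this entry's key before the target key?
static bool key_less(const Entry& e, const std::string& key) {
    return e.first < key;
}

// One-time O(N log N) cost. std::pair compares by first (the key), then second.
void sort_by_key(std::vector<Entry>& entries) {
    std::sort(entries.begin(), entries.end());
}

// O(log N) lookup. Precondition: entries is already sorted by key.
// Returns a pointer to the value, or nullptr if the key is absent.
const int* lookup_sorted(const std::vector<Entry>& entries, const std::string& key) {
    std::vector<Entry>::const_iterator it =
        std::lower_bound(entries.begin(), entries.end(), key, key_less);
    if (it != entries.end() && it->first == key) return &it->second;
    return nullptr;
}
```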

Inverted Index

inverted index: separate tables for each key

data by name

     name      ref
0    bambam    5
1    barney    1
2    betty     3
3    fred      0
4    pebbles   4
5    wilma     2

original (raw) data

     name      value
0    fred      42
1    barney    68
2    wilma     33
3    betty     24
4    pebbles   18
5    bambam    54

data by value

     value     ref
0    18        4
1    24        3
2    33        2
3    42        0
4    54        5
5    68        1
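
One way to build such index tables is sketched below: each index stores (key, ref) pairs sorted by key, where ref is the row number in the raw data. The Record struct, the typedefs, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Record {
    std::string name;
    int value;
};

typedef std::vector<std::pair<std::string, std::size_t> > NameIndex;
typedef std::vector<std::pair<int, std::size_t> > ValueIndex;

// Index sorted by name; each entry refers back to a row of the raw data.
NameIndex build_name_index(const std::vector<Record>& raw) {
    NameIndex idx;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        idx.push_back(std::make_pair(raw[i].name, i));
    }
    std::sort(idx.begin(), idx.end());
    return idx;
}

// Index sorted by value; the raw data itself is never reordered.
ValueIndex build_value_index(const std::vector<Record>& raw) {
    ValueIndex idx;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        idx.push_back(std::make_pair(raw[i].value, i));
    }
    std::sort(idx.begin(), idx.end());
    return idx;
}
```

Because each index is sorted, either one can be binary searched, and the ref then leads to the full record in the raw table.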

Problems with Binary Search

Essentially, the problem is that binary search is inflexible: insert and delete operations are O(N).
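
To see where the O(N) comes from, consider this sketch of insert and delete on a sorted std::vector. Finding the position is O(log N), but the vector must then shift every later element. The function names are made up for illustration.

```cpp
#include <algorithm>
#include <vector>

// Insert value while keeping xs sorted.
// The search is O(log N); the shift inside insert() is O(N).
void sorted_insert(std::vector<int>& xs, int value) {
    std::vector<int>::iterator pos = std::lower_bound(xs.begin(), xs.end(), value);
    xs.insert(pos, value);
}

// Remove one occurrence of value, if present. Returns true if something was removed.
// Again the search is O(log N), but erase() shifts the tail: O(N).
bool sorted_erase(std::vector<int>& xs, int value) {
    std::vector<int>::iterator pos = std::lower_bound(xs.begin(), xs.end(), value);
    if (pos == xs.end() || *pos != value) return false;
    xs.erase(pos);
    return true;
}
```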

Binary Search Tree (review)

The binary search tree (BST) is essentially a binary tree structure that represents the decision pattern of the binary search.

[Figure: binary tree animation]
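
A minimal BST sketch, assuming int keys and ignoring duplicates. At each node the search goes left or right, just as binary search picks the lower or upper half. The names BstNode, bst_insert, and bst_contains are hypothetical, and std::unique_ptr is used only to keep memory management out of the way.

```cpp
#include <memory>

struct BstNode {
    int key;
    std::unique_ptr<BstNode> left, right;
    explicit BstNode(int k) : key(k) {}
};

// Walk down from the root to the empty slot where key belongs, then attach it.
void bst_insert(std::unique_ptr<BstNode>& root, int key) {
    std::unique_ptr<BstNode>* cur = &root;
    while (*cur) {
        if (key == (*cur)->key) return;  // duplicate: nothing to do
        cur = key < (*cur)->key ? &(*cur)->left : &(*cur)->right;
    }
    cur->reset(new BstNode(key));
}

// Each comparison discards one subtree, mirroring one step of binary search.
bool bst_contains(const std::unique_ptr<BstNode>& root, int key) {
    const BstNode* cur = root.get();
    while (cur) {
        if (key == cur->key) return true;
        cur = key < cur->key ? cur->left.get() : cur->right.get();
    }
    return false;
}
```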

Additional Binary Tree Properties

Problems with the BST

The binary search tree worst-case performance is O(N) instead of O(log N) because a BST degenerates to a linked list with pathological input. Unfortunately, two pathological cases are building the tree in sorted order and building the tree in reverse sorted order.

[Figure: unbalanced binary search tree animation]
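
The degeneration can be checked with a small experiment: build a plain (unbalanced) BST from a sequence of keys and measure its height. This is only a sketch with hypothetical names (TreeNode, tree_insert, tree_height, height_after_inserting). It uses recursion, so it is meant for small inputs.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

struct TreeNode {
    int key;
    std::unique_ptr<TreeNode> left, right;
    explicit TreeNode(int k) : key(k) {}
};

// Plain BST insert with no rebalancing; duplicates are ignored.
void tree_insert(std::unique_ptr<TreeNode>& node, int key) {
    if (!node) {
        node.reset(new TreeNode(key));
    } else if (key < node->key) {
        tree_insert(node->left, key);
    } else if (key > node->key) {
        tree_insert(node->right, key);
    }
}

// Number of nodes on the longest root-to-leaf path (0 for an empty tree).
int tree_height(const TreeNode* node) {
    if (!node) return 0;
    return 1 + std::max(tree_height(node->left.get()), tree_height(node->right.get()));
}

// Insert the keys in the given order and report the resulting height.
int height_after_inserting(const std::vector<int>& keys) {
    std::unique_ptr<TreeNode> root;
    for (int k : keys) tree_insert(root, k);
    return tree_height(root.get());
}
```

Inserting 1 through 7 in sorted (or reverse sorted) order gives height 7, a linked list in disguise, while the order 4, 2, 6, 1, 3, 5, 7 gives height 3.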

Tree Balancing Technique

Binary Search Tree

2-3 Tree

The 2-3 tree is a special case of the general B-tree. The key insight is that each node holds one or two keys, and all non-leaf nodes have two or three children (depending on the number of keys in the node). All leaves are maintained at the same level.
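
The node layout and the search procedure can be sketched as below. This covers lookup only; insertion, which splits full nodes to keep all leaves at the same level, is deliberately left out. The names Node23 and contains23 are assumptions for illustration.

```cpp
#include <memory>

struct Node23 {
    int num_keys;                      // 1 or 2
    int keys[2];                       // keys[0] < keys[1] when num_keys == 2
    std::unique_ptr<Node23> child[3];  // all null in a leaf, else num_keys + 1 children

    explicit Node23(int k0) : num_keys(1) { keys[0] = k0; keys[1] = 0; }
    Node23(int k0, int k1) : num_keys(2) { keys[0] = k0; keys[1] = k1; }
};

// Search: at each node, skip the keys smaller than the target, check for a match,
// then descend into the child that lies between the neighbouring keys.
bool contains23(const Node23* node, int key) {
    while (node) {
        int i = 0;
        while (i < node->num_keys && key > node->keys[i]) ++i;
        if (i < node->num_keys && key == node->keys[i]) return true;
        node = node->child[i].get();  // null in a leaf, which ends the loop
    }
    return false;
}
```

Since every leaf is at the same depth, the loop runs at most once per level, so lookup is O(log N) even in the worst case.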