CSS 343: Notes from Lecture 5 (DRAFT)

Administrivia

Our Story So Far

Red-Black Trees (Again)

Here is a sequence of images showing an (unbalanced) binary search tree being built from a word sequence taken from an arbitrarily chosen text found on Project Gutenberg. As you can see, the tree looks roughly balanced. Click the image to get a full-size view, or download a zip file of the individual frames. Unbalanced Binary Search Tree (random input)

Here is a BST built from the same sequence that maintains the red-black properties. As you can see intuitively, the tree appears to have slightly better balance. Note the various cases that crop up during the rebalancing phases (balanced.zip). Balanced Binary Search Tree (random input)

One extreme pathological case for the BST is input in sorted (or reverse-sorted) order. Here is a BST built from the original data, sorted and then with each group of 5 words randomly permuted. Still pretty bad (permute5-unbalanced.zip). Unbalanced Binary Search Tree (sorted, permute5)

And the same input to a red-black tree (permute5-balanced.zip). Balanced Binary Search Tree (sorted, permute5)
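The effect of input order on a plain BST can be checked numerically. The following is a minimal sketch, not the code used to generate the images: it uses integers in place of words, and the names bst_height and permute_groups are hypothetical helpers introduced here for illustration.

```python
import random

def bst_height(keys):
    """Insert keys into a plain (unbalanced) BST; return its height in nodes."""
    root = None          # each node is a list: [key, left, right]
    height = 0
    for key in keys:
        if root is None:
            root = [key, None, None]
            height = 1
            continue
        node, depth = root, 1
        while True:
            if key == node[0]:
                break                      # ignore duplicate keys
            idx = 1 if key < node[0] else 2
            if node[idx] is None:
                node[idx] = [key, None, None]
                height = max(height, depth + 1)
                break
            node, depth = node[idx], depth + 1
    return height

def permute_groups(sorted_keys, group=5, seed=0):
    """Shuffle each consecutive run of `group` keys, as in the permute5 input."""
    rng = random.Random(seed)
    out = []
    for i in range(0, len(sorted_keys), group):
        chunk = sorted_keys[i:i + group]
        rng.shuffle(chunk)
        out.extend(chunk)
    return out

keys = list(range(1000))                   # stand-in for the sorted word list
shuffled = keys[:]
random.Random(1).shuffle(shuffled)         # stand-in for the "random" input

print("sorted:  ", bst_height(keys))
print("permute5:", bst_height(permute_groups(keys)))
print("random:  ", bst_height(shuffled))
```

Sorted input degenerates into a chain of 1000 nodes. The permute5 input cannot do much better, because every key in a group is larger than every key in the groups before it, so each group adds at least one level.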

Any questions?

2-3 Trees

Counters

The assignment introduces a new concept: counters.

BTrees

The B tree is a generalization of the 2-3 tree (alternatively, the 2-3 tree is a special case of the B tree).

Motivation:

We wish to minimize the number of blocks read for insert/lookup/delete, by fitting as many keys as possible into a single block.

Example: if we can fit 100 keys into a block, we can choose among 1,000,000 records in about 3 probes, since 100^3 = 1,000,000 (binary search requires up to 20 probes).
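The probe counts can be worked out with a short calculation. This is a rough sketch under simplifying assumptions: every node is completely full, one probe means one block read, and the function names are hypothetical.

```python
def btree_probes(num_records, keys_per_block):
    """Levels needed by a full tree with keys_per_block keys per node.

    A node with k keys has k + 1 children, so a full tree with `levels`
    levels holds (k + 1) ** levels - 1 keys.
    """
    fanout = keys_per_block + 1
    levels = 0
    capacity = 0
    while capacity < num_records:
        levels += 1
        capacity = fanout ** levels - 1
    return levels

def binary_search_probes(num_records):
    """Worst-case comparisons for binary search: floor(log2(n)) + 1."""
    return num_records.bit_length()

n = 1_000_000
print(btree_probes(n, 100))      # block reads with 100 keys per block
print(btree_probes(n, 1000))     # block reads with 1000 keys per block
print(binary_search_probes(n))   # probes for binary search
```

Real B trees only guarantee that nodes are at least about half full, so the actual height may be one level more than this idealized count.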

2-3-4 Trees

The 2-3-4 tree is another special case of the B tree, in which each node has 1 to 3 keys (hence 2-4 children). It is isomorphic to the red-black tree. Conceptually, the red children of a black node are folded into the black node.
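The folding step can be sketched as follows. This is an illustrative sketch only: the RBNode class and the fold function are hypothetical names, and the code assumes the input already satisfies the red-black properties (in particular, a red node has no red children).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RBNode:
    key: int
    red: bool = False
    left: Optional["RBNode"] = None
    right: Optional["RBNode"] = None

def absorb(child, keys, children):
    """Fold a red child into the parent; keep a black (or missing) child as a subtree."""
    if child is not None and child.red:
        children.append(child.left)
        keys.append(child.key)
        children.append(child.right)
    else:
        children.append(child)

def fold(black):
    """Return (keys, children) of the 2-3-4 node rooted at a black node."""
    keys, children = [], []
    absorb(black.left, keys, children)
    keys.append(black.key)
    absorb(black.right, keys, children)
    return keys, children

# A black node with two red children becomes a 4-node (3 keys, 4 children).
node = RBNode(20, left=RBNode(10, red=True), right=RBNode(30, red=True))
keys, children = fold(node)
print(keys, len(children))
```

A black node with no red children yields a 2-node, one red child yields a 3-node, and two red children yield a 4-node, so the number of children is always one more than the number of keys.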

Hashing (Preview)

Example: we can determine population count (number of 1-bits) of a byte using a table of 256 entries.
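A minimal sketch of the table lookup is shown here. The names POPCOUNT, popcount_byte, and popcount are hypothetical, and the multi-byte helper is an extension added for illustration.

```python
# Precompute the number of 1-bits for every possible byte value (0..255).
POPCOUNT = [bin(b).count("1") for b in range(256)]

def popcount_byte(b):
    """Population count of a single byte by direct table lookup."""
    return POPCOUNT[b]

def popcount(n):
    """Population count of a non-negative integer, one byte at a time."""
    total = 0
    while n:
        total += POPCOUNT[n & 0xFF]   # look up the low byte
        n >>= 8                       # move on to the next byte
    return total

print(popcount_byte(0b10110000))
print(popcount(0xFFFF))
```

The byte value itself is the table index, so no search and no collisions are involved. This is the degenerate case where the key space is small enough to index directly.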

Generally, a hash function maps a large key space (0..M-1) into a smaller key space (0..N-1) which can be indexed in a table of size N. A good hash function randomizes the keys so "collisions" are unlikely (hence the name).

A collision occurs when two keys map to the same hash value. Hash tables must deal with this problem.
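A collision is easy to produce with a simple hash function. The sketch below assumes a plain modulus as the hash and separate chaining (a list per slot) as the way of coping with collisions. Both are illustrative choices, not necessarily the ones this course will use, and the function names are hypothetical.

```python
def simple_hash(key, table_size):
    """Map an integer key into 0..table_size-1 (a deliberately naive hash)."""
    return key % table_size

def build_table(keys, table_size):
    """Separate chaining: each slot holds a list of the keys that hash there."""
    table = [[] for _ in range(table_size)]
    for key in keys:
        table[simple_hash(key, table_size)].append(key)
    return table

def lookup(table, key):
    """Search only the chain for the key's slot."""
    return key in table[simple_hash(key, len(table))]

keys = [12, 25, 38, 47, 51]
table = build_table(keys, 13)
collisions = [slot for slot in table if len(slot) > 1]
print(collisions)
print(lookup(table, 38), lookup(table, 39))
```

Here 12, 25, 38, and 51 all leave remainder 12 when divided by 13, so they share one slot. The modulus does nothing to randomize keys that differ by multiples of the table size, which is why a better hash function is usually wanted.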