CSS 343: Notes from Lecture 13 (DRAFT)

Administrivia

Bit Twiddling

Binary Numbers

A decimal number is a sum of digits weighted by powers of 10, e.g. 343 = 3 * 10^2 + 4 * 10^1 + 3 * 10^0.

Similarly, a binary number is the sum of powers of 2. Furthermore, a binary number can be converted directly to hexadecimal by taking groups of 4 bits (a half-byte, or nybble). For example, consider the binary number 0110_1011_0001_1100:

place       | 15    14    13    12     | 11    10    9     8      | 7     6     5     4      | 3     2     1     0
power of 2  | 2^15  2^14  2^13  2^12   | 2^11  2^10  2^9   2^8    | 2^7   2^6   2^5   2^4    | 2^3   2^2   2^1   2^0
value       | 32768 16384 8192  4096   | 2048  1024  512   256    | 128   64    32    16     | 8     4     2     1
power of 16 | 16^3 (4096)              | 16^2 (256)               | 16^1 (16)                | 16^0 (1)
binary      | 0     1     1     0      | 1     0     1     1      | 0     0     0     1      | 1     1     0     0
hexadecimal | 1*4 + 1*2 = 6            | 1*8 + 1*2 + 1*1 = B (11) | 1*1 = 1                  | 1*8 + 1*4 = C (12)

So, we conclude the binary number 0110_1011_0001_1100 is 0x6B1C in hexadecimal.
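
The nybble-by-nybble conversion can be sketched in C++ as follows. This is only an illustration: the function name binary_to_hex is hypothetical, underscores are assumed to be visual separators, and the input is assumed to contain a multiple of 4 binary digits.

```cpp
#include <cassert>
#include <string>

// Convert a string of binary digits to hexadecimal, one nybble at a time.
// Underscores are skipped. Assumes the digit count is a multiple of 4.
std::string binary_to_hex(const std::string& binary) {
  static const char kDigits[] = "0123456789ABCDEF";
  std::string hex;
  unsigned nybble = 0;  // value of the current group of 4 bits
  unsigned count = 0;   // bits collected in the current group
  for (char c : binary) {
    if (c == '_') continue;
    assert(c == '0' || c == '1');
    nybble = (nybble << 1) | static_cast<unsigned>(c - '0');
    if (++count == 4) {
      hex += kDigits[nybble];  // one hex digit per 4 bits
      nybble = 0;
      count = 0;
    }
  }
  assert(count == 0);  // no partial nybble left over
  return hex;
}
```

Calling binary_to_hex("0110_1011_0001_1100") returns the string "6B1C", matching the hexadecimal row of the table.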

Shift Operations

Left-shift zero-fills on the right. Right-shift zero-fills (unsigned operand) or sign-extends (signed operand) on the left.

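Left-shift by one is the same as multiplying by 2 (as long as no set bit is shifted out); right-shift by one is the same as dividing by 2, discarding the remainder, for unsigned or non-negative values. More generally, shifting by k places multiplies or divides by 2^k.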
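
A minimal sketch of the shift/multiply correspondence for unsigned operands. The helper names times_pow2 and div_pow2 are made up for this example, and k is assumed to be smaller than the width of unsigned.

```cpp
#include <cassert>

// Multiply by 2^k with a left shift. Bits shifted past the top are lost,
// so the result wraps around if x * 2^k does not fit in an unsigned.
unsigned times_pow2(unsigned x, unsigned k) { return x << k; }

// Divide by 2^k with a right shift. For an unsigned operand the left side
// is zero-filled, which matches integer division (the remainder is dropped).
unsigned div_pow2(unsigned x, unsigned k) { return x >> k; }
```

For example, times_pow2(5, 1) is 10 and div_pow2(13, 1) is 6. The correspondence is weaker for negative signed values: integer division truncates toward zero, while a sign-extending right shift rounds toward negative infinity.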

Bitwise Operations

& (and) | 0  1
--------+-----
   0    | 0  0
   1    | 0  1

| (or)  | 0  1
--------+-----
   0    | 0  1
   1    | 1  1

^ (xor) | 0  1
--------+-----
   0    | 0  1
   1    | 1  0

~ (not) | 0  1
--------+-----
        | 1  0

Useful identities:
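
A few standard bitwise identities, chosen here for illustration rather than taken from the lecture, can be checked directly in C++. The function name identities_hold is hypothetical.

```cpp
#include <cassert>

// Returns true if a handful of standard bitwise identities hold for x and y.
bool identities_hold(unsigned x, unsigned y) {
  return (x & 0u) == 0u          // and with all zeros clears every bit
      && (x & ~0u) == x          // and with all ones leaves x unchanged
      && (x | 0u) == x           // or with all zeros leaves x unchanged
      && (x | ~0u) == ~0u        // or with all ones sets every bit
      && (x ^ 0u) == x           // xor with all zeros leaves x unchanged
      && (x ^ x) == 0u           // xor with itself clears every bit
      && (x ^ ~0u) == ~x         // xor with all ones flips every bit
      && ~(x & y) == (~x | ~y)   // De Morgan
      && ~(x | y) == (~x & ~y)   // De Morgan
      && ((x ^ y) ^ y) == x;     // xor-ing with y twice restores x
}
```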

Examples

Let A = 0xd8 (1101_1000) and B = 0x76 (0111_0110):
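
As an illustrative sketch, the basic operations on A and B can be computed as 8-bit values in C++. The constant names (kAnd, kOr, and so on) are invented for this example. Each result is cast back to 8 bits because C++ promotes the operands to int first.

```cpp
#include <cassert>
#include <cstdint>

const std::uint8_t A = 0xd8;  // 1101_1000
const std::uint8_t B = 0x76;  // 0111_0110

// Operands are promoted to int before each operation, so the results are
// narrowed back to 8 bits. This matters for ~ and <<, which would otherwise
// leave bits set above bit 7.
const std::uint8_t kAnd  = static_cast<std::uint8_t>(A & B);   // 0101_0000 = 0x50
const std::uint8_t kOr   = static_cast<std::uint8_t>(A | B);   // 1111_1110 = 0xfe
const std::uint8_t kXor  = static_cast<std::uint8_t>(A ^ B);   // 1010_1110 = 0xae
const std::uint8_t kNotA = static_cast<std::uint8_t>(~A);      // 0010_0111 = 0x27
const std::uint8_t kShl  = static_cast<std::uint8_t>(A << 1);  // 1011_0000 = 0xb0
const std::uint8_t kShr  = static_cast<std::uint8_t>(A >> 2);  // 0011_0110 = 0x36
```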

The constant value we use to select particular bits is known as a mask. If you were working on software that needed to be bit-aware (e.g. device drivers), you might define a function unsigned mask(unsigned from, unsigned to); that would set a range of bits.
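
One possible implementation of such a function is sketched below. The notes give only the signature, so the behavior is an assumption: here mask(from, to) returns a value with bits from through to (inclusive, bit 0 being the least significant) set to 1 and all others 0, and it requires from <= to.

```cpp
#include <cassert>
#include <limits>

// Returns a value with bits from..to (inclusive) set and all others clear.
// Assumed contract: from <= to, and to is a valid bit position.
unsigned mask(unsigned from, unsigned to) {
  const unsigned width = std::numeric_limits<unsigned>::digits;
  assert(from <= to && to < width);
  const unsigned count = to - from + 1;
  // Shifting by the full width is undefined, so handle that case separately.
  const unsigned ones = (count == width) ? ~0u : ((1u << count) - 1u);
  return ones << from;
}
```

For example, mask(4, 7) is 0xf0, so 0xd8 & mask(4, 7) selects the high nybble of 0xd8 and yields 0xd0.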

Outputting Bits-at-a-time

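#include <cassert>
#include <ostream>

using std::ostream;

// Collects bits, most significant first, and writes each full byte to out.
class BitWriter {
public:
  BitWriter(ostream* out) : out_(out), bits_(0), count_(0) {}
  void put(unsigned bit);  // append one bit (0 or 1)
  void flush();            // pad a partial byte with 0 bits and write it
private:
  ostream* out_;
  unsigned char bits_;     // bits collected so far
  unsigned int count_;     // number of bits held in bits_ (0..7)
};

void BitWriter::put(unsigned bit) {
  assert(count_ < 8);
  assert((bit & 1) == bit);
  bits_ = static_cast<unsigned char>((bits_ << 1) | (bit & 1));
  ++count_;
  if (count_ == 8) {
    // ostream::write takes a const char*, so reinterpret the byte.
    out_->write(reinterpret_cast<const char*>(&bits_), 1);
    bits_ = 0;
    count_ = 0;
  }
}

void BitWriter::flush() {
  while (count_ != 0) {
    put(0);
  }
}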
  

Hashing (cont.)

Hashing maps N entries in a large keyspace into the restricted range 0..M-1. The mapping function is chosen empirically (heuristically) to minimize the number of collisions (two keys mapping to the same hash value).

Despite careful choice of hash function, collisions happen. The two basic strategies for handling them are to have each hash bucket hold a list of entries (chaining), or to place the colliding entry in a different cell within the hash table (open addressing, or probing).
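
A minimal sketch of the first strategy, in which each bucket holds a list of entries. The class name ChainedTable and its interface are hypothetical, and std::hash stands in for whatever hash function is actually chosen.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// A small hash table from string to int that resolves collisions by chaining.
class ChainedTable {
public:
  explicit ChainedTable(std::size_t buckets) : buckets_(buckets) {}

  // Insert the key, or overwrite the value if the key is already present.
  void put(const std::string& key, int value) {
    Chain& chain = buckets_[index(key)];
    for (auto& entry : chain) {
      if (entry.first == key) { entry.second = value; return; }
    }
    chain.push_back(std::make_pair(key, value));
  }

  // Return a pointer to the stored value, or nullptr if the key is absent.
  const int* find(const std::string& key) const {
    for (const auto& entry : buckets_[index(key)]) {
      if (entry.first == key) return &entry.second;
    }
    return nullptr;
  }

private:
  typedef std::list<std::pair<std::string, int> > Chain;

  // Map the key into the range 0..M-1, where M is the number of buckets.
  std::size_t index(const std::string& key) const {
    return std::hash<std::string>()(key) % buckets_.size();
  }

  std::vector<Chain> buckets_;
};
```

With a single bucket every key collides, yet lookups still succeed because the chain is searched linearly. This is why the cost of a lookup grows with the length of the chain.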

Load Factor is the ratio of the number of items in the hash table to the table size: α = N / M.

Theorem: when using chaining, the average number of probes is Θ(1 + α).

Theorem: when using open addressing (probing) with α < 1, the expected number of probes in an unsuccessful search is at most 1 / (1 - α).

Theorem: when using open addressing with α < 1, the expected number of probes in a successful search is at most (1 / α) * ln(1 / (1 - α)).
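
To get a feel for the two open-addressing bounds, they can be evaluated for a few load factors. The function names below are made up for this sketch, and the functions compute only the bounds from the theorems, not measured probe counts.

```cpp
#include <cassert>
#include <cmath>

// Upper bound on expected probes in an unsuccessful search, 0 <= alpha < 1.
double unsuccessful_probes(double alpha) {
  return 1.0 / (1.0 - alpha);
}

// Upper bound on expected probes in a successful search, 0 < alpha < 1.
double successful_probes(double alpha) {
  return (1.0 / alpha) * std::log(1.0 / (1.0 - alpha));
}
```

At α = 0.5 the bounds are 2 and about 1.39 probes; at α = 0.75 they are 4 and about 1.85; at α = 0.9 they are 10 and about 2.56. The cost stays small until the table is nearly full.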

The takeaway: with a good hash function and a table that is not completely full, you can get better performance than you get with binary search (or a tree-based equivalent).

Dynamic resizing

Maintain α below some limit (e.g. 75% full). If α exceeds the threshold, the table size is doubled and the entries are rehashed. The amortized cost is O(1), as with dynamic arrays.
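
The resizing policy can be sketched with a small open-addressing set. Everything here is an assumption made for illustration: the class name IntSet, the initial size of 8 cells, linear probing, keys restricted to non-negative ints, and -1 as the empty-cell marker.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kEmpty = -1;  // marks an unused cell (keys must be >= 0)

// A set of non-negative ints using linear probing. The table doubles
// whenever an insertion would push the load factor above 3/4.
class IntSet {
public:
  IntSet() : cells_(8, kEmpty), size_(0) {}

  void insert(int key) {
    assert(key >= 0);
    if (contains(key)) return;
    if (4 * (size_ + 1) > 3 * cells_.size()) grow();
    place(cells_, key);
    ++size_;
  }

  bool contains(int key) const {
    std::size_t i = static_cast<std::size_t>(key) % cells_.size();
    while (cells_[i] != kEmpty) {  // terminates: the table is never full
      if (cells_[i] == key) return true;
      i = (i + 1) % cells_.size();
    }
    return false;
  }

  std::size_t size() const { return size_; }
  std::size_t capacity() const { return cells_.size(); }

private:
  // Put key into the first free cell at or after its home position.
  static void place(std::vector<int>& cells, int key) {
    std::size_t i = static_cast<std::size_t>(key) % cells.size();
    while (cells[i] != kEmpty) i = (i + 1) % cells.size();
    cells[i] = key;
  }

  // Double the table and rehash every entry into the new cells.
  void grow() {
    std::vector<int> bigger(2 * cells_.size(), kEmpty);
    for (int key : cells_) {
      if (key != kEmpty) place(bigger, key);
    }
    cells_.swap(bigger);
  }

  std::vector<int> cells_;
  std::size_t size_;
};
```

Each call to grow() touches every entry, but because the size doubles each time, the total rehashing work over N insertions is proportional to N.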

Other Uses for Hashing

Diving Deeper into C++

C++ is a multiparadigm language. Object-oriented style is only one of several programming styles supported by the language. C++ contains features to support the development of large-scale systems.

Misuse of language features can lead to unmaintainable programs.

References

Private Data Members

The language does not force data members of a class to be private, but making them private is generally considered good practice. Public data members increase the coupling between a class and its clients. Contrast the following two versions:

class Inflexible {
public:
  string color;
  //...
};
  

class Flexible {
public:
  void set_color(const string& color) {color_ = color;}
  const string& color() const {return color_;}
  //...
private:
  string color_;
  //...
};
  

At first glance, the latter seems to require extra boilerplate. The win comes when you decide to change the way you represent color:

enum Colors {
  //...
};

class Flexible {
public:
  // Same public interface as before; only the representation has changed.
  void set_color(const string& color) {color_ = color_by_name[color];}
  const string& color() const {return color_name[color_];}
private:
  static map<string, Colors> color_by_name;
  static map<Colors, string> color_name;
  Colors color_;
};
  

Not a single line of client code needs to be altered!
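
A compilable sketch of the enum-based version, with a piece of client code that uses only the public interface. The enum values, the map contents, and the repaint function are invented for illustration. This sketch also uses at() rather than operator[], so an unknown color name throws std::out_of_range instead of silently inserting an entry.

```cpp
#include <cassert>
#include <map>
#include <string>

enum Colors { RED, GREEN, BLUE };  // hypothetical values

class Flexible {
public:
  Flexible() : color_(RED) {}
  void set_color(const std::string& color) { color_ = color_by_name.at(color); }
  const std::string& color() const { return color_name.at(color_); }
private:
  static const std::map<std::string, Colors> color_by_name;
  static const std::map<Colors, std::string> color_name;
  Colors color_;
};

// Static data members are defined once, outside the class.
const std::map<std::string, Colors> Flexible::color_by_name = {
    {"red", RED}, {"green", GREEN}, {"blue", BLUE}};
const std::map<Colors, std::string> Flexible::color_name = {
    {RED, "red"}, {GREEN, "green"}, {BLUE, "blue"}};

// Client code: it sees only strings, so it compiles unchanged whether
// Flexible stores a string or an enum internally.
std::string repaint(Flexible& widget, const std::string& color) {
  widget.set_color(color);
  return widget.color();
}
```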

Static Members

Static members (data and methods) are properties of the class, not the individual instances (objects) of the class.

Static methods have no this pointer.
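
A short sketch of both kinds of static member, using a hypothetical Widget class that counts how many of its objects currently exist.

```cpp
#include <cassert>

class Widget {
public:
  Widget() { ++live_count_; }
  Widget(const Widget&) { ++live_count_; }
  ~Widget() { --live_count_; }

  // Static method: called on the class, not on an object, so it has no
  // this pointer and can touch only static members.
  static int live_count() { return live_count_; }

private:
  static int live_count_;  // one copy shared by every Widget
};

// A static data member is defined once, outside the class.
int Widget::live_count_ = 0;
```

Widget::live_count() can be called even when no Widget exists; it returns 0 in that case.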

Virtual Methods

In C++, a method is nonvirtual unless it is explicitly declared virtual. Java has no virtual keyword because instance methods in Java are virtual by default.

The difference between virtual and nonvirtual methods is the behavior when there is inheritance, which will be discussed next lecture.
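
As a small preview, the sketch below shows the difference when a derived object is used through a base-class reference. The class and method names are hypothetical.

```cpp
#include <cassert>
#include <string>

class Base {
public:
  virtual ~Base() {}
  std::string plain() const { return "Base"; }            // nonvirtual
  virtual std::string dynamic() const { return "Base"; }  // virtual
};

class Derived : public Base {
public:
  std::string plain() const { return "Derived"; }             // hides Base::plain
  std::string dynamic() const override { return "Derived"; }  // overrides
};

// Nonvirtual call: chosen at compile time from the static type (Base).
std::string call_plain(const Base& b) { return b.plain(); }

// Virtual call: chosen at run time from the actual type of the object.
std::string call_dynamic(const Base& b) { return b.dynamic(); }
```

Given a Derived object d, call_plain(d) returns "Base" while call_dynamic(d) returns "Derived".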