CSS 343: Notes from Lecture 13 (DRAFT)

Administrivia

instructor's solution to assignment 2: 455 NCSL (non-comment source lines), including scaffolding
assignment 2: do not use std::vector<bool>
- or, use it and take the hit on the grade
- the point of the exercise is to understand how this manipulation works
"Hello World" variations

Bit Twiddling

Binary Numbers

A decimal number is the sum of powers of 10, e.g.

8742
8000 + 700 + 40 + 2
8 * 1000 + 7 * 100 + 4 * 10 + 2 * 1
8 * 10³ + 7 * 10² + 4 * 10¹ + 2 * 10⁰

place	3 (10³)	2 (10²)	1 (10¹)	0 (10⁰)
digit	8	7	4	2

Similarly, a binary number is the sum of powers of 2. Furthermore, a binary number can be converted directly to a hexadecimal number by taking groups of 4 bits (a half-byte, or nybble), because each group corresponds to exactly one hex digit. For example, consider the binary number 0110_1011_0001_1100:

place	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
power of 2	2¹⁵ 32768	2¹⁴ 16384	2¹³ 8192	2¹² 4096	2¹¹ 2048	2¹⁰ 1024	2⁹ 512	2⁸ 256	2⁷ 128	2⁶ 64	2⁵ 32	2⁴ 16	2³ 8	2² 4	2¹ 2	2⁰ 1
power of 16	3 (4096)				2 (256)				1 (16)				0 (1)
binary	0	1	1	0	1	0	1	1	0	0	0	1	1	1	0	0
hexadecimal	1 * 4 + 1 * 2 = 6				1 * 8 + 1 * 2 + 1 * 1 = B (11)				1 * 1 = 1				1 * 8 + 1 * 4 = C (12)

So, we conclude the binary number 0110_1011_0001_1100 is

0x6b1c (hex)
6 * 4096 + 11 * 256 + 1 * 16 + 12 * 1
27420 (decimal)
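
The conversion above can be checked mechanically. This is only a minimal sketch: it assumes a C++14 (or later) compiler for binary literals and digit separators, and the helper name to_hex is made up for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: converts a value to lowercase hex by peeling off
// one nybble (4 bits) at a time, least significant nybble first.
std::string to_hex(unsigned value) {
  const char digits[] = "0123456789abcdef";
  std::string result;
  do {
    result.insert(result.begin(), digits[value & 0xf]);  // low 4 bits
    value >>= 4;                                         // move to next nybble
  } while (value != 0);
  return result;
}

// The same number written in three bases (binary literals need C++14).
static_assert(0b0110'1011'0001'1100 == 0x6b1c, "binary vs. hex");
static_assert(0x6b1c == 27420, "hex vs. decimal");
static_assert(6 * 4096 + 11 * 256 + 1 * 16 + 12 * 1 == 27420,
              "sum of powers of 16");

// Runtime check of the nybble-by-nybble conversion.
void check_conversion() {
  assert(to_hex(0b0110'1011'0001'1100) == "6b1c");
}
```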

Shift Operations

shift left: << (e.g. (0101_1101 << 2) = 0111_0100)
shift right: >> (e.g. (0101_1101 >> 2) = 0001_0111)

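Left-shift zero-fills on the right. Right-shift zero-fills on the left for an unsigned operand; for a signed operand it typically sign-extends (this is guaranteed only since C++20; earlier standards leave the result for negative values implementation-defined).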

Left-shift by one is the same as multiplying by 2, provided no 1 bit is shifted out; right-shift by one is the same as dividing a non-negative value by 2 and discarding the remainder.
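
The 8-bit shift examples can be reproduced as follows. This is a sketch, not course code: the helper names shl8 and shr8 are invented here. The cast matters because C++ promotes an 8-bit operand to int before shifting, so the bits shifted past bit 7 would otherwise be kept.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: shift an 8-bit value left by n (n < 8). The cast
// discards bits shifted past bit 7, which integer promotion would keep.
std::uint8_t shl8(std::uint8_t value, unsigned n) {
  return static_cast<std::uint8_t>(value << n);
}

// Hypothetical helper: shift an 8-bit unsigned value right by n (n < 8);
// zeros enter from the left.
std::uint8_t shr8(std::uint8_t value, unsigned n) {
  return static_cast<std::uint8_t>(value >> n);
}

void check_shifts() {
  assert(shl8(0x5d, 2) == 0x74);  // 0101_1101 << 2 = 0111_0100
  assert(shr8(0x5d, 2) == 0x17);  // 0101_1101 >> 2 = 0001_0111
  assert((21u << 1) == 42u);      // left-shift by one doubles
  assert((21u >> 1) == 10u);      // right-shift by one halves, dropping the remainder
}
```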

Bitwise Operations

C++ Boolean logic operators: &&, ||, !
C++ Boolean bitwise operators: &, |, ^, ~

& (and)	0	1
0	0	0
1	0	1

| (or)	0	1
0	0	1
1	1	1

^ (xor)	0	1
0	0	1
1	1	0

~ (not)	0	1
	1	0

Useful identities:

X & 0 = 0 (clears bit)
X & 1 = X (leaves bit unchanged)
X | 0 = X (leaves bit unchanged)
X | 1 = 1 (sets bit)
X ^ 0 = X (leaves bit unchanged)
X ^ 1 = ~X (flips/toggles bit)
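
The identities hold independently for every bit position, so they can be checked on whole bytes by using 0x00 as "all zeros" and 0xff as "all ones". The function name below is an assumption made for this sketch.

```cpp
#include <cassert>

// Returns true if all six identities hold for every 8-bit value. Each
// identity is applied to all eight bit positions at once.
bool identities_hold() {
  for (unsigned x = 0; x <= 0xff; ++x) {
    if ((x & 0x00u) != 0x00u) return false;         // X & 0 = 0
    if ((x & 0xffu) != x) return false;             // X & 1 = X
    if ((x | 0x00u) != x) return false;             // X | 0 = X
    if ((x | 0xffu) != 0xffu) return false;         // X | 1 = 1
    if ((x ^ 0x00u) != x) return false;             // X ^ 0 = X
    if ((x ^ 0xffu) != (~x & 0xffu)) return false;  // X ^ 1 = ~X (low 8 bits)
  }
  return true;
}

void check_identities() {
  assert(identities_hold());
}
```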

Examples

let A = 0xd8 (1101_1000) and B = 0x76 (0111_0110):

set bit 2:
A | 0x04 = A | (1 << 2) = 1101_1000 | 0000_0100 = 1101_1100 = 0xdc
clear all bits except bit 4:
B & 0x10 = B & (1 << 4) = 0111_0110 & 0001_0000 = 0001_0000 = 0x10
clear bit 6:
A & 0xbf = A & ~(1 << 6) = A & ~(0100_0000) = 1101_1000 & 1011_1111 = 1001_1000 = 0x98
toggle (flip) bits 0-3:
B ^ 0x0f = B ^ ((1 << 3) | (1 << 2) | (1 << 1) | 1) = 0111_0110 ^ 0000_1111 = 0111_1001 = 0x79

The constant value we use to select particular bits is known as a mask. If you were working on software that needed to be bit-aware (e.g. device drivers), you might define a function unsigned mask(unsigned from, unsigned to); that builds a mask with a contiguous range of bits set.
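
One possible reading of that mask function is sketched below. The notes give only the signature, so the semantics here are an assumption: bits from..to inclusive are set, with bit 0 the least significant. The check function replays the four examples on A and B.

```cpp
#include <cassert>
#include <limits>

// Assumed semantics (the notes give only the signature): returns a value
// with bits from..to (inclusive, bit 0 = least significant) set to 1 and
// every other bit 0. Requires from <= to < number of bits in unsigned.
unsigned mask(unsigned from, unsigned to) {
  const unsigned kBits = std::numeric_limits<unsigned>::digits;
  assert(from <= to);
  assert(to < kBits);
  const unsigned width = to - from + 1;
  // Build `width` one-bits. Shifting by the full word size is undefined
  // behavior, so the full-width case is handled separately.
  const unsigned ones = (width == kBits) ? ~0u : ((1u << width) - 1u);
  return ones << from;
}

void check_examples() {
  const unsigned A = 0xd8, B = 0x76;
  assert((A | mask(2, 2)) == 0xdc);   // set bit 2
  assert((B & mask(4, 4)) == 0x10);   // clear all bits except bit 4
  assert((A & ~mask(6, 6)) == 0x98);  // clear bit 6
  assert((B ^ mask(0, 3)) == 0x79);   // toggle bits 0-3
}
```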

Outputting Bits-at-a-time

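#include <cassert>
#include <ostream>
using std::ostream;

// Accumulates single bits and writes them to a stream one byte at a time,
// most significant bit first.
class BitWriter {
public:
  BitWriter(ostream* out) : out_(out), bits_(0), count_(0) {}
  void put(unsigned bit);  // append one bit (0 or 1)
  void flush();            // pad a partial byte with zeros and write it
private:
  ostream* out_;
  unsigned char bits_;     // bits collected so far
  unsigned int count_;     // number of bits held in bits_ (0..7)
};

void BitWriter::put(unsigned bit) {
  assert(count_ < 8);
  assert((bit & 1) == bit);  // bit must be 0 or 1
  bits_ = static_cast<unsigned char>((bits_ << 1) | (bit & 1));
  ++count_;
  if (count_ == 8) {
    // write() takes a const char*, so the byte's address must be cast
    out_->write(reinterpret_cast<const char*>(&bits_), 1);
    bits_ = 0;
    count_ = 0;
  }
}

void BitWriter::flush() {
  while (count_ != 0) {
    put(0);
  }
}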
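
To see the byte layout this produces, the sketch below feeds a string of '0'/'1' characters through the writer into an in-memory stream. The class is restated in condensed form only so the sketch compiles on its own; pack_bits is a hypothetical helper, not part of the lecture code.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Condensed restatement of BitWriter so this sketch is self-contained.
class BitWriter {
public:
  explicit BitWriter(std::ostream* out) : out_(out), bits_(0), count_(0) {}
  void put(unsigned bit) {
    assert((bit & 1) == bit);  // bit must be 0 or 1
    bits_ = static_cast<unsigned char>((bits_ << 1) | bit);
    if (++count_ == 8) {       // a full byte has been collected
      out_->put(static_cast<char>(bits_));
      bits_ = 0;
      count_ = 0;
    }
  }
  void flush() { while (count_ != 0) put(0); }
private:
  std::ostream* out_;
  unsigned char bits_;
  unsigned count_;
};

// Hypothetical helper: writes a string of '0'/'1' characters through a
// BitWriter and returns the bytes that were produced.
std::string pack_bits(const std::string& bits) {
  std::ostringstream out;
  BitWriter writer(&out);
  for (char c : bits) writer.put(c == '1' ? 1u : 0u);
  writer.flush();  // pad the final partial byte with zeros
  return out.str();
}
```

For instance, pack_bits("01000001") yields the single byte 0x41 ('A'), and pack_bits("101") yields 0xa0 because flush pads the three bits with five zeros on the right.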

Hashing (cont.)

Hashing maps N entries in a large keyspace into the restricted range 0..M-1. The mapping function is chosen empirically (heuristically) to minimize the number of collisions (two keys mapping to the same hash value).

Despite careful choice of hash function, collisions happen. The two basic strategies for resolving them are to have each hash bucket hold a list of entries (chaining), or to probe for a different cell within the hash table (open addressing).

Load Factor is the ratio of the number of items in the hash table to the table size: α = N / M.

Theorem: when using chaining (a list per bucket), the average number of probes per search is Θ(1 + α).

Theorem: when using open addressing (probing) with α < 1, the expected number of probes in an unsuccessful search is at most 1 / (1 - α).

Theorem: when using open addressing, the expected number of probes in a successful search is at most: (1 / α) * ln(1 / (1 - α))
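
To get a feel for these bounds, the two open-addressing formulas can be evaluated directly. The function names below are invented for this sketch; the formulas are the ones stated in the theorems.

```cpp
#include <cassert>
#include <cmath>

// Load factor: n items stored in a table with m slots.
double load_factor(double n, double m) { return n / m; }

// Bound on expected probes for an unsuccessful search (open addressing).
double unsuccessful_probes(double alpha) {
  assert(alpha >= 0.0 && alpha < 1.0);
  return 1.0 / (1.0 - alpha);
}

// Bound on expected probes for a successful search (open addressing).
double successful_probes(double alpha) {
  assert(alpha > 0.0 && alpha < 1.0);
  return (1.0 / alpha) * std::log(1.0 / (1.0 - alpha));
}
```

At α = 0.5 the bounds are 2 probes (unsuccessful) and about 1.39 (successful); at α = 0.9 they grow to 10 and about 2.56.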

The takeaway: with a good hash function and a table that is not completely full, you can get better performance than you get from a binary search (or a tree-based equivalent).

Dynamic resizing

Maintain α below some limit (e.g. 75% full). If α exceeds the threshold, the table size is doubled and the entries are rehashed. The amortized cost is O(1), as with dynamic arrays.
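
The resizing policy might look like the following sketch. It assumes chaining, an initial size of 4 buckets, and std::hash as the hash function; the class name StringSet and all of its members are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <vector>

// Hypothetical chained hash set of strings that doubles its table
// whenever an insertion would push the load factor above 0.75.
class StringSet {
public:
  StringSet() : buckets_(4), size_(0) {}

  void insert(const std::string& key) {
    if (contains(key)) return;
    // Grow first if one more item would push alpha over the limit.
    if (size_ + 1 > kMaxLoad * buckets_.size()) rehash(buckets_.size() * 2);
    buckets_[index(key, buckets_.size())].push_back(key);
    ++size_;
  }

  bool contains(const std::string& key) const {
    for (const std::string& k : buckets_[index(key, buckets_.size())])
      if (k == key) return true;
    return false;
  }

  std::size_t size() const { return size_; }
  std::size_t bucket_count() const { return buckets_.size(); }
  double load_factor() const { return double(size_) / buckets_.size(); }

private:
  static constexpr double kMaxLoad = 0.75;

  static std::size_t index(const std::string& key, std::size_t m) {
    return std::hash<std::string>()(key) % m;
  }

  // Every entry must be re-placed because its index depends on the
  // table size.
  void rehash(std::size_t new_count) {
    std::vector<std::list<std::string>> bigger(new_count);
    for (const auto& bucket : buckets_)
      for (const auto& key : bucket)
        bigger[index(key, new_count)].push_back(key);
    buckets_.swap(bigger);
  }

  std::vector<std::list<std::string>> buckets_;
  std::size_t size_;
};
```

Each rehash costs time proportional to the number of entries, but because the table doubles, rehashes become rarer as the table grows, which is where the amortized O(1) comes from.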

Other Uses for Hashing

checksums
cryptographic (one-way) hashes

Diving Deeper into C++

C++ is a multiparadigm language. Object-oriented style is only one of several programming styles supported by the language. C++ contains features to support the development of large-scale systems.

some features (which are essential for large programs) add unnecessary complexity to small programs
- e.g. namespaces

Misuse of language features can lead to unmaintainable programs.

spaghetti code
rat's nest

References

references are aliases
references are essentially pointers that are auto-dereferenced and cannot be reassigned
unlike pointers, references can never be NULL
references still have pitfalls of their own
- never return a reference to a local variable
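
The pitfall in the last bullet can be sketched as follows. The broken version is left in a comment on purpose, since using its result would be undefined behavior; the function names are invented for this illustration.

```cpp
#include <cassert>
#include <string>

// BROKEN (kept as a comment): `local` is destroyed when the function
// returns, so the caller would receive a dangling reference.
//
//   const std::string& bad_greeting() {
//     std::string local = "hello";
//     return local;
//   }

// Safe alternative 1: return by value; the caller gets its own object.
std::string greeting_by_value() {
  std::string local = "hello";
  return local;
}

// Safe alternative 2: return a reference to an object that outlives the
// call, here a function-local static.
const std::string& greeting_static() {
  static const std::string text = "hello";
  return text;
}

void check_greetings() {
  assert(greeting_by_value() == "hello");
  assert(greeting_static() == "hello");
}
```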

Example:

int x = 42;
cout << x << endl;
int* px = &x;
*px = 17;
cout << x << endl;
int& rx = x;
rx = 64;
cout << x << endl;
      
42
17
64

references may be used to pass function parameters that can be modified
- YMMV whether this is good style
the reason references were introduced into the language was to support operator overloading, e.g. to give classes proper semantics when they overload operator=
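
Both uses can be sketched briefly. The function swap_ints and the class Celsius are hypothetical names chosen for this example.

```cpp
#include <cassert>

// Pass by reference: the function modifies the caller's variables.
void swap_ints(int& a, int& b) {
  int tmp = a;
  a = b;
  b = tmp;
}

// Hypothetical class. operator= returns a reference to *this, so
// assignments chain (a = b = c) just as they do for built-in types.
class Celsius {
public:
  explicit Celsius(double degrees = 0.0) : degrees_(degrees) {}
  Celsius& operator=(const Celsius& other) {
    degrees_ = other.degrees_;
    return *this;
  }
  double degrees() const { return degrees_; }
private:
  double degrees_;
};

void check_references() {
  int x = 1, y = 2;
  swap_ints(x, y);
  assert(x == 2 && y == 1);

  Celsius a, b, c(21.5);
  a = b = c;  // chained assignment works because operator= returns Celsius&
  assert(a.degrees() == 21.5);
}
```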

Private Data Members

The language does not force data members of a class to be private, but making them private is generally considered good practice. Public data members increase the coupling between a class and its clients. Contrast the following two versions:

class Inflexible {
public:
  string color;
  //...
};
  
class Flexible {
public:
  void set_color(const string& color) {color_ = color;}
  const string& color() const {return color_;}
  //...
private:
  string color_;
  //...
};

At first glance, the latter seems to require extra boilerplate. The win comes when you decide to change the way you represent color:

enum Colors {
  //...
};

class Flexible {
public:
  void set_color(const string& color) {color_ = color_by_name[color];}
  const string& color() const {return color_name[color_];}
private:
  static map<string, Colors> color_by_name;
  static map<Colors, string> color_name;
  Colors color_;
};

Not a single line of client code needs to be altered!
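
A complete version of the enum-based class, together with some client code, might look like this. The enumerators, the map contents, and the describe function are assumptions added for illustration. Unlike the snippet above, this sketch uses at() instead of operator[], so an unknown color name throws rather than silently inserting a default entry.

```cpp
#include <cassert>
#include <map>
#include <string>

using std::map;
using std::string;

enum Colors { RED, GREEN, BLUE };  // enumerators are assumed for this sketch

class Flexible {
public:
  void set_color(const string& color) { color_ = color_by_name.at(color); }
  const string& color() const { return color_name.at(color_); }
private:
  static const map<string, Colors> color_by_name;
  static const map<Colors, string> color_name;
  Colors color_ = RED;
};

// Static data members are defined once, outside the class.
const map<string, Colors> Flexible::color_by_name = {
    {"red", RED}, {"green", GREEN}, {"blue", BLUE}};
const map<Colors, string> Flexible::color_name = {
    {RED, "red"}, {GREEN, "green"}, {BLUE, "blue"}};

// Client code: it uses only set_color() and color(), so it compiles
// unchanged against either representation of the color.
string describe(Flexible& widget, const string& color) {
  widget.set_color(color);
  return "color is " + widget.color();
}

void check_client() {
  Flexible widget;
  assert(describe(widget, "blue") == "color is blue");
}
```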

Static Members

Static members (data and methods) are properties of the class, not the individual instances (objects) of the class.

Static methods have no this pointer.
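
A common use of static members is counting live instances, sketched below with a hypothetical Widget class. The counter belongs to the class as a whole, and the static method can be called without any object.

```cpp
#include <cassert>

// Hypothetical class that tracks how many instances currently exist.
class Widget {
public:
  Widget() { ++live_count_; }
  Widget(const Widget&) { ++live_count_; }
  ~Widget() { --live_count_; }
  // Static method: no `this` pointer, so it can touch only static members.
  static int live_count() { return live_count_; }
private:
  static int live_count_;  // one counter shared by every Widget
};

// Static data members are defined once, outside the class.
int Widget::live_count_ = 0;

void check_counts() {
  const int before = Widget::live_count();  // called without an object
  {
    Widget a, b;
    assert(Widget::live_count() == before + 2);
  }
  assert(Widget::live_count() == before);   // destructors ran at scope exit
}
```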

Virtual Methods

In C++, a method is nonvirtual unless it is explicitly declared virtual. Java has no virtual keyword because (non-static, non-private) methods in Java are implicitly virtual.

The difference between virtual and nonvirtual methods is the behavior when there is inheritance, which will be discussed next lecture.
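
As a small preview, the difference shows up when a derived object is used through a base-class reference. The class and method names here are invented for this sketch.

```cpp
#include <cassert>
#include <string>

class Base {
public:
  virtual ~Base() {}
  std::string static_name() const { return "Base"; }           // nonvirtual
  virtual std::string dynamic_name() const { return "Base"; }  // virtual
};

class Derived : public Base {
public:
  std::string static_name() const { return "Derived"; }  // hides Base's version
  std::string dynamic_name() const override { return "Derived"; }
};

// Calls both methods through a reference whose static type is Base.
// The nonvirtual call is resolved from the reference type at compile
// time; the virtual call is resolved from the object's actual type at
// run time.
std::string call_both(const Base& b) {
  return b.static_name() + "/" + b.dynamic_name();
}

void check_dispatch() {
  Derived d;
  assert(call_both(d) == "Base/Derived");
}
```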