CSS 343: Notes from Lecture 1 (DRAFT)

Administrivia

course web site
syllabus
office hours: Sunday 1pm-3pm in the Linux lab
Programming Principles and Practice Using C++, Bjarne Stroustrup, Addison-Wesley, 2009
- highly recommended
- covers important material that is not covered in class
- not required reading for this course; read it on your own at your own pace
- this complements your coursework; does not replace it

Introduction

assumption: you are taking this course because you wish to make your living writing software
chances are, you will never have to code your own red-black tree
- you make a library call (which is probably more-tuned and better-tested than anything you might hand-roll)
this raises the question: why study data structures & algorithms?

Why Study Data Structures & Algorithms?

the difference between the technician and the engineer is that the engineer understands the library calls being made and can make competent decisions about various tradeoffs involved
data structures and algorithms is about perfecting the craft of programming
- think of it as "still life" for programmers
- especially pointers and recursion
demonstrating deep understanding of data structures is a critical job interview skill
- http://www.joelonsoftware.com/articles/ThePerilsofJavaSchools.html
- https://sites.google.com/site/steveyegge2/five-essential-phone-screen-questions

Arguably, Data Structures & Algorithms is the single most important course in the software engineering programme.

Goals

deeper understanding of how Data Structures & Algorithms work
deeper understanding of how complexity is managed via layered abstractions
deeper understanding of Object-Oriented concepts
- what problems the techniques are trying to solve
- how the techniques work under the hood
- how to use the techniques
introduction to the C++ Standard Template Library (STL)
- not too deep, since we are trying to learn the algorithms that the library implements
increase your programming dexterity
- especially pointers & recursion

Non-Goals

There are numerous important topics that are beyond the scope of this course.

project management
"professional-grade" C++ programming
- STL
- advanced idiomatic expressions
- input validation and sanitization
  - http://xkcd.com/327/
- internal consistency checking
internationalization (I18N)
concurrency (threading & networking)
Linux & shell programming
software development process
math & formal proofs

In passing, we may discuss tactical software-engineering concepts such as:

self-checking (assertions)
code reviews
source code management systems
build systems
unit & integration testing
deployment & rollback
logging & logs analysis
monitoring

What is Software?

Your humble instructor distinguishes between (computer) programs, i.e. the instructions that drive hardware, and software, which includes the program(s) plus other artifacts. Software:

is intended to be used by persons other than the original programmer(s)
represents a significant capital investment and can be expected to be used for a long time
will evolve over its useful lifespan
will be maintained by persons other than the original programmers as personnel move around during their careers

This has implications on how we design and construct software.

This course is about the programming, but it is important to keep in mind that software is about more than just the programs.

CSS 342 Review

C++
- classes
sorting, binary search
linked lists, stacks, queues
pointers, recursion
complexity theory: Big-Oh notation

Complexity Theory (Performance)

run-time performance as a function of input size: O(f(N))
- concerned about "asymptotic performance", i.e. the general shape of the curve
- What is N?
  - depends on the specific problem
  - some algorithm performance may be a function of two parameters, e.g. Dijkstra's algorithm can be implemented in O(M + N log N) time using a Fibonacci heap
    - M is the number of edges
    - N is the number of nodes
    - this may also be written as O(|E| + |V| log |V|)
- we say, for example, that sorting is O(N log N), where N is the number of items
  - it is sloppy not to include the definition of what N is supposed to be
    - when we are informal, we allow ourselves to be sloppy
key hierarchy: O(1) O(log N) O(N) O(N log N) O(N²) O(N³) O(N^k) O(2^N) O(k^N)
we may be concerned with best/worst/average cases
advanced technique: amortized cost analysis
- an operation that performs O(N) steps once every N times has an amortized cost of O(1)
basic identities:
- O(f(N)) + O(g(N)) = O(f(N) + g(N)) = O(max(f(N), g(N)))
- O(f(N)) * O(g(N)) = O(f(N) * g(N))
- harmonic series: H(N) = S(N, -1) = sum for i = 1..N (1/i) = O(lg(N))
- sum for i = 1..N (i^k) = O(N^(k + 1))
empirical data reported in The Algorithm Design Manual:
- all algorithms take approximately same time for N = 10
- any algorithm with N! running time is useless for N ≥ 20
- 2^N is intractable for N ≥ 40
- O(N²) is usable for N = 10,000 but horrendous for N ≥ 1,000,000
- O(N) and O(N log N) are practical for inputs of size 1,000,000,000
- O(log N) is okay for any input size
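
The amortized cost rule above can be illustrated with a growable array that doubles its capacity when it fills up. This is a minimal sketch, not something shown in lecture: the names GrowableArray, copies() and copies_for_appends() are made up for illustration, and the class counts element copies so the total work can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical growable array of ints that doubles its capacity when full.
// It counts element copies so the amortized cost of an append can be measured.
class GrowableArray {
public:
    void push_back(int value) {
        if (size_ == capacity_) {
            grow();                 // occasional O(N) step
        }
        data_[size_++] = value;     // usual O(1) step
    }

    int at(std::size_t i) const {
        assert(i < size_);          // self-check: index must be in range
        return data_[i];
    }

    std::size_t size() const { return size_; }
    std::size_t copies() const { return copies_; }

private:
    void grow() {
        std::size_t new_capacity = (capacity_ == 0) ? 1 : 2 * capacity_;
        std::unique_ptr<int[]> bigger(new int[new_capacity]);
        for (std::size_t i = 0; i < size_; ++i) {
            bigger[i] = data_[i];   // copy every existing element
            ++copies_;
        }
        data_ = std::move(bigger);
        capacity_ = new_capacity;
    }

    std::unique_ptr<int[]> data_;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
    std::size_t copies_ = 0;
};

// Returns the total number of element copies caused by n appends.
std::size_t copies_for_appends(std::size_t n) {
    GrowableArray a;
    for (std::size_t i = 0; i < n; ++i) {
        a.push_back(static_cast<int>(i));
    }
    return a.copies();
}
```

With this doubling policy the total number of copies stays below 2n for n appends, so each append costs O(1) amortized even though an individual append is occasionally O(N).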

Example:

finding the phone number for a given name in the telephone book is O(log N)
finding the name for a given phone number is O(N)
- it is intractable to search an entire phone book (even for a smallish town)
- it is tractable to search a single column (much smaller N)
- distributed processing: an entire phone book divided among enough people may be searched in a reasonable amount of time
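
The two phone-book lookups above can be sketched in code. This assumes the book is a vector of entries sorted by name; the Entry type and the function names find_by_name and find_by_number are illustrative only.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical phone-book entry.
struct Entry {
    std::string name;
    std::string number;
};

// Precondition: book is sorted by name.
// Binary search, O(log N) comparisons. Returns nullptr if the name is absent.
const Entry* find_by_name(const std::vector<Entry>& book, const std::string& name) {
    auto it = std::lower_bound(
        book.begin(), book.end(), name,
        [](const Entry& e, const std::string& key) { return e.name < key; });
    if (it != book.end() && it->name == name) {
        return &*it;
    }
    return nullptr;
}

// The book is not ordered by number, so every entry may have to be examined.
// Linear search, O(N). Returns nullptr if the number is absent.
const Entry* find_by_number(const std::vector<Entry>& book, const std::string& number) {
    auto it = std::find_if(
        book.begin(), book.end(),
        [&number](const Entry& e) { return e.number == number; });
    return (it != book.end()) ? &*it : nullptr;
}
```

The data is the same in both cases; only the ordering of the book makes the first lookup fast.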

Note that a particular program (algorithm) has a running time expressed in big-oh notation, but an algorithmic problem may have a lower bound of its own. For example, we say that sorting is an O(N log N) problem, but bubble sort (one particular sorting algorithm) runs in O(N²) time.
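
To make the bubble sort claim concrete, here is a minimal sketch that sorts a vector and reports how many comparisons it made. The function name and the comparison counter are illustrative additions; this is one common formulation of bubble sort (with an early exit when a pass makes no swaps), not necessarily the one used in CSS 342.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sorts v in ascending order using bubble sort.
// Returns the number of element comparisons performed.
std::size_t bubble_sort(std::vector<int>& v) {
    std::size_t comparisons = 0;
    for (std::size_t pass = 0; pass + 1 < v.size(); ++pass) {
        bool swapped = false;
        // After each pass, the largest remaining item has bubbled to the end.
        for (std::size_t i = 0; i + 1 < v.size() - pass; ++i) {
            ++comparisons;
            if (v[i] > v[i + 1]) {
                std::swap(v[i], v[i + 1]);
                swapped = true;
            }
        }
        if (!swapped) {
            break;  // no swaps in this pass: already sorted
        }
    }
    assert(std::is_sorted(v.begin(), v.end()));  // self-check
    return comparisons;
}
```

On reverse-sorted input of N items this version performs N(N-1)/2 comparisons, which is where the O(N²) comes from; on already-sorted input it stops after a single pass.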

Post-lecture addendum: demonstration of Fibonacci function (an O(N) problem) coded as an O(2^N) function using naive recursion and an O(N) function using a tail recursion technique

tail recursion is when the last thing a function does is call itself; this allows an optimization technique that turns the recursion into a loop
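
The code from the demonstration is not reproduced in these notes. The following is a minimal sketch of what the two versions might look like; the function names and the accumulator parameters a and b are assumptions made for illustration.

```cpp
#include <cstdint>

// Naive recursion: each call spawns two more calls, so the running time
// grows exponentially with n.
std::uint64_t fib_naive(unsigned n) {
    if (n < 2) {
        return n;
    }
    return fib_naive(n - 1) + fib_naive(n - 2);
}

// Tail recursion: a and b carry two consecutive Fibonacci numbers, and the
// recursive call is the last thing the function does. O(N) calls in total.
std::uint64_t fib_tail(unsigned n, std::uint64_t a = 0, std::uint64_t b = 1) {
    if (n == 0) {
        return a;
    }
    return fib_tail(n - 1, b, a + b);
}
```

Note that the C++ language does not require compilers to turn tail calls into loops; many do so when optimization is enabled, but fib_tail is O(N) in time either way.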

Complexity (Cognitive Load)

the word complexity has multiple meanings:
- complexity as performance (resource usage)
- complexity in the informal sense: complicated
try not to get confused
a key property of modern software is that it is inherently complex (complicated)
- so complicated that our tiny, little minds cannot comprehend it
  - large systems are beyond human understanding
our solution to complexity management distills down to one word: abstraction
- if you learn nothing else in this class, learn this

Abstraction

Abstraction is such an important concept that we use multiple terms that mean essentially the same thing:

modeling
divide & conquer
module
- interface & implementation
public/private (class)
information hiding

It's important to note that information hiding is not a security feature:

information is hidden as a convenience, so you don't have to worry about the details when you don't need to
the public part of a class is the contract with the client
- if the client relies on knowledge of the private parts, the client deserves what it gets
post-lecture addendum: demonstration of a C++ language feature allowing a class user to change a private const data member (voyeur.cc)

There are two parts to an abstraction:

visible part
hidden part

Example: if you built a robot, you would have to code individual "muscle" contractions. From those individual movements, you might code more complex action, such as "flip the light switch". Ultimately, movement boils down to those individual muscle contractions, but we don't think in those terms.
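
The robot example can be sketched as three layers of code. Everything here is hypothetical (the Muscles and Arm classes, the muscle names, and the choice of movements); the lowest layer simply records commands instead of driving real hardware.

```cpp
#include <string>
#include <vector>

// Lowest layer: individual "muscle" contractions.
// This sketch records each command rather than moving anything.
class Muscles {
public:
    void contract(const std::string& muscle) { log_.push_back(muscle); }
    const std::vector<std::string>& log() const { return log_; }

private:
    std::vector<std::string> log_;
};

// Middle layer: arm movements, expressed in terms of muscle contractions.
class Arm {
public:
    explicit Arm(Muscles& muscles) : muscles_(muscles) {}

    void raise() {
        muscles_.contract("shoulder");
        muscles_.contract("biceps");
    }
    void extend_finger() { muscles_.contract("finger extensor"); }
    void lower() { muscles_.contract("triceps"); }

private:
    Muscles& muscles_;
};

// Top layer: a complex action, expressed only in terms of arm movements.
// Nothing at this level mentions muscles.
void flip_light_switch(Arm& arm) {
    arm.raise();
    arm.extend_finger();
    arm.lower();
}
```

A caller of flip_light_switch thinks in terms of the action; the four muscle contractions still happen, but they are hidden two layers down.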

Abstraction layers:

at some abstraction layer, the layer above treats it as magic, and the layer below is magic
the same individual may need to jump up and down the abstraction levels: it's a matter of switching context (or putting on a different hat)
sometimes it is necessary (or expedient) to break an abstraction
- usually this is a bad idea
if abstractions are well-designed, an implementation may be substantially modified (e.g. to change performance characteristics) without affecting the code that uses it
- for unit-testing purposes, one replaces lower-level code with mock implementations with simple, predictable functions
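
As a sketch of the mock technique mentioned above, suppose some higher-level code depends on the current time. The Clock interface, the SystemClock and FakeClock classes, and is_expired are all hypothetical names chosen for illustration.

```cpp
#include <ctime>

// The abstraction: higher-level code sees only this interface.
class Clock {
public:
    virtual ~Clock() = default;
    virtual long long now_seconds() const = 0;
};

// Real lower-level implementation: asks the operating system for the time.
class SystemClock : public Clock {
public:
    long long now_seconds() const override {
        return static_cast<long long>(std::time(nullptr));
    }
};

// Mock implementation for unit tests: simple and predictable.
class FakeClock : public Clock {
public:
    explicit FakeClock(long long fixed) : fixed_(fixed) {}
    long long now_seconds() const override { return fixed_; }

private:
    long long fixed_;
};

// Higher-level code under test. It works with any Clock.
bool is_expired(const Clock& clock, long long deadline_seconds) {
    return clock.now_seconds() >= deadline_seconds;
}
```

A unit test passes a FakeClock so that the result of is_expired is the same on every run; production code passes a SystemClock, and is_expired does not change.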

The key to programming (systems design) is finding good abstractions.

good abstractions facilitate software maintenance by isolating the effects of change

What makes a good abstraction:

beyond the scope of this class
2 key concepts
- cohesion (stickiness): do the things inside the module belong together
- coupling (surface area): how much two modules depend on each other

Modeling

Modeling: keep essence; discard nonessential details.

all models are imperfect
some models are useful

Example: the Earth is flat. If you are building a house, you do not need to take the curvature of the planet into account.

What is essential depends highly on the application.

Example: modeling of an airplane:

if you are building an air traffic control system, an airplane is all about its position, altitude, and airspeed
if you are building a maintenance system, an airplane is a collection of parts and/or inspections
if you are building an airline reservation system, an airplane is a collection of seats
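
The three airplane models might look like this in code. This is only a sketch: the namespaces, field names, units and helper functions are assumptions chosen for illustration, and each model deliberately omits what the other two consider essential.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace atc {
// Air traffic control: an airplane is a moving point in space.
struct Airplane {
    double latitude_deg;
    double longitude_deg;
    double altitude_ft;
    double airspeed_kt;
};
}  // namespace atc

namespace maintenance {
// Maintenance: an airplane is a collection of parts to be inspected.
struct Part {
    std::string name;
    bool inspected;
};

struct Airplane {
    std::vector<Part> parts;
};

// Counts the parts that have not yet been inspected.
std::size_t pending_inspections(const Airplane& plane) {
    std::size_t pending = 0;
    for (const Part& part : plane.parts) {
        if (!part.inspected) {
            ++pending;
        }
    }
    return pending;
}
}  // namespace maintenance

namespace reservations {
// Reservations: an airplane is a collection of seats.
struct Airplane {
    std::vector<bool> seat_taken;
};

// Counts the seats that are still available.
std::size_t free_seats(const Airplane& plane) {
    std::size_t free_count = 0;
    for (bool taken : plane.seat_taken) {
        if (!taken) {
            ++free_count;
        }
    }
    return free_count;
}
}  // namespace reservations
```

All three types are called Airplane, yet they share no fields: what is essential depends on the application.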