CSS 343: Notes from Lecture 1 (DRAFT)
Administrivia
-
course web site
-
syllabus
-
office hours: Sunday 1pm-3pm in the Linux lab
-
Programming Principles and Practice Using C++,
Bjarne Stroustrup, Addison-Wesley, 2009
-
highly recommended
-
covers important material that is not covered in class
-
not required reading for this course; read it on your own at
your own pace
-
this complements your coursework; does not replace it
Introduction
-
assumption: you are taking this course because you wish to make
your living writing software
-
chances are, you will never have to code your own red-black tree
-
you make a library call (which is probably more-tuned and
better-tested than anything you might hand-roll)
-
this raises the question: why study data structures & algorithms?
Why Study Data Structures & Algorithms?
-
the difference between the technician and the engineer is that
the engineer
understands
the library calls being made and can make competent decisions
about various tradeoffs involved
-
data structures and algorithms is about perfecting the
craft
of programming
-
think of it as "still life" for programmers
-
especially pointers and recursion
-
demonstrating deep understanding of data structures is a
critical job interview skill
Arguably,
Data Structures & Algorithms
is the single most important course in the software engineering programme.
Goals
-
deeper understanding of how Data Structures & Algorithms work
-
deeper understanding of how complexity is managed via layered abstractions
-
deeper understanding of Object-Oriented concepts
-
what problems the techniques are trying to solve
-
how the techniques work under the hood
-
how to use the techniques
-
introduction to the C++ Standard Template Library (STL)
-
not too deep, since we are trying to learn the algorithms the
library implements
-
increase your programming dexterity
-
especially pointers & recursion
Non-Goals
There are numerous
important
topics that are beyond the scope of this course.
-
project management
-
"professional-grade" C++ programming
-
STL
-
advanced idiomatic expressions
-
input validation and sanitization
-
internal consistency checking
-
internationalization (I18N)
-
concurrency (threading & networking)
-
Linux & shell programming
-
software development process
-
math & formal proofs
In passing, we may discuss tactical software-engineering concepts
such as:
-
self-checking (assertions)
-
code reviews
-
source code management systems
-
build systems
-
unit & integration testing
-
deployment & rollback
-
logging & logs analysis
-
monitoring
What is Software?
Your humble instructor distinguishes between (computer) programs,
i.e. the instructions that drive hardware, and
software,
which includes the program(s) plus other artifacts. Software:
-
is intended to be used by persons other than the original
programmer(s)
-
represents a significant capital investment and can be expected
to be used for a long time
-
will evolve over its useful lifespan
-
will be maintained by persons other than the original
programmers as personnel move around during their careers
This has implications on how we design and construct software.
This course is about the programming, but it is important to keep
in mind that software is about more than just the programs.
CSS 342 Review
-
C++
-
sorting, binary search
-
linked lists, stacks, queues
-
pointers, recursion
-
complexity theory: Big-Oh notation
Complexity Theory (Performance)
-
run-time performance as a function of input size:
O(f(N))
-
concerned about "asymptotic performance", i.e. the general
shape of the curve
-
What is
N?
-
depends on the specific problem
-
some algorithm performance may be a function of two
parameters, e.g.
Dijkstra's algorithm
can be implemented in
O(M + N log N)
time using a
Fibonacci heap
-
M is the number of edges
-
N is the number of nodes
-
this may also be written as
O(|E| + |V| log |V|)
-
we say, for example, that sorting is
O(N log N)
where N is the number of items
-
it is sloppy not to include the definition of what
N
is supposed to be
-
when we are informal, we allow ourselves to be sloppy
key hierarchy:
-
O(1)
-
O(log N)
-
O(N)
-
O(N log N)
-
O(N^2)
-
O(N^3)
-
O(N^k)
-
O(2^N)
-
O(k^N)
we may be concerned with best/worst/average cases
advanced technique: amortized cost analysis
-
an operation that performs
O(N)
steps once every N times has
an
amortized
cost of
O(1)
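One common place this shows up is a growable array that doubles its capacity when full. The sketch below is illustrative only: `GrowableArray` and `copies_for_pushes` are hypothetical names, not part of the lecture, and real code should use `std::vector`. It counts element copies so the total work for N pushes can be compared against N.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical growable array of ints (illustration only).
class GrowableArray {
public:
    void push_back(int value) {
        if (size_ == capacity_) {
            grow();                  // the occasional O(N) step
        }
        data_[size_++] = value;      // the usual O(1) step
    }

    int at(std::size_t i) const {
        assert(i < size_);           // self-check: index must be in range
        return data_[i];
    }

    std::size_t size() const { return size_; }
    std::size_t copies() const { return copies_; }

private:
    // Doubles the capacity and copies every existing element across.
    void grow() {
        std::size_t new_capacity = (capacity_ == 0) ? 1 : 2 * capacity_;
        std::unique_ptr<int[]> bigger(new int[new_capacity]);
        for (std::size_t i = 0; i < size_; ++i) {
            bigger[i] = data_[i];
            ++copies_;
        }
        data_ = std::move(bigger);
        capacity_ = new_capacity;
    }

    std::unique_ptr<int[]> data_;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
    std::size_t copies_ = 0;         // elements copied across all grows
};

// Pushes n values and returns the total number of element copies made.
std::size_t copies_for_pushes(std::size_t n) {
    GrowableArray a;
    for (std::size_t i = 0; i < n; ++i) {
        a.push_back(static_cast<int>(i));
    }
    return a.copies();
}
```

With doubling, the total number of copies stays below 2N, so although a single push may cost O(N), the cost averaged over all N pushes is O(1).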
basic identities:
-
O(f(N)) + O(g(N)) = O(f(N) + g(N)) = O(max(f(N), g(N)))
-
O(f(N)) * O(g(N)) = O(f(N) * g(N))
-
harmonic series: H(N) = S(n, -1) = sum for i = 1..N (1/i) = O(lg(N))
-
sum for i = 1..N (i^k) = O(N^(k+1))
empirical data reported in
The Algorithm Design Manual:
-
all algorithms take approximately same time for N = 10
-
any algorithm with N! running time is useless for N ≥ 20
-
2^N is intractable for N ≥ 40
-
O(N^2) is usable for N = 10,000 but horrendous for
N ≥ 1,000,000
-
O(N) and O(N log N) are practical for inputs of size 1,000,000,000
-
O(log N) is fine for any input size
Example:
-
finding the phone number for a given name in the telephone book
is
O(log N)
-
Finding the name for a given phone number is
O(N)
-
it is intractable to search an entire phone book (even for a
smallish town)
-
it is tractable to search a single column (much smaller N)
-
distributed processing: an entire phone book divided among
enough people may be searched in a reasonable amount of time
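The two lookups in the phone book example can be sketched roughly as follows, assuming the book is a vector sorted by name. `Entry`, `number_for_name`, `name_for_number`, and `sample_book` are hypothetical names, and the sample data is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical phone book entry.
struct Entry {
    std::string name;
    std::string number;
};

// O(log N): binary search, which relies on the book being sorted by name.
// Returns an empty string if the name is absent.
std::string number_for_name(const std::vector<Entry>& book,
                            const std::string& name) {
    // Debug-only self-check. It is O(N) itself, so build with -DNDEBUG
    // when measuring performance.
    assert(std::is_sorted(book.begin(), book.end(),
        [](const Entry& a, const Entry& b) { return a.name < b.name; }));

    auto it = std::lower_bound(book.begin(), book.end(), name,
        [](const Entry& e, const std::string& key) { return e.name < key; });
    if (it != book.end() && it->name == name) {
        return it->number;
    }
    return "";
}

// O(N): the book is not ordered by number, so every entry may be examined.
// Returns an empty string if the number is absent.
std::string name_for_number(const std::vector<Entry>& book,
                            const std::string& number) {
    auto it = std::find_if(book.begin(), book.end(),
        [&number](const Entry& e) { return e.number == number; });
    return (it != book.end()) ? it->name : "";
}

// Made-up sample data, already sorted by name.
std::vector<Entry> sample_book() {
    return {{"Adams", "555-0101"}, {"Baker", "555-0199"}, {"Chen", "555-0142"}};
}
```

The same data supports both queries; only the ordering of the data decides which one is cheap.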
Note that a particular program (algorithm) has a running time
calculated using big-oh notation, but an algorithmic problem may
have a lower bound. For example, we say that sorting is an
O(N log N)
problem,
but bubble sort (a particular
implementation
of a sorting algorithm) runs in
O(N^2)
time.
Post-lecture addendum: demonstration of Fibonacci function (an
O(N)
problem) coded as an
O(2^N)
function using naive recursion and an
O(N)
function using a tail recursion technique
-
tail recursion is when the
last
thing a function does is call itself; this allows an optimization
technique that turns the recursion into a loop
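The code shown in lecture is not reproduced in these notes; the following is a minimal sketch of the same idea, with hypothetical function names. Note that C++ compilers commonly perform tail-call optimization but the language does not guarantee it, so `fib_loop` shows the loop that the optimization would effectively produce.

```cpp
#include <cassert>
#include <cstdint>

// Naive recursion: each call spawns two more, giving O(2^N) calls.
std::uint64_t fib_naive(unsigned n) {
    return (n < 2) ? n : fib_naive(n - 1) + fib_naive(n - 2);
}

// Tail recursion: the recursive call is the last thing the function does.
// a and b carry the two most recent Fibonacci numbers, so there are O(N) calls.
std::uint64_t fib_tail(unsigned n, std::uint64_t a = 0, std::uint64_t b = 1) {
    if (n == 0) {
        return a;
    }
    return fib_tail(n - 1, b, a + b);
}

// The same computation with the tail call rewritten as a loop.
std::uint64_t fib_loop(unsigned n) {
    assert(n <= 93);  // fib(94) no longer fits in 64 bits
    std::uint64_t a = 0;
    std::uint64_t b = 1;
    while (n-- > 0) {
        std::uint64_t next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```

All three agree on small inputs; only the naive version becomes unusably slow as N grows.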
Complexity (Cognitive Load)
-
the word
complexity
has multiple meanings:
-
complexity as performance (resource usage)
-
complexity in the informal sense: complicated
try not to get confused
-
a key property of modern software is that it is inherently
complex (complicated)
-
so complicated that our
tiny, little minds
cannot comprehend it
-
large systems are beyond human understanding
-
our solution to complexity management distills down to one word:
abstraction
-
if you learn nothing else in this class, learn
this
Abstraction
Abstraction
is such an important concept that we use multiple terms that mean
essentially the same thing:
-
modeling
-
divide & conquer
-
module
-
interface & implementation
-
public/private (class)
-
information hiding
It's important to note that
information hiding
is not a security feature:
-
information is hidden as a convenience, so you don't have to
worry about the details when you don't have to
-
the public part of a class is the
contract
with the client
-
if the client relies on knowledge of the private parts, the
client deserves what it gets
-
post-lecture addendum:
demonstration of C++ language feature
allowing a class user to change a
private const
data member
voyeur.cc
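The contents of voyeur.cc are not reproduced in these notes. As a stand-in, here is a hypothetical sketch of a related trick: it only reads a private member (writing to a const member would be undefined behavior), and it assumes the class is trivially copyable with the same layout as the look-alike struct. `Secret`, `Exposed`, and `peek` are made-up names.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// A class that "hides" its data behind private.
class Secret {
public:
    explicit Secret(int pin) : pin_(pin) {}
    bool check(int guess) const { return guess == pin_; }
private:
    int pin_;
};

// Hypothetical look-alike with the same member, but public.
struct Exposed {
    int pin;
};

// The trick relies on these assumptions; fail to compile if they do not hold.
static_assert(std::is_trivially_copyable<Secret>::value,
              "Secret must be trivially copyable for memcpy to be valid");
static_assert(sizeof(Secret) == sizeof(Exposed),
              "layouts are assumed to match");

// Copies the object's bytes into the look-alike and reads the "private" value.
int peek(const Secret& s) {
    Exposed e;
    std::memcpy(&e, &s, sizeof e);
    return e.pin;
}
```

Nothing stops a determined client from doing this, which is why private is a statement of intent rather than a protection mechanism.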
There are two parts to an abstraction:
Example: if you built a robot, you would have to code individual
"muscle" contractions. From those individual movements, you might
code more complex action, such as "flip the light switch".
Ultimately, movement boils down to those individual muscle
contractions, but we don't think in those terms.
Abstraction layers:
-
at some abstraction layer, the layer above treats it as magic,
and the layer below
is
magic
-
the same individual may need to jump up and down the
abstraction levels: it's a matter of switching context (or
putting on a different hat)
-
sometimes it is necessary (or expedient) to
break
an abstraction
-
if abstractions are well-designed, an implementation may be
substantially modified (e.g. to change performance
characteristics) without affecting the layers above it
-
for unit-testing purposes, one replaces lower-level code
with
mock
implementations with simple, predictable functions
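A minimal sketch of the mock idea, assuming the lower layer is a clock hidden behind an interface. `Clock`, `FixedClock`, and `greeting` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <string>

// Interface to the lower layer: the only thing the layer above sees.
class Clock {
public:
    virtual ~Clock() = default;
    virtual int hour() const = 0;  // expected range: 0..23
};

// Higher-level code written purely against the interface.
std::string greeting(const Clock& clock) {
    int h = clock.hour();
    assert(h >= 0 && h < 24);      // self-check on the contract
    return (h < 12) ? "Good morning" : "Good afternoon";
}

// Mock implementation: simple and predictable, for use in unit tests.
class FixedClock : public Clock {
public:
    explicit FixedClock(int hour) : hour_(hour) {}
    int hour() const override { return hour_; }
private:
    int hour_;
};
```

A test can then pass `FixedClock(9)` or `FixedClock(15)` to `greeting` and check the result without depending on the real time of day.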
The key to programming (systems design) is finding good
abstractions.
-
good abstractions facilitate software maintenance by isolating
the effects of change
What makes a good abstraction:
-
beyond the scope of this class
-
2 key concepts
-
cohesion (stickiness): do the things inside the module
belong together
-
coupling (surface area): how much two modules depend on each
other
Modeling
Modeling: keep essence; discard nonessential details.
-
all models are imperfect
-
some
models are
useful
Example:
the Earth is flat.
If you are building a house, you do not need to take the curvature
of the planet into account.
What is essential depends highly on the application.
Example: modeling of an airplane:
-
if you are building an air traffic control system, an airplane
is all about its position, altitude, and airspeed
-
if you are building a maintenance system, an airplane is a
collection of parts and/or inspections
-
if you are building an airline reservation system, an airplane
is a collection of seats
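The three views of an airplane might be sketched as three unrelated types, each keeping only what its application needs. All names, fields, and units below are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace atc {
// Air traffic control: where the airplane is and how fast it is moving.
struct Airplane {
    double latitude_deg;
    double longitude_deg;
    double altitude_ft;
    double airspeed_kt;
};
}  // namespace atc

namespace maintenance {
// Maintenance: a collection of parts and their inspection status.
struct Part {
    std::string name;
    bool inspection_due;
};

struct Airplane {
    std::vector<Part> parts;
};

// True if any part on the airplane is due for inspection.
bool needs_inspection(const Airplane& plane) {
    return std::any_of(plane.parts.begin(), plane.parts.end(),
                       [](const Part& p) { return p.inspection_due; });
}
}  // namespace maintenance

namespace reservations {
// Reservations: a collection of seats, each taken or free.
struct Airplane {
    std::vector<bool> seat_taken;
};

// Number of seats not yet taken.
std::size_t seats_available(const Airplane& plane) {
    return static_cast<std::size_t>(
        std::count(plane.seat_taken.begin(), plane.seat_taken.end(), false));
}
}  // namespace reservations
```

None of the three types is "the" airplane; each is a model that discards whatever its application does not need.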