CSS 390: Notes from Lecture 7 (DRAFT)

Administrivia

Refactoring

Informally, we've already seen an example of refactoring in the HTTP server shown in Lecture 2.

Architectural integrity: how the system is decomposed (into modules, packages, objects). Robust systems have high intra-modular cohesion and low inter-modular coupling. As software evolves, the cumulative effect of special cases, patches, work-arounds, hacks, and kludges degrades the system's architectural integrity.

Technical Debt represents the increased costs of evolving software systems with suboptimal design.

Why don't we build Snap-on quality software in the first place?

Code Smells: easily observed indications of possible problems. Numerous catalogued odors, including:

Refactoring: a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its external behavior.

Key idea: refactoring exists to support evolution of the codebase.

Martin Fowler, Refactoring: Improving the Design of Existing Code: exposition plus catalog of refactoring patterns

Cost of change:

Need to develop a dual mindset (angel on one shoulder, devil on the other):

  1. How can we make this better?
  2. Is this good enough?

Code that works is usually better than code that is pretty.

Key ideas:

Examples

Check out the version history of the top-ten word count at github.com. Note which changes are primarily refactoring and which are bugfixes or enhancements.

Example of a code smell from the diskmon manager package:

if utilization > thresholdRed {
	if thisAlert := alerts[disk]; thisAlert != nil {
		if thisAlert.Level() != alert.Red {
			thisAlert.Reset(alert.Red, fmt.Sprintf("%s: over %.1f%% (%.1f%%)",
				disk, thresholdRed, utilization))
		}
	} else {
		alerts[disk] = alert.New(alert.Red, fmt.Sprintf("%s: over %.1f%% (%.1f%%)",
			disk, thresholdRed, utilization))
	}
} else if utilization > thresholdYellow {
	if thisAlert := alerts[disk]; thisAlert != nil {
		if thisAlert.Level() != alert.Yellow {
			thisAlert.Reset(alert.Yellow, fmt.Sprintf("%s: over %.1f%% (%.1f%%)",
				disk, thresholdYellow, utilization))
		}
	} else {
		alerts[disk] = alert.New(alert.Yellow, fmt.Sprintf("%s: over %.1f%% (%.1f%%)",
			disk, thresholdYellow, utilization))
	}
}
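
One possible refactoring extracts the duplicated create-or-reset logic into a helper. This is only a sketch: the setAlert name and the parameter types (map[string]*alert.Alert, alert.Level, float64 thresholds) are guesses based on the snippet above, not the actual diskmon code.

// setAlert creates the alert for disk if none exists, or resets it if
// it exists at a different level.  (Sketch; types are assumed.)
func setAlert(alerts map[string]*alert.Alert, disk string, level alert.Level,
	threshold, utilization float64) {
	message := fmt.Sprintf("%s: over %.1f%% (%.1f%%)", disk, threshold, utilization)
	if thisAlert := alerts[disk]; thisAlert != nil {
		if thisAlert.Level() != level {
			thisAlert.Reset(level, message)
		}
		return
	}
	alerts[disk] = alert.New(level, message)
}

The caller then shrinks to the essential threshold decision:

if utilization > thresholdRed {
	setAlert(alerts, disk, alert.Red, thresholdRed, utilization)
} else if utilization > thresholdYellow {
	setAlert(alerts, disk, alert.Yellow, thresholdYellow, utilization)
}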

Simple patterns of change:

// replace temp with query:


// before:
basePrice := this.quantity * this.price
if basePrice > 1000 {
	return basePrice * 0.95
} else {
	return basePrice * 0.98
}


// after:
if this.basePrice() > 1000 {
	return this.basePrice() * 0.95
} else {
	return this.basePrice() * 0.98
}


// more cleanup:
if this.basePrice() > this.VolumeDiscountThreshold {
	return this.basePrice() * 0.95
} else {
	return this.basePrice() * 0.98
}


// idiomatic Go:
if this.basePrice() > this.VolumeDiscountThreshold {
	return this.basePrice() * 0.95
}
return this.basePrice() * 0.98
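

// The query itself is just a small method.  A sketch (the lineItem
// receiver type is made up; the quantity and price fields from the
// "before" snippet are assumed to be float64):
func (this *lineItem) basePrice() float64 {
	return this.quantity * this.price
}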


// A longer example: refactoring a price calculation step by step.

package order

import "time"

type Item struct {
	//...
	UnitPrice int
	//...
}

type Order struct {
	Quantity int
	Item     Item
	Date     time.Time
	//...
}

type Sale struct {
	Start    time.Time
	End      time.Time
	Discount int // percent
}


//-----------------------------------------------------------------------

// Version 1

func Price1(order *Order, sale *Sale) int {
	if order.Date.Before(sale.Start) || sale.End.Before(order.Date) {
		return order.Quantity * order.Item.UnitPrice
	}
	return order.Quantity * order.Item.UnitPrice * (100 - sale.Discount) / 100
}


// Version 2

func NotOnSale(order *Order, sale *Sale) bool {
	return order.Date.Before(sale.Start) || sale.End.Before(order.Date)
}

func Price2(order *Order, sale *Sale) int {
	if NotOnSale(order, sale) {
		return order.Quantity * order.Item.UnitPrice
	}
	return order.Quantity * order.Item.UnitPrice * (100 - sale.Discount) / 100
}


// Version 3

// OnSale must be the exact negation of NotOnSale; the !Before form
// keeps the behavior identical when order.Date falls exactly on a
// sale boundary.
func OnSale(order *Order, sale *Sale) bool {
	return !order.Date.Before(sale.Start) && !sale.End.Before(order.Date)
}

func Price3(order *Order, sale *Sale) int {
	if OnSale(order, sale) {
		return order.Quantity * order.Item.UnitPrice * (100 - sale.Discount) / 100
	}
	return order.Quantity * order.Item.UnitPrice
}


// Version 4

func Price4(order *Order, sale *Sale) int {
	price := order.Quantity * order.Item.UnitPrice
	if OnSale(order, sale) {
		price = price * (100 - sale.Discount) / 100
	}
	return price
}


// Version 5

func Discount(price int, discount int) int {
	return price * (100 - discount) / 100
}

func Price5(order *Order, sale *Sale) int {
	price := order.Quantity * order.Item.UnitPrice
	if OnSale(order, sale) {
		price = Discount(price, sale.Discount)
	}
	return price
}
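
Because each step is supposed to be a behavior-preserving refactoring, a cheap sanity check is a test asserting that all five versions agree. This is a sketch; the test name, dates, and prices are made up.

package order

import (
	"testing"
	"time"
)

func TestPricesAgree(t *testing.T) {
	sale := &Sale{
		Start:    time.Date(2024, 11, 25, 0, 0, 0, 0, time.UTC),
		End:      time.Date(2024, 11, 29, 0, 0, 0, 0, time.UTC),
		Discount: 20,
	}
	orders := []*Order{
		// One order during the sale, one after it.
		{Quantity: 3, Item: Item{UnitPrice: 100}, Date: time.Date(2024, 11, 27, 0, 0, 0, 0, time.UTC)},
		{Quantity: 3, Item: Item{UnitPrice: 100}, Date: time.Date(2024, 12, 25, 0, 0, 0, 0, time.UTC)},
	}
	for _, o := range orders {
		want := Price1(o, sale)
		others := []int{Price2(o, sale), Price3(o, sale), Price4(o, sale), Price5(o, sale)}
		for i, got := range others {
			if got != want {
				t.Errorf("Price%d(%s) = %d, want %d", i+2, o.Date, got, want)
			}
		}
	}
}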

More on Concurrency

Shared mutable memory is the main source of trouble in concurrent programs. Concurrent reads are okay, but modification requires exclusive access (no other reads or writes).

A mutex provides lock and unlock operations such that a thread (or goroutine) can only lock the mutex when no other thread is holding the lock. If another thread already holds the lock, the requesting thread waits until the holder releases it. When the thread holding the lock is finished with the protected data, it calls the unlock operation, allowing another waiting thread to take the lock.

A programmer wishing to protect a block of shared memory couples the memory with a mutex, acquiring the lock before using the memory and releasing the lock when done with the memory. This requires discipline on the part of the programmer: failure to use the lock at a single location will result in subtle, difficult-to-find bugs.
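
A sketch of that raw pattern (the balance and mu names are made up); every function that touches the shared variable must remember to take the lock:

var (
	mu      sync.Mutex
	balance int // shared; protected by mu
)

func deposit(amount int) {
	mu.Lock()
	balance += amount
	mu.Unlock()
}

func readBalance() int {
	mu.Lock()
	defer mu.Unlock()
	return balance
}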

It is less error-prone to make the mutex a field of a struct and to force access to the shared data exclusively through methods that handle the locking and unlocking, as sketched below.
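
A minimal sketch of that struct-based pattern (the package and type names here are illustrative, not from any lecture code):

package safecount

import "sync"

type Counter struct {
	mu   sync.Mutex
	data map[string]int
}

func New() *Counter {
	return &Counter{data: make(map[string]int)}
}

func (c *Counter) Incr(key string, delta int) {
	c.mu.Lock()
	defer c.mu.Unlock() // the deferred unlock runs even on an early return or panic
	c.data[key] += delta
}

func (c *Counter) Get(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.data[key]
}

Callers can only reach the map through these methods, so the locking discipline lives in one place.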

Sources of shared memory:

Lexical closures occur when a function literal makes references to variables defined in the outer enclosing (lexical) scope:

func VeryBad() int {
	// Concurrent access to n from two separate goroutines: not a
	// good idea.
	var n int
	go func() {
		for i := 0; i < 10; i++ {
			n = n + i
		}
	}()
	go func() {
		for i := 9; i >= 0; i-- {
			n = n - i
		}
	}()
	// Using sleep to wait for the goroutines isn't a particularly
	// good idea either.  sync.WaitGroup was designed exactly for
	// this.
	time.Sleep(30 * time.Second)
	return n
}
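
For comparison, a sketch of the sync.WaitGroup approach mentioned in the comment above. The WaitGroup fixes only the waiting; the updates to n still race unless they are also protected, here with a sync.Mutex. (The LessBad name is made up.)

func LessBad() int {
	var (
		n  int
		mu sync.Mutex
		wg sync.WaitGroup
	)
	wg.Add(2)
	go func() {
		defer wg.Done()
		for i := 0; i < 10; i++ {
			mu.Lock()
			n = n + i
			mu.Unlock()
		}
	}()
	go func() {
		defer wg.Done()
		for i := 9; i >= 0; i-- {
			mu.Lock()
			n = n - i
			mu.Unlock()
		}
	}()
	wg.Wait() // blocks until both goroutines have called Done
	return n
}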

Channels

A channel is primarily a communication mechanism, but it is also a synchronization mechanism. When sending to an unbuffered channel, the sending goroutine blocks until another goroutine receives from the channel. Similarly, if a goroutine tries to receive from the channel before another goroutine sends, the receiving goroutine blocks. This rendezvous forms a barrier mechanism.

A buffered channel has slightly different semantics: a receiving goroutine blocks when the buffer is empty and a sending goroutine blocks when the buffer is full.
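
A minimal sketch of the barrier behavior: the main goroutine blocks on the receive until the worker goroutine sends.

done := make(chan struct{})
go func() {
	// ... do the work ...
	done <- struct{}{} // unblocks the waiting receiver
}()
<-done // blocks until the worker sends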


// create channel of type T:
c := make(chan T)

// create buffered channel of type T
c := make(chan T, bufSize)

// send to channel
c <- t

// receive from channel
t := <-c

// select on several channels (first ready case wins)
select {
case t1 := <-c1:
	//...
case t2 := <-c2:
	//...
}

// poll several channels (default runs if no channel is ready)
select {
case t1 := <-c1:
	//...
case t2 := <-c2:
	//...
default:
	//...
}

// range over a channel (until close(c))
for t := range c {
	//...
}

A buffered channel of capacity 1 may be used as a mutex.
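
A sketch of the idea (the lock name is made up): the single-slot buffered channel holds the "lock"; sending acquires it and receiving releases it.

lock := make(chan struct{}, 1)

lock <- struct{}{} // acquire: blocks if another goroutine holds the lock
// ... critical section: touch the shared data ...
<-lock // release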

Communication by Message Passing

An alternate approach to concurrency is to avoid concurrent memory access entirely and communicate by message passing. Go channels allow this approach.

Go language motto: "don't communicate by sharing memory; share memory by communicating".

Concurrent access to a map, say for a counter, is easily implemented using sync.RWMutex, but an alternative approach is to have a single goroutine responsible for the map. All other goroutines send requests to that counter-manager goroutine:

package counter

type request struct {
	resp      chan int
	increment bool
	key       string
	delta     int
}

type Counter struct {
	req chan request
}

func New() *Counter {
	c := &Counter{
		req: make(chan request),
	}
	go counter(c)
	return c
}

func (c *Counter) Get(key string) int {
	resp := make(chan int)
	c.req <- request{
		resp:      resp,
		increment: false,
		key:       key,
	}
	return <-resp
}

func (c *Counter) Incr(key string, delta int) {
	c.req <- request{
		resp:      nil,
		increment: true,
		key:       key,
		delta:     delta,
	}
}

func counter(c *Counter) {
	data := make(map[string]int)
	for req := range c.req {
		if req.increment {
			data[req.key] = data[req.key] + req.delta
		} else {
			req.resp <- data[req.key]
		}
	}
}
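
A sketch of how a client uses the package (assuming it is importable as "counter"):

c := counter.New()
c.Incr("requests", 1)
c.Incr("requests", 1)
fmt.Println(c.Get("requests")) // prints 2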

Timeouts

A requirement for a robust distributed system is that all remote operations have timeouts. A failure in a back-end processor should not cause the front-end to time out. The front-end can handle retry logic or report back a failure message.

The http.Client has a Timeout field for controlling this. Unfortunately, exec.Cmd does not, but it does have the ability to kill the subprocess.
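
For the HTTP case, a sketch of the Timeout field in use (the URL is made up):

client := &http.Client{Timeout: 2 * time.Second}
resp, err := client.Get("http://backend:8080/stats") // returns an error if the request takes longer than 2s
if err == nil {
	defer resp.Body.Close()
	//...
}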

Realistically, the two external programs we looked at, uuidgen and df, are not going to hang, but the technique was demonstrated in the notes for Lecture 4.

Essentially, the technique is to run the external command in a separate goroutine, sending the status through a channel. The main thread also sets a timer (which runs in its own goroutine).

There are two possible outcomes, handled by the select statement: either the command completes normally (in which case we stop the timer), or the timer fires and we have to kill the external command.

On timeout, we also want to receive from the command's result channel so that the goroutine wrapping the command can complete rather than leak.

package main

import (
	"fmt"
	"net/http"
	"os/exec"
	"time"
)

const (
	timeout = 2 * time.Second
)

func stats(response http.ResponseWriter, request *http.Request) {
	response.Header().Set("Content-Type", "text/plain")

	var (
		done    = make(chan error)
		bytes   []byte
		err     error
		command = exec.Command("df", "-h")
	)

	// Execute the command in a separate goroutine so the main
	// thread can timeout gracefully.
	go func() {
		var err error
		bytes, err = command.Output()
		done <- err
	}()

	// Start the clock.
	timer := time.NewTimer(timeout)

	// Wait for the command to complete or time out.
	select {
	case err = <-done:
		// The goroutine completed or failed.  Stop the timer
		// so its goroutine can terminate.
		timer.Stop()
	case <-timer.C:
		// Too late.  Kill the process so the goroutine can
		// complete.  Otherwise, we'd have a potential leak.
		// Ignore any error from Kill; the process may have
		// already exited on its own.
		command.Process.Kill()
		// Catch the done message without blocking, so the
		// command's goroutine won't leak.
		go func() { <-done }()
		err = fmt.Errorf("timeout")
	}

	// Did it work?
	if err != nil {
		http.Error(response, err.Error(), http.StatusInternalServerError)
		return
	}
	response.Write(bytes)
}

func main() {
	http.HandleFunc("/stats", stats)
	err := http.ListenAndServe(":8080", nil)
	fmt.Printf("Server fail: %s\n", err)
}