lab commentary: compression

source coding for compression - use the smallest number of bits to represent the data source for a given reconstruction quality

lossy vs lossless
- compression is possible because redundancy exists; compression schemes vary in how the redundancy is exploited
- theoretical max compression? answered by information theory
- tradeoffs: small and lossy, or huge and perfect; compression factor vs computational complexity and memory required
- lossless preserves the original exactly
- lossy is for when you don't need exact reproduction (multimedia data: video, audio, images, etc.)

run length encoding - replace a run of identical symbols with the symbol and a count

huffman codes - shorter code words for more frequently used symbols
- example: S -> 0, C -> 10, R -> 11, so SCSSCSSRCR -> 010001000111011
- construction: arrange symbols according to frequency, then repeatedly merge the two least frequent nodes:

  symbol  freq  code
  s       0.5   0
  c       0.3   10
  r       0.2   11

  (c and r merge first into a node of weight 0.5; that node then merges with s; reading the 0/1 branch labels down from the root gives each code)

compression bounds - theoretical foundation is shannon information theory
- self-information - the uncertainty of an event: I(x) = -log2 P(x)
- weather prediction example: given P(tomorrow sunny) = 0.9, the fact that tomorrow is sunny is expected, the usual, happens often - little information; given P(tomorrow rainy) = 0.1, the fact that tomorrow is rainy is unexpected and carries more information
- source entropy: H(X) = E[I(X)] = -sum_i P(X=i) log2 P(X=i), the bits/symbol required to represent the source
- higher entropy requires more bits to represent

string matching algorithms
- huffman coding - maps fixed-length input symbols to variable-length codes based on known or estimated statistics
- lz77 - rips out repeated chunks of data and replaces each with a reference to an earlier copy
- lzw - abandons the sliding-window approach of lz77: maintain a dictionary of strings and codes; as compression occurs, new strings are added to the dictionary, and old strings not likely to appear in the future can be dropped; the dictionary is deduced by the decoder, so it never has to be transmitted

image compression
- lossless compression for images yields only 50-66% compression; some images are very hard to compress losslessly, as their entropy calculation shows
- lossy compression (jpeg, etc.) makes >10x compression possible, but artifacts are introduced (note: png and gif are actually lossless formats)
- achievable compression depends on the required quality (Q factor)

jpeg - lossy still image compression, a 3 phase process: DCT, quantization, encoding
- the image is processed in 8x8 blocks
- DCT transforms the signal from the spatial domain into an equivalent signal in the frequency domain (lossless)
- quantization is applied to the frequency coefficients (lossy)
- an RLE-like encoding compresses the quantized coefficients (lossless)
- the scheme takes advantage of the human visual system, which is less sensitive to high-frequency detail
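The run-length idea above can be sketched in a few lines of Python (function names here are illustrative, not from any particular library):

```python
def rle_encode(data: str) -> list[tuple[str, int]]:
    """Encode a string as (symbol, run_length) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))              # start a new run
    return runs

def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand each (symbol, count) pair back into a run."""
    return "".join(ch * n for ch, n in runs)

encoded = rle_encode("AAAABBBCCD")
# encoded == [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
assert rle_decode(encoded) == "AAAABBBCCD"
```

Note RLE only wins when the data actually contains long runs, which is why jpeg applies it after quantization has zeroed out most coefficients.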
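A minimal sketch of the huffman construction for the s/c/r source above. Exact bit labels depend on tie-breaking choices when merging, so this version only guarantees the same code lengths as the table (1, 2, 2 bits) and a prefix-free code; the 15-bit total for SCSSCSSRCR matches the notes:

```python
import heapq

def huffman_codes(freqs: dict[str, float]) -> dict[str, str]:
    """Build a Huffman code from {symbol: probability}: repeatedly pop
    the two least probable subtrees, prepend a bit to every code inside
    each, and push the merged subtree back."""
    heap, counter = [], 0                      # counter breaks probability ties
    for sym, p in freqs.items():
        heapq.heappush(heap, (p, counter, {sym: ""}))
        counter += 1
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)        # least probable subtree
        p2, _, t2 = heapq.heappop(heap)        # next least probable
        merged = {s: "0" + c for s, c in t2.items()}
        merged.update({s: "1" + c for s, c in t1.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

codes = huffman_codes({"S": 0.5, "C": 0.3, "R": 0.2})
encoded = "".join(codes[ch] for ch in "SCSSCSSRCR")  # 15 bits total
```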
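The self-information and entropy formulas above can be checked numerically. For the s/c/r huffman source, the entropy (~1.485 bits/symbol) sits just below the huffman code's average length (1.5 bits/symbol), which is exactly the ordering Shannon's bound requires:

```python
from math import log2

def self_info(p: float) -> float:
    """I(x) = -log2 P(x): bits of surprise in a single event."""
    return -log2(p)

def entropy(probs: list[float]) -> float:
    """H(X) = -sum P(x) log2 P(x): average bits/symbol of the source."""
    return sum(p * self_info(p) for p in probs if p > 0)

# weather example: the expected outcome carries little information
sunny = self_info(0.9)   # ~0.152 bits
rainy = self_info(0.1)   # ~3.322 bits

# huffman example source P = {0.5, 0.3, 0.2}
h = entropy([0.5, 0.3, 0.2])                 # ~1.485 bits/symbol
avg_code_len = 0.5 * 1 + 0.3 * 2 + 0.2 * 2   # 1.5 bits/symbol for the S/C/R code
```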
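A bare-bones LZW sketch (byte-valued initial dictionary, no pruning of old entries) illustrating the key property mentioned above: the decoder rebuilds the dictionary on its own, so it is never transmitted:

```python
def lzw_compress(data: str) -> list[int]:
    """Grow a dictionary of strings seen so far; emit dictionary codes."""
    dictionary = {chr(i): i for i in range(256)}  # single-character entries
    next_code, w, out = 256, "", []
    for ch in data:
        if w + ch in dictionary:
            w += ch                       # keep extending the current match
        else:
            out.append(dictionary[w])     # emit code for the longest match
            dictionary[w + ch] = next_code  # learn a new string
            next_code += 1
            w = ch
    if w:
        out.append(dictionary[w])
    return out

def lzw_decompress(codes: list[int]) -> str:
    """Rebuild the same dictionary from the code stream alone."""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:                             # code emitted before decoder learned it
            entry = w + w[0]
        out.append(entry)
        dictionary[next_code] = w + entry[0]  # mirror the encoder's learning step
        next_code += 1
        w = entry
    return "".join(out)

msg = "TOBEORNOTTOBEORTOBEORNOT"
assert lzw_decompress(lzw_compress(msg)) == msg
```

The repeated "TOBEOR" chunks collapse into single codes, so the output code list is shorter than the input even in this toy example.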
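The DCT and quantization phases of the jpeg pipeline can be sketched in pure Python on one 8x8 block. A flat block concentrates all its energy in the single DC coefficient, which is why quantization followed by RLE-like coding of the (mostly zero) remaining coefficients compresses so well. This is a simplification: real jpeg uses a per-frequency quantization table derived from the Q factor, a zigzag scan, and huffman entropy coding, not the single step size `q` used here:

```python
from math import cos, pi, sqrt

N = 8  # jpeg block size

def dct2(block: list[list[float]]) -> list[list[float]]:
    """2-D DCT-II of an NxN block (spatial domain -> frequency domain)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            cu = sqrt(1 / N) if u == 0 else sqrt(2 / N)
            cv = sqrt(1 / N) if v == 0 else sqrt(2 / N)
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += block[x][y] * cos((2 * x + 1) * u * pi / (2 * N)) \
                                     * cos((2 * y + 1) * v * pi / (2 * N))
            out[u][v] = cu * cv * s
    return out

def quantize(coeffs, q):
    """The lossy step: divide each coefficient by a step size and round."""
    return [[round(c / q) for c in row] for row in coeffs]

# a flat (constant) 8x8 block: all energy lands in the DC coefficient
flat = [[128] * N for _ in range(N)]
freq = dct2(flat)
# freq[0][0] is ~1024 (= 128 * 8); every other coefficient is ~0,
# so after quantization the block is one value plus 63 zeros - ideal for RLE
```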