lab commentary: compression

source coding for compression - use the smallest number of bits to represent the data source for a given reconstruction quality

lossy vs lossless
- compression is possible because redundancy exists; compression schemes vary in how the redundancy is exploited
- theoretical max compression? answered by information theory
- tradeoffs: small and lossy, or huge and perfect; compression factor vs computational complexity and memory required
- lossless preserves the original exactly
- lossy is for when you don't need exact reproduction (multimedia data: video, audio, images, etc.)

run length encoding - replace a run of identical symbols with the symbol and a count

huffman codes - shorter code words for more frequently used symbols
- example: S -> 0, C -> 10, R -> 11, so SCSSCSSRCR -> 010001000111011
- construction: arrange symbols according to frequency, then repeatedly merge the two least frequent nodes:

  symbol  freq  code
  s       0.5   0
  c       0.3   10
  r       0.2   11

  (c and r merge first into a node of weight 0.5; that node then merges with s; reading the 0/1 branch labels down from the root gives each code)

compression bounds - theoretical foundation is shannon information theory
- self-information - the uncertainty of an event: I(x) = -log2 P(x)
- weather prediction example: given P(tomorrow sunny) = 0.9, the fact that tomorrow is sunny is expected, the usual, happens often - little information; given P(tomorrow rainy) = 0.1, the fact that tomorrow is rainy is unexpected and carries more information
- source entropy: H(X) = E[I(X)] = -sum_i P(X=i) log2 P(X=i), the bits/symbol required to represent the source
- higher entropy requires more bits to represent

string matching algorithms
- huffman coding - maps fixed-length input symbols to variable-length codes based on known or estimated statistics
- lz77 - rips out repeated chunks of data and replaces each with a reference to an earlier copy
- lzw - abandons the sliding-window approach of lz77: maintain a dictionary of strings and codes; as compression occurs, new strings are added to the dictionary, and old strings not likely to appear in the future can be dropped; the dictionary is deduced by the decoder, so it never has to be transmitted

image compression
- lossless compression for images yields only 50-66% compression; some images are very hard to compress losslessly, as their entropy calculation shows
- lossy compression (jpeg, etc.) makes >10x compression possible, but artifacts are introduced (note: png and gif are actually lossless formats)
- achievable compression depends on the required quality (Q factor)

jpeg - lossy still image compression, a 3 phase process: DCT, quantization, encoding
- the image is processed in 8x8 blocks
- DCT transforms the signal from the spatial domain into an equivalent signal in the frequency domain (lossless)
- quantization is applied to the frequency coefficients (lossy)
- an RLE-like encoding compresses the quantized coefficients (lossless)
- the scheme takes advantage of the human visual system, which is less sensitive to high-frequency detail
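The run-length idea above can be sketched in a few lines of Python (function names here are illustrative, not from any particular library):

```python
def rle_encode(data: str) -> list[tuple[str, int]]:
    """Encode a string as (symbol, run_length) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))              # start a new run
    return runs

def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand each (symbol, count) pair back into a run."""
    return "".join(ch * n for ch, n in runs)

encoded = rle_encode("AAAABBBCCD")
# encoded == [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
assert rle_decode(encoded) == "AAAABBBCCD"
```

Note RLE only wins when the data actually contains long runs, which is why jpeg applies it after quantization has zeroed out most coefficients.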
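A minimal sketch of the huffman construction for the s/c/r source above. Exact bit labels depend on tie-breaking choices when merging, so this version only guarantees the same code lengths as the table (1, 2, 2 bits) and a prefix-free code; the 15-bit total for SCSSCSSRCR matches the notes:

```python
import heapq

def huffman_codes(freqs: dict[str, float]) -> dict[str, str]:
    """Build a Huffman code from {symbol: probability}: repeatedly pop
    the two least probable subtrees, prepend a bit to every code inside
    each, and push the merged subtree back."""
    heap, counter = [], 0                      # counter breaks probability ties
    for sym, p in freqs.items():
        heapq.heappush(heap, (p, counter, {sym: ""}))
        counter += 1
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)        # least probable subtree
        p2, _, t2 = heapq.heappop(heap)        # next least probable
        merged = {s: "0" + c for s, c in t2.items()}
        merged.update({s: "1" + c for s, c in t1.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

codes = huffman_codes({"S": 0.5, "C": 0.3, "R": 0.2})
encoded = "".join(codes[ch] for ch in "SCSSCSSRCR")  # 15 bits total
```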
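The self-information and entropy formulas above can be checked numerically. For the s/c/r huffman source, the entropy (~1.485 bits/symbol) sits just below the huffman code's average length (1.5 bits/symbol), which is exactly the ordering Shannon's bound requires:

```python
from math import log2

def self_info(p: float) -> float:
    """I(x) = -log2 P(x): bits of surprise in a single event."""
    return -log2(p)

def entropy(probs: list[float]) -> float:
    """H(X) = -sum P(x) log2 P(x): average bits/symbol of the source."""
    return sum(p * self_info(p) for p in probs if p > 0)

# weather example: the expected outcome carries little information
sunny = self_info(0.9)   # ~0.152 bits
rainy = self_info(0.1)   # ~3.322 bits

# huffman example source P = {0.5, 0.3, 0.2}
h = entropy([0.5, 0.3, 0.2])                 # ~1.485 bits/symbol
avg_code_len = 0.5 * 1 + 0.3 * 2 + 0.2 * 2   # 1.5 bits/symbol for the S/C/R code
```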
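A bare-bones LZW sketch (byte-valued initial dictionary, no pruning of old entries) illustrating the key property mentioned above: the decoder rebuilds the dictionary on its own, so it is never transmitted:

```python
def lzw_compress(data: str) -> list[int]:
    """Grow a dictionary of strings seen so far; emit dictionary codes."""
    dictionary = {chr(i): i for i in range(256)}  # single-character entries
    next_code, w, out = 256, "", []
    for ch in data:
        if w + ch in dictionary:
            w += ch                       # keep extending the current match
        else:
            out.append(dictionary[w])     # emit code for the longest match
            dictionary[w + ch] = next_code  # learn a new string
            next_code += 1
            w = ch
    if w:
        out.append(dictionary[w])
    return out

def lzw_decompress(codes: list[int]) -> str:
    """Rebuild the same dictionary from the code stream alone."""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:                             # code emitted before decoder learned it
            entry = w + w[0]
        out.append(entry)
        dictionary[next_code] = w + entry[0]  # mirror the encoder's learning step
        next_code += 1
        w = entry
    return "".join(out)

msg = "TOBEORNOTTOBEORTOBEORNOT"
assert lzw_decompress(lzw_compress(msg)) == msg
```

The repeated "TOBEOR" chunks collapse into single codes, so the output code list is shorter than the input even in this toy example.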
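The DCT and quantization phases of the jpeg pipeline can be sketched in pure Python on one 8x8 block. A flat block concentrates all its energy in the single DC coefficient, which is why quantization followed by RLE-like coding of the (mostly zero) remaining coefficients compresses so well. This is a simplification: real jpeg uses a per-frequency quantization table derived from the Q factor, a zigzag scan, and huffman entropy coding, not the single step size `q` used here:

```python
from math import cos, pi, sqrt

N = 8  # jpeg block size

def dct2(block: list[list[float]]) -> list[list[float]]:
    """2-D DCT-II of an NxN block (spatial domain -> frequency domain)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            cu = sqrt(1 / N) if u == 0 else sqrt(2 / N)
            cv = sqrt(1 / N) if v == 0 else sqrt(2 / N)
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += block[x][y] * cos((2 * x + 1) * u * pi / (2 * N)) \
                                     * cos((2 * y + 1) * v * pi / (2 * N))
            out[u][v] = cu * cv * s
    return out

def quantize(coeffs, q):
    """The lossy step: divide each coefficient by a step size and round."""
    return [[round(c / q) for c in row] for row in coeffs]

# a flat (constant) 8x8 block: all energy lands in the DC coefficient
flat = [[128] * N for _ in range(N)]
freq = dct2(flat)
# freq[0][0] is ~1024 (= 128 * 8); every other coefficient is ~0,
# so after quantization the block is one value plus 63 zeros - ideal for RLE
```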