


Paul Cockshott 

Faraday partnership 




We will look at some or the currently well established methods
of compressing images. 

These are used for digital TV and for digital
cameras 






Lossless
compression 

Quadtree compression 



Lossy compression 

Reduce to 8 bits 

Frequency domain compression systems 

Orthogonal bases 

Representation of blocks using orthogonal bases 

Quantization 

JPEG 

MPEG 1 




Lossless compression system 

Image is successively divided into 4 until all
pixels in a square are the same 






Further compression is possible by making use of
the way that our perception is biased towards not noticing slight changes
in brightness or very fine details. 

This is what various compression systems working
in the frequency domain such as DCT
and wavelet compressors take advantage of. 




The key to these systems is the concept of an
orthogonal basis or coordinate system in which areas of a picture can be
represented. 

In what follows let us initially consider just
monochrome images. 




A vector is a sequence of numbers. [4,3] for
example is vector with 2 elements. 

We are all familiar with how these can be used
to represent lines from the origin in 2d geometry. 




An orthogonal pair of vectors is a pair of
vectors at right angles. 

For example the vectors along the X and Y axes
[1,0], [0,1] are at right angles and thus orthogonal. 




Similarly the 45degree vectors [1,1], [1,1] are at right angles and
thus orthogonal. 





If in addition to being at right angles the
vectors are of length 1, then they are orthonormal. 

For example the pairs of vectors ([0,1],[1,0])
and 




Any pair of orthonormal vectors can substitute
for the x and y axes in defining the position of a point. The new axes
serve as a ‘basis’ for defining points. 





We can measure how far along an axis provided by
one of the basis vectors a given v is by using the inner product operator. 

Let v be defined in the x,y basis and let p,q be
and alternative orthonormal basis. The point v in terms of p,q is given by
the vector [p.v,q.v] 




Given v
of dimension n and orthonormal basis vectors b_{1},b_{2},..,b_{n
}we can express v in this new basis as w=v [b_{1},b_{2},..,b_{n
}] = [v.b_{1},v.b_{2},..,v.b_{n }] without
loss of information 






Pixels 

Haar wavelets 

DCT coefficients 




Consider the block of 4 pixels a,b,c,d 

These can be considered a 4d vector 

[a,b,c,d] expressed in the basis Space 




An alternative basis to express the picture
block would be Haar wavelets 

[1,1,1,1], [1,1,1,1], [1,1,0,0], [0,0,1,1] 





The Discrete Cosine Transform, uses a set of
discrete cosine waves of different frequency combinations to act as basis
vectors 




Divide each channel (Y, U, V) into 8x8 blocks
which are then 64 dimension vectors 

Multiply each block by each of 64 DCT basis
vectors. 

Each 8x8 block of pixel values becomes an 8x8
block of coefficients. 




The block of coefficients is read out in zig zag
order so that low frequency coefficients come before high frequency ones 




Clearly if we encode a coefficient to less
precision we need fewer bits to do so. 

A suitable technique for this is to divide the
coefficients by an integer and ignore the remainder on encoding and do the
reverse on decoding. 





Suppose we have the coefficient vector [ 11, 93,
102, 77] 

Encode 

We divide by a constant, (14 in this instance)
to obtain [0.78, 6.64,7.24,5.5]. 

Ignore remainder and we have [0, 6,7,5]. 

Decode 

Multiply by 14 again and we get [0,84,98,70]
which is close to the original vector 




Different quantization divisors are used in each
position in the block so that the low frequency information is less
quantized. 




This is done using quantization matrix 




The zig zag ordering plus quantization causes
means that most of the last set of coefficients are zero. The transmission
of an EOB, (end of block) code, indicates that there are no more
coefficients in the block. 







MPEG is a set of standards for the encoding of
moving pictures. It follows on from JPEG with which it shares the basic
features of 

DCT 

Zig Zag encoding 

Quantization 

Run length encoding 

It adds the new features of 

Key frames 

Motion compensation 




Since there is substantial sequential redundancy
in video ( one frame similar to the next) only Key Frames or Interframes
are fully encoded, using JPEG techniques 

Typically one frame in 15 is an Interframe 






Classical compression works by tranfering the
image into a vector space that better captures the redundancy of the image 

DCT or wavelet coefficients 

Motion compensation for video 




http://www.jpeg.org/public/jpeghomepage.htm 

ftp://ftp.uu.net/graphics/jpeg 










What did you not understand here? 

What would you like more information on? 
