Classical image compression

Paul Cockshott

Faraday partnership

Introduction

We will look at some or the currently well established methods of compressing images.

These are used for digital TV and for digital cameras

Summary of this section

Lossless compression

Quadtree compression

Lossy compression

Reduce to 8 bits

Frequency domain compression systems

Orthogonal bases

Representation of blocks using orthogonal bases

Quantization

JPEG

MPEG 1

Quadtree compression

Lossless compression system

Image is successively divided into 4 until all pixels in a square are the same

Quad Tree Flattening

e

Reduction to 8bit colour

Frequency domain compression systems

Further compression is possible by making use of the way that our perception is biased towards not noticing slight changes in brightness or very fine details.

This is what various compression systems working in the frequency domain such as DCT and wavelet compressors take advantage of.

Orthogonal Bases

The key to these systems is the concept of an orthogonal basis or coordinate system in which areas of a picture can be represented.

In what follows let us initially consider just monochrome images.

In 2 D Geometry

A vector is a sequence of numbers. [4,3] for example is vector with 2 elements.

We are all familiar with how these can be used to represent lines from the origin in 2d geometry.

Vectors at right angles

An orthogonal pair of vectors is a pair of vectors at right angles.

For example the vectors along the X and Y axes [1,0], [0,1] are at right angles and thus orthogonal.

Vectors at right angles

Similarly the 45degree vectors [1,1], [1,-1] are at right angles and thus orthogonal.

Vector Lengths

Orthonormal vectors

If in addition to being at right angles the vectors are of length 1, then they are orthonormal.

For example the pairs of vectors ([0,1],[1,0]) and

Change of basis

Any pair of orthonormal vectors can substitute for the x and y axes in defining the position of a point. The new axes serve as a ‘basis’ for defining points.

Extension to n dimensions

Inner product operations

We can measure how far along an axis provided by one of the basis vectors a given v is by using the inner product operator.

Let v be defined in the x,y basis and let p,q be and alternative orthonormal basis. The point v in terms of p,q is given by the vector [p.v,q.v]

Transforming to and From a Basis

Given v of dimension n and orthonormal basis vectors b₁,b₂,..,b_nwe can express v in this new basis as w=v [b₁,b₂,..,b_n] = [v.b₁,v.b₂,..,v.b_n] without loss of information

Representation of Blocks Using Orthogonal Bases

Pixels

Haar wavelets

DCT coefficients

Pixels

Consider the block of 4 pixels a,b,c,d

These can be considered a 4d vector

[a,b,c,d] expressed in the basis Space

Haar wavelets

An alternative basis to express the picture block would be Haar wavelets

[1,1,1,1], [1,1,-1,-1], [1,-1,0,0], [0,0,1,-1]

Harr wavelets

DCT coefficients

The Discrete Cosine Transform, uses a set of discrete cosine waves of different frequency combinations to act as basis vectors

Use of blocks in JPEG

Divide each channel (Y, U, V) into 8x8 blocks which are then 64 dimension vectors

Multiply each block by each of 64 DCT basis vectors.

Each 8x8 block of pixel values becomes an 8x8 block of coefficients.

Zig Zag

The block of coefficients is read out in zig zag order so that low frequency coefficients come before high frequency ones

Quantization

Clearly if we encode a coefficient to less precision we need fewer bits to do so.

A suitable technique for this is to divide the coefficients by an integer and ignore the remainder on encoding and do the reverse on decoding.

Quantization continued

Suppose we have the coefficient vector [ 11, 93, 102, 77]

Encode

We divide by a constant, (14 in this instance) to obtain [0.78, 6.64,7.24,5.5].

Ignore remainder and we have [0, 6,7,5].

Decode

Multiply by 14 again and we get [0,84,98,70] which is close to the original vector

JPEG quantization

Different quantization divisors are used in each position in the block so that the low frequency information is less quantized.

JPEG quantization

This is done using quantization matrix

Run length encoding

The zig zag ordering plus quantization causes means that most of the last set of coefficients are zero. The transmission of an EOB, (end of block) code, indicates that there are no more coefficients in the block.

MPEG

MPEG is a set of standards for the encoding of moving pictures. It follows on from JPEG with which it shares the basic features of

DCT

Zig Zag encoding

Quantization

Run length encoding

It adds the new features of

Key frames

Motion compensation

Key frames

Since there is substantial sequential redundancy in video ( one frame similar to the next) only Key Frames or Interframes are fully encoded, using JPEG techniques

Typically one frame in 15 is an Interframe

Motion compensation

Search for best match

Summary

Classical compression works by tranfering the image into a vector space that better captures the redundancy of the image

DCT or wavelet coefficients

Motion compensation for video

Where to get more information

http://www.jpeg.org/public/jpeghomepage.htm

ftp://ftp.uu.net/graphics/jpeg

Feedback

What did you not understand here?

What would you like more information on?


	We will look at some or the currently well established methods of compressing images.
	These are used for digital TV and for digital cameras


Lossless compression
	Quadtree compression

Lossy compression
	Reduce to 8 bits
	Frequency domain compression systems
		Orthogonal bases
		Representation of blocks using orthogonal bases
		Quantization
		JPEG
		MPEG 1


	Lossless compression system
	Image is successively divided into 4 until all pixels in a square are the same


	Further compression is possible by making use of the way that our perception is biased towards not noticing slight changes in brightness or very fine details.
	This is what various compression systems working in the frequency domain such as DCT and wavelet compressors take advantage of.


	The key to these systems is the concept of an orthogonal basis or coordinate system in which areas of a picture can be represented.
	In what follows let us initially consider just monochrome images.


	A vector is a sequence of numbers. [4,3] for example is vector with 2 elements.
	We are all familiar with how these can be used to represent lines from the origin in 2d geometry.


	An orthogonal pair of vectors is a pair of vectors at right angles.
	For example the vectors along the X and Y axes [1,0], [0,1] are at right angles and thus orthogonal.


	Similarly the 45degree vectors [1,1], [1,-1] are at right angles and thus orthogonal.


	If in addition to being at right angles the vectors are of length 1, then they are orthonormal.
	For example the pairs of vectors ([0,1],[1,0]) and


	Any pair of orthonormal vectors can substitute for the x and y axes in defining the position of a point. The new axes serve as a ‘basis’ for defining points.


	We can measure how far along an axis provided by one of the basis vectors a given v is by using the inner product operator.
	Let v be defined in the x,y basis and let p,q be and alternative orthonormal basis. The point v in terms of p,q is given by the vector [p.v,q.v]


	Given v of dimension n and orthonormal basis vectors b₁,b₂,..,b_nwe can express v in this new basis as w=v [b₁,b₂,..,b_n] = [v.b₁,v.b₂,..,v.b_n] without loss of information


	Consider the block of 4 pixels a,b,c,d
	These can be considered a 4d vector
	[a,b,c,d] expressed in the basis Space


	An alternative basis to express the picture block would be Haar wavelets
	[1,1,1,1], [1,1,-1,-1], [1,-1,0,0], [0,0,1,-1]


	The Discrete Cosine Transform, uses a set of discrete cosine waves of different frequency combinations to act as basis vectors


	Divide each channel (Y, U, V) into 8x8 blocks which are then 64 dimension vectors
	Multiply each block by each of 64 DCT basis vectors.
	Each 8x8 block of pixel values becomes an 8x8 block of coefficients.


	The block of coefficients is read out in zig zag order so that low frequency coefficients come before high frequency ones


	Clearly if we encode a coefficient to less precision we need fewer bits to do so.
	A suitable technique for this is to divide the coefficients by an integer and ignore the remainder on encoding and do the reverse on decoding.


	Suppose we have the coefficient vector [ 11, 93, 102, 77]
	Encode
		We divide by a constant, (14 in this instance) to obtain [0.78, 6.64,7.24,5.5].
		Ignore remainder and we have [0, 6,7,5].
	Decode
		Multiply by 14 again and we get [0,84,98,70] which is close to the original vector


	Different quantization divisors are used in each position in the block so that the low frequency information is less quantized.


	The zig zag ordering plus quantization causes means that most of the last set of coefficients are zero. The transmission of an EOB, (end of block) code, indicates that there are no more coefficients in the block.


	MPEG is a set of standards for the encoding of moving pictures. It follows on from JPEG with which it shares the basic features of
		DCT
		Zig Zag encoding
		Quantization
		Run length encoding
	It adds the new features of
		Key frames
		Motion compensation


	Since there is substantial sequential redundancy in video ( one frame similar to the next) only Key Frames or Interframes are fully encoded, using JPEG techniques
	Typically one frame in 15 is an Interframe


	Classical compression works by tranfering the image into a vector space that better captures the redundancy of the image
	DCT or wavelet coefficients
	Motion compensation for video


	http://www.jpeg.org/public/jpeghomepage.htm
	ftp://ftp.uu.net/graphics/jpeg


	What did you not understand here?
	What would you like more information on?