Information and Compression
Paul Cockshott
Faraday partnership

We will be looking at what information is
Methods of measuring it
The idea of compression
2 distinct ways of compressing images

Summary of Course
At the end of this lecture you should have an idea what information is, its relation to order, and why data compression is possible

Encoding systems
BCD versus decimal
Information efficiency
Shannons Measure
Chaitin, Kolmogorov Complexity Theory
Relation to Goedels theorem

Data coms channels bounded
Images take great deal of information in raw form
Efficient transmission requires denser encoding
This requires that we understand what information actually is

Simple encoding example shows different densities possible
Shannons information theory deals with transmission
Chaitin relates this to Turing machines and computability
Goedels theorem limits what we know about compression

lossless encoding,
lossy encoding

What is information
Information measured in bits
Bit equivalent to binary digit
Why use binary system not decimal?
Decimal would seem more natural
Alternatively why not use e, base of natural logarithms instead of 2 or 10 ?

Voltage encoding
Digital information encoded as ranges of continuous variables
Each digital value takes a range
isolation bands in between

Noise Immunity
2 values 0,1
1 isolation band
1.6 volt isolation band
Good noise imunity, 0.5 volt noise on power no problem
10 values 0..9
9 isolation bands
0.32 volt isolation bands
0.5 volt power noise would destroy readings

Decimal Computing
Great advantages of decimal for commerce, banking
Burroughs, ICL went on making decimal computers until late 1970s
Burroughs B8 series, ICL System 10
Used Binary Encoded Decimal BCD

BCD encoding of 1998
Reliable since encoded in binary
1998 represented as
0001 1001 1001 1000
In pure binary this is
0000 0111 1100 1110
BCD 13 sig digits
Binary 11 sig digits

BCD less dense
BCD requires 16 bits to represent dates up to 9999 BCD : 1001 1001 1001 1001
Binary requires 14 bits to go up to 9999 binary: 10 0111 0000 1111
Why ?
What is the relative efficiency of two encodings?

Nyquist Limit

Maxwells Demon
 Daemon opens door only for fast molecules to go from A to B.

End Result – Free Heat
Slow molecules in A, fast in B
Thus B hotter than A, and can be used to power a machine

Szilards response
Szilard pointed out that to decide which molecules to let through, the daemon must measure the speed of them. Szilard showed that these measurements ( bouncing photons off the molecules ) would use up more energy than was gained.

Topic Two: Shannon’s Theory
Shannon defines a bit as the amount of information required to decide between two equally probable outcomes
Example: a sequence of tosses of a fair coin can be encode 1 bit per toss, such that heads are 1 and tails 0.

Information as Surprise
According to Shannon the information content of a message is a function of how surprised we are by it.
The less probable a message the more information it contains
H = - log2 p
H = information also called entropy, p the probability of occurrence of a messages. The mean information content of an ensemble of messages is obtained by weighting the messages by their probability of occurence
-S p log2p

This explains inefficiency of BCD
Consider the frequency of occurrence of 1 and 0 in the different columns of a BCD digit.
Only the least significant digit decides between equiprobable outcomes
For most significant digit, 0 is four times more common than 1

Determining encoding efficiency
For the most significant digit of BCD we have
p= 0.2 of a 1, p=0.8 of a 0
Applying Shannon’s formula a 1 contributes 2.321 bits of information, but with a probability of 0.2 this contributes 0.464 bits
Sum of information for 0 and 1 is 0.721 bits

Mean info of a BCD digit

Topic Three: Algorithmic Information
Chaitin defines the information content of a number ( or sequence of binary digits ) to be the length of the shortest computer program capable of generating that sequence.
He uses Turing machines as cannonical computers Thus the information content of a sequence is the shortest Turing machine tape that would cause the machine to halt with the sequence on its output tape.

Randomness and p
We know from Shannon that 1 million tosses of fair coin generates 1 million bits of information.
On the other hand, from Chaitin we know that p to 1 million bit precision contains much less than 1 millon bits, since the program to compute p can be encoded in much less than 1 million bits.

Andrei Kolgomorov defined a random number as a number for which no formulae shorter than itself existed.
By Chaitin’s definition of information a random number is thus incompressible.
A Random Sequence Thus Contains the Maximum Information.

Randomness goal of compression
A fully compressed data sequence is indistinguishable from a random sequence of 0s and 1s.
This follows directly from Kolgomorov and Chaitin’s results.
From Shannon we also have the result that for each bit of the stream to have maximal information it must mimic the tossing of a fair coin : be unpredictable, random.

Information is not meaning
One million digits of p are more meaningful than one million random bits.
But they contain less information.
Meaningful sequences generally contain redundancy.

Information is not complexity
One million digits from p have less information than a million random bits but they are more complex.
The complexity of a sequence is measured by the number of machine cycles a computer would have to go through to generate it ( Bennet’s theorem of logical depth).

Topic Four: Limits to compression
We can not in principle come up with an algorithm for performing the maximum compression on any bit sequence.
3/7 is a rule of arithmetic, an algorithm that generates the sequence 0.42857142857
So this sequence is presumably less random than 0.32857142877 ( changed 2 digits)
But we can never be sure.

Consequence of Goedels theorem
Showed we can not prove completeness of a consistent set of arithmetic axioms. There will be true statements that can not be proven.
If there existed a general procedure to derive the minimal Turing machine program for any sequence, then we would have a procedure to derive any true proposition from a smaller set of axioms, contra Goedel.

Image sizes
 QCIF: 160 x 120 (used in video phones)
 MPEG: 352 x 288
 VGA: 640 x 480
 NTSC: 720 x 486 ( US Television)
 Workstation: 1280 x 1024
 HDTV: 1920 x 1080
 35mm slide: 3072 x 2048

Storage capacities
Floppy disk
 Floppy disk capacity = 1.44 MB
 A single 1280x1024x24 image = 3.9 MB
 A single 640x480x24 image = 922K
 Floppy disk holds only one 640x480x24 image.

Video on CD rom
 CD-ROM capacity = 600 MB
 A 160x120x16 image @30 fps (QCIF)= 1.15 MB/sec
 CD-ROM now holds 8.7 minutes of video
(still not enough), but only at about a quarter the image size of normal TV

Telecoms links
Ethernet = 10 meg bits /sec
ISDN = 128 K bits /sec
ADSL = max 2 meg bit /sec
Compute max frame rates for QCIF for each of these channels

Information = randomness = disorder = entropy.
Meaningful data is generally compressible.
Information measured by length of program to generate it.
We will look at two techniques to apply these ideas to images: Fractal compression, and Vector Quantization.

Where to get more information
‘The user Illusion’ by Tor Norretranders , Penguin 1998.
‘Information Randomness and Incompleteness’ by Chaitin
‘Information is Physical’ by Landauer, in Physics Today,May 1991.
‘Logical depth and physical complexity’ by Bennet, in The Universal Turing Machine a Half Century Survey, Oxford 1988.

What did you not understand here?
What would you like more information on?