Information and Compression
|
|
|
Paul Cockshott |
|
Faraday Partnership |
Introduction
|
|
|
We will be looking at what information
is |
|
Methods of measuring it |
|
The idea of compression |
|
2 distinct ways of compressing images |
Summary of Course
|
|
|
At the end of this lecture you should
have an idea of what information is, its relation to order, and why data
compression is possible |
Agenda
|
|
|
Encoding systems |
|
BCD versus decimal |
|
Information efficiency |
|
Shannon's Measure |
|
Chaitin, Kolmogorov Complexity Theory |
|
Relation to Goedel's theorem |
Overview
|
|
|
Data comms channels have bounded capacity |
|
Images take a great deal of information
in raw form |
|
Efficient transmission requires denser
encoding |
|
This requires that we understand what
information actually is |
Connections
|
|
|
Simple encoding example shows different
densities possible |
|
Shannon's information theory deals with
transmission |
|
Chaitin relates this to Turing machines
and computability |
|
Goedel's theorem limits what we can know
about compression |
Vocabulary
|
|
|
information, |
|
entropy, |
|
compression, |
|
redundancy, |
|
lossless encoding, |
|
lossy encoding |
What is information
|
|
|
Information measured in bits |
|
Bit equivalent to binary digit |
|
Why use a binary system and not decimal? |
|
Decimal would seem more natural |
|
Alternatively, why not use e, the base of
natural logarithms, instead of 2 or 10? |
Voltage encoding
|
|
|
Digital information encoded as ranges
of continuous variables |
|
Each digital value takes a range |
|
Isolation bands in between |
Noise Immunity
|
|
|
BINARY |
|
2 values 0,1 |
|
1 isolation band |
|
1.6 volt isolation band |
|
Good noise immunity: 0.5 volt noise on the
power supply is no problem |
|
DECIMAL |
|
10 values 0..9 |
|
9 isolation bands |
|
0.32 volt isolation bands |
|
0.5 volt power noise would destroy
readings |
Decimal Computing
|
|
|
Great advantages of decimal for
commerce, banking |
|
Burroughs and ICL went on making decimal
computers until the late 1970s |
|
Burroughs B8 series, ICL System 10 |
|
Used Binary Coded Decimal (BCD) |
BCD encoding of 1998
|
|
|
Reliable since encoded in binary |
|
1998 represented as |
|
0001 1001 1001 1000 |
|
In pure binary this is |
|
0000 0111 1100 1110 |
|
BCD: 13 significant bits |
|
Binary: 11 significant bits |
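|
As a quick check of these bit patterns, here is a minimal Python sketch (the helper names are purely illustrative) that prints the BCD and pure binary encodings of a number: |

def to_bcd(n):
    # Encode each decimal digit of n as its own 4-bit group.
    return ' '.join(format(int(d), '04b') for d in str(n))

def to_binary(n, width=16):
    # Encode n as a single pure binary number, padded to a fixed width.
    bits = format(n, '0{}b'.format(width))
    return ' '.join(bits[i:i+4] for i in range(0, width, 4))

print(to_bcd(1998))     # 0001 1001 1001 1000
print(to_binary(1998))  # 0000 0111 1100 1110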
BCD less dense
|
|
|
BCD requires 16 bits to represent dates
up to 9999. BCD: 1001 1001 1001 1001 |
|
Binary requires 14 bits to go up to
9999. Binary: 10 0111 0000 1111 |
|
Why? |
|
What is the relative efficiency of the two
encodings? |
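|
A rough answer, sketched in Python: the information-theoretic minimum for values 0..9999 is log2(10000), about 13.29 bits, so pure binary at 14 bits is about 95% efficient while BCD at 16 bits is about 83% efficient: |

import math

values = 10000                       # years 0..9999
minimum = math.log2(values)          # ~13.29 bits of information
binary_bits = math.ceil(minimum)     # 14 bits in pure binary
bcd_bits = 4 * len(str(values - 1))  # 16 bits in BCD, 4 per decimal digit

print('binary efficiency:', minimum / binary_bits)  # ~0.95
print('BCD efficiency:   ', minimum / bcd_bits)     # ~0.83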
Nyquist Limit
Maxwell's Demon
|
|
|
The demon opens the door only to let fast molecules pass from A to B. |
End Result – Free Heat
|
|
|
Slow molecules in A, fast in B |
|
Thus B is hotter than A, and the difference can
be used to power a machine |
Szilard's response
|
|
|
Szilard pointed out that to decide
which molecules to let through, the demon must measure their speed.
He showed that these measurements (bouncing photons off the molecules)
would use up more energy than was gained. |
Topic Two: Shannon’s Theory
|
|
|
Shannon defines a bit as the amount of
information required to decide between two equally probable outcomes |
|
Example: a sequence of tosses of a fair
coin can be encoded at 1 bit per toss, with heads as 1 and tails as 0. |
Information as Surprise
|
|
|
According to Shannon the information
content of a message is a function of how surprised we are by it. |
|
The less probable a message, the more
information it contains |
|
H = - log2 p |
|
H = information, also called entropy; p =
the probability of occurrence of a message. The mean information content of
an ensemble of messages is obtained by weighting each message's information
by its probability of occurrence: |
|
H = -Σ p log2 p |
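|
Shannon's formula is easy to turn into a short Python sketch (the helper below is only for illustration); it reproduces both the fair coin and the biased source used on the next slides: |

import math

def entropy(probabilities):
    # Mean information (entropy) of an ensemble in bits:
    # H = -sum(p * log2(p)) over all message probabilities p.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))   # fair coin toss: 1.0 bit
print(entropy([0.2, 0.8]))   # biased source: ~0.722 bits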
This explains inefficiency
of BCD
|
|
|
Consider the frequency of occurrence of
1 and 0 in the different columns of a BCD digit. |
|
Only the least significant bit
decides between equiprobable outcomes |
|
For the most significant bit, a 0 is four
times more common than a 1 |
Determining encoding
efficiency
|
|
|
For the most significant bit of a BCD digit
we have |
|
p= 0.2 of a 1, p=0.8 of a 0 |
|
Applying Shannon’s formula, a 1
contributes 2.322 bits of information, but with a probability of 0.2 this
contributes only 0.464 bits |
|
Sum of the information for 0 and 1 is 0.722
bits |
Mean info of a BCD digit
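|
The figure this slide refers to can be reproduced with a short sketch, assuming the ten decimal digits 0..9 are equally likely; each of the four BCD columns is scored with Shannon's formula and the results summed: |

import math

def column_entropy(bit):
    # Probability of a 1 in this bit position over the digits 0..9.
    p = sum((d >> bit) & 1 for d in range(10)) / 10.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

for bit in range(4):
    print('column', bit, ':', round(column_entropy(bit), 3), 'bits')
# columns 0..3 carry 1.0, 0.971, 0.971 and 0.722 bits respectively

print('mean info of a BCD digit:', round(sum(column_entropy(b) for b in range(4)), 3))
# ~3.66 bits carried by the 4 bits spent; a decimal digit really only needs
# log2(10), about 3.32 bits, since the columns are not independent.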
Topic Three: Algorithmic
Information
|
|
|
Chaitin defines the information content
of a number (or sequence of binary digits) to be the length of the shortest
computer program capable of generating that sequence. |
|
He uses Turing machines as canonical
computers. Thus the information content of a sequence is the length of the
shortest Turing machine tape that would cause the machine to halt with the
sequence on its output tape. |
Randomness and π
|
|
|
We know from Shannon that 1 million
tosses of a fair coin generate 1 million bits of information. |
|
On the other hand, from Chaitin we know
that π to 1 million-bit precision contains much less than 1 million bits,
since the program to compute π can be encoded in much less than 1 million
bits. |
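|
As an illustration of Chaitin's point (a sketch, not his construction), the short program below generates π to any requested precision using Machin's formula, so its length stays fixed while its output grows without bound: |

def pi_digits(n):
    # Return pi scaled to an integer with n decimal digits, via Machin's
    # formula pi/4 = 4*arctan(1/5) - arctan(1/239), in integer arithmetic.
    scale = 10 ** (n + 10)                 # ten guard digits
    def arctan_inv(x):
        total, term, k, sign = 0, scale // x, 1, 1
        while term:
            total += sign * (term // k)
            term //= x * x
            k += 2
            sign = -sign
        return total
    return 4 * (4 * arctan_inv(5) - arctan_inv(239)) // 10 ** 10

print(pi_digits(50))  # 314159265358979323846264338327950288419716939937510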
Randomness
|
|
|
Andrei Kolmogorov defined a random
number as one for which no formula shorter than the number itself exists. |
|
By Chaitin’s definition of information
a random number is thus incompressible. |
|
A random sequence thus contains the
maximum information. |
Randomness goal of
compression
|
|
|
A fully compressed data sequence is
indistinguishable from a random sequence of 0s and 1s. |
|
This follows directly from Kolmogorov's
and Chaitin’s results. |
|
From Shannon we also have the result
that for each bit of the stream to have maximal information it must mimic the
tossing of a fair coin: be unpredictable, random. |
Information is not meaning
|
|
|
One million digits of π are more
meaningful than one million random bits. |
|
But they contain less information. |
|
Meaningful sequences generally contain
redundancy. |
Information is not
complexity
|
|
|
One million digits of π have less
information than a million random bits, but they are more complex. |
|
The complexity of a sequence is
measured by the number of machine cycles a computer would have to go through
to generate it (Bennett's logical depth). |
Topic Four: Limits to
compression
|
|
|
We cannot, in principle, come up with an
algorithm that performs the maximum compression on an arbitrary bit sequence. |
|
3/7 is a rule of arithmetic, an
algorithm that generates the sequence 0.42857142857... (sketched below) |
|
So this sequence is presumably less
random than 0.32857142877 (2 digits changed) |
|
But we can never be sure. |
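|
For example, the whole decimal expansion of 3/7 falls out of a two-line long division rule, sketched here: |

def fraction_digits(num, den, count):
    # Long division: each step yields one decimal digit of num/den.
    digits, rem = [], num % den
    for _ in range(count):
        rem *= 10
        digits.append(rem // den)
        rem %= den
    return digits

print(fraction_digits(3, 7, 12))  # [4, 2, 8, 5, 7, 1, 4, 2, 8, 5, 7, 1]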
Consequence of Goedel's
theorem
|
|
|
Goedel showed that we cannot prove the
completeness of a consistent set of arithmetic axioms: there will be true
statements that cannot be proven. |
|
If there existed a general procedure to
derive the minimal Turing machine program for any sequence, then we would
have a procedure to derive any true proposition from a smaller set of axioms,
contra Goedel. |
Image sizes
|
|
|
QCIF: 160 x 120 (used in video
phones) |
|
MPEG: 352 x 288 |
|
VGA: 640 x 480 |
|
NTSC: 720 x 486 (US television) |
|
Workstation: 1280 x 1024 |
|
HDTV: 1920 x 1080 |
|
35mm slide: 3072 x 2048 |
Storage capacities
|
|
|
Floppy disk |
|
Floppy disk capacity = 1.44 MB |
|
A single 1280x1024x24 image = 3.9 MB |
|
A single 640x480x24 image = 922K |
|
Floppy disk holds only one 640x480x24 image. |
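|
These figures are just arithmetic at 24 bits (3 bytes) per pixel; a quick check in Python, with sizes in decimal megabytes: |

def raw_size_mb(width, height, bits_per_pixel):
    # Uncompressed image size in (decimal) megabytes.
    return width * height * bits_per_pixel / 8 / 1e6

print(raw_size_mb(1280, 1024, 24))  # ~3.93 MB, far bigger than a floppy
print(raw_size_mb(640, 480, 24))    # ~0.92 MB, only one fits on a 1.44 MB floppy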
Video on CD rom
|
|
|
CD-ROM capacity = 600 MB |
|
A 160x120x16 image @ 30 fps (QCIF) = 1.15 MB/sec |
|
CD-ROM now holds 8.7 minutes of video |
|
(still not enough), and that only at about
a quarter of the image size of normal TV |
Telecoms links
|
|
|
Ethernet = 10 Mbit/sec |
|
ISDN = 128 kbit/sec |
|
ADSL = max 2 Mbit/sec |
|
Compute max frame rates for QCIF for
each of these channels |
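|
A sketch of that calculation, assuming the same 160x120 pixel, 16 bit-per-pixel QCIF frames used earlier and ignoring protocol overhead: |

frame_bits = 160 * 120 * 16   # 307,200 bits per raw frame

channels = {'Ethernet': 10e6, 'ISDN': 128e3, 'ADSL': 2e6}   # bits/sec
for name, rate in channels.items():
    print(name, round(rate / frame_bits, 2), 'frames/sec')
# Ethernet ~32.55, ISDN ~0.42, ADSL ~6.51: hence the need for compression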
Summary
|
|
|
Information = randomness = disorder = entropy. |
|
Meaningful data is generally
compressible. |
|
Information measured by length of
program to generate it. |
|
We will look at two techniques that apply
these ideas to images: fractal compression and vector quantization. |
Where to get more
information
|
|
|
‘The User Illusion’ by Tor
Norretranders, Penguin, 1998. |
|
‘Information, Randomness and
Incompleteness’ by Chaitin |
|
‘Information is Physical’ by Landauer,
in Physics Today, May 1991. |
|
‘Logical Depth and Physical Complexity’
by Bennett, in The Universal Turing Machine: A Half-Century Survey, Oxford,
1988. |
Feedback
|
|
|
What did you not understand here? |
|
What would you like more information
on? |