Information and Compression
Paul Cockshott
Faraday partnership

Introduction
We will be looking at what information is
Methods of measuring it
The idea of compression
2 distinct ways of compressing images

Summary of Course
At the end of this lecture you should have an idea what information is, its relation to order, and why data compression is possible

Agenda
Encoding systems
BCD versus decimal
Information efficiency
Shannon's Measure
Chaitin, Kolmogorov Complexity Theory
Relation to Goedel's theorem

Overview
Data communications channels have bounded capacity
Images take a great deal of information in raw form
Efficient transmission requires denser encoding
This requires that we understand what information actually is

Connections
Simple encoding example shows different densities possible
Shannon's information theory deals with transmission
Chaitin relates this to Turing machines and computability
Goedel's theorem limits what we can know about compression

Vocabulary
information,
entropy,
compression,
redundancy,
lossless encoding,
lossy encoding

What is information
Information measured in bits
Bit equivalent to binary digit
Why use the binary system and not decimal?
Decimal would seem more natural
Alternatively, why not use e, the base of natural logarithms, instead of 2 or 10?

Voltage encoding
Digital information is encoded as ranges of a continuous variable
Each digital value takes a range of voltage
with isolation bands in between

Noise Immunity
BINARY
2 values 0,1
1 isolation band
1.6 volt isolation band
Good noise immunity: 0.5 volts of noise on the power supply is no problem
DECIMAL
10 values 0..9
9 isolation bands
0.32 volt isolation bands
0.5 volt power noise would destroy readings

Decimal Computing
Great advantages of decimal for commerce, banking
Burroughs and ICL went on making decimal computers until the late 1970s
Burroughs B8 series, ICL System 10
Used Binary Coded Decimal (BCD)

BCD encoding of 1998
Reliable since encoded in binary
1998 represented as
0001 1001 1001 1000
In pure binary this is
0000 0111 1100 1110
BCD: 13 significant bits
Binary: 11 significant bits
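To make the comparison concrete, here is a minimal Python sketch (added for illustration, not part of the original slides) that builds both encodings of 1998 and counts the significant bits:

```python
# Compare the BCD and pure binary encodings of 1998.
def to_bcd(n):
    """Encode each decimal digit of n as a 4-bit group."""
    return ' '.join(format(int(d), '04b') for d in str(n))

n = 1998
bcd = to_bcd(n)                                    # 0001 1001 1001 1000
binary = format(n, '016b')                         # 0000011111001110
print('BCD   :', bcd)
print('Binary:', ' '.join(binary[i:i+4] for i in range(0, 16, 4)))
print('Significant bits, BCD   :', len(bcd.replace(' ', '').lstrip('0')))  # 13
print('Significant bits, binary:', n.bit_length())                         # 11
```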

BCD less dense
BCD requires 16 bits to represent dates up to 9999 (BCD: 1001 1001 1001 1001)
Binary requires 14 bits to go up to 9999 (binary: 10 0111 0000 1111)
Why ?
What is the relative efficiency of two encodings?
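A quick arithmetic check of the raw bit counts (an illustrative sketch, not from the original slides), before the Shannon answer below:

```python
max_value = 9999
bcd_bits = 4 * len(str(max_value))     # 4 bits per decimal digit = 16
binary_bits = max_value.bit_length()   # = 14
print(bcd_bits, binary_bits)           # 16 14
```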

Nyquist Limit

Maxwell's Demon
The demon opens the door only to let fast molecules pass from A to B.

End Result – Free Heat
Slow molecules in A, fast in B
Thus B hotter than A, and can be used to power a machine

Szilard's response
Szilard pointed out that to decide which molecules to let through, the demon must measure their speeds. He showed that these measurements (bouncing photons off the molecules) would use up more energy than was gained.

Topic Two: Shannon’s Theory
Shannon defines a bit as the amount of information required to decide between two equally probable outcomes
Example: a sequence of tosses of a fair coin can be encoded at 1 bit per toss, with heads as 1 and tails as 0.

Information as Surprise
According to Shannon the information content of a message is a function of how surprised we are by it.
The less probable a message, the more information it contains
H = -log2 p
where H is the information (also called entropy) and p is the probability of occurrence of the message. The mean information content of an ensemble of messages is obtained by weighting each message by its probability of occurrence:
H = -Σ p log2 p
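A small Python sketch (illustrative, not part of the original slides) evaluating the formula for two simple message ensembles:

```python
import math

def entropy(probabilities):
    """Mean information content H = -sum(p * log2 p), in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit per toss
print(entropy([0.9, 0.1]))   # biased coin: ~0.469 bits per toss
```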

This explains the inefficiency of BCD
Consider the frequency of occurrence of 1s and 0s in the different bit positions of a BCD digit.
Only the least significant bit decides between equiprobable outcomes
For the most significant bit, a 0 is four times more common than a 1 (it is set only for the digits 8 and 9)

Determining encoding efficiency
For the most significant bit of a BCD digit we have
p = 0.2 for a 1, p = 0.8 for a 0
Applying Shannon's formula, a 1 carries 2.322 bits of information, but weighted by its probability of 0.2 it contributes 0.464 bits
The weighted sum of the information for 0 and 1 is 0.722 bits

Mean info of a BCD digit
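This slide's figure can be reconstructed as a short Python sketch (my reconstruction, assuming each decimal digit 0..9 is equally likely): it applies Shannon's formula to each bit position of a BCD digit and compares the result with the log2 10 bits a decimal digit really carries.

```python
import math

def bit_entropy(p):
    """Entropy of a single bit that is 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

digits = range(10)                      # assume all decimal digits equally likely
for bit in range(3, -1, -1):
    p_one = sum((d >> bit) & 1 for d in digits) / 10
    print(f'bit {bit}: p(1) = {p_one:.1f}, entropy = {bit_entropy(p_one):.3f} bits')

print('log2(10) =', math.log2(10))      # ~3.32 bits actually carried by 4 BCD bits
```

The per-bit entropies (0.722, 0.971, 0.971 and 1.0 bits) sum to slightly more than log2 10 because the four bits are not independent; the true mean information of a BCD digit is log2 10 ≈ 3.32 bits out of the 4 bits used, an efficiency of about 83%.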

Topic Three: Algorithmic Information
Chaitin defines the information content of a number (or sequence of binary digits) to be the length of the shortest computer program capable of generating that sequence.
He uses Turing machines as canonical computers. Thus the information content of a sequence is the length of the shortest Turing machine tape that causes the machine to halt with the sequence on its output tape.

Randomness and π
We know from Shannon that 1 million tosses of a fair coin generate 1 million bits of information.
On the other hand, from Chaitin we know that π to 1 million bit precision contains much less than 1 million bits, since a program to compute π can be encoded in far fewer than 1 million bits.
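As an illustration (a sketch assuming the third-party mpmath library is installed), a program of a few dozen characters generates an arbitrarily long prefix of π:

```python
# The program text is tiny compared with the digit string it produces,
# so the digits of pi are highly compressible.
from mpmath import mp

mp.dps = 1000    # ask for 1000 decimal digits of precision
print(mp.pi)     # prints far more digits than there are characters of program
```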

Randomness
Andrei Kolmogorov defined a random number as one for which no formula shorter than the number itself exists.
By Chaitin's definition of information a random number is thus incompressible.
A random sequence thus contains the maximum information.

Randomness is the goal of compression
A fully compressed data sequence is indistinguishable from a random sequence of 0s and 1s.
This follows directly from Kolmogorov's and Chaitin's results.
From Shannon we also have the result that for each bit of the stream to have maximal information it must mimic the tossing of a fair coin: be unpredictable, random.

Information is not meaning
One million digits of π are more meaningful than one million random bits.
But they contain less information.
Meaningful sequences generally contain redundancy.

Information is not complexity
One million digits of π contain less information than a million random bits, but they are more complex.
The complexity of a sequence is measured by the number of machine cycles a computer would have to go through to generate it (Bennett's notion of logical depth).

Topic Four: Limits to compression
We cannot, even in principle, come up with an algorithm that performs the maximum compression on an arbitrary bit sequence.
3/7 is a rule of arithmetic, an algorithm that generates the sequence 0.42857142857...
So this sequence is presumably less random than 0.32857142877 (2 digits changed)
But we can never be sure.
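A minimal sketch of the point (illustrative only): the "rule of arithmetic" for 3/7 is a program far shorter than the digit strings it can produce.

```python
def decimal_digits(numerator, denominator, count):
    """Generate `count` decimal digits of numerator/denominator by long division."""
    digits = []
    remainder = numerator % denominator
    for _ in range(count):
        remainder *= 10
        digits.append(str(remainder // denominator))
        remainder %= denominator
    return '0.' + ''.join(digits)

print(decimal_digits(3, 7, 11))   # 0.42857142857 -- and it could go on forever
```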

Consequence of Goedel's theorem
Goedel showed that a consistent set of arithmetic axioms cannot be complete: there will be true statements that cannot be proven from them.
If there existed a general procedure to derive the minimal Turing machine program for any sequence, then we would have a procedure to derive any true proposition from a smaller set of axioms, contra Goedel.

Image sizes
 QCIF: 160 x 120 (used in video phones)
 MPEG: 352 x 288
 VGA: 640 x 480
 NTSC: 720 x 486 ( US Television)
 Workstation: 1280 x 1024
 HDTV: 1920 x 1080
 35mm slide: 3072 x 2048

Storage capacities
Floppy disk
 Floppy disk capacity = 1.44 MB
 A single 1280x1024x24 image = 3.9 MB
 A single 640x480x24 image = 922K
 Floppy disk holds only one 640x480x24 image.
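A short sketch reproducing these figures (assuming 24 bits = 3 bytes per pixel and a 1.44 MB floppy):

```python
def raw_bytes(width, height, bits_per_pixel):
    """Uncompressed image size in bytes."""
    return width * height * bits_per_pixel // 8

floppy = 1.44 * 1024 * 1024     # bytes
for w, h in [(1280, 1024), (640, 480)]:
    size = raw_bytes(w, h, 24)
    print(f'{w}x{h}x24: {size / 1e6:.1f} MB, fits on floppy {floppy // size:.0f} time(s)')
```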

Video on CD rom
 CD-ROM capacity = 600 MB
 A 160x120x16 image @30 fps (QCIF)= 1.15 MB/sec
A CD-ROM thus holds 8.7 minutes of video
(still not enough), and only at about a quarter of the image size of normal TV
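The arithmetic behind the 8.7 minutes, as a sketch (assuming uncompressed 160x120x16 frames and decimal megabytes):

```python
frame_bytes = 160 * 120 * 16 // 8   # one frame = 38,400 bytes
rate = frame_bytes * 30             # 30 frames/sec = 1,152,000 bytes/sec (~1.15 MB/s)
cd_rom = 600e6                      # 600 MB
print(cd_rom / rate / 60)           # ~8.7 minutes of uncompressed video
```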

Telecoms links
Ethernet = 10 Mbit/sec
ISDN = 128 Kbit/sec
ADSL = max 2 Mbit/sec
Compute max frame rates for QCIF for each of these channels
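One way to do the requested calculation (a sketch, assuming uncompressed 160x120x16 frames as above):

```python
frame_bits = 160 * 120 * 16                                  # 307,200 bits per raw frame
channels = {'Ethernet': 10e6, 'ISDN': 128e3, 'ADSL': 2e6}    # channel capacity, bits/sec
for name, bits_per_sec in channels.items():
    print(f'{name}: {bits_per_sec / frame_bits:.2f} frames/sec')
# roughly 32.6, 0.42 and 6.5 frames/sec -- hence the need for compression
```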

Summary
Information = randomness = disorder = entropy.
Meaningful data is generally compressible.
Information measured by length of program to generate it.
We will look at two techniques to apply these ideas to images: Fractal compression, and Vector Quantization.

Where to get more information
‘The User Illusion’ by Tor Norretranders, Penguin 1998.
‘Information, Randomness and Incompleteness’ by Chaitin.
‘Information is Physical’ by Landauer, Physics Today, May 1991.
‘Logical Depth and Physical Complexity’ by Bennett, in The Universal Turing Machine: A Half-Century Survey, Oxford 1988.

Feedback
What did you not understand here?
What would you like more information on?