Journey's End

May 24
2025

Thoughts on entropy, compression and predictors

I will preface this post by saying that none of the ideas here are new and it's all well-covered ground by now. Like much of this blog, I am mostly writing to clarify ideas and to serve as a reference for future-me.

Entropy and Prediction

Entropy, as considered in information theory, can be thought of as a measure of how many values each symbol can effectively take on. A stream of perfectly random bits has maximal entropy because each symbol is equally likely to be a 1 or a 0. If, however, we impose rules on such a binary stream, e.g. after a run of three 1s the next symbol must be a 0, then that forced symbol is entirely predictable, and so the entropy of the stream is reduced. This change in entropy can be considered information.

Viewed through this lens, entropy and our ability to predict the next symbol, based on what has come before, are related concepts. If we can perfectly predict the next symbol every time, then the entropy of the data stream is the entropy of the predictor, i.e. how many bits we need to make the predictions.
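As a quick illustration (not from the original post), the entropy of a stream can be estimated from symbol frequencies. Note that this frequency-based estimate ignores ordering, i.e. exactly the correlations that a predictor would exploit:

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Empirical Shannon entropy in bits per symbol: H = sum p * log2(1/p)."""
    counts = Counter(symbols)
    n = len(symbols)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

# Equally likely 0s and 1s cost 1 bit per symbol; a constant stream costs none.
print(entropy_bits("0110100101"))  # 1.0
print(entropy_bits("0000000000"))  # 0.0 -- one symbol, nothing to encode
```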

Compression and Predictors

To compress a data stream is to reduce the number of bits required to represent it, towards the limit dictated by the entropy of the data stream. This can be done by looking for repeated symbols and using a more efficient encoding for those symbols, as is typically done with Huffman codes. A further refinement of this technique is to use a predictor to predict the next symbol and encode the difference between the symbol that actually occurs and the predicted symbol.
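A minimal sketch of the first idea, assigning shorter codes to more frequent symbols. To keep it short, this only computes Huffman code lengths (the depth of each symbol in the tree), not the actual bit patterns:

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Code length per symbol from a Huffman tree; frequent symbols get shorter codes."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap of (weight, tiebreak id, {symbol: depth}); merging two nodes
    # pushes every symbol under them one level deeper.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

lengths = huffman_code_lengths("aaaabbc")
print(lengths["a"], lengths["b"], lengths["c"])  # 1 2 2
```

The frequent symbol 'a' gets a 1-bit code while the rarer 'b' and 'c' each get 2 bits, so the whole string costs 4*1 + 2*2 + 1*2 = 10 bits instead of 14 with a flat 2-bit code.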

To see why this is an improvement, consider a perfect predictor - the only symbol we would need to encode is 0, which makes for very high compression ratios. In practice this rarely happens, but predictors do generally reduce the number of possible symbols. Consider the sequence [10, 9, 8, 5, 3, 1]. This has 6 unique symbols we need to compress. However, if we use a simple predictor that says "the next symbol is the same as the current symbol", then the differences are [10, -1, -1, -3, -2, -2], where we assume the first prediction is 0. This sequence of differences from prediction uses only 4 unique symbols and is easier to compress.
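The "next symbol is the same as the current symbol" predictor above is just delta encoding. A small sketch, with the matching decoder to show the transform is lossless:

```python
def delta_encode(seq):
    """Residuals against a 'next = current' predictor; the first prediction is 0."""
    prev, out = 0, []
    for x in seq:
        out.append(x - prev)
        prev = x
    return out

def delta_decode(residuals):
    """Invert delta_encode by re-running the predictor and adding each residual."""
    prev, out = 0, []
    for r in residuals:
        prev = prev + r
        out.append(prev)
    return out

data = [10, 9, 8, 5, 3, 1]
residuals = delta_encode(data)
print(residuals)                             # [10, -1, -1, -3, -2, -2]
print(len(set(data)), len(set(residuals)))   # 6 4
assert delta_decode(residuals) == data
```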

The use of predictors allows us to exploit correlation between symbols to achieve better compression than more efficient coding alone. Predictors can be applied repeatedly, with a different predictor each time, to potentially exploit multiple levels of correlation. The use of predictors can be said to "decorrelate" the data, because the next symbol in the difference-from-prediction stream is harder to predict than before. E.g. if we apply the same predictor as before to [10, -1, -1, -3, -2, -2] we get [10, -11, 0, -2, 1, 0], and now we have 5 symbols instead of the 4 we started with, i.e. this use of the predictor was detrimental. Whereas the original data had elements that correlated to each other by a difference of 1 or 2, the difference-from-prediction stream does not have this property. This also illustrates that the choice of predictor matters: a bad predictor can make things worse by trying to force correlations where none exist.
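The repeated application above can be checked directly (a small self-contained sketch using the same one-step "next = current" predictor):

```python
def delta_encode(seq):
    """Residuals against a 'next = current' predictor; the first prediction is 0."""
    prev, out = 0, []
    for x in seq:
        out.append(x - prev)
        prev = x
    return out

data = [10, 9, 8, 5, 3, 1]
once = delta_encode(data)    # [10, -1, -1, -3, -2, -2] -> 4 unique symbols
twice = delta_encode(once)   # [10, -11, 0, -2, 1, 0]   -> 5 unique symbols
print(len(set(once)), len(set(twice)))  # 4 5
```

The first pass removes the smooth downward trend; a second pass finds no remaining structure to exploit and only spreads the symbols out.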

Predictors can be specified ahead of time, or "trained" on the data and transmitted along with the compressed data so the receiver can decompress it.

ts=13:35 tags=[computer,science]

May 17
2012

DFT and the Window Spectrum

In case I forget again why the window spectrum is computed when doing a DFT, the answer is that discretisation leads to aliasing artefacts.

Consider the DFT of a signal x(t)=1. Its DFT is calculated by computing the sums sin(2 pi f t)*x(t) and cos(2 pi …

ts=12:11 tags=[badmaths,science]

Mar 11
2010

My First Galaxies (Kinda)

I finally managed to find some galaxies last night. This has been something that has eluded me due to my location, inexperience, and frankly inadequate equipment. (You in the back! Stop snickering!)

Last night, with my Canon 1000D I caught the barest glimpses of M83 and NGC 5128, aka the …

Sep 21
2009

Jun 20
2008

Mar 02
2008

Nov 13
2007

A Rant about Cherries and Science.

There is a problem with science today. Actually, no. There is a problem with the people of today which makes them easily exploited by detractors of science.

Science generates a lot of data. Most of that data is easily accessible. It is also easily searchable. The consequence is that regardless …

ts=05:15 tags=[rant,science]