Journey's End

May 24
2025

Thoughts on entropy, compression and predictors

I will preface this post by saying that none of the ideas here are new and it's all well-covered ground by now. Like much of this blog, I am mostly writing to clarify ideas and to serve as a reference for future-me.

Entropy and Prediction

Entropy, as considered in information theory, can be thought of as a measure of how many values each symbol can effectively take on. A stream of perfectly random bits has maximal entropy because each symbol is equally likely to be a 1 or a 0. If, however, we impose rules on such a binary stream, e.g. after a run of three 1s the next symbol must be a 0, then that forced symbol is entirely predictable, and so the entropy of the stream is reduced. This change in entropy can be considered information.

Viewed through this lens, entropy and our ability to predict the next symbol, based on what has come before, are related concepts. If we can perfectly predict the next symbol every time, then the entropy of the data stream is the entropy of the predictor, i.e. how many bits we need to make the predictions.
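As a quick illustration (not from the original post), the entropy of a stream can be estimated from symbol frequencies. Note that this frequency-based estimate ignores ordering, i.e. exactly the correlations that a predictor would exploit:

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Empirical Shannon entropy in bits per symbol: H = sum p * log2(1/p)."""
    counts = Counter(symbols)
    n = len(symbols)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

# Equally likely 0s and 1s cost 1 bit per symbol; a constant stream costs none.
print(entropy_bits("0110100101"))  # 1.0
print(entropy_bits("0000000000"))  # 0.0 -- one symbol, nothing to encode
```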

Compression and Predictors

To compress a data stream is to reduce the number of bits required to represent it, towards the limit dictated by the entropy of the data stream. This can be done by looking for repeated symbols and using a more efficient encoding for those symbols, as is typically done with Huffman codes. A further refinement of this technique is to use a predictor to predict the next symbol and encode the difference between the symbol that actually occurs and the predicted symbol.
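A minimal sketch of the first idea, assigning shorter codes to more frequent symbols. To keep it short, this only computes Huffman code lengths (the depth of each symbol in the tree), not the actual bit patterns:

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Code length per symbol from a Huffman tree; frequent symbols get shorter codes."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap of (weight, tiebreak id, {symbol: depth}); merging two nodes
    # pushes every symbol under them one level deeper.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

lengths = huffman_code_lengths("aaaabbc")
print(lengths["a"], lengths["b"], lengths["c"])  # 1 2 2
```

The frequent symbol 'a' gets a 1-bit code while the rarer 'b' and 'c' each get 2 bits, so the whole string costs 4*1 + 2*2 + 1*2 = 10 bits instead of 14 with a flat 2-bit code.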

To see why this is an improvement, consider a perfect predictor - the only symbol we would need to encode is 0, which makes for very high compression ratios. In practice this rarely happens, but predictors do generally reduce the number of possible symbols. Consider the sequence [10, 9, 8, 5, 3, 1]. This has 6 unique symbols we need to compress. However, if we use a simple predictor that says "the next symbol is the same as the current symbol", then the differences are [10, -1, -1, -3, -2, -2], where we assume the first prediction is 0. This sequence of differences from prediction uses only 4 unique symbols and is easier to compress.
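The "next symbol is the same as the current symbol" predictor above is just delta encoding. A small sketch, with the matching decoder to show the transform is lossless:

```python
def delta_encode(seq):
    """Residuals against a 'next = current' predictor; the first prediction is 0."""
    prev, out = 0, []
    for x in seq:
        out.append(x - prev)
        prev = x
    return out

def delta_decode(residuals):
    """Invert delta_encode by re-running the predictor and adding each residual."""
    prev, out = 0, []
    for r in residuals:
        prev = prev + r
        out.append(prev)
    return out

data = [10, 9, 8, 5, 3, 1]
residuals = delta_encode(data)
print(residuals)                             # [10, -1, -1, -3, -2, -2]
print(len(set(data)), len(set(residuals)))   # 6 4
assert delta_decode(residuals) == data
```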

The use of predictors allows us to exploit correlation between symbols to achieve better compression than more efficient coding alone. Predictors can be applied repeatedly, with a different predictor each time, to potentially exploit multiple levels of correlation. The use of predictors can be said to "decorrelate" the data, because the next symbol in the difference-from-prediction stream is harder to predict than before. E.g. if we apply the same predictor as before to [10, -1, -1, -3, -2, -2] we get [10, -11, 0, -2, 1, 0], and now we have 5 symbols instead of the 4 we started with, i.e. this use of the predictor was detrimental. Whereas the original data had elements that correlated to each other by a difference of 1 or 2, the difference-from-prediction stream does not have this property. This also illustrates that the choice of predictor matters: a bad predictor can make things worse by trying to force correlations where none exist.
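The repeated application above can be checked directly (a small self-contained sketch using the same one-step "next = current" predictor):

```python
def delta_encode(seq):
    """Residuals against a 'next = current' predictor; the first prediction is 0."""
    prev, out = 0, []
    for x in seq:
        out.append(x - prev)
        prev = x
    return out

data = [10, 9, 8, 5, 3, 1]
once = delta_encode(data)    # [10, -1, -1, -3, -2, -2] -> 4 unique symbols
twice = delta_encode(once)   # [10, -11, 0, -2, 1, 0]   -> 5 unique symbols
print(len(set(once)), len(set(twice)))  # 4 5
```

The first pass removes the smooth downward trend; a second pass finds no remaining structure to exploit and only spreads the symbols out.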

Predictors can be specified ahead of time, or "trained" on the data and transmitted along with the compressed data so the receiver can decompress it.

ts=13:35 tags=[computer,science]

May 17
2012

DFT and the Window Spectrum

In case I forget again why the window spectrum is computed when doing a DFT, the answer is that discretisation leads to aliasing artefacts.

Consider the DFT of a signal x(t)=1. Its DFT is calculated by computing the sums sin(2 pi f t)*x(t) and cos(2 pi …

ts=12:11 tags=[badmaths,science]

Mar 11
2010

My First Galaxies (Kinda)

I finally managed to find some galaxies last night. This has been something that has eluded me due to my location, inexperience, and frankly inadequate equipment. (You in the back! Stop snickering!)

Last night, with my Canon 1000D I caught the barest glimpses of M83 and NGC 5128, aka the …

Sep 21
2009

Jun 20
2008

Mar 02
2008

Nov 13
2007

A Rant about Cherries and Science.

There is a problem with science today. Actually, no. There is a problem with the people of today which makes them easily exploited by detractors of science.

Science generates a lot of data. Most of that data is easily accessible. It is also easily searchable. The consequence is that regardless …

ts=05:15 tags=[rant,science]