Journey's End

May 24
2025

Thoughts on entropy, compression and predictors

I will preface this post by saying that none of the ideas here are new; it is all well-covered ground by now. As with much of this blog, I am mostly writing to clarify ideas and to leave a reference for future-me.

Entropy and Prediction

Entropy, as considered in information theory, can be thought of as a measure of how uncertain we are about the value each symbol will take. A stream of perfectly random bits has maximal entropy because each symbol is equally likely to be a 1 or a 0. If, however, we impose rules on such a binary stream, e.g. a run of three 1s must be followed by a 0, then the symbol after such a run is fully determined, and so the entropy of the stream is reduced. This reduction in entropy can be considered information.
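
As a rough illustration of the two cases above, here is a minimal sketch that computes the Shannon entropy of a single symbol from its probability distribution. The function name `entropy_bits` is my own choice for this example, and the two distributions are simply the "unconstrained bit" and "bit that must be 0" cases.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Outcomes with zero probability contribute nothing, so skip them.
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

# An unconstrained random bit: 0 and 1 are equally likely.
print(entropy_bits([0.5, 0.5]))  # 1.0

# The bit that follows three 1s under the rule: it must be 0.
print(entropy_bits([1.0]))  # 0.0
```

The second case carries no uncertainty at all, which is why the constrained stream has lower entropy overall.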

Viewed through this lens, entropy and our ability to predict the next symbol, based on what has come before, are related concepts. If we can perfectly predict the next symbol every time, then the entropy of the data stream is the entropy of the predictor, i.e. how many bits we need to make the predictions.

Compression and Predictors

To compress a data stream is to reduce the number of bits required to represent it, towards the limit dictated by the entropy of the data stream. This can be done by looking for frequently occurring symbols and giving those symbols shorter codes, as is typically done with Huffman codes. A further refinement of this technique is to use a predictor to predict the next symbol and encode the difference between the actual symbol that occurs and the predicted symbol.
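
To make the "more efficient encoding" step concrete, here is a small sketch of how Huffman code lengths could be derived from symbol frequencies. The function name `huffman_code_lengths` and the sample string are illustrative assumptions; the sketch only computes the length of each code, not the bit patterns themselves.

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): 1}
    # Heap entries are (weight, tie_breaker, {symbol: depth}); the unique
    # tie_breaker ensures the dictionaries are never compared.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

lengths = huffman_code_lengths("aaaabbc")
print(lengths["a"], lengths["b"], lengths["c"])  # 1 2 2
```

With these lengths the seven symbols take 10 bits, versus 14 bits for a fixed 2-bit code per symbol.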

To see why this is an improvement, consider a perfect predictor: the only symbol we would need to encode is 0, which makes for very high compression ratios. In practice this rarely happens, but predictors do generally reduce the number of possible symbols. Consider the sequence [10, 9, 8, 5, 3, 1]. This has 6 unique symbols we need to compress. However, if we use a simple predictor that says "the next symbol is the same as the current symbol", then the differences are [10, -1, -1, -3, -2, -2], where we assume the first prediction is 0. This sequence of differences from the prediction uses only 4 unique symbols and is easier to compress.
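
The example above can be sketched in a few lines. The helper names `delta_encode` and `delta_decode` are my own; the decoder is included only to show that the receiver can recover the original stream by running the same predictor.

```python
def delta_encode(symbols):
    """Encode each symbol as its difference from the previous one."""
    previous = 0  # the first prediction is assumed to be 0
    residuals = []
    for symbol in symbols:
        residuals.append(symbol - previous)
        previous = symbol
    return residuals

def delta_decode(residuals):
    """Invert delta_encode by adding each residual to the running value."""
    previous = 0
    symbols = []
    for residual in residuals:
        previous += residual
        symbols.append(previous)
    return symbols

data = [10, 9, 8, 5, 3, 1]
residuals = delta_encode(data)
print(residuals)                          # [10, -1, -1, -3, -2, -2]
print(len(set(data)), len(set(residuals)))  # 6 4
print(delta_decode(residuals) == data)    # True
```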

The use of predictors allows us to exploit correlation between symbols to achieve better compression than simply using more efficient coding. Predictors can be applied repeatedly, with a different predictor each time, to potentially exploit higher levels of correlation. The use of predictors can be said to "decorrelate" the data, because the next symbol in the difference-from-prediction data stream is harder to predict than before. E.g. if we were to apply the same predictor as before to [10, -1, -1, -3, -2, -2] we get [10, -11, 0, -2, 1, 0], and now we have 5 unique symbols instead of the 4 we started with, i.e. use of the predictor was detrimental. Whereas the original data had elements that correlated with each other by a difference of 1 or 2, the difference-from-prediction stream does not have this property. This also illustrates that the choice of predictor matters, as a bad predictor can make things worse by trying to force correlations where none exist.
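
A short sketch of applying the same "next equals current" predictor twice to the example stream, counting unique symbols after each pass. The function name `residuals_from_previous` is a placeholder of mine, and counting unique symbols is only a crude stand-in for how compressible each stream is.

```python
def residuals_from_previous(symbols):
    """Differences from a predictor that guesses the previous symbol."""
    previous = 0  # the first prediction is assumed to be 0
    residuals = []
    for symbol in symbols:
        residuals.append(symbol - previous)
        previous = symbol
    return residuals

passes = [[10, 9, 8, 5, 3, 1]]
for _ in range(2):
    # Feed the output of each pass back into the same predictor.
    passes.append(residuals_from_previous(passes[-1]))

for number, values in enumerate(passes):
    print(number, values, len(set(values)))
# 0 [10, 9, 8, 5, 3, 1] 6
# 1 [10, -1, -1, -3, -2, -2] 4
# 2 [10, -11, 0, -2, 1, 0] 5
```

The first pass helps and the second hurts, which matches the point that a predictor only pays off when the correlation it assumes is actually present.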

Predictors can be specified ahead of time, or "trained" on the data and transmitted along with the compressed data so the receiver can decompress it.

ts=13:35 tags=[computer,science]

Jul 21
2022

Using SubShapeBinders and Assembly4

Assembly4 and SubShapeBinders make it possible to assemble Parts, then use their positions in the assembly to generate new Parts. However, because the new Parts depend on Parts in the assembly, if you try to add them to the assembly FreeCAD 0.19 will complain about cyclic redundancy, which makes sense …

Jul 21
2022

Kopia: Handling Directories That No Longer Exist

Scenario

  1. There exists a policy that snapshots directory ABC 
  2. Directory ABC now no longer exists
  3. You want to keep existing snapshots
  4. You want to stop kopia from attempting to take any more snapshots

In order to achieve (4):

  1. Set the policy to manual only
  2. Disable inheritance from parent/global policy …

Mar 05
2022

Adventures with Astroberry/KStars/Ekos

In astrophotography the combination of a goto mount, a camera and plate-solving is a powerful one. It allows you to do all kinds of neat things, like polar alignment without having a clear view of the south, extremely accurate goto functionality, and automated capture of multiple predefined targets.

There are roughly …

Jan 02
2022

Goodbye SpiderOak

I have been using SpiderOak One since Edward Snowden recommended it in 2013, almost a decade ago. I have over 1.5 TB of deduplicated data spread across 6 or so devices. This year, in 2022, I will not be renewing my subscriptions.

The main issues:

  1. Lack of updates: the last …
ts=08:57 tags=[computer,software]

Nov 24
2021

Quick Survey of Some Lightweight Linux Distributions

On an ASUS E203M

BunsenLabs Lithium

Installed fine, but the kernel (4.9?) would immediately shut down after decrypting the root volume due to incorrect thermal readings. I was not able to disable this via the thermal.nocrt=1 boot parameter.

AntiX 21

Live system did not have working mouse. Did not proceed to …

ts=05:35 tags=[computer,linux]

Jun 24
2019

T102HAAS.303 Considered Harmful For Linux

T102HAAS.303 Breaks Suspend

After upgrading my Asus Transformer Mini's BIOS to T102HAAS.303, suspend was completely broken. Closing the "lid" would cause some kind of suspend that cannot be disabled in software and from which Linux 4.15.0 cannot successfully resume (actually it does resume but the …

ts=02:20 tags=[computer,linux]

Jun 24
2019

Ubuntu 18.04 on Asus Transformer Mini (T102H)

Installation

Create a USB drive with a single FAT32 partition then extract the installation ISO into it using 7z:

7z x /path/to/ISO -o/path/to/usb

Note that there is no space between -o and the following path.

Plug in the USB drive then turn on the …

ts=02:20 tags=[computer,linux]

Aug 05
2018

IPython Notebook on GPS Timing and CDMA

In [1]:
%matplotlib inline
import matplotlib
import numpy as np
import matplotlib.pyplot as plt

GPS Timing

Carrier-phase detection is supposed to yield better timing information than tracking the pseudorandom code stream. The reason for this is supposedly that the higher frequency carrier allows for more accurate measurements of the …

Mar 23
2016

Notes on Migrating a Linux Install

Some issues I ran into when trying to move a Linux install from a 500 GB drive to a smaller 120 GB SSD:

  • When duplicating the filesystem, make sure that /proc exists, otherwise when you boot the new drive the kernel will complain that /proc is missing and it can't …

Mar 23
2016

Thinkpad Mute Buttons and Xubuntu

I recently acquired a used Lenovo X220 for use as a Linux laptop, and needed the following in my openbox configuration XML to make the hardware speaker and mic mute buttons work:

  <!-- Modified for X220 -->
  <keybind key="XF86AudioMute">
    <action name="Execute">
      <command>pactl set-sink-mute 0 toggle</command>
    </action>
  </keybind>
  <keybind …

Oct 04
2015

Nov 06
2013

Aug 19
2012

Jul 21
2012

Feb 12
2012

PSA: Replacing the SuperDrive With A Hard Disk

Replacing the SuperDrive with a hard disk seems to be a Bad Idea at least in my 13" mid-2010 MBP: my 320GB WD Scorpio Blue took over a minute to copy 60MB. That is so 80s.

Interestingly, when I put the hard drive back to where it belongs, and put …

ts=13:41 tags=[computer,PSA]

Mar 26
2011

Broken OS X Installers and How to Fix Them

The installers Ralink provides for some of their chipsets, like the RT2770, do something careless: they attempt to unload a kext in a pre-install script, and when that fails the script fails and the entire install fails.

This means on a new machine, or one that never had the kext …

Feb 25
2011

Endianness, Thou Art Sneaky

I took delivery of my Linksys WUSB600N V2 today and was very excited to get my G4 Mac Mini online. Many online sources suggest that all I needed to do was change a few values in an Info.plist. However, as it turns out, that wasn't enough. In order for …

ts=05:12 tags=[osx,computer]

Jan 27
2011

iPhone 4 layer export script for Gimp

I often mock up iPhone interfaces in GIMP, then export each layer for use as a background. For iOS 4 each image needs to be exported twice: once at full resolution with @2x in its filename, and once at half resolution without the @2x, e.g. nav_bg@2x.png and nav_bg.png …

Jun 15
2010
