Journey's End

May 24
2025

Thoughts on entropy, compression and predictors

I will preface this post by saying that none of the ideas here are new and it's all well-covered ground by now. Like much of this blog, I am mostly writing to clarify ideas and to serve as a reference for the future-me.

Entropy and Prediction

Entropy, as considered in information theory, can be thought of as a measure of how many possible values each symbol can take on, weighted by how likely each value is. A stream of perfectly random bits has maximal entropy because each symbol is equally likely to be a 1 or a 0. If, however, we impose rules on such a binary stream, e.g. a run of three 1s must be followed by a 0, then after three 1s the next symbol is fully determined, and so the entropy of such a stream is reduced. This change in entropy can be considered information.
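As a rough illustration, here is a minimal sketch that estimates entropy from symbol frequencies alone. The function name `shannon_entropy` and the two sample streams are my own choices for the example. Note that this zeroth-order estimate ignores context: a stream that obeys a rule such as "three 1s are followed by a 0" has an even lower true entropy, because the rule makes some symbols fully predictable.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Empirical Shannon entropy, in bits per symbol, from symbol frequencies."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

fair = [0, 1] * 8            # 0s and 1s equally frequent
biased = [1, 1, 1, 0] * 4    # 1s three times as frequent as 0s

print(shannon_entropy(fair))    # 1.0 bit per symbol
print(shannon_entropy(biased))  # roughly 0.811 bits per symbol
```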

Viewed through this lens, entropy and our ability to predict the next symbol, based on what has come before, are related concepts. If we can perfectly predict the next symbol every time, then the entropy of the data stream is the entropy of the predictor, i.e. how many bits we need to make the predictions.

Compression and Predictors

To compress a data stream is to reduce the number of bits required to represent it, towards the limit dictated by the entropy of the data stream. This can be done by looking for frequently occurring symbols and giving those symbols shorter codes, as is typically done with Huffman codes. A further refinement of this technique is to use a predictor to predict the next symbol and encode the difference between the actual symbol that occurs and the predicted symbol.
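To make the Huffman idea concrete, the sketch below computes only the code lengths a Huffman code would assign, not the actual bit patterns. The helper name `huffman_code_lengths` and the sample string are illustrative assumptions, not taken from any particular codec.

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over symbols."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): 1}
    # Heap entries are (weight, tie_breaker, {symbol: depth in the tree}).
    # The unique tie_breaker means the dicts are never compared.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

data = "aaaaaaaabbbbccd"  # a x8, b x4, c x2, d x1
lengths = huffman_code_lengths(data)
total_bits = sum(lengths[s] for s in data)
print(lengths)
print(total_bits, "bits, versus", 2 * len(data), "bits at a fixed 2 bits per symbol")
```

The most frequent symbol ends up with a 1-bit code and the rarest with 3 bits, for 25 bits in total instead of 30.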

To see why this is an improvement, consider a perfect predictor - the only symbol we would need to encode is 0, which makes for very high compression ratios. In practice this rarely happens, but predictors do generally reduce the number of possible symbols. Consider the sequence [10, 9, 8, 5, 3, 1]. This has 6 unique symbols we need to compress. However, if we use a simple predictor that says "the next symbol is the same as the current symbol", then the differences from prediction are [10, -1, -1, -3, -2, -2], where we assume the first prediction is 0. This sequence of differences uses only 4 unique symbols and is easier to compress.
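The "next symbol is the same as the current symbol" predictor can be sketched in a few lines. The function names `residuals` and `reconstruct` are mine; the second function is there to show that the receiver can undo the transform as long as it runs the same predictor.

```python
def residuals(values):
    """Difference between each value and the prediction 'same as the previous value'."""
    predicted = 0  # the first prediction is assumed to be 0
    out = []
    for v in values:
        out.append(v - predicted)
        predicted = v
    return out

def reconstruct(res):
    """Invert residuals() by running the same predictor on the decoded values."""
    predicted = 0
    out = []
    for r in res:
        v = predicted + r
        out.append(v)
        predicted = v
    return out

data = [10, 9, 8, 5, 3, 1]
res = residuals(data)
print(res)                                # [10, -1, -1, -3, -2, -2]
print(len(set(data)), "->", len(set(res)))  # 6 -> 4 unique symbols
print(reconstruct(res) == data)           # True
```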

The use of predictors allows us to exploit correlation between symbols to achieve better compression than simply using more efficient coding. Predictors can be applied repeatedly, with a different predictor each time, to potentially exploit higher levels of correlation. The use of predictors can be said to "decorrelate" the data because the next symbol in the difference-from-prediction data stream is harder to predict than before. E.g. if we were to apply the same predictor as before to [10, -1, -1, -3, -2, -2] we get [10, -11, 0, -2, 1, 0], and now we have 5 unique symbols instead of the 4 we started with, i.e. use of the predictor was detrimental. Whereas the original data had elements that correlated to each other by a difference of 1 or 2, the difference-from-prediction data does not have this property. This also illustrates that the choice of predictor matters, as a bad predictor can make things worse by trying to force correlations where none exist.
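The effect of applying the same predictor a second time is easy to check. In this sketch the function name `previous_value_residuals` is my own label for the "same as the current symbol" predictor; unique-symbol counts are used as the crude measure of compressibility, as in the text.

```python
def previous_value_residuals(values):
    """Residuals against the prediction 'same as the previous value' (first prediction is 0)."""
    out, predicted = [], 0
    for v in values:
        out.append(v - predicted)
        predicted = v
    return out

data = [10, 9, 8, 5, 3, 1]
first = previous_value_residuals(data)    # one pass of the predictor
second = previous_value_residuals(first)  # the same predictor applied again

for name, seq in [("original", data), ("first pass", first), ("second pass", second)]:
    print(f"{name:12} {seq} unique symbols: {len(set(seq))}")
```

The counts go 6, 4, 5: the first pass helps and the second pass hurts.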

Predictors can be specified ahead of time, or "trained" on the data and transmitted along with the compressed data so the receiver can decompress it.
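One very simple form of "training" is to try a handful of candidate predictors on the data and transmit the index of the best one alongside the residuals. The sketch below assumes three hypothetical candidates and uses the sum of absolute residuals as a stand-in for compressed size; both are illustrative choices, not a description of any specific format.

```python
def predict_zero(history):
    return 0

def predict_previous(history):
    return history[-1] if history else 0

def predict_linear(history):
    # Extrapolate the straight line through the last two decoded values.
    if len(history) >= 2:
        return 2 * history[-1] - history[-2]
    return predict_previous(history)

PREDICTORS = [predict_zero, predict_previous, predict_linear]

def encode(values):
    """Pick the predictor with the smallest total |residual|; return (index, residuals)."""
    best = None
    for index, predict in enumerate(PREDICTORS):
        res = [v - predict(values[:i]) for i, v in enumerate(values)]
        cost = sum(abs(r) for r in res)
        if best is None or cost < best[0]:
            best = (cost, index, res)
    return best[1], best[2]

def decode(index, res):
    """Rebuild the values using the predictor the sender selected."""
    predict = PREDICTORS[index]
    out = []
    for r in res:
        out.append(predict(out) + r)
    return out

data = [10, 9, 8, 5, 3, 1]
index, res = encode(data)
print(index, res)                 # 2 [10, -1, 0, -2, 1, 0]
print(decode(index, res) == data)  # True
```

Here the only side information the receiver needs is the predictor index, since both ends share the candidate list.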

ts=13:35 tags=[computer,science]

May 24
2025

Adventures in Compressed DNG

I have been working with industrial cameras and frame-grabbers that simply give me a dump of their sensor data. To make that data useful it needs to undergo demosaicing, ideally using one of the modern demosaic algorithms like RCD rather than simple interpolation or the rather outdated VNG. Unfortunately the …

Apr 10
2025

Python One-liner for Local QR Code Decoding

Because KeepassXC lacks the ability to decode QR codes for TOTP setup and not all MFA implementations show the required secret in text format, I use the following snippet to read the QR code information locally:

  python -c 'from cv2 import QRCodeDetector; from PIL.ImageGrab import grabclipboard; import numpy as …
ts=03:07 tags=[python,software,oss]

Aug 11
2024

PIO and Interrupts

Introduction

I want to be able to raise system level interrupts from PIO programs and there aren’t too many examples of this, so here is my simple contribution.

PIO program

First, the PIO program, which simply waits for a pin to change and then immediately raises PIO IRQ 0 …

Aug 11
2024

PIO ASM and PlatformIO

Introduction

PIO

The RP2040 has two programmable IO (PIO) blocks, each capable of executing 4 programs in parallel. This gives the RP2040 great flexibility in mixing and matching different protocols. Want 3x CANBUS and 1x SPI? RP2040 got you. Want 5x I2C instead? That’s cool too. It is a very flexible …

ts=03:25

Jan 23
2024

Estimating a Supernova's Brightness as an Amateur

Introduction

2023rve is the designation given to a Type II supernova discovered on 2023-09-08 by Mohammad Odeh from UAE. It appears to be part of NGC 1097, a galaxy 45 million light years away. Due to its brightness, 2023rve was visible to budget amateur astronomers/astrophotographers like myself.

Below …

ts=12:11 tags=[astronomy]

Dec 23
2023

On NS*DateFormatters

There now exist both NSDateFormatter and NSISO8601DateFormatter, and their behaviour when a timezone is not set is not the same: NSDateFormatter produces strings formatted for local time while NSISO8601DateFormatter produces strings formatted for UTC.

As an example, given a date-time of 2000-12-31T23:00:00Z

NSDateFormatter with no timezone set and full …

ts=01:16 tags=[objective-c,code]

Oct 27
2023

DIY EQMod Direct USB Cable

I am in the process of resurrecting my Eq5 Pro so it is compatible with ASIAir Mini. The first step is to build an EQMod Direct USB cable. While it is possible to buy one pre-made, since I had all the parts I decided to build it myself.

Based on …

Sep 17
2023

Build Log: SCD41 CO2 Sensor

CO2 concentration is a proxy for how well a space, especially an enclosed one, is ventilated. To this end I built a quick-n-dirty CO2 sensor using an SCD41 breakout board from pimoroni and bits and pieces I had in the workshop.

Took the opportunity to practice making enclosures, especially for …

Mar 28
2023

Unexpected Latency in Tinyproxy

I love tinyproxy; it is simple to set up and more than enough for my needs. Recently, however, I noticed that some clients were experiencing unexpectedly long delays when doing something as simple as curl google.com. After some sleuthing, I tracked it down to the fact I had

LogLevel Info …
ts=20:58 tags=[software,oss]

Mar 28
2023

Conda and Slow Proxies

The default conda connect timeout is 9.15 seconds (no idea why this is, seems to have been the case from day one). If the proxy is slow, e.g. it is in another country or AWS region, this timeout can be triggered and conda will prematurely close the connection …

ts=19:39 tags=[software,oss]

Jan 10
2023

Verifying Local iOS Backups

Backups are a great idea, and encrypted backups are an even better idea. When backing up iOS devices on macOS, it is possible to set an encryption key to protect local files. There is however no straightforward GUI method to verify the encryption key.

Enter mvt:

conda create -n mvt …
ts=14:40 tags=[software,osx,ios,macos]

Dec 20
2022

Exif Tag Types

Went on a journey trying to figure out what valid exif tag types are, in both JPEG and TIFF files. In summary:

TYPE # Name Exif 2.32 Tiff 6.0 Adobe TN[1]
1 BYTE X X
2 ASCII X X
3 SHORT X X
4 LONG X X
5 …
ts=04:37 tags=[software]

Dec 20
2022

FSCK'ing Veracrypt Volumes on macOS

After an unexpected power loss or disconnection of the underlying storage, checking the filesystem inside Veracrypt volumes is desirable to prevent future data loss.

On macOS I tried using 

diskutil verifyDisk /dev/diskXXX

This didn't work for some reason, most likely because of some interaction with Veracrypt. Same results via Disk …

ts=04:33 tags=[software,macos]

Jul 30
2022

TigerVNC and Systemd

A quick systemd service definition file for tigervnc on Ubuntu 20.04

[Unit]
Description=tigervnc

[Service]
Type=forking
ExecStart=/usr/bin/vncserver -localhost no -depth 24 -geometry 1920x1080 -noreset
User=steve

[Install]
WantedBy=multi-user.target

Key components:

  1. Type=forking so systemd correctly identifies the real VNC process, not the perl …
ts=01:51 tags=[software,linux,oss]

Jul 21
2022

Using SubShapeBinders and Assembly4

Assembly4 and SubShapeBinders make it possible to assemble Parts, then use their positions in the assembly to generate new Parts. However, because the new Parts depend on Parts in the assembly, if you try to add them to the assembly FreeCAD 0.19 will complain about cyclic redundancy, which makes sense …

Jul 21
2022

Kopia: Handling Directories That No Longer Exist

Scenario

  1. There exists a policy that snapshots directory ABC 
  2. Directory ABC now no longer exists
  3. You want to keep existing snapshots
  4. You want to stop kopia from attempting to take any more snapshots

In order to achieve (4):

  1. Set the policy to manual only
  2. Disable inheritance from parent/global policy …