From Hartley to Shannon

Shannon asked: “How much information can a channel carry?”

He wanted a function that measures “how surprised will I be by the outcome?” A fair coin leaves you more surprised than a coin that always lands heads. He wrote down a few reasonable properties any such function should have and derived the only formula that satisfies them.

Start with the simplest case: s equally likely outcomes, each with probability 1/s.

Call the uncertainty A(s). We just need to know: what properties must A(s) have?

Property 1: Monotonicity. More choices = more uncertainty. So

A(s_1) < A(s_2) \quad \text{if} \quad s_1 < s_2

Property 2: Consistency. 

Imagine choosing 1 letter from an alphabet of s^m symbols. That should be equivalent to choosing m letters one at a time from an alphabet of s symbols. So:

A(s^m) = m \cdot A(s)
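A quick sanity check on that equivalence, counting outcomes both ways (s and m below are arbitrary example values, not anything fixed by the derivation):

```python
from itertools import product

# Choosing 1 symbol from an alphabet of s^m symbols offers exactly as many
# equally likely outcomes as choosing m symbols in a row from an alphabet
# of s symbols. s and m are arbitrary example values.
s, m = 3, 4
alphabet = range(s)
sequences = list(product(alphabet, repeat=m))  # all length-m strings over s symbols
assert len(sequences) == s ** m
print(f"{s}^{m} = {s ** m} outcomes either way")
```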

This single equation forces A(s) to be a logarithm. Here’s why:

If A(s) = k \log s, let’s try it:

A(s^m) = k \log(s^m) = k m \log s = m \cdot k \log s = m \cdot A(s)

It works! And it’s the only monotonic function that works. So entropy must be logarithmic.
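A numeric check that A(s) = k \log s really satisfies both properties (k = 1 and base-2 logs are arbitrary choices here; any positive constant and base work):

```python
import math

k = 1.0  # arbitrary positive constant; base-2 logs, so units are bits

def A(s):
    return k * math.log2(s)

# Consistency: A(s^m) = m * A(s)
for s in (2, 3, 10):
    for m in (1, 2, 5):
        assert math.isclose(A(s ** m), m * A(s))

# Monotonicity: more equally likely choices, more uncertainty
assert A(4) < A(8) < A(16)
print("A(s) = k log s passes both properties")
```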

Next: 

The goal is to show that A(t) = k \log t is forced for every integer t ≥ 2, not just numbers like 4, 8, 27 that happen to be perfect powers of some base.

The trick: for any integers s, t ≥ 2 and any n, you can always find an integer m such that

s^m \leq t^n < s^{m+1}

Taking logs:

m \log s \leq n \log t < (m+1) \log s

Dividing by n \log s:

\frac{m}{n} \leq \frac{\log t}{\log s} < \frac{m}{n} + \frac{1}{n}

As n \to \infty, the gap \frac{1}{n} \to 0, so \frac{\log t}{\log s} is pinned down exactly. The same squeeze applies to A: by monotonicity and A(s^m) = m A(s), the inequality s^m \leq t^n < s^{m+1} gives m A(s) \leq n A(t) < (m+1) A(s), so \frac{A(t)}{A(s)} is trapped in the same shrinking interval as \frac{\log t}{\log s}. Hence \frac{A(t)}{A(s)} = \frac{\log t}{\log s}, i.e. A(t) = k \log t with k = \frac{A(s)}{\log s}. A(t) has no wiggle room: it’s the unique solution.
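Here is the squeeze in action for one example pair (s = 2, t = 3 are arbitrary choices; any integers ≥ 2 behave the same way):

```python
import math

s, t = 2, 3  # arbitrary example values
target = math.log(t) / math.log(s)

for n in (1, 10, 100, 1000):
    m = math.floor(n * target)          # the m with s^m <= t^n < s^(m+1)
    assert s ** m <= t ** n < s ** (m + 1)
    lo, hi = m / n, (m + 1) / n
    assert lo <= target < hi
    print(f"n={n:4d}: {lo:.6f} <= log t / log s < {hi:.6f}")
```

The printed interval shrinks like 1/n, trapping log t / log s (≈ 1.585 for this pair) with no room to spare.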

Now the harder case: what if outcomes have different probabilities p_i?

We use a decision tree argument. Here’s the key idea:

Imagine n equally likely outcomes split into k groups of sizes n_1, n_2, \dots, n_k, where

n_1 + n_2 + \cdots + n_k = n

Two ways to count the uncertainty:

Way 1 — All at once:
uncertainty = k \log n (uniform over n outcomes)

Way 2 — In two stages:

• First, pick which group you’re in: uncertainty = H(p_1, \ldots, p_k)

• Then, pick within the group: uncertainty = k \log n_i for each group i, weighted by p_i

Setting them equal:

k \log n = H(p_1, \ldots, p_k) + k \sum p_i \log n_i

Rearranging:

H(p_1, \ldots, p_k) = k \log n - k \sum p_i \log n_i

Since

p_i = \frac{n_i}{n}

we substitute

\log n_i = \log(p_i n) = \log p_i + \log n

which gives, after the k \log n terms cancel (because \sum p_i = 1),

H(p_1, \ldots, p_k) = -k \sum p_i \log p_i

That’s Shannon entropy. The k is just a constant that depends on your choice of log base (base 2 gives bits, base e gives nats).
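Putting it together: a minimal entropy function (k = 1, base-2 logs, so bits), plus a numeric check of the two-stage counting identity above, using arbitrary example group sizes:

```python
import math

def entropy(ps):
    """Shannon entropy in bits (k = 1, base-2 logs)."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

# Two-stage identity: log n = H(p_1, ..., p_k) + sum_i p_i * log n_i,
# where p_i = n_i / n. Group sizes below are an arbitrary example.
groups = [1, 3, 4]                 # n_1, n_2, n_3
n = sum(groups)                    # 8 equally likely outcomes in total
ps = [ni / n for ni in groups]

lhs = math.log2(n)
rhs = entropy(ps) + sum(p * math.log2(ni) for p, ni in zip(ps, groups))
assert math.isclose(lhs, rhs)
print(f"H{tuple(ps)} = {entropy(ps):.4f} bits; identity holds")
```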

