From Hartley to Shannon
Shannon asked: “How much information can a channel carry?”
He wanted a function that measures “how surprised will I be by the outcome?” For a fair coin, you’re more surprised than a coin that always lands heads. He wrote down some reasonable properties any such function should have and derived the only formula that satisfies them.
Start with the simplest case: $n$ equally likely outcomes, each with probability $\frac{1}{n}$.
Call the uncertainty $A(n)$. We just need to know: what properties must $A$ have?
Property 1: Monotonicity. More choices = more uncertainty. So $A(n)$ is strictly increasing in $n$.
Property 2: Consistency.
Imagine choosing 1 letter from an alphabet of $s^m$ symbols. That should be equivalent to choosing $m$ letters, one at a time, from an alphabet of $s$ symbols. So:

$$A(s^m) = m\,A(s)$$
This single equation forces $A$ to be a logarithm. Here’s why:
If $A(s) = k \log s$, let’s try it:

$$A(s^m) = k \log s^m = m k \log s = m\,A(s)$$
It works! And it’s the only monotonic function that works. So entropy must be logarithmic.
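The check above is easy to run numerically. Here's a minimal Python sketch (not from the post; the function name `A` and the constant $k = 1$ are my assumptions):

```python
import math

def A(s, k=1.0):
    # Candidate uncertainty function: A(s) = k * log(s), with k = 1 assumed.
    return k * math.log(s)

# Verify the consistency property A(s^m) == m * A(s) for a few values.
for s in (2, 3, 10):
    for m in (1, 2, 5):
        assert math.isclose(A(s ** m), m * A(s)), (s, m)

print("A(s^m) == m * A(s) holds for all tested s, m")
```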
Next: $A(t)$ for any integer $t$, not just perfect powers of $s$.
The trick: for any $t$ and any large $n$, you can always find an integer $m$ such that

$$s^m \le t^n < s^{m+1}$$

Taking logs and dividing by $n \log s$:

$$\frac{m}{n} \le \frac{\log t}{\log s} < \frac{m}{n} + \frac{1}{n}$$

By monotonicity, $A(s^m) \le A(t^n) \le A(s^{m+1})$; by consistency, that means $m\,A(s) \le n\,A(t) \le (m+1)\,A(s)$. Dividing by $n\,A(s)$:

$$\frac{m}{n} \le \frac{A(t)}{A(s)} \le \frac{m}{n} + \frac{1}{n}$$

As $n \to \infty$, the gap $\frac{1}{n} \to 0$, so both ratios are squeezed to the same value: $\frac{A(t)}{A(s)} = \frac{\log t}{\log s}$. This means $A$ has no wiggle room. $A(t) = k \log t$ is the unique solution.
Now the harder case: what if outcomes have different probabilities $p_1, \dots, p_r$?
We use a decision tree argument. Here’s the key idea:
Imagine $N$ equally likely outcomes split into $r$ groups of sizes $n_1, \dots, n_r$, where $n_1 + n_2 + \cdots + n_r = N$.
Two ways to count the uncertainty:
Way 1 — All at once:
$k \log N$ uncertainty (uniform over $N$ outcomes)
Way 2 — In two stages:
• First, pick which group you’re in: uncertainty $H(p_1, \dots, p_r)$
• Then, pick within the group: $k \log n_i$ for each group $i$, weighted by its probability $p_i$
Setting them equal:

$$k \log N = H(p_1, \dots, p_r) + \sum_{i=1}^{r} p_i \, k \log n_i$$

Rearranging (and using $\sum_i p_i = 1$ to write $\log N$ as $\sum_i p_i \log N$):

$$H(p_1, \dots, p_r) = -k \sum_{i=1}^{r} p_i \log \frac{n_i}{N}$$

Since

$$p_i = \frac{n_i}{N}$$

we substitute, which gives

$$H = -k \sum_{i=1}^{r} p_i \log p_i$$
That’s Shannon entropy. The $k$ is just a constant that depends on your choice of log base (base 2 gives bits, base $e$ gives nats).
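As a final sanity check (my own sketch, not the author's code): a minimal `entropy` function with $k = 1$ (so units are nats), plus a numeric verification of the two-stage decision-tree identity above, with example group sizes:

```python
import math

def entropy(ps):
    # Shannon entropy with k = 1 (natural log -> nats); skips zero-probability terms.
    return -sum(p * math.log(p) for p in ps if p > 0)

sizes = [2, 3, 5]              # example group sizes n_i (assumed values)
N = sum(sizes)                 # N = 10 equally likely outcomes
ps = [n / N for n in sizes]    # group probabilities p_i = n_i / N

# Decision-tree identity: log N = H(p_1..p_r) + sum_i p_i * log(n_i)
lhs = math.log(N)
rhs = entropy(ps) + sum(p * math.log(n) for p, n in zip(ps, sizes))
assert math.isclose(lhs, rhs)
print(f"H({ps}) = {entropy(ps):.4f} nats")
```

With base-2 logs instead, `entropy([0.5, 0.5])` would come out to exactly 1 bit, matching the fair-coin intuition from the start of the post.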