Etymology, naming things, and off-by-one errors
Have you ever heard this programming joke?
- Why can't programmers tell Halloween and Christmas apart?
- It's because they think OCT31 is the same as DEC25
Explaining the joke
Always a good sign when you have to explain the joke, isn't it?
First off, in case that wasn't obvious: Halloween and Christmas are two holidays that take place on October 31st and December 25th respectively.
Next up, we have to talk about numbers. What are they and how do we write them down?
A number is a quantity: how many fingers on a typical human hand, how many days in the month of January, how
many times the word "pony" appears in the Django source code. To represent those numbers, most places in the
world typically use a base ten system (also called decimal) where a number is written down using a sequence of
one or more characters and where those characters are selected from 0123456789 (we call those
"digits"). With this system, our numbers look like this: 10, 31, 235 (that's right, I counted them 🐴).
The base 10 system is quite convenient for us humans and it's no big surprise. It's very likely that this system came to be because of the number of fingers most of us have (do you know what the Latin word for "finger" is? digitus). But other than that there's nothing so special about the number 10, and indeed there are times when other systems are more convenient. Another such system that's commonly used (especially with computers) is base 8, also called octal (the Latin word for eight is octō, a root you might recognize from the word octopus 🐙).
If you hadn't already understood the joke, maybe you're starting to see what's going on by now?
I won't go into the mathematics too much here (if that's your thing, the
Wikipedia article goes pretty deep on the
subject), but for our purpose today just know that you can use Python's int() function to convert
the representation of a number (str) in any base from 2 to 36 into an actual number (int).
>>> DEC25 = int("25", base=10) # DEC for decimal = base 10
>>> OCT31 = int("31", base=8) # OCT for octal = base 8
>>> DEC25 == OCT31
True
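If you're curious about what int() does with those digits, here's a minimal sketch of the positional arithmetic. The helper name `from_digits` is made up for this illustration (it's not part of Python), and it assumes a base of at most 10 so that every character is a plain digit.

```python
def from_digits(digits: str, base: int) -> int:
    """Positional expansion: each digit is weighted by a power of the base.

    Hypothetical helper for illustration; assumes base <= 10.
    """
    value = 0
    for digit in digits:
        # Shift what we have so far one position to the left, then add the digit.
        value = value * base + int(digit)
    return value


octal_31 = from_digits("31", 8)     # 3 * 8 + 1
decimal_25 = from_digits("25", 10)  # 2 * 10 + 5
print(octal_31, decimal_25, octal_31 == decimal_25)

# Going the other way, the built-in format() writes a number in octal.
print(format(25, "o"))
```

Both calls land on the same quantity, and format() turns twenty five back into the two characters 31.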
So yeah, that's it. The joke is that the number twenty five is written as 25 in the decimal
system, and as 31 in the octal system. Hilarious, right? Isn't it a fun coincidence that the "dec"
in "December" looks like the "dec" in "decimal", and that the same goes for "October" and "octal"? Well, as it
turns out, it's no coincidence at all!
Month names
If you speak another European language there's a good chance that most English month names are familiar to you. This is because most languages in Europe (but far from all *) inherited their month names from Latin. Just for fun I used Django's translation machinery to get a list of month names in a few different languages (well, except for Latin since Django doesn't ship with a Latin translation ... yet?):
| English (en) | French (fr) | Norwegian (nn) | Romanian (ro) | Latin (la) |
|---|---|---|---|---|
| January | janvier | januar | Ianuarie | Ianuarius |
| February | février | februar | Februarie | Februarius |
| March | mars | mars | Martie | Martius |
| April | avril | april | Aprilie | Aprilis |
| May | mai | mai | Mai | Maius |
| June | juin | juni | Iunie | Iunius |
| July | juillet | juli | Iulie | Iulius |
| August | août | august | August | Augustus |
| September | septembre | september | Septembrie | September |
| October | octobre | oktober | Octombrie | October |
| November | novembre | november | Noiembrie | November |
| December | décembre | desember | Decembrie | December |
A 2000 year old off-by-one error
Alright, so a bunch of European languages use basically the same set of month names. If you're like me, you might find this interesting but I can also understand that it's not exactly mind-blowing. But look at the month names starting from September. Do you notice a weird pattern? Maybe it helps if I show you the numbers from 1 to 10 in Latin?
- ūnus
- duo
- trēs
- quattuor
- quīnque
- sex
- septem
- octō
- novem
- decem
Did that help? No? Well how about if I put those side-by-side with the months?
| English | Latin | Latin number |
|---|---|---|
| January | Ianuarius | |
| February | Februarius | |
| March | Martius | ūnus |
| April | Aprilis | duo |
| May | Maius | trēs |
| June | Iunius | quattuor |
| July | Iulius | quīnque |
| August | Augustus | sex |
| September | September | septem |
| October | October | octō |
| November | November | novem |
| December | December | decem |
Remember the joke that started this post? How "October" looked like "octal" and "December" like "decimal"? Well as it turns out that wasn't a coincidence at all. The name "September" literally means "month number 7", "October" is 8, "November" is 9 and "December" 10. To put it another way, the name of the 9th month of the calendar is "month #7", the 10th is #8, the 11th #9 and the 12th is #10. Lovely isn't it?
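To make the mismatch concrete, here's a small sketch that compares the number hidden in each name with the month's actual position in today's calendar. The names `MONTHS`, `NUMERAL_PREFIXES` and `name_offset` are made up for this illustration, and the prefix matching is a deliberately naive assumption that only covers the four months discussed here.

```python
from typing import Optional

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Latin numeral roots that are still visible in the English month names.
NUMERAL_PREFIXES = {"Sept": 7, "Oct": 8, "Nov": 9, "Dec": 10}


def name_offset(month: str) -> Optional[int]:
    """Return (calendar position - number in the name), or None if no numeral is found."""
    position = MONTHS.index(month) + 1  # 1-based, January = 1
    for prefix, number in NUMERAL_PREFIXES.items():
        if month.startswith(prefix):
            return position - number
    return None


for month in MONTHS:
    offset = name_offset(month)
    if offset is not None:
        print(f"{month}: named #{MONTHS.index(month) + 1 - offset}, "
              f"actually #{MONTHS.index(month) + 1}, off by {offset}")
```

Every month with a number in its name comes out with the same offset of 2.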
What happened was that the old Roman calendar used to have 10 months, with March being the first one (July and August used to be called Quintilis and Sextilis, "fifth" and "sixth", before they were renamed). January and February were added later: Roman tradition credits King Numa Pompilius, centuries before Julius Caesar, who later reformed the calendar, got a month named after him, and ironically would probably not call March his number 1 month. But once those new months ended up at the front of the year, nobody really bothered reindexing the names and so they kept them even though they were now off by two.
It's often said that naming things and off-by-one errors are among the hardest challenges in programming. Well, here we have a great example where a naming issue led to an off-by-two error that's been in production for more than 2000 years. It puts things in perspective, doesn't it?