Here’s why rot13 text looks so cool.

To avoid spoilers, I posted some text in rot13:

V yvxrq gung ovg arne gur ortvaavat jurer Qnavry Penvt gnyxrq nobhg tbvat gb gur raq bs gur envaobj jurer gurer vf gehgu, naq gura jnvgvat sbe gur riragf bs gur fgbel gb trg gurer. Guvf frrzf gb zr gb qrfpevor n ybg bs jung erfrnepu srryf yvxr, naq vg nyfb svg va jvgu gur fcrpvsvpf bs gur fgbel fhpu nf uvf abgvpvat gur oybbq ba Nan qr Neznf’f fubr. Nyfb gur ovg ng gur raq jurer ur fnlf gung fur jba ol abg cynlvat gurve tnzr. Nyfb V yvxrq gur srry bs gur raqvat, ubj lrf fur tbg gur zbarl ohg vg jnfa’g cerfragrq nf fbzr sha pncre gung fur chyyrq bss. Gurer jnf fnqarff, cnegyl sebz Rqv Cnggrefba qlvat naq nyfb orpnhfr vg’f abg pyrne jung jvyy unccra gb Nan’f zbgure, ohg nyfb vg jnf whfg pbafvfgrag jvgu gur gurzr bs gur zbivr gung gur fvkgl zvyyvba qbyyne sbeghar jnf zber bs n ubg cbgngb guna n znthssva.

It looked really cool, like Elvish runes or something.

Then this made me wonder: why does it look cool? Why “cool” rather than “gibberish”?

I came up with a theory, which goes like this. The rot13 text looks like human language, because it has the punctuation, spacing, and word length of English. It also has the letter frequency distribution of English, just attached to different letters. So it doesn’t look like a bunch of random letters. It looks like a language. But it’s no language you’ve ever seen before.

“Srry bs gur raqvat,” indeed.
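The theory above is easy to check mechanically. Here is a minimal Python sketch, not anything from the original post: the helper names `rot13`, `word_shape`, and `sorted_letter_counts` are made up for illustration. It shows that rot13 leaves spacing, punctuation, and word lengths untouched, and that the letter counts are the same set of numbers, only assigned to different letters.

```python
import string
from collections import Counter

LOWER = string.ascii_lowercase
UPPER = string.ascii_uppercase

# Translation table: rotate each ASCII letter by 13 places, keep its case.
ROT13 = str.maketrans(
    LOWER + UPPER,
    LOWER[13:] + LOWER[:13] + UPPER[13:] + UPPER[:13],
)


def rot13(text):
    """Apply rot13 to ASCII letters; every other character passes through."""
    return text.translate(ROT13)


def word_shape(text):
    """Replace each ASCII letter with 'x', keeping spaces and punctuation."""
    return "".join("x" if c in string.ascii_letters else c for c in text)


def sorted_letter_counts(text):
    """Letter counts sorted from most to least common, ignoring which letter."""
    counts = Counter(c for c in text.lower() if c in LOWER)
    return sorted(counts.values(), reverse=True)


plain = "Feel of the ending, indeed."
cipher = rot13(plain)

print(cipher)
print(rot13(cipher) == plain)                    # rot13 undoes itself
print(word_shape(plain) == word_shape(cipher))   # same spacing and punctuation
print(sorted_letter_counts(plain) == sorted_letter_counts(cipher))
```

Because 13 is half of 26, applying the rotation twice returns the original text, so the same function both encodes and decodes.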

25 thoughts on “Here’s why rot13 text looks so cool.”

  1. “But it’s no language you’ve ever seen before.”

    It is not a new language. It is just English written with an alternative alphabet. The Japanese, with their three scripts for writing the language, often mashed up in the same sentence, are a lot more used to this notion than we are.

  2. This is the same reason that playing music backwards or upside down (e.g., https://dantepfer.com/bachupsidedown/) sounds really cool. There’s a lot of built-in structure to a well-written piece of music that comes along for the ride when you transform it; it gets turned into something much less recognizable and semantically meaningful on the surface, but the listener can tell that some sort of crafted structure is still present.

  3. It also has about the right frequency of vowels for some reason. If an English word doesn’t have a vowel, it looks like gibberish. This is luck. The letters that map to vowels could have all been low-frequency letters like z and q.

  4. You listed frequency as ‘also’ but it seems to be toward the beginning of the causal chain. If the substitutions are same distributed, then they will replicate the same word length if you include space as a letter, which certainly is true in language. When you do that, you can develop punctuation the same way; it’s multiple letters, so commas have a frequency. It’s a neat example of ordinal and cardinal counting, with the connection to the higher order meanings of speech to a first order simulation. If you add layers of the distribution of letters, you approach contextual speech simulation, like restrict the frequencies to technical language. I love this stuff.

    It also looks like what I must sound like to the cat: a whole bunch of noise, which he tunes in and out based on tone and occasional recognized bits.

    My first thought was this is what Trump talks like: a simulation of English.

    • “If the substitutions are same distributed, then they will replicate the same word length if you include space as a letter, which certainly is true in language. When you do that, you can develop punctuation the same way; it’s multiple letters, so commas have a frequency.”

      ROT13 only encodes letters though so word length, position of spaces, and punctuation are all unchanged.

  5. I don’t see that letter-frequency is very noticeable at all. Scanning a hundred or so letters gives the reader little feel for relative frequencies. On the other hand, the presence of a vowel in a word is very noticeable. Perhaps the intuition is more precisely stated by saying that the letter-frequencies of _some specific letters_ match the letter frequencies of those letters in real English. A lot of r’s, and g’s, and f’s.

    It is also very noticeable that there are a lot of y’s. Which helps because y can count as a vowel. But, there are way too many y’s. This makes the text look Icelandic or Viking or some other cold, northern, horn-helmeted, 1000-year-old-script. Kind of Beowulfish.

    The q’s without u’s also slap the reader in the face and make the text look antique. Even a single q without a u is such a strong violation of the rules of English that the reader quickly realizes the text is not modern English. This reinforces the Beowulf interpretation because dropping u’s is just the sort of thing savages that put horns on their helmets would do.

  6. This post makes me ask questions about English that I never thought of before. Why does the alphabet need to have an order? Why this particular order? Did English always have an order or did someone invent it? It seems possible to learn English without knowledge of the alphabetical order; are there such people? Linguists on this blog – please help!

    • P.S. My mother tongue is character-based, and there is no natural ordering of characters or even parts of characters. There is a forced ordering by number of strokes for a dictionary lookup but that order has no relevance beyond that.

    • It doesn’t have to *need* an order but the order is highly beneficial. Well worth the few days it takes in grade 1 or whatever to learn it.

      Hey! Kaiser: Do you know the secret word in the alphabet song? I’m pretty sure most Americans know what I’m talking about and maybe some Canadians but I don’t think an Englishman would know. I think you have to sing the song in “American” to find the secret word.

    • The English alphabet and alphabetic order is a direct derivative of the Latin alphabet and alphabetic order, which is a direct derivative of the Greek (and Greek-derived Etruscan) alphabet and alphabetic order, which is a direct derivative of the Phoenician alphabet and northern Phoenician alphabetic order. (The Phoenician alphabet then derives from the poorly-known Proto-Sinaitic alphabet, which was a wholesale taking and adaptation of phonetic indicator marks from Egyptian hieroglyphics.)

      It’s not clear when or how alphabetic order was first invented, but there are two different alphabetic orders that are attested to in examples of Phoenician (ABGDE, associated with the north, and HMĦLQ associated with the south), both of which survived in some descendant alphabets (Greek, Latin, Cyrillic, and Hebrew all keeping fairly close to the northern order, Ethiopian’s Ge’ez being the most notable example of one that kept the southern order). Several descendant alphabets lost this original notion of alphabetic ordering, generally developing a new standard ordering for teaching people to read/write based on either similarities of shape (Arabic) or pronunciation (Brahmi).

  7. It also has familiar letter n-gram and word n-gram frequency distributions.

    Can we concoct a code that likewise preserves the letter n-gram distribution, but not the words? Maybe preserve word-length n-grams while we’re at it, esp. that big words occur among smaller ones.

    And how would that look?
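The vowel observations in comments 3 and 5 can also be checked with a few lines of Python. This is only a rough sketch: the function names are invented for illustration, the sample is the first part of the rot13 passage from the post, and counting y as a vowel is an assumption borrowed from comment 5.

```python
import string

LOWER = string.ascii_lowercase
UPPER = string.ascii_uppercase
ROT13 = str.maketrans(
    LOWER + UPPER,
    LOWER[13:] + LOWER[:13] + UPPER[13:] + UPPER[:13],
)


def rot13(text):
    """Apply rot13 to ASCII letters; every other character passes through."""
    return text.translate(ROT13)


def vowel_sources(vowels="aeiou"):
    """For each vowel seen in rot13 text, the plaintext letter behind it.

    rot13 is its own inverse, so the source of v is simply rot13(v).
    """
    return {v: rot13(v) for v in vowels}


def share_of_words_with_vowel(text, vowels="aeiouy"):
    """Fraction of whitespace-separated words containing at least one vowel."""
    words = text.lower().split()
    return sum(any(c in vowels for c in w) for w in words) / len(words)


sample = (
    "V yvxrq gung ovg arne gur ortvaavat jurer Qnavry Penvt gnyxrq nobhg "
    "tbvat gb gur raq bs gur envaobj jurer gurer vf gehgu"
)

print(vowel_sources())                    # which plaintext letters turn into vowels
print(rot13("y"), rot13("q"), rot13("u")) # plaintext behind y, q, and u
print(share_of_words_with_vowel(sample))
print(share_of_words_with_vowel(rot13(sample)))
```

The vowels in the rot13 text stand in for plaintext n, r, v, b, and h, several of which are common in English, which fits the "luck" in comment 3. The y’s are really l’s, and a q followed by u would need a plaintext d followed by h, which fits the bare q’s noted in comment 5.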
