(mirrored from my blog)
This post is a little late for Pi Day, but it’s never a bad time for discourse related to everyone’s favourite mathematical constant. Twas on Pi Day of this year that I somehow came across this site, which describes the Constrained Writing task of Pilish, in which the length of each word in letters corresponds to the digits of pi:
The first word in this sentence has 3 letters, the next word 1 letter, the next word 4 letters, and so on, following the first fifteen digits of the number π. A longer example is this poem with ABAB rhyme scheme from Joseph Shipley’s 1960 book Playing With Words:
But a time I spent wandering in gloomy night;
Yon tower, tinkling chimewise, loftily opportune.
Out, up, and together came sudden to Sunday rite,
The one solemnly off to correct plenilune.
Trying to write under such constraints can feel extremely awkward, but this made me wonder: How often would strings of words adhering to the constraints of Standard Pilish occur unintentionally? Afterall, with the amount of text out there – the sheer rate at which words are being put together by people all over the world every second of every day – it is to be expected that these things should occur with some frequency p > 0. Such is the Law of Large Numbers.
In order to determine this, I would need a large data set. Luckily, such things are readily available. I settled upon the Project Gutenberg ebook catalog – specifically the union of the July 2006 DVD (17,000 books) and the March 2007 Science Fiction Bookshelf CD (most of PG’s Sci-Fi titles). Altogether, this gave me almost 9GB of text (although I later discovered this contained many duplicates, it’s still a hell of alot of words!)
Next I hacked together a small python script which would find, for each file, the longest string of Standard Pilish. Code for this can be checkedout from my SVN repository: http://svn.nfitz.net/pilish
Somewhat disappointingly, the longest of any Pilish string was 8 digits of pi. The vast majority of books had a longest Pilish string of around 3-5 words. See the histogram below (note the logarithmic scale in the y-axis).
Five books achieved this 8-digit benchmark, listed below, with the section of Pilish text bolded:
- The Winning of Barbara Worth, by Harold B Wright (1872-1944)
Dismounting and throwing the reins over his horse’s head he came to her smiling, sombrero in hand. “Buenas dias, Senorita. Please may I have a drink?”
“Certainly, Mr. Holmes ; help yourself.” She pointed to the olla hanging in the shade of the ramada.
- Humphrey Bold- A Story of the Times of Benbow, by Herbert Strang
I was weary of the humdrum life of idling on shore or aimless sailing up and down the channel. The admiral’s was a peaceful mission, and no fighting was expected, but I felt a great curiosity to behold new scenes.
- Captain Cook’s Journal During the First Voyage Round the World, by James Cook (1728-1779)
And I have a great Objection to firing with powder only amongst People who know not the difference, for by this they would learn to despise fire Arms and think their own Arms superior, and if ever such an Opinion prevailed they would certainly attack you, the Event of which might prove as unfavourable to you as them.
- Lectures on Modern History, by Baron John Emerich Edward Dalberg Acton (1834-1902)
One was part of the empire, the other was enclosed in Poland, and they were separated by Polish territory. They did not help each other, and each was a source of danger for the other. They could only hope to exist by becoming stronger. That has been, for two centuries and a half, a fixed tradition at Berlin with the rulers and the people. They could not help being aggressive, and they worshipped the authority that could make them successful aggressors.
- Anne Bradstreet and Her Time, by Helen Campbell (1839-1918)
With the most ambitious of the longer poems–”The Four Monarchies”– and one from which her readers of that day probably derived the most satisfaction, we need not feel compelled to linger. To them its charm lay in its usefulness. There were on sinful fancies; no trifling waste of words, but a good, straightforward narrative of things it was well to know, and Tyler’s comment upon it will be echoed by every one who turns the appallingly matter-of-fact pages…
That last one is the only of the five to have one word of double-digit length, thus covering two digits of pi (‘straightforward = 15 letters = ’15′).
I would like to do a similar analysis of an even larger dataset of more modern language. One possibility is a full archive of Wikipedia. I wonder what is the longest string of unintentional Pilish ever produced?
Another interesting question is how the maximum length of Pilish sections in a document scales with the length of the document, and how well this can be modelled with a simple statistical model such as a Markov Chain.