Tuesday, May 16, 2006

Etaoin Shrdlu

So, in the interest of refreshing my C programming skills, I've decided to do the exercises in the classic K&R book. One of these exercises was to create a histogram of character frequencies in a given input file.
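For anyone who wants to try the exercise, here's a minimal sketch of one way to do it in C. It is a hypothetical version, not the exact program behind the numbers below. It counts every byte read from a stream into an array indexed by character value. It then prints one row of asterisks per printable character that occurred, scaled so the most frequent character gets `max_width` stars. The names `count_chars`, `print_histogram`, and `max_width` are made up for this illustration. Skipping non-printing characters such as spaces and newlines is my own choice, not something the exercise demands.

```c
#include <stdio.h>
#include <ctype.h>
#include <limits.h>

#define NCHARS (UCHAR_MAX + 1)  /* one bucket per possible byte value */

/* Count how often each byte value occurs in fp. getc() returns each
   byte as an unsigned char converted to int, so it is a valid index. */
void count_chars(FILE *fp, long counts[NCHARS])
{
    int c;

    for (int i = 0; i < NCHARS; i++)
        counts[i] = 0;
    while ((c = getc(fp)) != EOF)
        counts[c]++;
}

/* Print one row per printable (isgraph) character that occurred:
   the character, a space, then a bar of asterisks scaled so the most
   frequent printable character gets exactly max_width stars and every
   character that occurred gets at least one. */
void print_histogram(FILE *out, const long counts[NCHARS], int max_width)
{
    long max = 0;

    for (int i = 0; i < NCHARS; i++)
        if (isgraph(i) && counts[i] > max)
            max = counts[i];
    if (max == 0)
        return;  /* nothing printable to show */

    for (int i = 0; i < NCHARS; i++) {
        if (!isgraph(i) || counts[i] == 0)
            continue;
        long stars = counts[i] * max_width / max;
        if (stars == 0)
            stars = 1;
        fprintf(out, "%c ", i);
        for (long s = 0; s < stars; s++)
            putc('*', out);
        putc('\n', out);
    }
}
```

A `main` that calls `count_chars(stdin, counts)` and then `print_histogram(stdout, counts, 60)` turns this into a filter you could run on a text file via input redirection.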

I wrote the program, tested it on some files small enough to count by hand, and then ran it on the entire contents of my blog (just my original posts, not the comments).

According to the Fun With Words site, the ordering of the letters by "the frequency of letters as they appear in speech and writing" in English is as follows:

etaoin shrdlu cmfgyp wbvkxj qz

(The first two chunks are an old mnemonic for typesetters. You knew the title of this post looked somehow familiar, right?)

My writing produced the following results (about 150,000 letters):

etoansirhldmucpgyfwbkvxqjz (lowercase only)
etoainsrhldpmcugywfbkvjxqz (case-insensitive)
etaoinshrdlucmfgypwbvkxjqz (reference cited above)
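Turning letter counts into orderings like the two above means sorting the letters by count. You do this once counting only lowercase letters, and once folding uppercase into lowercase. Here's a minimal sketch of that step. It assumes an ASCII-like character set where 'a' through 'z' are contiguous. `rank_letters` and its `fold_case` flag are hypothetical names. Breaking ties alphabetically is an arbitrary choice.

```c
#include <stdio.h>
#include <ctype.h>

/* Write the letters a..z into order[] (plus a terminating '\0'),
   sorted from most to least frequent in fp. With fold_case == 0 only
   lowercase letters are counted; otherwise uppercase letters are
   folded into their lowercase forms first. Assumes 'a'..'z' are
   contiguous, as in ASCII. */
void rank_letters(FILE *fp, int fold_case, char order[27])
{
    long counts[26] = {0};
    int c;

    while ((c = getc(fp)) != EOF) {
        if (fold_case)
            c = tolower(c);
        if (c >= 'a' && c <= 'z')
            counts[c - 'a']++;
    }

    for (int i = 0; i < 26; i++)
        order[i] = (char)('a' + i);
    order[26] = '\0';

    /* Insertion sort on descending count. It is stable, so letters
       with equal counts stay in alphabetical order. */
    for (int i = 1; i < 26; i++) {
        char key = order[i];
        int j = i - 1;
        while (j >= 0 && counts[order[j] - 'a'] < counts[key - 'a']) {
            order[j + 1] = order[j];
            j--;
        }
        order[j + 1] = key;
    }
}
```

Calling `rank_letters(stdin, 0, order)` and printing `order` with `puts` gives a lowercase-only ordering. Passing 1 instead of 0 gives the case-insensitive one.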

Note how I jumps up two positions, from seventh to fifth, when case is ignored; that's likely from my use of the first-person pronoun.

But I do wonder: have I got something against U?

2 comments:

bjkeefe said...

The careful reader is now wondering where the histogram is.

Since it is only a text-based histogram, made up of asterisks, it's nothing special. If I make it into something slick-looking, I'll post a picture.

Bet you can't wait for that, huh?

Anonymous said...

http://www.missouriskies.org/rainbow/february_rainbow_2006.html
