| Comparing Letter Frequency in Text versus Dictionary List of Words For US English | |||||
| by Tom Zurinskas | |||||
| Data are from the 15.4M words counted for the top 5k words of | |||||
| English by Collins Cobuild - about 90% of words in typical text pages | |||||
| Frequency of Letters for the top 5,000 words of US English compared to 15.4M word count of the same words in print | |||||
| (counting word popularity) | |||||
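The two measurements compared here differ only in how each word is weighted: the dictionary list counts every word's letters once, while the print count multiplies them by how often the word occurs. A minimal Python sketch of that distinction follows. The function name `letter_percentages` and the toy word counts are hypothetical illustrations, not the Collins Cobuild data or the author's actual procedure.

```python
from collections import Counter


def letter_percentages(word_counts, weighted):
    """Return {letter: percent of all letters} for the words given.

    word_counts maps each word to its number of occurrences in print.
    weighted=True  counts a word's letters once per occurrence (print).
    weighted=False counts a word's letters exactly once (dictionary list).
    """
    totals = Counter()
    for word, count in word_counts.items():
        weight = count if weighted else 1
        for ch in word.upper():
            if "A" <= ch <= "Z":  # ignore apostrophes, hyphens, digits
                totals[ch] += weight
    grand_total = sum(totals.values())
    return {ch: 100.0 * n / grand_total for ch, n in totals.items()}


# Hypothetical toy counts, chosen only to show the effect of weighting.
sample = {"the": 10, "cat": 1, "hat": 1}
list_pct = letter_percentages(sample, weighted=False)
print_pct = letter_percentages(sample, weighted=True)

for letter in sorted(print_pct):
    print(f"{letter}: print {print_pct[letter]:.1f}%  list {list_pct[letter]:.1f}%")
```

In this toy input a single very frequent word raises the print share of its letters well above their dictionary-list share, which is the same mechanism the results attribute to "the".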
| Results | |||||
| 1 | Letter E is the most popular in both the dictionary list and the count of printed words | ||||
| 2 | Letter H has the highest differential between print and list frequencies. | ||||
| 3 | The most frequent word, "the", had 1M instances, giving it a big impact on T, H, and E | ||||
| rank P | Letter | Print (P) % | Dict List (L) % | rank L | rank diff (rank L minus rank P) |
| 1 | E | 12.70% | 11.00% | 1 | 0 |
| 2 | T | 9.10% | 6.70% | 7 | 5 |
| 3 | A | 8.20% | 7.80% | 4 | 1 |
| 4 | O | 7.50% | 6.10% | 8 | 4 |
| 5 | I | 7.00% | 8.60% | 3 | -2 |
| 6 | N | 6.70% | 7.20% | 6 | 0 |
| 7 | S | 6.30% | 8.70% | 2 | -5 |
| 8 | H | 6.10% | 2.30% | 16 | 8 |
| 9 | R | 6.00% | 7.30% | 5 | -4 |
| 10 | D | 4.30% | 3.80% | 11 | 1 |
| 11 | L | 4.00% | 5.30% | 9 | -2 |
| 12 | C | 2.80% | 4.00% | 10 | -2 |
| 13 | U | 2.80% | 3.30% | 12 | -1 |
| 14 | M | 2.40% | 2.70% | 15 | 1 |
| 15 | W | 2.40% | 0.91% | 22 | 7 |
| 16 | F | 2.20% | 1.40% | 19 | 3 |
| 17 | G | 2.00% | 3.00% | 13 | -4 |
| 18 | Y | 2.00% | 1.60% | 18 | 0 |
| 19 | P | 1.90% | 2.80% | 14 | -5 |
| 20 | B | 1.50% | 2.00% | 17 | -3 |
| 21 | V | 0.98% | 1.00% | 20 | -1 |
| 22 | K | 0.77% | 0.97% | 21 | -1 |
| 23 | J | 0.15% | 0.21% | 25 | 2 |
| 24 | X | 0.15% | 0.27% | 24 | 0 |
| 25 | Q | 0.10% | 0.19% | 26 | 1 |
| 26 | Z | 0.07% | 0.44% | 23 | -3 |
| Total | | ~100% | ~100% | | |
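The rank columns can be re-derived from the percentages alone. The sketch below copies the table's values and assumes that letters tied at the rounded print percentage (for example C and U at 2.80%) keep the order in which the table lists them. In every row the rank diff equals rank L minus rank P, so a positive value means the letter ranks higher in print than in the dictionary list. The names `ROWS` and `ranks` are illustrative only.

```python
# (letter, print %, dictionary-list %) copied from the table, in print-rank order.
ROWS = [
    ("E", 12.70, 11.00), ("T", 9.10, 6.70), ("A", 8.20, 7.80),
    ("O", 7.50, 6.10), ("I", 7.00, 8.60), ("N", 6.70, 7.20),
    ("S", 6.30, 8.70), ("H", 6.10, 2.30), ("R", 6.00, 7.30),
    ("D", 4.30, 3.80), ("L", 4.00, 5.30), ("C", 2.80, 4.00),
    ("U", 2.80, 3.30), ("M", 2.40, 2.70), ("W", 2.40, 0.91),
    ("F", 2.20, 1.40), ("G", 2.00, 3.00), ("Y", 2.00, 1.60),
    ("P", 1.90, 2.80), ("B", 1.50, 2.00), ("V", 0.98, 1.00),
    ("K", 0.77, 0.97), ("J", 0.15, 0.21), ("X", 0.15, 0.27),
    ("Q", 0.10, 0.19), ("Z", 0.07, 0.44),
]


def ranks(rows, column):
    """Rank letters 1..26 by the given column, largest percentage first.

    sorted() is stable, so letters with equal rounded percentages keep
    their table order (an assumption about how ties were broken).
    """
    ordered = sorted(rows, key=lambda row: row[column], reverse=True)
    return {row[0]: position for position, row in enumerate(ordered, start=1)}


rank_p = ranks(ROWS, 1)  # rank in print
rank_l = ranks(ROWS, 2)  # rank in the dictionary list
rank_diff = {letter: rank_l[letter] - rank_p[letter] for letter, _, _ in ROWS}

# Difference in percentage points, print minus dictionary list.
pct_diff = {letter: round(p - l, 2) for letter, p, l in ROWS}

print("largest rank diff:", max(rank_diff, key=rank_diff.get))
print("largest percentage-point diff:", max(pct_diff, key=pct_diff.get))
```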