Notes

Chapter 8: Implications for Everyday Systems

Section 8: Financial Systems


Zipf's law

To a fairly good approximation the nth most common word in a large sample of English text occurs with frequency 1/n, as illustrated in the first picture below. This fact was first noticed around the end of the 1800s, and was attributed in 1949 by George Zipf to a general, though vague, Principle of Least Effort for human behavior. I suspect that in fact the law has a rather simple probabilistic origin. Consider generating a long piece of text by picking at random from k letters and a space. Now collect and rank all the "words" delimited by spaces that are formed. When k = 1, the nth most common word will have frequency c^(-n). But when k ≥ 2, it turns out that the nth most common word will have a frequency that approximates c/n. If all k letters have equal probabilities, there will be many words with equal frequency, so the distribution will contain steps, as in the second picture below. If the k letters have non-commensurate probabilities, then a smooth distribution is obtained, as in the third picture. If all letter probabilities are equal, then words will simply be ranked by length, with all k^m words of length m occurring with frequency p^m. The normalization of probabilities (the requirement that the total frequency k^m p^m summed over all word lengths m equal 1) then implies p = 1/(2k), and since the word at rank roughly k^m then has probability 1/(2k)^m, Zipf's law follows.
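One way to see the approximate c/n behavior is to simulate the random-text construction directly. The sketch below makes illustrative assumptions that are not part of the note itself: an alphabet of k = 4 equally probable letters plus a space, every symbol drawn uniformly, and a fixed text length. It generates the text, splits it into words, ranks them by frequency, and prints the observed frequencies next to c/n, with c estimated from the most common word.

import random
from collections import Counter

def zipf_experiment(k=4, n_chars=1_000_000, seed=0):
    # Generate a long string by picking uniformly from k letters and a space.
    # k, n_chars, and the uniform symbol probabilities are arbitrary choices
    # for illustration, not values taken from the original text.
    rng = random.Random(seed)
    symbols = [chr(ord('a') + i) for i in range(k)] + [' ']
    text = ''.join(rng.choice(symbols) for _ in range(n_chars))

    # Collect the "words" delimited by spaces, dropping empty runs.
    words = [w for w in text.split(' ') if w]
    counts = Counter(words)
    total = sum(counts.values())

    # Rank words by decreasing frequency: rank 1 is the most common word.
    ranked = counts.most_common()
    return [(rank, word, cnt / total)
            for rank, (word, cnt) in enumerate(ranked, start=1)]

if __name__ == "__main__":
    results = zipf_experiment()
    c = results[0][2]  # crude estimate of the constant from the top-ranked word
    for rank, word, freq in results[:10] + results[95:100]:
        print(f"rank {rank:4d}  word {word!r:8}  freq {freq:.5f}  c/n {c / rank:.5f}")

Because the letter probabilities here are equal, the resulting rank-frequency curve shows the steps described above, so individual ranks match c/n only on average across each step; non-commensurate letter probabilities would smooth it out.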




From Stephen Wolfram: A New Kind of Science.
