How We Talk

Ever wonder what we talk about here? I just finished a word frequency study of this site. For each word, I counted the number of pages that contain it. These are the ten most frequently occurring words of at least six letters.

 10844 category
  7225 people
  6411 because
  6300 should
  6200 software
  5815 things
  5803 programming
  5796 something
  5460 really
  5349 system

The count is the number of pages on which each word can be found. Here are all 354 words that appear on at least a thousand pages.

Interesting, eh? -- WardCunningham

Have WikiNames been filtered from this list? -- StijnSanders

No, their parts are treated as separate words so ProgrammingLanguage counts as one for Programming and one for Language.
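
A minimal sketch of the counting described above, not the script actually used for the study. It assumes a word is a run of ASCII letters and that a WikiName is split wherever a lowercase letter is followed by an uppercase one. The names `words_in_page` and `page_counts` and the three sample pages are made up for illustration.

```python
import re
from collections import Counter

# Assumption: WikiNames are split at each lowercase-to-uppercase boundary,
# so "ProgrammingLanguage" becomes "Programming Language".
CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")
# Assumption: a word is a run of ASCII letters.
WORD = re.compile(r"[A-Za-z]+")


def words_in_page(text):
    """Return the set of distinct lowercase words found on one page."""
    split_text = CAMEL_BOUNDARY.sub(" ", text)
    return {w.lower() for w in WORD.findall(split_text)}


def page_counts(pages, min_letters=6):
    """Count, for each word, the number of pages that contain it."""
    counts = Counter()
    for text in pages:
        # A set per page means a word repeated on one page counts once.
        counts.update(w for w in words_in_page(text) if len(w) >= min_letters)
    return counts


# Hypothetical sample pages, not taken from the site.
pages = [
    "ProgrammingLanguage design is something people argue about.",
    "People should read about programming. Programming is fun.",
    "A LanguageLawyer knows the language standard.",
]

counts = page_counts(pages)
for word, n in counts.most_common(3):
    print(f"{n:7d} {word}")
```

Because each page contributes a set of words, "programming" counts once for the second sample page even though it appears there twice.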


The counts seem to confirm the topic of the Wiki as PeopleProjectsAndPatterns. Structures, systems, patterns, and problems all score highly.


Following the example of WikiWordStatistics, we find in this list some of the ExtremeProgramming practices:

some AntiPatterns:

an observation:

finally, for the SmugSmalltalkWeenies:


This was just crying out for a bit of PoemWiki. http://downlode.org/wiki/wikiwordspoetry.cgi -- EarleMartin


I looked at the BNC (British National Corpus, one of the largest English corpora in the world) word frequency list at http://www.itri.brighton.ac.uk/~Adam.Kilgarriff/bnc-readme.html (BrokenLink; try http://www.natcorp.ox.ac.uk/ instead), chose the words with at least six letters, and listed the top ten:

  128393 should
  125430 people
  103003 because
   91141 between
   75588 through
   67219 become
   66894 government
   61912 system
   60607 number
   60498 however

Compare this list with the one above. (One interesting thing to notice is that "system" is a very common word in general English, whether written or spoken.)

-- JuneKim
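
One way to make the comparison concrete is to line up the two top-ten lists by rank. The sketch below uses only the words listed on this page; the helper name `rank_table` is made up for illustration. It compares ranks and not counts, since the two sets of numbers come from different sources.

```python
# Top ten words of at least six letters, in rank order, as listed on this page.
wiki = ["category", "people", "because", "should", "software",
        "things", "programming", "something", "really", "system"]
bnc = ["should", "people", "because", "between", "through",
       "become", "government", "system", "number", "however"]


def rank_table(first, second):
    """Map each word found in both lists to its 1-based rank in each."""
    second_rank = {w: i for i, w in enumerate(second, start=1)}
    return {w: (i, second_rank[w])
            for i, w in enumerate(first, start=1)
            if w in second_rank}


shared = rank_table(wiki, bnc)
wiki_only = [w for w in wiki if w not in bnc]
bnc_only = [w for w in bnc if w not in wiki]

for word, (wiki_rank, bnc_rank) in shared.items():
    print(f"{word:10s} wiki #{wiki_rank:<2d} BNC #{bnc_rank}")
print("wiki only:", ", ".join(wiki_only))
print("BNC only: ", ", ".join(bnc_only))
```

Four words appear in both lists (people, because, should, system). The six that appear only in the wiki list are category, software, things, programming, something, and really.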
Search Corpus (BNC)
See WikiStatistics, WikiWordStatistics
CategoryWikiStructure

Last edited September 19, 2011