Learn English – all the Latin words

etymology history latin

It's often said that Latin and French each contribute about 29% of the English lexicon, with Germanic words contributing a further 26%. Wikipedia has a list of English words derived from Latin, but a random spot check shows that many of them came via French (I can't check 29% of all English words myself). Also, many of the words on that list are quite obscure compared with the many French-derived words in everyday use.

I like looking up the etymologies of words, and in my experience it's very rare to find a word that has come directly from Latin. Some words, like ad hoc, don't seem fully naturalised into English. I know scientific language is heavily influenced by Latin, but I consider that specialised terminology, as is legal terminology, which quite often uses Latin phrases.

If Latin and French really each contributed 29% of our modern lexicon, with French words not counting towards Latin words, why are Latin-only derived words so difficult to find?

Best Answer

These figures are almost always the numbers for the top N words in a corpus. The results can vary considerably depending on what corpus is used and what N is (as you can see in this paper).
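To make the dependence on N concrete, here is a minimal sketch that computes each origin's share among the top N words of a frequency-ranked list. The function name `origin_shares`, the six-word ranking, and the origin table are all made up for illustration; a real analysis would need a real corpus and real etymological data.

```python
from collections import Counter

def origin_shares(ranked_words, origins, n):
    """Return each origin's share among the n highest-ranked words."""
    top = ranked_words[:n]
    counts = Counter(origins.get(word, "other") for word in top)
    return {label: count / len(top) for label, count in counts.items()}

# Toy, hand-written data (NOT taken from any corpus): a hypothetical ranking
# in which common function words come first.
ranked = ["the", "of", "and", "people", "use", "provide"]
origins = {
    "the": "old_english", "of": "old_english", "and": "old_english",
    "people": "french", "use": "french", "provide": "latin",
}

print(origin_shares(ranked, origins, 3))  # only the function words
print(origin_shares(ranked, origins, 6))  # the whole toy list
```

With the smaller cutoff the toy list is entirely Old English; with the larger one, French and Latin words enter and the shares shift, which is the same effect in miniature.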

"The English language is a lot more French than we thought, here’s why" (by Andreas Simons, on Medium), summarizes one of the sources Wikipedia quotes for the 29% Latin figure:

The latest research was done in 1975 by Joseph M. Williams, where he examined the 10,000 most frequently used words in English, based on a rather small sample size of corporate letters. Here are my issues with his research:

  • the research carries a bias towards French and Latin, as companies are more likely to use academic language
  • proper names were not removed, possibly diluting the results for an etymological composition
  • he used the 10 000 most common words in that corpus of letters, not really “core vocabulary”

Because of these problems, the author produced numbers of his own: he took a list of the 5,000 most common English words (described as covering about "85% of all words in any English source") and scraped etymology sites (mostly Etymonline) to see which languages were mentioned in the first few words of each word's etymology:

Note that sometimes a word of Latin origin will return “French” using my method. This is because Etymonline always mentions French before Latin if the word entered English through French and the word changed sufficiently from the root. A word such as “origin” (from “origo”) will therefore return French, whereas a word such as “provide” (from “providere — provideo”) will return Latin.
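The first-mention heuristic described in that note can be sketched roughly as follows. This is not the article's actual code: the function name `first_language_mentioned`, the pattern table, and the sample etymology strings are assumptions written for illustration (the strings are paraphrases, not quotations from Etymonline). Only the output labels mirror the ones used in the script discussed here.

```python
import re

# Hypothetical label -> pattern table; the real script may use other rules.
LANGUAGE_PATTERNS = {
    "french": r"\b(?:Old |Middle |Anglo-)?French\b",
    "latin": r"\b(?:Medieval |Late |Modern )?Latin\b",
    "old_english": r"\bOld English\b",
    "germanic": r"\b(?:Proto-)?Germanic\b",
    "greek": r"\bGreek\b",
}

def first_language_mentioned(etymology_text):
    """Return the label of the language named earliest in the text."""
    best_label = "other"
    best_pos = len(etymology_text) + 1
    for label, pattern in LANGUAGE_PATTERNS.items():
        match = re.search(pattern, etymology_text)
        if match and match.start() < best_pos:
            best_label, best_pos = label, match.start()
    return best_label

# Illustrative strings only: French is named before Latin in the first one,
# so the word is labelled "french" even though its root is Latin.
print(first_language_mentioned("from Old French origine, from Latin originem"))  # french
print(first_language_mentioned("from Latin providere 'look ahead'"))  # latin
```

A word whose etymology names French first is counted as French even when a Latin root follows, which is exactly the bias the note warns about.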

I'm not sure how much I trust the results, but this is the most transparent analysis I have found so far: the code used to generate the numbers is linked in the article. The code can be modified to print the words it classifies as Latin. I haven't run it myself, but the edits look simple. Lines 220-240 in the original Sorter.py are:

    for word in words1:
        print(word)
        origin = scrape_and_interpret(word)
        if origin == "french":
            count_french += 1
            list_french.append(word)
        elif origin == "latin":
            count_latin += 1
            list_latin.append(word)
        elif origin == "old_english":
            count_old_english += 1
            list_old_english.append(word)
        elif origin == "germanic":
            count_germanic += 1
            list_germanic.append(word)
        elif origin == "greek":
            count_greek += 1
            list_greek.append(word)
        elif origin == "other":
            list_other.append(word)
            count_other += 1

Change two lines (comment out the existing print call and add one in the Latin branch) to get this:

    for word in words1:
        #print(word) # Comment out print statement that prints all words
        origin = scrape_and_interpret(word)
        if origin == "french":
            count_french += 1
            list_french.append(word)
        elif origin == "latin":
            count_latin += 1
            print(word) # Add `print` so that it prints out words of Latin origin
            list_latin.append(word)
        elif origin == "old_english":
            count_old_english += 1
            list_old_english.append(word)
        elif origin == "germanic":
            count_germanic += 1
            list_germanic.append(word)
        elif origin == "greek":
            count_greek += 1
            list_greek.append(word)
        elif origin == "other":
            list_other.append(word)
            count_other += 1
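If you would rather not touch the print statements at all, the same listing could be produced by grouping words by origin in a dictionary. This is only a sketch under stated assumptions: `group_by_origin` and `fake_classify` are hypothetical names, and `fake_classify` is a hand-written stand-in for the script's `scrape_and_interpret`, which fetches pages over the network and is not reproduced here.

```python
from collections import defaultdict

def group_by_origin(words, classify):
    """Map each origin label to the list of words classified under it."""
    groups = defaultdict(list)
    for word in words:
        groups[classify(word)].append(word)
    return dict(groups)

# Stand-in classifier with a tiny hand-written table (illustration only).
# In the real script, scrape_and_interpret would be passed in instead.
FAKE_ORIGINS = {"origin": "french", "provide": "latin", "house": "old_english"}

def fake_classify(word):
    return FAKE_ORIGINS.get(word, "other")

groups = group_by_origin(["origin", "provide", "house", "ad hoc"], fake_classify)
print(groups.get("latin", []))                    # words classified as Latin
print({k: len(v) for k, v in groups.items()})     # counts per origin
```

The counts fall out of the list lengths, so the separate `count_*` variables are no longer needed, and any origin's words can be printed after the loop has finished.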

Alternatively, if you have access to the online OED, it's pretty easy to get a list by searching for current words of Latin origin, sorted by frequency. Note that many of these words also turn up when you search instead for words of French origin, since so many words have multiple etymological influences (it would be a bit strange to count them in only one direction or the other). I'm sure most people who know at least some English will recognize the top 1000 words on said list of Latin-origin words, and most educated people will recognize at least most of the next 1000 words, if not more.