How many English words do you know?

    Estimating how many words of a foreign language you have learned and retained is mainly useful for understanding how far you have come in “passive” comprehension: texts, speech, films and so on. I want to share several methods I have used, some found online and some home-made. Below are a couple of vocabulary tests, a technique for finding important words that have not yet stuck in your head, a few thoughts and a few links.


    Online tests


    Of the many tests that estimate vocabulary size, I liked two. A couple of years ago I came across the fairly simple Test Your Vocabulary. You go through three screens of words, ticking the ones you think you know, and then get an estimate of the total number of words you have learned. Many of my friends complained that it was inaccurate: they scored lower than "someone I know for sure is worse than me". But the test also allows the opposite kind of error: you think you know a word when you have actually forgotten it. People say the hand reaches out by itself to tick a word that seems vaguely familiar, so you can subconsciously inflate your score.

    Another interesting test is my.vocabularysize.com from Victoria University of Wellington in New Zealand. You can even choose a Russian interface. After 140 questions, each asking you to pick one of 4-5 definitions, you get an estimate of your vocabulary. There is also a test on knowledge of word parts.
    The test's authors refer to two articles (PDF), from 1990 and 2006, which describe so-called word-family lists.

    Your results

    You know at least 10,500 English word families!

    What do my results mean?

    In general, there is no minimum vocabulary size. Language ability is related to vocabulary size, so the more words you know, the more you will be able to understand. However, if you want to set a learning goal, Paul Nation's (2006) research suggests that the following sizes might be useful:

    How large a vocabulary is needed for reading and listening?
    Skill            Size estimate                 Notes
    Reading          8,000 - 9,000 word families   Nation (2006)
    Listening        6,000 - 7,000 word families   Nation (2006)
    Native speaker   20,000 word families          Goulden, Nation, & Read (1990)
                                                   Zechmeister, Chronis, Cull, D'Anna, & Healy (1995)

    What is a word family?

    There are many different forms of a word, so this test measures your knowledge of the most basic form of a word and assumes that you can recognize the other forms. For example, nation, a noun, can also be an adjective (national), a verb (nationalize), or an adverb (nationally). There are also forms which can be made with an affix such as de- or -ing which also modify the way that the word is used or adds to the basic meaning. For a test of receptive vocabulary knowledge such as this one, word families are considered to be the most accurate way of counting words.


    Frequency Dictionaries


    After registering at www.wordfrequency.info, you can download a frequency dictionary of American English as an Excel spreadsheet. A plain-text version is also available.

    It looks like this:

    Rank  Word             Part of speech  Frequency  Dispersion

    1     the           -  a               22038615   0.98
    2     be            -  v               12545825   0.97
    3     and           -  c               10741073   0.99
    4     of            -  i               10343885   0.97
    5     a             -  a               10144200   0.98
    6     in            -  i               6996437    0.98
    7     to            -  t               6332195    0.98
    8     have          -  v               4303955    0.97
    ...
    4996  immigrant     -  j                          0.97
    4997  kid           -  v               5094       0.92
    4998  middle-class  -  j               5025       0.93
    4999  apology       -  n               4972       0.94
    5000  till          -  i               5079       0.92

    The file contains 5,000 English words sorted by frequency of occurrence, counted over a huge, heterogeneous collection of English texts. Recently I watched a friend go through it looking for words he did not know, to check his vocabulary. In the first 500 he found none. He showed me the excerpt he had saved on his smartphone: about a dozen words from the second thousand (ranks 1000 to 2000) and about 20 from the third. Amusingly, as you move down the list you come across runs of words that happen to form phrases or even short sentences. The logic is simple: if a word is statistically very common and you do not know it, you had better learn it and look at examples of its use.
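    The per-thousand tally my friend showed can be sketched in a few lines of Python. This is only an illustration: the function name `unknown_by_band` and the example ranks are made up, and a band here means ranks 1-1000, 1001-2000 and so on.

```python
from collections import Counter


def unknown_by_band(unknown_ranks, band_size=1000):
    """Count unknown words per frequency band.

    Band 1 covers ranks 1..band_size, band 2 the next band_size ranks, etc.
    """
    counts = Counter((rank - 1) // band_size + 1 for rank in unknown_ranks)
    return dict(sorted(counts.items()))


# Hypothetical ranks of words a reader marked as unknown (not real data).
example_ranks = [1203, 1750, 1999, 2004, 2450, 2999, 3000, 4996]
bands = unknown_by_band(example_ranks)
print(bands)  # {2: 3, 3: 4, 5: 1}
```

    If the counts grow quickly from band to band, the lower bands are where the most useful words to learn are hiding.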

    When I read his list of unknown words (already with translations), I noticed the following. I knew about 50-60% of the words he did not, but some of the meanings listed in the translations were new to me, and a few words I did not know at all.
    The site itself is trying to be commercial: it sells lists longer than 5,000 words, but that is less interesting.

    That friend is now writing a program with a convenient interface for finding unknown words, for learning purposes. For an overall assessment I suggested that he use not this list but a thinned one: every seventh word from a full list of 60,000 words. Going through even the first couple of thousand words is discouraging, and not everyone will make it to 5,000. I cannot say so with 100% certainty, but a thinned dictionary will probably still show at least one word from each “family”, and it will take 7 or 10 times less time, depending on how heavily the list is thinned.
    By the way, similar frequency dictionaries of Russian contain about 160 thousand words, including abbreviations and acronyms. There are also several comparable corpora of English from different organizations.
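    The thinning idea can be sketched as follows. This is a rough illustration under my own assumptions, not my friend's program: `thin` and `estimate_known` are hypothetical names, the 60,000-entry list is a placeholder, and the estimate simply assumes that the share of known words in the sample matches the share in the full list.

```python
def thin(words, step=7, offset=0):
    """Keep every `step`-th word of a frequency-sorted list."""
    return words[offset::step]


def estimate_known(total_words, sample_size, unknown_in_sample):
    """Scale the known share of the thinned sample up to the full list."""
    known_fraction = (sample_size - unknown_in_sample) / sample_size
    return round(total_words * known_fraction)


# Placeholder for a real frequency list of 60,000 entries.
full_list = [f"word{rank}" for rank in range(1, 60001)]
sample = thin(full_list, step=7)
print(len(sample))  # number of words the reader actually has to review

# Suppose (hypothetically) the reader marked 1,500 sample words as unknown.
estimate = estimate_known(len(full_list), len(sample), unknown_in_sample=1500)
print(estimate)
```

    Note that this counts list entries, not word families, so the result is not directly comparable with the word-family figures from the online test.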

    Another question interests me: how accurate are the tests that estimate the number of words you know? Perhaps this could be checked against the frequency dictionary itself, and by comparing the lists of unknown words people pick out: how many there are and which “families” they fall into.

    There are general laws of remembering and forgetting. One of the main ones: if a person learns something and then neither repeats nor uses it, the information is forgotten exponentially over time. On the other hand, a few repetitions stretch out the decaying exponential and keep retention at an acceptable level. I was very surprised when a friend who worked part-time tutoring schoolchildren told me that there is a sequence of time intervals for deep memorization: say, after 20 minutes, then after 8 hours, then after a day, and so on, after which the information is firmly fixed in the brain. That is, statistically, the brain then responds with the strongest excitation signal when it encounters this information.

    [Image: the Ebbinghaus curve, from Wikipedia.]
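    To make the "stretched exponential" idea concrete, here is a small Python sketch. It uses the common textbook form of the forgetting curve, R = exp(-t / S), where S is a "stability" value. The review gaps (20 minutes, 8 hours, 1 day) come from the text above; the starting stability of 1 hour and the assumption that each review multiplies stability by 4 are purely illustrative and not measured values.

```python
import math


def retention(hours_since_review, stability_hours):
    """Share of material still remembered, modelled as exp(-t / S)."""
    return math.exp(-hours_since_review / stability_hours)


# Gaps between reviews mentioned in the text: 20 minutes, 8 hours, 1 day.
REVIEW_GAPS_HOURS = [20 / 60, 8, 24]


def simulate(gaps, initial_stability=1.0, growth=4.0):
    """Return the retention just before each review.

    Assumption for illustration only: every review multiplies the
    stability by `growth`, so the curve falls more slowly afterwards.
    """
    stability = initial_stability
    history = []
    for gap in gaps:
        history.append(retention(gap, stability))
        stability *= growth
    return history


history = simulate(REVIEW_GAPS_HOURS)
for gap, value in zip(REVIEW_GAPS_HOURS, history):
    print(f"after {gap:.2f} h: retention {value:.2f}")
```

    With other values of `growth` the numbers change a lot, which is exactly why the real intervals have to be found experimentally.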

    How I learned words at the institute


    Apart from the standard course, where the requirements in the first three years were quite strict, I tried to read fiction. The first big book was an old Soviet edition of Conan Doyle's The Lost World. I don't know how heavily it was adapted, but the text was full of Victorian words and expressions, and that slowed my progress all the way to the end... Of course, I could have looked things up in Lingvo on my computer, but I didn't like reading at the computer, and running back and forth for every new word quickly got tiresome. Tablets weren't common then and a pocket electronic translator was an expensive rarity, so I developed a paper system for myself. In a thick 96-sheet notebook, each spread was divided into 6 columns. I tried to find the notebook just now, but it is lost, so I will have to describe it in words. I divided the alphabet into groups of letters, for example a..d, e..f, g..j, k..n, o..q, r..t, u..w, x..z. I estimated by eye the share of words beginning with each group of letters and divided the columns of the spread into rectangles accordingly. For example, the group a..d got 2/3 of the first column, and so on. The x..z group got the last and smallest piece left in column 6. After that everything is simple: when you meet an unknown word, you write it with its translation in the right rectangle. Nothing inside a block is in alphabetical order, but it doesn't take long to find a word there.
    To get a translation while lying in bed, you had to reach for a paper dictionary. So the cost of getting a translation was quite high, much higher than glancing into Lingvo or an online translator like multitran today. Writing with a pen also takes a long time. But the brain remembers better, because it is not much fun to tear yourself away from an interesting plot and dig into a dictionary. Sometimes a word appeared in the notebook again on the next spread, and then again two spreads later; that is just part of the cost. As I read on, I had to reach for the notebook less and less. Then it turned out that the meaning of a considerable share of new words can be guessed from context.

    I would be interested to hear what other approaches people use. In my opinion the best way is a long, full immersion in the language environment, but that is not available to everyone.

    Interesting links


    The British BNC word base
    Learn English with Anki
    From the lingvo forum
    Lexiconer review
    Russian dictionaries for download

    UPD: It turns out there was already a big, interesting article on vocabulary on Habr that I had not noticed.
