We are opening access to our tool for compiling lists of English words from films, books and articles



    Skyeng shares with Habr a link to the internal application that our methodologists use.

    We at Skyeng School are convinced that the sooner a student sees a tangible effect from a lesson or practice session, the higher their motivation and the more effective the learning itself. The traditional approach to language learning promises a concrete result only after a long time, a year or two; that is, it demands a considerable investment of effort, time and money with no immediate effect. We believe it is quite possible to get a “return on investment” quickly by setting small, specific tasks and solving them. Today we will talk about one of our in-house tools designed for exactly this, and give readers the chance to try it in practice and make their own word lists, the most interesting of which will be offered to all users of Aword!

    If you need to cook Irish stew from an original recipe in English, a traditional school will have you learn 200 names of kitchen utensils and 300 names of various foods. We suggest learning right away the words directly related to the task, i.e. those that actually occur in Irish stew recipes. To read professional literature, a design engineer does not need to sit through lessons on “London is the capital” or the environment: knowledge of basic and highly specialized vocabulary is enough.

    To solve such specific tasks, we prepare thematic word sets that users of our mobile application Aword can memorize. To prepare these sets, we use the Wordset Generator tool, which builds an ordered list of words to memorize from a text or a set of texts that the student wants to read.


    The result of processing Douglas Adams's book “The Hitchhiker's Guide to the Galaxy”


    Words found in five seasons of Game of Thrones, superimposed on a student's knowledge curve. Each point (word) is plotted as utility against word number. On the right are the 25 words from the series that are most useful to such a student.

    The Wordset Generator was made possible by our tools for ranking words and estimating the vocabulary of a particular student (in one of our previous articles we explained why we built these tools ourselves instead of using ready-made solutions). For each word, an effective utility can be calculated: how much learning this word will raise the student's coefficient of understanding of the text. With the Wordset Generator we can recommend that a student first learn the most frequent words they do not yet know or, conversely, the words most important in their professional field.
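
    As a rough illustration of the idea of effective utility (not Skyeng's actual formula), assume that the coefficient of understanding is the share of word occurrences in the text that the student already knows. Under that assumption, the utility of an unknown word is simply the share of occurrences it accounts for. The function names and sample data below are hypothetical.

```python
from collections import Counter


def understanding(tokens, known):
    """Share of word occurrences in the text that the student already knows."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in known) / len(tokens)


def effective_utility(tokens, known):
    """Gain in understanding from learning each unknown word, highest first."""
    counts = Counter(t for t in tokens if t not in known)
    total = len(tokens)
    # Sort by descending gain, then alphabetically for a stable order.
    return sorted(((w, c / total) for w, c in counts.items()),
                  key=lambda item: (-item[1], item[0]))


# Hypothetical sample: a tokenized sentence and a student's known words.
tokens = "the towel is the most useful thing a hitchhiker can have".split()
known = {"the", "is", "most", "a", "can", "have", "thing"}
ranking = effective_utility(tokens, known)
```

    Here each unknown word occurs once in eleven tokens, so learning any one of them raises the assumed coefficient of understanding by the same amount; on a real text the more frequent unknown words would rise to the top.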

    Algorithm


    - A list of all words used in the text is compiled, indicating the number of occurrences.
    - All words that are not in our dictionary are cut off (moved to a separate list). As a rule, these are words, personal names and titles invented by the author.
    - The thematicity of each word in the list is determined: the frequency of the word in the analyzed text is compared with its frequency in a corpus of English texts (its prevalence). The resulting number shows how many times more often the word occurs in the analyzed text than in the corpus.
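
    The three steps above can be sketched roughly as follows. This is a minimal illustration, not the tool's code: the tokenizer, the `analyze` function, and the `corpus_freq` and `dictionary` inputs are assumptions made for the example.

```python
import re
from collections import Counter


def analyze(text, corpus_freq, dictionary):
    """Return (thematicity by word, out-of-dictionary words) for a text.

    corpus_freq maps a word to its relative frequency in a reference corpus
    (hypothetical data); dictionary is the set of known English words.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)                       # step 1: count occurrences
    total = sum(counts.values())
    # Step 2: words missing from the dictionary go to a separate list.
    unknown = sorted(w for w in counts if w not in dictionary)
    thematicity = {}
    for word, n in counts.items():
        if word in dictionary and corpus_freq.get(word, 0) > 0:
            local_freq = n / total
            # Step 3: how many times more frequent than in the corpus.
            thematicity[word] = local_freq / corpus_freq[word]
    return thematicity, unknown


# Hypothetical sample data.
text = "Stew the lamb. Stew the potatoes. Zaphod likes stew."
dictionary = {"stew", "the", "lamb", "potatoes", "likes"}
corpus_freq = {"stew": 0.0001, "the": 0.05, "lamb": 0.0002,
               "potatoes": 0.0003, "likes": 0.001}
thematicity, unknown = analyze(text, corpus_freq, dictionary)
```

    With this sample, “zaphod” lands in the out-of-dictionary list, and “stew” gets a far higher thematicity than “the”, because it is rare in the assumed corpus but frequent in the text.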

    Next, a semi-automatic adjustment of the list for specific needs is carried out (using the specified parameters or moving the sliders).

    - The student's level of knowledge (“complexity”) is set. Words the student most likely already knows are cut off.
    - The weights of thematicity and local frequency are chosen. Thematicity matters when we are preparing a list of professional terms for use at work; when analyzing fiction, frequency matters more.
    - Finally, the algorithm can estimate the probability that a particular word in the given text is a proper name (in the web version such words are highlighted in red of varying intensity). The “Proper names” slider removes such words according to a given probability threshold; in most cases manual intervention is still required here, especially with fiction.
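
    One possible way to model these three sliders is shown below. The scoring formula (a weighted sum of logarithms) and all field names are assumptions chosen for illustration; the article does not say how the tool actually combines thematicity and frequency.

```python
import math


def build_wordset(stats, complexity, w_theme, w_freq, max_proper):
    """Filter and rank candidate words.

    stats maps word -> dict with hypothetical keys:
      rank        position in a general frequency list (1 = most common)
      thematicity ratio of local to corpus frequency (> 0)
      local_freq  relative frequency in the analyzed text (> 0)
      p_proper    estimated probability that the word is a proper name
    """
    scored = []
    for word, s in stats.items():
        if s["rank"] <= complexity:      # student probably knows it already
            continue
        if s["p_proper"] > max_proper:   # likely a proper name
            continue
        # Assumed scoring: weighted sum of log-thematicity and log-frequency.
        score = (w_theme * math.log(s["thematicity"])
                 + w_freq * math.log(s["local_freq"]))
        scored.append((score, word))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [word for _, word in scored]


# Hypothetical statistics for four words.
stats = {
    "the": {"rank": 1, "thematicity": 1.1, "local_freq": 0.05, "p_proper": 0.0},
    "turbine": {"rank": 9000, "thematicity": 400.0, "local_freq": 0.002, "p_proper": 0.01},
    "suddenly": {"rank": 3000, "thematicity": 2.0, "local_freq": 0.004, "p_proper": 0.0},
    "arthur": {"rank": 12000, "thematicity": 900.0, "local_freq": 0.01, "p_proper": 0.97},
}
professional = build_wordset(stats, 2000, w_theme=1.0, w_freq=0.0, max_proper=0.5)
fiction = build_wordset(stats, 2000, w_theme=0.0, w_freq=1.0, max_proper=0.5)
```

    In this sketch, shifting the weight from thematicity to frequency swaps the order of “turbine” and “suddenly”, while “the” and “arthur” are removed by the complexity and proper-name thresholds in both cases.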

    Not just a machine


    The Wordset Generator has greatly eased the work of our content department but, of course, has not taken it over entirely. Methodologists still play an important role in compiling thematic word sets for memorization.

    First, they need to prepare a corpus of texts from which the words will be extracted. With a specific book or film this is fairly simple, but for thematic sets such as “At the airport” a substantial amount of material has to be sifted through to collect a good representative sample: classic textbook texts, guidebook articles, airline rules, blog reviews (usually complaints), and so on. It is important that these texts are modern and alive, because we want to teach students the language that Americans and the British speak and write today.

    Secondly, the complexity, thematicity and other settings need to be tuned correctly. This is done only by hand, by dragging the sliders, since it depends heavily on the purpose of the set, the student's level, the specifics of the topic, and so on.

    Thirdly, the resulting set of words requires serious work. The exact meaning of each word in its context has to be established. In addition, the term needed often consists of several words rather than one; such terms also have to be found and the list put in order. For example, in the airport vocabulary we found the word metal among the frequent ones: in fact it came from metal detector. Such phrases often consist of simple words that the tool discards, so they must be found and put back.
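
    The article describes this step as manual work. Purely as an illustration of how candidates for multi-word terms might be surfaced for a methodologist to review, the sketch below counts the adjacent word pairs in which a given word takes part. The function name, the stop-word list and the sample text are hypothetical.

```python
import re
from collections import Counter


def frequent_neighbours(text, word, stop, top=3):
    """Most frequent adjacent pairs that contain `word`, ignoring stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pairs = Counter()
    for left, right in zip(tokens, tokens[1:]):
        if word in (left, right) and left not in stop and right not in stop:
            pairs[(left, right)] += 1
    return pairs.most_common(top)


# Hypothetical sample text and stop-word list.
text = ("Walk through the metal detector. The metal detector beeped. "
        "Put metal items in the tray.")
stop = {"the", "in", "through"}
candidates = frequent_neighbours(text, "metal", stop, top=1)
```

    For this sample the top candidate is the pair (“metal”, “detector”), which a person would then confirm and add to the list as a single term.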

    Finally, pictures have to be chosen for all the words so that they match the intended meaning. A dedicated person handles this as well.

    Application


    The most obvious use of the Wordset Generator for our students is creating lists of words to memorize for specific books or films. If you analyze the text of a book, make a list of hundreds of words and learn it in the mobile application, reading becomes much easier: you do not have to reach for the dictionary every five minutes.

    Thanks to the tool, we can quickly prepare word sets for a specific event: the presentation of the next iPhone, a football championship, a high-profile premiere or some media scandal. Our students can send us such a request, and we ourselves try to track “perishable” topics likely to be in demand, so that we can promptly offer mobile application users a word set for them.



    Analyzing fiction helps methodologists prepare recommended reading lists for each student level. The fewer “difficult” words the program returns, the more accessible the text is for students in the middle of their language learning path. For advanced students such texts are not difficult and bring no educational benefit; they need lexically richer works. For example, a randomly chosen Agatha Christie detective novel (After the Funeral) contains fewer than 300 “complex” words, while the list for James Joyce's Ulysses runs to about 2000.
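
    A simple way to picture this kind of comparison is to count the distinct words in a text that lie beyond a student's level in a general frequency ranking. The sketch below assumes such a ranking is available as a dictionary; the `complex_words` function, the rank values and the sample sentences are all hypothetical.

```python
import re


def complex_words(text, rank, level):
    """Distinct dictionary words whose frequency rank is beyond the student's level."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Words absent from the ranking (e.g. invented names) are ignored.
    return sorted(w for w in words if w in rank and rank[w] > level)


# Hypothetical frequency ranks (1 = most common word).
rank = {"the": 1, "of": 2, "after": 120, "left": 250, "house": 300,
        "funeral": 4000, "solicitor": 7000, "modality": 15000,
        "ineluctable": 30000}
easy = "The solicitor left the house after the funeral."
hard = "The ineluctable modality of the house."
easy_list = complex_words(easy, rank, level=5000)
hard_list = complex_words(hard, rank, level=5000)
```

    The shorter the resulting list, the more accessible the text is for a student at the given level; a longer list points to a lexically richer work.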

    The Wordset Generator is very useful in our work with corporate clients, who often need to study and memorize specialized vocabulary. For one corporate client in the aerospace industry, for example, we prepared word lists based on an analysis of dozens of articles in professional journals. This matters because vocabulary in high-tech fields is constantly updated; using our tool on the most recent materials makes it possible to create lists containing the most relevant terms.

    Down to business!


    We decided to give Habr's readers the opportunity to play with Wordset Generator on their own - here it is: http://tools.skyeng.ru/sandbox/wordset-generator/

    It is more or less intuitive, but bear in mind that this is an internal tool not intended for the general public, so its interface is very spartan and unpolished.

    The open version limits the size of the text to 80 thousand characters, including spaces and line breaks. Practice shows that this is the optimal size for practical, everyday use of the tool. Take what you intend to read in the near future: a couple of chapters, ten pages or several articles. You will get a compact set that you can practice in the mobile application throughout the day, and in the evening you can consolidate what you have learned in context (while enjoying the book along the way). For example:



    Here is the result of parsing the first chapter of Adams's “Hitchhiker's Guide to the Galaxy”. Compare it with the screenshot at the beginning of the article, which shows the result of analyzing the entire book with the same parameters. These words are there too, but somewhere in the third or fourth hundred, whereas here they are served up on a platter.

    The resulting words can be added to the application manually using the built-in dictionary. Habr readers can also create their own word list, export it to CSV and share a link to the resulting file in the comments to this post. In a week we will select the most interesting sets proposed by Habr readers and include them in our application in a special category, “Sets from Habr users”.

    The Aword application itself is available in the App Store. It will soon be available on Google Play, and in November in a web version!

    Happy word learning!

    As usual, we remind you that we are always happy to welcome valuable specialists to our team!
