Linguistic aspects of what3words and a technical analysis of its dictionaries

    Thank you for your attention and for your comments on our first welcome post on Habr! Your feedback has helped us identify the topics of greatest interest, which we will address in future posts.

    As you rightly noted in the comments, using words instead of numbers has clear advantages, but the approach also has nuances that must be taken into account. Robert Barr, a professor at the University of Manchester, carried out a technical analysis of what3words and our dictionaries. Below are the results of his independent evaluation:

    A what3words address may look like a random collection of words, but the system was carefully designed to achieve specific goals.

    • The 40,000 words of the English dictionary used for w3w addresses are enough to give every 3 m × 3 m square in the world its own three-word combination.
    • Each of the 40,000 words can appear in any of the three positions of a w3w address, so the same word occasionally appears more than once in an address.
    • Languages other than English use 25,000 words, which is enough to cover all land with three-word combinations. English is the only language with a 40,000-word dictionary, which covers both land and sea. In practice, this means that if you select Portuguese in the settings, you will get three Portuguese words until you move the marker out to sea (probably a few hundred meters from the coast), after which the address is displayed in English.

    • The dictionaries are optimized so that the “best” words are used for addresses in the areas where native speakers of a given language are most likely to use them. The “best” words are short words that are among the most common in the language. Combinations are distributed across the world by balancing two independent ranking systems:

    1. The best words are assigned to the most densely populated (urban) areas. The next tier of words is used for addresses in rural areas, and the least suitable words are used for the seas.
    2. In countries where a language is native or widely spoken, the best words from that language's dictionary are used for addresses. For example, the best words of the French version of w3w are used primarily in France, Senegal and Cameroon, and only then spread to other countries.

    • Homophones (words that are spelled differently but sound the same) are avoided: either only one word of the pair is used, or the combination that would contain them is excluded altogether. Homophones usually share the same Soundex code, which is used to match similar-sounding words and so avoid mistakes. Words are sorted and selected for the dictionaries through a multi-stage process that also removes offensive words.
    • Where similar word combinations exist, they are distributed so that the locations they address are unlikely to be in the same country, for example:

    atoms.atoms.hike is in north London;
    atom.atoms.hike is in Quint, New York.
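    The capacity claims in the list above can be checked with back-of-the-envelope arithmetic. The sketch below assumes an Earth surface area of about 510 million km² (a rounded standard figure, not taken from the analysis) and treats every square as exactly 3 m × 3 m, which the real grid only approximates. Because any word may fill any position, including repeats as in atoms.atoms.hike, the number of ordered three-word addresses is n³ rather than n(n-1)(n-2).

```python
# Back-of-the-envelope check of the English dictionary-size claim.
# Assumptions (not from the original analysis): Earth's surface area is
# ~510 million km^2 and every grid square is exactly 3 m x 3 m.

EARTH_SURFACE_KM2 = 510_000_000  # rounded standard figure
SQUARE_SIDE_M = 3


def squares_needed(area_km2: int, side_m: int = SQUARE_SIDE_M) -> int:
    """Number of side_m x side_m squares needed to tile area_km2 (rounded down)."""
    area_m2 = area_km2 * 1_000_000
    return area_m2 // (side_m * side_m)


def addresses_with_repeats(n_words: int) -> int:
    """Ordered three-word addresses when any word may fill any position."""
    return n_words ** 3


def addresses_without_repeats(n_words: int) -> int:
    """Ordered three-word addresses if no word could appear twice."""
    return n_words * (n_words - 1) * (n_words - 2)


if __name__ == "__main__":
    needed = squares_needed(EARTH_SURFACE_KM2)
    english = addresses_with_repeats(40_000)
    print(f"3 m squares on Earth: ~{needed:.2e}")
    print(f"40,000-word addresses: {english:.2e}")
    print(f"Enough for the whole planet: {english >= needed}")
    # Allowing repeats adds exactly 3n^2 - 2n combinations.
    extra = english - addresses_without_repeats(40_000)
    print(f"Extra addresses from repeats: {extra:.2e}")
```

    Under these assumptions, 40,000³ = 64 trillion addresses comfortably exceeds the roughly 57 trillion squares needed, and allowing repeated words adds 3n² - 2n (about 4.8 billion) addresses beyond the no-repeat count.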

    Although a w3w address is essentially a location encoded as three integers and presented in the style of an Internet address, the linguistic aspects of using words instead of numbers have been carefully analyzed and optimized.
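    The list above mentions Soundex as the key used to match similar-sounding words. As an illustration only, the sketch below implements the classic American Soundex algorithm and uses it to flag candidate homophones in a word list. The function names and the idea of grouping words by code are our assumptions, not a description of the actual what3words pipeline. Soundex is deliberately coarse: it keeps the first letter verbatim, so night (N230) and knight (K523) are not matched, and it also groups words that are not homophones (Robert and Rupert are both R163), so flagged groups would still need review.

```python
from collections import defaultdict

# American Soundex digit groups; vowels, y, h and w get no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Four-character American Soundex code, e.g. 'there' -> 'T600'."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = [letters[0].upper()]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h/w do not separate consonants with the same digit
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code.append(digit)
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated digit after it counts
    return "".join(code).ljust(4, "0")


def candidate_homophones(words):
    """Hypothetical helper: group words sharing a Soundex code (groups of 2+ only)."""
    groups = defaultdict(list)
    for w in words:
        groups[soundex(w)].append(w)
    return {k: v for k, v in groups.items() if len(v) > 1}


if __name__ == "__main__":
    print(candidate_homophones(["there", "their", "sun", "son", "hike", "night", "knight"]))
```

    Here there/their and sun/son are flagged as candidates, while night/knight slip through because of the first-letter rule.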

    The w3w system has been optimized to make addresses easy to use and remember while minimizing possible errors. The only error-correction mechanism built into the system is a plausibility check. When a w3w address is entered on a device whose current location is known, the system checks the distance to the entered address. If that distance is too large, and greater than the distance to alternative addresses that sound or are spelled similarly, the user is offered an automatic correction.
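    The plausibility check described above can be sketched as follows. Everything here is an assumption for illustration: the 100 km threshold, the coordinates attached to the two example addresses from the list above (approximate placeholders, not real what3words data), and the way similar-sounding or similarly spelled alternatives are supplied. The analysis does not describe the real service's thresholds or lookup.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical, approximate coordinates for the two example addresses;
# placeholders for illustration, not real what3words data.
LOCATIONS = {
    "atoms.atoms.hike": (51.56, -0.12),  # "north London"
    "atom.atoms.hike": (40.7, -74.0),    # "New York"
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def suggest_correction(entered, device_pos, alternatives, locations=LOCATIONS, max_km=100.0):
    """Return a closer alternative if `entered` is implausibly far away, else None.

    `max_km` is a hypothetical threshold; `alternatives` stands in for the
    similar-sounding or similarly spelled addresses the real system would find.
    """
    best, best_km = None, haversine_km(*device_pos, *locations[entered])
    if best_km <= max_km:
        return None  # entered address is plausible; no correction offered
    for alt in alternatives:
        alt_km = haversine_km(*device_pos, *locations[alt])
        if alt_km < best_km:  # only suggest addresses nearer than the one entered
            best, best_km = alt, alt_km
    return best


if __name__ == "__main__":
    device = (51.5, -0.12)  # hypothetical device position in London
    print(suggest_correction("atom.atoms.hike", device, ["atoms.atoms.hike"]))
```

    With the device in London, entering atom.atoms.hike (thousands of kilometres away) yields the nearby atoms.atoms.hike as a suggestion, while entering atoms.atoms.hike itself passes the check. This sketch only compares distances; whether the real system also requires the alternative to fall within the threshold is not stated.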

    By reducing errors through this correction mechanism, w3w has the potential to become a more reliable alternative to alphanumeric codes. Even with UK postcodes, which have been in use for more than 50 years, people make mistakes when writing them in more than 10% of cases. Moreover, postcodes are generally checked only for existence, not for whether they match the location.

