Falsehoods Programmers Believe About Names, with Examples

Original author: Tony Rogers
In 2010, Patrick McKenzie wrote the now-famous article “Falsehoods Programmers Believe About Names,” listing 40 assumptions that do not always hold for human names.

Did programmers then sit down, think it over, and change the way computer systems handle names? Unfortunately, not really. We are still routinely asked to fill in online forms that require both a first name and a last name (in that order). Such systems still assume that our names can always be written in alphabetic characters, often in ASCII alone.

I suspect that Patrick's article had less impact on the industry than it deserved, partly because it gave no examples of the individual falsehoods. But as a former member of IBM's Global Name Management project, I can assure you that everything it says is true.

Don't believe me? In this article I will go through all 40 falsehoods and give an example (or two) of each from my experience in this field. Ready? Let's go!

1. Each person has one canonical full name.
Some people seem to believe that you are given a name and it never changes. But even in Western countries, a person may change their surname on marriage, and in the Catholic tradition a person may take an additional name at confirmation.

2. Each person has one full name that he uses.
The well-known science fiction writer John Wyndham (author of The Day of the Triffids) was born John Wyndham Parkes Lucas Beynon Harris, and he published books as John Beynon and Lucas Parkes as well as John Wyndham.

3. At this point in time, each person has one canonical full name.
An actor may have a stage name completely different from the name on their birth certificate, and may even hold a passport in the stage name.

4. At this point in time, each person has one full name that he uses.
Not so. Even in Western countries, a woman may keep her maiden name at work (where she is already known by it) while using her husband's surname socially and on legal documents such as mortgages and loans.

5. Each person has exactly N names, regardless of the value of N.
A traditional English name consists of a given name, a middle name and a surname, but it need not follow that pattern: a person may have no middle name, or several. Portuguese names, for example, have one or two given names and up to four surnames (up to six for a married woman), and the surnames can be multi-word, such as da Silva or dos Santos, or even Costa e Silva.
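One way to avoid assuming a fixed N is to store the name exactly as written and, if parts are needed at all, keep them as a list of whatever length the name has. The Python sketch below is a hypothetical data model: the PersonName class, its fields and the made-up Portuguese name are illustrations, not a standard schema.

```python
from dataclasses import dataclass, field


@dataclass
class PersonName:
    """Hypothetical name record that assumes no fixed number of parts."""

    full_name: str  # the name exactly as its owner writes it
    parts: list[str] = field(default_factory=list)  # zero or more parts, in written order


# A made-up Portuguese name: two given names plus a compound surname.
portuguese = PersonName(
    full_name="Maria Luísa da Costa e Silva",
    parts=["Maria", "Luísa", "da Costa e Silva"],
)

# A single name with nothing to split.
javanese = PersonName(full_name="Suharto", parts=["Suharto"])

print(len(portuguese.parts), len(javanese.parts))  # 3 1
```

How a name divides into parts is itself culture-specific, so a parts list is only as reliable as whoever filled it in; the full_name field is the one to trust.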

6. Names fit in a certain number of characters.
The famous artist usually known simply as Picasso had the full name Pablo Diego José Francisco de Paula Juan Nepomuceno María de los Remedios Cipriano de la Santísima Trinidad Mártir Patricio Ruiz y Picasso. Try fitting that into a 30-character form field...
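Even the "length" of a name is ambiguous. The sketch below uses only Python's standard library (length_report and the 30-character FIELD_LIMIT are illustrative) to count the same name three ways: code points after NFC normalization, code points after NFD normalization, and UTF-8 bytes. A limit expressed in one unit can reject a name that would fit in another.

```python
import unicodedata

FIELD_LIMIT = 30  # illustrative limit, like the form field mentioned above

PICASSO = (
    "Pablo Diego José Francisco de Paula Juan Nepomuceno María de los "
    "Remedios Cipriano de la Santísima Trinidad Mártir Patricio Ruiz y Picasso"
)


def length_report(name: str) -> dict[str, int]:
    """Measure a name's length in three different units."""
    nfc = unicodedata.normalize("NFC", name)  # é as a single code point
    nfd = unicodedata.normalize("NFD", name)  # é as e + combining acute accent
    return {
        "code_points_nfc": len(nfc),
        "code_points_nfd": len(nfd),
        "utf8_bytes": len(nfc.encode("utf-8")),
    }


print(length_report("Jos\u00e9"))
# {'code_points_nfc': 4, 'code_points_nfd': 5, 'utf8_bytes': 5}
print(len(PICASSO) > FIELD_LIMIT)  # True
```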

7. Names do not change.
We have already mentioned people who change their surname on marriage and Catholics who take a name at confirmation, so this is clearly false. People also often add a name, or change it completely, when converting to another religion: after converting to Islam, Cat Stevens became Yusuf Islam, and Cassius Clay became Muhammad Ali.

8. Names change, but only in certain limited cases.
In Thailand it is fairly common to change one's name to ward off bad luck, even without any specific misfortune prompting it. People also change their names when someone else with the same name becomes famous or notorious; a notable example is the many people who abandoned the surname Hitler.
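Falsehoods 1 to 4, 7 and 8 together suggest modelling names as a history rather than as a single field: each name has a kind of use and a period during which it applies, and several can apply at once. The Python sketch below is one hypothetical way to do that; NameRecord, names_on and the sample person (who married in 2015 but kept her maiden name at work) are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class NameRecord:
    """One name a person uses, with its kind and period of use."""

    name: str
    kind: str                        # e.g. "legal", "professional", "pen name"
    valid_from: Optional[date]       # None means unknown or since birth
    valid_to: Optional[date] = None  # None means still in use


def names_on(records: list[NameRecord], day: date) -> list[NameRecord]:
    """Return every name in use on a given day; there may be several."""
    return [
        r for r in records
        if (r.valid_from is None or r.valid_from <= day)
        and (r.valid_to is None or day < r.valid_to)
    ]


records = [
    NameRecord("Anna Schmidt", "legal", None, date(2015, 6, 1)),
    NameRecord("Anna Weber", "legal", date(2015, 6, 1)),
    NameRecord("Anna Schmidt", "professional", None),
]

print([(r.name, r.kind) for r in names_on(records, date(2010, 1, 1))])
print([(r.name, r.kind) for r in names_on(records, date(2020, 1, 1))])
```

Asking "what is this person's name?" then becomes "which names were in use, for which purpose, on that date?", a question that honestly allows more than one answer.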

9. Names are written in ASCII.
An obvious falsehood, if only because ASCII lacks the accented letters used in French and Portuguese names. Nor does it include the Greek alphabet used for Greek names or the Cyrillic letters used for Russian ones. Then there are scripts such as Devanagari for Indian names, Chinese characters (hanzi), Japanese characters (kanji), and many more.
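A quick way to see the problem is to test a few names against ASCII. The Python sketch below uses the built-in str.isascii() (Python 3.7 and later); the sample names and the two helper functions are illustrative. Every sample fails the ASCII check, while storing the text as Unicode and encoding it as UTF-8 round-trips without loss.

```python
SAMPLE_NAMES = ["José Saramago", "Ελένη", "Наталья", "देवी", "田中 太郎"]


def ascii_only(name: str) -> bool:
    """What an ASCII-only column or validator effectively checks."""
    return name.isascii()


def utf8_round_trip(name: str) -> bool:
    """Encode as UTF-8 bytes and decode again: nothing should be lost."""
    return name.encode("utf-8").decode("utf-8") == name


for name in SAMPLE_NAMES:
    print(f"{name}: ascii_only={ascii_only(name)}, utf8_round_trip={utf8_round_trip(name)}")
```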

10. Names are written in any single character set.
Some names mix scripts: kanji with Latin letters, hanzi with Latin letters, or Korean Hangul with Latin letters. Often this happens because a person has a “Western name” for the benefit of those who cannot pronounce the name in their native language.
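Mixed-script names are easy to spot once you look for them. Python's standard library does not expose the Unicode Script property, so the sketch below falls back on a rough heuristic: the first word of each letter's Unicode character name (LATIN, CJK, HANGUL, and so on). The scripts_in helper is hypothetical and good enough for a demonstration, not for production script detection.

```python
import unicodedata


def scripts_in(name: str) -> set[str]:
    """Rough guess at the scripts in a name (heuristic, not the Script property)."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]  # e.g. "LATIN CAPITAL LETTER T" -> "LATIN"
        for ch in name
        if ch.isalpha()  # skip spaces, hyphens, apostrophes and digits
    }


print(sorted(scripts_in("田中 Taro")))  # ['CJK', 'LATIN']
print(sorted(scripts_in("김 David")))   # ['HANGUL', 'LATIN']
```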

11. All names correspond to Unicode code points.
Unicode's developers keep adding code points for ever rarer characters. The vast majority of names can already be represented, but there are still exceptions, such as the symbol used by “The Artist Formerly Known as Prince.” Even setting such oddities aside, several scripts are not yet in Unicode. Perhaps the most realistic example is Aymara, a script for a language spoken by more than a million people in South America. Less realistic examples are Klingon and the scripts Tolkien invented for Middle-earth. In addition, Unicode includes only some of the Chinese and Japanese characters, and some of the missing ones are used in names.
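Two classes of characters cause trouble in practice. Characters outside the Basic Multilingual Plane, such as U+20BB7 𠮷 (a variant of 吉 found in some Japanese surnames), take two UTF-16 code units and four UTF-8 bytes, which breaks systems that assume otherwise. Characters that Unicode lacks are often stored as private-use code points, whose meaning is agreed only between particular systems. The Python sketch below flags both; unusual_characters is a hypothetical helper.

```python
import unicodedata


def unusual_characters(name: str) -> list[str]:
    """Flag characters that legacy systems often mishandle."""
    notes = []
    for ch in name:
        cp = ord(ch)
        if unicodedata.category(ch) == "Co":
            # Private use: the character's meaning is agreed outside Unicode.
            notes.append(f"U+{cp:04X}: private-use code point")
        elif cp > 0xFFFF:
            # Supplementary plane: 2 UTF-16 code units, 4 UTF-8 bytes.
            notes.append(f"U+{cp:04X}: outside the Basic Multilingual Plane")
    return notes


yoshida = "\U00020BB7田"  # 𠮷田, a variant spelling of the surname Yoshida
print(unusual_characters(yoshida))  # ['U+20BB7: outside the Basic Multilingual Plane']
print(len(yoshida), len(yoshida.encode("utf-16-le")) // 2)  # 2 3
```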

To complicate matters further, some languages have no written form at all, so there is no script, and hence no Unicode characters, of their own for them. Names in these languages can be written phonetically, but that is of limited use, because most people are not familiar with the phonetic alphabet.

12. Names are case-sensitive.
Many writing systems have no case at all, Chinese and Japanese among them. For them, the idea of upper- and lowercase letters simply does not apply.

13. Names are case-insensitive.
Some scripts, such as Latin, do have case. Worse, in some languages certain letters have a lowercase form but no traditional uppercase form (German ß is the classic example), so converting between cases is not always possible without loss.

Correct case can matter a great deal to some people, such as those with the surnames MacKenzie and Mackenzie, which are different names.

Correct case also matters for names such as Van Gogh, du Barry, da Costa, O'Brien and D'Agostino, and for names like Jean-Pierre.
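German ß shows concretely why case conversion is risky, and Python's built-in string methods are enough to demonstrate it (the surname Strauß is just a convenient sample). Upper-casing a name and then lower-casing it does not necessarily give the original back, and str.casefold(), which Python provides for caseless matching, deliberately treats Strauß and Strauss as equal, which may or may not be what the owners of those names want.

```python
original = "Strauß"

shouted = original.upper()   # full case mapping turns ß into "SS"
restored = shouted.lower()   # ...and lowering that gives "ss", not "ß"

print(shouted, restored, restored == original.lower())  # STRAUSS strauss False

# Caseless matching: useful for search, but it merges two distinct spellings.
print(original.casefold() == "Strauss".casefold())  # True
```

A reasonable pattern is to store the name exactly as entered and use case-folded copies only for matching.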

14. Sometimes there are prefixes or suffixes in names, but you can safely ignore them.
Nothing could be further from the truth. The Dutch name Peter van der Meer is not the same as Peter Meer, although van der is a prefix.

You might think of “Jr.” as a disposable suffix in the name Robert Downey Jr., but if you drop it, you are referring to his father, not to him.

In Arabic names, the suffix al-Din means “the faith” or “the religion,” and names such as Taj al-Din (“crown of the faith”) or Saif al-Din (“sword of the religion”) are not the same if you drop it. Likewise, the Italian surname Di Stefano is not the same as Stefano.

A Spanish woman with the surname “Viuda de de la Cruz” is the widow of a man surnamed de la Cruz. Dropping the prefixes changes the meaning of the name.
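It is tempting to "clean up" names by stripping particles and suffixes before comparing them. The deliberately naive Python sketch below (naive_key and its word lists are hypothetical) shows why that fails: it merges people who should stay distinct.

```python
PARTICLES = {"van", "der", "de", "la", "di"}  # hypothetical, incomplete
SUFFIXES = {"jr.", "sr.", "iii"}              # hypothetical, incomplete


def naive_key(name: str) -> str:
    """Anti-pattern: lowercase the name and drop particles and suffixes."""
    dropped = PARTICLES | SUFFIXES
    return " ".join(w for w in name.lower().split() if w not in dropped)


print(naive_key("Peter van der Meer"), "|", naive_key("Peter Meer"))
# peter meer | peter meer   (two different people, one key)
print(naive_key("Robert Downey Jr.") == naive_key("Robert Downey Sr."))
# True   (son and father merged)
```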

15. Names do not contain numbers.
Even setting aside dynastic numbering (Thurston Howell III, for example), a number sometimes becomes part of someone's legal name. Jennifer 8. Lee, for instance, chose 8 as her middle name because the number 8 is associated with good luck.
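Validation rules are where falsehoods like this one usually live. Below, a typical letters-only pattern rejects legitimate names, while a permissive check (a sketch, assuming the only things worth rejecting are blank input and control characters) accepts them. Both functions are hypothetical examples, not a recommended standard.

```python
import re
import unicodedata

# A typical (too strict) rule: ASCII letters, apostrophes, spaces and hyphens only.
NAIVE_PATTERN = re.compile(r"[A-Za-z' -]+")


def naive_is_valid(name: str) -> bool:
    return NAIVE_PATTERN.fullmatch(name) is not None


def permissive_is_valid(name: str) -> bool:
    """Reject only blank input and control characters (Unicode category Cc)."""
    if not name.strip():
        return False
    return all(unicodedata.category(ch) != "Cc" for ch in name)


for name in ["Jennifer 8. Lee", "Thurston Howell III", "O'Brien", "田中 太郎"]:
    print(f"{name}: naive={naive_is_valid(name)}, permissive={permissive_is_valid(name)}")
```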

16. Names are never written in ALL CAPITALS.
In some countries (especially French-speaking ones) it is customary to write a person's surname in capitals to make clear which part is the surname. The convention is so entrenched that writing the family name in lowercase can be considered impolite.

17. Names are never written entirely in lowercase.
The poet e e cummings preferred his name in lowercase, as does the singer k.d. lang. It is polite to follow the spelling that the owner of a name prefers.

There is also an Irish and British surname, ffrench, that is traditionally written with lowercase letters, although the tradition is being eroded by bad software that forces an initial capital.
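Automatic capitalization is a common culprit. Python's built-in str.title() stands in here for any "fix the capitals" routine, applied to names from this section such as the surname ffrench; note that it gets O'Brien right only by accident.

```python
NAMES = ["ffrench", "k.d. lang", "e e cummings", "van Gogh", "McKenzie", "O'Brien"]

# What an "auto-capitalize" step would store instead of the name as given.
results = {name: name.title() for name in NAMES}

for before, after in results.items():
    marker = "unchanged" if before == after else "CHANGED"
    print(f"{before!r:>16} -> {after!r:<16} {marker}")
```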

18. Names have an order. Picking any ordering scheme will automatically produce consistent ordering across all systems, as long as they all use the same scheme.
In the Netherlands, Vincent van Gogh is indexed and sorted under G, as “Gogh, Vincent van”; in Belgium, the same name is filed under V, as “Van Gogh, Vincent.” No single scheme can produce an order that everyone accepts. Many libraries base the rule on the person's place of birth (I would not want to see such a rule in software).
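The disagreement can be expressed as two sort keys. The sketch below is hypothetical: sort_key, the convention codes "nl" and "be", and the tiny particle list are illustrations, not a real collation standard.

```python
PARTICLES = ("van der ", "van ", "de ")  # hypothetical and very incomplete; longest first


def sort_key(surname: str, convention: str) -> str:
    """Sort key for a surname under two filing conventions.

    "nl": ignore a leading particle, so "van Gogh" files under G.
    "be": file the surname as written, so "Van Gogh" files under V.
    """
    key = surname.casefold()
    if convention == "nl":
        for particle in PARTICLES:
            if key.startswith(particle):
                return key[len(particle):]
    return key


surnames = ["van Gogh", "Goossens", "Vermeer"]
print(sorted(surnames, key=lambda s: sort_key(s, "nl")))  # ['van Gogh', 'Goossens', 'Vermeer']
print(sorted(surnames, key=lambda s: sort_key(s, "be")))  # ['Goossens', 'van Gogh', 'Vermeer']
```

The same list, sorted "correctly" by each convention, comes out in two different orders.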

19. A person's given name and surname are always different.
The Australian businessman and politician Benjamin Benjamin died in 1905. Jerome K. Jerome was an English writer best known for “Three Men in a Boat (To Say Nothing of the Dog).” Owen Owen was a Welshman who founded Owen Owen Ltd, which ran a chain of department stores. And that is without even touching the athletes and actors who have adopted such names.

20. People have a surname, or some equivalent shared by their relatives.
On the island of Java it was customary to give a person only a single name, with no surname. The Indonesian presidents Sukarno and Suharto, for example, had no surname.
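Code that splits a name into "first" and "last" often crashes outright on a single name. The first function below shows the familiar anti-pattern; the Person class is a hypothetical alternative that keeps the full name and treats the family name as optional.

```python
from dataclasses import dataclass
from typing import Optional


def split_naive(full_name: str) -> tuple[str, str]:
    """Anti-pattern: assumes every name has at least two words."""
    first, last = full_name.split(" ", 1)
    return first, last


@dataclass
class Person:
    full_name: str
    family_name: Optional[str] = None  # absent for many people, e.g. Suharto


try:
    split_naive("Suharto")
    naive_failed = False
except ValueError as exc:  # not enough values to unpack
    naive_failed = True
    print("naive split failed:", exc)

suharto = Person(full_name="Suharto")
print(suharto.family_name)  # None
```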

21. A person's name is unique.
Tell that to anyone named John Smith! My name is somewhat less common, yet I found someone with the same given name and surname working in the same industry in the same country (Australia).

22. A person's name is almost unique.
Even allowing for unusual spellings, it is usually easy to find people with the same full name: try searching for your own.

23. OK, OK, but names are rare enough that there won't be a million people with the same name.
The Chinese name Zhang Wei is reportedly carried by more than a quarter of a million people.

If we restrict ourselves to surnames, about 20% of South Koreans are named Kim. About 10% of people in northern China are named Wang, and more than 10% in southern China are named Chen. Li comes second in both regions, which makes it the most common surname in the country overall. And about 40% of Vietnamese people are named Nguyen.

Surnames, too, are far from unique.
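Since names are not unique, they make poor keys. This Python sketch (the Person class and the in-memory dictionaries are illustrative) shows a second Zhang Wei silently overwriting the first when records are keyed by name, while a generated identifier keeps both.

```python
import itertools
from dataclasses import dataclass, field

_ids = itertools.count(1)  # stand-in for a database sequence


@dataclass
class Person:
    name: str
    person_id: int = field(default_factory=lambda: next(_ids))


people = [Person("Zhang Wei"), Person("Zhang Wei")]

by_name = {p.name: p for p in people}     # the second Zhang Wei overwrites the first
by_id = {p.person_id: p for p in people}  # both records survive

print(len(by_name), len(by_id))  # 1 2
```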

24. My system will never have to deal with names from China.
Migration has spread the names of every culture to (almost) every country. The days when immigrants were given new names on entering a country are largely over (although Vietnam, for example, still requires applicants for citizenship to adopt a Vietnamese name). It is unrealistic to expect no names from other countries at all, although you may see them only in transliterated form.

So a Chinese name such as that of the actor Chow Yun-fat may appear in your system as Chow Yun-Fat, Chow Yun-fat, or even Yun Fat Chow (Chow is the family name).

25. Or Japan.
See above.

26. Or Korea.
See above.

27. Or Ireland, Great Britain, the USA, Spain, Mexico, Brazil, Peru, Sweden, Botswana, South Africa, Trinidad, Haiti, France, or the Klingon Empire, all of which have “weird” naming schemes.
See above.

28. The Klingon Empire was a joke, right?
It is hard to find examples of people officially using Klingon names, but why not? If we build a system that supports other cultures (for example, by allowing an apostrophe, as in O'Brien), it will support Klingon names with no extra work.

29. To hell with cultural relativism! People in my society, at least, agree on one commonly accepted standard for names.
Will your software only ever deal with people who were named in your own society?

30. There exists an algorithm that transforms names and can be reversed losslessly. (Yes, yes, you can do it if your algorithm returns its input unchanged. Award yourself a medal.)
No algorithm (other than keeping the original form) can transform a name in a guaranteed reversible way.
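A common "normalization" shows why: stripping accents after Unicode canonical decomposition (NFD) maps different surnames to the same output, so no function can tell which one you started with. The sketch below uses only Python's unicodedata module; strip_accents is a hypothetical helper. It also leaves letters such as ø untouched, because they have no decomposition, so it is not even a consistent transliteration.

```python
import unicodedata


def strip_accents(name: str) -> str:
    """Lossy cleanup: decompose (NFD), then drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Two different surnames collapse to one output, so the step cannot be inverted.
print(strip_accents("Müller"), strip_accents("Muller"))  # Muller Muller
print(strip_accents("Søren"))                            # Søren (ø has no decomposition)
```

If a folded or transliterated form is needed for search, store it alongside the original rather than instead of it.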

31. I can safely assume that this dictionary of bad words contains no people's surnames.
This is a common mistake: many “bad words” are harmless in other languages, and some are used in names. Moreover, not every society restricts which words may be used as names, so it is entirely possible that someone was legitimately given such a name in a jurisdiction that allows it.
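Substring-based filters make this worse, a failure widely known as the Scunthorpe problem. In the Python sketch below, the one-word BLOCKLIST and the naive_is_offensive function are hypothetical; the point is that perfectly ordinary names trip the filter.

```python
BLOCKLIST = {"ass"}  # hypothetical one-word "obscenity" list


def naive_is_offensive(name: str) -> bool:
    """Anti-pattern: flag a name if any blocked string occurs anywhere in it."""
    lowered = name.casefold()
    return any(word in lowered for word in BLOCKLIST)


for name in ["Hassan", "Cassidy", "Bassett", "Lee"]:
    print(name, naive_is_offensive(name))
# Hassan True, Cassidy True, Bassett True, Lee False
```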

32. Names are given to people at birth.
Most countries register births, but not all of them do so equally effectively.

The exact rules vary by jurisdiction, but some delay in registering a birth is always allowed. The permitted delay ranges from three weeks (Scotland) to two months (Australia), and some places allow even longer.

A baby's name may be recorded when the birth is registered, but this does not always happen: in some places children are still registered with names such as Baby Boy or Baby Girl, for example when the parents are having trouble choosing a name or when the child is a foundling.

33. OK, maybe not at birth, but soon afterwards.

34. OK, OK, within a year or so.

35. Five years?

36. You're kidding, right?
There are cultures in which a child is not given a name until puberty. Until then, the child may go by a “milk” name or some other temporary name.
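For records that must exist before a name does, the given name can simply be optional. The Python sketch below is a hypothetical model (Child, display_name and the placeholder format are illustrative); it falls back to a provisional name such as "Baby Girl", and then to a neutral placeholder, rather than inventing a name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Child:
    record_id: int
    given_name: Optional[str] = None        # may not exist yet
    provisional_name: Optional[str] = None  # e.g. "Baby Girl" or a temporary name


def display_name(child: Child) -> str:
    """Prefer the given name, then a provisional one, then a neutral placeholder."""
    return (
        child.given_name
        or child.provisional_name
        or f"(no name yet; record {child.record_id})"
    )


print(display_name(Child(1, given_name="Maria")))
print(display_name(Child(2, provisional_name="Baby Girl")))
print(display_name(Child(3)))
```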

37. Two different systems that hold the same person's name will record it the same way.
If that were true, there would be no market for software that reconciles different databases.

In my own case, some systems held my full official name, including my middle name, while others had only my given name and surname, or an abbreviated given name and my surname. And mine is a simple case. My wife appears in some systems under her maiden name and in others under her married name, with or without her full given name, with or without her middle name, and with either of the two spellings of her short name.

38. Two different data entry operators, given a person's name, will necessarily enter exactly the same characters, provided the system is well designed.
Imagine someone typing a name they hear over the phone: Thomson or Thompson? Johnson, Johnston, Johnstone or Johnsson?
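Reconciliation software therefore looks for likely matches rather than exact ones. A minimal sketch using Python's standard difflib.SequenceMatcher: pairs whose similarity ratio reaches a threshold are flagged for a human to review. The 0.8 threshold and the maybe_same helper are assumptions for illustration; production name matching is usually far more elaborate.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.8  # assumed cut-off; would need tuning against real data


def maybe_same(a: str, b: str, threshold: float = THRESHOLD) -> bool:
    """Flag two spellings as a possible match for human review (not proof of identity)."""
    ratio = SequenceMatcher(None, a.casefold(), b.casefold()).ratio()
    return ratio >= threshold


pairs = [("Thomson", "Thompson"), ("Johnson", "Johnstone"), ("Johnson", "Smith")]
for a, b in pairs:
    print(a, b, maybe_same(a, b))
```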

39. People whose names break my system are weird outliers. They should have normal, acceptable names, like 田中 太郎.
No, your system is poorly designed.

Incidentally, the name above often turns up as a generic character name in anime and manga, and real people have borne it.

40. People have names.
This is perhaps the hardest case to illustrate convincingly. There was an isolated culture in which nobody had a name: people referred to one another by kinship terms, such as “my mother's older sister.”


So there we have it: examples for (almost) all forty points in Patrick McKenzie's article “Falsehoods Programmers Believe About Names.” If that feels like information overload, here is a summary of the most important things to keep in mind when designing a system that processes names:

  • Avoid terms such as “first name” or “Christian name”; “given name” is the most widely applicable term.
  • Remember that about half the world writes the family name first.
  • Many cultures use something other than a single surname shared by all family members: some use patronymics or matronymics (sometimes more than one), and others have no family name at all.
  • Punctuation can be a vital part of a name: the Irish surname O'Hara is not the same as the Japanese surname Ohara. Likewise, Jean-Pierre is not the same as Jean Pierre: Jean-Pierre is a single name, while Jean Pierre is two separate names.
  • Spaces do not necessarily separate the parts of a name: de la Cruz is one surname, not three, and Chinese names written in hanzi have no space between the family name and the given name.
  • Capitalization is less obvious than it seems: in the Netherlands, the surname van der Meer is capitalized (Van der Meer) when used without a given name, but lowercase after one (Peter van der Meer).
  • Use the whole name rather than breaking it apart. For example, don't address a person as “Mr. last-word-in-name”, which goes wrong in several cases:
    • If the family name comes before the given name (as in Chinese names).
    • If the correct part to use is a patronymic that does not come last.
    • If the surname consists of more than one word, such as the Spanish surname de la Torre.
    • If the name includes a suffix, such as “Jr.”
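The "Mr. last-word-in-name" anti-pattern takes one line to write, which is why it is so common. The Python sketch below (naive_salutation and the sample names are illustrative) reproduces three of the failure cases just listed. A safer alternative is to ask people how they wish to be addressed and store the answer as a separate field.

```python
def naive_salutation(full_name: str) -> str:
    """Anti-pattern: 'Mr.' followed by the last space-separated word."""
    return "Mr. " + full_name.split()[-1]


for name in ["Zhang Wei", "Juan de la Torre", "Robert Downey Jr."]:
    print(f"{name} -> {naive_salutation(name)}")
# Zhang Wei -> Mr. Wei            (Zhang is the family name)
# Juan de la Torre -> Mr. Torre   (the surname is de la Torre)
# Robert Downey Jr. -> Mr. Jr.
```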

Finally, I strongly recommend reading the W3C's short guide to personal names, “Personal names around the world.”

