Laurel / Yanny: an audio version of the blue-and-gold dress

    Three years ago, the dress that divided the Internet was already discussed here. A few days ago a similar illusion surfaced, one that is even more interesting and harder to explain. Which name do you hear in this audio clip: “Yanny” or “Laurel”?


    As it turned out, the results not only differ from person to person, but for a single person can also depend on the audio equipment used. All week linguists have been arguing about the causes of the illusion, staring at the spectrogram of this two-second clip. Here it is:

    For those seeing a spectrogram of sound for the first time: time is plotted on the horizontal axis and frequency on the vertical axis, and the brightness of each point corresponds to the amplitude with which an “imaginary tuning fork” of that frequency vibrates at that moment. In a spectrogram of speech, “formants” are always visible: dark horizontal lines, winding and broken. Each formant corresponds to one of the resonant frequencies of the vocal tract, and its vertical wobbles correspond to changes in that resonant frequency during speech.
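
    The “imaginary tuning fork” picture can be made concrete with a short-time Fourier transform. The following is only a minimal sketch on a synthetic signal, not the tool the linguists used: the sample rate, frame length, hop size, and the helper names `spectrogram` and `peak_hz` are all assumptions chosen for illustration, and the slow textbook DFT is used so that nothing beyond the Python standard library is needed.

```python
import cmath
import math

SAMPLE_RATE = 8000   # Hz, assumed for this toy example
FRAME_LEN = 64       # samples per analysis frame (125 Hz per frequency bin)
HOP = 32             # step between frame starts, in samples


def spectrogram(samples, frame_len=FRAME_LEN, hop=HOP):
    """Return one list of magnitudes (one per frequency bin) for each time frame."""
    # A Hann window tapers each frame so its edges do not smear the spectrum.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_len)]
        # Bin k measures how strongly the "tuning fork" tuned to
        # k * SAMPLE_RATE / frame_len Hz responds to this frame.
        magnitudes = [
            abs(sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(magnitudes)
    return frames


def peak_hz(frame, sample_rate=SAMPLE_RATE, frame_len=FRAME_LEN):
    """Frequency of the strongest bin in one frame."""
    k = max(range(len(frame)), key=frame.__getitem__)
    return k * sample_rate / frame_len


# Synthetic signal: a 1000 Hz tone for the first half, 2000 Hz for the second.
signal = [math.sin(2 * math.pi * (1000 if t < 2000 else 2000) * t / SAMPLE_RATE)
          for t in range(4000)]
spec = spectrogram(signal)
print(peak_hz(spec[0]), peak_hz(spec[-1]))
```

    Each inner list is one vertical slice of the picture. In this toy signal the strongest bin moves from 1000 Hz in the first frame to 2000 Hz in the last one, which on a plotted spectrogram would look like a horizontal line that jumps upward halfway through.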

    As Suzy Styles explains, in the low-frequency region up to 5 kHz human speech has three formants, which are usually enough to recognize the sounds being pronounced. These three formants correspond to the vertical (F1) and horizontal (F2) position of the tongue and to the position of the lips (F3). Suzy links to a Max Planck Society video in which a speaker inside an MRI scanner pronounces all the vowels and all the consonants in turn, so that the position of the speech organs can be observed directly for each sound.

    According to Suzy, this is exactly where picking out the formants runs into trouble: the dark areas on the Yanny / Laurel spectrogram form a pattern of more than three bands that branch and intersect:

    In particular, the lowest band (F1) can be read as either a “hump up” or a “hump down”:

    The first reading corresponds to the vowel sequence “high - low - high”, i.e. [jæ-ɪ-]; the second to “low - high - mid”, i.e. [ao-ə-]. (Suzy's picture contains an obvious mistake: [u] is a high vowel and cannot be at the end of the second sequence.) From F2 it is clear that the vowel sequence should be “front - central - front”, i.e. again [jæ-ɪ-]. But if the listener's audio system suppresses frequencies between 2 and 3 kHz, the listener “fills in” F2 based on F1 and arrives at a sequence of back and central vowels, i.e. [-o-ə-]:
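
    To get a feel for what “suppressing frequencies between 2 and 3 kHz” does to a signal, here is a minimal sketch of an idealized band-stop filter: transform to the frequency domain, zero the bins inside the band, and transform back. It is an illustration under stated assumptions, not a model of any real playback device (real speakers attenuate gradually rather than cutting a band out completely), and the names `dft`, `idft`, and `suppress_band`, as well as the test tones, are made up for the example.

```python
import cmath
import math


def dft(x):
    """Textbook discrete Fourier transform (slow, but needs no libraries)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse transform; the input is assumed to come from a real signal."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def suppress_band(samples, sample_rate, lo_hz, hi_hz):
    """Remove every frequency component between lo_hz and hi_hz."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # Bins above n/2 mirror the lower ones (negative frequencies),
        # so both halves must be cleared to keep the output real.
        freq = min(k, n - k) * sample_rate / n
        if lo_hz <= freq <= hi_hz:
            spectrum[k] = 0
    return idft(spectrum)


SAMPLE_RATE = 8000  # Hz, assumed
N = 256             # number of samples in the toy signal

# A stand-in for "F1" at 500 Hz and a stand-in for "F2" at 2500 Hz.
low = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(N)]
high = [0.5 * math.sin(2 * math.pi * 2500 * t / SAMPLE_RATE) for t in range(N)]
mixed = [a + b for a, b in zip(low, high)]

filtered = suppress_band(mixed, SAMPLE_RATE, 2000, 3000)
worst_error = max(abs(f - l) for f, l in zip(filtered, low))
print(worst_error < 1e-6)  # only the 500 Hz component is left
```

    After filtering, the 2500 Hz component is gone and only the low tone remains, which is the situation described above: the evidence for F2 has been removed and the listener has to guess it from what is left.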

    Suzy sums up her analysis: instead of three clear formants, we see a jumble of dark spots that can be deciphered in one of two ways:

    Carolyn McGettigan offers a slightly different analysis. Once it became known that the “ambiguous sound” had not been engineered by devious linguists to mock ordinary people, but had been taken from an online dictionary site, played through mediocre speakers, and re-recorded with a mediocre microphone, Carolyn compared the spectrogram of the original sound from the site with that of the resulting “sound illusion”:



    In the first sound, F1 and F2 are clearly visible but very close together; in the second, besides some added noise, F1 and F2 have merged into a single formant, and the original F3 is now perceived as F2. Carolyn notes that a “hump down” in F3 is a hallmark of the English sound [ɹ]; in the resulting sound it is instead perceived as a “hump down” in F2, i.e. as the vowel sequence “front - central - front”, the same [jæ-ɪ-] as before.

    Besides these two explanations of the illusion, linguists have proposed several more. Benjamin Munson noted that the higher frequency ranges (5-9, 9-13, 13-17 kHz) contain weaker copies of F1-F3:

    There are no such “repeating formants” in human speech, so Benjamin blames them for the illusion. (Most likely, this is an artifact of the audio compression used for the “ambiguous sound.”)

    The NY Times (the discussion of the illusion has reached even there!) likewise blames the illusion on the boost of high frequencies that occurred during re-recording:


    Moreover, their article includes an “interactive illusion”: a frequency filter whose settings can be changed smoothly with a slider, so that anyone can check for themselves that amplifying the low frequencies and suppressing the high ones turns the sound into Laurel, while doing the opposite turns it into Yanny.
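
    The idea of such a slider can be sketched in a few lines. This is not the NY Times implementation, which is not described here; it is one simple way to get the same kind of effect, assuming a one-pole low-pass filter to split the signal into a low and a high band and a slider that blends between them. The 1500 Hz crossover, the two test tones, and the names `split_bands`, `tilt`, and `tone_level` are all hypothetical.

```python
import cmath
import math


def split_bands(samples, sample_rate, cutoff_hz):
    """Split a signal into low and high bands with a one-pole low-pass filter."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    low, high, state = [], [], 0.0
    for x in samples:
        state += alpha * (x - state)   # smoothed, low-frequency part
        low.append(state)
        high.append(x - state)         # the remainder is the high-frequency part
    return low, high


def tilt(samples, sample_rate, slider, cutoff_hz=1500.0):
    """slider = 0.0 keeps only the low band, 1.0 keeps only the high band."""
    low, high = split_bands(samples, sample_rate, cutoff_hz)
    return [(1.0 - slider) * l + slider * h for l, h in zip(low, high)]


def tone_level(samples, sample_rate, freq_hz):
    """Estimate the amplitude of one frequency by correlating with it."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * t / sample_rate)
              for t, x in enumerate(samples))
    return 2.0 * abs(acc) / len(samples)


SAMPLE_RATE = 16000  # Hz, assumed
# Toy signal: one low tone (200 Hz) plus one high tone (4000 Hz).
signal = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
          + math.sin(2 * math.pi * 4000 * t / SAMPLE_RATE)
          for t in range(1600)]

# Print the level of each tone at three slider positions.
for slider in (0.0, 0.5, 1.0):
    out = tilt(signal, SAMPLE_RATE, slider)
    print(slider,
          round(tone_level(out, SAMPLE_RATE, 200), 2),
          round(tone_level(out, SAMPLE_RATE, 4000), 2))
```

    Moving the slider toward 0 leaves the low tone almost untouched and weakens the high one; moving it toward 1 does the reverse. A one-pole filter separates the bands only gently, so a real implementation would likely use a steeper filter.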

    While I am at it, I will also mention my own interactive acoustic-phonetic toy, thrown together in a hurry and inspired by a long-standing quest from Meklon. (I am no front-end developer, and I will gladly accept a PR with a friendlier UI.) It lets you draw on a spectrogram and hear the resulting sound in real time; in particular, you can take an existing sound and try to trace its formants, add new ones, or selectively erase a frequency range.

    What do you hear?

    • Yanny: 55.8% (945 votes)
    • Laurel: 48.6% (824 votes)
