Vision modeling. Part one. Eye tour
1 Eye tour - 2 Perception - 3 Geometry of vision - 4 Eye tracking - 5 How to catch a look - 6 Modeling eye tracking
To understand how a person perceives an image, we have to start with the organ of vision itself: the eye. Besides the anatomical structure of the eye, an important point for everything that follows is the limit of the eye's resolution, which I will describe here. If you already know all of this, just skim the highlighted passages and go straight to the second part.
Anatomy of the eye
How the eye is arranged you probably all remember from your school biology textbook, but here I will also tell you some genuinely surprising things that, for some reason, are never mentioned at school. First, though, a reminder of the structure of the eye (illustrations taken from David Hubel's book "Eye, Brain, and Vision").
The eye is held in the orbit by a group of six muscles, which turn it up and down and from side to side whenever you need to shift your gaze.
The light entering the eye passes first through the cornea (which provides about 70% of the total refraction), then through the pupil, the analogue of a camera's aperture, whose size is controlled by a group of radial and circular muscles, and finally through the lens, which provides the final focusing on the objects in view. The lens is a gelatinous cushion squeezed by radial (ciliary) muscles; as it is compressed, it changes shape, and with it the degree of refraction and the focal distance.
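To make the focusing arithmetic concrete, here is a minimal sketch of accommodation using the thin-lens equation 1/f = 1/d_o + 1/d_i. The fixed 17 mm lens-to-retina distance is an assumption borrowed from a simplified "reduced eye" model, not a figure from this article.

```python
# A minimal sketch of accommodation via the thin-lens equation
# 1/f = 1/d_o + 1/d_i. The fixed image distance (~17 mm) is an
# assumption from a simplified "reduced eye" model, not a measurement.

IMAGE_DISTANCE_M = 0.017  # lens-to-retina distance, assumed fixed

def required_focal_length(object_distance_m: float) -> float:
    """Focal length the lens must take on to focus an object on the retina."""
    return 1.0 / (1.0 / object_distance_m + 1.0 / IMAGE_DISTANCE_M)

for d in (1e9, 1.0, 0.25):  # far away, 1 m, reading distance
    f = required_focal_length(d)
    print(f"object at {d:>10g} m -> focal length {f * 1000:.2f} mm "
          f"({1.0 / f:.1f} dioptres)")
```

Running it shows the focal length shortening (refractive power rising) as the object comes closer, which is exactly what the compression of the lens accomplishes.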
All of this serves one purpose: to project an image of the visible world onto the retina. The retina is, in fact, a part of the brain that separated from it at an early stage of development but remains tightly connected to it by the bundle of fibers of the optic nerve.
The retina is not merely the analogue of a camera sensor, converting light into electrical impulses: it performs the primary processing of the incoming image before that image ever reaches the visual cortex.
The retina itself consists of three layers of nerve cells, and the actual light-sensing photoreceptors (the rods and cones) make up the third, outermost layer on its back surface.
Thus, to reach the receptors, light must first pass through the other two layers of nerve cells. Behind the photoreceptors lies a coating of melanin (a black pigment) that plays the same role as the blackened interior of a camera body: without it, light passing through the layer of rods and cones would travel on toward the brain, be reflected from it, and come back, spoiling our picture of the world in every way.
Photoreceptors
Photoreceptors come in two forms:
- Rods are receptors sensitive to very low levels of light (mainly in the blue-green part of the spectrum) and to very subtle differences in illumination; at high light levels they saturate and become useless. Rods are responsible for night, or scotopic, vision;
- Cones are receptors that require much more light to operate and respond to much coarser gradations of illumination. Cones are responsible for daytime, or photopic, vision (a toy model of this division of labor is sketched right after this list).
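One standard way to model why rods drop out in bright light is the Naka-Rushton saturation function, R = I^n / (I^n + sigma^n). The rod and cone sigma values below are arbitrary illustrative numbers, chosen only so that the rod curve saturates several orders of magnitude earlier; this is a sketch, not physiology.

```python
import numpy as np

# Toy receptor responses via the Naka-Rushton saturation function
# R = I^n / (I^n + sigma^n). The sigma values are arbitrary, picked only
# so the rod curve saturates long before the cone curve does.

def naka_rushton(intensity, sigma, n=1.0):
    return intensity**n / (intensity**n + sigma**n)

intensities = np.logspace(-2, 6, 9)      # light intensity, arbitrary units
rod_response = naka_rushton(intensities, sigma=1.0)      # saturates early
cone_response = naka_rushton(intensities, sigma=1000.0)  # needs more light

for i, r, c in zip(intensities, rod_response, cone_response):
    print(f"I={i:>12.2f}  rod={r:5.3f}  cone={c:5.3f}")
```

At high intensities the rod response is pinned near 1.0, so it can no longer signal any gradations, while the cone response is still climbing through its useful range.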
Since our topic is the analysis of images on a computer monitor, I will say no more about scotopic vision and the rods, and will focus entirely on photopic vision and the cones.
Cones come in three types, according to the pigment they contain, and each type is responsible for perceiving its own band of the spectrum. The types are conventionally called "blue", "green", and "red", although in reality their sensitivity spectra extend well beyond those colors. Consequently, many parts of the spectrum are picked up by two types of cones at once: yellow, for example, excites both the "red" and the "green" cones, and it is this combined excitation that produces the sensation of yellow in us.
(More accurate plots of the cone sensitivity curves can be found here.)
There are roughly equal numbers of cones of all three types, but since yellow is perceived by two cone types at once, twice as many cones respond to it, and so yellow appears far more intense to us than blue. The same applies to some other colors (see the diagram below).
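Here is a rough numerical sketch of the two-cones-per-color idea. The cone curves are approximated as Gaussians; the peak wavelengths (about 420, 534, and 564 nm) are roughly realistic, but the bandwidths are arbitrary assumptions, so treat this as an illustration rather than colorimetry.

```python
import numpy as np

# Cone sensitivities approximated as Gaussians. Peak wavelengths are
# roughly realistic; the 45 nm bandwidth is an arbitrary assumption.

def cone_response(wavelength_nm, peak_nm, width_nm=45.0):
    return np.exp(-((wavelength_nm - peak_nm) / width_nm) ** 2)

PEAKS = {"blue (S)": 420.0, "green (M)": 534.0, "red (L)": 564.0}

for stimulus, wl in (("blue light, 450 nm", 450), ("yellow light, 580 nm", 580)):
    responses = {name: round(cone_response(wl, p), 2) for name, p in PEAKS.items()}
    print(stimulus, "->", responses)
```

Yellow light lands squarely on both the "green" and "red" curves while barely touching the "blue" one, whereas blue light excites essentially a single cone type.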
Retinal Pixels
All photoreceptors are grouped by ganglion cells into blocks called receptive fields; each ganglion cell is associated with one receptive field. A single photoreceptor can belong to several receptive fields at once, which is why the fields of two neighboring ganglion cells overlap by 70-80%. In the simplest view, a receptive field resembles a pixel of a camera's photosensitive sensor. But it is not that simple!
First, receptive fields vary in size across the retina: in the fovea, the region of the retina with the greatest visual acuity, a receptive field measures 1-2 mm (corresponding to 2-3 minutes of arc), while toward the periphery it grows to as much as 5 mm!
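A common simplification in vision modeling is to let receptive-field size grow roughly linearly with eccentricity (angular distance from the fovea). The coefficients in this sketch are hypothetical placeholders chosen to show the shape of the relationship, not to reproduce the figures quoted above.

```python
# A common modeling simplification: receptive-field size grows roughly
# linearly with eccentricity. Both coefficients below are hypothetical
# placeholders, not physiological measurements.

FOVEAL_SIZE_ARCMIN = 2.5   # assumed field size at the center of gaze
GROWTH_PER_DEGREE = 1.8    # assumed growth per degree of eccentricity

def receptive_field_size(eccentricity_deg: float) -> float:
    """Approximate receptive-field diameter (arcmin) at a given eccentricity."""
    return FOVEAL_SIZE_ARCMIN + GROWTH_PER_DEGREE * eccentricity_deg

for ecc in (0, 5, 20, 40):
    print(f"{ecc:>2} deg from fovea -> ~{receptive_field_size(ecc):.1f} arcmin")
```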
Secondly, the "pixels" of the receptive fields differ from camera pixels in that they come in two varieties: some respond to "on", others to "off". That is, some fields react exclusively to a change in illumination from darker to brighter, and others to the reverse.
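A minimal sketch of the on/off split: both unit types watch the same change in illumination, but the on-unit fires for dark-to-bright steps and the off-unit for bright-to-dark ones. Half-wave rectification of the luminance change is a standard modeling idiom; the names and numbers here are illustrative.

```python
import numpy as np

# "On" and "off" units as half-wave rectifiers of the luminance change:
# the on-unit passes only brightening, the off-unit only darkening.

def on_response(delta_luminance):
    return np.maximum(delta_luminance, 0.0)    # dark-to-bright only

def off_response(delta_luminance):
    return np.maximum(-delta_luminance, 0.0)   # bright-to-dark only

luminance = np.array([0.2, 0.2, 0.8, 0.8, 0.3, 0.3])  # step up, then down
delta = np.diff(luminance)

print("on :", on_response(delta))   # responds only at the brightening step
print("off:", off_response(delta))  # responds only at the darkening step
```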
Thirdly, the fields differ in size according to their type (conventionally, P fields and M fields). M fields are larger and react quickly, but because of their size the picture they transmit has much lower resolution than the picture from P fields. P fields are smaller, so their picture is more precise, but they react slowly. Thus M fields register movement and answer the question "where is the object?" (perception of motion and depth), while P fields handle the color, shape, and details of the visual scene, answering the question "what is the object?" (color vision and acuity). M fields deal with soft, low contrast, P fields with sharp, high contrast. The longer the gaze stays in one place, the greater the role of the P fields becomes.
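The big-and-fast versus small-and-slow tradeoff can be caricatured in a few lines: pooling width trades spatial detail for coverage, and temporal smoothing trades reaction speed for stability. All numbers below are illustrative assumptions, not physiological measurements.

```python
import numpy as np

# A toy contrast of M and P fields: M is big and fast (coarse picture,
# quick reaction), P is small and slow (fine picture, sluggish reaction).

def spatial_pooling(image_row, width):
    """Average over a window of receptors: wider pooling = blurrier picture."""
    kernel = np.ones(width) / width
    return np.convolve(image_row, kernel, mode="same")

def temporal_response(frames, inertia):
    """Exponential smoothing over time: higher inertia = slower reaction."""
    state, trace = 0.0, []
    for f in frames:
        state = inertia * state + (1.0 - inertia) * f
        trace.append(round(state, 2))
    return trace

line = np.zeros(15)
line[7] = 1.0                                        # a thin bright line
print("P sees :", np.round(spatial_pooling(line, 3), 2))  # line stays crisp
print("M sees :", np.round(spatial_pooling(line, 7), 2))  # line smeared out

onset = [0, 0, 1, 1, 1, 1]                           # light suddenly appears
print("M reacts:", temporal_response(onset, inertia=0.2))  # catches on fast
print("P reacts:", temporal_response(onset, inertia=0.7))  # lags behind
```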
Fourthly, these "pixels" have another trick: while the pixels of a camera sensor register nothing but changes in color, receptive fields can also respond to lines, stripes, and rectangular segments with clear edges! That is, in addition to signaling that the color or illumination has changed, these "souped-up pixels" also report that there is a straight line, how long it is (whether it runs beyond the field of view or its end is visible), and even its direction (angle)!
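Here is a sketch of how a "souped-up pixel" could report not just a change but a line and its orientation: correlate a small patch with a bank of oriented templates and see which one responds most. Real retinal and cortical circuitry is far richer; the tiny kernels below are illustrative assumptions.

```python
import numpy as np

# Orientation detection as template matching: each kernel is excited by
# a bright line of one orientation and suppressed elsewhere. The kernels
# are illustrative toys, not measured receptive-field profiles.

TEMPLATES = {
    "horizontal": np.array([[-1, -1, -1],
                            [ 2,  2,  2],
                            [-1, -1, -1]], float),
    "vertical":   np.array([[-1,  2, -1],
                            [-1,  2, -1],
                            [-1,  2, -1]], float),
    "diagonal":   np.array([[ 2, -1, -1],
                            [-1,  2, -1],
                            [-1, -1,  2]], float),
}

patch = np.eye(3)  # a 3x3 patch containing a diagonal bright line

responses = {name: float((patch * k).sum()) for name, k in TEMPLATES.items()}
best = max(responses, key=responses.get)
print(responses)               # each unit's excitation
print("reported line:", best)  # -> "diagonal"
```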