
Writing a shader in AGAL

This article focuses on AGAL (Adobe Graphics Assembly Language), the language for writing shaders. It is assumed that the reader is familiar with the basic principles of modern realtime 3D graphics, and ideally has experience with OpenGL or Direct3D. For everyone else, here is a short refresher:
- in every frame everything is rendered from scratch; approaches that partially redraw the screen are strongly discouraged
- 2D is a special case of 3D
- the video card can rasterize triangles and nothing else
- triangles are built from vertices
- each vertex carries attributes (position, normal, weight, etc.)
- the order in which vertices form triangles is given by indices
- vertex and index data are stored in vertex and index buffers respectively
- a shader is a program executed by the video card
- each vertex passes through the vertex shader, and each rasterized pixel through the fragment (pixel) shader
- the video card is poor at integers but works fine with 4D vectors
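As a minimal illustration of the last few points, here is how a single textured triangle might be laid out in vertex and index buffers. Plain Python lists stand in for the Stage3D buffer objects, and the interleaved attribute layout is an assumption chosen for illustration:

```python
# Each vertex: x, y, z position followed by u, v texture coordinate.
# This interleaved layout is one common choice, not the only one.
vertices = [
    -1.0, -1.0, 0.0,  0.0, 1.0,  # vertex 0
     1.0, -1.0, 0.0,  1.0, 1.0,  # vertex 1
     0.0,  1.0, 0.0,  0.5, 0.0,  # vertex 2
]
indices = [0, 1, 2]  # one triangle, defined by vertex indices

STRIDE = 5  # floats per vertex

def attribute(vertex_index, offset, size):
    """Extract one attribute (e.g. position or uv) of a vertex."""
    base = vertex_index * STRIDE + offset
    return vertices[base:base + size]
```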
Syntax
The current AGAL implementation is restricted to a cut-down Shader Model 2.0, i.e. to the hardware feature set of roughly 2005. It is worth remembering that this limits only the capabilities of the shader program, not the performance of the hardware. It is possible that in future versions of Flash Player the bar will be raised to SM 3.0, letting us render to several textures at once and sample textures directly from the vertex shader, but given Adobe's policy, this is unlikely to happen before the next generation of mobile devices.
Any AGAL program is essentially low-level assembly. The language itself is very simple, but requires a fair amount of care. Shader code is a sequence of instructions of the form:
opcode [dst], [src1], [src2]
which loosely reads as "execute the opcode command with parameters src1 and src2, writing the result to dst". A shader can contain up to 256 instructions. The dst, src1 and src2 operands refer to registers: va, vc, fc, vt, ft, op, oc, v, fs. Each of these, with the exception of fs, is a four-component (xyzw or rgba) vector. Individual components can be addressed, including swizzling (reordering):
dp4 ft0.x, v0.xyzw, v0.yxww
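Swizzling is easiest to understand outside the GPU; a small Python sketch of what a pattern like `yxww` does to a 4-component vector (the names here are illustrative, not AGAL):

```python
# Both the xyzw and rgba namings address the same four components.
COMPONENTS = {"x": 0, "y": 1, "z": 2, "w": 3,
              "r": 0, "g": 1, "b": 2, "a": 3}

def swizzle(vec, pattern):
    """Reorder and/or duplicate components, e.g. swizzle(v, 'yxww')."""
    return [vec[COMPONENTS[c]] for c in pattern]
```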
Let's consider each register type in more detail.
Output register
As a result of its work, the vertex shader must write the window-space position of the vertex to the op (output position) register, and the fragment shader must write the final pixel color to oc (output color). In the fragment shader, processing can also be cancelled with the kil instruction, described below.
Attribute register
A vertex can contain up to 8 vector attributes, accessed from the shader via the va registers; their positions in the vertex buffer are set with the Context3D.setVertexBufferAt function. Attribute data can be in the formats FLOAT_1, FLOAT_2, FLOAT_3, FLOAT_4 and BYTES_4, where the number in the name is the number of vector components. Note that in the case of BYTES_4 the component values are normalized, i.e. divided by 255.
Interpolator register
In addition to writing to the op register, the vertex shader can pass up to 8 vectors to the fragment shader through the v registers. During rasterization their values are linearly interpolated over the whole polygon. Let's illustrate interpolators with a triangle whose per-vertex color attribute is output by the fragment shader:
// vertex
mov op, va0 // the first attribute is the position
mov v0, va1 // pass the second attribute to the fragment shader as an interpolator
// fragment
mov oc, v0 // return the interpolated value as the color

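What the rasterizer does with the v registers can be modeled numerically: a value written per vertex is blended with barycentric weights for every covered pixel. This is a simplified sketch, not the actual hardware:

```python
def interpolate(a, b, c, wa, wb, wc):
    """Barycentrically interpolate a per-vertex attribute across a triangle.
    wa + wb + wc must equal 1 for points inside the triangle."""
    return [wa * x + wb * y + wc * z for x, y, z in zip(a, b, c)]

# Pure red, green and blue at the corners...
red, green, blue = [1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]
# ...give an even grey mix at the centroid.
center = interpolate(red, green, blue, 1/3, 1/3, 1/3)
```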
Temporary register
In the vertex and fragment shaders, up to 8 vt and ft registers respectively are available for storing intermediate results. For example, suppose the fragment shader needs the sum of four vectors received from the vertex program (registers v0..v3):
add ft0, v0, v1 // ft0 = v0 + v1
add ft0, ft0, v2 // ft0 = ft0 + v2
add ft0, ft0, v3 // ft0 = ft0 + v3
As a result ft0 holds the sum we need, and everything seems fine, but there is a non-obvious optimization opportunity here, directly related to the architecture of the video card's hardware pipeline and partly responsible for its high performance.
Shaders rely on ILP (Instruction-Level Parallelism), which, as the name suggests, allows several instructions to execute simultaneously. The main condition for this mechanism to kick in is that the instructions be independent of each other. Rewriting the example above:
add ft0, v0, v1 // ft0 = v0 + v1
add ft1, v2, v3 // ft1 = v2 + v3
add ft0, ft0, ft1 // ft0 = ft0 + ft1
The first two instructions will execute simultaneously because they work with independent registers. It follows that the key factor in your shader's performance is not so much the number of instructions as their independence from each other.
Constant register
Numeric constants cannot be stored directly in the shader code; all constants needed for its work must be uploaded to the shader before calling Context3D.drawTriangles, and become available in the vc registers (128 vectors) and fc registers (28 vectors). A register can also be addressed by index using square brackets, which is very convenient for skeletal animation or material indexing. Keep in mind that setting shader constants is a relatively expensive operation and should be avoided where possible. For example, there is no point in uploading the projection matrix before rendering each object if it does not change within the current frame.
Sampler register
Up to 8 textures can be passed to the fragment shader with the Context3D.setTextureAt function; they are accessed through the corresponding fs registers, used exclusively in the tex instruction. Let's change the triangle example slightly: as the second vertex attribute we pass texture coordinates, and in the fragment shader we sample the texture at these (already interpolated) coordinates:
// vertex
mov op, va0 // position
mov v0, va1 // the second attribute is the texture coordinate
// fragment
tex oc, v0, fs0 <2d,linear> // sample the texture

Operators
At the time of writing (October 2011), AGAL implements the following operators:
mov dst = src1
neg dst = -src1
abs dst = abs(src1)
add dst = src1 + src2
sub dst = src1 – src2
mul dst = src1 * src2
div dst = src1 / src2
rcp dst = 1 / src1
min dst = min(src1, src2)
max dst = max(src1, src2)
sat dst = max(min(src1, 1), 0)
frc dst = src1 – floor(src1)
sqt dst = src1^0.5
rsq dst = 1 / (src1^0.5)
pow dst = src1^src2
log dst = log2(src1)
exp dst = 2^src1
nrm dst = normalize(src1)
sin dst = sine(src1)
cos dst = cosine(src1)
slt dst = (src1 < src2) ? 1 : 0
sge dst = (src1 >= src2) ? 1 : 0
dp3 dot product
dst = src1.x*src2.x + src1.y*src2.y + src1.z*src2.z
dp4 dot product over all four vector components
dst = src1.x*src2.x + src1.y*src2.y + src1.z*src2.z + src1.w*src2.w
crs cross product
dst.x = src1.y * src2.z - src1.z * src2.y
dst.y = src1.z * src2.x - src1.x * src2.z
dst.z = src1.x * src2.y - src1.y * src2.x
m33 multiply a vector by a 3x3 matrix
dst.x = dp3(src1, src2[0])
dst.y = dp3(src1, src2[1])
dst.z = dp3(src1, src2[2])
m34 multiply a vector by a 3x4 matrix
dst.x = dp4(src1, src2[0])
dst.y = dp4(src1, src2[1])
dst.z = dp4(src1, src2[2])
m44 multiply a vector by a 4x4 matrix
dst.x = dp4(src1, src2[0])
dst.y = dp4(src1, src2[1])
dst.z = dp4(src1, src2[2])
dst.w = dp4(src1, src2[3])
kil cancel fragment processing
stops execution of the fragment shader if the value of src1
is less than zero; usually used to implement an alpha test
when semi-transparent objects cannot be sorted by depth.
tex sample a value from a texture
writes to dst the color at coordinates src1 from texture src2;
also accepts additional comma-separated parameters, for example:
tex ft0, v0, fs0 <2d,repeat,linear,miplinear>
these parameters specify:
texture type: 2d, cube
filtering: nearest, linear
mipmapping: nomip, miplinear, mipnearest
tiling: clamp, repeat
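The matrix instructions are nothing more than packed dot products over consecutive registers holding the matrix rows; a small Python model of dp3, dp4 and m44 (a numerical sketch, not Stage3D code):

```python
def dp3(a, b):
    """Dot product of the first three components."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dp4(a, b):
    """Dot product of all four components."""
    return dp3(a, b) + a[3] * b[3]

def m44(v, m):
    """Multiply a 4-vector by a 4x4 matrix given as four rows,
    i.e. four dp4 instructions against consecutive registers."""
    return [dp4(v, row) for row in m]

identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```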
Other operators, including conditional jumps and loops, are planned for future versions of Flash Player. This does not mean, however, that you cannot express a basic if right now: the slt and sge instructions are perfectly adequate for such tasks.
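The usual trick is a branchless select: compare, then blend with the resulting 0/1 mask. In Python terms (slt here mirrors the AGAL instruction's per-component semantics; select is an illustrative helper, not an AGAL opcode):

```python
def slt(a, b):
    """AGAL slt: 1.0 where a < b, else 0.0, per component."""
    return [1.0 if x < y else 0.0 for x, y in zip(a, b)]

def select(mask, if_true, if_false):
    """Branchless 'if': mask*if_true + (1-mask)*if_false,
    i.e. a short mul/sub/mul/add sequence in the shader."""
    return [m * t + (1.0 - m) * f
            for m, t, f in zip(mask, if_true, if_false)]
```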
Effects
We have covered the basics; now for the most interesting part of the article, the practical application of this new knowledge. As stated at the very beginning, the ability to write shaders completely frees the graphics programmer's hands: the actual limits lie only in the developer's imagination and mathematical savvy. We have already seen that the assembly language itself is simple, but behind that simplicity lies the difficulty of diving back into long-forgotten code. I therefore strongly recommend commenting the key sections of your shader code so you can find your way around it quickly when needed.
The blank
The starting point for all subsequent examples will be a small "blank" in the form of a teapot. Unlike the triangle example, we need a combined camera transform and projection matrix to create the effect of perspective and rotation around the object. We pass it in constant registers. It is important to remember that a 4x4 matrix occupies exactly 4 registers, so when it is written starting at vc0, registers vc0..vc3 are occupied. We will also find useful a constant vector of numbers frequently needed in shaders (0.0, 0.5, 1.0, 2.0).
In total, the basic shader code will look like this:
// vertex
m44 op, va0, vc0 // apply the viewProj matrix
// fragment
mov ft0, fc0.xxxz // put opaque black into ft0
mov oc, ft0 // return ft0 as the pixel color

Texture mapping
A shader can apply up to 8 textures, with an almost unlimited number of samples, which means the limit hardly matters when using atlases or cube textures. Let's improve our example: instead of setting the color in the fragment shader, we fetch it from a texture using the interpolated texture coordinates received from the vertex shader:
// vertex
...
mov v0, va1 // pass the texture coordinate to the fragment shader
// fragment
tex ft0, v0, fs0 <2d,repeat,linear,miplinear>

Lambert shading
The most primitive lighting model that approximates reality. It rests on the observation that the intensity of light falling on a surface depends linearly on the cosine of the angle between the light direction and the surface normal. Recalling from school mathematics that the dot product of unit vectors gives the cosine of the angle between them, our Lambert lighting formula becomes:
Lambert = Diffuse * (Ambient + max(0, dot(LightVec, Normal)))
Color = Lambert
where Diffuse is the color of the object at the point (taken from the texture, for example),
Ambient is the background lighting color,
LightVec is the unit vector from the point to the light source,
Normal is the perpendicular to the surface,
Color is the final pixel color.
The shader takes two new constant parameters: the light source position and the background light value:
// vertex
...
mov v1, va2 // v1 = normal
sub v2, vc4, va0 // v2 = lightPos - vertex (lightVec)
// fragment
...
nrm ft1.xyz, v1 // normal ft1 = normalize(lerp_normal)
nrm ft2.xyz, v2 // lightVec ft2 = normalize(lerp_lightVec)
dp3 ft5.x, ft1.xyz, ft2.xyz // ft5 = dot(normal, lightVec)
max ft5.x, ft5.x, fc0.x // ft5 = max(ft5, 0.0)
add ft5, fc1, ft5.x // ft5 = ambient + ft5
mul ft0, ft0, ft5 // color *= ft5
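The fragment code above can be checked against the Lambert formula directly; a scalar Python model with assumed sample values (unit vectors, grey diffuse):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lambert(diffuse, ambient, light_vec, normal):
    """Lambert term: diffuse * (ambient + max(0, dot(L, N))).
    light_vec and normal are assumed already normalized."""
    return [d * (ambient + max(0.0, dot(light_vec, normal)))
            for d in diffuse]

# Light hitting the surface head-on: full intensity plus ambient.
lit = lambert([0.5, 0.5, 0.5], 0.1, [0.0, 0.0, 1.0], [0.0, 0.0, 1.0])
# Light parallel to the surface: only the ambient term remains.
dark = lambert([0.5, 0.5, 0.5], 0.1, [0.0, 0.0, 1.0], [1.0, 0.0, 0.0])
```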

Phong shading
It extends the Lambert lighting model with the notion of a specular highlight. The highlight intensity is a power function of the cosine of the angle between the vector to the light source and the reflection of the observer's view vector about the surface normal.
Phong = pow(max(0, dot(LightVec, reflect(-ViewVec, Normal))), SpecularPower) * SpecularLevel
Color = Lambert + Phong
where ViewVec is the observer's view vector,
SpecularPower is the exponent controlling the size of the highlight,
SpecularLevel is the highlight intensity or its color,
reflect is the reflection function f(v, n) = 2 * n * dot(n, v) - v.
For complex models it is customary to use Specular and Gloss maps, which define the highlight color/intensity (SpecularLevel) and the exponent (SpecularPower) over different parts of the model's texture space. In our case we will make do with constant values for the exponent and intensity. We pass a new parameter to the vertex shader, the observer position, for the subsequent calculation of ViewVec:
// vertex
...
sub v3, va0, vc5 // v3 = vertex - viewPos (viewVec)
// fragment
...
nrm ft3.xyz, v3 // viewVec ft3 = normalize(lerp_viewVec)
// compute the reflection vector reflect(-viewVec, normal)
dp3 ft4.x, ft1.xyz, ft3.xyz // ft4 = dot(normal, viewVec)
mul ft4, ft1.xyz, ft4.x // ft4 *= normal
add ft4, ft4, ft4 // ft4 *= 2
sub ft4, ft3.xyz, ft4 // reflect ft4 = viewVec - ft4
// phong
dp3 ft6.x, ft2.xyz, ft4.xyz // ft6 = dot(lightVec, reflect)
max ft6.x, ft6.x, fc0.x // ft6 = max(ft6, 0.0)
pow ft6.x, ft6.x, fc2.w // ft6 = pow(ft6, specularPower)
mul ft6, ft6.x, fc2.xyz // ft6 *= specularLevel
add ft0, ft0, ft6 // color += ft6
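It is easy to verify that the dp3/mul/add/sub sequence really computes reflect(-viewVec, normal); in Python (assumed unit vectors):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reflect_neg_view(view, normal):
    """The shader's expansion: viewVec - 2 * normal * dot(normal, viewVec),
    which equals reflect(-viewVec, normal) = 2*n*dot(n, -v) - (-v)."""
    d = dot(normal, view)
    return [v - 2.0 * n * d for v, n in zip(view, normal)]

# A view ray straight along the normal bounces straight back.
straight = reflect_neg_view([0.0, 0.0, 1.0], [0.0, 0.0, 1.0])
# A grazing ray perpendicular to the normal passes through unchanged.
grazing = reflect_neg_view([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
```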

Normal mapping
A relatively simple method of simulating surface relief using a normal texture. The normal direction in such a texture is usually stored as RGB values obtained by compressing its coordinates into the 0..1 range (xyz * 0.5 + 0.5). Normals can be represented either in object space or in tangent space, which is built from the texture coordinates and the vertex normal. The former has a number of sometimes significant drawbacks (high texture memory consumption, since tiling and mirrored texturing are impossible), but saves on instruction count. In our example we use the more flexible and general tangent-space variant, which in addition to the normal requires two extra basis vectors, Tangent and Binormal.
// vertex
...
// transform lightVec
sub vt1, vc4, va0 // vt1 = lightPos - vertex (lightVec)
dp3 vt3.x, vt1, va4
dp3 vt3.y, vt1, va3
dp3 vt3.z, vt1, va2
mov v2, vt3.xyzx // v2 = lightVec
// transform viewVec
sub vt2, va0, vc5 // vt2 = vertex - viewPos (viewVec)
dp3 vt4.x, vt2, va4
dp3 vt4.y, vt2, va3
dp3 vt4.z, vt2, va2
mov v3, vt4.xyzx // v3 = viewVec
// fragment
tex ft1, v0, fs1 <2d,repeat,linear,miplinear> // ft1 = normalMap(v0)
// 0..1 to -1..1
add ft1, ft1, ft1 // ft1 *= 2
sub ft1, ft1, fc0.z // ft1 -= 1
nrm ft1.xyz, ft1 // normal ft1 = normalize(normal)
...
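The 0..1 to -1..1 unpacking and the tangent-space transform of the light and view vectors are plain arithmetic; a Python sketch (the texel and basis values in the test are made up for illustration):

```python
import math

def unpack_normal(rgb):
    """Map a normal-map texel from the 0..1 storage range back to -1..1
    and renormalize: the add/sub/nrm sequence in the shader."""
    n = [2.0 * c - 1.0 for c in rgb]
    length = math.sqrt(sum(c * c for c in n))
    return [c / length for c in n]

def to_tangent_space(v, tangent, binormal, normal):
    """Project a vector onto the TBN basis: three dp3 instructions."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return [dot(v, tangent), dot(v, binormal), dot(v, normal)]
```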

Toon shading
A kind of non-photorealistic lighting model that imitates cartoon-style shading. It can be implemented in many ways, the simplest being a color lookup from a 1D texture by the Lambert cosine. In our example we use a 16x1 texture:

// fragment
...
dp3 ft5.x, ft1.xyz, ft2.xyz // ft5 = dot(normal, lightVec)
tex ft0, ft5.xx, fs3 <2d,nearest> // color = toonMap(ft5)
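The 1D lookup amounts to quantizing the Lambert cosine into a handful of hard bands; a Python model of a 16-entry toon ramp (the ramp values are made up):

```python
def toon_lookup(cos_angle, ramp):
    """Clamp the cosine to 0..1 and pick a band from a 1D ramp,
    like `tex` against a 16x1 texture with nearest filtering."""
    t = min(max(cos_angle, 0.0), 1.0)
    index = min(int(t * len(ramp)), len(ramp) - 1)
    return ramp[index]

ramp = [0.2] * 6 + [0.6] * 6 + [1.0] * 4  # three hard shading bands
```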

Sphere mapping
The easiest way to simulate reflections, often used for a chrome-metal effect. The environment is represented as a texture with a spherical, fisheye-like distortion, as shown below:

The main task is to transform the coordinates of the reflection vector into the corresponding texture coordinates:
uv = (xy / sqrt(x^2 + y^2 + (z + 1)^2)) * 0.5 + 0.5
The multiplication and shift by 0.5 bring the normalized result into the 0..1 texture coordinate space. In the simple case of a perfectly reflecting surface the map's contribution is additive; in more complex cases, when a diffuse component is required, it is customary to use an approximation of the Fresnel equations. Complex models also often use Reflection maps that specify the reflection intensity over different parts of the model's texture.
// fragment
...
add ft6, ft4, fc0.xxz // ft6 = reflect (x, y, z + 1)
dp3 ft6.x, ft6, ft6 // ft6 = ft6^2
rsq ft6.x, ft6.x // ft6 = 1 / sqrt(ft6)
mul ft6, ft4, ft6.x // ft6 = reflect / ft6
mul ft6, ft6, fc0.y // ft6 *= 0.5
add ft6, ft6, fc0.y // ft6 += 0.5
tex ft0, ft6, fs2 <2d,nearest> // color = reflect(ft6)
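The shader above is a direct transcription of the uv formula; checking it in Python, a reflection vector pointing straight at the viewer should land in the center of the map:

```python
import math

def sphere_map_uv(r):
    """uv = (xy / sqrt(x^2 + y^2 + (z + 1)^2)) * 0.5 + 0.5"""
    x, y, z = r
    m = math.sqrt(x * x + y * y + (z + 1.0) ** 2)
    return (x / m * 0.5 + 0.5, y / m * 0.5 + 0.5)

# Straight-back reflection maps to the center of the texture.
u, v = sphere_map_uv([0.0, 0.0, 1.0])
```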

That is probably all. The examples presented here mostly describe the material properties of an object, but shaders also find use in other tasks such as skeletal animation, shadows, water and other relatively complex things (including non-visual ones). And with enough practice they let you implement quite complex effects in a short time.
Conclusion
Flash games: it's easy! See the example accompanying the article.