Writing a shader in AGAL

    It's no secret that Flash Player 11 brings GPU-accelerated graphics. The new version introduces the Molehill API, which lets you work with the video card at a fairly low level; on the one hand this gives your imagination free rein, and on the other it requires a deeper understanding of how modern 3D graphics works.

    This article focuses on the shader language AGAL (Adobe Graphics Assembly Language). The reader is assumed to be familiar with the basic principles of modern realtime 3D graphics, and ideally to have experience with OpenGL or Direct3D. For everyone else, a short excursion:
    • everything is rendered from scratch every frame; approaches based on partially redrawing the screen are strongly discouraged
    • 2D is a special case of 3D
    • the video card can rasterize triangles and nothing else
    • triangles are built from vertices
    • each vertex carries attributes (position, normal, weight, etc.)
    • the order in which a triangle's vertices are used is determined by indices
    • vertex and index data are stored in vertex and index buffers respectively
    • a shader is a program executed by the video card
    • every vertex passes through the vertex shader, and every pixel produced by rasterization passes through the fragment (pixel) shader
    • the video card cannot work with integers, but handles 4D float vectors just fine

    Syntax

    The current AGAL implementation targets a cut-down Shader Model 2.0, i.e. hardware of roughly 2005 vintage. Keep in mind, though, that this limits only the capabilities of the shader program, not the performance of the hardware. It is possible that in future versions of Flash Player the bar will be raised to SM 3.0, letting us render into several textures at once and sample textures directly from the vertex shader, but given Adobe's policy this is unlikely to happen before the next generation of mobile devices.

    Any AGAL program is essentially written in a low-level assembly language. The language itself is very simple, but demands a fair amount of care. Shader code is a sequence of instructions of the form:
    opcode [dst], [src1], [src2]
    which loosely reads as "execute the opcode command with parameters src1 and src2, writing the result to dst". A shader can contain up to 256 instructions. dst, src1 and src2 are registers of one of the following types: va, vc, fc, vt, ft, op, oc, v, fs. Each of them, except fs, is a four-component (xyzw or rgba) vector. Individual components of a vector can be addressed, including swizzling (reading them in a different order):
    dp4 ft0.x, v0.xyzw, v0.yxww
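    To make swizzling concrete, here is a small sketch in plain Python (not AGAL; the function name is illustrative) that models reading the components of a 4-component vector in the order a mask names them:

```python
def swizzle(v, mask):
    """Read the components of a 4D vector (x, y, z, w) in the order given by mask."""
    index = {"x": 0, "y": 1, "z": 2, "w": 3}
    return tuple(v[index[c]] for c in mask)

v0 = (1.0, 2.0, 3.0, 4.0)
print(swizzle(v0, "yxww"))  # (2.0, 1.0, 4.0, 4.0)
```

The same component may be repeated any number of times, as in the yxww mask above.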

    Consider each of the types of registers in more detail.

    Output register

    As a result of its work, the vertex shader must write the window-space position of the vertex to the op (output position) register, and the fragment shader must write the final pixel color to oc (output color). In the fragment shader, processing can also be cancelled with the kil instruction, described below.

    Attribute register

    A vertex can contain up to 8 vector attributes, accessed from the shader through the va registers; their layout in the vertex buffer is set by the Context3D.setVertexBufferAt function. Attribute data can be in the formats FLOAT_1, FLOAT_2, FLOAT_3, FLOAT_4 and BYTES_4, where the number in the name indicates the number of vector components. Note that in the case of BYTES_4 the component values are normalized, i.e. divided by 255.
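    As a quick illustration of that rule (plain Python, hypothetical helper name), a BYTES_4 color such as (255, 128, 0, 255) reaches the shader as roughly (1.0, 0.5, 0.0, 1.0):

```python
def bytes4_to_float(b):
    """BYTES_4 attribute components are normalized: each byte is divided by 255."""
    return tuple(c / 255.0 for c in b)

print(bytes4_to_float((255, 128, 0, 255)))
```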

    Interpolator register

    In addition to writing to the op register, the vertex shader can pass up to 8 vectors to the fragment shader through the v registers. During rasterization, the values of these vectors are linearly interpolated across the whole polygon. Let's illustrate how interpolators work with a triangle whose vertices carry a color attribute that the fragment shader outputs directly:
    // vertex
    	mov op, va0	// first attribute - position
    	mov v0, va1	// pass the second attribute to the fragment shader as an interpolator
    // fragment
    	mov oc, v0	// return the interpolated value as the color
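    What the rasterizer does with the v registers can be modeled as barycentric interpolation; a rough Python sketch (the function name is illustrative):

```python
def interpolate(a, b, c, weights):
    """Linearly interpolate a per-vertex attribute across a triangle.
    a, b, c are the attribute values at the three vertices;
    weights are barycentric coordinates summing to 1."""
    wa, wb, wc = weights
    return tuple(wa * ai + wb * bi + wc * ci for ai, bi, ci in zip(a, b, c))

# At the triangle's centroid, pure red, green and blue vertices blend to gray:
red, green, blue = (1.0, 0.0, 0.0, 1.0), (0.0, 1.0, 0.0, 1.0), (0.0, 0.0, 1.0, 1.0)
print(interpolate(red, green, blue, (1/3, 1/3, 1/3)))
```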



    Temporary register

    Up to 8 vt and ft registers are available in the vertex and fragment shaders for storing intermediate results. For example, suppose the fragment shader needs to compute the sum of four vectors received from the vertex program (registers v0..v3):
    
    	add ft0, v0, v1	// ft0 = v0 + v1
    	add ft0, ft0, v2	// ft0 = ft0 + v2
    	add ft0, ft0, v3	// ft0 = ft0 + v3

    As a result, ft0 holds the sum we need, and everything seems fine, but there is an optimization opportunity here that is not obvious at first glance; it is directly related to the architecture of the video card's shader pipeline and is partly the reason for its high performance.

    Shaders are built around the concept of ILP (Instruction-Level Parallelism), which, as the name suggests, allows several instructions to be executed simultaneously. The main condition for this mechanism to kick in is that the instructions be independent of each other. Rewriting the example above:
    
    	add ft0, v0, v1	// ft0 = v0 + v1
    	add ft1, v2, v3	// ft1 = v2 + v3
    	add ft0, ft0, ft1	// ft0 = ft0 + ft1

    The first two instructions will be executed simultaneously because they work with independent registers. It follows that the key factor in your shader's performance is not so much the number of instructions as their independence from each other.

    Constant register

    Storing numeric constants directly in shader code is not allowed: all constants needed by the shader must be uploaded before calling Context3D.drawTriangles, and become available in the vc (128 vectors) and fc (28 vectors) registers. A register can be accessed by index using square brackets, which is very convenient for skeletal animation or material indexing. Keep in mind that setting shader constants is a relatively expensive operation and should be avoided where possible; for example, there is no point in uploading the projection matrix before rendering every object if it does not change within the current frame.

    Sampler register

    Up to 8 textures can be passed to the fragment shader with the Context3D.setTextureAt function; they are accessed through the corresponding fs registers, which are used exclusively in the tex instruction. Let's tweak the triangle example a bit: as the second vertex attribute we now pass texture coordinates, and in the fragment shader we sample the texture at these (already interpolated) coordinates:
    // vertex
    	mov op, va0	// position
    	mov v0, va1	// second attribute - texture coordinate
    // fragment
    	tex oc, v0, fs0 <2d,linear>	// texture sampling



    Operators

    Currently (October 2011), AGAL implements the following operators:
    	mov	dst = src1
    	neg	dst = -src1
    	abs	dst = abs(src1)
    	add	dst = src1 + src2
    	sub	dst = src1 – src2
    	mul	dst = src1 * src2
    	div	dst = src1 / src2
    	rcp	dst = 1 / src1
    	min	dst = min(src1, src2)
    	max	dst = max(src1, src2)
    	sat	dst = max(min(src1, 1), 0)
    	frc	dst = src1 – floor(src1)
    	sqt	dst = src1^0.5
    	rsq	dst = 1 / (src1^0.5)
    	pow	dst = src1^src2
    	log	dst = log2(src1)
    	exp	dst = 2^src1
    	nrm	dst = normalize(src1)
    	sin	dst = sine(src1)
    	cos	dst = cosine(src1)
    	slt	dst = (src1 < src2) ? 1 : 0
    	sge	dst = (src1 >= src2) ? 1 : 0
    	dp3	dot product
    		dst = src1.x*src2.x + src1.y*src2.y + src1.z*src2.z
    	dp4	dot product of all four vector components
    		dst = src1.x*src2.x + src1.y*src2.y + src1.z*src2.z + src1.w*src2.w
    	crs	cross product
    		dst.x = src1.y * src2.z – src1.z * src2.y
    		dst.y = src1.z * src2.x – src1.x * src2.z
    		dst.z = src1.x * src2.y – src1.y * src2.x
    	m33	multiply a vector by a 3x3 matrix
    		dst.x = dp3(src1, src2[0])
    		dst.y = dp3(src1, src2[1])
    		dst.z = dp3(src1, src2[2])
    	m34	multiply a vector by a 3x4 matrix
    		dst.x = dp4(src1, src2[0])
    		dst.y = dp4(src1, src2[1])
    		dst.z = dp4(src1, src2[2])
    	m44	multiply a vector by a 4x4 matrix
    		dst.x = dp4(src1, src2[0])
    		dst.y = dp4(src1, src2[1])
    		dst.z = dp4(src1, src2[2])
    		dst.w = dp4(src1, src2[3])
    	kil	discard the fragment
    		stops execution of the fragment shader if the value of src1
    		is less than zero; usually used to implement alpha testing
    		when semi-transparent objects cannot be depth-sorted.
    	tex	sample a value from a texture
    		writes to dst the color value of texture src2 at coordinates src1;
    		also accepts additional comma-separated parameters, for example:
    			tex ft0, v0, fs0 <2d,repeat,linear,miplinear>
    		these parameters specify:
    		texture format	2d, cube
    		filtering		nearest, linear
    		mipmapping		nomip, miplinear, mipnearest
    		tiling			clamp, repeat
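    As a sanity check, the vector and matrix operators above can be modeled in Python (an illustrative reference, not an official specification):

```python
def dp3(a, b):
    """Dot product of the first three components."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dp4(a, b):
    """Dot product of all four components."""
    return dp3(a, b) + a[3] * b[3]

def crs(a, b):
    """Cross product of the xyz parts."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def m44(v, m):
    """Multiply a vector by a 4x4 matrix given as 4 rows
    (as stored in 4 consecutive constant registers)."""
    return tuple(dp4(v, row) for row in m)

identity = ((1.0, 0.0, 0.0, 0.0),
            (0.0, 1.0, 0.0, 0.0),
            (0.0, 0.0, 1.0, 0.0),
            (0.0, 0.0, 0.0, 1.0))
print(m44((1.0, 2.0, 3.0, 1.0), identity))  # (1.0, 2.0, 3.0, 1.0)
print(crs((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))  # (0.0, 0.0, 1.0)
```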
    

    Other operators, including conditional jumps and loops, are planned for future versions of Flash Player. This does not mean, however, that an ordinary if cannot be expressed today: the slt and sge instructions are quite suitable for such tasks.
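    A sketch of the idea in plain Python: slt produces a 0/1 mask that selects between two values with multiplies and adds, which is how an if is usually emulated in SM 2.0-era shaders (function names are illustrative):

```python
def slt(a, b):
    """dst = (a < b) ? 1 : 0"""
    return 1.0 if a < b else 0.0

def select_if_less(a, b, then_val, else_val):
    """Branchless equivalent of: then_val if a < b else else_val."""
    mask = slt(a, b)
    return mask * then_val + (1.0 - mask) * else_val

print(select_if_less(1.0, 2.0, 10.0, 20.0))  # 10.0
print(select_if_less(3.0, 2.0, 10.0, 20.0))  # 20.0
```

In a shader the same pattern is three instructions: slt, then two mul/add-style instructions blending the branches by the mask.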

    Effects

    We have covered the basics; now for the most interesting part of the article, the practical application of this new knowledge. As stated at the very beginning, the ability to write shaders completely unties the graphics programmer's hands: the real limits are only the developer's imagination and mathematical savvy. You could see above that the assembly language itself is simple, but behind that simplicity lies the difficulty of deciphering code you have long forgotten. I therefore highly recommend commenting the key parts of your shader code so you can find your way around it quickly when needed.

    Boilerplate

    The starting point for all subsequent examples will be a small "blank" in the form of a teapot. Unlike the triangle example, we need a camera view-projection matrix to create the effect of perspective and of rotation around the object; we will pass it in through constant registers. It is important to remember that a 4x4 matrix occupies exactly 4 registers, so writing it at vc0 occupies vc0..vc3. A constant vector of numbers frequently used in shaders (0.0, 0.5, 1.0, 2.0) will also come in handy.
    In total, the basic shader code will look like this:
    // vertex
    	m44 op, va0, vc0	// apply the viewProj matrix
    // fragment
    	mov ft0, fc0.xxxz	// put opaque black into ft0
    	mov oc, ft0		// return ft0 as the pixel color
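    As a quick check: with the constant vector (0.0, 0.5, 1.0, 2.0) in fc0, the swizzle fc0.xxxz does indeed yield opaque black. In Python terms:

```python
# The constant vector uploaded to fc0, keyed by component name:
fc0 = {"x": 0.0, "y": 0.5, "z": 1.0, "w": 2.0}
color = tuple(fc0[c] for c in "xxxz")  # models fc0.xxxz
print(color)  # (0.0, 0.0, 0.0, 1.0) -> RGBA opaque black
```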



    Texture mapping

    A shader can apply up to 8 textures, with an almost unlimited number of samples, so this limit rarely matters in practice when atlases or cube textures are used. Let's improve our example: instead of setting the color in the fragment shader, we fetch it from a texture using the interpolated texture coordinates received from the vertex shader:
    // vertex
    	...
    	mov v0, va1	// pass the texture coordinate to the fragment shader
    // fragment
    	tex ft0, v0, fs0 <2d,repeat,linear,miplinear>



    Lambert shading

    The most primitive lighting model that approximates real lighting. It rests on the premise that the intensity of light falling on a surface depends linearly on the cosine of the angle between the direction of incidence and the surface normal. Recalling from school mathematics that the dot product of unit vectors gives the cosine of the angle between them, our Lambert lighting formula looks like this:
    Lambert = Diffuse * (Ambient + max (0, dot (LightVec, Normal)))
    Color = Lambert

    where Diffuse is the color of the object at the given point (taken from the texture, for example),
    Ambient is the color of the background lighting,
    LightVec is the unit vector from the point to the light source,
    Normal is the perpendicular to the surface,
    Color is the final pixel color.

    The shader takes two new constant parameters: the light source position and the background light value:
    // vertex
    	...
    	mov v1, va2		// v1 = normal
    	sub v2, vc4, va0	// v2 = lightPos - vertex (lightVec)
    // fragment
    	...
    	nrm ft1.xyz, v1		// normal ft1 = normalize(lerp_normal)
    	nrm ft2.xyz, v2		// lightVec ft2 = normalize(lerp_lightVec)
    	dp3 ft5.x, ft1.xyz, ft2.xyz	// ft5 = dot(normal, lightVec)
    	max ft5.x, ft5.x, fc0.x	// ft5 = max(ft5, 0.0)
    	add ft5, fc1, ft5.x		// ft5 = ambient + ft5
    	mul ft0, ft0, ft5		// color *= ft5
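    To sanity-check the arithmetic, here is the same Lambert formula in plain Python (function and parameter names are illustrative):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def lambert(diffuse, ambient, light_vec, normal):
    """Lambert = Diffuse * (Ambient + max(0, dot(LightVec, Normal)))"""
    n_dot_l = max(0.0, dot(normalize(light_vec), normalize(normal)))
    return tuple(d * (ambient + n_dot_l) for d in diffuse)

# Light straight above a surface facing up: full intensity plus ambient.
print(lambert((1.0, 0.5, 0.25), 0.1, (0.0, 1.0, 0.0), (0.0, 1.0, 0.0)))
```

Note that a light behind the surface contributes nothing beyond the ambient term, thanks to the max with zero.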



    Phong shading

    Phong shading adds a highlight from the light source to the Lambert lighting model. It assumes that the highlight intensity is a power function of the cosine of the angle between the vector to the light source and the reflection of the observer's view vector about the surface normal.
    Phong = pow(max(0, dot(LightVec, reflect(-ViewVec, Normal))), SpecularPower) * SpecularLevel
    Color = Lambert + Phong

    where ViewVec is the observer's view vector,
    SpecularPower is the exponent that determines the size of the highlight,
    SpecularLevel is the highlight intensity or its color,
    reflect is the reflection function f(v, n) = 2 * n * dot(n, v) - v

    For complex models it is customary to use Specular and Gloss maps, which define the highlight color/intensity (SpecularLevel) and the exponent (SpecularPower) over different parts of the model's texture space. In our case we will make do with constant values for the exponent and intensity. We pass a new parameter to the vertex shader, the observer's position, for the subsequent calculation of ViewVec:
    // vertex
    	...
    	sub v3, va0, vc5		// v3 = vertex - viewPos  (viewVec)
    // fragment
    	...
    	nrm ft3.xyz, v3		// viewVec ft3 = normalize(lerp_viewVec)
    	// compute the reflection vector reflect(-viewVec, normal)
    	dp3 ft4.x, ft1.xyz, ft3.xyz	// ft4 = dot(normal, viewVec)
    	mul ft4, ft1.xyz, ft4.x	// ft4 *= normal
    	add ft4, ft4, ft4		// ft4 *= 2
    	sub ft4, ft3.xyz, ft4	// reflect ft4 = viewVec - ft4
    	// phong
    	dp3 ft6.x, ft2.xyz, ft4.xyz	// ft6 = dot(lightVec, reflect)
    	max ft6.x, ft6.x, fc0.x	// ft6 = max(ft6, 0.0)
    	pow ft6.x, ft6.x, fc2.w	// ft6 = pow(ft6, specularPower)
    	mul ft6, ft6.x, fc2.xyz	// ft6 *= specularLevel
    	add ft0, ft0, ft6		// color += ft6
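    The reflect function and the full highlight term can be verified numerically; a Python sketch of the same formulas (names are illustrative, viewVec points from the observer to the vertex as in the shader above):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def reflect(v, n):
    """f(v, n) = 2 * n * dot(n, v) - v, with n of unit length."""
    d = dot(n, v)
    return tuple(2.0 * ni * d - vi for ni, vi in zip(n, v))

def phong(light_vec, view_vec, normal, spec_power, spec_level):
    """Phong = pow(max(0, dot(LightVec, reflect(-ViewVec, Normal))), SpecularPower) * SpecularLevel"""
    l = normalize(light_vec)
    mv = tuple(-c for c in normalize(view_vec))  # -ViewVec
    r = reflect(mv, normalize(normal))
    return max(0.0, dot(l, r)) ** spec_power * spec_level

# A vector reflected about a normal it is parallel to comes straight back:
print(reflect((0.0, 1.0, 0.0), (0.0, 1.0, 0.0)))  # (0.0, 1.0, 0.0)
```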



    Normal mapping

    A relatively simple method of simulating surface relief using a normal texture. The normal direction in such a texture is usually stored as RGB values obtained by remapping its coordinates into the 0..1 range (xyz * 0.5 + 0.5). Normals can be represented either in object space (Object Space) or in a relative space (Tangent Space) built from the texture coordinates and the vertex normal. The former has occasionally significant drawbacks, namely high texture memory consumption, since tiling and mirrored texturing become impossible, but it saves a few instructions. In the example we will use the more flexible and general tangent-space variant, which requires two additional basis vectors, Tangent and Binormal, in addition to the normal.
    // vertex
    	...
    	// transform lightVec
    	sub vt1, vc4, va0	// vt1 = lightPos - vertex (lightVec)
    	dp3 vt3.x, vt1, va4
    	dp3 vt3.y, vt1, va3
    	dp3 vt3.z, vt1, va2
    	mov v2, vt3.xyzx	// v2 = lightVec
    	// transform viewVec
    	sub vt2, va0, vc5	// vt2 = vertex - viewPos (viewVec)
    	dp3 vt4.x, vt2, va4
    	dp3 vt4.y, vt2, va3
    	dp3 vt4.z, vt2, va2
    	mov v3, vt4.xyzx	// v3 = viewVec
    // fragment
    	tex ft1, v0, fs1 <2d,repeat,linear,miplinear>	// ft1 = normalMap(v0)
    	// 0..1 to -1..1
    	add ft1, ft1, ft1	// ft1 *= 2
    	sub ft1, ft1, fc0.z	// ft1 -= 1
    	nrm ft1.xyz, ft1	// normal ft1 = normalize(normal)
    	... 
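    The 0..1 to -1..1 remapping used above (and its inverse, used when baking the texture) is easy to verify in Python (function names are illustrative):

```python
def encode_normal(n):
    """Pack a unit normal (-1..1) into texture color space (0..1): n * 0.5 + 0.5."""
    return tuple(c * 0.5 + 0.5 for c in n)

def decode_normal(rgb):
    """Unpack a texture value back to -1..1: rgb * 2 - 1 (as in the shader above)."""
    return tuple(c * 2.0 - 1.0 for c in rgb)

n = (0.0, 0.0, 1.0)  # a normal pointing straight out of the surface
print(encode_normal(n))                 # (0.5, 0.5, 1.0) - the typical "blue" of normal maps
print(decode_normal(encode_normal(n)))  # round-trips back to (0.0, 0.0, 1.0)
```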



    Toon shading

    A kind of non-photorealistic lighting model that imitates cartoon-style shading. It can be implemented in many ways, the simplest being a color lookup from a 1D texture indexed by the cosine of the angle from the Lambert model. In our case we use, for example, a 16x1 texture:

    // fragment
    	...
    	dp3 ft5.x, ft1.xyz, ft2.xyz		// ft5 = dot(normal, lightVec)
    	tex ft0, ft5.xx, fs3 <2d,nearest>	// color = toonMap(ft5)



    Sphere mapping

    The simplest way to simulate reflection, often used for a chrome-plated metal effect. The environment is represented as a texture with a fisheye-like spherical distortion:

    The main task is to transform the coordinates of the reflection vector into the corresponding texture coordinates:
    uv = (xy / sqrt (x ^ 2 + y ^ 2 + (z + 1) ^ 2)) * 0.5 + 0.5
    The multiplication and shift by 0.5 bring the normalized result into the 0..1 texture-coordinate space. In the simple case of a perfectly reflective surface the map's contribution is additive, while in more complex cases, when a diffuse component is required, it is customary to use approximations of the Fresnel equations. For complex models, Reflection maps are also often used to specify the reflection intensity over different parts of the model's texture.
    // fragment
    	...
    	add ft6, ft4, fc0.xxz	// ft6 = reflect + (0, 0, 1)
    	dp3 ft6.x, ft6, ft6		// ft6.x = dot(ft6, ft6)
    	rsq ft6.x, ft6.x		// ft6.x = 1 / sqrt(ft6.x)
    	mul ft6, ft4, ft6.x		// ft6 = reflect * ft6.x
    	mul ft6, ft6, fc0.y		// ft6 *= 0.5
    	add ft6, ft6, fc0.y		// ft6 += 0.5
    	tex ft0, ft6, fs2 <2d,nearest>	// color = sphereMap(ft6)
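    The UV formula can be checked numerically; an illustrative Python sketch of the same math as the shader performs:

```python
import math

def sphere_map_uv(r):
    """uv = (r.xy / sqrt(x^2 + y^2 + (z + 1)^2)) * 0.5 + 0.5"""
    x, y, z = r
    m = math.sqrt(x * x + y * y + (z + 1.0) ** 2)
    return (x / m * 0.5 + 0.5, y / m * 0.5 + 0.5)

# A reflection pointing straight back at the viewer maps to the texture center:
print(sphere_map_uv((0.0, 0.0, 1.0)))  # (0.5, 0.5)
```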


    That is probably all. The examples presented here mostly describe the material properties of an object, but shaders also find application in other tasks such as skeletal animation, shadows, water, and other relatively complex things (including non-visual ones). And with well-honed skills they let you implement fairly complex effects in a short time.


    Conclusion

    The source code of the example for this article.
