Lua + FFI vs. JavaScript

    This short note does not claim to be a full article.

    Last time, I compared LuaJIT 2.0 Beta 5 with JavaScript in various browsers, using a simple ray tracer as the benchmark. The result: JavaScript in Chrome scored 20,000 RPS (rays per second) and took first place, while LuaJIT scored 5,000 RPS and came last.

    With the release of LuaJIT 2.0 Beta 6, the situation has changed: Lua now comfortably beats Chrome. Let's see how that came about.



    Imagine that you have a large array that needs to be filled with numbers. How do you do this? Here is an example implementation in Lua:

    a = {}
    for i = 1, n do
      a[i] = i*i - 0.5
    end
    


    For large n this is very slow: Lua does not know in advance how big the array will be, so it has to grow the array dynamically. Lua does not even know that the indices are integers in the range 1..n and that the values are all numbers, so it has to be prepared for the worst case, where someone later writes to the array like this:

    a['qqq'] = {red = 1, green = 0.5, blue = 0.8}
    


    This versatility slows the program down. We would like to tell Lua somehow that what we have is an array of the form “double a[n]”. Standard Lua has no way to do this, but the language can be extended, and one such extension gives us exactly what we need. It is called FFI, and it ships with LuaJIT. Here is how it solves the array problem:

    ffi = require'ffi'
    -- allocate 1 + n elements so that indices 1..n are valid (C arrays start at 0)
    a = ffi.new('double[?]', 1 + n)
    for i = 1, n do
      a[i] = i*i - 0.5
    end
    


    This simple change makes the code many times faster and cuts memory usage many times over. Just what a ray tracer needs.
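
    To check the claim on your own machine, the two fills can be timed side by side. The following is only a rough sketch: the helper names fill_table, fill_ffi and measure, and the size n = 1e6, are assumptions made for illustration, and the numbers it prints will vary with hardware and LuaJIT version.

```lua
-- Micro-benchmark sketch: plain Lua table vs. FFI double array.
-- fill_table, fill_ffi and measure are hypothetical helper names.
local ffi = require 'ffi'

local function fill_table(n)
  local a = {}
  for i = 1, n do
    a[i] = i*i - 0.5
  end
  return a
end

local function fill_ffi(n)
  -- 1 + n elements so that indices 1..n are valid (index 0 stays unused)
  local a = ffi.new('double[?]', 1 + n)
  for i = 1, n do
    a[i] = i*i - 0.5
  end
  return a
end

-- Returns the filled array, the CPU seconds spent, and the approximate
-- heap growth in KB as reported by the garbage collector
local function measure(fill, n)
  collectgarbage()
  local kb0, t0 = collectgarbage('count'), os.clock()
  local a = fill(n)
  local dt = os.clock() - t0
  local kb = collectgarbage('count') - kb0
  return a, dt, kb
end

local n = 1e6  -- arbitrary size chosen for illustration
for _, case in ipairs{ {'table', fill_table}, {'ffi', fill_ffi} } do
  local a, dt, kb = measure(case[2], n)
  print(string.format('%-5s %.3f s  %.0f KB', case[1], dt, kb))
end
```

    The memory figure is only an estimate, since the collector may run while the array is being filled.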

    The previous ray tracer kept in memory a table of colors, each one a small table with three fields. A ray was cast through each pixel, its color was computed, and that color was stored in the table. It looked something like this:

    pixels = {}
    for x = 1, width do
    	for y = 1, height do
    		local color = raytrace(x, y)
    		pixels[y*width + x] = color
    	end
    end
    


    As the program ran, this pixel table grew, adding a new element took longer and longer, and the ray tracer slowed down. The result: 5,000 RPS (rays per second) and last place.

    With FFI, the pixels table can be represented as an array whose memory is allocated up front. The algorithm now looks like this:

    ffi = require'ffi'
    pixels = ffi.new('float[?]', width*height*3)
    i = 0
    for y = 1, height do
    	for x = 1, width do
    		local color = raytrace(x, y)
    		pixels[i + 0] = color[1]
    		pixels[i + 1] = color[2]
    		pixels[i + 2] = color[3]
    		i = i + 3
    	end
    end
    


    The code is a little longer than before, but other parts have become simpler: for example, saving such an array to a BMP file is easier. This simple optimization does three things:

    1. The amount of memory is reduced to 25 megabytes and does not grow during operation.
    2. The speed of the ray tracer does not depend on the size of the resulting image.
    3. Speed increases to 40,000 RPS.
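
    As an illustration of why the flat array makes BMP output easier, here is a minimal sketch of an encoder for a 24-bit uncompressed BMP. It is not the code from the ray tracer itself. It assumes the layout used above (three floats per pixel in R, G, B order, rows stored top to bottom) and channel values in the range 0..1. The names u16, u32, to_byte and encode_bmp are hypothetical.

```lua
-- Sketch: encode a flat float RGB buffer as a 24-bit uncompressed BMP.
-- Assumes channel values in 0..1 and rows stored top to bottom.
local ffi = require 'ffi'

-- Little-endian encoders for the header fields
local function u16(v)
  return string.char(v % 256, math.floor(v / 256) % 256)
end
local function u32(v)
  return u16(v % 65536) .. u16(math.floor(v / 65536))
end

-- Clamp a channel to 0..1 and scale it to 0..255
local function to_byte(v)
  if v < 0 then v = 0 elseif v > 1 then v = 1 end
  return math.floor(v * 255 + 0.5)
end

local function encode_bmp(pixels, width, height)
  local pad = (4 - (width * 3) % 4) % 4   -- each row is padded to 4 bytes
  local data_size = (width * 3 + pad) * height
  local out = {
    -- file header, 14 bytes: magic, file size, reserved, data offset
    'BM', u32(54 + data_size), u32(0), u32(54),
    -- info header, 40 bytes: size, width, height, planes, bits per pixel,
    -- compression, image size, resolution (x, y), palette fields
    u32(40), u32(width), u32(height), u16(1), u16(24),
    u32(0), u32(data_size), u32(2835), u32(2835), u32(0), u32(0),
  }
  local padding = string.rep('\0', pad)
  for y = height - 1, 0, -1 do            -- BMP stores rows bottom-up
    local row = {}
    for x = 0, width - 1 do
      local i = (y * width + x) * 3
      -- BMP stores channels in B, G, R order
      row[x + 1] = string.char(to_byte(pixels[i + 2]),
                               to_byte(pixels[i + 1]),
                               to_byte(pixels[i]))
    end
    out[#out + 1] = table.concat(row) .. padding
  end
  return table.concat(out)
end

-- Usage: a 2 x 2 image, black except for a red top-left pixel
local w, h = 2, 2
local pixels = ffi.new('float[?]', w * h * 3)  -- ffi.new zero-fills
pixels[0] = 1
local bmp = encode_bmp(pixels, w, h)
```

    The resulting string can be written to disk with io.open(name, 'wb'). Because the pixels sit in one contiguous block, the encoder needs only index arithmetic and never has to walk nested tables.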


    For comparison: the best result of the previous comparison, JavaScript in Chrome, reached 20,000 RPS and used 150 MB of memory.

    The test results below are partly taken from the previous comparison. Each ray tracer rendered the same scene at 1000 × 1000 pixels, casting 3 rays through each pixel.

    Engine      Speed         Memory
    LuaJIT      40,000 RPS    25 MB
    Chrome      20,400 RPS    150 MB
    Opera       15,700 RPS
    Firefox      9,300 RPS
    Explorer     9,000 RPS


    It remains to say that I wrote the Lua ray tracer in a straightforward way: every operation on vectors (addition, multiplication by a number) creates a new vector for the result. This stream of constantly created vectors keeps the garbage collector busy. If the extra vectors are avoided, the ray tracer will get faster still.
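
    The difference between the two styles can be sketched as follows. This is an illustration, not code from the ray tracer. It assumes vectors are three-element tables {x, y, z}, and all function names are hypothetical.

```lua
-- Sketch: allocating vs. in-place vector operations.
-- Vectors are assumed to be tables {x, y, z}.

-- Allocating style: every call creates a new table that the
-- garbage collector must reclaim later
local function add(a, b)
  return { a[1] + b[1], a[2] + b[2], a[3] + b[3] }
end

local function scale(a, k)
  return { a[1] * k, a[2] * k, a[3] * k }
end

-- In-place style: the caller supplies the destination table,
-- so the operation itself allocates nothing
local function add_into(dst, a, b)
  dst[1], dst[2], dst[3] = a[1] + b[1], a[2] + b[2], a[3] + b[3]
  return dst
end

local function scale_into(dst, a, k)
  dst[1], dst[2], dst[3] = a[1] * k, a[2] * k, a[3] * k
  return dst
end

-- point = origin + dir * t, computed both ways
local origin, dir, t = { 0, 0, 0 }, { 0, 0, 1 }, 5

local p1 = add(origin, scale(dir, t))   -- creates two new tables per call

local tmp = { 0, 0, 0 }                 -- allocated once, reused for every ray
local p2 = add_into(tmp, origin, scale_into(tmp, dir, t))
```

    The in-place versions are less convenient to read and chain, which is the price for taking work away from the garbage collector.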

    The ray tracer I have been talking about is available here. Run it with the command "luajit main.lua". LuaJIT 2.0 Beta 6 or later is required.
