GraalVM: C and Scala mixed in one heap


I don't know about you, but lately I have been impressed by articles about the new Java technologies: Graal, Truffle and all the rest. It used to go like this: you invent a language, write an interpreter, rejoice at how good the language is and grieve at how slow it is, write a native compiler and/or a JIT for it, and you still need a debugger... LLVM exists, and thanks for that. After reading this article I was left with the (somewhat exaggerated) impression that once you have written an interpreter of a special kind, the work is, in principle, done. It feels as if compiler writers now have a "Make It Awesome" button. No, of course, JIT-compiled languages start slowly and need time to warm up. But in the end, a programmer's time and skill are not free either: what world of information technology would we live in if we still wrote everything in assembler? Maybe everything would fly (if the programmer laid out the instructions correctly), but I have some doubts about the total complexity of the programs in active use...


In general, I understand very well that in the dilemma "programmer's time vs. perfection of the resulting product (hand-crafted work)" the border can be pushed back and forth until the end of time, so today let's just try to use the traditional SQLite library without loading native code in its pure form. We will use Sulong, the ready-made Truffle implementation of a language for LLVM IR.


Disclaimer: this article should be read not as a story told by a pro to beginners, but as a kind of lab report by an equally green beginner who is only trying to get used to the technology. One more thing: LLVM IR cannot be considered completely platform-independent.


So, we will need the SQLite sources themselves, glue code written in Java... sorry, Scala, and also GraalVM with its tooling and Clang (with it we will compile SQLite into LLVM IR, which we will then load from our Scala code).


Let me note right away that everything happens on Ubuntu 18.04 LTS (64-bit). I would like to believe that Mac OS X will not cause big problems either, but I am not sure whether Graal and all its required components exist for Windows. Even if they do not now, they will probably appear later.


Preparation


  1. Download our guinea pig, SQLite (in fact, the repository attached to the article already has everything).
  2. Read the official "SQLite In 5 Minutes Or Less" article. Since SQLite is used here only as an example, that is exactly what we need. "How To Compile SQLite" will come in handy too.
  3. Download GraalVM Community Edition from here and unpack it. I would not recommend giving in to the temptation to add it to PATH: why would we need its own flavors of node and lli?
  4. Install Clang. In my case it is Clang 6 from the stock Ubuntu repository.

My test project will also use the sbt build system. For editing the project I personally prefer IntelliJ IDEA Community with the standard Scala plugin.


And here I stepped on my first rake: the GraalVM website says it is just a directory with a JDK. Well, if so, I will add it to IDEA as an ordinary JDK. "1.8", said IDEA. Hm, strange. We go to the console in the Graal directory and run bin/javac -version: indeed 1.8. Well, eight it is, nothing scary. The scary part is that IDEA does not see the org.graal packages, and we need them. So we go to File -> Other Settings -> Default Project Structure... and see in the JDK settings that the Classpath contains jar files from jre/lib and jre/lib/ext. I did not check whether all of them are there. And here is what we presumably need:


trosinenko@trosinenko-pc:~/tmp/graal/graalvm-1.0.0-rc1/jre/lib$ find . -name '*.jar'
./truffle/truffle-dsl-processor.jar
./truffle/truffle-api.jar
./truffle/truffle-nfi.jar
./truffle/locator.jar
./truffle/truffle-tck.jar
./polyglot/polyglot-native-api.jar
./boot/graaljs-scriptengine.jar
./boot/graal-sdk.jar
./management-agent.jar
./rt.jar
./jsse.jar
./resources.jar
./jvmci/jvmci-hotspot.jar
./jvmci/graal.jar
./jvmci/jvmci-api.jar
./installer/installer.jar
./ext/cldrdata.jar
./ext/sunjce_provider.jar
./ext/nashorn.jar
./ext/sunec.jar
./ext/zipfs.jar
./ext/sunpkcs11.jar
./ext/jaccess.jar
./ext/localedata.jar
./ext/dnsns.jar
./jce.jar
./svm/builder/objectfile.jar
./svm/builder/svm.jar
./svm/builder/pointsto.jar
./svm/library-support.jar
./graalvm/svm-driver.jar
./graalvm/launcher-common.jar
./graalvm/sulong-launcher.jar
./graalvm/graaljs-launcher.jar
./charsets.jar
./jvmci-services.jar
./security/policy/unlimited/US_export_policy.jar
./security/policy/unlimited/local_policy.jar
./security/policy/limited/US_export_policy.jar
./security/policy/limited/local_policy.jar

The full listing shows several more subdirectories, and judging by what was added for a regular JDK, ./security is of no interest to us. So, using the "press +, expand the directory, shift-click, OK" method, we add the contents of the truffle, polyglot, boot and graalvm subdirectories. If something turns out to be missing, we will add it later; that is routine...


Creating a Scala project


So, IDEA seems to be set up. Let's try to create an sbt project. There are no pitfalls here and everything is intuitive; the main thing is not to forget to select our new JDK.


Now just create a new Scala file and copy-paste (that is, creatively rework) the code from the Polyglot reference, section Start Language Java, after clicking Target Language - LLVM.


By the way, note the abundance of other Start Languages: JavaScript, R, Ruby and even plain C, but that is a completely different story that I have not read yet...


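import java.io.File
import org.graalvm.polyglot.{Context, Source}

object SQLiteTest {
  // Context hosting the guest languages; all access restrictions are switched off
  val polyglot = Context.newBuilder().allowAllAccess(true).build()
  val file: File = ???
  val source = Source.newBuilder("llvm", file).build()
  // Evaluating the bitcode yields a Value representing the loaded library
  val cpart = polyglot.eval(source)
  ???
}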

We will not inherit our object from App or make the fields private, so that they can be accessed from the Scala console (its configuration has already been added to the project).


As a result, we have almost (a whole 80%) copied an example of as many as five meaningful lines, so it is time to lean back on the stool and finally read what we wrote... I mean, the Javadoc. Especially since just calling main() is boring, and besides, our model example is SQLite, so we need to understand what exactly to write in place of the fifth line. The Polyglot reference is great, but API documentation is needed. To find it, you have to wander through the repository: there are readme files, and they contain links to the Javadoc.


While the meaning of what we wrote is still unclear, let us ask JS for the Answer to the Ultimate Question: select the Scala console configuration in IDEA, and...


scala> import org.graalvm.polyglot.Context
import org.graalvm.polyglot.Context
scala> val polyglot = Context.newBuilder().allowAllAccess(true).build()
polyglot: org.graalvm.polyglot.Context = org.graalvm.polyglot.Context@68e24e7
scala> polyglot.eval("js", "6 * 7")
res0: org.graalvm.polyglot.Value = 42

... well, everything works, and there is an answer. The Question is left as an exercise for the reader.


Back to the example code. The polyglot variable holds the context in which the different languages live: some are turned off, some are turned on, and some are even initialized lazily. In this harsh world you have to ask permission even to access files, so in the example we simply switch off the restrictions with allowAllAccess(true).


Next, we create a Source object with our LLVM bitcode. We specify the language and the file to load this "source code" from. You can also use a source string directly (as we have already seen), a URL (including resources in a JAR file), or simply a java.io.Reader instance. Then we evaluate the resulting source in the context and get a Value. According to the documentation for this method, we will never get null, but there is a Value that represents Null. Still, we need something concrete to load, so...


Building SQLite


... Think of SQLite not as a replacement for Oracle but as a replacement for fopen()
- From About SQLite. As you can see, letting SQLite run inside GraalVM is not such a terrible mistake from the developers' point of view.


Based on the tips from the already mentioned part of the SQLite documentation and on the Graal instructions, we compose the command line. Here it is:


clang -g -c -O1 -emit-llvm sqlite3.c \
        -DSQLITE_OMIT_LOAD_EXTENSION \
        -DSQLITE_THREADSAFE=0 \
        -o ../../sqlite3.bc

At least -O1 optimization is required for the code to work correctly inside Sulong; -g preserves the names for us (for these two and the other options, see the documentation for details). We use SQLITE_OMIT_LOAD_EXTENSION so that our test example does not depend on libdl.so (it is not immediately clear how we would provide it), and since it is unclear how, or why, to link with pthread, we disable thread safety (otherwise it fails at startup). That's all.


Running our project


Now we have something to write in the second line:


  val file: File = new File("./sqlite3.bc")

Now we can get the necessary functions from the library:


  val sqliteOpen = cpart.getMember("sqlite3_open")
  val sqliteExec = cpart.getMember("sqlite3_exec")
  val sqliteClose = cpart.getMember("sqlite3_close")
  val sqliteFree = cpart.getMember("sqlite3_free")

And it works: all that remains is to call them in the right order, and that's it! Well, for example, sqlite3_open requires a string with the file name and a pointer to a pointer to a structure (whose internals do not interest us in the slightest). Hmm... and how do we form the second argument? We need a function for creating pointers, and it is probably Sulong-specific. We add sulong.jar to the Classpath and restart the whole sbt shell. Nothing. Long story short, I found nothing smarter than to create a lib directory in the root of the sbt project (the standard directory for unmanaged jars) and run in it:


find ../../graalvm-1.0.0-rc1/jre/languages/ -name '*.jar' -exec ln -s {} . \;

After an sbt refresh the compilation succeeded. But nothing starts... Okay, we put the Classpath back the way it was. In general, my plan had been: I'll add the fifth line, retell the Javadoc for each of the five, the article will come out short, and everyone will say: "What is this, Twitter?"...


About three hours must have passed, and I was still trying to wrap the second argument of sqlite3_open in some function...


At some point it dawned on me that this is like the joke: "Why are you starting with War and Peace? Read Kolobok, that is just your level"... So sqlite3.c was temporarily replaced with test.c:


void f(int *x) {
  *x = 42;
}

After poking around a bit more in all sorts of type-conversion APIs of varying degrees of privateness, I got, to put it mildly, tired. Only jokes remained in my head. For example, this one: "iOS is an intuitive system. Logic is powerless to understand it; you need intuition." Indeed, what is the main principle of GraalVM and all the rest? Everything should be transparent and effortless, so you need to forget whatever FFI experience you have and think like the developer of a convenient system. We need a container holding an int. We pass new java.lang.Integer(0) and get a write to address zero. But remember what we were taught in C basics: the difference between an array and a pointer to its zeroth element is quite blurry. In effect, the function f simply takes an array of ints and writes the value into element zero. Let's try:


scala> val x = Array(new java.lang.Integer(12))
x: Array[Integer] = Array(12)
scala> SQLiteTest.cpart.getMember("f").execute(x)
res0: org.graalvm.polyglot.Value = LLVMTruffleObject(null:0)
scala> x
res1: Array[Integer] = Array(42)

There we go!!!
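The array trick is easier to believe when seen from the C side alone. The sketch below is plain C with no GraalVM involved; f is the function from test.c, while call_with_array, call_with_scalar and check_both are helper names made up purely for illustration. It shows that a one-element array and the address of a scalar are equally valid arguments for an int * parameter.

```c
#include <assert.h>

/* The function from test.c: writes 42 through the pointer it receives. */
void f(int *x) {
  *x = 42;
}

/* Hypothetical helper: passes a one-element array, which decays to &box[0]. */
int call_with_array(void) {
  int box[1] = { 12 };
  f(box);
  return box[0];
}

/* Hypothetical helper: passes the address of a plain int. */
int call_with_scalar(void) {
  int value = 12;
  f(&value);
  return value;
}

/* Hypothetical helper: both call styles must observe the same write. */
void check_both(void) {
  assert(call_with_array() == 42);
  assert(call_with_scalar() == 42);
}
```

From the callee's point of view the two calls are indistinguishable, which is presumably why a Java array of length one works as the "pointer" here.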


It would seem that we could now quickly write the query function and be done, but whatever you pass as the second argument, be it Array(new Object) or Array(Array(new Object)), it refuses to work, complaining about strlen inside the LLVM bitcode O_O (by the way, unlike ordinary machine code from a .so, LLVM IR is quite strongly typed).


Some time later I finally stopped dismissing the thought that simply passing a java.lang.String, or even an Array[Byte], as the first argument of execute() would be too intuitive to work, and a reworked version of our void f() confirmed it.
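The strlen complaint makes sense from the C side: strlen walks memory until it meets a zero byte, whereas a Java string or byte array carries its length separately. Below is a minimal sketch, in plain C, of the conversion that any bridge between the two worlds has to perform one way or another. bytes_to_cstring is a hypothetical helper invented for this illustration; it is not part of Sulong or SQLite.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies len raw bytes into a freshly allocated,
 * NUL-terminated buffer. Returns NULL on allocation failure.
 * The caller owns the result and must free() it. */
char *bytes_to_cstring(const unsigned char *bytes, size_t len) {
  assert(bytes != NULL || len == 0);
  char *out = malloc(len + 1);
  if (out == NULL) {
    return NULL;
  }
  if (len > 0) {
    memcpy(out, bytes, len);
  }
  out[len] = '\0';  /* the terminator that strlen() relies on */
  return out;
}
```

Only after such a conversion does a char * based API have a well-defined place to stop reading.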


In the end, a function with the promising name __sulong_byte_array_to_native turned up in Sulong's built-in bindings (SQLiteTest.polyglot.getBindings("llvm")). Let's try it:


scala> val str = SQLiteTest.polyglot.getBindings("llvm")
                    .getMember("__sulong_byte_array_to_native")
                    .execute("toc.db".getBytes)
str: org.graalvm.polyglot.Value = LLVMTruffleObject(null:139990504321152)
scala> val db = new Array[Object](1)
db: Array[Object] = Array(null)
scala> SQLiteTest.sqliteOpen.execute(str, db)
res0: org.graalvm.polyglot.Value = 0
scala> val str = SQLiteTest.polyglot.getBindings("llvm")
                    .getMember("__sulong_byte_array_to_native")
                    .execute("toc123.db".getBytes)
str: org.graalvm.polyglot.Value = LLVMTruffleObject(null:139990517528064)
scala> SQLiteTest.sqliteOpen.execute(str, db)
res1: org.graalvm.polyglot.Value = 0

It works!!! Oh, but why does it also work with the wrong file name?.. Holding our breath, we look into the project directory, and there sits a brand-new toc123.db. Hooray!
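The one-element array db plays the role of a C out-parameter in these calls. As a rough illustration in plain C, here is the same calling convention with invented stand-ins: the handle type and the handle_open / handle_close functions only mimic the shape of the sqlite3_open signature (status code returned, handle written through a pointer to a pointer) and say nothing about SQLite's actual behavior.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical opaque handle standing in for the sqlite3 structure. */
typedef struct handle handle;
struct handle {
  int is_open;
};

/* Mimics the shape of sqlite3_open: the status comes back as the return
 * value, the handle is stored through the out-parameter. */
int handle_open(const char *name, handle **out) {
  assert(out != NULL);
  *out = NULL;
  if (name == NULL) {
    return 1;  /* made-up error code for this sketch */
  }
  handle *h = malloc(sizeof *h);
  if (h == NULL) {
    return 1;
  }
  h->is_open = 1;
  *out = h;
  return 0;
}

/* Releases a handle obtained from handle_open. */
int handle_close(handle *h) {
  free(h);
  return 0;
}
```

A caller can pass either &h for a handle *h or a one-element array handle *slot[1], which is the form the Scala code imitates with new Array[Object](1).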


So, let's rewrite the example from the SQLite documentation in Scala:


  def query(dbFile: String, queryString: String): Unit = {
    val filenameStr = toCString(dbFile)
    val ptrToDb = new Array[Object](1)
    val rc = sqliteOpen.execute(filenameStr, ptrToDb)
    val db = ptrToDb.head
    if (rc.asInt() != 0) {
      println(s"Cannot open $dbFile: ${sqliteErrmsg.execute(db)}!")
      sqliteClose.execute(db)
    } else {
      val zErrMsg = new Array[Object](1)
      val execRc = sqliteExec.execute(db, toCString(queryString), ???, zErrMsg)
      if (execRc.asInt != 0) {
        val errorMessage = zErrMsg.head.asInstanceOf[Value]
        assert(errorMessage.isString)
        println(s"Cannot execute query: ${errorMessage.asString}")
        sqliteFree.execute(errorMessage)
      }
      sqliteClose.executeVoid(db)
    }
  }

There is just one catch: a certain callback. Well, while nobody is watching, the engineering student describes a core made of wood, and I will try writing the callback in JavaScript:


  val callback = polyglot.eval("js",
    """function(unused, argc, argv, azColName) {
      |  print("argc = " + argc);
      |  print("argv = " + argv);
      |  print("azColName = " + azColName);
      |  return 0;
      |}
    """.stripMargin)
  // ...
     val execRc = sqliteExec.execute(db, toCString(queryString), callback, Int.box(0), zErrMsg)

And here is what we get:


io.github.atrosinenko.SQLiteTest.query("toc.db", "select * from toc;")
argc = 5
argv = foreign {}
azColName = foreign {}
argc = 5
argv = foreign {}
azColName = foreign {}
argc = 5
argv = foreign {}
azColName = foreign {}
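For comparison, this is roughly what the same callback contract looks like from the C side. The sketch is self-contained: fake_exec is a made-up stand-in for sqlite3_exec that feeds two hard-coded rows to the callback, and print_row is a hypothetical callback. Only the four-parameter callback signature and the "non-zero return aborts" rule are taken from the SQLite documentation.

```c
#include <assert.h>
#include <stdio.h>

/* Callback shape expected by sqlite3_exec: user data, column count,
 * column values and column names. */
typedef int (*row_callback)(void *user, int argc, char **argv, char **col_names);

/* Hypothetical stand-in for sqlite3_exec: calls cb once per hard-coded row
 * and stops as soon as the callback returns non-zero. */
int fake_exec(row_callback cb, void *user) {
  static char col_name[] = "name", col_type[] = "type";
  static char v_db[] = "sqlite3", v_i64[] = "sqlite3_int64", v_obj[] = "object";
  char *cols[] = { col_name, col_type };
  char *rows[2][2] = { { v_db, v_obj }, { v_i64, v_obj } };
  assert(cb != NULL);
  for (int r = 0; r < 2; ++r) {
    if (cb(user, 2, rows[r], cols) != 0) {
      return 1;  /* made-up error code for an aborted query */
    }
  }
  return 0;
}

/* Hypothetical callback: prints "column = value" pairs and counts rows
 * through the user pointer. */
int print_row(void *user, int argc, char **argv, char **col_names) {
  int *row_count = user;
  for (int i = 0; i < argc; ++i) {
    printf("%s = %s\n", col_names[i], argv[i] ? argv[i] : "NULL");
  }
  ++*row_count;
  return 0;
}
```

In C, argv and azColName are simply arrays of char *, which is exactly the part that arrives in JavaScript as an opaque foreign object.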

Well, not much magic. Besides, it turns out that on error zErrMsg holds some obscure object that does not convert to a string by itself. Well then, let's build and load one more file, lib.bc, and write the following in its source, lib.c:


#include <polyglot.h>
void *fromCString(const char *str) {
  return polyglot_from_string(str, "UTF-8");
}

Why polyglot_from_string is not available directly through the bindings I did not understand, so let's pull it out this way and write a wrapper:


  val lib_fromCString = lib.getMember("fromCString")
  def fromCString(ptr: Value): String = {
    if (ptr.isNull)
      ""
    else
      lib_fromCString.execute(ptr).asString()
  }

Well, returning error messages is sorted out, but let's write the callback in Scala after all:


  val lib_copyToArray = lib.getMember("copy_to_array_from_pointers")
  val callback = new ProxyExecutable {
    override def execute(arguments: Value*): AnyRef = {
      val argc = arguments(1).asInt()
      val xargv = new Array[Long](argc)
      val xazColName = new Array[Long](argc)
      lib_copyToArray.execute(xargv, arguments(2))
      lib_copyToArray.execute(xazColName, arguments(3))
      (0 until argc) foreach { i =>
        val name = fromCString(polyglot.asValue(xazColName(i) ^ 1))
        val value = fromCString(polyglot.asValue(xargv(i) ^ 1))
        println(s"$name = $value")
      }
      println("========================")
      Int.box(0)
    }
  }

Meanwhile, we add to our lib.c this bit of magic for copying from a C array into a Polyglot one:


void copy_to_array_from_pointers(void *arr, void **ptrs) {
  int size = polyglot_get_array_size(arr);
  for(int i = 0; i < size; ++i) {
    polyglot_set_array_element(arr, i, ((uintptr_t)ptrs[i]) ^ 1);
  }
}

Note the pointer ^ 1: it is needed because someone is too clever. Namely, polyglot_set_array_element is a variadic function with exactly three arguments, which accepts both primitive types and pointers to Polyglot values. In the end, it works:


io.github.atrosinenko.SQLiteTest.query("toc.db", "select * from toc;")
name = sqlite3
type = object
status = 0
title = Database Connection Handle
uri = c3ref/sqlite3.html
========================
name = sqlite3_int64
type = object
status = 0
title = 64-Bit Integer Types
uri = c3ref/int64.html
========================
name = sqlite3_uint64
type = object
status = 0
title = 64-Bit Integer Types
uri = c3ref/int64.html
========================
...
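The ^ 1 on the C side and the matching ^ 1 on the Scala side rely on XOR being its own inverse: flipping the lowest bit twice restores the original address. The sketch below demonstrates only that round trip in plain C; tag_pointer, untag_pointer and roundtrip_ok are hypothetical names, and the code does not try to reproduce how Sulong decides whether an argument is a primitive or a pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: turns a pointer into an integer with the lowest
 * bit flipped, so that the value no longer equals the address itself. */
uintptr_t tag_pointer(const void *p) {
  return (uintptr_t)p ^ 1u;
}

/* Hypothetical helper: flips the same bit back and recovers the pointer. */
void *untag_pointer(uintptr_t tagged) {
  return (void *)(tagged ^ 1u);
}

/* Returns 1 when a pointer survives the tag/untag round trip. */
int roundtrip_ok(void) {
  static const char text[] = "sqlite3";
  uintptr_t tagged = tag_pointer(text);
  assert(tagged != (uintptr_t)text);  /* the tagged value differs from the address */
  return untag_pointer(tagged) == (const void *)text;
}
```

The transformation loses no information, so the Scala callback can undo it with the same operation before handing the address to fromCString.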

It remains to add the main method:


  def main(args: Array[String]): Unit = {
    query(args(0), args(1))
    polyglot.close()
  }

in which, strictly speaking, the context has to be closed. I did not do that in the object itself because, after SQLiteTest is initialized, we naturally still need the context for the Scala console.


This concludes my story, and I suggest that the reader:


  1. Try to build all this with SubstrateVM into a native binary, as if there were no Scala at all
  2. (*) Do the same, but with profile-guided optimization

The resulting files:


SQLiteTest.scala
package io.github.atrosinenko
import java.io.File
import org.graalvm.polyglot.proxy.ProxyExecutable
import org.graalvm.polyglot.{Context, Source, Value}
object SQLiteTest {
  val polyglot: Context = Context.newBuilder().allowAllAccess(true).build()
  def loadBcFile(file: File): Value = {
    val source = Source.newBuilder("llvm", file).build()
    polyglot.eval(source)
  }
  val cpart: Value = loadBcFile(new File("./sqlite3.bc"))
  val lib:   Value = loadBcFile(new File("./lib.bc"))
  val sqliteOpen:   Value = cpart.getMember("sqlite3_open")
  val sqliteExec:   Value = cpart.getMember("sqlite3_exec")
  val sqliteErrmsg: Value = cpart.getMember("sqlite3_errmsg")
  val sqliteClose:  Value = cpart.getMember("sqlite3_close")
  val sqliteFree:   Value = cpart.getMember("sqlite3_free")
  val bytesToNative: Value = polyglot.getBindings("llvm").getMember("__sulong_byte_array_to_native")
  def toCString(str: String): Value = {
    bytesToNative.execute(str.getBytes())
  }
  val lib_fromCString: Value = lib.getMember("fromCString")
  def fromCString(ptr: Value): String = {
    if (ptr.isNull)
      ""
    else
      lib_fromCString.execute(ptr).asString()
  }
  val lib_copyToArray: Value = lib.getMember("copy_to_array_from_pointers")
  val callback: ProxyExecutable = new ProxyExecutable {
    override def execute(arguments: Value*): AnyRef = {
      val argc = arguments(1).asInt()
      val xargv = new Array[Long](argc)
      val xazColName = new Array[Long](argc)
      lib_copyToArray.execute(xargv, arguments(2))
      lib_copyToArray.execute(xazColName, arguments(3))
      (0 until argc) foreach { i =>
        val name  = fromCString(polyglot.asValue(xazColName(i) ^ 1))
        val value = fromCString(polyglot.asValue(xargv(i) ^ 1))
        println(s"$name = $value")
      }
      println("========================")
      Int.box(0)
    }
  }
  def query(dbFile: String, queryString: String): Unit = {
    val filenameStr = toCString(dbFile)
    val ptrToDb = new Array[Object](1)
    val rc = sqliteOpen.execute(filenameStr, ptrToDb)
    val db = ptrToDb.head
    if (rc.asInt() != 0) {
      println(s"Cannot open $dbFile: ${fromCString(sqliteErrmsg.execute(db))}!")
      sqliteClose.execute(db)
    } else {
      val zErrMsg = new Array[Object](1)
      val execRc = sqliteExec.execute(db, toCString(queryString), callback, Int.box(0), zErrMsg)
      if (execRc.asInt != 0) {
        val errorMessage = zErrMsg.head.asInstanceOf[Value]
        println(s"Cannot execute query: ${fromCString(errorMessage)}")
        sqliteFree.execute(errorMessage)
      }
      sqliteClose.execute(db)
    }
  }
  def main(args: Array[String]): Unit = {
    query(args(0), args(1))
    polyglot.close()
  }
}

lib.c
#include <polyglot.h>
void *fromCString(const char *str) {
  return polyglot_from_string(str, "UTF-8");
}
void copy_to_array_from_pointers(void *arr, void **ptrs) {
  int size = polyglot_get_array_size(arr);
  for(int i = 0; i < size; ++i) {
    polyglot_set_array_element(arr, i, ((uintptr_t)ptrs[i]) ^ 1);
  }
}

Link to the repository.

