External term format

    If a program needs to send data to an Erlang server, the data must first be serialized, that is, converted to a binary form that Erlang can unpack again. This is usually done with ASN.1, Google Protocol Buffers, Thrift, and so on. All of these are, without a doubt, worthy products.

    Alternatively, consider the Erlang external term format. The Erlang runtime system has two functions, term_to_binary() and binary_to_term(), which quickly and efficiently pack any value into this format and unpack it again. The format itself is well described in the documentation: www.erlang.org/doc/apps/erts/erl_ext_dist.html

    How it works


    The external term format has a very simple structure. Data usually has the form "tag, data" or "tag, length, data", where the one-byte tag says what type of data follows.

    For the basic data types, the layouts are:

    • Tuple - 104, number of elements (1 byte), all elements
    • List - 108, number of elements (4 bytes), all elements, 106 (the tag of the empty list, which closes a proper list)
    • Atom - 115, length (1 byte), atom text
    • Integer - 98, value (4 bytes, signed, big-endian), or for a short integer (0 to 255) - 97, value (1 byte)
    • String - 107, length (2 bytes), text. In principle, a string can be encoded as a list of integers, but for a short string of 8-bit text this tag is just what the doctor ordered


    All packed data must be preceded by the byte 131, the version number of the current external term format.

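    For example, the Erlang value [{banknote, 100, rub}] is packed as the version byte 131, followed by a one-element list (108) that contains a three-element tuple (104) of an atom, an integer, and an atom, and then the closing 106.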
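
    The byte layout of that value can be spelled out in Python. This is a minimal sketch assembled by hand from the tags listed above. It is an illustration rather than captured output of term_to_binary(), which may choose other tags from the documentation (for atoms, for example).

```python
from struct import pack

# Hand-assembled encoding of [{banknote, 100, rub}], using only the tags
# described in this article. Illustration only, not captured Erlang output.
parts = [
    pack("B", 131),                     # version byte
    pack(">BI", 108, 1),                # list tag, 4-byte element count = 1
    pack("BB", 104, 3),                 # tuple tag, 1-byte element count = 3
    pack("BB", 115, 8), b"banknote",    # atom tag, 1-byte length, atom text
    pack("BB", 97, 100),                # short integer tag, 1-byte value
    pack("BB", 115, 3), b"rub",         # atom tag, 1-byte length, atom text
    pack("B", 106),                     # empty-list tag that closes the list
]
encoded = b"".join(parts)
print(list(encoded))
```

    The result is 26 bytes long: 5 bytes of list header, 2 of tuple header, 10 and 5 for the two atoms, 2 for the integer, plus the version byte and the closing 106.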



    As a proof of concept, we will write a simple procedure in Python that packs Python structures into the external term format, and then unpack the result in Erlang.

    We will map a Python list to an Erlang list and a Python tuple to an Erlang tuple, and also pack integers and Python strings. For atoms, well, we can, for example, derive a subclass from the Python string type, so that the packing procedure can easily tell atoms from ordinary strings.

    The result looks something like this:

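    from struct import pack
    import socket

    class atom(str):
        """A str subclass, so the packer can tell atoms from ordinary strings."""

    def _eterm(x, accum):
        """Append the encoded pieces of x to the list accum."""
        if type(x) is int:
            accum.append(pack(">Bi", 98, x))          # integer: tag, 4-byte signed value
        elif type(x) is str:
            data = x.encode("latin-1")                 # tag 107 carries 8-bit text
            accum.append(pack(">BH", 107, len(data)))  # string: tag, 2-byte length
            accum.append(data)
        elif type(x) is tuple:
            accum.append(pack("BB", 104, len(x)))      # tuple: tag, 1-byte element count
            for term in x:
                _eterm(term, accum)
        elif type(x) is list:
            accum.append(pack(">BI", 108, len(x)))     # list: tag, 4-byte element count
            for term in x:
                _eterm(term, accum)
            accum.append(pack("B", 106))               # tag 106 closes the list
        elif isinstance(x, atom):
            data = x.encode("latin-1")
            accum.append(pack("BB", 115, len(data)))   # atom: tag, 1-byte length
            accum.append(data)
        else:
            raise AssertionError("Cannot convert that type to erlang term %r" % (x,))

    def binary(x):
        """Return x packed in the external term format, version byte first."""
        accum = [pack("B", 131)]
        _eterm(x, accum)
        return b"".join(accum)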
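
    The procedure above always writes integers in the 4-byte form (98). The list of tags also mentions the 1-byte form (97) for short integers. A minimal sketch of choosing between the two follows; the helper name pack_integer is hypothetical and not part of the article's code, and values outside the 32-bit range are simply rejected, since the larger integer formats are not covered here.

```python
from struct import pack

def pack_integer(n):
    """Hypothetical helper: pick the shortest integer tag from the list above."""
    if type(n) is not int:               # also rejects bool
        raise TypeError("expected int, got %r" % (n,))
    if 0 <= n <= 255:
        return pack("BB", 97, n)         # short integer: tag, 1-byte value
    if -2**31 <= n < 2**31:
        return pack(">Bi", 98, n)        # integer: tag, 4-byte signed big-endian value
    # Larger values need the big-integer formats, which this article leaves out.
    raise ValueError("integer %d does not fit in 32 bits" % n)

print(list(pack_integer(48)))      # [97, 48]
print(list(pack_integer(1988)))    # [98, 0, 0, 7, 196]
```

    For a value such as {age, 48} this saves three bytes per integer, at the cost of one extra comparison in the packer.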
    


    Excellent. Now let us pack a more complex structure and send it to Erlang in the simplest possible way.

    pterm = (atom("vcard"), [(atom("firstname"), "Odobenus"),
                             (atom("lastname"), "Rosmarus"),
                             (atom("age"), 48),
                             (atom("children"), [
                                 ("Dimon", 1988),
                                 ("Natashka", 1990),
                                 ("Katka", 2000),
                                 ("Anka", 2003)])])
    erlterm = binary(pterm)
    # Send the packed term as a single UDP datagram
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(erlterm, ("localhost", 10000))
    


    That is, we packed the term and sent it in a UDP datagram. In Erlang we receive and decode it (the socket has to be open before the Python script runs):

    1> gen_udp:open(10000,[binary]).
    {ok,#Port<0.585>}
    2> R=receive {udp,_ ,_,_,Bin} -> Bin end.
    <<131,104,2,115,5,118,99,97,114,100,108,0,0,0,4,104,2,115,
      9,102,105,114,115,116,110,97,109,101,107,...>>
    3> binary_to_term( R ).
    {vcard,[{firstname,"Odobenus"},
            {lastname,"Rosmarus"},
            {age,48},
            {children,[{"Dimon",1988},
                       {"Natashka",1990},
                       {"Katka",2000},
                       {"Anka",2003}]}]}
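
    For comparison, here is the reverse direction in Python: a minimal sketch of a decoder for the tags described in this article. The names Atom and decode are hypothetical. The sketch handles only those tags and does little bounds checking; real output of term_to_binary() may contain other tags from the documentation, which this code rejects.

```python
from struct import unpack_from

class Atom(str):
    """Hypothetical marker type for decoded atoms (not part of any library)."""

def _decode(buf, pos):
    """Decode one term starting at buf[pos]; return (value, next position)."""
    tag = buf[pos]
    pos += 1
    if tag == 97:                                    # short integer, 1-byte value
        return buf[pos], pos + 1
    if tag == 98:                                    # integer, 4-byte signed value
        return unpack_from(">i", buf, pos)[0], pos + 4
    if tag == 115:                                   # atom, 1-byte length
        n = buf[pos]
        return Atom(buf[pos + 1:pos + 1 + n].decode("latin-1")), pos + 1 + n
    if tag == 107:                                   # string, 2-byte length
        n = unpack_from(">H", buf, pos)[0]
        return buf[pos + 2:pos + 2 + n].decode("latin-1"), pos + 2 + n
    if tag == 104:                                   # tuple, 1-byte element count
        n = buf[pos]
        pos += 1
        items = []
        for _ in range(n):
            item, pos = _decode(buf, pos)
            items.append(item)
        return tuple(items), pos
    if tag == 108:                                   # list, 4-byte element count
        n = unpack_from(">I", buf, pos)[0]
        pos += 4
        items = []
        for _ in range(n):
            item, pos = _decode(buf, pos)
            items.append(item)
        if buf[pos] != 106:
            raise ValueError("only lists closed by tag 106 are handled")
        return items, pos + 1
    if tag == 106:                                   # empty list
        return [], pos
    raise ValueError("tag %d is not handled by this sketch" % tag)

def decode(data):
    """Check the version byte, then decode exactly one term."""
    if not data or data[0] != 131:
        raise ValueError("missing version byte 131")
    value, pos = _decode(data, 1)
    if pos != len(data):
        raise ValueError("trailing or missing bytes after the term")
    return value

sample = bytes([131, 104, 2, 115, 3]) + b"age" + bytes([98, 0, 0, 0, 48])
print(decode(sample))    # ('age', 48)
```

    Tuples come back as Python tuples, lists as Python lists, and atoms as instances of the Atom subclass, mirroring the mapping used by the packing procedure.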
    


    Conclusions


    We passed a complex structure from Python to Erlang with little effort. The advantages of this approach are:
    • Flexibility. You can pack any Erlang term.
    • Performance. In Erlang, binary_to_term() and term_to_binary() are implemented as BIFs (built-in functions) in C and are well optimized for speed and memory.


    The disadvantage of the external term format is the flip side of that flexibility. Careless code can pack structures that the Erlang server will simply not know what to do with.

    This article leaves out the formats for floating-point numbers, arbitrary-size integers, compressed data, binaries, and so on. All of this is well described in the documentation.

    The main thing is to understand the idea.
