KPHP and engines release

    When speaking at conferences, we have often said that we would like to release KittenPHP under an open license, following the tradition established by large IT companies such as Google and Facebook.

    The release was postponed several times because we feared we would not have enough time and energy to work with the open source community. In the end the long-awaited day has come: the code of KPHP and of some other tools used inside the project is now publicly available.

    Below you will find a more detailed account of the internal structure of VKontakte and of the tools that are now available to the open source community.




    The source code has been published under GNU licenses (GPL and LGPL). These licenses are close to us in spirit, since we often relied on GNU-licensed tools when creating these libraries.

    KPHP

    The source code of VKontakte is written in a PHP-like language called KittenPHP, or KPHP for short. This code is translated into C++ by a translator of the same name. The generated C++ code is then automatically compiled with gcc, producing a binary that is ready to run. This binary is a web server that accepts HTTP requests and generates pages.
    To speed up development, KPHP compiles the project's files separately and then links them. On subsequent builds only modified files are processed or, in the case of large files, only the modified parts of them.
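
    The article does not describe how KPHP detects modified files, so the following is only a minimal sketch of the general idea, assuming change detection by content hash. The function names and the in-memory cache are hypothetical and are not part of KPHP.

```python
import hashlib


def file_digest(source: str) -> str:
    """Return a stable fingerprint of a file's contents."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def files_to_recompile(sources: dict, cache: dict) -> list:
    """Return the names of files whose contents changed since the last build.

    `sources` maps file name -> current source text.
    `cache` maps file name -> digest recorded by the previous build; it is
    updated in place so that the next call sees the new state.
    """
    changed = []
    for name, text in sorted(sources.items()):
        digest = file_digest(text)
        if cache.get(name) != digest:
            changed.append(name)
            cache[name] = digest
    return changed


# Hypothetical two-file project (assumption, for illustration only).
cache = {}
project = {"index.php": "<?php echo 1;", "lib.php": "<?php function f() {}"}

first_build = files_to_recompile(project, cache)    # every file is new
project["lib.php"] = "<?php function f() { return 2; }"
second_build = files_to_recompile(project, cache)   # only lib.php changed
```

    Only the files returned by `files_to_recompile` would need to be translated and compiled again; the rest can be reused from the previous build and linked as before.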

    KPHP is a minimalistic language designed to run very fast without sacrificing the convenience and speed of development. For that reason KPHP does not support all PHP features; in particular, it has no OOP, apart from some objects of the standard library. eval and related features, such as regular expressions with the 'e' modifier, are not supported either (the preg_replace_callback function is the suggested replacement). Functions that work with specific array elements are also unsupported: first, end, next, prev, current, reset, key; the getValueByPos and getKeyByPos functions are provided to replace them.
    Dropping support for a large amount of functionality is what allowed KPHP to become considerably faster than other web development tools.
    As an example, we compared it with HipHop VM (HHVM), developed by Facebook, and with regular PHP, and obtained the following results:
    Test                KPHP     HHVM     PHP
    simple              0.000    0.007    0.137
    simplecall          0.000    0.004    0.174
    simpleucall         0.007    0.008    0.178
    simpleudcall        0.007    0.009    0.181
    mandel              0.010    0.066    0.392
    mandel2             0.011    0.074    0.355
    ackermann (7)       0.001    0.011    0.189
    ary (50,000)        0.003    0.008    0.024
    ary2 (50,000)       0.003    0.010    0.022
    ary3 (2000)         0.011    0.077    0.191
    fibo (30)           0.003    0.019    0.481
    hash1 (50,000)      0.018    0.034    0.044
    hash2 (500)         0.011    0.021    0.039
    heapsort (20,000)   0.012    0.040    0.101
    matrix (20)         0.007    0.021    0.121
    nestedloop (12)     0.000    0.012    0.235
    sieve (30)          0.013    0.016    0.114
    strcat (200000)     0.002    0.005    0.014
    Total               0.119    0.442    2.992


    The test code is available at:
    gist.github.com/anonymous/9391146#file-bench-php

    From a development point of view, KPHP is compatible enough with PHP that you can use regular PHP to test newly written code quickly, and compile the code only before final testing and rollout. To support functions that are implemented in KPHP but missing from regular PHP, a special library, github.com/vk-com/kphp-kdb/tree/master/vkext, has been added to extend the capabilities of PHP.

    In addition, KittenPHP is a good static analyzer of PHP code that points out possible errors. For example, more than 20 serious bugs were found while VKontakte was being migrated to it a year ago.

    Together with the compiler, we have published under an open license a set of engines that complement KPHP well but can also be used separately from it. We first announced these libraries to the open source community at Highload 2010, so we apologize for the rather long wait.

    PMemcached (“Persistent Memcached”)

    A reliable key-value store that lets you keep data without a time limit. Over the memcached protocol the engine behaves identically to Memcache, except that all data is preserved after a restart.
    In addition to these basic functions, when the corresponding option is enabled in the configuration, pmemcached can return in a single request the whole group of records whose keys begin with the prefix specified in the request.
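
    The idea of a prefix lookup can be illustrated with a small in-memory sketch. It is not pmemcached's implementation or its protocol: the class and method names are hypothetical, persistence is left out, and keeping the keys sorted is simply one common way to make such lookups cheap.

```python
import bisect


class PrefixStore:
    """Toy in-memory key-value store with prefix lookups (no persistence)."""

    def __init__(self) -> None:
        self._keys = []      # all keys, kept sorted
        self._values = {}    # key -> value

    def set(self, key: str, value: str) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: str):
        return self._values.get(key)

    def get_by_prefix(self, prefix: str) -> dict:
        """Return every record whose key starts with `prefix`."""
        result = {}
        # In a sorted list, all keys sharing a prefix form one contiguous run
        # that begins at the first key not smaller than the prefix itself.
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result[key] = self._values[key]
        return result


store = PrefixStore()
store.set("user1.name", "Alice")
store.set("user1.city", "Moscow")
store.set("user2.name", "Bob")
user1 = store.get_by_prefix("user1.")
```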

    Lists

    This engine allows you to store and retrieve various lists of data.
    A single instance of the engine can store many lists. Each list has an identifier (int) that is used to address it.
    Each list can hold an unlimited number of items. Each item also has an identifier (int), a value (int) and a flag (int), and can store 256 characters of arbitrary text.
    Besides whole lists, it is possible to retrieve sub-lists filtered by flags and sorted by values.
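
    The data model described above can be sketched roughly as follows. This is an illustration only, not the engine's API: the class names are hypothetical, and treating the flag as a bit mask is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: int
    value: int
    flags: int
    text: str = ""


class ListStore:
    """Toy store of many lists, each addressed by an int identifier."""

    MAX_TEXT = 256  # text limit per item, as stated in the article

    def __init__(self) -> None:
        self._lists = {}  # list_id -> {item_id -> Item}

    def add(self, list_id: int, item: Item) -> None:
        if len(item.text) > self.MAX_TEXT:
            raise ValueError("item text is limited to 256 characters")
        self._lists.setdefault(list_id, {})[item.item_id] = item

    def sublist(self, list_id: int, required_flags: int = 0) -> list:
        """Items of one list with all `required_flags` bits set, sorted by value."""
        items = self._lists.get(list_id, {}).values()
        selected = [it for it in items
                    if (it.flags & required_flags) == required_flags]
        return sorted(selected, key=lambda it: it.value)


lists = ListStore()
lists.add(1, Item(item_id=10, value=300, flags=0b01, text="first"))
lists.add(1, Item(item_id=11, value=100, flags=0b11, text="second"))
lists.add(1, Item(item_id=12, value=200, flags=0b10, text="third"))
by_value = [it.item_id for it in lists.sublist(1)]
flagged = [it.item_id for it in lists.sublist(1, required_flags=0b01)]
```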

    Documentation: github.com/vk-com/kphp-kdb/blob/master/docs/en/KittenDB_Lists.wiki

    Lists-x

    A modification of the Lists engine in which list keys and record identifiers consist not of a single number (int) but of a fixed count of numbers (int) set in the engine configuration. For example, this makes it possible to create lists whose key is formed from a user ID and the ID of a record on that user's wall.
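
    A rough way to picture composite keys is to use fixed-size tuples of integers. The sketch below is an assumption-laden illustration, not the Lists-x interface: the class, its parameters and the sample data are all hypothetical.

```python
class CompositeKeyLists:
    """Toy list store whose list keys and item keys are fixed-size tuples of ints."""

    def __init__(self, list_key_size: int, item_key_size: int) -> None:
        # Key sizes are fixed up front, mirroring "set in the engine configuration".
        self._sizes = {"list": list_key_size, "item": item_key_size}
        self._lists = {}  # list key -> {item key -> value}

    def _check(self, kind: str, key: tuple) -> None:
        size = self._sizes[kind]
        if len(key) != size or not all(isinstance(p, int) for p in key):
            raise ValueError(f"{kind} key must consist of {size} ints, got {key!r}")

    def set(self, list_key: tuple, item_key: tuple, value: int) -> None:
        self._check("list", list_key)
        self._check("item", item_key)
        self._lists.setdefault(list_key, {})[item_key] = value

    def items(self, list_key: tuple) -> list:
        """Return (item_key, value) pairs of one list, ordered by item key."""
        self._check("list", list_key)
        return sorted(self._lists.get(list_key, {}).items())


# List key = (user ID, ID of a record on that user's wall); item key = (other user ID,).
likes = CompositeKeyLists(list_key_size=2, item_key_size=1)
likes.set((42, 1001), (7,), 1)
likes.set((42, 1001), (3,), 1)
likes.set((42, 1002), (7,), 1)
```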

    Documentation: github.com/vk-com/kphp-kdb/blob/master/docs/en/KittenDB_Lists-X.wiki

    Search

    Designed for searching data on the site. Any text can be indexed in the engine under a specific identifier and later found by the words it contains. Search results return the identifiers that were specified at indexing time.
    Search supports arbitrary parameters for filtering by criteria and special parameters for different sort orders. The engine also supports complex groupings and intersections.
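
    Indexing text under an identifier and finding it by words is typically done with an inverted index. The sketch below shows that general technique under simplifying assumptions (lowercased whole-word matching, all query words required); it is not the Search engine's implementation, and the names are hypothetical.

```python
import re
from collections import defaultdict


class TinySearch:
    """Toy inverted index: word -> set of identifiers of texts containing it."""

    def __init__(self) -> None:
        self._index = defaultdict(set)

    @staticmethod
    def _words(text: str) -> list:
        return re.findall(r"\w+", text.lower())

    def add(self, doc_id: int, text: str) -> None:
        for word in self._words(text):
            self._index[word].add(doc_id)

    def search(self, query: str) -> list:
        """Return sorted identifiers of texts that contain every query word."""
        words = self._words(query)
        if not words:
            return []
        # Intersect the identifier sets of all query words.
        ids = set.intersection(*(self._index.get(w, set()) for w in words))
        return sorted(ids)


index = TinySearch()
index.add(1, "KPHP translates PHP code into C++")
index.add(2, "The Search engine indexes text")
index.add(3, "PHP code can be tested quickly")
```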

    Documentation: github.com/vk-com/kphp-kdb/blob/master/docs/en/KittenDB_Search.wiki

    Storage

    The engine is designed to store user data: photos, videos, audio and documents. By storing many pieces of content in a single file and keeping an index of their offsets in memory, Storage copes with this task better than the classical approach of storing individual files in a file system.
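
    The approach of packing many objects into one file and indexing their offsets in memory can be sketched as follows. This is a simplified illustration, not the Storage engine's format: the class is hypothetical, and an in-memory `io.BytesIO` stands in for the large file on disk.

```python
import io


class BlobStore:
    """Toy append-only store: many blobs in one file, offsets indexed in memory."""

    def __init__(self, backing=None) -> None:
        # Any seekable binary file object works; BytesIO keeps the sketch self-contained.
        self._file = backing if backing is not None else io.BytesIO()
        self._index = {}  # blob id -> (offset, length)

    def put(self, blob_id: int, data: bytes) -> None:
        offset = self._file.seek(0, io.SEEK_END)  # always append at the end
        self._file.write(data)
        self._index[blob_id] = (offset, len(data))

    def get(self, blob_id: int) -> bytes:
        offset, length = self._index[blob_id]
        self._file.seek(offset)
        return self._file.read(length)


blobs = BlobStore()
blobs.put(1, b"photo-bytes")
blobs.put(2, b"audio-bytes-longer")
blobs.put(3, b"doc")
```

    Reading a blob costs one in-memory lookup plus one seek and read, with no per-file metadata lookups in the file system.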

    Documentation: github.com/vk-com/kphp-kdb/blob/master/docs/en/KittenDB_Storage.wiki

    Texts

    The Texts engine stores various collections of text data. It was originally developed for VK's personal messaging system, but was later reused for walls and comments.
    In addition to storing texts, the engine supports various groupings of text lists and text search. Thanks to it, instant search is available across a user's entire personal correspondence, no matter how large it is.
    The engine also has a built-in HTTP server that implements long polling so that the client side can receive updates. Later, however, a separate Queue engine was created for this purpose; it is described below.

    Documentation: github.com/vk-com/kphp-kdb/blob/master/docs/en/KittenDB_Texts.wiki

    Hints

    Hints solves two important tasks:
    1) It searches a user's objects by word prefixes, which is used for quick search on the site.
    2) It builds ratings of objects, which make it possible to sort lists of objects by how interesting they are to the user. For example, the VKontakte friends list works this way.
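
    Both tasks can be combined in a small sketch: match objects by word prefixes, then order the matches by rating. How Hints actually computes ratings is not described in the article, so the `bump` method below is a hypothetical stand-in, as are the class and the sample names.

```python
class Hints:
    """Toy prefix search over one user's objects, ordered by a rating."""

    def __init__(self) -> None:
        self._names = {}    # object id -> display name
        self._rating = {}   # object id -> accumulated rating

    def add_object(self, object_id: int, name: str) -> None:
        self._names[object_id] = name

    def bump(self, object_id: int, amount: float = 1.0) -> None:
        """Raise an object's rating, e.g. after the user interacts with it."""
        self._rating[object_id] = self._rating.get(object_id, 0.0) + amount

    def suggest(self, query: str) -> list:
        """Ids of objects where every query word is a prefix of some name word."""
        prefixes = query.lower().split()
        matches = [
            oid for oid, name in self._names.items()
            if all(any(word.startswith(p) for word in name.lower().split())
                   for p in prefixes)
        ]
        # Highest rating first; ties broken by id for a stable order.
        return sorted(matches, key=lambda oid: (-self._rating.get(oid, 0.0), oid))


hints = Hints()
hints.add_object(1, "Anna Ivanova")
hints.add_object(2, "Ivan Petrov")
hints.add_object(3, "Andrey Ivanov")
hints.bump(3, 5.0)
hints.bump(1, 2.0)
```

    With an empty query every object matches, so `hints.suggest("")` yields the whole list ordered by rating, which corresponds to the friends list example.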

    Documentation: github.com/vk-com/kphp-kdb/blob/master/docs/en/KittenDB_Hints.wiki

    Queue

    Queue organizes real-time communication between the client side and the server side. The client connects to the Queue server assigned to it and receives updates from it, and the server can send an event to the client at any time. Because the client can subscribe to channels when connecting to Queue, the engine can be used for one-to-many data delivery. For example, when a user has the news page open, the client subscribes to the event queues of all of that user's friends, groups and subscriptions. When someone from this list publishes a record, it is also written to the corresponding queue, and every subscribed user receives a notification on the client, which can then display the received data.
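
    The channel model described above is a publish/subscribe pattern. The following in-memory sketch shows only that pattern: the real engine delivers events over the network, and the class, method names and channel names here are hypothetical.

```python
from collections import defaultdict, deque


class Broker:
    """Toy publish/subscribe broker with a per-client inbox."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(set)   # channel -> client ids
        self._inboxes = defaultdict(deque)     # client id -> pending events

    def subscribe(self, client_id: str, channels: list) -> None:
        for channel in channels:
            self._subscribers[channel].add(client_id)

    def publish(self, channel: str, event: str) -> None:
        """Deliver one event to every client subscribed to the channel."""
        for client_id in self._subscribers.get(channel, ()):
            self._inboxes[client_id].append((channel, event))

    def poll(self, client_id: str) -> list:
        """Return and clear the client's pending events, oldest first."""
        inbox = self._inboxes[client_id]
        events = list(inbox)
        inbox.clear()
        return events


broker = Broker()
broker.subscribe("alice", ["friend:bob", "group:cats"])
broker.subscribe("carol", ["friend:bob"])
broker.publish("friend:bob", "new post")
broker.publish("group:cats", "new photo")
alice_events = broker.poll("alice")
carol_events = broker.poll("carol")
```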

    Documentation: github.com/vk-com/kphp-kdb/blob/master/docs/en/KittenDB_Queue.wiki

    In addition to the above, the repository contains a number of other tools that are less universal but no less interesting; documentation is available for them as well.

    Conclusion

    By publishing these developments, we are repaying our debt to the open source community, to which we owe a great deal.

    We hope they will now help projects that are currently under development, just as MySQL, Memcache, nginx and PHP once helped to create VKontakte.

    The source code of the engines and of KPHP is available in the repository on GitHub: github.com/vk-com/kphp-kdb
    Detailed documentation is located at: github.com/vk-com/kphp-kdb/tree/master/docs/ru
