Neo4j graph database in PHP

Recently I have been hearing more and more about NoSQL, and about graph databases in particular. Yet when I searched Habr, I was surprised to find very few articles on the topic: a search for “Neo4j” returns only 4 results, and even in those the name is mentioned only in passing.

What is Neo4j?

Neo4j is a high-performance NoSQL graph database. It has no concept of tables with strictly defined fields; instead it works with a flexible structure of nodes and the relationships between them.

How did I get to this?


I have not used SQL in my projects for more than a year, ever since I tried the document-oriented DBMS "MongoDB". After MySQL, I was overjoyed at how simply and conveniently everything can be done in MongoDB. Over that year, our website studio rewrote three CMS versions around Mongo's main features and its documents, along with about a dozen sites built on them. Everything was fine, and I had already begun to forget what it was like to write fifty-line queries for every database operation, until a project with a tangle of relationships that did not fit into documents landed on my head. I really did not want to go back to SQL, so I spent a couple of days looking purely for a NoSQL solution that allows flexible connections, which led me to graph DBMSs. For a number of reasons I chose Neo4j. One of the main ones was that my engine is written in PHP, and a good driver exists for it, "Neo4jPHP", which covers almost 100% of the REST interface provided by the Neo4j server.

Get to the point


Graph databases are designed primarily for problems where the data is closely interconnected through relationships that can run several levels deep. In a relational database, for example, it is not hard to answer the request: “Give me a list of all the actors who were in a movie with Kevin Bacon”.

> SELECT actor_name, role_name FROM roles WHERE movie_title IN (SELECT DISTINCT movie_title FROM roles WHERE actor_name='Kevin Bacon')


I wrote this example with a subquery; you can rewrite it in your head using a JOIN.

But suppose we want the names of all the actors who were in a movie with someone who was in a movie with Kevin Bacon. That takes another JOIN. Now add a third degree: “someone who was in a movie with someone who was in a movie with someone who was in a movie with Kevin Bacon”. It sounds scary, but the task is a real one, and with every new level of connection we have to add another JOIN, so the query becomes more complex, more laborious to write, and less and less performant.
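To make that growth concrete, here is a small sketch in plain PHP that generates the query text for a given number of degrees. It uses nested subqueries rather than JOINs, in the same style as the query above, and returns only actor names. The function name `coStarQuery` is hypothetical, and the actor name is interpolated into the string only for readability; real code should use bound parameters.

```php
<?php
// Hypothetical helper: builds the SQL for "actors within $degree co-star
// steps of $actor" against the roles table used in the query above.
function coStarQuery(string $actor, int $degree): string
{
    // Films the starting actor appeared in.
    $movies = "SELECT DISTINCT movie_title FROM roles WHERE actor_name='$actor'";

    for ($i = 1; $i < $degree; $i++) {
        // Each extra degree wraps the previous result in two more subqueries:
        // films -> actors in those films -> films of those actors.
        $actors = "SELECT DISTINCT actor_name FROM roles WHERE movie_title IN ($movies)";
        $movies = "SELECT DISTINCT movie_title FROM roles WHERE actor_name IN ($actors)";
    }

    return "SELECT DISTINCT actor_name FROM roles WHERE movie_title IN ($movies)";
}

foreach ([1, 2, 3] as $degree) {
    $sql = coStarQuery('Kevin Bacon', $degree);
    echo "degree $degree: " . substr_count($sql, 'SELECT') . " SELECTs\n";
}
```

Every additional degree adds two more nested SELECTs, so the query text and the work the database has to do both grow with the depth of the question.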

Deep connections like these matter especially in social projects, where we need friends of friends, in route-finding tasks, and so on. Graph databases are designed for exactly these problems, where pieces of data can be two or more relationships apart. Such problems are solved very elegantly when we model the data as the nodes (vertices) of a graph and the connections as the edges between those nodes. We can then traverse the graph using long-known, efficient algorithms.

The example above is easy to model as follows: each actor and each film is a node, and the roles are relationships that go from an actor to the film they appeared in:

[image: graph of actor and film nodes connected by role relationships]

Now it becomes very easy to find a path from Kevin Bacon to any other actor.

Some code


First, we need to set up a connection to the database. Since Neo4jPHP talks to the database server over its REST interface, there is no persistent connection; data is transferred only when we need to read or write something:

use Everyman\Neo4j\Client,
    Everyman\Neo4j\Transport,
    Everyman\Neo4j\Node,
    Everyman\Neo4j\Relationship;

// 7474 is the default port of the Neo4j server's REST interface
$client = new Client(new Transport('localhost', 7474));


Now we need to create a node for each actor and each film. This is similar to an INSERT in a traditional relational DBMS:

// Actors
$keanu = new Node($client);
$keanu->setProperty('name', 'Keanu Reeves')->save();
$laurence = new Node($client);
$laurence->setProperty('name', 'Laurence Fishburne')->save();
$jennifer = new Node($client);
$jennifer->setProperty('name', 'Jennifer Connelly')->save();
$kevin = new Node($client);
$kevin->setProperty('name', 'Kevin Bacon')->save();

// Films
$matrix = new Node($client);
$matrix->setProperty('title', 'The Matrix')->save();
$higherLearning = new Node($client);
$higherLearning->setProperty('title', 'Higher Learning')->save();
$mysticRiver = new Node($client);
$mysticRiver->setProperty('title', 'Mystic River')->save();


Each node has setProperty and getProperty methods, which let you write arbitrary data to the node and read it back. A node has no fixed structure, much like a document in a document-oriented DBMS, although we cannot nest data: a property value must be a primitive such as a string, a number or a boolean, or an array of such values.
Data is sent to the server only when we call save(), and this has to be done for each node.
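The deferred write can be pictured with a small in-memory model. This is only a sketch in plain PHP: `SketchNode` is a hypothetical stand-in and not the Neo4jPHP `Node` class, and the `requests` counter merely simulates round trips to a server.

```php
<?php
// Hypothetical stand-in for a remote node, NOT the Neo4jPHP Node class.
// It only models the behaviour described above: properties are buffered
// locally, and nothing is "sent" until save() is called.
class SketchNode
{
    private array $properties = [];   // local, unsaved state
    private array $stored = [];       // what the simulated server holds
    public int $requests = 0;         // number of simulated round trips

    public function setProperty(string $key, $value): self
    {
        $this->properties[$key] = $value;
        return $this;                 // fluent, so calls can be chained
    }

    public function getProperty(string $key)
    {
        return $this->properties[$key] ?? null;
    }

    public function save(): self
    {
        // One simulated request carries all buffered properties at once.
        $this->stored = $this->properties;
        $this->requests++;
        return $this;
    }

    public function storedProperties(): array
    {
        return $this->stored;
    }
}

$keanu = new SketchNode();
$keanu->setProperty('name', 'Keanu Reeves')
      ->setProperty('born', 1964);
echo $keanu->requests . "\n";   // still no round trip
$keanu->save();
echo $keanu->requests . "\n";   // one round trip for both properties
```

In this model, setting several properties and then calling save() once costs a single round trip, which is why it pays to set everything first and save last.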

Now we need to establish the connections between the actors and the films they appeared in. In a relational DBMS we would create a foreign key for this; here we create a relationship, which can be given an arbitrary name, can store properties of its own just like a node, and is also saved to the database:

$keanu->relateTo($matrix, 'IN')->save();
$laurence->relateTo($matrix, 'IN')->save();
$laurence->relateTo($higherLearning, 'IN')->save();
$jennifer->relateTo($higherLearning, 'IN')->save();
$laurence->relateTo($mysticRiver, 'IN')->save();
$kevin->relateTo($mysticRiver, 'IN')->save(); 


As you can see, all the relationships are named “IN”, but we could give them any other name, for example “ACTED IN”. We could also define the inverse relationship, from films to actors, and phrase it as a film “HAS” an actor. Paths can be found regardless of the direction of the relationships we create, so we can use whatever semantics fit the subject area. There can also be multiple relationships between two nodes, in both directions.

All the relationships are in place, and we are now ready to find a connection between any actor in our system and Kevin Bacon, up to any given depth:

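// Find one path from Keanu to Kevin that is at most 12 relationships long
$path = $keanu->findPathsTo($kevin)
    ->setMaxDepth(12)
    ->getSinglePath();

// Iterating the path yields its nodes: even positions are actors, odd positions are films
foreach ($path as $i => $node) {
    if ($i % 2 == 0) {
        echo $node->getProperty('name');
        if ($i+1 != count($path)) {
            echo " was in\n";
        }
    } else {
        echo "\t" . $node->getProperty('title') . " with\n";
    }
}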
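To make the path search less of a black box, here is a minimal sketch of one classic approach, a depth-limited breadth-first search, in plain PHP over the same actors and films. It is an illustration only and makes no claim about how Neo4j or Neo4jPHP implement findPathsTo(); the edge list and the functions `buildAdjacency` and `shortestPath` are hypothetical.

```php
<?php
// Plain-PHP sketch of a depth-limited breadth-first search.
// The edge list mirrors the 'IN' relationships from the article.
$edges = [
    ['Keanu Reeves', 'The Matrix'],
    ['Laurence Fishburne', 'The Matrix'],
    ['Laurence Fishburne', 'Higher Learning'],
    ['Jennifer Connelly', 'Higher Learning'],
    ['Laurence Fishburne', 'Mystic River'],
    ['Kevin Bacon', 'Mystic River'],
];

// Build an undirected adjacency list, so relationship direction is ignored.
function buildAdjacency(array $edges): array
{
    $adjacent = [];
    foreach ($edges as [$from, $to]) {
        $adjacent[$from][] = $to;
        $adjacent[$to][] = $from;
    }
    return $adjacent;
}

// Returns the shortest path as a list of node names, or null if the goal
// cannot be reached within $maxDepth relationships.
function shortestPath(array $adjacent, string $start, string $goal, int $maxDepth): ?array
{
    $queue = [[$start]];
    $visited = [$start => true];
    while ($queue) {
        $path = array_shift($queue);
        $node = end($path);
        if ($node === $goal) {
            return $path;
        }
        if (count($path) - 1 >= $maxDepth) {
            continue; // this path already uses the allowed number of hops
        }
        foreach ($adjacent[$node] ?? [] as $next) {
            if (!isset($visited[$next])) {
                $visited[$next] = true;
                $queue[] = array_merge($path, [$next]);
            }
        }
    }
    return null;
}

$path = shortestPath(buildAdjacency($edges), 'Keanu Reeves', 'Kevin Bacon', 12);
echo implode(' -> ', $path) . "\n";
```

Because the search expands one hop at a time, adding another degree of separation does not change the code at all; only the depth limit matters.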


We can also fetch not the nodes themselves but the relationships between them, for example:

echo $laurence->getProperty('name') . " was in:\n";
$relationships = $laurence->getRelationships('IN');
foreach ($relationships as $relationship) {
    $movie = $relationship->getEndNode();
    echo "\t" . $movie->getProperty('title') . "\n";
}


getRelationships can return all of a node's relationships; it does not have to be limited to one particular type. We can also fetch only the incoming or only the outgoing relationships of a node.
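The filtering itself is simple to picture. Below is a sketch in plain PHP that models it over an in-memory list. The array layout, the function `relationshipsOf` and the `'in'` / `'out'` / `'all'` strings are all hypothetical and are not the Neo4jPHP API, and the extra “HAS” relationship is added only to have something incoming to filter.

```php
<?php
// Plain-PHP model of filtering a node's relationships by type and direction.
$relationships = [
    ['start' => 'Laurence Fishburne', 'type' => 'IN',  'end' => 'The Matrix'],
    ['start' => 'Laurence Fishburne', 'type' => 'IN',  'end' => 'Higher Learning'],
    ['start' => 'Laurence Fishburne', 'type' => 'IN',  'end' => 'Mystic River'],
    ['start' => 'The Matrix',         'type' => 'HAS', 'end' => 'Laurence Fishburne'],
];

function relationshipsOf(array $all, string $node, ?string $type = null, string $direction = 'all'): array
{
    return array_values(array_filter($all, function (array $rel) use ($node, $type, $direction) {
        $outgoing = $rel['start'] === $node;
        $incoming = $rel['end'] === $node;
        if ($type !== null && $rel['type'] !== $type) {
            return false;             // wrong relationship type
        }
        if ($direction === 'out') {
            return $outgoing;
        }
        if ($direction === 'in') {
            return $incoming;
        }
        return $outgoing || $incoming; // 'all': either direction
    }));
}

// All types, both directions
echo count(relationshipsOf($relationships, 'Laurence Fishburne')) . "\n";
// Only relationships of type 'IN'
echo count(relationshipsOf($relationships, 'Laurence Fishburne', 'IN')) . "\n";
// Only incoming relationships, of any type
echo count(relationshipsOf($relationships, 'Laurence Fishburne', null, 'in')) . "\n";
```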

I will end this post here for now, and I hope it prompts more articles about graph databases, and Neo4j in particular.

This article uses an example from the Neo4jPHP developer's site, with changes and comments based on my personal experience.
