Manticore November 26, 2009 at 12:27

DDB on top of RDB in Python

A few months ago, I needed a document database for one planned Python project. I don’t know how it happened, but when I googled, I managed not to notice a single existing solution and came to the conclusion that the only way out was to write my own. Since I wanted to get a working version as quickly as possible, I decided to build on existing relational databases so as not to worry about implementing storage, searching, transactions, and the like. In the end, my creation took on a decent look, and I decided to write a note about it here - maybe someone else will come in handy.

Where to get

For the most impatient:

project page with documentation and download link - pypi.python.org/pypi/brain
project git repository - github.com/Manticore/brain

The database is made in the form of a module and requires Python 3 for use (and for running unit tests - Python 3.1, very good methods have been added there). Immediately I apologize to all orthodoxes who still cannot decide to switch even to 2.6 - I, of course, could port my module to 2. *, but I was too lazy.

As I have already said, the module uses the usual relational database as its bottom layer. Currently supported are SQLite (the one that comes with Python) and Postgre 8 (if py-postgresql is installed).

… and what to do with it

I believe that document databases are already familiar to many, so I’ll be brief. So, the basic concept for us is an object. An object has a unique identifier and a certain amount of data of various types. In my database, the data is a combination of the simplest types - int, float, str, bytes and None - and the complex types - dict and list. The level of nesting is limited only by the relational engine, I'll explain the reasons later. The object identifier itself can also be stored in another (or in the same) object. Objects can be created, deleted, and their contents changed arbitrarily; Well, and, of course, look for the necessary objects according to the specified criteria.

So, suppose you have already downloaded and installed the module. Or even just presented it - it doesn’t matter. As an illustration of working with the module, I simply copy here some examples from the documentation (for those who are too lazy to follow the link above).

So, connect the module and create a connection. For simplicity, the default relational engine (SQLite) and an in-memory database will be used. Now create a couple of objects. Pay attention to the nested list in the second object. Objects can be read in their entirety or a specific section can be selected. In the second case, the path to the desired data is used - just a list, where the line means the key in the dict, and the number is the index in the list. The contents of the object can be changed. And finally, the desired object can be found.

>>> import brain

>>> conn = brain.connect(None, None)

>>> id1 = conn.create({'a': 1, 'b': 1.345})

>>> id2 = conn.create({'id1': id1, 'list': [1, 2, 'some_value']})

>>> print(conn.read(id1))

{'a': 1, 'b': 1.345}

>>> print(conn.read(id2, ['list']))

[1, 2, 'some_value']

>>> conn.modify(id1, ['a'], 2)

>>> print(conn.read(id1))

{'a': 2, 'b': 1.345}

>>> import brain.op as op

>>> objs = conn.search(['list', 0], op.EQ, 1)

>>> print(objs == [id2])

True

The condition used is decoded as "The 0th element of the list located in the 'dict' key of the root dictionary is 1".

And it's all?

Not really. For lists, there is a special insert command (working in much the same way as the Python equivalent). Objects and their parts can be deleted (including by mask). Search terms can be combined using logical operators. Oh yes, transactions are still supported, there is a simple RPC server and a cache connection (rather stupid). You can read about all this in the documentation.

What's inside?

Inside, everything is pretty simple. For each unique path, a table is created with the corresponding name (moreover, paths that differ only in the indices in the lists use the same table). For example, for data of the form {'key': [{'key2': 'val'}]}, the value of 'val' will be stored in a table named “field.TEXT.key..key2” (an empty space between the two points says that the key 'key' is a list). In addition, there is one large table in which for each object its fields and their types are recorded. Thus, if something suddenly goes wrong, then the data can be completely restored using the available tools for working with relational databases.

This results in a restriction on the level of nesting of stored data structures - it all depends on what maximum length of the table name the relational database supports.

So, what is next?

As I said, the existing version of the database is quite sufficient for my purposes. But if suddenly someone believes that this creation may be useful to someone else, then I will be glad to add the necessary features. And, of course, any constructive criticism of the code / architecture / documentation is welcome.

Tags: