Again about the performance of ORM, or a new promising project - Pony ORM

    In my first article on Habrahabr, I wrote about one of the main problems of existing ORMs (Object-Relational-Mapping, Object-Relational Mappings) - their performance. Considering and testing two of the most popular and well-known ORM implementations in python, Django and SQLAlchemy, I came to the conclusion: Using powerful universal ORMs leads to very noticeable performance losses. In the case of using fast DBMS engines, such as MySQL, data access performance decreases by more than 3-5 times .

    Recently, one of the developers of the new ORM engine called pony contacted me and asked me to share my thoughts on this engine. I thought that these considerations might be of interest to the Habrahabr community.


    Brief Summary



    I again ran some performance tests similar to those described in the previous article and compared their results with the results shown by pony ORM. To measure performance under the conditions of a cached parameterized request, I had to modify the test for receiving an object so that every new request received an object with a new key.

    Result: pony ORM surpasses the best results of django and SQLAlchemy by 1.5-3 times, even without object caching.

    Why pony turned out to be better



    I must admit right away: I was not able to equip pony ORM with django and SQLAlchemy on equal terms with regular means. This happened because if in django you can cache only designed specific queries, and in SQLAlchemy - prepared parameterized queries (with some non-trivial efforts), then pony ORM caches everything that is possible . Viewing the pony ORM text diagonally showed: cached

    — ready-made SQL query text for a particular DBMS
    — structure of the query compiled from the text
    — relationship translation
    — connections
    — created objects
    — read and modified objects
    — requests for deferred reading
    — requests for creating, updating, and deleting objects
    - requests for searching for objects
    - requests for blocking
    - requests for navigating relationships and modifying them
    - there may be something else that I missed.

    Such caching allows you to execute code that uses it as quickly as possible, without worrying about this tricky tricks about improving productivity like the one that I came up with despair and described here in one of my previous articles.

    Of course, caching sometimes brings some inconvenience. For example, caching of objects does not make it easy to compare the state of an object in memory with its image in the table — this is sometimes necessary for the correct processing of data that is competitively processed in different processes. In one of the pony releases, I personally would like to see options that allow you to optionally disable certain types of caching for a piece of code .

    Wishes



    What am I missing in pony ORM to fully compare it with other ORMs?

    - data migration - an absolutely necessary procedure for large projects using ORM
    adapters to some popular DBMSs, for example MS SQL
    - complete abstraction from a variety of DBMSs in the code
    - access to the complete metadata of the object
    - customization of field types
    - full documentation

    What am I missing in modern ORM, what could be embodied in pony ORM, while this project has not yet grown to a state of stagnation?

    - the use of mixed filters (access to the fields and methods of the object simultaneously in the filter)
    - computable fields and indices for them
    - composite fields (stored in several fields of the table)
    - a field of a nested object (a field representing a regular python object)
    - linking objects from different databases

    Well, of course, I would like to see a holistic framework for creating applications that uses pony ORM as the basis for effective access to the database.

    update 2013-08-03



    If you want to receive answers from Pony ORM authors to your questions, you can contact them at the following addresses: alexander.kozlovsky@gmail.com and m.alexey@gmail.com. Invites are welcome.

    Applications



    Test results


    >>> import test_native
    >>> test_native.test_native()
    get row by key: native req/seq: 3050.80815908 req time (ms): 0.327782
    get value by key: native req/seq: 4956.05711955 req time (ms): 0.2017733
    


    >>> import test_django
    >>> test_django.test_django()
    get object by key: django req/seq: 587.58369836 req time (ms): 1.7018852
    get value by key: django req/seq: 779.4622303 req time (ms): 1.2829358
    


    >>> import test_alchemy
    >>> test_alchemy.test_alchemy()
    get object by key: alchemy req/seq: 317.002465265 req time (ms): 3.1545496
    get value by key: alchemy req/seq: 1827.75593609 req time (ms): 0.547119
    


    >>> import test_pony
    >>> test_pony.test_pony()
    get object by key: pony req/seq: 1571.18299553 req time (ms): 0.6364631
    get value by key: pony req/seq: 2916.85249448 req time (ms): 0.3428353
    


    Test code


    test_native.py

    import datetime
    def test_native():
        from django.db import connection, transaction
        cursor = connection.cursor()
        t1 = datetime.datetime.now()
        for i in range(10000):
            cursor.execute("select username,first_name,last_name,email,password,is_staff,is_active,is_superuser,last_login,date_joined from auth_user where id=%s limit 1" % (i+1))
            f = cursor.fetchone()
            u = f[0]
        t2 = datetime.datetime.now()
        print "get row by key: native req/seq:",10000/(t2-t1).total_seconds(),'req time (ms):',(t2-t1).total_seconds()/10.
        t1 = datetime.datetime.now()
        for i in range(10000):
            cursor.execute("select username from auth_user where id=%s limit 1" % (i+1))
            f = cursor.fetchone()
            u = f[0][0]
        t2 = datetime.datetime.now()
        print "get value by key: native req/seq:",10000/(t2-t1).total_seconds(),'req time (ms):',(t2-t1).total_seconds()/10.
    


    test_django.py

    import datetime
    from django.contrib.auth.models import User
    def test_django():
       t1 = datetime.datetime.now()
       q = User.objects.all()
       for i in range(10000):
           u = q.get(id=i+1)
       t2 = datetime.datetime.now()
       print "get object by key: django req/seq:",10000/(t2-t1).total_seconds(),'req time (ms):',(t2-t1).total_seconds()/10.
       t1 = datetime.datetime.now()
       q = User.objects.all().values('username')
       for i in range(10000):
           u = q.get(id=i+1)['username']
       t2 = datetime.datetime.now()
       print "get value by key: django req/seq:",10000/(t2-t1).total_seconds(),'req time (ms):',(t2-t1).total_seconds()/10.
    


    test_alchemy.py

    import datetime
    from sqlalchemy import *
    from sqlalchemy.orm.session import Session as ASession
    from sqlalchemy.ext.declarative import declarative_base
    query_cache = {}
    engine = create_engine('mysql://testorm:testorm@127.0.0.1/testorm', execution_options={'compiled_cache':query_cache})
    session = ASession(bind=engine)
    Base = declarative_base(engine)
    class AUser(Base):
        __tablename__ = 'auth_user'
        id = Column(Integer, primary_key=True)
        username =  Column(String(50))
        password = Column(String(128))
        last_login = Column(DateTime())
        first_name = Column(String(30))
        last_name = Column(String(30))
        email = Column(String(30))
        is_staff = Column(Boolean())
        is_active = Column(Boolean())
        date_joined = Column(DateTime())
    def test_alchemy():
       t1 = datetime.datetime.now()
       for i in range(10000):
           u = session.query(AUser).filter(AUser.id==i+1)[0]
       t2 = datetime.datetime.now()
       print "get object by key: alchemy req/seq:",10000/(t2-t1).total_seconds(),'req time (ms):',(t2-t1).total_seconds()/10.
       table = AUser.__table__
       sel = select(['username'],from_obj=table,limit=1,whereclause=table.c.id==bindparam('ident'))
       t1 = datetime.datetime.now()
       for i in range(10000):
           u = sel.execute(ident=i+1).first()['username']
       t2 = datetime.datetime.now()
       print "get value by key: alchemy req/seq:",10000/(t2-t1).total_seconds(),'req time (ms):',(t2-t1).total_seconds()/10.
    


    test_pony.py

    import datetime
    from datetime import date, time
    from pony import *
    from pony.orm import *
    db = Database('mysql', db='testorm', user='testorm', passwd='testorm')
    class PUser(db.Entity):
        _table_ = 'auth_user'
        id = PrimaryKey(int, auto=True)
        username =  Required(str)
        password = Optional(str)
        last_login = Required(date)
        first_name = Optional(str)
        last_name = Optional(str)
        email = Optional(str)
        is_staff = Optional(bool)
        is_active = Optional(bool)
        date_joined = Optional(date)
    db.generate_mapping(create_tables=False)
    def test_pony():
        t1 = datetime.datetime.now()
        with db_session:
            for i in range(10000):
                u = select(u for u in PUser if u.id==i+1)[:1][0]
        t2 = datetime.datetime.now()
        print "get object by key: pony req/seq:",10000/(t2-t1).total_seconds(),'req time (ms):',(t2-t1).total_seconds()/10.
        t1 = datetime.datetime.now()
        with db_session:
            for i in range(10000):
                u = select(u.username for u in PUser if u.id==i+1)[:1][0]
        t2 = datetime.datetime.now()
        print "get value by key: pony req/seq:",10000/(t2-t1).total_seconds(),'req time (ms):',(t2-t1).total_seconds()/10.
    

    Also popular now: