Intro to complex queries in SQLAlchemy


    When I visited PyConRu 2014, I was surprised to learn that a fairly large audience of python developers does not use SQLAlchemy as the main tool for working with the database. After discussing this topic after Light Talks with colleagues, a decision was made - by all means write an article about what can be done with all the power of SQLAlchemy.


    Usually, writing a website does not require anything from a regular ORM. And if required, it is enough to replace the abnormal or read the main part of the documentation. And, as a rule, you don’t have to rack your brains over complex queries. Quite a lot of different ORMs offer the classic One-2-Many, One-2-One, Many-2-Many, etc. For ordinary queries and links this is enough. Unfortunately, in large projects it cannot do without special cases, and programmers with complex queries write either raw sql, or rely on what the basic ORM functionality offers them. It does not look quite beautiful or creates a rather large load on the database.

    It is clear that in pursuit of the speed of script execution, you can sacrifice the beauty of the code, but what if the speed can be neglected, but the cross-platform - no? Yes, and I do not want to see anything other than python code in python code. But what if you want to use your favorite ORM (SQLAlchemy for me) to the fullest and not write raw sql queries?

    It is assumed that you have already configured access to the database and you know how to map classes, create a session and make simple queries (this is described in the documentation for SQLAlchemy www.pythoncentral.io/series/python-sqlalchemy-database-tutorial ).

    Below is a set of classes that we will use in our examples. Of course, it will not cover all the many cases that may arise, but I hope it will give you a low start for writing your own complex queries and will help get rid of the manual writing of complex SQL queries.

    Short note: I will not register all imports for each example.
    Here are some features you might need:

    from sqlalchemy import func, and_, or_, not_, aliased
    


    For the rest, see the documentation.

    I want to separately note the func function. This function allows you to generate almost any expression for your database.

    
    from sqlalchemy import Column, Integer, String
    from sqlalchemy.schema import ForeignKey
    class UserStatus(Base):
    	__tablename__ = 'user_statuses'
    	STATUS_INITIAL = 1
    	id = Column(Integer(), primary_key=True)
    	name = Column(String(), unique=True)
    class User(Base):
    	__tablename__ = 'users'
    	id = Column(Integer(), primary_key=True)
    	username = Column(String(), unique=True)
    	password = Column(String(), nullable=False)
    	status_id = Column(
    		Integer(),
    		ForeignKey('user_statuses.id'),
    		nullable=False,
    		default=UserStatuses.STATUS_INITIAL
    	)
    class Role(Base):
    	__tablename__ = 'roles'
    	id = Column(Integer(), primary_key=True)
    	name = Column(String(), unique=True)
    class UserRole(Base):
    	__tablename__ = 'users_roles'
    	user_id = Column(Integer(), ForeignKey('users.id'))
    	role_id = Column(Integer(), ForeignKey('roles.id'))
    class Product(Base):
    	__tablename__ = 'products'
    	id = Column(Integer(), primary_key=True)
    	name = Column(String(), unique=True)
    class Order(Base):
    	__tablename__ = 'orders'
    	id = Column(Integer(), primary_key=True)
    	product_id = Column(Integer(), ForeignKey('products.id'))
    	user_id = Column(Integer(), ForeignKey('users.id'))
    


    Yes, there are no relationships in the structure. I did not begin to add them for one simple reason, we will not need them. This, of course, is cool when all relations and backrefs are registered, but only dependent data can be selected with them. We will try to use all tables at once in different variations.

    Simple join


    For example, we need to take all users with their roles, products, orders and statuses

    
    with SessionContext() as session:
    	query = session.query(User, Role, Product, Order, UserStatus)
    	records = query.all()
    	for user, role, product, order, user_status in records:
    		# execute all you need
    


    In this case, SQLAlchemy will generate an INNER JOIN. This method is good when you have all the indexes in the database (believe me, very often they are not). SQLAlchemy itself will generate a query based on the class data (because we have the connections registered).

    But what if everything is not going so smoothly and there is no way to specify a Foreign Key in the database (different reasons)? To do this, SQLAlchemy allows you to explicitly specify which columns we will link tables to.

    
    with SessionContext() as session:
    	query = session.query(User, Role, Product, Order, UserStatus)
    	query = query.join(UserRole, UserRole.user_id == User.id)
    	query = query.join(Role, Role.id == UserRole.role_id)
    	query = query.join(Order, Order.user_id == User.id)
    	query = query.join(Product, Product.id == Order.product_id)
    	query = query.join(UserStatus, UserStatus.id == User.status_id)
    	records = query.all()
    	for user, role, product, order, user_status in records:
    		# execute all you need
    


    In this case, even if the database has unrelated data, we can select all the necessary records.

    Simple LEFT JOIN


    Suppose we need to take ALL users and even those who do not have orders. Those. if the user has orders, then show them, and if not, then show the user without orders.

    
    with SessionContext() as session:
    	query = session.query(User, Role, Product, Order, UserStatus)
    	query = query.join(UserRole, UserRole.user_id == User.id)
    	query = query.join(Role, Role.id == UserRole.role_id)
    	query = query.join(UserStatus, UserStatus.id == User.status_id)
    	query = query.outerjoin(Order, Order.user_id == User.id)
    	query = query.outerjoin(Product, Product.id == Order.product_id)
    	records = query.all()
    	for user, role, product, order, user_status in records:
    		# execute all you need
    


    Here we applied the outerjoin function, which for PostgreSQL will generate a LEFT OUTER JOIN query.

    Complicated Queries


    Sometimes a situation arises when we need to do a lot of sortings and add a lot of conditions. Complicates everything, often the base itself.

    For example. Suppose users can order the same product several times. You need to select a record showing whether the user has purchased the product for each user and each product. Moreover, if the user has bought the product more than once, it does not matter which purchase the information will be selected. If the user did not buy this product, then the desired behavior is similar to left join. To group the results, you can use GROUP BY if the purchase data were not important to us. Otherwise, you must specify all the required fields (which are specified in select) for GROUP BY, which is extremely undesirable, as it will create additional load on the database. To select data, it is better to use DISTINCT ON, which simply cuts off duplicate records by product ID and user ID. The problem is that PostgreSQL requires those columns which are listed in DISTINCT ON were present in ORDER BY. And I want to sort the output by user name (for example). This is where the fun begins. Fortunately, the database allows you to “wrap” one query into another.

    SQLAlchemy has a cte () * Common Table Expression * function. This function creates a subquery from your query.

    Example
    
    with SessionContext() as session:
    		query = session.query(User, Role, Product, Order, UserStatus)
    		query = query.distinct(Product.id, User.id)
    		query = query.join(UserRole, UserRole.user_id == User.id)
    		query = query.join(Role, Role.id == UserRole.role_id)
    		query = query.join(UserStatus, UserStatus.id == User.status_id)
    		query = query.outerjoin(Order, Order.user_id == User.id)
    		query = query.outerjoin(Product, Product.id == Order.product_id)
    		query = query.order_by(Product.id, User.id)
    		found_records = query.cte()
    		main_query = session.query(found_records).order_by(found_records.c.user.username)
    		records = main_query.all()
    		for user, role, product, order, user_status in records:
    				# execute all you need
    


    In this example, we "isolated" the main query and sorted the result as we needed. You can also do this with complex selections with group_by.

    SQLAlchemy has another useful tool called subquery. This function allows you to use the generated statement in large queries or in JOIN.

    
    with SessionContext() as session:
    	stmt = session.query(Order.user_id, sqlalchemy.func.count('id').label('users_found')).subquery()
    	main_query = session.query(User, stmt.c.users_found).outerjoin(stmt, User.id==stmt.c.user_id).order_by(User.id)
    	records = main_query.all()
    	for user, orders in records:
    		# execute all you need
    


    This example makes a subquery inside the query through the LEFT OUTER JOIN and returns the number of orders for each user found.

    Also, there are situations when you need to link the same table in one query. There is an aliased () function for this. To be honest, I have never used it yet, so I'll take an example from the documentation

    with SessionContext() as session:
    	adalias1 = aliased(Address)
    	adalias2 = aliased(Address)
    	for username, email1, email2 in \
    		session.query(User.name, adalias1.email_address, adalias2.email_address).\
    		join(adalias1, User.addresses).\
    		join(adalias2, User.addresses).\
    		filter(adalias1.email_address=='jack@google.com').\
    		filter(adalias2.email_address=='j25@yahoo.com'):
    		print username, email1, email2
    


    Well, in the end, an example that is used in my code and works on PostgreSQL. The over () function is not in the postgresql dialect section, so it will most likely work everywhere. In this case, I want to show how you can work with the func function.

    
    from sqlalchemy import over
    with SessionContext() as session:
    	query = session.query(
    		User,
    		over(
    			func.row_number(),
    		        partition_by=User.id
    		)
    	)
    	for user, row_number in query.all():
    		print "{0}# {1}".format(row_number, user.username)
    


    This example will display the numbered usernames in the order in which they were found in the database.

    At the end of the article I repeat. This code may not be the fastest, but at least you can support many databases supported by SQLAlchemy.
    For example, for development and testing, you can not deploy a full-fledged relational database on the local machine, but rather use SQLite.

    The advantages of this approach include:
    - Code uniformity. In python files there will be only python
    - Escaping possible input data using SQLAlchemy tools
    - Ability to use various databases without changing the query syntax
    - Ability to not know SQL at all there is no direct use of SQL.
    - No need to write kilometer queries yourself

    Minuses:
    - Memory consumption for function calls, for storing generators in the memory. The string still takes up less memory space.
    - It is impossible to see immediately what SQLALchemy generated (only at the time of execution)
    - You need to know SQLAlchemy, its functions and capabilities in order to use it correctly. Still, the library is not small.

    More about SQLAlchemy ORM can be found here: sqlalchemy.org

    Also popular now: