Loriowar November 5, 2014 at 10:59

ActiveRecord is a little about a rake, Relations and indexes

I want to tell you about the painful: about working with AR in general and with Relation in particular; warn against standard garden products that can easily ruin your life and make the code slow and gluttonous. The narrative will be based on Rails 3.2 and ActiveRecord of the same spill. Rails 4, of course, has a lot of new and useful things, but you still need to switch to it, and the foundation is the same anyway.

This material, for the most part, is intended for beginners, because it is very painful for the author to look at extracting the contents of entire tables into memory in the form of ActiveRecord objects and other shooting limbs when using AR . For developers who have learned Zen, the topic is unlikely to be useful, they can only help, complementing it with their own examples and edifications.

How many times have they repeated to the world ...

If you started working with Relation (and with any ActiveRecord object in general), then you need to clearly present one thing: at what point we “materialize” the selection, that is, at what point we stop constructing the SQL query. In other words: when data is fetched and we move on to processing in memory. Why is it important? Yes, because it’s awkward:

Product.all.find{|p| p.id == 42}

It can hang a server, pick up all the RAM and do many more dirty tricks. And the same thing, but in other words:

Product.find(42)

will work quickly and without consequences. So find and find are not the same thing at all! Why? Yes, because in the first case we said Product.all and shot ourselves in the foot, since this means extracting the entire contents of the products table and for each row to build an AR object, create an array from them and walk through it find , which is a class method Array (generally speaking, find is from Enumerable , but these are already details). In the second case, everything is much better: find is an AR method and is designed to search by pk . That is, we generate a request

SELECT * FROM products WHERE products.id = 42;

We execute it, we receive one line and all.

What is good and what is bad

Now, having figured out why working with AR is a big responsibility, we’ll figure out how not to shoot yourself in the foot. This is quite simple: you need to use the methods that AR provides us. Here they are: where, select, pluck, includes, joins, scoped, unscoped, find_each and a few more, which can be found in the documentation or in the neighboring hub . But what’s better not to use is enumeration will be very difficult and, at the same time, very simple: it is undesirable to use everything else, since almost all the remaining variety of methods turns Relation into Array with all its consequences.

Simple recipes

Now, I will give a few standard and not very designs that make life easier, but which are often forgotten. But before asking the reader a question: remember the has_many function. Think about what parameters you know and which ones you are actively using. List them in your mind, count them ... and now the question is: do you know how many of them really are?

Answer

24 pieces in Rails3 and 12 in Rails4. The difference in 12 pieces is made by methods like where, group , etc., as well as methods for working with pure SQL, which are passed in Rails4 in a block, not in a hash.

Why did I ask this? Yes, in order to very roughly evaluate your level and say that if you know most of the options, then the following is unlikely to bring you new knowledge. This assessment is very conditional, therefore, dear Reader, do not get very angry if it seemed to you ridiculous / untenable / strange / etc (underline as necessary).

Recipe number of times

So, now let's go in order. Everyone knows about update_attributes and update_attribute (or not all?). The first one - massively updates fields with calling validations and callbacks. Nothing interesting. The second one skips all validations, launches callbacks, but can only update the value of one selected field (for some people, save (validate: false) is more like it ). But for some reason they often forget about update_column and update_all . This method skips both validations and callbacks and writes directly to the database without any foreplay.

Recipe number two

The comments reminded about the wonderful touch method . They also often forget about him and write something like

@product.updated_at = DateTime.now
@product.save

@product.update_attribute(:updated_at, DateTime.now)

Although, for good, for such purposes it is easier to do like this:

@product.touch(:updated_at)  # в данном случае параметр можно опустить

In addition, touch has its own after_touch callback , as well as the option : touch is present in the belongs_to method .

How to iterate correctly

The hub already talked about find_each, but I cannot but mention it again, because the constructions

product.documents.map{…}

and they are isomorphic, are found a little more than everywhere. There is only one problem with conventional iterators applied on Relation : they pull everything out of the database at once. And this is terrible. In contrast, find_each , by default, carries 1000 pieces at a time, and it’s just great!

UPD: as already noted in the comments, all methods that are not explicitly projected onto raw-sql are delegated to to_a, which is why the entire query is fetched into memory and working with it is no longer on the DB side, but on the Ruby side.

Tip about default_scope

Wrap the contents of default_scope in a block. Example:

default_scope where(nullified: false)  # плохо!
default_scope { where(nullified: false) }  # хорошо

What is the difference? That the first option is executed right at server startup, and if the nullified field was not in the database, then the server will not take off. The same applies to migrations - they will not pass due to the lack of a field, which, most likely, we just want to add. In the second case, due to the fact that Ruby is lazy, the block will be executed only when the model is accessed and the migrations will be performed normally.

Has_many through

Another common patient is

product.documents.collect(&:lines).flatten

here the product has many documents that have many lines. It often happens that you want to get all the lines of all documents related to the product. And in this case, they create the above construction. In this case, we can recall the through option for relations and do the following for the product:

has_many :lines, through: documents

and then execute

product.lines

It turns out both more clearly and efficiently.

A bit about JOIN

In continuation of the topic of joins, remember about includes . What is so special about it? Yes, that's LEFT JOIN . Quite often I see that left / right joins are written explicitly

joins("LEFT OUTER JOIN wikis ON wiki_pages.wiki_id=wikis.id")

this of course also works, but pure SQL in RoR has always been out of favor.

Also, without departing from the cash register, we must recall the difference in values between joins and where when used together. Suppose we have a users table , and different entities, for example products, have an author_id field and an author relation, which has a users table under it .

has_one :author,
	     class: 'User',
	     foreign_key: 'author_id'   # не обязательно, но для наглядности

The following code will not work for such a case.

products.joins(:author).where(author: {id: 42})

Why? Because in joins the name of the relational is indicated, which is the joinim, and in where the condition is imposed on the table and we must say

where(users: {id: 42})

This can be avoided by explicitly specifying 'AS author' in the join, but again it will be pure SQL.

Next, look at joins from a different angle. Whatever we join, we end up with objects of the class that started it all:

Product.joins(:documents, :files, :etc).first

In this case, we get the product regardless of the number of joins. This behavior upsets some, as they would like to receive fields from joined tables. And they begin to make the same request on the other hand: take documents, join them with products, write pure SQL to communicate with other entities, in general, invent a bicycle when the correct and logical code was written at the very beginning. Therefore, I recall the very basis:

Product.joins(:documents, :files, :etc).where(...).pluck('documents.type')

Here we get an array with the desired field from the database. Pros: minimum queries; no AR objects are created. Cons: in Rails 3, pluck accepts only 1 (one) parameter, and here it is

pluck('documents.type', 'files.filename', 'files.path')

can only be done in Rails 4.

Build relations

Now let's turn to the consideration of working with the build of relations. In general, everything is quite simple:

product.documencts.build(type: 'article', etc: 'etc').lines.build(content: '...')

After calling product.save , we will have all the associations saved together with validations, preference and courtesans. There is one caveat in all this joyful action: all this is good when the product is not readonly and / or there are no other storage restrictions. In such cases, many arrange a garden similar to the garden with joins in the example above. That is, they create a document , bind it to the product and build the lines for the document. It turns out a crooked and default behavior, which is usually tied to error handling productdoes not work. Therefore, in the appendage, all this is immediately framed with crutches, throwing mistakes and it turns out pretty nasty. What to do in this case? We need to remember about autosave and understand how it works. Without going into details I’ll say that he works on callbacks . Therefore, there is a way to preserve relations for the product described above:

product.autosave_associated_records_for_documents

In this case, the document will be saved, its callbacks will be called to save the lines, etc.

A few words about indexes

Lastly, you need to say about indices, because many banged their heads on solid objects due to problems on the basis of indices. I apologize right away for interfering with a bunch of ActiveRecord and database capabilities, but for my personal opinion: you cannot work well with AR without realizing what is happening at that moment on the database side.

Problem one

For some reason, many are sure that the order on Relation does not depend on which column we are sorting. A variation on this misconception is a lack of understanding of the difference between order Relation and order Array . Because of this, you can meet default_scope with an order on the VARCHAR field and questions in the spirit: “Why is it that your page loads so slowly? There, only a couple of records are retrieved from the database! ” The problem here is that default sorting is damn expensive if we don't have an index on this column. By default, AR sorts by pk . It happens when we do

Products.first

But pk has an index almost always and there are no problems. But when we say what order (: name) will do with any call to the model, problems begin.
For reference : if you explain “on the fingers”, then when sorting by an indexed column, real sorting does not occur, it is already present in the database and the data is immediately sent in the correct order.

Second problem

Compound indices. Not everyone knows about them and an even smaller circle of people knows why they are needed. In short, a composite index is an index based on two or more database fields. Where can it come in handy? Two common places to use it:

polymorphic associations
many-to-many interim table.

About polymorphic connections was described here . It is often convenient for them to create a composite index. Here is a slightly updated example from off.manula :

class CreatePictures < ActiveRecord::Migration
  def change
    create_table :pictures do |t|
      t.string  :name
      t.integer :imageable_id
      t.string  :imageable_type
      t.timestamps
    end
    add_index :pictures, [:imageable_id, :imageable_type] # вот он составной индекс
  end
end

Here are a few words about the difference between a regular index and a composite index. I will not go into details further, for the topic is for a separate hub. In addition, everything was already painted before me .
Now about the intermediate link table. Well- known HBTM . Here, in some cases, it is appropriate to post a composite index on assemblies_parts (see the HBTM link ). But we must remember that the sequence of fields in a composite index has knowledge. Details here .

Problem three

“Indexes are needed everywhere!” It is not so common, but it causes terrible brakes on everything and everyone. It must be remembered that the index is not a panacea and guaranteed x10-x100 for speed, but a tool that needs to be used in the right places, and not wave it above your head and shove it into each hole. Here here you can read about the types of indexes, and here you can learn why they are generally needed.

For sim everything

Thank you for reading to the end. For typos and inaccuracies, write to me on PM, I will be glad to fix it. I will also be glad if you share your “painful” and your experience about what you need to remember and what is better to use in different situations during development.

Only registered users can participate in the survey. Please come in.

What would you most like to see in the next hub?

47.1% More about ActiveRecord and the database (details about indexes, association options, etc.) 42
25.8% Something interesting about Ruby and its functions (for example, about inject, succ and others) 23
24.7% Bad advice (pieces of code and functions for which the author would like to hammer a nail into the head) 22
2.2% Own (write in the comments) 2

Tags: