Oracle DB for programmers

Published on May 26, 2011

    Does an application programmer need to understand how a database works? Tom Kyte, a recognized Oracle specialist and author of the famous asktom column, insists in his book "Oracle for Professionals: Architecture and Key Features" that this is a must. Even if your team has a competent administrator, knowing how the Oracle DBMS works will help you understand each other and interact more effectively, not to mention the case when you have no such specialist at all. In this article I will cover the main things whose understanding will let you work competently with an Oracle database and put some of its features to good use in your application. If you have already read the above book by Tom Kyte, you can simply use this article as a refresher. One remark: I read the book a long time ago, when the latest version of Oracle was 9i (I also took the 9i administration courses), so if something has changed or been added in 10g, do not blame me. That said, I am writing about fairly fundamental things that are unlikely to have changed much.

    What makes the Oracle database so fast?


    When you change data in the database, your changes first go to the cache and are then asynchronously written to disk by several background writer processes (their number is configurable). At the same time a special log (the online redo log) is written, so that data can be recovered after a failure even if it has not yet been flushed from the cache to disk. This approach wins on speed, because the log is written to disk strictly sequentially into one file, and you can configure it to be written to two or more disks simultaneously, increasing protection against losing changes. There are several such log files, and they are used in a circle: as soon as all the changes protected by one of the log files have been written by the background process to the data blocks on disk, that log file can be reused.
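
    As a sketch of how this is configured: redo log groups can be added with multiplexed members on different disks. The paths and size below are placeholders, not recommendations.

        -- add a redo log group with two members (copies) on different disks
        ALTER DATABASE ADD LOGFILE GROUP 4
          ('/u01/oradata/redo04a.log', '/u02/oradata/redo04b.log')
          SIZE 100M;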

    I usually bring this up when someone asks to persist something "simply to a file on disk" because it will supposedly be "faster", since we will write all the data sequentially and the hard disk head will not have to seek random blocks. I still insist that we gain nothing here: we would be writing to a slow disk that is most likely actively used by many other processes writing a huge number of different logs, while Oracle itself writes its log to disk synchronously and strictly sequentially, as described above.

    Data recovery mechanism


    In the Oracle DBMS you can enable archiving of the online redo log files described above, and all changes will then be archived. Thus, if a disk with data blocks is lost, we can restore them to any point in time, including the moment right before the crash, by restoring from the archived log files and then applying the current online log.
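
    A minimal sketch of enabling archiving (SHUTDOWN and STARTUP here are SQL*Plus commands, run as a privileged user):

        SHUTDOWN IMMEDIATE
        STARTUP MOUNT
        ALTER DATABASE ARCHIVELOG;
        ALTER DATABASE OPEN;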

    Standby copy


    The archive files mentioned above can be shipped over the network and applied on the fly to a copy of the database. Thus you always have a hot standby copy at hand with minimal data lag. In applications that do not need to see the data right up to the last moment, such a standby can be opened read-only to offload the main database instance, and there may be several such read-only instances.
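
    On a physical standby this looks roughly like the following (a sketch only; a real Data Guard setup involves much more):

        -- on the standby: pause applying redo and open for reads
        ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
        ALTER DATABASE OPEN READ ONLY;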

    Why some write requests hang


    If some of your requests freeze at seemingly arbitrary points in time, it is worth looking in alert.log for a "checkpoint not complete" message. It means your online redo log files are too small or too few: the data they protect does not have time to be flushed from the cache to disk before the DBMS has filled all the available online log files and wants to reuse them in a circle, which it must never do, so it pauses. Although if your application runs on Java, I would first check the logs for Full GC pauses.
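
    One way to inspect the current log configuration is the standard v$log dynamic view:

        -- list redo log groups, their sizes and status
        SELECT group#, bytes/1024/1024 AS size_mb, status
        FROM v$log;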

    Non-blocking reads and the rollback segment


    One of the most remarkable features of the Oracle DBMS is non-blocking reads, achieved through the rollback segment. Read queries in Oracle are never blocked, since a consistent version of the data can almost always be read from the rollback segment.

    The rollback segment provides one more benefit: from it you can try to read slightly outdated data for a table, as it existed at a certain point in the past. This feature is called flashback.
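
    A flashback query looks roughly like this (the orders table is a made-up example):

        -- read the table as it was 15 minutes ago
        SELECT *
        FROM orders AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '15' MINUTE);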

    However, sometimes the rollback segment can play a dirty trick on you: if you have a big job doing bulk deletes (deletion generates the most data in the rollback segment), you can get ORA-01555: snapshot too old. The main thing to remember here is that you should not rewrite your job to commit every N operations; instead, use a separate, specially created rollback segment for such operations.
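
    Under manual undo management this is done roughly as follows (big_rbs is a hypothetical rollback segment created in advance; with automatic undo management you would size the undo tablespace and retention instead):

        -- direct this transaction to a large, dedicated rollback segment
        SET TRANSACTION USE ROLLBACK SEGMENT big_rbs;
        DELETE FROM audit_log WHERE created_at < ADD_MONTHS(SYSDATE, -12);
        COMMIT;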

    Transaction Isolation Levels


    Oracle has no READ_UNCOMMITTED isolation level at all. In other databases it is used to achieve maximum concurrency by removing read locks. But in Oracle reads are always performed without locks, so we already get all the advantages this level can provide without introducing any additional restrictions.

    In fact, only two isolation levels are available in Oracle: READ_COMMITTED, which is used by default, and SERIALIZABLE, which you can set if you wish.
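
    Switching levels is a one-liner:

        -- for a single transaction
        SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
        -- or for the whole session
        ALTER SESSION SET ISOLATION_LEVEL = SERIALIZABLE;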

    However, at the level of a single statement (SELECT, UPDATE, etc.) you always get a consistent read, i.e. effectively REPEATABLE_READ within one statement, achieved of course through the rollback segment. I have always liked the example Tom Kyte gives to describe what this buys you. Suppose you have a very large table of accounts and you run a SELECT to get the total amount. In Oracle, unlike many other databases, even if in the middle of your query another transaction transfers some amount from the first account to the last, you will still get data consistent with the start of your query: when your SELECT reaches the last row it will see that the row has been changed, go to the rollback segment, and read the value that was in that cell at the time the query started. In many other databases you would receive a total that never existed in your table at any moment. In Oracle, however, you risk getting ORA-01555: snapshot too old in this case.

    In addition to the standard isolation levels, Oracle also has so-called READ ONLY transactions, which give REPEATABLE_READ across the entire transaction, not just within a single statement. But, as the name implies, in such a transaction you can only read.
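
    A sketch (accounts is a made-up table):

        SET TRANSACTION READ ONLY;
        -- every query below sees the database as of the transaction start
        SELECT SUM(balance) FROM accounts;
        SELECT COUNT(*) FROM accounts;
        COMMIT;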

    Let Oracle Cache Your Data Effectively


    In Oracle all data is read and written not directly to disk but through the cache. By default the cache is managed with an LRU algorithm, so if you heavily read some very large table by identifier, fetching a new row each time, those reads can push a small static table out of the cache, even though the small table is exactly what you would most like to keep cached at all times. For such purposes, when creating a table you can specify which cache pool queries against it will use. For the first table in the example above the RECYCLE pool is suitable, which essentially does not keep data but discards it from the cache immediately. For the second table the KEEP pool is suitable, which lets small static tables live in the cache, so queries against all the other tables will not crowd the static tables' data out.
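
    The pool is assigned via the storage clause; a sketch with hypothetical table names:

        -- a huge table read row by row: do not let it pollute the cache
        ALTER TABLE huge_history STORAGE (BUFFER_POOL RECYCLE);
        -- a small static dictionary table: keep it in its own pool
        ALTER TABLE small_dictionary STORAGE (BUFFER_POOL KEEP);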

    Empty strings


    Oracle has one very interesting peculiarity that it will now never be able to get rid of. If you store an empty string in the database, it is saved as NULL. On subsequent reads you will therefore never get an empty string back, only NULL. Keep in mind that for the same reason empty strings do not get into indexes, so if you run queries whose execution plan uses an index, you will never get your empty (or rather NULL) strings back, but more on that a bit later.
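
    A quick demonstration (table t is made up):

        CREATE TABLE t (id NUMBER, name VARCHAR2(10));
        INSERT INTO t VALUES (1, '');              -- '' is stored as NULL
        SELECT COUNT(*) FROM t WHERE name = '';    -- 0: NULL never equals anything
        SELECT COUNT(*) FROM t WHERE name IS NULL; -- 1: this is how you find it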

    Indices


    In addition to the well-known B-tree indexes, Oracle also has so-called bitmap indexes, which show very high performance on queries against tables whose columns have very few distinct values. They work especially well (compared to regular indexes) for queries with complex combinations of OR and AND over such columns. This index is stored not as a B-tree but as bitmaps, which is what makes the described queries fast. The question of how many distinct values a column may have for this index to remain preferable is a hard one: it can be 10 unique values or 10,000. Here you need to create the index on your specific table and see what happens.
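
    Creating one is straightforward (orders and status are placeholders):

        -- suits low-cardinality columns on rarely updated tables
        CREATE BITMAP INDEX idx_orders_status ON orders (status);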

    One of the things that has always really pleased me in Oracle is the ability to create an index on a function. That is, if your queries use a function, you can build an index on that function and significantly speed up reads.
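
    For example (the employees table is hypothetical):

        CREATE INDEX idx_emp_upper_name ON employees (UPPER(last_name));
        -- this predicate can now use the index
        SELECT * FROM employees WHERE UPPER(last_name) = 'SMITH';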

    Another interesting property of indexes to be aware of is that the index does not store NULL values. Thus, if you query an indexed column with a <, > or <> condition, you will not get back rows with NULL in that column. On the other hand, this property can be exploited very effectively in certain cases. Suppose you have a very large table storing orders which is never purged, and a background process that must ship all orders to some backoffice system. The first solution that suggests itself is to add an is_sent flag column, initially 0 and set to 1 upon sending, so the background process queries the table with the condition is_sent = 0 on each run. You cannot use a bitmap index here, since the table is updated very actively. A regular B-tree index would take a lot of space, since it must store references to a huge number of rows. But if we slightly change the logic and mark sent orders by putting NULL into is_sent instead of 1, our index becomes tiny: at any moment it stores only the non-NULL values, and there are very few of them.
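
    A sketch of this trick (all names are made up):

        -- 0 = not yet sent; NULL = sent (NULL rows are absent from the index)
        ALTER TABLE orders ADD (is_sent NUMBER(1) DEFAULT 0);
        CREATE INDEX idx_orders_unsent ON orders (is_sent);

        -- the background process scans a tiny index
        SELECT order_id FROM orders WHERE is_sent = 0;
        -- marking as sent removes the entry from the index
        UPDATE orders SET is_sent = NULL WHERE order_id = :id;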

    Tables come in different kinds


    In addition to regular tables, Oracle, like many other databases, has so-called index-organized tables, where the table data lives directly in the B-tree of the primary key index. This achieves two things at once: first, reading data by primary key costs one read fewer; second, the data in the table comes out sorted by primary key, so an ORDER BY on the primary key is performed without an extra sort. Among the disadvantages is the fact that you can no longer disable logging of this index to the online redo log files.
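
    Such a table is declared with ORGANIZATION INDEX (a made-up example):

        CREATE TABLE country_codes (
          code NUMBER PRIMARY KEY,
          name VARCHAR2(100)
        ) ORGANIZATION INDEX;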

    Another great table type is cluster tables, which allow data from two or more tables, clustered by one key value, to be stored in a single data block. This can be quite effective if you always use certain tables together.
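
    A sketch of an index cluster joining two hypothetical tables on customer_id:

        CREATE CLUSTER customer_cluster (customer_id NUMBER) SIZE 512;
        CREATE INDEX idx_customer_cluster ON CLUSTER customer_cluster;

        CREATE TABLE customers (
          customer_id NUMBER PRIMARY KEY,
          name        VARCHAR2(100)
        ) CLUSTER customer_cluster (customer_id);

        CREATE TABLE orders (
          order_id    NUMBER PRIMARY KEY,
          customer_id NUMBER,
          amount      NUMBER
        ) CLUSTER customer_cluster (customer_id);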

    On top of cluster tables there are also hash cluster tables, where instead of a B-tree a hash of the cluster key is used for access. It sounds very interesting, but, to be honest, I have never encountered it in practice.

    Bind variables


    Probably every programmer has heard about this already, but I will nevertheless mention such a mandatory technique as bind variables. For each unique query text, a parse is done and an execution plan is built and cached. If there are many queries differing only in literals, such as the very common lookup by ID, a plan will be generated for each of them, and together they will crowd all the other plans out of the cache, which can significantly increase the response time of your database.
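
    The difference is in the query text (orders is a placeholder):

        -- each literal is a distinct statement: a new hard parse and cached plan
        SELECT * FROM orders WHERE order_id = 1;
        SELECT * FROM orders WHERE order_id = 2;

        -- with a bind variable all executions share one cached plan
        SELECT * FROM orders WHERE order_id = :id;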

    It is also worth noting that you should not overdo it and use binding for columns with a small number of distinct values, such as an is_deleted flag: there will not be many distinct queries in that case anyway, and with a literal the DBMS may be able to build a more effective plan for the more specific query.

    A couple more notes for the programmer


    If your column is of type VARCHAR2(100), then an attempt to cram the string longString.substring(0, 100) into it will not necessarily succeed, because by default the limit of 100 in the column definition refers to the number of bytes, not characters, so in the presence of multi-byte characters you can get into trouble. This behavior can be configured via length semantics. It is even worse if you are inserting in an endless "retry until it succeeds" loop: in this case it will never succeed.
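
    A sketch of character-length semantics:

        -- limit in characters rather than bytes
        CREATE TABLE notes (body VARCHAR2(100 CHAR));
        -- or make CHAR semantics the default for DDL in this session
        ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR;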

    And a general recommendation for any database: never update all of a table's columns when a single field of an object changes. It sounds obvious, but as practice shows this antipattern is quite common, so I strongly recommend checking that your frameworks UPDATE only the fields that actually changed.

    Conclusion


    I have tried to describe most of the things that, in my opinion, can be useful to a programmer. Since there are many of them, I only outlined them, often without going into details. How to actually make the necessary settings can always be found in the Tom Kyte book mentioned above, in the asktom column, or simply by googling. The main thing is to know what to google, and I hope this article has pointed you in that direction.