mashuka November 8, 2011 at 10:52

Online redo logs or Checkpoint event in Oracle

From the sandbox

Quite often, such a nuisance happens that messages like “Checkpoint not complete” are streaming into the alert.log database one after another. The standard advice in this case is: "increase the number and / or size of redo logs." And then the question is who these redo logs are and what they eat with.

Bit of theory

So. When the application requests data, the database adds it to the buffer cache - the memory area in SGA. When data changes, the database does not make changes directly in the data file, but in the buffer cache. At the same time, information is recorded in a separate memory area - redo log buffer - by which, if necessary, it will be possible to repeat the change. When a change is committed, it, again, is not immediately dumped to the data file, but information from the redo log buffer is dumped to the online redo log - a file specially designed for this. Until the change is written to the data file, it is necessary to store information about it somewhere on the disk in case the database falls. If, for example, the server power is turned off, then, of course, all data stored in memory will be lost. In this case, the redo log is the only place where information about the change that has occurred is stored. After the database is restarted, Oracle will actually repeat the transaction, change the necessary blocks again and make commit. Therefore, until the information from the redo log is flushed to the data file, it is impossible to reuse this redo log.

A special background DBWn database process frees the buffer cache as needed, and also executes a checkpoint event. A checkpoint is an event during which dirty (modified) blocks are written to data files. The checkpoint process is responsible for the checkpoint event, which writes information about the checkpoint to the control file (about what a control file is, another time) and the headers of data files.

The checkpoint event provides data consistency and quick database recovery. Data recovery is accelerated because all changes to the checkpoint are written to the data files. This eliminates the need to use redo logs generated before the checkpoint during recovery. The checkpoint ensures that all changed blocks in the cache are actually written to the corresponding data files.

There are several types of control points.

Thread checkpoints In a data file, all changes that have occurred within the framework of a specific instance up to a certain point are written in a row. They happen in the following situations:
- full stop of the base;
- alter system checkpoint;
- switching online redo logs;
- alter database begin backup.
Checkpoints for data files and table spaces. It happens when operations with table spaces and data files occur (alter tablespace offline, alter tablespace read only, data file shrink, etc.)
Incremental control points. A subset of the instance control point, designed to avoid writing to the disk a huge number of blocks during switching redo logs. The DBWn process checks at least once every three seconds whether there are new “dirty” blocks for writing to disk. If there are, then they are entered into the data files, the checkpoint label in the redo log is shifted (so that the next time you have to look at less logs), but the headers of the data files do not change.

Frequent checkpoint events provide faster recovery from failures, but can cause performance degradation. Depending on the number of data files in the system, a checkpoint event can be quite a resource-intensive operation, because at this point, all the headers of all data files become inaccessible.

Parameters on which the frequency of control point events depends and which, if desired, can be configured:

FAST_START_MTTR_TARGET (how long it will take in seconds to restore the database after a failure; if I understand something, then we are talking about the time it takes to use the available online redo logs).
LOG_CHECKPOINT_INTERVAL (checkpoint event frequency is the allowable number of blocks of the online redo log file that were filled after the previous checkpoint; blocks are blocks in terms of the operating system, not the database).
LOG_CHECKPOINT_TIMEOUT (maximum number of seconds allowed between two checkpoint events).
LOG_CHECKPOINTS_TO_ALERT (true / false; determines whether to throw off the switching of the checkpoint in alert.log; a useful thing, it would be better to set it to true).

It makes sense to clarify that the FAST_START_MTTR_TARGET and LOG_CHECKPOINT_INTERVAL parameters, if you believe the documentation, are mutually exclusive.

Let's look at our base now

As already mentioned, the checkpoint event occurs when the online redo log is switched. A good practice, if you believe the metalware, is switching logs every twenty minutes. Too small online redo logs can increase the frequency of checkpoint events and reduce performance. Oracle recommends that the size of the online redo log files be the same, and that at least two log groups exist for each instance of the database.

To track the frequency of switching logs, you can look in the alert log.
An example of switching online redo logs.

Wed Nov 02 17:51:20 2011
Thread 1 advanced to log sequence 83 (LGWR switch)
  Current log# 2 seq# 83 mem# 0: D:\ORACLE\ORADATA\ORADB\REDO02.LOG
Thread 1 advanced to log sequence 84 (LGWR switch)
  Current log# 3 seq# 84 mem# 0: D:\ORACLE\ORADATA\ORADB\REDO03.LOG

Sometimes in alert.log you may find the following errors.

Wed Nov 02 17:51:53 2011
Thread 1 cannot allocate new log, sequence 87
Checkpoint not complete
  Current log# 2 seq# 86 mem# 0: D:\ORACLE\ORADATA\ORADB\REDO02.LOG

This means that Oracle is going to reuse the online redo log, the data from which has not yet been dumped into data files. In this case, all operations in the database are suspended (application performance deteriorates sharply), a checkpoint event is triggered, and dirty blocks are urgently flushed to disk. If such errors arise from case to case, then, perhaps, there is nothing catastrophic in this. However, if they become permanent, then it's time to think about how to change the size and number of redo logs.

From the cookbook. How to change the size and / or number of online redo logs

1. First, just look at the status of the logs.

SQL> select group#, bytes, status from v$log;
    GROUP#      BYTES STATUS
---------- ---------- ----------------
         1   52428800 ACTIVE
         2   52428800 ACTIVE
         3   52428800 CURRENT

Let's try to increase the size of the logs to 100M.

2. Let's look directly at the redo log files.

SQL> select group#, member from v$logfile;
    GROUP# MEMBER
---------- ----------------------------------------
         3 D:\ORACLE\ORADATA\ORADB\REDO03.LOG
         2 D:\ORACLE\ORADATA\ORADB\REDO02.LOG
         1 D:\ORACLE\ORADATA\ORADB\REDO01.LOG

3. Create three new groups of logs of 100M each.


SQL> alter database add logfile group 4 'D:\ORACLE\ORADATA\ORADB\REDO04.LOG' size 100M;
Database altered.
SQL> alter database add logfile group 5 'D:\ORACLE\ORADATA\ORADB\REDO05.LOG' size 100M;
Database altered.
SQL> alter database add logfile group 6 'D:\ORACLE\ORADATA\ORADB\REDO06.LOG' size 100M;
Database altered.

Let's see what happened


SQL> select group#, bytes, status from v$log;
    GROUP#      BYTES STATUS
---------- ---------- ----------------
         1   52428800 INACTIVE
         2   52428800 INACTIVE
         3   52428800 INACTIVE
         4  104857600 INACTIVE
         5  104857600 ACTIVE
         6  104857600 CURRENT

4. Now you can delete the old (too small) logs. To do this, the log from the newly created groups should be active. If this is not the case in the current situation, then you can use the command


SQL> alter system switch logfile;

Now, with a clear conscience, remove unnecessary logs

                                
SQL> alter database drop logfile group 1;
Database altered.
SQL> alter database drop logfile group 2;
Database altered.
SQL> alter database drop logfile group 3;
alter database drop logfile group 3
*
ERROR at line 1:
ORA-01624: log 3 needed for crash recovery of instance oradb (thread 1)
ORA-00312: online log 3 thread 1: 'D:\ORACLE\ORADATA\ORADB\REDO03.LOG'

When deleting the last log, a situation just happened when it is impossible to delete the log, because data from it has not yet been dumped into data files. You can force this by using the command


SQL> alter system checkpoint;
System altered.

Then we boldly try again.


SQL> alter database drop logfile group 3;
Database altered.

5. Check if we succeeded.


SQL> select group#, bytes, status from v$log;
    GROUP#      BYTES STATUS
---------- ---------- ----------------
         4  104857600 ACTIVE
         5  104857600 ACTIVE
         6  104857600 CURRENT

6. Now is the time to backup the database. So, just in case.

7. Now you can delete unnecessary files of the operating system.

D:\> del D:\oracle\oradata\oradb\REDO01.LOG
D:\> del D:\oracle\oradata\oradb\REDO02.LOG
D:\> del D:\oracle\oradata\oradb\REDO03.LOG

What to do if we have Oracle RAC installed.

In principle, the same thing. Taking into account the fact that for each instance there is a separate set of redo logs and they will have to be operated separately.

The ALTER SYSTEM CHECKPOINT LOCAL command only works with the instance you are currently connected to. To trigger a checkpoint event for the entire database, you need to call ALTER SYSTEM CHECKPOINT or ALTER SYSTEM CHECKPOINT GLOBAL.

The ALTER SYSTEM SWITCH LOGFILE command affects only the instance that you are currently connected to. To switch online redo logs for the entire system, you can use the ALTER SYSTEM ARCHIVE LOG CURRENT command.

It is necessary to create new online redo logs for each instance separately.

ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 4 '+ASMGRP/ORADB/REDO04.LOG' SIZE 100M;
ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 4 '+ASMGRP/ORADB/REDO04.LOG' SIZE 100M;

By the way, the file name can be omitted. The base itself will call it in accordance with its ideas about the beautiful.

By the way. You can propagate online redo log files.

ALTER DATABASE ADD LOGFILE GROUP 7 ('D:\ORACLE\ORADATA\ORADB\REDO07.LOG','C:\ORACLE\ORADATA\ORADB\REDO07.LOG') SIZE 100M;

Why propagate? Because, if for some reason the online redo log file is damaged or lost, then having it undamaged and not lost copy on another disk, recovery is a matter of two minutes. But if there is no copy, then you have to tinker with it (but the process of recovering lost redo logs is a completely different story).

___________________________
References
Oracle Database 11.2 official documentation.
Oracle Support nodes 147468.1 and 1035935.6
Thomas Kyte, Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions

Tags:

Online redo logs or Checkpoint event in Oracle

Bit of theory

Let's look at our base now

From the cookbook. How to change the size and / or number of online redo logs

What to do if we have Oracle RAC installed.

Also popular now: