A Deep Dive into Instance Recovery

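Oracle decides whether instance recovery is needed by looking at v$datafile.last_time and v$datafile.last_change#. When the database is shut down cleanly, the final change# and time are recorded in last_change# and last_time; if they were not recorded, the shutdown was not clean and recovery is needed. Instance recovery proceeds in the order of the checkpoint queue (CKPT-Q), which matches the order of the redo.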
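As a rough illustration of that check, the sketch below models rows of v$datafile as plain dictionaries. It does not query a real database: the sample values and the helper name `needs_instance_recovery` are made up for illustration, and it assumes that a clean shutdown leaves a non-null last_change# on every data file.

```python
from typing import List, Optional, TypedDict


class DatafileRow(TypedDict):
    file_no: int
    last_change: Optional[int]  # stands in for v$datafile.last_change#
    last_time: Optional[str]    # stands in for v$datafile.last_time


def needs_instance_recovery(rows: List[DatafileRow]) -> bool:
    """Return True if any data file has no recorded last change#."""
    return any(row["last_change"] is None for row in rows)


# Hypothetical sample data: after a clean shutdown every file has a value.
clean_shutdown: List[DatafileRow] = [
    {"file_no": 1, "last_change": 1054321, "last_time": "2020-01-01 10:00:00"},
    {"file_no": 2, "last_change": 1054321, "last_time": "2020-01-01 10:00:00"},
]

# Hypothetical sample data: after a crash nothing was recorded.
crashed: List[DatafileRow] = [
    {"file_no": 1, "last_change": None, "last_time": None},
    {"file_no": 2, "last_change": None, "last_time": None},
]

print(needs_instance_recovery(clean_shutdown))
print(needs_instance_recovery(crashed))
```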

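After a database server loses power unexpectedly and restarts, the database performs instance recovery. So what exactly does Oracle do during that process?

First, the definition of instance recovery: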

Instance recovery is the process of applying records in the online redo log to data files to reconstruct changes made after the most recent checkpoint. Instance recovery occurs automatically when an administrator attempts to open a database that was previously shut down inconsistently.

Oracle Database performs instance recovery automatically in the following situations:

The database opens for the first time after the failure of a single-instance database or all instances of an Oracle RAC database. This form of instance recovery is also called crash recovery. Oracle Database recovers the online redo threads of the terminated instances together.

Some but not all instances of an Oracle RAC database fail. Instance recovery is performed automatically by a surviving instance in the configuration.

The SMON background process performs instance recovery, applying online redo automatically. No user intervention is required.

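So we know that an inconsistent shutdown triggers instance recovery (a consistent shutdown does not; see the official definition of shutdown immediate). Likewise, when a RAC node goes down, a surviving node performs instance recovery for it. The process rebuilds the dirty blocks in memory and commits them, and rolls back whatever was uncommitted. The SMON background process is responsible for all of this.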

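Instance recovery has two phases:

1. Rolling Forward

Based on what is recorded in the redo log, Oracle does the following:

1) For committed transactions, it reconstructs the dirty blocks in memory from the log, commits them, and writes them to disk through the normal mechanism.

2) For uncommitted transactions, it also reconstructs the dirty blocks from the redo. (Why would redo for uncommitted transactions be on disk at all? Because redo is written in chronological order, so flushing the redo of a committed transaction also flushes the redo of any earlier, still uncommitted transactions.) These dirty blocks are only reconstructed; Oracle does nothing else with them in this phase.

Because some changes from large uncommitted transactions may already have been written to disk (still strictly following the write-ahead logging rule), and because rolling forward itself produces dirty blocks for uncommitted transactions, Oracle must carry out the second step, rolling back.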

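2. Rolling Back

For all uncommitted dirty blocks, Oracle rolls back using the before images in undo: the affected dirty blocks in the buffer cache are restored to their pre-change state, and the uncommitted changes that had already been written to disk are overwritten using undo.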
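The two phases can be sketched with a toy model. This is a simplification, not how Oracle is implemented: blocks are dictionary entries holding a single integer, each redo record carries both a before and an after image, the replay starts from the beginning of the log rather than from a checkpoint position, and no two transactions touch the same block. The names `RedoRecord` and `instance_recovery` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class RedoRecord:
    txn: str     # transaction id (hypothetical label)
    block: str   # data block id
    before: int  # before image, as undo would hold it
    after: int   # after image carried by the redo record


def instance_recovery(
    disk: Dict[str, int], redo: List[RedoRecord], committed: Set[str]
) -> Dict[str, int]:
    """Toy two-phase recovery over dict-based 'blocks'."""
    cache = dict(disk)  # start from the on-disk block images

    # Phase 1: roll forward. Replay every redo record in log order,
    # whether or not its transaction committed.
    for rec in redo:
        cache[rec.block] = rec.after

    # Phase 2: roll back. Walk the log backwards and restore the before
    # image of every change made by an uncommitted transaction.
    for rec in reversed(redo):
        if rec.txn not in committed:
            cache[rec.block] = rec.before

    return cache


# Disk state at crash time: B (uncommitted) and D (committed) were already
# written, while A (committed) and C (uncommitted) were still only in memory.
disk = {"A": 1, "B": 20, "C": 3, "D": 40}
redo = [
    RedoRecord("T1", "A", before=1, after=10),
    RedoRecord("T2", "B", before=2, after=20),
    RedoRecord("T2", "C", before=3, after=30),
    RedoRecord("T1", "D", before=4, after=40),
]
recovered = instance_recovery(disk, redo, committed={"T1"})
print(recovered)  # {'A': 10, 'B': 2, 'C': 3, 'D': 40}
```

In this example T1's changes survive (A is rebuilt from redo, D was already on disk), while both of T2's changes are undone, including the one that had reached disk before the crash.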

Here is a figure from the official documentation:

Figure: Basic Instance Recovery Steps: Rolling Forward and Rolling Back

Explanation of the figure:

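Before instance recovery, the entries in the redo log correspond to four kinds of changed blocks (redo records only changes):

1) Committed changes already written to disk. Oracle does not need to do anything for these.

2) Committed changes not yet written to disk. Oracle reconstructs the dirty blocks in memory during roll forward and then commits them through the normal mechanism.

3) Uncommitted changes not yet written to disk.

4) Uncommitted changes already written to disk.

Because rollback works on whole transactions, blocks of kinds 3 and 4 are both dealt with in the rollback phase: Oracle uses undo to roll back every uncommitted transaction, overwriting the data on disk or in memory with the before images. That takes care of kinds 3 and 4.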
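The four cases amount to a small decision table, which can be written out as below. This only restates the article's classification; the function name `recovery_phases` and the phase labels are hypothetical, and the mapping says nothing about how Oracle decides these things internally.

```python
from typing import List


def recovery_phases(committed: bool, on_disk: bool) -> List[str]:
    """Map one changed block to the recovery phases that touch it."""
    if committed and on_disk:
        return []  # kind 1: nothing to do
    if committed:
        return ["roll forward"]  # kind 2: rebuilt from redo
    if on_disk:
        return ["roll back"]  # kind 4: overwritten with the undo before image
    # kind 3: reproduced from redo first, then undone
    return ["roll forward", "roll back"]


for committed in (True, False):
    for on_disk in (True, False):
        print(committed, on_disk, recovery_phases(committed, on_disk))
```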

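It also follows that Oracle assumes the transaction state recorded in undo is fully consistent with the redo log; there is no case where undo shows a transaction as committed while the redo log shows it as uncommitted.

Not every situation matches that assumption, though. If the database loses power repeatedly, for instance, instance recovery may fail, and the only option left is to use special techniques to modify the data file headers and the SCN.

Unless the situation is truly urgent, do not open the database with "folk remedies" such as BBED or forcibly advancing the SCN. For a successful DBA, good backups and disaster recovery will always be the most important job.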

Instance Recovery Phases
The first phase of instance recovery is called cache recovery or rolling forward, and involves reapplying all of the changes recorded in the online redo log to the data files. Because rollback data is recorded in the online redo log, rolling forward also regenerates the corresponding undo segments.

Rolling forward proceeds through as many online redo log files as necessary to bring the database forward in time. After rolling forward, the data blocks contain all committed changes recorded in the online redo log files. These files could also contain uncommitted changes that were either saved to the data files before the failure, or were recorded in the online redo log and introduced during cache recovery.

After the roll forward, any changes that were not committed must be undone. Oracle Database uses the checkpoint position, which guarantees that every committed change with an SCN lower than the checkpoint SCN is saved on disk. Oracle Database applies undo blocks to roll back uncommitted changes in data blocks that were written before the failure or introduced during cache recovery. This phase is called rolling back or transaction recovery.
