环境说明
OS操作系统:WINDOWS 64位
数据库版 本:ORACLE 11.2.0.1
故障问题描述
客户反映数据库无法启动,报ORA-01589:要打开数据库必须使用RESETLOGS或NORESETLOGS选项。使用alter database open
resetlogs启动数据库报ORA-01194:文件1需要更多恢复来保持一致性,ORA-01110:数据文件1:‘E:\APP\YING_LUN\ORADATA\ORCL\SYSTEM01.DBF’。
故障分析
1、在网上搜索相关文档,对于这样的问题需要进行控制文件恢复:
SQL> recover database using backup controlfile;
由于数据库在出故障之前没有开启归档日志,因为找不到需要的归档日志控制文件恢复失败。
2.使用NORESETLOGS方式重建控制文件
目前数据库可以启动到MOUNT状态,可以备份控制文件创建脚本到TRACE文件中,获取控制文件创建脚本
SQL>alter database backup controlfile to trace;
==>在11G中可以通过v$diag_info视图查看备份的TRACE文件的位置
SQL> select value from v$diag_info where name='Default Trace
File';
==>把TRACE文件中的NORESETLOGS方式创建控制文件的SQL语句复制到create.sql脚本
STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "ORCL" NORESETLOGSARCHIVELOG
MAXLOGFILES 16
MAXLOGMEMBERS 3
MAXDATAFILES 100
MAXINSTANCES 8
MAXLOGHISTORY 584
LOGFILE
GROUP 1
'E:\APP\YING_LIU\ORADATA\ORCL\REDO01.LOG'SIZE 500M BLOCKSIZE 512,
GROUP 2
'E:\APP\YING_LIU\ORADATA\ORCL\REDO02.LOG'SIZE 500M BLOCKSIZE 512,
GROUP 3
'E:\APP\YING_LIU\ORADATA\ORCL\REDO03.LOG'SIZE 500M BLOCKSIZE 512,
GROUP 4 'E:\APP\YING_LIU\ORADATA\ORCL\REDO04.LOG'SIZE 500M BLOCKSIZE 512,
GROUP 5
'E:\APP\YING_LIU\ORADATA\ORCL\REDO05.LOG'SIZE 500M BLOCKSIZE 512,
GROUP 6
'E:\APP\YING_LIU\ORADATA\ORCL\REDO06.LOG'SIZE 500M BLOCKSIZE 512
-- STANDBY LOGFILE
DATAFILE
'E:\APP\YING_LIU\ORADATA\ORCL\SYSTEM01.DBF',
'E:\APP\YING_LIU\ORADATA\ORCL\SYSAUX01.DBF',
'E:\APP\YING_LIU\ORADATA\ORCL\UNDOTBS01.DBF',
'E:\APP\YING_LIU\ORADATA\ORCL\USERS01.DBF',
'E:\APP\YING_LIU\ORADATA\ORCL\EXAMPLE01.DBF',
'E:\APP\TABLESPACE\MALASONG.DBF',
'E:\APP\TABLESPACE\WX_REDPACKET.DBF',
'D:\ORACLE\ORCL\SIGNIN_USER_TEST.DBF',
'E:\APP\TABLESPACE\EXAM_SYSTEM.DBF',
'E:\APP\TABLESPACE\EXAM_SYSTEM_TEST.DBF',
'E:\APP\TABLESPACE\ICLUB.DBF',
'E:\APP\TABLESPACE\PINBAO.DBF',
'E:\APP\YING_LIU\ORADATA\ORCL\SYSTEM3.DBF'
CHARACTER SET ZHS16GBK
;
==>执行create.sql脚本重建控制文件
SQL> shutdown immediate;
SQL>@create.sql
==>执行create.sql脚本时报错日志组5与日志文件redo5.log不匹配。在网上搜索到的资料建议删除日志组相关内容或使用RESETLOGS方式创建控制文件,于是删除日志文件相关内容。
STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "ORCL" NORESETLOGSARCHIVELOG
MAXLOGFILES 16
MAXLOGMEMBERS 3
MAXDATAFILES 100
MAXINSTANCES 8
MAXLOGHISTORY 584
LOGFILE
-- STANDBY LOGFILE
DATAFILE
'E:\APP\YING_LIU\ORADATA\ORCL\SYSTEM01.DBF',
'E:\APP\YING_LIU\ORADATA\ORCL\SYSAUX01.DBF',
'E:\APP\YING_LIU\ORADATA\ORCL\UNDOTBS01.DBF',
'E:\APP\YING_LIU\ORADATA\ORCL\USERS01.DBF',
'E:\APP\YING_LIU\ORADATA\ORCL\EXAMPLE01.DBF',
'E:\APP\TABLESPACE\MALASONG.DBF',
'E:\APP\TABLESPACE\WX_REDPACKET.DBF',
'D:\ORACLE\ORCL\SIGNIN_USER_TEST.DBF',
'E:\APP\TABLESPACE\EXAM_SYSTEM.DBF',
'E:\APP\TABLESPACE\EXAM_SYSTEM_TEST.DBF',
'E:\APP\TABLESPACE\ICLUB.DBF',
'E:\APP\TABLESPACE\PINBAO.DBF',
'E:\APP\YING_LIU\ORADATA\ORCL\SYSTEM3.DBF'
CHARACTER SET ZHS16GBK
;
==>执行修改后的 create.sql报数据库名与SPFILE中的DB_NAME不一致,经检查数据库名称与SPFILE
==>在的DB_NAME是一致。于是决定使用 RESETLOGS方式重建控制文件。
;
3、使用RESETLOGS方式重建控制文件:
==>把 RESETLOGS方式创建控制文件的脚本复制到create_2.sql文件中
STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "ORCL" RESETLOGSARCHIVELOG
MAXLOGFILES 16
MAXLOGMEMBERS 3
MAXDATAFILES 100
MAXINSTANCES 8
MAXLOGHISTORY 584
LOGFILE
GROUP 1
'E:\APP\YING_LIU\ORADATA\ORCL\REDO01.LOG'SIZE 500M BLOCKSIZE 512,
GROUP 2
'E:\APP\YING_LIU\ORADATA\ORCL\REDO02.LOG'SIZE 500M BLOCKSIZE 512,
GROUP 3
'E:\APP\YING_LIU\ORADATA\ORCL\REDO03.LOG'SIZE 500M BLOCKSIZE 512,
GROUP 4
'E:\APP\YING_LIU\ORADATA\ORCL\REDO04.LOG'SIZE 500M BLOCKSIZE 512,
GROUP 5
'E:\APP\YING_LIU\ORADATA\ORCL\REDO05.LOG'SIZE 500M BLOCKSIZE 512,
GROUP 6 'E:\APP\YING_LIU\ORADATA\ORCL\REDO06.LOG'SIZE 500M BLOCKSIZE 512
-- STANDBY LOGFILE
DATAFILE
'E:\APP\YING_LIU\ORADATA\ORCL\SYSTEM01.DBF',
'E:\APP\YING_LIU\ORADATA\ORCL\SYSAUX01.DBF',
'E:\APP\YING_LIU\ORADATA\ORCL\UNDOTBS01.DBF',
'E:\APP\YING_LIU\ORADATA\ORCL\USERS01.DBF',
'E:\APP\YING_LIU\ORADATA\ORCL\EXAMPLE01.DBF',
'E:\APP\TABLESPACE\MALASONG.DBF',
'E:\APP\TABLESPACE\WX_REDPACKET.DBF',
'D:\ORACLE\ORCL\SIGNIN_USER_TEST.DBF',
'E:\APP\TABLESPACE\EXAM_SYSTEM.DBF',
'E:\APP\TABLESPACE\EXAM_SYSTEM_TEST.DBF',
'E:\APP\TABLESPACE\ICLUB.DBF',
'E:\APP\TABLESPACE\PINBAO.DBF',
'E:\APP\YING_LIU\ORADATA\ORCL\SYSTEM3.DBF'
CHARACTER SET ZHS16GBK
==>执行create_2.sql脚本重建控制文件,启动数据库报UNDO段 _SYSSMU8_1682283174$快照过旧。
SQL> shutdown immediate;
SQL>@create_2.sql
SQL> alter database openresetlogs;
alter database openresetlogs
*
第1行出现错误:
ORA-01092: ORACLE instance terminated.
Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL
level 1
ORA-01555: snapshot too old: rollback
segment number 8 with name
"_SYSSMU8_1682283174$" too small
进程ID: 8600
会话ID: 576序列号:
1
4、推高数据库SCN:
经向同时请教觉得通过推高SCN跳过UNDO的回滚操作。
SQL> startupopenresetlogs;
ORACLE例程已经启动。
Total System Global Area 3423965184 bytes
Fixed Size2180544 bytes
Variable Size2566916672 bytes
Database Buffers838860800 bytes
Redo Buffers16007168 bytes
ORA-01504: ???? 'RESETLOGS' ??? db_name 'ORCL' ???
==>查看数据文件头的SCAN
select name,checkpoint_change# from v$datafile_header;
NAME
--------------------------------------------------------------------------------
CHECKPOINT_CHANGE#
----------------------------
E:\APP\YING_LIU\ORADATA\ORCL\SYSTEM01.DBF
10150360067303
.... ==>其他输出信息省略
==>重建控制文件,把数据库SCN推高一个亿,启动数据库仍然报错
SQL>@create_2.sql
SQL> alter session set
events '10015 trace name adjust_scn level 1';
SQL> alter database open resetlogs;
alter database open resetlogs
*
第1行出现错误:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 8 with name
"_SYSSMU8_1682283174$" too small
进程ID: 6420
会话ID: 576序列号: 1
==>最后把数据库SCN推高到1亿亿还是报错
SQL>@create_2.sql
SQL> alter session set
events '10015 trace name adjust_scn level 100000000';
SQL> alter database open resetlogs;
alter database open resetlogs
*
第1行出现错误:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 8 with name
"_SYSSMU8_1682283174$" too small
进程ID: 6420
会话ID: 576序列号: 1
==>最后查了下,在11.2.0.1中这个方法无法推高SCN,于是重建控制文件使用oradebug来推高数据库SCN,启动数据库时仍然报错,有可能是在WINDOWS 64位平台不支持oradebug推高SCN
SQL>@create_2.sql
SQL> select to_char('10150360067303','xxxxxxxxxxxxxxxxxxxxxx')
from dual;
TO_CHAR('10650360067303
-----------------------
9afbaedb4e7
SQL>oradebug setmypid
SQL>oradebug dumpvar sga kcsgscn_
SQL>oradebug poke 0x0082237B0 8 0x000012539f0dcce7
SQL>alter database open resetlogs;
alter database open resetlogs
*
第1行出现错误:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 8 with name
"_SYSSMU8_1682283174$" too small
进程ID: 6420
会话ID: 576序列号: 1
5、跳过有问题的UNDO段,强制打开数据库:
SQL>
alter system set "_allow_resetlogs_corruption"=true
scope=spfile; ==>允许跳过损坏的UNDO段
alter system set
"_corrupted_rollback_segments"='_SYSSMU8$' scope=spfile; ==>把有问题的UNDO段标记为坏的UDNO段。
==>启动数据库
SQL>alter database open resetlogs;
==>创建信息的UNDO表空间,并修改数据库的默认UNDO表空间为新的UNDO表空间
SQL>
create undo tablespace UNDO02 datafile
'E:\app\ying_liu\oradata\ORCL\undo02_01.dbf' size 2G autoextend on;
alter system set undo_tablespace=UNDO02 scope=both;
==>删除旧的UNDO表空间
alter tablespace undotbs1 offline normal;
drop tablespace UNDOTBS1 including contents and datafiles;
==>把SPFILE中的隐藏参数恢复默认值,重启数据库
SQL>
alter system reset "_allow_resetlogs_corruption"
scope=spfile;
alter system reset "_corrupted_rollback_segments"
scope=spfile;
SQL> startup;
至此故障解决,数据库正常启动。
5、为了找到控制文件出问题的原因分析数据库告警日志:
==> 11月15日16:06:07启动数据库报无法找到REDO LOG的错误,可能是redo log损坏了,
==>也可是被删除了,这也是使用NORESETLOGS创建控制文件失败的原因
Tue Nov 15 16:06:07
Starting ORACLE instance (normal)
.... ==>省略部分日志信息
Successful mount of redo thread 1, with mount id 1455795184
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: alter database mount exclusive
alter database open
Beginning crash recovery of 1 threads
parallel recovery started
with 7 processes
Started redo scan
Errors in file
e:\app\ying_liu\diag\rdbms\orcl\orcl\trace\orcl_ora_11860.trc:
ORA-00313: ??????? 5 (???? 1) ???
ORA-00312: ???? 5 ?? 1: 'E:\APP\YING_LIU\ORADATA\ORCL\REDO05.LOG'
ORA-27041: ??????
OSD-04002:无法打开文件
O/S-Error: (OS 2)系统找不到指定的文件。
Aborting crash recovery due to error 313
Errors in file e:\app\ying_liu\diag\rdbms\orcl\orcl\trace\orcl_ora_11860.trc:
ORA-00313: ??????? 5 (???? 1) ???
ORA-00312: ???? 5 ?? 1: 'E:\APP\YING_LIU\ORADATA\ORCL\REDO05.LOG'
ORA-27041: ??????
OSD-04002:无法打开文件
O/S-Error: (OS 2)系统找不到指定的文件。
Errors in file
e:\app\ying_liu\diag\rdbms\orcl\orcl\trace\orcl_ora_11860.trc:
ORA-00313: ??????? 5 (???? 1) ???
ORA-00312: ???? 5 ?? 1: 'E:\APP\YING_LIU\ORADATA\ORCL\REDO05.LOG'
ORA-27041: ??????
OSD-04002:无法打开文件
O/S-Error: (OS 2)系统找不到指定的文件。
ORA-313 signalled during: alter database open...
Tue Nov 15 16:06:13
Errors in file
e:\app\ying_liu\diag\rdbms\orcl\orcl\trace\orcl_m000_16324.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1:
'E:\APP\YING_LIU\ORADATA\ORCL\REDO01.LOG'
ORA-27041: unable to open file
OSD-04002:无法打开文件
O/S-Error: (OS 2)系统找不到指定的文件。
Errors in file
e:\app\ying_liu\diag\rdbms\orcl\orcl\trace\orcl_m000_16324.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1:
'E:\APP\YING_LIU\ORADATA\ORCL\REDO02.LOG'
ORA-27041: unable to open file
OSD-04002:无法打开文件
O/S-Error: (OS 2)系统找不到指定的文件。
Errors in file
e:\app\ying_liu\diag\rdbms\orcl\orcl\trace\orcl_m000_16324.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: 'E:\APP\YING_LIU\ORADATA\ORCL\REDO03.LOG'
ORA-27041: unable to open file
OSD-04002:无法打开文件
O/S-Error: (OS 2)系统找不到指定的文件。
Errors in file
e:\app\ying_liu\diag\rdbms\orcl\orcl\trace\orcl_m000_16324.trc:
ORA-00313: open failed for members of log group 4 of thread 1
ORA-00312: online log 4 thread 1:
'E:\APP\YING_LIU\ORADATA\ORCL\REDO04.LOG'
ORA-27041: unable to open file
OSD-04002:无法打开文件
O/S-Error: (OS 2)系统找不到指定的文件。
Errors in file
e:\app\ying_liu\diag\rdbms\orcl\orcl\trace\orcl_m000_16324.trc:
ORA-00313: open failed for members of log group 5 of thread 1
ORA-00312: online log 5 thread 1:
'E:\APP\YING_LIU\ORADATA\ORCL\REDO05.LOG'
ORA-27041: unable to open file
OSD-04002:无法打开文件
O/S-Error: (OS 2)系统找不到指定的文件。
Errors in file
e:\app\ying_liu\diag\rdbms\orcl\orcl\trace\orcl_m000_16324.trc:
ORA-00313: open failed for members of log group 6 of thread 1
ORA-00312: online log 6 thread 1:
'E:\APP\YING_LIU\ORADATA\ORCL\REDO06.LOG'
ORA-27041: unable to open file
OSD-04002:无法打开文件
O/S-Error: (OS 2)系统找不到指定的文件。
Checker run found 12 new persistent data failures
Tue Nov 15 16:07:18
==>在/11/15 16:54:59执行了不完全恢复,但因为没有开启归档日志,找不到需要的归档日志
==>恢复失败
Tue Nov 15 16:54:59
ALTER DATABASE RECOVERdatabase until time '-10-01 00:00:00'
Media Recovery Start
started logmerger process
Parallel Media Recovery started with 8 slaves
Tue Nov 15 16:55:00
Media Recovery failed with error 19907
Slave exiting with ORA-283 exception
Errors in file
e:\app\ying_liu\diag\rdbms\orcl\orcl\trace\orcl_pr00_560.trc:
ORA-00283: recovery session canceled due to errors
ORA-19907: recovery time or SCN does not belong to recovered
incarnation
ORA-283 signalled during: ALTER DATABASE RECOVERdatabase until time '-10-01
00:00:00'.
Tue Nov 15 16:55:11
ALTER DATABASE RECOVERdatabase until time '-10-01 00:00:00'
Media Recovery Start
started logmerger process
Parallel Media Recovery started with 8 slaves
Media Recovery Not Required
Tue Nov 15 16:55:11
Errors in file
e:\app\ying_liu\diag\rdbms\orcl\orcl\trace\orcl_pr00_5260.trc:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get
error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: 'E:\APP\YING_LIU\ORADATA\ORCL\SYSTEM01.DBF'
Slave exiting with ORA-1547 exception
Errors in file
e:\app\ying_liu\diag\rdbms\orcl\orcl\trace\orcl_pr00_5260.trc:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get
error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: 'E:\APP\YING_LIU\ORADATA\ORCL\SYSTEM01.DBF'
ORA-10879 signalled during: ALTER DATABASE RECOVERdatabase until time '-10-01
00:00:00'...
Tue Nov 15 16:55:23
alter database open resetlogs
ORA-1194 signalled during: alter database open resetlogs...
Tue Nov 15 16:55:43
alter database clear unarchived logfile group 5
ORA-1624 signalled during: alter database clear unarchived logfile
group 5...
Tue Nov 15 16:55:43
Errors in file
e:\app\ying_liu\diag\rdbms\orcl\orcl\trace\orcl_m000_932.trc:
ORA-00313: open failed for members of log group 5 of thread 1
ORA-00312: online log 5 thread 1:
'E:\APP\YING_LIU\ORADATA\ORCL\REDO05.LOG'
ORA-27041: unable to open file
OSD-04002:无法打开文件
O/S-Error: (OS 2)系统找不到指定的文件。
Checker run found 13 new persistent data failures
==> /11/15 17:03:07手动执行了控制文件恢复操作,但因为没有对应的归档日志失败(没开归档)
==>这个操作直接导致了控制文件与数据文件头SCN不一致,以至于后续启动数据库时都要求恢复控制文
==>文件。
Tue Nov 15 17:03:07
ALTER DATABASE RECOVERdatabase using backup controlfile
Media Recovery Start
started logmerger process
Tue Nov 15 17:03:07
WARNING! Recovering data file 1 from a fuzzy file. If not the
current file
it might be an online backup taken without entering the begin backup
command.
WARNING! Recovering data file 2 from a fuzzy file. If not the
current file
it might be an online backup taken without entering the begin backup
command.
WARNING! Recovering data file 3 from a fuzzy file. If not the
current file
it might be an online backup taken without entering the begin backup
command.
WARNING! Recovering data file 4 from a fuzzy file. If not the
current file
it might be an online backup taken without entering the begin backup
command.
WARNING! Recovering data file 5 from a fuzzy file. If not the
current file
it might be an online backup taken without entering the begin backup
command.
WARNING! Recovering data file 6 from a fuzzy file. If not the
current file
it might be an online backup taken without entering the begin backup
command.
WARNING! Recovering data file 7 from a fuzzy file. If not the
current file
it might be an online backup taken without entering the begin backup
command.
WARNING! Recovering data file 8 from a fuzzy file. If not the
current file
it might be an online backup taken without entering the begin backup
command.
WARNING! Recovering data file 9 from a fuzzy file. If not the
current file
it might be an online backup taken without entering the begin backup
command.
WARNING! Recovering data file 10 from a fuzzy file. If not the
current file
it might be an online backup taken without entering the begin backup
command.
WARNING! Recovering data file 11 from a fuzzy file. If not the
current file
it might be an online backup taken without entering the begin backup
command.
WARNING! Recovering data file 12 from a fuzzy file. If not the
current file
it might be an online backup taken without entering the begin backup
command.
WARNING! Recovering data file 13 from a fuzzy file. If not the
current file
it might be an online backup taken without entering the begin backup
command.
Parallel Media Recovery started with 8 slaves
ORA-279 signalled during: ALTER DATABASE RECOVERdatabase using backup controlfile...
Tue Nov 15 17:04:40
ALTER DATABASE RECOVERLOGFILE 'E:\app\ying_liu\oradata\ORCL\REDO01.LOG;'
Media Recovery Log E:\app\ying_liu\oradata\ORCL\REDO01.LOG;
Tue Nov 15 17:04:40
Errors with log E:\app\ying_liu\oradata\ORCL\REDO01.LOG;
Errors in file
e:\app\ying_liu\diag\rdbms\orcl\orcl\trace\orcl_pr00_10108.trc:
ORA-00308: cannot open archived log
'E:\app\ying_liu\oradata\ORCL\REDO01.LOG;'
ORA-27041: unable to open file
OSD-04002:无法打开文件
O/S-Error: (OS 2)系统找不到指定的文件。
ORA-308 signalled during: ALTER DATABASE RECOVERLOGFILE
'E:\app\ying_liu\oradata\ORCL\REDO01.LOG;'...
Tue Nov 15 17:05:00
ALTER DATABASE RECOVERLOGFILE 'E:\app\ying_liu\oradata\ORCL\REDO01.LOG'
Media Recovery Log E:\app\ying_liu\oradata\ORCL\REDO01.LOG
Tue Nov 15 17:05:00
Errors with log E:\app\ying_liu\oradata\ORCL\REDO01.LOG
Errors in file
e:\app\ying_liu\diag\rdbms\orcl\orcl\trace\orcl_pr00_10108.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: 'E:\APP\YING_LIU\ORADATA\ORCL\REDO01.LOG'
ORA-339 signalled during: ALTER DATABASE RECOVERLOGFILE 'E:\app\ying_liu\oradata\ORCL\REDO01.LOG'...
ALTER DATABASE RECOVER CANCEL
Media Recovery Canceled
Completed: ALTER DATABASE RECOVER CANCEL
总结:
1.处理故障前没检查数据库告警日志,做了使用NORESETLOGS创建控制文件的无用功,如果
查看了数据库告警日志知道redo
log损坏了,就会直接使用RESETLOGS重建控制文件。
2.选择解决问题的方案之前,要思考下这么做的目的是什么,是否有更高效的处理方式。比如推高数据库SCN的目的是为了跳过实例恢复时的回滚操作,其实可以直接使用隐藏参数
跳过有问题的回滚段,可以节省推高SCN操作的时间,提供工作效率。