elcarOnOsdnaH: January 2015

12.1.0.2 Bug When Starting Grid Infra Management Repository DB

During start up 12.1.0.2 Grid Infrastructure Management Repository, we came across the following error.

Startup failed with following error..

[oracle@dbrac1 bin]$ srvctl start mgmtdb
PRCR-1079 : Failed to start resource ora.mgmtdb
CRS-5017: The resource action "ora.mgmtdb start" encountered the following error:
ORA-00600: internal error code, arguments: [kcratr_scan_lastbwr], [], [], [], [], [], [], [], [], [], [], []
. For details refer to "(:CLSN00107:)" in "/u01/app/oracle/diag/crs/dbrac2/crs/trace/crsd_oraagent_oracle.trc".

CRS-2674: Start of 'ora.mgmtdb' on 'dbrac2' failed
CRS-5017: The resource action "ora.mgmtdb start" encountered the following error:
ORA-00600: internal error code, arguments: [kcratr_scan_lastbwr], [], [], [], [], [], [], [], [], [], [], []
. For details refer to "(:CLSN00107:)" in "/u01/app/oracle/diag/crs/dbrac1/crs/trace/crsd_oraagent_oracle.trc".

CRS-2674: Start of 'ora.mgmtdb' on 'dbrac1' failed
CRS-2632: There are no more servers to try to place resource 'ora.mgmtdb' on that would satisfy its placement policy

Upon checking alert logs we noticed the corruption, albeit logical...

Beginning crash recovery of 1 threads

Tue Jan 13 21:22:21 2015

Started redo scan

Hex dump of (file 4, block 3301) in trace file /u01/app/oracle/diag/rdbms/_mgmtdb/-MGMTDB/trace/-MGMTDB_ora_22748.trc

Reading datafile '+DATA/_MGMTDB/DATAFILE/undotbs1.259.860525535' for corruption at rdba: 0x01000ce5 (file 4, block 3301)

Reread (file 4, block 3301) found same corrupt data (logically corrupt)

Write verification failed for File 4 Block 3301 (rdba 0x1000ce5)

Tue Jan 13 21:22:36 2015

Errors in file /u01/app/oracle/diag/rdbms/_mgmtdb/-MGMTDB/trace/-MGMTDB_m000_22997.trc:

ORA-00338: log 2 of thread 1 is more recent than control file

ORA-00312: online log 2 thread 1: '+DATA/_MGMTDB/ONLINELOG/group_2.262.860525589'

Errors in file /u01/app/oracle/diag/rdbms/_mgmtdb/-MGMTDB/trace/-MGMTDB_ora_22748.trc (incident=41020) (PDBNAME=CDB$ROOT):

ORA-00600: internal error code, arguments: [kcratr_scan_lastbwr], [], [], [], [], [], [], [], [], [], [], []

Incident details in: /u01/app/oracle/diag/rdbms/_mgmtdb/-MGMTDB/incident/incdir_41020/-MGMTDB_ora_22748_i41020.trc

Tue Jan 13 21:22:49 2015

Dumping diagnostic data in directory=[cdmp_20150113212249], requested by (instance=1, osid=22748), summary=[incident=41020].

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Tue Jan 13 21:22:51 2015

Aborting crash recovery due to error 600

Tue Jan 13 21:22:51 2015

Errors in file /u01/app/oracle/diag/rdbms/_mgmtdb/-MGMTDB/trace/-MGMTDB_ora_22748.trc:

ORA-00600: internal error code, arguments: [kcratr_scan_lastbwr], [], [], [], [], [], [], [], [], [], [], []

However, our management listener was running fine.

[oracle@dbrac1 trace]$ srvctl status mgmtlsnr
Listener MGMTLSNR is enabled
Listener MGMTLSNR is running on node(s): dbrac1

Upon checking MOS, we figure this is a partially published bug which caused this behaviour. The solution to overcome to this is as follows.

export ORACLE_SID= -MGMTDB

SQL> startup mount
SQL> recover database;
SQL> alter database open ;

Upon startup we hit the following error
alter database open
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 29109
Session ID: 1 Serial number: 24261

Alert Logs shows...

ORA-00338: log 2 of thread 1 is more recent than control file
ORA-00312: online log 2 thread 1: '+DATA/_MGMTDB/ONLINELOG/group_2.262.860525589'

This means the SCN of control file is older than SCN of logfile. This happens when you do recovery of old controlfile, which was not the case here. So something does not seem right. I also tried to start the DB on second node of cluster and again it failed.
I also tried to start MGMTCA tool, to reconfigure the management repository but it failed since my GIRM DB was not running.

There may be possibility of the wrong control file being used, but again that was ruled out. So ultimately, what I have to do is to simulate the cancel based recovery and then try to start the db with resetlogs option.

SQL> startup mount
ORACLE instance started.

Total System Global Area 788529152 bytes
Fixed Size 2929352 bytes
Variable Size 314576184 bytes
Database Buffers 465567744 bytes
Redo Buffers 5455872 bytes
Database mounted.
SQL> recover database until cancel;
Media recovery complete.
SQL> alter database open resetlogs;
Database altered.

And as you can this approach worked.

[oracle@dbrac1 bin]$ ps -ef | grep pmon
oracle 3320 1 0 20:30 ? 00:00:00 asm_pmon_+ASM1
oracle 3508 1 0 22:10 ? 00:00:00 mdb_pmon_-MGMTDB
oracle 4712 18509 0 22:13 pts/1 00:00:00 grep pmon
oracle 6378 1 0 20:34 ? 00:00:00 ora_pmon_orcl1
[oracle@dbrac1 bin]$ srvctl start mgmtdb

[oracle@dbrac1 bin]$ srvctl status mgmtdb

Database is enabled

Instance -MGMTDB is running on node dbrac1

Hope this helps...

Tuesday, January 13, 2015

12.1.0.2 Bug When Starting Grid Infra Management Repository DB