elcarOnOsdnaH: September 2012

Sunday, September 23, 2012

Node Reboot or Shutdown Skips Stopping 11gR2 Grid Infrastructure

During maintenance in one of our RAC environments, a node was rebooted without first bringing down the grid manually (GI is supposed to stop all of its processes automatically when it detects that the node is shutting down).

Upon startup we had trouble starting the ASM instance; as a result CSS, and in turn CRS, did not come up, so the grid was not operational.

Research revealed that this was caused by unpublished bug 8740030. Because of this bug, when a node is rebooted, the K19ohasd script under /etc, which is supposed to stop Grid Infrastructure, is skipped because /var/lock/subsys/ohasd* does not exist:

# ls -l /var/lock/subsys/ohasd* | wc -l

output: 0
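The check above can be wrapped in a small function that returns a usable exit status. This is only a minimal sketch: the function name `ohasd_lock_present` and its optional directory argument are illustrative and not part of any Oracle tooling.

```shell
#!/bin/sh
# Report whether an ohasd lock file exists in the subsys lock directory.
# The directory argument is optional (hypothetical, added for testing);
# it defaults to the path used in this post.
ohasd_lock_present() {
    lockdir="${1:-/var/lock/subsys}"
    for f in "$lockdir"/ohasd*; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Example: report the state before a planned reboot.
if ohasd_lock_present; then
    echo "ohasd lock file present"
else
    echo "ohasd lock file missing: K19ohasd would be skipped at shutdown"
fi
```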

Looking at the logs reveals the following...

CSS Logs -

[ CSSD][4105858816]clssscProcessKillShutdown: Initiating shutdown due to process kill

[ CSSD][4105858816]###################################

[ CSSD][1145833792]clssgmSendShutdown: Aborting client (0x2aaaac01c850) proc (0x90f66c0), iocapables 1.

ASM Logs -

ORA-29746: Cluster Synchronization Service is being shut down.

ORA-29702: error occurred in Cluster Group Service operation

GMON (ospid: 6595): terminating the instance due to error 29746

Instance terminated by GMON, pid = 6595
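To confirm that a node hit the same symptom, the CSS and ASM logs can be searched for the messages quoted above. The sketch below assumes you pass in the log file path yourself, since log locations differ between installations; the function name `count_kill_shutdown_signs` is made up for illustration.

```shell
#!/bin/sh
# Count the lines in a log file that match the shutdown signatures
# quoted in this post (CSS kill shutdown, ORA-29746, ORA-29702).
# Usage: count_kill_shutdown_signs <logfile>
count_kill_shutdown_signs() {
    # grep -c prints the number of matching lines (0 when none match).
    grep -c -E 'clssscProcessKillShutdown|ORA-29746|ORA-29702' "$1"
}
```

A non-zero count in the logs written around the time of the reboot suggests the stack was killed rather than stopped cleanly.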

Fix -

To fix the issue, one has to modify /etc/init.d/ohasd

1. From:

Linux)

LOGMSG="$LOGGER -puser.err"

LOGERR="$LOGGER -puser.alert"

;;

To:

Linux)

LOGMSG="$LOGGER -puser.err"

LOGERR="$LOGGER -puser.alert"

SUBSYSFILE="/var/lock/subsys/ohasd"

;;

2. From:

start()

{

$ECHO -n $"Starting $PROG: "

To:

start()

{

case `/bin/uname` in

Linux)

/bin/touch $SUBSYSFILE

;;

esac

$ECHO -n $"Starting $PROG: "

3. From:

stop()

{

$ECHO -n "Stopping Oracle Clusterware stack"

}

To:

stop()

{

case `/bin/uname` in

Linux)

$RMF $SUBSYSFILE

;;

esac

$ECHO -n "Stopping Oracle Clusterware stack"

}
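Before relying on the modified script, it may be worth confirming that all three edits are actually in place. The following is a rough sketch that only greps for the three added lines shown above; the function name `ohasd_init_patched` and the optional path argument are assumptions for illustration.

```shell
#!/bin/sh
# Return 0 only if the init script contains all three additions:
# the SUBSYSFILE definition, the touch in start(), and the rm in stop().
# Usage: ohasd_init_patched [path-to-init-script]
ohasd_init_patched() {
    script="${1:-/etc/init.d/ohasd}"
    grep -q 'SUBSYSFILE="/var/lock/subsys/ohasd"' "$script" &&
    grep -q '/bin/touch \$SUBSYSFILE' "$script" &&
    grep -q '\$RMF \$SUBSYSFILE' "$script"
}
```

For example, `ohasd_init_patched && echo "patched"` prints `patched` only when every line is found. It does not check that the lines sit inside the right functions, so a visual review is still needed.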

Once /etc/init.d/ohasd has been modified, run the following command before rebooting the node:

# /bin/touch /var/lock/subsys/ohasd

Once this is done, reboot the node again without shutting down GI and check whether it stops gracefully. I tested it and it worked fine this time.

Friday, September 21, 2012

Apps/EBS R12: How to Diagnose Start-up Problems for Apache

When the adapcctl.sh script is used to start Apache in Release 12, it fails with an error like:
12/07/12-15:32:56 :: adapcctl.sh: starting OPMN managed OHS instance
opmnctl: starting opmn managed processes...
================================================================================
opmn id=xxx.yyy.com:6202
0 of 1 processes started.
ias-instance id=DEV_dev.dev.xxx.yyy
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
--------------------------------------------------------------------------------
ias-component/process-type/process-set:
HTTP_Server/HTTP_Server/HTTP_Server/
Error
--> Process (index=1,uid=1561342258,pid=14927)
failed to start a managed process after the maximum retry limit
Log:
/oracle/dev/inst/apps/DEV_dev/logs/ora/10.1.3/opmn/HTTP_Server~1.log
12/07/11-15:33:00 :: adapcctl.sh: exiting with status 204
================================================================================
The referenced log file HTTP_Server~1.log does not really report a root cause for the failure, so further analysis is required to identify what prevents Apache from starting.
The adapcctl.sh script is essentially a wrapper that calls the native Apache start command. To identify what is preventing Apache from starting, the approach is to run that command directly. This requires some additional actions.

1. Source the <SID>_<host>.env file located in the $INST_TOP/ora/10.1.3 directory:
# . $INST_TOP/ora/10.1.3/<SID>_<host>.env
This sets $ORACLE_HOME to the AS10G 10.1.3 home (instead of the AS10G 10.1.2 home), so the relevant settings are picked up from the right AS10G home.

2. Run the following command:
# $INST_TOP/ora/10.1.3/Apache/Apache/bin/apachectl configtest -f $INST_TOP/ora/10.1.3/Apache/Apache/conf/httpd.conf
This validates the httpd.conf configuration file used by Apache. If this step raises errors, httpd.conf is probably corrupted or misconfigured, which prevents Apache from starting. Resolve any problems reported (e.g. by running AutoConfig to recreate the configuration) and retest. If the command responds with OK, proceed to the next step.

3. Run the following command:
# $INST_TOP/ora/10.1.3/Apache/Apache/bin/apachectl startssl -f $INST_TOP/ora/10.1.3/Apache/Apache/conf/httpd.conf
This starts the Apache server directly instead of through OPMN. It can expose errors that are not easily observed when Apache is started as an OPMN service, and so can help in finding out why Apache cannot be started.
After this command completes, you should see a number of httpd processes when running:
# ps -ef | grep httpd
If this still does not show any obvious errors, the next step is to run the same command under strace/truss/tusc to see which OS calls are executed.
The example below uses the strace command available on Linux. Check the OS documentation for the exact parameters of the equivalent utility on your platform.

4. Run the following command:
# strace -o startapache.trc -ff -t $INST_TOP/ora/10.1.3/Apache/Apache/bin/apachectl startssl -f $INST_TOP/ora/10.1.3/Apache/Apache/conf/httpd.conf &

This command saves the output under the name startapache.trc; on Linux, -ff writes the trace of each process to its own file with the <PID> appended to the file name.
Review the trace files for reported errors. If useful, collect the same trace from a similar instance that does not have the problem so the trace files can be compared. The OS calls logged in the trace files may expose problems in areas like:
Opening files required for Apache to run (missing files, privileges)
Creating or updating log/pid files (privileges, log file size hitting the 2GB limit)
Memory issues
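Reading every trace file by hand is slow, so a first pass can filter for failed system calls that map to the problem areas listed above. The sketch below is an assumption-laden starting point: the function name `scan_trace_errors` is hypothetical, and the errno list (ENOENT, EACCES, EPERM, EFBIG, ENOMEM) is my own selection rather than anything prescribed by Oracle.

```shell
#!/bin/sh
# List failed system calls from strace output files named <prefix>.<PID>.
# ENOENT: missing file; EACCES/EPERM: privileges;
# EFBIG: file size limit reached; ENOMEM: out of memory.
# Usage: scan_trace_errors <trace-file-prefix>
scan_trace_errors() {
    prefix="$1"
    # /dev/null is added so grep always prints the file name,
    # even when only one trace file exists.
    grep -E '= -1 (ENOENT|EACCES|EPERM|EFBIG|ENOMEM)' "$prefix".* /dev/null 2>/dev/null
}
```

Running `scan_trace_errors startapache.trc` in the directory where the trace was taken prints each matching line prefixed with its trace file name. Many ENOENT hits are harmless path probing, so the output still needs judgment.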

5. After the root cause has been identified and the issue resolved, so that the direct start works fine, run the following command to stop the Apache service:
# $INST_TOP/ora/10.1.3/Apache/Apache/bin/apachectl stop -f $INST_TOP/ora/10.1.3/Apache/Apache/conf/httpd.conf
Then use the adapcctl.sh script to confirm that Apache now also starts in the recommended way:
# $INST_TOP/admin/scripts/adapcctl.sh start

Friday, September 14, 2012

Issues during recreating DB Control Console in 11gR2

Error -
[oracle@devtstdbsrv bin]$ ./emca -config dbcontrol db -repos recreate
INFO: Stopping Database Control (this may take a while) ...
Sep 13, 2012 12:00:22 AM oracle.sysman.emcp.EMConfig perform
SEVERE: Listener is not up or database service is not registered with it. Start the Listener and register database service and run EM Configuration Assistant again .
Refer to the log file at /u01/app/oracle/cfgtoollogs/emca/orcl/emca_2012_09_12_23_59_03.log for more details.
Could not complete the configuration. Refer to the log file at /u01/app/oracle/cfgtoollogs/emca/orcl/emca_2012_09_12_23_59_03.log for more details.

Since, I cannot drop the re
[oracle@devtstdbsrv bin]$ hostname
[oracle@devtstdbsrv bin]$ emca -deconfig dbcontrol db
STARTED EMCA at Sep 13, 2012 12:04:04 AM
EM Configuration Assistant, Version 11.2.0.0.2 Production
Copyright (c) 2003, 2005, Oracle. All rights reserved.

Enter the following information:
Database SID: orcl
Do you wish to continue? [yes(Y)/no(N)]: yes
Sep 13, 2012 12:04:11 AM oracle.sysman.emcp.EMConfig perform
INFO: This operation is being logged at /u01/app/oracle/cfgtoollogs/emca/orcl/emca_2012_09_13_00_04_04.log.
Sep 13, 2012 12:04:11 AM oracle.sysman.emcp.util.DBControlUtil stopOMS
INFO: Stopping Database Control (this may take a while) ...
Enterprise Manager configuration completed successfully
FINISHED EMCA at Sep 13, 2012 12:05:07 AM

-- However, the drop repos operation failed with the following error.
[oracle@devtstdbsrv bin]$ emca -repos drop
....................................
SEVERE: Error creating the repository
Sep 13, 2012 12:35:19 AM oracle.sysman.emcp.EMReposConfig invoke
INFO: Refer to the log file at /u01/app/oracle/cfgtoollogs/emca/orcl/emca_repos_create_<date>.log for more details.

Looking at the log file reveals the following error...

CONFIG: ORA-20001: SYSMAN already EXISTS..
ORA-06512: at line 17
oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-20001: SYSMAN already EXISTS..
ORA-06512: at line 17
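EMCA logs are long, so it can help to pull out just the distinct ORA- codes before reading further. This is a small sketch, not an Oracle utility; the function name `ora_codes` is invented here and the log path is whatever EMCA printed for your run.

```shell
#!/bin/sh
# Print each distinct ORA-NNNNN code found in a log file, sorted.
# Usage: ora_codes <logfile>
ora_codes() {
    # -o prints only the matched text, one match per line.
    grep -o -E 'ORA-[0-9]{5}' "$1" | sort -u
}
```

On the log excerpt above this would list ORA-06512 and ORA-20001; the second is the one that names the real problem.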

Workaround -

Perform the following actions in your DB:
SQL> drop user sysman cascade;
User dropped.
SQL> DROP PUBLIC SYNONYM setemviewusercontext;
Synonym dropped.
SQL> DROP ROLE mgmt_user;
Role dropped.
SQL> DROP PUBLIC SYNONYM mgmt_target_blackouts;
Synonym dropped.
SQL> DROP USER mgmt_view;
User dropped.

One more thing to add: since my repos drop operation had failed earlier, I had to drop the repository manually using the following method.

[oracle@devtstdbsrv bin]$ pwd
/u01/app/oracle/product/11.0/db_1/sysman/admin/emdrep/bin
[oracle@devtstdbsrv bin]$ ./RepManager devtstdbsrv 1521 orcl -action drop
Enter SYS user's password :
Enter repository user name : sysman
Getting temporary tablespace from database...
Found temporary tablespace: TEMP
Checking SYS Credentials ... rem error switch
OK.
rem error switch
Dropping the repository..
Checking for Repos User ... Exists.
Repos User exists..
Clearing EM Contexts ... OK.
Dropping EM users ...
Done.
Dropping Repos User ...

Dropping Roles/Synonymns/Tablespaces ... Done.

Dropped Repository Successfully.

If this is not done, the later create repos operation fails with the following error:

CONFIG: ORA-00955: name is already used by an existing object
oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-00955: name is already used by an existing object

Now try to re-create the repository. We will first create the repository and then configure db-control, in separate steps, as follows.

[oracle@devtstdbsrv dbs]$ emca -repos create
STARTED EMCA at Sep 13, 2012 12:53:10 AM
EM Configuration Assistant, Version 11.2.0.0.2 Production
Copyright (c) 2003, 2005, Oracle. All rights reserved.
Enter the following information:
Database SID: orcl
Listener port number: 1531
Password for SYS user:
Password for SYSMAN user:
Do you wish to continue? [yes(Y)/no(N)]: yes
Sep 13, 2012 12:53:21 AM oracle.sysman.emcp.EMConfig perform
INFO: This operation is being logged at /u01/app/oracle/cfgtoollogs/emca/orcl/emca_2012_09_13_00_53_10.log.
Sep 13, 2012 12:53:21 AM oracle.sysman.emcp.EMReposConfig createRepository
INFO: Creating the EM repository (this may take a while) ...

Sep 13, 2012 2:05:44 AM oracle.sysman.emcp.EMReposConfig invoke
INFO: Repository successfully created
Enterprise Manager configuration completed successfully
FINISHED EMCA at Sep 13, 2012 2:05:45 AM

-- If you check the log file during this time, you will see the following messages:

Sep 13, 2012 12:53:21 AM oracle.sysman.emcp.util.GeneralUtil initSQLEngine
CONFIG: isLocalConnectionRequired: true. Connecting to database instance locally.
Sep 13, 2012 12:53:21 AM oracle.sysman.emcp.util.GeneralUtil initSQLEngineLoacly
CONFIG: SQLEngine connecting with SID: orcl, oracleHome: /u01/app/oracle/product/11.0/db_1, and user: SYS
Sep 13, 2012 12:53:21 AM oracle.sysman.emcp.util.GeneralUtil initSQLEngineLoacly
CONFIG: SQLEngine created successfully and connected
Sep 13, 2012 12:53:21 AM oracle.sysman.emcp.EMReposConfig createRepository

So the listener issue reported earlier during the drop operation was not really an issue. It was more of a placeholder error than an actual error.

Now configure db-control as follows:
[oracle@devtstdbsrv dbs]$ emca -config dbcontrol db

STARTED EMCA at Sep 13, 2012 2:40:50 AM
EM Configuration Assistant, Version 11.2.0.0.2 Production
Copyright (c) 2003, 2005, Oracle. All rights reserved.
Enter the following information:
Database SID: orcl
The database orcl is already being monitored by central agent(s)
Database Control will monitor the exisiting targets
Do you wish to continue? [yes(Y)/no(N)]: yes
Listener ORACLE_HOME [ /u01/app/oracle/product/11.0/db_1 ]:
Password for SYS user:
Password for DBSNMP user:
Password for SYSMAN user:
Email address for notifications (optional):
Outgoing Mail (SMTP) server for notifications (optional):
-----------------------------------------------------------------
You have specified the following settings
Database ORACLE_HOME ................ /u01/app/oracle/product/11.0/db_1
Local hostname ................ devtstdbsrv.localdomain
Listener ORACLE_HOME ................ /u01/app/oracle/product/11.0/db_1
Listener port number ................ 1521
Database SID ................ orcl
Email address for notifications ...............
Outgoing Mail (SMTP) server for notifications ...............
-----------------------------------------------------------------
Do you wish to continue? [yes(Y)/no(N)]: yes
Sep 13, 2012 2:41:05 AM oracle.sysman.emcp.EMConfig perform
INFO: This operation is being logged at /u01/app/oracle/cfgtoollogs/emca/orcl/emca_2012_09_13_02_40_49.log.
Sep 13, 2012 2:41:13 AM oracle.sysman.emcp.EMReposConfig uploadConfigDataToRepository
INFO: Uploading configuration data to EM repository (this may take a while) ...

Sep 13, 2012 2:42:39 AM oracle.sysman.emcp.EMReposConfig invoke

INFO: Uploaded configuration data successfully

Sep 13, 2012 2:42:46 AM oracle.sysman.emcp.util.DBControlUtil configureSoftwareLib

INFO: Software library configured successfully.

Sep 13, 2012 2:42:46 AM oracle.sysman.emcp.EMDBPostConfig configureSoftwareLibrary

INFO: Deploying Provisioning archives ...

INFO: Provisioning archives deployed successfully.

Sep 13, 2012 2:44:33 AM oracle.sysman.emcp.util.DBControlUtil secureDBConsole

INFO: Securing Database Control (this may take a while) ...

Sep 13, 2012 2:44:48 AM oracle.sysman.emcp.util.PlatformInterface executeCommand

WARNING: Error executing /u01/app/oracle/product/11.0/db_1/bin/emctl secure dbconsole -host devtstdbsrv.localdomain -sid orcl

Sep 13, 2012 2:44:48 AM oracle.sysman.emcp.EMDBPostConfig performConfiguration

WARNING: Error securing Database control.

Sep 13, 2012 2:44:48 AM oracle.sysman.emcp.EMDBPostConfig setWarnMsg

INFO: Error securing Database Control, Database Control has been brought up in non-secure mode. To secure the Database Control execute the following command(s):

1) Set the environment variable ORACLE_SID to orcl

2) /u01/app/oracle/product/11.0/db_1/bin/emctl stop dbconsole

3) /u01/app/oracle/product/11.0/db_1/bin/emctl config emkey -repos -sysman_pwd < Password for SYSMAN user >

4) /u01/app/oracle/product/11.0/db_1/bin/emctl secure dbconsole -sysman_pwd < Password for SYSMAN user >

5) /u01/app/oracle/product/11.0/db_1/bin/emctl start dbconsole

To secure Em Key, run /u01/app/oracle/product/11.0/db_1/bin/emctl config emkey -remove_from_repos -sysman_pwd < Password for SYSMAN user >

Sep 13, 2012 2:44:48 AM oracle.sysman.emcp.util.DBControlUtil startOMS

INFO: Starting Database Control (this may take a while) ...

Sep 13, 2012 2:47:55 AM oracle.sysman.emcp.util.PlatformInterface executeCommand
WARNING: Error executing /u01/app/oracle/product/11.0/db_1/bin/emctl start dbconsole
Sep 13, 2012 2:47:55 AM oracle.sysman.emcp.EMDBPostConfig performConfiguration
INFO: >>>>>>>>>>> The Database Control URL is http://devtstdbsrv.localdomain:1158/em <<<<<<<<<<<
Sep 13, 2012 2:47:55 AM oracle.sysman.emcp.EMDBPostConfig invoke
WARNING: Error starting Database Control.Please execute the following command(s).

1) Set the environment variable ORACLE_UNQNAME to Database unique name
2) /u01/app/oracle/product/11.0/db_1/bin/emctl start dbconsole

Error securing Database Control, Database Control has been brought up in non-secure mode. To secure the Database Control execute the following command(s):

1) Set the environment variable ORACLE_SID to orcl
2) /u01/app/oracle/product/11.0/db_1/bin/emctl stop dbconsole
3) /u01/app/oracle/product/11.0/db_1/bin/emctl config emkey -repos -sysman_pwd < Password for SYSMAN user >
4) /u01/app/oracle/product/11.0/db_1/bin/emctl secure dbconsole -sysman_pwd < Password for SYSMAN user >
5) /u01/app/oracle/product/11.0/db_1/bin/emctl start dbconsole

To secure Em Key, run /u01/app/oracle/product/11.0/db_1/bin/emctl config emkey -remove_from_repos -sysman_pwd < Password for SYSMAN user >
Error starting Database Control.Please execute the following command(s).
1) Set the environment variable ORACLE_UNQNAME to Database unique name
2) /u01/app/oracle/product/11.0/db_1/bin/emctl start dbconsole

I ran commands 1 through 5 and my dbconsole was up and running.
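For repeat use, the five steps printed by EMCA can be generated for a given home and SID and reviewed before running them. The sketch below deliberately only prints the commands; the function name `print_secure_dbconsole_steps` is hypothetical, and the password is left as a placeholder to be filled in by hand.

```shell
#!/bin/sh
# Print the five "secure dbconsole" steps from the EMCA message
# for a given ORACLE_HOME and SID. Nothing is executed.
# Usage: print_secure_dbconsole_steps <oracle_home> <sid>
print_secure_dbconsole_steps() {
    emctl="$1/bin/emctl"
    sid="$2"
    echo "export ORACLE_SID=$sid"
    echo "$emctl stop dbconsole"
    echo "$emctl config emkey -repos -sysman_pwd '<SYSMAN password>'"
    echo "$emctl secure dbconsole -sysman_pwd '<SYSMAN password>'"
    echo "$emctl start dbconsole"
}
```

For the environment in this post the call would be `print_secure_dbconsole_steps /u01/app/oracle/product/11.0/db_1 orcl`. Printing rather than executing keeps the SYSMAN password out of the script itself.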

Sunday, September 9, 2012

11.2.0.2 RootUpgrade.sh fails with ASM Bug

After installing the 11.2.0.2 stack for the upgrade on the RAC nodes, I ran rootupgrade.sh on the first node and hit the following error:

CRS-5017: The resource action "ora.asm start" encountered the following error:

ORA-03113: end-of-file on communication channel

Process ID: 0

Session ID: 0 Serial number: 0CRS-2674: Start of 'ora.asm' on 'apps_rac01' failed

[root@apps_rac01 grid]# ./rootupgrade.sh
Running Oracle 11g root script...
The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u03/app/11.2.0.2/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The file "oraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]:
The file "coraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]:
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u03/app/11.2.0.2/grid/crs/install/crsconfig_params

ASM upgrade has started on first node.

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'apps_rac01'

CRS-2673: Attempting to stop 'ora.crsd' on 'apps_rac01'

CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'apps_rac01'

CRS-2673: Attempting to stop 'ora.LISTENER_SCAN2.lsnr' on 'apps_rac01'

CRS-2673: Attempting to stop 'ora.registry.acfs' on 'apps_rac01'

CRS-2673: Attempting to stop 'ora.DATA.dg' on 'apps_rac01'

...........................

CRS-2677: Stop of 'ora.diskmon' on 'apps_rac01' succeeded

CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'apps_rac01' has completed

CRS-4133: Oracle High Availability Services has been stopped.

Successfully deleted 1 keys from OCR.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
Adding daemon to inittab
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
Start of resource "ora.asm" failed
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'apps_rac01'
CRS-2676: Start of 'ora.drivers.acfs' on 'apps_rac01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'apps_rac01'
CRS-5017: The resource action "ora.asm start" encountered the following error:

ORA-03113: end-of-file on communication channel

Process ID: 0

Session ID: 0 Serial number: 0CRS-2674: Start of 'ora.asm' on 'apps_rac01' failed

CRS-2679: Attempting to clean 'ora.asm' on 'apps_rac01'

CRS-2681: Clean of 'ora.asm' on 'apps_rac01' succeeded

CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'apps_rac01'

CRS-2677: Stop of 'ora.drivers.acfs' on 'apps_rac01' succeeded

CRS-4000: Command Start failed, or completed with errors.

Failed to start Oracle Clusterware stack

Failed to start ASM at /u03/app/11.2.0.2/grid/crs/install/crsconfig_lib.pm line 1051.

/u03/app/11.2.0.2/grid/perl/bin/perl -I/u03/app/11.2.0.2/grid/perl/lib -I/u03/app/11.2.0.2/grid/crs/install /u03/app/11.2.0.2/grid/crs/install/rootcrs.pl execution failed

Errors from ASM logfile -
Exception [type: SIGIOT, unknown code] [ADDR:0x1D34] [PC:0x71F402, __kernel_vsyscall()+2] [exception issued by pid: 7476, uid: 500] [flags: 0x0, count: 1]
Errors in file /u02/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_lmd0_7476.trc (incident=6481):
ORA-07445: exception encountered: core dump [__kernel_vsyscall()+2] [SIGIOT] [ADDR:0x1D34] [PC:0x71F402] [unknown code] []
Incident details in: /u02/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_6481/+ASM1_lmd0_7476_i6481.trc
PMON (ospid: 7458): terminating the instance due to error 482
Mon Aug 27 14:34:42 2012
ORA-1092 : opitsk aborting process

This happens because of a bug with OCR when it is placed on ASM. The workaround is simple, as follows. Remember that it has to be applied on both nodes after running rootupgrade.sh on the respective node.

Fix -
[root@apps_rac02 ~]# sudo su - oracle
[oracle@apps_rac02 ~]$ . oraenv
ORACLE_SID = [oracle] ? +ASM2
The Oracle base for ORACLE_HOME=/u03/app/11.2.0.2/grid is /u01/app/oracle
[oracle@apps_rac02 ~]$ sqlplus / as sysasm
Connected to an idle instance.
SQL> startup
ASM instance started
Total System Global Area 284565504 bytes
Fixed Size 1343692 bytes
Variable Size 258055988 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
SQL> select * from v$instance;
INSTANCE_NUMBER INSTANCE_NAME
--------------- ----------------
HOST_NAME
----------------------------------------------------------------
VERSION STARTUP_T STATUS PAR THREAD# ARCHIVE LOG_SWITCH_WAIT
----------------- --------- ------------ --- ---------- ------- ---------------
LOGINS SHU DATABASE_STATUS INSTANCE_ROLE ACTIVE_ST BLO
---------- --- ----------------- ------------------ --------- ---
2 +ASM2
apps_rac02.localdomain
11.2.0.2.0 28-AUG-12 STARTED YES 0 STOPPED
ALLOWED NO ACTIVE UNKNOWN NORMAL NO

SQL> select name,state from v$asm_diskgroup;
NAME STATE
------------------------------ -----------
DATA MOUNTED

[root@apps_rac02 grid]# crsctl stop crs

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'apps_rac02'

CRS-4133: Oracle High Availability Services has been stopped.

[root@apps_rac02 grid]# crsctl start crs

CRS-4123: Oracle High Availability Services has been started.

[root@apps_rac02 grid]# crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online
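When this check has to be repeated on several nodes, the output can be reduced to a single pass/fail status. The following is a minimal sketch that reads `crsctl check crs` output on standard input; the function name `all_crs_online` is an assumption, and it treats any CRS- line that does not say "is online" as a failure.

```shell
#!/bin/sh
# Read "crsctl check crs" output on stdin and succeed only if at least
# one CRS- line is present and every CRS- line reports "is online".
all_crs_online() {
    awk '
        BEGIN { total = 0; online = 0 }
        /^CRS-[0-9]+:/ { total++; if ($0 ~ /is online/) online++ }
        END { exit !(total > 0 && total == online) }
    '
}
```

A typical call would be `crsctl check crs | all_crs_online && echo "stack is up"`.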

Ref -

Clustered ASM Instances Failed to Start During 'rootupgrade.sh' While Upgrading from 11.2.0.1 to 11.2.0.2 [ID 1437325.1]

Node 2 -
[root@apps_rac02 ~]# cd /u03/app/11.2.0.2/grid/
[root@apps_rac02 grid]# ./rootupgrade.sh
Running Oracle 11g root script...
The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u03/app/11.2.0.2/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The file "oraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]:
The file "coraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]:
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u03/app/11.2.0.2/grid/crs/install/crsconfig_params
Creating trace directory
Failed to add (property/value):('OLD_OCR_ID/'-1') for checkpoint:ROOTCRS_OLDHOMEINFO.Error code is 256
// This is not an error, merely a warning //
ASM upgrade has started on first node.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'apps_rac02'
CRS-2673: Attempting to stop 'ora.crsd' on 'apps_rac02'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'apps_rac02'

......................................

CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'apps_rac02' has completed

CRS-4133: Oracle High Availability Services has been stopped.

Successfully deleted 1 keys from OCR.

Creating OCR keys for user 'root', privgrp 'root'..

Operation successful.

OLR initialization - successful

ACFS-9200: Supported

ACFS-9300: ADVM/ACFS distribution files found.

ACFS-9312: Existing ADVM/ACFS installation detected.

ACFS-9314: Removing previous ADVM/ACFS installation.

ACFS-9315: Previous ADVM/ACFS components successfully removed.

ACFS-9307: Installing requested ADVM/ACFS software.

ACFS-9308: Loading installed ADVM/ACFS drivers.

ACFS-9321: Creating udev for ADVM/ACFS.

ACFS-9323: Creating module dependencies - this may take some time.

ACFS-9327: Verifying ADVM/ACFS devices.

ACFS-9309: ADVM/ACFS installation correctness verified.

Once this is done, finish the rest of the upgrade process by clicking OK in the OUI prompt box and wait for the message confirming that the upgrade has completed.

Check the cluster status at the end of the upgrade on all nodes.

[oracle@apps_rac02 bin]$ crsctl check cluster -all

**************************************************************

apps_rac01:

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

**************************************************************

apps_rac02:

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

**************************************************************
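On larger clusters it is easier to list only the nodes that have a problem. The sketch below parses the `crsctl check cluster -all` layout shown above (a `nodename:` line followed by CRS- lines); the function name `nodes_not_online` is hypothetical, and the parsing assumes the output format does not differ from this transcript.

```shell
#!/bin/sh
# Read "crsctl check cluster -all" output on stdin and print the name of
# every node that has at least one CRS- line not reporting "is online".
nodes_not_online() {
    awk '
        # A line such as "apps_rac01:" starts a new node section.
        /^[A-Za-z0-9_.-]+:$/ { node = substr($0, 1, length($0) - 1); next }
        /^CRS-[0-9]+:/ && $0 !~ /is online/ { bad[node] = 1 }
        END { for (n in bad) print n }
    ' | sort
}
```

For example, `crsctl check cluster -all | nodes_not_online` prints nothing when every service on every node is online.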

Friday, September 7, 2012

CRS Active Version Doesn't Upgrade After RootUpgrade.sh Execution On Cluster

After upgrading the GI home to 11.2.0.2, the software version showed the new version but the active version still showed the old one.

[root@apps_rac01 apps_rac01]# crsctl query crs softwareversion

Oracle Clusterware version on node [apps_rac01] is [11.2.0.2.0]

[root@apps_rac01 apps_rac01]# crsctl query crs activeversion

Oracle Clusterware active version on the cluster is [11.2.0.1.0]

I researched the issue a bit and found that it arises when rootupgrade.sh does not execute successfully, because the clusterware updates the active version only after rootupgrade.sh completes successfully on the last node. In my case, however, it completed successfully, so I am not sure what went wrong.
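The mismatch between the two versions can be detected with a short script instead of by eye. This is a sketch under the assumption that the version is the last bracketed numeric value on the line, as in the output shown in this post; the function name `bracket_version` is invented for illustration.

```shell
#!/bin/sh
# Extract the last bracketed dotted version (e.g. [11.2.0.2.0]) from stdin.
bracket_version() {
    sed -n 's/.*\[\([0-9.][0-9.]*\)\].*/\1/p' | tail -n 1
}

# Compare the two versions only where crsctl is actually available.
if command -v crsctl >/dev/null 2>&1; then
    sw=$(crsctl query crs softwareversion | bracket_version)
    act=$(crsctl query crs activeversion | bracket_version)
    if [ "$sw" = "$act" ]; then
        echo "software and active version match: $act"
    else
        echo "mismatch: software=$sw active=$act"
    fi
fi
```

Because the pattern only accepts digits and dots inside the brackets, a bracketed node name on the same line is skipped.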

Hence I decided to update it manually, since my binaries were in place. This is the fix I tried:
[root@apps_rac01 apps_rac01]# crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.1.0]
[root@apps_rac01 apps_rac01]# crsctl set crs activeversion 11.2.0.2.0
Started to upgrade the Oracle Clusterware. This operation may take a few minutes.
Started to upgrade the CSS.
Started to upgrade the CRS.
The CRS was successfully upgraded.
Oracle Clusterware operating version was successfully set to 11.2.0.2.0
[root@apps_rac01 apps_rac01]# crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.2.0]

And bingo, it worked. The active version was now reflected properly on all nodes, and I could safely proceed with the RAC binaries upgrade.