Monday, April 22, 2013



Cluster Agent Install fails with OUI-35000 & PRKC-1044


While installing the cluster OEM agent on RAC, we hit the following error.
[oracle@rac1a bin]$agentDownload.linux_x64 -b /u01/app/oracle/product -m oemgc1 -r 7799 -n XXXPRD -c "rac1a,rac1b" -y 

ERROR: OUI-35000: Fatal cluster error encountered (PRKC-1044 : Failed to check remote command execution setup for node rac1b using shells /usr/bin/ssh and /usr/bin/rsh
rac1b: Connection refusedPRKC-1044 : Failed to check remote command execution setup for node rac1a using shells /usr/bin/ssh and /usr/bin/rsh
rac1a: Connection refused). Correct the problem and try the operation again.
Completed with Status=1

Primarily the issue seemed to be with the rsh setup on the cluster nodes. It was not set up initially, so we got it configured with the help of the SAs. With rsh in place we reran the install, but it failed again with the same error.
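Before rerunning, it is worth confirming that user equivalence really works in both directions for the oracle user (a quick sanity check, not part of the original run):

$ ssh rac1b date     # should print the date without prompting for a password
$ rsh rac1b date     # same, if rsh equivalence is what the installer will use
# repeat the checks from rac1b towards rac1a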

So I decided to run it from another node in the cluster. After running it from there, it hit the following error at the end...

ERROR: Remote 'AttachHome' failed on nodes: 'rac1a'. Refer to '/u01/app/oraInventory/logs/installActions2013-04-12_05-47-02AM.log' for details.
You can manually re-run the following command on the failed nodes after the installation:
 /u01/app/oracle/product/agent11g/oui/bin/runInstaller -attachHome -noClusterEnabled ORACLE_HOME=/u01/app/oracle/product/agent11g ORACLE_HOME_NAME=agent11g1 CLUSTER_NODES=rac1a,rac1b "INVENTORY_LOCATION=/u01/app/oraInventory" LOCAL_NODE=<node on which command is to be run>.

So it looks like the install itself went fine, but the agents were not configured at all. Also, when you check the inventory, the agent home is not registered with it, which is why the above message makes sense.
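One quick way to confirm whether the agent home made it into the central inventory is to grep for it in inventory.xml (a simple check, using the inventory path from this environment):

$ grep -i agent11g /u01/app/oraInventory/ContentsXML/inventory.xml
# no output means the home is not attached yet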
So I went ahead and ran the following command on both nodes, one by one.

runInstaller -attachHome -noClusterEnabled ORACLE_HOME=/u01/app/oracle/product/agent11g ORACLE_HOME_NAME=agent11g1 CLUSTER_NODES=rac1a,rac1b "INVENTORY_LOCATION=/u01/app/oraInventory" LOCAL_NODE=rac1a

On Node1 -

[oracle@rac1a bin]$ ./runInstaller -attachHome -noClusterEnabled ORACLE_HOME=/u01/app/oracle/product/agent11g ORACLE_HOME_NAME=agent11g1 CLUSTER_NODES=rac1a,rac1b "INVENTORY_LOCATION=/u01/app/oraInventory" LOCAL_NODE=rac1a
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 19077 MB    Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2013-04-12_06-46-16AM. Please wait ...[oracle@rac1a bin]$ The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
Please execute the 'null' script at the end of the session.
'AttachHome' was successful.

[oracle@rac1a bin]$ ./agentca -f -n XXXPRD -c rac1a,rac1b
CLUSTER_NAME environment variable is set to XXXPRD

The ORACLE_HOME=/u01/app/oracle/product/agent11g doesn't exist in the oraInventory specified in /oracletemp/PS1/oraInventory, Please specify the correct oraInventory location using -i option

Since there are multiple ORACLE_HOMEs on this host, the inventory pointer in /etc/oraInst.loc was pointing at the wrong location.

[oracle@rac1a bin]$ cat /etc/oraInst.loc
#inventory_loc=/oracletemp/oraInventory
inst_group=dba
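Before editing the pointer, the correct inventory location can be cross-checked against the copy of oraInst.loc kept inside the agent home (assuming the home ships its own copy, which the addon configuration output further below also references), for example:

$ cat /u01/app/oracle/product/agent11g/oraInst.loc
$ ls /u01/app/oraInventory/ContentsXML/inventory.xml   # confirm the inventory really lives here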

So the inventory location was wrong (commented out), and it needed to be fixed to point at the right location by adding the following entry:
inventory_loc=/u01/app/oraInventory

[oracle@rac1a bin]$ cat /etc/oraInst.loc
#inventory_loc=/oracletemp/oraInventory
inventory_loc=/u01/app/oraInventory
inst_group=dba

[oracle@rac1a bin]$ ./agentca -f -n XXXPRD -c rac1a,rac1b -i /etc/oraInst.loc
CLUSTER_NAME environment variable is set to XXXPRD

Stopping the agent using /u01/app/oracle/product/agent11g/bin/emctl  stop agent
EM Configuration issue. /u01/app/oracle/product/agent11g/rac1a not found.
Running agentca using /u01/app/oracle/product/agent11g/oui/bin/runConfig.sh ORACLE_HOME=/u01/app/oracle/product/agent11g ACTION=Configure MODE=Perform RESPONSE_FILE=/u01/app/oracle/product/agent11g/response_file RERUN=TRUE INV_PTR_LOC=/etc/oraInst.loc COMPONENT_XML={oracle.sysman.top.agent.10_2_0_1_0.xml}
Perform - mode is starting for action: Configure

Perform - mode finished for action: Configure

You can see the log file: /u01/app/oracle/product/agent11g/cfgtoollogs/oui/configActions2013-04-12_07-14-18-AM.log
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 19077 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
'UpdateNodeList' was successful.

Starting the agent using /u01/app/oracle/product/agent11g/bin/emctl  start agent
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved.
Starting agent ......... started.

Stopping the agent using /u01/app/oracle/product/agent11g/bin/emctl  stop agent
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved.
Stopping agent ... stopped.

Running Agent Addon Configuration using /u01/app/oracle/product/agent11g/perl/bin/perl /u01/app/oracle/product/agent11g/sysman/install/AddonConfig.pl
Arguments passed

Configuring Addon from xml : oracle.sysman.plugin.virtualization.agent.11_1_0_1_0.xml

Running Command : /u01/app/oracle/product/agent11g/oui/bin/runConfig.sh ORACLE_HOME=/u01/app/oracle/product/agent11g ACTION=configure MODE=perform RERUN=true  RESPONSE_FILE=/u01/app/oracle/product/agent11g/vt_responsefile COMPONENT_XML={oracle.sysman.plugin.virtualization.agent.11_1_0_1_0.xml}
 Setting the invPtrLoc to /u01/app/oracle/product/agent11g/oraInst.loc

perform - mode is starting for action: configure
perform - mode finished for action: configure

You can see the log file: /u01/app/oracle/product/agent11g/cfgtoollogs/oui/configActions2013-04-12_07-15-12-AM.log

Agent Addon Configuration done

Starting the agent using /u01/app/oracle/product/agent11g/bin/emctl  start agent
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved.
Agent is already running

On Node2 - 

oracle@rac1b /u01/app/oracle/product/agent11g/bin> ./agentca -f -n XXXPRD -c rac1a,rac1b
CLUSTER_NAME environment variable is set to XXXPRD

Stopping the agent using /u01/app/oracle/product/agent11g/bin/emctl  stop agent
EM Configuration issue. /u01/app/oracle/product/agent11g/rac1b not found.
Running agentca using /u01/app/oracle/product/agent11g/oui/bin/runConfig.sh ORACLE_HOME=/u01/app/oracle/product/agent11g ACTION=Configure MODE=Perform RESPONSE_FILE=/u01/app/oracle/product/agent11g/response_file RERUN=TRUE INV_PTR_LOC=/u01/app/oracle/product/agent11g/oraInst.loc COMPONENT_XML={oracle.sysman.top.agent.10_2_0_1_0.xml}
Perform - mode is starting for action: Configure

Running Agent Addon Configuration using /u01/app/oracle/product/agent11g/perl/bin/perl /u01/app/oracle/product/agent11g/sysman/install/AddonConfig.pl
Arguments passed

Configuring Addon from xml : oracle.sysman.plugin.virtualization.agent.11_1_0_1_0.xml

Running Command : /u01/app/oracle/product/agent11g/oui/bin/runConfig.sh ORACLE_HOME=/u01/app/oracle/product/agent11g ACTION=configure MODE=perform RERUN=true  RESPONSE_FILE=/u01/app/oracle/product/agent11g/vt_responsefile COMPONENT_XML={oracle.sysman.plugin.virtualization.agent.11_1_0_1_0.xml}
 Setting the invPtrLoc to /u01/app/oracle/product/agent11g/oraInst.loc

perform - mode is starting for action: configure
perform - mode finished for action: configure

You can see the log file: /u01/app/oracle/product/agent11g/cfgtoollogs/oui/configActions2013-04-12_07-21-35-AM.log
 Agent Addon Configuration done
Starting the agent using /u01/app/oracle/product/agent11g/bin/emctl  start agent
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved.
Agent is already running

Once this was done, the cluster was properly discovered by OEM Grid Control and the agents were able to upload data successfully.
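To double-check the agents after configuration, the standard emctl status/upload commands can be used on each node, e.g.:

$ /u01/app/oracle/product/agent11g/bin/emctl status agent
$ /u01/app/oracle/product/agent11g/bin/emctl upload agent   # force a metric upload to the OMS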

Wednesday, April 3, 2013

PRCR-1013 : Failed to start resource OR CRS-2640: Required resource is missing


While starting the database resource we hit the following error...

Error - 
PRCR-1013 : Failed to start resource ora.ASdb.db
PRCR-1064 : Failed to start resource ora.ASdb.db on node orarac1d
CRS-2640: Required resource 'ora.ASDATA03.dg' is missing.

So it says the disk group is missing. When I checked ASM for the existing disk groups, I found that the complaint was correct: I indeed could not see the ASDATA03 disk group. After some digging I found out that the disk group had been renamed from ASDATA03 to ASDATA03_05, hence the dependency was failing during the start of the resource.
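The mounted disk groups can be confirmed from the ASM instance; the exact query used is not shown here, but something along these lines (run as the grid user with the +ASM environment set) produces the listing below:

$ sqlplus / as sysasm
SQL> select name, state, type, offline_disks from v$asm_diskgroup;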

Diskgroup Status in ASM for ASDB -
ASDATA02_04                  MOUNTED     EXTERN             0
ASDATA03_05                  MOUNTED     EXTERN             0
ASDATA01                     MOUNTED     EXTERN             0

Diskgroups registered with the cluster resource ASDB -
Disk Groups: ASDATA01,FRA (Missing DG ASDATA02_04 and ASDATA03_05)
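The "Disk Groups:" line above is the kind of information srvctl config database reports; it can be pulled with something like:

$ srvctl config database -d ASDB | grep -i 'disk groups'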

Currently, datafiles and archivelog files are distributed across the following diskgroups -
ASDATA01, ASDATA03_05, ASDATA02_04
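Which disk groups the database actually uses can be confirmed from the instance itself, for example (a simple check, not part of the original post):

SQL> select distinct substr(name, 1, instr(name, '/') - 1) as diskgroup from v$datafile;   -- leading +DGNAME portion of each file
SQL> archive log list                                                                      -- current archive destination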

Since the ASM disk groups ASDATA02 and ASDATA03 have been physically dropped, we should remove them from the CRS start/stop dependencies for the database resource.

[grid@orarac1d bin]$ crsctl stat res -p | grep ASDB
NAME=ora.FRA_ASDB.dg
DB_UNIQUE_NAME=ASDB
GEN_AUDIT_FILE_DEST=/u01/app/oracle/admin/ASDB/adump
GEN_USR_ORA_INST_NAME@SERVERNAME(orarac1c)=ASDB1
GEN_USR_ORA_INST_NAME@SERVERNAME(orarac1d)=ASDB2
SERVER_POOLS=ora.ASDB
SPFILE=+ASDATA01/ASDB/spfileASDB.ora

START_DEPENDENCIES=hard(ora.ASDATA01.dg,ora.FRA.dg, ora.ASDATA03.dg,ora.FRA_ASDB.dg,ora.ASDATA02.dg)
weak(type:ora.listener.type,global:type:ora.scan_listener.type,uniform:ora.ons,uniform:ora.eons) pullup(ora.ASDATA01.dg,ora.FRA.dg)

STOP_DEPENDENCIES=hard(intermediate:ora.asm,shutdown:ora.ASDATA01.dg,shutdown:ora.FRA.dg,shutdown:ora.ASDATA03.dg, shutdown:ora.FRA_ASDB.dg, shutdown:ora.ASDATA02.dg)

Fix -
1. Modify ASM disk group dependency requirement
$ srvctl modify database -d ASDB -a "ASDATA01,ASDATA03_05,ASDATA02_04,FRA,FRA_ASDB"

2. Disable the disk groups if they still exist
$ srvctl disable diskgroup -g ASDATA03,ASDATA02 

3. Drop the disk group info from the OCR. Make sure you don't have any existing database using these disk groups.
$ srvctl remove diskgroup -g ASDATA03,ASDATA02 -f
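After step 3, the updated dependency list can be verified before the next restart, for example:

$ srvctl config database -d ASDB | grep -i 'disk groups'
$ crsctl stat res ora.ASdb.db -p | grep DEPENDENCIES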

Once you finish these steps, the next time you start or stop the resource using the srvctl utility, it won't complain and will start the resources as expected.