Thursday, August 29, 2013


 Instance Caging

Recently one of our production clusters, an 8-node shared environment hosting multiple applications, ran into high CPU usage. Since the environment is shared and runs several databases, it is always difficult to pinpoint the culprit; it took more than a couple of incidents before we could identify which database was actually causing the mayhem. The pattern was not predictable either, as the application is used across the globe. And because the hang was not momentary, it affected all the other databases on the node as well. This is a serious problem in a shared environment: consolidation saves cost on hardware and software and ensures optimum usage of capacity, since not all databases are driven to full capacity at the same time (again, case to case), but one misbehaving instance can starve the rest.

So the question is: how do we avoid such an incident and still make optimum use of resources?

11g introduces a new feature called "instance caging": when more than one instance runs on the same server, you can control how they share the CPUs reported by the operating system.

-- To find out total cpu and cores available to your system
SQL> select cpu_count_current,cpu_core_count_current from v$license;

System with 12 physical CPU cores:

CPU_COUNT_CURRENT CPU_CORE_COUNT_CURRENT
----------------- ----------------------
               12                     12


System with 6 physical cores and 12 threads (hyper-threaded):

CPU_COUNT_CURRENT CPU_CORE_COUNT_CURRENT
----------------- ----------------------
               12                      6

One can use the following query to find out the CPU sockets and CPU cores available to the database:

    select * from v$osstat where stat_name like '%CPU%';





In hyper-threaded servers, NUM_CPUS reports the thread count while NUM_CPU_CORES reports the physical cores, so the two numbers will differ.
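For illustration, on a 6-core, 12-thread server the relevant v$osstat rows would look roughly like this (values here are made up; the LIKE pattern is narrowed to the NUM_CPU rows):

```sql
SQL> select stat_name, value from v$osstat where stat_name like 'NUM_CPU%';

STAT_NAME               VALUE
-------------------- --------
NUM_CPUS                   12
NUM_CPU_CORES               6
NUM_CPU_SOCKETS             1
```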
Now you need to determine how the database instances on your server will share the CPUs. With Instance Caging, each instance's cpu_count specifies the maximum number of CPUs you want that particular instance to use at any time. The sum of the cpu_counts across all database instances determines the degree of isolation between the database instances and the efficiency of the server.

For maximum isolation one can use the "partition" approach. With the partition approach, the sum of the cpu_counts is less than or equal to the number of CPUs.

For example, suppose the total number of CPUs (i.e. CPU threads/cores) is 16.  Using the partition approach, we could set cpu_count=8 for database A, cpu_count=4 for database B, and cpu_count=4 for database C.  The sum of the cpu_counts is 16, which equals the number of CPUs.

The disadvantage of the partition approach is that any CPU unused by one database instance cannot be used by another. Therefore, for non-critical databases where you also want to achieve better CPU utilization efficiency, use the "over-subscribe" approach. With the over-subscribe approach, the sum of the cpu_counts is less than or equal to 3x the number of CPUs.

For example, for a server with 16 CPUs, you could use the over-subscribe approach and set cpu_count=8 for database A, cpu_count=8 for database B, and cpu_count=8 for database C.  The sum of the cpu_counts is 24, which is greater than the number of CPUs.  Therefore, if all databases are using their full CPU allocation, there will be some CPU contention.
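The two approaches above boil down to different cpu_count settings issued in each instance; a sketch for the 16-CPU example (values are illustrative and each statement is run in the instance it is labeled for):

```sql
-- Partition approach: 8 + 4 + 4 = 16 CPUs, full isolation
ALTER SYSTEM SET CPU_COUNT = 8;  -- run in database A
ALTER SYSTEM SET CPU_COUNT = 4;  -- run in database B
ALTER SYSTEM SET CPU_COUNT = 4;  -- run in database C

-- Over-subscribe approach: 8 + 8 + 8 = 24 > 16 CPUs, better utilization
ALTER SYSTEM SET CPU_COUNT = 8;  -- run in each of A, B and C
```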

Enable Instance Caging – 

-- Set the cpu_count initialization parameter.
-- This is a dynamic parameter, and can be set with the following statement:

SQL> ALTER SYSTEM SET CPU_COUNT = 4;
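Note that Instance Caging only takes effect when a Resource Manager plan is also active. Enabling the built-in default plan is sufficient (the plan name below is the standard one; adjust if your site uses a custom plan):

```sql
SQL> ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = 'default_plan';
```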

To verify that Instance Caging is enabled, check that "instance_caging" equals "ON" and that "cpu_count" is set appropriately.

SQL> select instance_caging from v$rsrc_plan where is_top_plan = 'TRUE';

INS
---
ON

Once this is enabled, CPU spikes in the instance will be limited to the specified number of CPUs and will no longer hamper the other databases.



Wednesday, August 28, 2013

Identify The Top Temp Consumers In DataBase

First of all, one needs to identify the sessions using temp space. The view reflecting this information is v$tempseg_usage.

One important note: v$tempseg_usage is simply the renamed v$sort_usage (v_$sort_usage) view, which it supersedes.

SELECT s.inst_id, s.sid, s.username, u.tablespace, s.sql_hash_value, u.sqlhash hash_value, u.segtype, u.contents, u.blocks
FROM gv$session s, gv$tempseg_usage u
WHERE s.saddr = u.session_addr
  AND s.inst_id = u.inst_id   -- needed with gv$ views on RAC, or rows cross-join across instances
ORDER BY u.blocks DESC;
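Since blocks are not very readable, the usage can be converted to MB by joining dba_tablespaces for the block size (a sketch; assumes access to the DBA views):

```sql
SELECT u.tablespace,
       round(sum(u.blocks) * max(t.block_size) / 1024 / 1024) used_mb
FROM   gv$tempseg_usage u, dba_tablespaces t
WHERE  u.tablespace = t.tablespace_name
GROUP  BY u.tablespace
ORDER  BY used_mb DESC;
```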

The main segment type one needs to check is the SORT segment, as that is what we are after.
Now that we have the session straining our database's sort area, we need to find the statement causing all the mayhem. If you are in luck, there may be only one open cursor against this session. In a heavily used production database, however, there will be multiple open cursors against one session.
So the view you need to consult is v$open_cursor. We also need to identify the child cursors for each parent cursor, as well as the SQL text. Following is one way to do it:


select sql_id, sorts ,  sql_text from gv$sql 
where hash_value in 
(select hash_value from gv$open_cursor where sid=1920)
order by 2 desc;




As you can see, there are two possible SQL statements using temp at the moment. Now that you have identified the possible culprits, you can investigate further.


The following script will tell you how much free space is available in the temp tablespace.
SQL> select b.tablespace_name, sum(b.bytes_used/1024/1024) used_mb,
     sum(bytes_free/1024/1024) free_mb
     from v$temp_space_header b
     group by b.tablespace_name;
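On 11g the same information is also exposed through dba_temp_free_space, which additionally accounts for allocated-but-reusable extents (a sketch):

```sql
SQL> select tablespace_name,
            free_space/1024/1024      free_mb,
            tablespace_size/1024/1024 total_mb
     from dba_temp_free_space;
```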

Thursday, August 1, 2013

NET-BACKUP Configuration For Oracle



If the oracle_link script fails when executed, it displays an error message, and backups of the Oracle instance will not utilize the NetBackup implementation of the Oracle SBT API.

Following is the workaround to fix the issue. This is Linux/Unix specific; for AIX one has to make modifications as per the environment.

On the client host, manually create the needed symbolic link between the Oracle installation and the NetBackup installation.

1)     Confirm that the NetBackup shared libraries are present on the client host and of the correct version.

$ cd /usr/openv/netbackup/
$ ./bin/goodies/support/versioninfo -f bin/libobk.so
======= bin/libobk.so =======
@(#) NetBackup_7.1 1234567890

2)     Check for an existing libobk being used by Oracle and rename if present.

$ cd $ORACLE_HOME/lib
$ ls libobk.so
libobk.so
$ mv libobk.so libobk.so.orig

3)     Create the symbolic link to the NetBackup shared library.

$ ln -s /usr/openv/netbackup/bin/libobk.so libobk.so
  
 An updated oracle_link script is included in NetBackup release update 7.1.0.4 and release 7.5.
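Once the link is in place, one can verify that RMAN loads the NetBackup SBT library by allocating and releasing a test channel (a sketch; real backups will also need the site-specific SBT/NetBackup parameters):

```
RMAN> run {
        allocate channel t1 device type sbt;
        release channel t1;
      }
```

If the library cannot be loaded, the ALLOCATE will fail with an ORA-27211 style error rather than completing cleanly.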


Error During OEM Agent Install On RAC with XML-20108: (Fatal Error) Start of root element expected. Completed with Status=102


During install of an agent in a RAC environment we hit the following error.



ERROR: <Line 1, Column 1>: XML-20108: (Fatal Error) Start of root element expected.

Completed with Status=102



Looked at MOS, but that did not give up any clues, so I had to look around a bit. Eventually, when trying to list the installed products on one of the nodes in the cluster, I found that the entire inventory.xml file was missing.

So that might be the cause of the problem here.

Though I was not sure whether the problem was due to the missing file, I decided to go step by step. First things first: I needed an inventory.xml.



Now the standard way to create the inventory is to run the runInstaller executable with the attachHome option. However, I decided to try another workaround, much simpler than the standard one, the reason being that this is a functional node and I cannot make any changes to the current configuration without change approval. So I decided to copy the inventory.xml from another node to the current node (assuming that both have identical Oracle products installed, as they are part of the same cluster).
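For reference, the standard attachHome invocation looks roughly like this (the home path and name are taken from the inventory snippet below; the node list is illustrative):

```shell
$ cd $ORACLE_HOME/oui/bin
$ ./runInstaller -silent -attachHome \
    ORACLE_HOME=/oracle/rdbms/11203 \
    ORACLE_HOME_NAME=OraDb11g_home1 \
    "CLUSTER_NODES={node01,node02}"
```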



Following is the snippet of the XML file copied.



<?xml version="1.0" standalone="yes" ?>

<!-- Copyright (c) 1999, 2010, Oracle. All rights reserved. -->

<!-- Do not modify the contents of this file by hand. -->

<INVENTORY>

<COMPOSITEHOME_LIST>

</COMPOSITEHOME_LIST>

<VERSION_INFO>

   <SAVED_WITH>11.1.0.8.0</SAVED_WITH>

   <MINIMUM_VER>2.1.0.6.0</MINIMUM_VER>

</VERSION_INFO>

<HOME_LIST>

<HOME NAME="Ora11g_gridinfrahome1" LOC="/u01/app/11.2.0/grid" TYPE="O" IDX="1" CRS="true">

   <NODE_LIST>

      <NODE NAME="xxxxxxxxx01"/>

      <NODE NAME="xxxxxxxxx02"/>

   </NODE_LIST>

</HOME>

<HOME NAME="OraDb11g_home1" LOC="/oracle/rdbms/11203" TYPE="O" IDX="2">

   <NODE_LIST>

      <NODE NAME="xxxxxxxxx01"/>

      <NODE NAME="xxxxxxxxx02"/>

   </NODE_LIST>

</HOME>

</HOME_LIST>

</INVENTORY>



Now with inventory.xml file in place when I tried installation, it worked.



Lesson learned –

The inventory.xml file is needed because any newly installed product adds an entry to it; a missing file can therefore cause installation failures.