Thursday, June 12, 2014

Client TNS Connection Time Out 


Error Descripton -

TNS-12535: TNS:operation timed out
    ns secondary err code: 12606
    nt main err code: 0
    nt secondary err code: 0
    nt OS err code: 0

##Following is the brief list of clients that got affected. This is not conclusive list, only indicative.

Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=151.149.159.163)(PORT=3687))
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=157.241.198.49)(PORT=49492))
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=151.149.159.90)(PORT=2338))

SQLNET.ORA Parameters -

SQLNET.EXPIRE_TIME = 10
# following parameter is not defined but has default value of 60 seconds
SQLNET.INBOUND_CONNECT_TIMEOUT =60

Diagnosis -
Diagnosis is done based on hosts/clients facing issue of connection time out. I have broken the analysis client wise for easier understanding of the problem.

HOST=157.241.198.49

# following line identifies the batch server as user and time at which connection establishment request is accepted by listener.
Listener directive -
02-MAY-2014 06:01:44 * (CONNECT_DATA=(SID=xxx)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=aimsbatc))) * (ADDRESS=(PROTOCOL=tcp)(HOST=157.241.198.49)(PORT=31563)) * establish * xxx* 0

# Following line describes the error recorded for the same client, same port on DB. Pls see that establish and time out time has 60 seconds gap.
DB Alert -
2014-05-02 06:02:44.552000 -04:00
TNS-12535: TNS:operation timed out
    ns secondary err code: 12606
    nt main err code: 0
    nt secondary err code: 0
    nt OS err code: 0
  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=157.241.198.49)(PORT=31563))

---------------------------------------------------------------------------
HOST=157.241.198.49
Listener directive -
03-MAY-2014 01:31:42 * (CONNECT_DATA=(SID=P2PAIMS)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=aimsbatc))) * (ADDRESS=(PROTOCOL=tcp)(HOST=157.241.198.49)(PORT=44309)) * establish * P2PAIMS * 0

DB Alert -
Fatal NI connect error 12170.
  Time: 03-MAY-2014 01:32:42
TNS-12535: TNS:operation timed out
    ns secondary err code: 12606
    nt main err code: 0
    nt secondary err code: 0
    nt OS err code: 0
  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=157.241.198.49)(PORT=44309))
WARNING: inbound connection timed out (ORA-3136)

## SIMILAR ERROR FOR SAME HOST AND PORT, ARE ALSO RECORDED ON  03 AND 05 MAY FOR DIFFERENT USERS.
Listener directive -

03-MAY-2014 05:30:54 * (CONNECT_DATA=(SID=P2PAIMS)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=aimsbatc))) * (ADDRESS=(PROTOCOL=tcp)(HOST=157.241.198.49)(PORT=44309)) * establish * P2PAIMS * 0

05-MAY-2014 15:05:06 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=P2PAIMS)(CID=(PROGRAM=sqlplus)(HOST=tryprmaim01.intra.searshc.com)(USER=aimsuser))) * (ADDRESS=(PROTOCOL=tcp)(HOST=157.241.198.49)(PORT=44309)) * establish * P2PAIMS * 0

---------------------------------------------------------------------------------------------------------
HOST=157.241.198.49
Listener directive -
06-MAY-2014 00:32:14 * (CONNECT_DATA=(SID=xxx)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=aimsbatc))) * (ADDRESS=(PROTOCOL=tcp)(HOST=157.241.198.49)(PORT=25360)) * establish * xxx* 0

DB Alert -
Time: 06-MAY-2014 00:33:14
 
TNS-12535: TNS:operation timed out
    ns secondary err code: 12606
    nt main err code: 0
    nt secondary err code: 0
    nt OS err code: 0
  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=157.241.198.49)(PORT=25360))

----------------------------------------------------------------------------
HOST=151.149.159.90

# For following user the connection terminated after 38 minutes, whcih could be due to dead connection detection.
Listener directive -
06-MAY-2014 14:51:00 * (CONNECT_DATA=(SID=xxx)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=rgupta4))) * (ADDRESS=(PROTOCOL=tcp)(HOST=151.149.159.90)(PORT=2122)) * establish * xxx* 0

DB Alert -
2014-05-06 15:28:02.831000 -04:00

TNS-12535: TNS:operation timed out
    ns secondary err code: 12560
    nt main err code: 505

TNS-00505: Operation timed out
    nt secondary err code: 110
    nt OS err code: 0
  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=151.149.159.90)(PORT=2122))

Interpretation -

As defined earlier the listener receives the request to create connection. Here for the authentication purpose the client has to complete the handshake in 60 seconds. If the client does not able to respond back in 60 seconds due to load on app server or congested network or DB could not process the connection due heavy load on the server AND if elapsed time > 60 seconds, the TIME OUT will occur.

Solution/Workaround -

It is often necessary to increase the values for INBOUND CONNECT TIMEOUT at  both the listener and the database in order to resolve this issue. It is usually advisable to set the database (sqlnet.ora) value slightly higher than the listener (listener.ora).  The authentication process is more demanding for the database than the listener.

1. To set these parameters to use values higher than the default of 60 seconds in RDBMS_HOME/network/admin/sqlnet.ora, follow these instructions and restart the listener.  There is no need to restart Oracle:

                SQLNET.INBOUND_CONNECT_TIMEOUT = 120

2. To set the INBOUND CONNECT TIMEOUT on listener, first check the existing value.

LSNRCTL> show inbound_connect_timeout
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1531)))
LISTENER parameter "inbound_connect_timeout" set to 60
The command completed successfully

# Now set the value to 110 seconds
LSNRCTL>set  INBOUND_CONNECT_TIMEOUT_LISTENER=110

Other Areas -

If the application server is using Connection Pool, then following parameters worth to look at. These are standard features offered by various application servers and not the platform/Technology specific.
If you run into a timeout it can mean that your definition of the timeouts in the data-source are not sufficient. The following timeout parameters can be set:

abandoned-connection-timeout
Sets the maximum time that a connection can remain unused before the connection is closed and returned to the cache. A connection is considered unused if it has not had SQL database activity. When AbandonedConnectionTimeout is set, JDBC monitors SQL database activity on each logical connection. For example, when stmt.execute() is invoked on the connection, a heartbeat is registered to convey that this connection is active. The heartbeats are set at each database execution. If a connection has been inactive for the specified amount of time, the underlying connection is reclaimed and returned to the cache for reuse.

time-to-live-timeout
The maximum time, in seconds, a used connection may be active. When this timeout expires, the used connection is unconditionally closed, the relevant statement handles are canceled, and the connection is returned to the connection pool.

Inactivity timeout
Maximum time for which an unused connection can be kept in the pool.

However, there was one caveat here. Even after changing time out value, the connection was still timing out in 60 seconds only. Upon close examination we found out that this home was upgraded from 11.2.0.3 to 11.2.0.4 . 

Normally the SQLNET.ORA file contained in TNS_ADMIN_HOME affects the time out values on DB. In this case the value was modified in 11.2.0.4 home but TNS_ADMIN_HOME  was pointing to 11.2.0.3 home hence the new time out value was not reflecting. So there are 2 ways to fix it.
1. Either update TNS_ADMIN value to 11.2.0.4 home
2. Modify the 11.2.0.3 home SQLNET.ORA file. 

Once we fixed it the issue got resolved.