Monday, November 27, 2017


Unable To Start DSE Post 5.1 Upgrade

After upgrading our DSE servers running SPARK, from 5.0.8 version to 5.1.0 version, it refused to start with following error. 


ERROR [main] 2017-11-20 14:33:58,931  DseDaemon.java:494 - Unable to start DSE server.
com.datastax.bdp.plugin.PluginManager$PluginActivationException: Unable to activate plugin com.datastax.bdp.plugin.DseFsPlugin

Caused by: org.apache.cassandra.exceptions.ConfigurationException: DSEFS does not support authentication method configured with org.apache.cassandra.auth.PasswordAuthenticator. DSEFS supports INTERNAL, LDAP and DIGEST authentication schemes configured with DseAuthenticator.

As you can see, the default Authenticator and Authorizer are being deprecated and now changed to following. 

authenticator: com.datastax.bdp.cassandra.auth.DseAuthenticator
authorizer: com.datastax.bdp.cassandra.auth.DseAuthorizer
role_manager: com.datastax.bdp.cassandra.auth.DseRoleManager

In the dse.yaml file, configure the corresponding options:

Configure the DSE Authenticator by uncommenting the authentication_options and changing the settings.
FROM -
# authentication_options:
#     enabled: false
#     default_scheme: internal

TO -
enabled: true
default_scheme: internal


Configure the DSE Role Manager by uncommenting role_management_options and setting the mode
role_management_options:
     mode: internal

Configure the DSE Authorizer by uncommenting the authorization_options and changing the settings.
 authorization_options:
     enabled: true
#     transitional_mode: normal
#     allow_row_level_security: false


Once done, run nodetool upgradesstables, if you have not already ran it. 

[root@hqidlinfdb36 cassandra]# nodetool  upgradesstables
WARN  14:50:54,675 Small cdc volume detected at /var/lib/cassandra/cdc_raw; setting cdc_total_space_in_mb to 243.  You can override this in cassandra.yaml
WARN  14:50:54,681 memtable_cleanup_threshold has been deprecated and should be removed from cassandra.yaml


You must configure the replication factors appropriate for using DSE Security in production environments. The keyspaces that require an increased replication factor are:
  • system_auth
  • dse_security

Change the system_auth keyspace RF:
ALTER KEYSPACE system_auth
    WITH REPLICATION= {'class' : 'NetworkTopologyStrategy',
                       'data_center_name' : N,
                       'data_center_name' : N};

Example - ALTER KEYSPACE system_auth WITH REPLICATION =
  {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};


Change the dse_security keyspace RF:

ALTER KEYSPACE "dse_security"
   WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};

Run the nodetool repair on the security keyspaces.
nodetool repair --full system_auth
nodetool repair --full dse_security

After changing the replication strategy, you must run nodetool repair with the --full option. when running full repair, you may see following warning where one has to modify the RF for dse_leases key space. 

insufficient replication (you created a new DC and didn't ALTER KEYSPACE dse_leases) and the duration (30000) being different (you have to disable/delete/recreate the lease to change the duration).
java.io.IOException: No live replicas for lease Leader/master/5.1.dc1 in table dse_leases.leases Nodes [/10.22.102.27] are all down/still starting.

ALTER  KEYSPACE dse_leases WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3'}  AND durable_writes = true;

Once this is done, you are now running successfully on 5.1 version. 


Wednesday, November 8, 2017

DB Does Not Open Post FlashBack ORA-01139: RESETLOGS option only valid after an incomplete database recovery


During one of the flashback activity on one of RAC DB on Exadata system, we faced a challenge while opening DB post flashback. 
we tried to open DB in resetlogs on single instance and it would not open. 

alter database open resetlogs 

ERROR at line 1: 
ORA-38760: This database instance failed to turn on flashback database 

we try then turning the flashback off then open the database but after turning flash back off following is the status. 

alter database flashback off; 
SYS@DBS01P>select flashback_on from v$database; 

FLASHBACK_ON 
------------------ 
RESTORE POINT ONLY 

SYS@DBS01P>select open_mode from v$database; 

OPEN_MODE 
-------------------- 
MOUNTED 

alter database open resetlogs 

ERROR at line 1: 
ORA-01139: RESETLOGS option only valid after an incomplete database recovery 

ERROR at line 1: 
ORA-00600: internal error code, arguments: [kcvcrv_fb_inc_mismatch], [46], [525459024], [959427550], [45], [4004329076], [958125589], [], [], [], [], [] 


Upon research we found that the issue is locks on datafiles and controlfile. 
after flashback database the locks on header and data files didnt clear and caused the failure while starting the DB. 

Hence we need to  recreate controfile from trace with NO RESETLOGS as resetlogs was already done. 

we tried doing but but we faced an error stating that we need DB Open in Read/Write Mode, which is not happening to us. So another way is to restore controlfile to temp location and use it to recreate controlfile. 

1. create pfile from current spfile. 
create pfile='/tmp/DBS01P_new.ora' from spfile;

2. Open pfile and modify controlfile location. 
control_files='/tmp/DBS01P_new.ctl'

3. Mount the instance with new pfile 
startup nomount pfile='/tmp/DBS01P_new.ora'

4. Restore controlfile from backup and create trace file 

Backup Piece -  Piece Name: /ora/DBS01P/backup01/c-2841507472-20171106-01
restore controlfile to '/tmp/DBS01P_new.ctl' from '/ora/DBS01P/backup01/c-2841507472-20171106-01'

5. Create trace from current controlfile.
alter database backup controlfile to trace

6. Open the trace file and just copy the NORESETLOGS Command as follows to new file. 

CREATE CONTROLFILE REUSE DATABASE "DBS01P" NORESETLOGS FORCE LOGGING ARCHIVELOG
MAXLOGFILES 192
MAXLOGMEMBERS 3
MAXDATAFILES 1024
MAXINSTANCES 32
MAXLOGHISTORY 4192
LOGFILE
  GROUP 1 '+DATAC1/DBS01P/onlinelog/group_1.296.907412313'  SIZE 4096M BLOCKSIZE 512,
  GROUP 2 '+DATAC1/DBS01P/onlinelog/group_2.295.907412287'  SIZE 4096M BLOCKSIZE 512,
  GROUP 3 '+DATAC1/DBS01P/onlinelog/group_3.277.907412331'  SIZE 4096M BLOCKSIZE 512,
  GROUP 4 '+DATAC1/DBS01P/onlinelog/group_4.313.907412375'  SIZE 4096M BLOCKSIZE 512,
  GROUP 5 '+DATAC1/DBS01P/onlinelog/group_5.287.907412391'  SIZE 4096M BLOCKSIZE 512,
  GROUP 6 '+DATAC1/DBS01P/onlinelog/group_6.278.907412419'  SIZE 4096M BLOCKSIZE 512,
  GROUP 7 '+DATAC1/DBS01P/onlinelog/group_7.289.907412449'  SIZE 4096M BLOCKSIZE 512,
  GROUP 8 '+DATAC1/DBS01P/onlinelog/group_8.288.907412459'  SIZE 4096M BLOCKSIZE 512
-- STANDBY LOGFILE
DATAFILE
  '+DATAC1/DBS01P/datafile/system.1405.904054341',
  '+DATAC1/DBS01P/datafile/sysaux.1403.904054339',
  .............
  '+DATAC1/DBS01P/datafile/dpa_tbs.9286.950976195'
CHARACTER SET AL32UTF8
;

 29   30   31   32   33   34   35   36   37   38   39   40   41   42   43   44   45   46   47   48   49   50   51   52   53   54   55   56   57   58 
CREATE CONTROLFILE REUSE DATABASE "DBS01P" NORESETLOGS FORCE LOGGING ARCHIVELOG
*
ERROR at line 1:
ORA-01503: CREATE CONTROLFILE failed
ORA-12720: operation requires database is in EXCLUSIVE mode

## This activity needs the DB to start in non cluster mode. 

SYS@DBS01P1>alter system set cluster_database=false scope=spfile sid='*';
System altered.

SYS@DBS01P1>shut immediate;
ORA-01507: database not mounted
ORACLE instance shut down.

SYS@DBS01P1>startup nomount
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.

Total System Global Area 1.0262E+11 bytes

Fixed Size                  2265224 bytes
Variable Size            1.7985E+10 bytes
Database Buffers         8.4289E+10 bytes
Redo Buffers              342851584 bytes

SYS@DBS01P1>sho parameter cluster

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cluster_database                     boolean     FALSE
cluster_database_instances           integer     1

SYS@DBS01P1> @create_ctl
;
Control file created.

SYS@DBS01P1>SYS@DBS01P1>recover database;
ORA-00283: recovery session canceled due to errors
ORA-00264: no recovery required

SYS@DBS01P1>alter database open ;
Database altered.

SYS@DBS01P1>select  open_mode from v$database;
OPEN_MODE
--------------------
READ WRITE

Now one can put the DB in cluster mode and start the other instances. 
Hope this will help you to overcome any controlfile corruption issues.