Monday, August 31, 2015

procedure to drop empty partitions

set serveroutput on
declare
  v_count   number(4);
  var_sql   varchar2(1000);
  var_sql1  varchar2(1000);
begin
  for item in
    ( select partition_name
        from dba_tab_partitions
       where partition_name in ('P20061412_7','P20061412_5')
         and table_name  = 'SC_RAW'
         and table_owner = 'RRS' )
  loop
    var_sql := 'select count(1) from rrs.sc_raw partition(' || item.partition_name || ')';
    execute immediate var_sql into v_count;
    dbms_output.put_line(var_sql);
    dbms_output.put_line(v_count);
    if v_count = 0 then
      dbms_output.put_line('dropping this partition');
      var_sql1 := 'alter table rrs.sc_raw drop partition ' || item.partition_name;
      dbms_output.put_line(var_sql1);
      execute immediate var_sql1;
    end if;
  end loop;
end;
/
Technical explanation is included below.
When CELLSRV observes that the service time on a disk has increased, the disk is proactively marked as confinedOffline. This is the first phase of detecting poor performance on the disk, and it is at this stage that the alert is generated. The disk is then ready for further tests: MS runs performance tests on the cell disk. If the disk turns out to be performing well, MS notifies CELLSRV that the disk is fine and CELLSRV notifies ASM to bring the grid disks back ONLINE. If the tests show that performance is poor, the disk status is changed to proactive failure, the grid disks are dropped from ASM, and the disk status is then set to predictive failure. When MS indicates that the disk is performing well, the disk is set back to normal and a notification clearing the alert is recorded in the alert history. This is all an automated operation and no intervention is required.


As the alert has cleared and the disk status changed back to NORMAL, no further action is required on this disk. However, as Travis indicated yesterday, this is happening frequently in your case, so it needs to go to Oracle Support immediately.
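To confirm from the ASM side that the grid disks did come back ONLINE, a check along these lines can be run from the ASM instance (a quick sketch, not part of the original procedure):

select name, mode_status, state
from v$asm_disk
order by name;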
Do you have the full RMAN log? To restore a cold backup you have to use the NOREDO option when recovering:

restore database;
recover database noredo;
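
For reference, a minimal sketch of the full sequence (assuming the controlfile is already in place and RMAN can see the backup pieces):

startup mount;
restore database;
recover database noredo;    # apply incremental backups only; no archived redo is applied
alter database open;        # use OPEN RESETLOGS instead if a backup controlfile was restored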



1. I need additional information to analyze and troubleshoot your issue.


You can copy and paste the complete script into SQL*Plus and it will create
results01.txt. I will review it and see what our options are.

Note: Do this as the SYSDBA.

REM # Please run the following SQL statements, as shown below, via SQL*Plus,
REM # and then upload the "results01.txt" for further analysis. You can run
REM # these statements while the database is at least mounted. Cut and paste
REM # the script into sqlplus.
REM ##########################################################################
REM

spool results01.txt
set echo on feedback on
set pagesize 50
alter session set nls_date_format = 'DD-MON-YYYY HH24:MI:SS';
select * from v$version;
select to_char(sysdate, 'DD-MON-YYYY HH24:MI:SS') as current_date from dual;
column name format a10
select dbid, name,
to_char(created, 'DD-MON-YYYY HH24:MI:SS') created,
open_mode, log_mode,
to_char(checkpoint_change#, '999999999999999') as checkpoint_change#,
controlfile_type,
to_char(controlfile_change#, '999999999999999') as controlfile_change#,
to_char(controlfile_time, 'DD-MON-YYYY HH24:MI:SS') controlfile_time
from v$database;

select * from v$instance;

SELECT to_char(Current_scn, '9999999999999999') FROM V$DATABASE;

select to_char(checkpoint_change#, '999999999999999') as ck_chg from v$database;

select min(fhscn) from x$kcvfh;


Select min(fhafs) "minvalue" ,max(fhafs) "max value" from x$kcvfh;

select max(to_number(fhafs)) "Minimum PITR SCN", min(fhrba_Seq), max(fhrba_Seq) from X$KCVFH;

archive log list;


select SEQUENCE#, to_char(FIRST_CHANGE#,'999999999999999') as CHK_CHANGE, group#, archived, status,thread#
from v$log order by first_change#;


column member format a45
select group#, status,type,member from v$logfile;

column name format a79
select name, file#, status, enabled, creation_change#,
to_char(creation_time, 'DD-MON-YYYY HH24:MI:SS') as creation_time,
to_char(checkpoint_change#, '999999999999999') as checkpoint_change#,
to_char(checkpoint_time, 'DD-MON-YYYY HH24:MI:SS') as checkpoint_time,
to_char(offline_change#, '999999999999999') as offline_change#,
to_char(online_change#, '999999999999999') as online_change#,
to_char(online_time, 'DD-MON-YYYY HH24:MI:SS') as online_time,
to_char(bytes, '9,999,999,999,990') as bytes
from v$datafile
where status <> 'ONLINE'
or checkpoint_change# <> (select checkpoint_change# from v$database);


select distinct to_char(checkpoint_change#, '9999999999999999') from v$datafile;

select distinct status from v$datafile;
select FILE#,TS# , status, NAME from v$datafile
where status not in ('SYSTEM','ONLINE')
order by 1;

select distinct (to_char(checkpoint_change#,'9999999999999999')) from v$datafile_header;

REM #
REM # FHSTA shows whether a datafile header is fuzzy (fhsta=4, for example).
REM # The status needed to open the database is fhsta=0 (except that the SYSTEM
REM # datafile(s) may show something like 8192).
REM # Status 4 means "online fuzzy"; such a datafile should be restored again.
REM # You can only cancel recovery once all datafiles are at status 0, as shown
REM # by this query.
REM # Statuses 16 / 64 / 80 are fuzzy and the database will not open:
REM # Status 16 - Media recovery fuzzy.
REM # Status 64 - Absolute fuzzy; if you see status 64, redo the recovery and apply
REM #             the next log, and keep going until no file shows status 64.
REM # Status 80 - Media fuzzy and absolute fuzzy.
REM # Summarize the FHSTA information from above in case the query above returns
REM # hundreds of rows.

select distinct FHSTA from X$KCVFH;

REM#You may run the following query to determine the sequence# of the archivelogs needed for the initial recovery.

select distinct FHRBA_SEQ Sequence, FHTHR Thread from X$KCVFH
order by FHTHR, FHRBA_SEQ;


select min(fhrba_Seq), max(fhrba_Seq) from X$KCVFH;

REM #
REM#

select to_char(hxfil,'99999999999') as FILE#,fhsta STAT,fhscn SCN,
fhthr thrd, fhrba_Seq SEQ,fhtnm T_SPACE
from x$kcvfh order by 1;


select HXFIL File_num,substr(HXFNM,1,40) File_name,FHTYP Type,HXERR Validity, FHSCN SCN, FHTNM TABLESPACE_NAME,FHSTA status ,FHRBA_SEQ Sequence from X$KCVFH;


select df.file#, to_char(df.checkpoint_change#, '9999999999999999') controlfile_change,
to_char(dh.checkpoint_change# , '9999999999999999') datafile_change, df.status
from v$datafile df, v$datafile_header dh
where df.file# = dh.file#;

select a.recid, a.thread#, a.sequence#, a.name, a.archived, a.deleted,
to_char(a.completion_time, 'DD-MON-YYYY HH24:MI:SS') as completed
from v$archived_log a, v$log l
where a.thread# = l.thread#
and a.sequence# = l.sequence#;

REM # If the query returns 1 row with STATUS='ONLINE', FUZZY='NO', and a similar
REM # CHECKPOINT_TIME, then you can proceed to open the database.

set echo on feedback on time on timing on pagesize 100 linesize 80 numwidth 13
alter session set nls_date_format = 'DD-MON-YYYY HH24:MI:SS';
select error, fuzzy, status, checkpoint_change#,
to_char(checkpoint_time, 'DD-MON-YYYY HH24:MI:SS') as checkpoint_time,
count(*)
from v$datafile_header
group by error, fuzzy, status, checkpoint_change#,checkpoint_time
order by checkpoint_change#, checkpoint_time;


select * from v$recovery_log;

select file#, to_char(change#, '9999999999999999') from v$recover_file;

REM # Use the third value (Min PITR ABSSCN) from the next query in: RECOVER DATABASE UNTIL SCN xxxxxxxxxx;

select min(FHSCN) "LOW FILEHDR SCN", max(FHSCN) "MAX FILEHDR SCN", max(FHAFS) "Min PITR ABSSCN"
from X$KCVFH ;


SELECT f.name, f.file#, b.status, to_char(b.change#, '9999999999999999'), b.time
FROM v$backup b,
v$datafile f
WHERE b.file# = f.file#
AND b.status = 'ACTIVE';

SELECT status,
checkpoint_change#,
TO_CHAR(checkpoint_time, 'DD-MM-YYYY HH24:MI:SS') as checkpoint_time,
count(*)
FROM v$datafile_header
GROUP BY status, checkpoint_change#, checkpoint_time
ORDER BY status, checkpoint_change#, checkpoint_time;

SELECT fhthr thread,
fhrba_seq sequence,
fhscn scn,
fhsta status,
count(*)
FROM x$kcvfh
group by fhthr,fhrba_seq,fhscn,fhsta;


spool off
REM
REM ##########################################################################
REM # Note:
REM # 1. The "results01.txt" will help me analyze the latest status of
REM # the datafiles and controlfile, particularly their checkpoint change#.



---- Upload results01.txt

---- Upload the alert log

3. Do the following so we will know your RMAN configuration.

To get the right times in the backup listings you have to make sure NLS_DATE_FORMAT is set.

If using csh

setenv NLS_DATE_FORMAT 'DD-MON-YYYY HH24:MI:SS'

If using ksh use:

export NLS_DATE_FORMAT='DD-MON-YYYY HH24:MI:SS'

For Windows:

sql 'alter session set NLS_DATE_FORMAT = "Mon dd yyyy hh24:mi:ss"';
set echo on;


crosscheck backup of database;
crosscheck backup of archivelog all;

spool log to rman_report.log;
set echo on;
show all;

list incarnation of database;

list backup summary;

list backup of database archivelog all;

list backup of controlfile;

list archivelog all;

report schema;

list copy of database;

restore preview database;

-- Upload rman_report.log  
Purpose 
-------

To provide information on the time-sharing scheduling policy on HP-UX and how
Oracle makes use of the SCHED_NOAGE scheduling policy.
 
SCOPE & APPLICATION
-------------------

This note is intended for DBAs and UNIX administrators looking to improve performance in OLTP environments.


HP SCHED_NOAGE Scheduling Policy
--------------------------------
On HP, most processes use a time sharing scheduling policy. Time sharing can 
have detrimental effects on Oracle performance by descheduling an Oracle process
during critical operations, for example, holding a latch. HP has a 
modified scheduling policy, referred to as SCHED_NOAGE, that specifically  
addresses this issue. Unlike the normal time sharing policy, a process scheduled
using SCHED_NOAGE does not increase or decrease in priority, nor is it preempted.

This feature is suited to online transaction processing (OLTP) environments 
because OLTP environments can cause competition for critical resources.  
In laboratory tests, the use of the SCHED_NOAGE policy with Oracle9i increased 
performance by up to 10 percent in OLTP environments. The SCHED_NOAGE policy 
creates little or no gain in decision support (DSS) environments because 
there is little resource competition in these environments. Because each 
application and server environment is different, you should test and verify 
whether your environment benefits from the SCHED_NOAGE policy.


Enabling SCHED_NOAGE for Oracle
-------------------------------


To allow Oracle to use the SCHED_NOAGE scheduling policy, the group that the 
Oracle software owner belongs to (dba), must have the RTSCHED and RTPRIO
privileges to change the scheduling policy and set the priority level for Oracle 
processes. 


To give the dba group these privileges: 

1. As the root user, enter the following command: 

# setprivgrp dba RTSCHED RTPRIO 


2. To retain these privileges after rebooting, create the /etc/privgroup file, if 
it does not exist on your system, and add the following line to it: 

dba RTSCHED RTPRIO

___________________________________________________________________
Note:
Please remember to set the MLOCK privilege as well when using
raw devices.
___________________________________________________________________

3. The Oracle DBA must add the parameter HPUX_SCHED_NOAGE to the INIT.ORA file
to enable the scheduling policy. HPUX_SCHED_NOAGE takes an integer value;
for HP-UX 11.0 the valid range is 154 to 255, and for HP-UX 11i the valid
range is 178 to 255. Higher priorities are represented by lower values.

hpux_sched_noage=154 (for HP/UX 11.0)
hpux_sched_noage=178 (for HP/UX 11i)
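
If the instance uses an spfile instead of a plain INIT.ORA, the same setting can be applied as shown below (a sketch; the value 178 assumes HP-UX 11i, and the instance must be restarted for this static parameter to take effect):

alter system set hpux_sched_noage=178 scope=spfile;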

Overall performance improvements of 5 to 10% especially for OLTP applications
can be expected.

References:
Note:217990.1 Init.ora Parameter  "HPUX_SCHED_NOAGE"  Reference
The HP documentation, rtsched(1) and rtsched(2) man page for more information 
on priority policies and priority ranges.

backup oracle home and orainventory

cd $ORACLE_HOME

tar cvf - . | gzip -c > /mars/oracle/MRSP/arch/oracle_home_`hostname`_`date +%Y%m%d`.tar.gz




cd /opt/oracle/oraInventory

tar cvf - . | gzip -c > /mars/oracle/MRSP/arch/oracle_oraInventory_`hostname`_`date +%Y%m%d`.tar.gz
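
To confirm an archive is readable before relying on it, it can be test-decompressed (a quick sanity check; the path is the same one used above):

gzip -t /mars/oracle/MRSP/arch/oracle_oraInventory_`hostname`_`date +%Y%m%d`.tar.gz && echo "archive OK"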

Thursday, August 20, 2015

How to Force OMS to use FQDN

Situation: My OMS and the database server from which I was trying to add databases and a listener were in different domains.

When OMS was trying to connect to the database, it failed with the error message below.


Looking at the error message, it looks like OMS is not able to connect because it uses the host name without the domain name.

We tried many ways to force OMS to use the FQDN: adding the name to the hosts file, adding the domain name to resolv.conf, and so on.


Solution: The one method which resolved the issue was adding the fully qualified domain name to the HOST entry in the listener.ora on the host where the database is running.
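
A minimal sketch of such a listener.ora entry with the FQDN in the HOST field (the listener name, host name, domain, and port here are placeholders, not the ones from this environment):

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      # HOST must carry the fully qualified name, not the short hostname
      (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost01.example.com)(PORT = 1521))
    )
  )

Reload or restart the listener afterwards (for example with lsnrctl reload) so the new address takes effect.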


Oracle Redaction commands

to check the redacted columns
--------------------------------------------------------------
select * from redaction_columns;

to check the redaction policies
--------------------------------------------------------------
select * from redaction_policies;



enable a policy
----------------------------------------------------------------
BEGIN
  SYS.dbms_redact.enable_policy (
    object_schema => 'SCHEMA_NAME',
    object_name   => 'T__MEMBER',
    policy_name   => 'T__MEMBER_REDACT'
  );
END;
/

disable a policy
----------------------------------------------------------------
BEGIN
  SYS.dbms_redact.disable_policy (
    object_schema => 'SCHEMA_NAME',
    object_name   => 'T__MEMBER',
    policy_name   => 'T__MEMBER_REDACT'
  );
END;
/


add a column to policy
-----------------------------------------------------------------
BEGIN
  DBMS_REDACT.alter_policy (
    object_schema => 'SCHEMA_NAME',
    object_name   => 'T__ADDRESS',
    policy_name   => 'T__ADDRESS_REDACT',
    action        => DBMS_REDACT.add_column,
    column_name   => 'ZIP',
    function_type => DBMS_REDACT.full
  );
END;
/


drop a column from policy
------------------------------------------------------------------
BEGIN
  DBMS_REDACT.alter_policy (
    object_schema => 'SCHEMA_NAME',
    object_name   => 'T__MEMBER',
    policy_name   => 'T__MEMBER_REDACT',
    action        => DBMS_REDACT.drop_column,
    column_name   => 'LAST_NAME',
    function_type => DBMS_REDACT.full
  );
END;
/
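
create a policy
-----------------------------------------------------------------
For completeness, creating a policy in the first place uses DBMS_REDACT.ADD_POLICY. A minimal sketch in the same style as above (the schema, table, policy, and column names are placeholders; the '1=1' expression applies redaction to all rows):

BEGIN
  SYS.dbms_redact.add_policy (
    object_schema => 'SCHEMA_NAME',
    object_name   => 'T__MEMBER',
    policy_name   => 'T__MEMBER_REDACT',
    column_name   => 'SSN',          -- illustrative column name
    function_type => DBMS_REDACT.full,
    expression    => '1=1'           -- redact for every row/session
  );
END;
/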

Thursday, August 13, 2015

ORA-28046: Password change for SYS disallowed

dbhost:dbname> alter user sys identified by "new_password";
alter user sys identified by "new_password"
*
ERROR at line 1:
ORA-28046: Password change for SYS disallowed


The error occurs because remote_login_passwordfile is set to SHARED; a SHARED password file is read-only, so the SYS password cannot be changed until the parameter is set to EXCLUSIVE.

dbhost:dbname> show parameter remote

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
remote_dependencies_mode             string      TIMESTAMP
remote_listener                      string
remote_login_passwordfile            string      SHARED
remote_os_authent                    boolean     FALSE
remote_os_roles                      boolean     FALSE
result_cache_remote_expiration       integer     0

dbhost:dbname> alter system set REMOTE_LOGIN_PASSWORDFILE=exclusive scope=spfile;

System altered.

dbhost:dbname> shut immediate;

Database closed.
Database dismounted.
ORACLE instance shut down.
dbhost:dbname> startup;
ORACLE instance started.

Total System Global Area 2155233280 bytes
Fixed Size                  2161064 bytes
Variable Size            1442842200 bytes
Database Buffers          704643072 bytes
Redo Buffers                5586944 bytes
Database mounted.
Database opened.
dbhost:dbname> show parameter remote

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
remote_dependencies_mode             string      TIMESTAMP
remote_listener                      string
remote_login_passwordfile            string      EXCLUSIVE
remote_os_authent                    boolean     FALSE
remote_os_roles                      boolean     FALSE
result_cache_remote_expiration       integer     0
dbhost:dbname> alter user sys identified by "new_password";

User altered.

Tuesday, August 11, 2015

How the database Target is named in EM12c

  • If using the parameter 'db_unique_name' (for example, when using Data Guard), the discovered database name will be db_unique_name.db_domain (if the parameter db_domain is not set, then just db_unique_name is used)
  • If not using db_unique_name, the format db_name.db_domain is used (if the parameter db_domain is not set, then just db_name is used); the query after this list shows which of these parameters are set
  • If the names found during discovery are not unique, a version number is appended to the name, e.g. orcl_1
  • If the spfile<sid>.ora cannot be read (or cannot be converted to a pfile), the database target will assume the name <sid>_<hostname>
  • If the database is discovered with the incorrect case, see Note:2024374.1 "Enterprise Manager 12c discovers the Database Target name in Upper Case"
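
To see which of these naming parameters are set on the database being discovered, a quick check (a sketch, run in SQL*Plus against that database):

select name, value
from v$parameter
where name in ('db_name', 'db_unique_name', 'db_domain');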

Wednesday, May 20, 2015

RMAN-11003: failure during parse/execution of SQL statement: alter database recover logfile

We got the error below while doing a backup-based RMAN duplicate.
The RMAN command for the duplicate is below.

RUN { 
  set archivelog destination to '/${target}_archredo1/redologs/'; 
  allocate auxiliary channel ch01 device type 'SBT_TAPE' parms 'ENV=(NB_ORA_CLIENT=$client,BLKSIZE=1048576)' maxopenfiles 1; 
  allocate channel aux1 device type disk; 
  duplicate target database to ${target} 
  UNTIL TIME "to_date('$until_time','mm-dd-yyyy hh24:mi:ss')" 
  pfile = /oracle/app/oracle/admin/$target/pfile/init$target.ora; 
}


The recovery part of the duplication started failing with the error below.

RMAN-03015: error occurred in stored script Memory Script
RMAN-11003: failure during parse/execution of SQL statement: alter database recover logfile '/db_name_archredo1/redologs/arch0000079905_1_822846244.log'
ORA-10877: error signaled in parallel recovery slave

When I checked the alert log of the destination database, I found the errors below.

alter database recover logfile '/db_name_archredo1/redologs/arch0000079905_1_822846244.log'
88982 Media Recovery Log /db_name_archredo1/redologs/arch0000079905_1_822846244.log
88983 Wed May 20 12:35:04 2015
88984 Recovery interrupted!
88985 Recovered data files to a consistent state at change 11114962070940
88986 Media Recovery failed with error 19755
88987 Errors in file /oracle/app/oracle/admin/db_name/diag/rdbms/db_name/db_name/trace/db_name_pr00_146224.trc:
88988 ORA-00283: recovery session canceled due to errors
88989 ORA-19755: could not open change tracking file
88990 ORA-19750: change tracking file: '/dwprod1_db01/oracle/rman_change_trck_dwprod1.dbf'
88991 ORA-27037: unable to obtain file status
88992 Linux-x86_64 Error: 2: No such file or directory
88993 Additional information: 3
88994 Slave exiting with ORA-283 exception
88995 Errors in file /oracle/app/oracle/admin/db_name/diag/rdbms/db_name/db_name/trace/db_name_pr00_146224.trc:
88996 ORA-00283: recovery session canceled due to errors
88997 ORA-19755: could not open change tracking file
88998 ORA-19750: change tracking file: '/dwprod1_db01/oracle/rman_change_trck_dwprod1.dbf'
88999 ORA-27037: unable to obtain file status
89000 Linux-x86_64 Error: 2: No such file or directory
89001 Additional information: 3
89002 ORA-10877 signalled during: alter database recover logfile '/db_name_archredo1/redologs/arch0000079905_1_822846244.log'...


This suggested that the recovery was looking for the block change tracking (BCT) file.
Since the path of the BCT file did not exist on the destination, I decided to disable change tracking at the source itself.

SQL>alter database disable block change tracking;

Database altered.


That did not help. Since my recovery time was in the past, the controlfile restored by RMAN still pointed to the change tracking file.

Then I tried creating a dummy file with the same name as the source change tracking file, but the recovery failed again with the message below.

Media Recovery failed with error 19755
91851 Errors in file /oracle/app/oracle/admin/db_name/diag/rdbms/db_name/db_name/trace/db_name_pr00_38701.trc:
91852 ORA-00283: recovery session canceled due to errors
91853 ORA-19755: could not open change tracking file
91854 ORA-19750: change tracking file: '/dwprod1_db01/oracle/rman_change_trck_dwprod1.dbf'
91855 ORA-27047: unable to read the header block of file
91856 Linux-x86_64 Error: 25: Inappropriate ioctl for device
91857 Additional information: 1
91858 Slave exiting with ORA-283 exception
91859 Errors in file /oracle/app/oracle/admin/db_name/diag/rdbms/db_name/db_name/trace/db_name_pr00_38701.trc:
91860 ORA-00283: recovery session canceled due to errors
91861 ORA-19755: could not open change tracking file
91862 ORA-19750: change tracking file: '/dwprod1_db01/oracle/rman_change_trck_dwprod1.dbf'
91863 ORA-27047: unable to read the header block of file
91864 Linux-x86_64 Error: 25: Inappropriate ioctl for device
91865 Additional information: 1
91866 ORA-10877 signalled during: alter database recover logfile '/db_name_archredo1/redologs/arch0000079905_1_822846244.log'...


Then I realized that the recovery needed a valid BCT file, so I re-enabled change tracking at the source.

SQL> alter database enable block change tracking using FILE '/source_db01/oracle/rman_change_trck_dwprod1.dbf';

Database altered.

I then manually copied the file to the same path on the destination and started the recovery again.
This did the trick and the recovery proceeded.

Note 1: Please include the location of the BCT file in the DB_FILE_NAME_CONVERT parameter as well.
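
A sketch of what that pfile entry could look like (both paths here are placeholders, not the actual paths from this environment):

*.db_file_name_convert='/source_db01/oracle','/dest_db01/oracle'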

Note 2: I don't know if the BCT file is backed up by RMAN. If you know, please drop your answer in the comments.

Note 3: Another great piece of info about BCT that I found on another blog:
The size of the change tracking file is proportional to the size of the database and the number of enabled threads of redo. The size is not related to the frequency of updates to the database.

Typically, the space required for block change tracking is approximately 1/30,000 the size of the data blocks to be tracked. The following two factors may cause the file to be larger than this estimate suggests:
  • To avoid overhead of allocating space as database grows, the change tracking file size starts at 10MB, and new space is allocated in 10MB increments. Thus, for any database up to approximately 300GB the file size is no smaller than 10MB, for up to approximately 600GB the file size is no smaller than 20MB, and so on.
  • For each datafile, a minimum of 320K of space is allocated in the change tracking file, regardless of the size of the file. Thus, if you have a large number of relatively small datafiles, the change tracking file is larger than for databases with a smaller number of larger datafiles containing the same data.
SELECT FILE#, INCREMENTAL_LEVEL, COMPLETION_TIME, BLOCKS, DATAFILE_BLOCKS FROM V$BACKUP_DATAFILE WHERE INCREMENTAL_LEVEL > 0 AND BLOCKS / DATAFILE_BLOCKS > .5 ORDER BY COMPLETION_TIME;
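
To check whether block change tracking is currently enabled and where the tracking file lives (and how large it has grown), a quick query (a sketch):

select status, filename, bytes
from v$block_change_tracking;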

Friday, May 15, 2015

GFS cluster commands to manage database resources

We have a few databases which use Red Hat's GFS to provide high availability.
Here are a few commands to manage the cluster.

1. To check the status of cluster

[root@host_dbs02 ~]# clustat
Cluster Status for host_dbsc1 @ Fri May 15 17:04:07 2015
Member Status: Quorate

 Member Name               ID   Status
 ------ ----               ---- ------
 host_dbs01_hb        1 Online, rgmanager
 host_dbs02_hb        2 Online, Local, rgmanager
 /dev/block/253:4     0 Online, Quorum Disk

 Service Name                     Owner (Last)       State
 ------- ----                     ----- ------       -----
 service:db1                  host_dbs02_hb      started    [Z]
 service:db2                   host_dbs01_hb      started    [Z]
 service:db3                 host_dbs02_hb      started    [Z]
 service:db4                 host_dbs01_hb      started    [Z]
 service:db5                  host_dbs02_hb      started    [Z]
 service:db6                  host_dbs01_hb      started    [Z]
 service:my_test_oracle           host_dbs02_hb      started    [Z]
 service:odb1                 host_dbs02_hb      started    [Z]
 service:odb2                 host_dbs02_hb      started    [Z]
 service:oracle_database          host_dbs02_hb      started    [Z]
 service:odb3                   host_dbs02_hb      started    [Z]

2. To freeze a service
Freezing a service means that no automatic failover will happen for the resource.
This is the best option when DBAs want to perform maintenance on the database.
They can then work on the database as a standalone database without worrying about the cluster.

[root@host_dbs02 ~]# clusvcadm -Z dbname

Local machine freezing service:dssdevl1...Success

3. To unfreeze (thaw) a service

[root@host_dbs02 ~]# clusvcadm -U dbname

Local machine unfreezing service:odsdevl1...Success

4. To restart the service on the same node

[root@host_dbs02 ~]# clusvcadm -R dbname
Local machine trying to restart service:odssten1...

Success

5. To move the service to another node
-r is the option for the database/resource name and -m is for the destination node for the service

[root@host_dbs02 ~]# clusvcadm -r odssten1 -m host_dbs02_hb


6. To continuously check the status
This checks the status of the cluster every 10 seconds.

clustat -i 10

Other options


[root@host_dbs02 ~]# clusvcadm
usage: clusvcadm [command]

Resource Group Control Commands:
  -v                     Display version and exit
  -d <group>             Disable <group>.  This stops a group
                         until an administrator enables it again,
                         the cluster loses and regains quorum, or
                         an administrator-defined event script
                         explicitly enables it again.
  -e <group>             Enable <group>
  -e <group> -F          Enable <group> according to failover
                         domain rules (deprecated; always the
                         case when using central processing)
  -e <group> -m <member> Enable <group> on <member>
  -r <group> -m <member> Relocate <group> [to <member>]
                         Stops a group and starts it on another
                         cluster member.
  -M <group> -m <member> Migrate <group> to <member>
                         (e.g. for live migration of VMs)
  -q                     Quiet operation
  -R <group>             Restart a group in place.
  -s <group>             Stop <group>.  This temporarily stops
                         a group.  After the next group or
                         or cluster member transition, the group
                         will be restarted (if possible).
  -Z <group>             Freeze resource group.  This prevents
                         transitions and status checks, and is
                         useful if an administrator needs to
                         administer part of a service without
                         stopping the whole service.
  -U <group>             Unfreeze (thaw) resource group.  Restores
                         a group to normal operation.
  -c <group>             Convalesce (repair, fix) resource group.
                         Attempts to start failed, non-critical
                         resources within a resource group.
Resource Group Locking (for cluster Shutdown / Debugging):
  -l                     Lock local resource group managers.
                         This prevents resource groups from
                         starting.
  -S                     Show lock state
  -u                     Unlock resource group managers.

                         This allows resource groups to start.

Wednesday, May 13, 2015

QMON Session waiting for 'latch free'


We observed close to 30 sessions waiting on 'latch free', most of them logged on to the system around the same time. All these sessions belonged to Streams, even though we don't use Streams-based replication in our system. All these sessions were QMON slaves.

All the sessions were on node 1 of a 3-node RAC.


Support suggested running Note 273674.1 (Streams Configuration Report and Health Check Script) to check the Streams setup.
Streams was never set up or used.

The next suspect was Data Pump, so we checked our Data Pump logs.
There was a Data Pump metadata export job which had failed; it started at May 9 23:00:01 2015.


Export: Release 11.2.0.4.0 - Production on Sat May 9 23:00:01 2015 

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved. 
;;; 
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production 
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, 
Data Mining and Real Application Testing options 
Starting "SYSTEM"."SYS_EXPORT_FULL_02": system/******** DIRECTORY=datapump_metadata DUMPFILE=0509_XXXXXXX_metadata.dmp FULL=Y LOGFILE=0509_XXXXXXXXX_metadata.log content=metadata_only 
Processing object type DATABASE_EXPORT/TABLESPACE 
... 
... 
.... 
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TABLE 
ORA-39097: Data Pump job encountered unexpected error -12805 
ORA-39065: unexpected master process exception in MAIN 
ORA-12805: parallel query server died unexpectedly 
Job "SYSTEM"."SYS_EXPORT_FULL_02" stopped due to fatal error at Sun May 10 02:30:27 2015 elapsed 0 03:30:20 


We checked the Data Pump job status in dba_datapump_jobs, and the job below was orphaned:

OWNER_NAME JOB_NAME OPERATION JOB_MODE STATE ATTACHED_SESSIONS 
---------- -------------------------------------------------- ---------- ---------- ------------ ----------------- 
SYSTEM SYS_EXPORT_FULL_01 EXPORT FULL NOT RUNNING 0 
SYS SYS_EXPORT_TABLE_01 EXPORT TABLE NOT RUNNING 0 
SYSTEM SYS_EXPORT_FULL_02 EXPORT FULL NOT RUNNING 0 
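
The listing above comes from dba_datapump_jobs; a query along these lines produces it (a sketch):

select owner_name, job_name, operation, job_mode, state, attached_sessions
from dba_datapump_jobs;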


We went ahead and purged the orphaned job's master table:
drop table SYSTEM.SYS_EXPORT_FULL_02;

There was still a buffered queue left over, which had started with the Data Pump job:

select * from gv$buffered_queues;

INST_ID QUEUE_ID QUEUE_SCHEMA QUEUE_NAME STARTUP_TIME NUM_MSGS SPILL_MSGS 
---------- ---------- ------------------------------ ------------------------------ ----------------- ---------- ---------- 
1 14217557 SYS KUPC$S_1_20150509230004 09/05/15 23:00:05 0 0 


To clean it up, we manually stopped and dropped the queue:

SQL> exec dbms_aqadm.stop_queue('KUPC$S_1_20150509230004');

PL/SQL procedure successfully completed.

SQL>  exec dbms_aqadm.drop_queue('KUPC$S_1_20150509230004',TRUE); 

PL/SQL procedure successfully completed.

SQL> select INST_ID,QUEUE_ID,QUEUE_SCHEMA,QUEUE_NAME,STARTUP_TIME,NUM_MSGS,SPILL_MSGS from gv$buffered_queues;

no rows selected


All the sessions waiting on 'latch free' cleared, and the instance went back to normal.




Sunday, May 10, 2015

ORA-27086: unable to lock file - already in use

We have a database using NAS for its file storage. Due to a power outage, the server rebooted and the database crashed.

When the server came back up and we tried to start the database, we got the error below:
ORA-27086: unable to lock file - already in use

It seems the files were still locked on the Isilon end; since the shutdown was not clean, those locks were still being held on the Isilon side.

Below are some commands which would help to resolve the issue.

To identify the server and the node on which the files are locked:

isi_classic nfs clients list |grep  <database hostname>

(The database hostname should be the hostname that is registered with the NAS; as far as the NAS frame is concerned, the database host is a client of the NAS frame.)

isi_classic nfs clients list |grep  db_server

17052@db_server/10.38.15.207 exclusive [all] /ifs/netview/prd/NNM1/oradata/NNM1/small_data2.dbf
17052@db_server/10.38.15.207 exclusive [all] /ifs/netview/prd/NNM1/oradata/NNM1/small_index2.dbf
17052@db_server/10.38.15.207 exclusive [all] /ifs/netview/prd/NNM1/oradata/NNM1/small_data4.dbf

To remove the locks:
 
isi_classic nfs clients remove <database hostname>/<NAS node holding the lock>



isi_classic nfs clients remove db_server/10.38.15.207


Database came up fine after this.

Friday, May 1, 2015

ORDIM DICOM repository has 0 documents.

While doing a database upgrade from 9.2.0.8 to 11.2.0.4, the Oracle interMedia (Multimedia) component became INVALID.

While troubleshooting the issue, I came across a note that addresses it: Oracle Multimedia Component Became INVALID After Upgrade (Doc ID 1497562.1).
It suggested executing the following:



SQL> set serveroutput on;
SQL> exec VALIDATE_ORDIM();
ORDIM DICOM repository has 0 documents.
The following default DICOM repository documents are not installed:
ordcman.xml
ordcmcmc.xml
ordcmcmd.xml
ordcmct.xml
ordcmmp.xml
ordcmpf.xml
ordcmpv.xml
ordcmsd.xml
ordcmui.xml

PL/SQL procedure successfully completed.



The result of the VALIDATE_ORDIM procedure differed from the output shown in that note.
So I searched more on this issue and found that this was because the DICOM XML schemas had not been loaded.

How To Reload Oracle Multimedia Related Information when XML Database (=XDB) Has Been Reinstalled (Doc ID 965892.1)

Running the reload steps from that note, however, failed with:
declare
*
ERROR at line 1:
ORA-31050: Access denied
ORA-06512: at "ORDSYS.ORD_DICOM_ADMIN_PRV", line 6306
ORA-06512: at "ORDSYS.ORD_DICOM_ADMIN", line 125
ORA-06512: at line 67



ORA-31050 "Access denied" during Upgrade of Oracle Multimedia (ORDIM) to 11.2.0.2.0 (Doc ID 1274613.1)


The Oracle note suggested that, even though the XDB component shows as valid, there was an issue during the upgrade; look for the following in the upgrade log:

  11 END;
  12 /
DECLARE
*
ERROR at line 1:
ORA-31061: XDB error : XML event error
ORA-19202: Error occurred in XML processing (
LSX-00023: unknown namespace URI "0/�Â"
)
ORA-06512: at "XDB.DBMS_XDB", line 201
ORA-06512: at line 9


Although dba_registry shows that XDB is VALID, this component was not completely upgraded due to an incorrect setting of the environment variable LD_LIBRARY_PATH (LIBPATH on AIX, SHLIB_PATH on HP-UX PA-RISC). The solution given was to do the upgrade again.




  1. Confirm XML DB is not already installed.
    • SELECT username FROM DBA_USERS where username = 'XDB'; returns 0 rows.
    • DESCRIBE RESOURCE_VIEW fails.
  2. As SYS, perform the XML DB installation - Note: this will not be a complete build due to Oracle Bug 9818995:
    • @?/rdbms/admin/catqm.sql xdb <xml db tablespace> <xdb temp tablespace> <use secure files yes|no>
  3. As SYS, grant the relevant permissions to the XDB user:
    • GRANT EXECUTE ON utl_file TO xdb;
    • GRANT EXECUTE ON dbms_lob TO xdb;
  4. As SYS, ensure there are no invalid objects:
    • @?/rdbms/admin/utlrp.sql
  5. Assuming an spfile is in use, as SYS configure the XDB dispatcher:
    • ALTER SYSTEM SET DISPATCHERS="(PROTOCOL=TCP)(SERVICE=<SID>DB)" SCOPE=both;
    • If init.ora is in use, edit the init.ora file instead.
  6. As SYS, install the missing contents caused by Oracle Bug 9818995:
    • SHUTDOWN IMMEDIATE;
    • STARTUP UPGRADE
    • @?/rdbms/admin/xsdbmig.sql
    • SHUTDOWN IMMEDIATE;
    • STARTUP
  7. As SYS, ensure there are no invalid objects:
    • @?/rdbms/admin/utlrp.sql


After the reinstall of XDB, I finally ran the following again:


set echo on
spool ord.log
alter session set current_schema="ORDSYS";
@?/ord/im/admin/imxreg.sql;
@?/ord/im/admin/impbs.sql;
@?/ord/im/admin/impvs.sql;
@?/ord/im/admin/imtyb.sql;
@?/ord/im/admin/implb.sql;
@?/ord/im/admin/imxrepos.sql;
spool off



Finally, that resolved the issue and Oracle Multimedia became VALID.