Wednesday, May 20, 2015

RMAN-11003: failure during parse/execution of SQL statement: alter database recover logfile

We got the error below while doing a backup-based RMAN duplicate.
The RMAN command used for the duplicate was:

RUN {
set archivelog destination to '/${target}_archredo1/redologs/';
allocate auxiliary channel ch01 device type 'SBT_TAPE' parms 'ENV=(NB_ORA_CLIENT=$client,BLKSIZE=1048576)' maxopenfiles 1;
allocate channel aux1 device type disk;
duplicate target database to ${target}
UNTIL TIME "to_date('$until_time','mm-dd-yyyy hh24:mi:ss')"
pfile = /oracle/app/oracle/admin/$target/pfile/init$target.ora;
}


The recovery part of the duplication failed with the error below:

RMAN-03015: error occurred in stored script Memory Script
RMAN-11003: failure during parse/execution of SQL statement: alter database recover logfile '/db_name_archredo1/redologs/arch0000079905_1_822846244.log'
ORA-10877: error signaled in parallel recovery slave

When I checked the alert log of the destination database, I found the following errors:

alter database recover logfile '/db_name_archredo1/redologs/arch0000079905_1_822846244.log'
Media Recovery Log /db_name_archredo1/redologs/arch0000079905_1_822846244.log
Wed May 20 12:35:04 2015
Recovery interrupted!
Recovered data files to a consistent state at change 11114962070940
Media Recovery failed with error 19755
Errors in file /oracle/app/oracle/admin/db_name/diag/rdbms/db_name/db_name/trace/db_name_pr00_146224.trc:
ORA-00283: recovery session canceled due to errors
ORA-19755: could not open change tracking file
ORA-19750: change tracking file: '/dwprod1_db01/oracle/rman_change_trck_dwprod1.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Slave exiting with ORA-283 exception
Errors in file /oracle/app/oracle/admin/db_name/diag/rdbms/db_name/db_name/trace/db_name_pr00_146224.trc:
ORA-00283: recovery session canceled due to errors
ORA-19755: could not open change tracking file
ORA-19750: change tracking file: '/dwprod1_db01/oracle/rman_change_trck_dwprod1.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-10877 signalled during: alter database recover logfile '/db_name_archredo1/redologs/arch0000079905_1_822846244.log'...


This suggested that the recovery was looking for the block change tracking (BCT) file.
Since the BCT file's path did not exist on the destination, I decided to disable change tracking at the source itself.

SQL>alter database disable block change tracking;

Database altered.
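
To confirm the state of change tracking, V$BLOCK_CHANGE_TRACKING can be queried (a standard view; the output below is the expected shape, not a capture from that day):

SQL> select status, filename from v$block_change_tracking;

STATUS     FILENAME
---------- --------------------------------------------------
DISABLED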


Even after disabling change tracking at the source, it did not help. Since my recovery point was in the past, the controlfile restored by RMAN predated the change and still referenced the change tracking file.

I then tried creating a dummy file with the same name as the source change tracking file, but the recovery failed again with the message below:

Media Recovery failed with error 19755
Errors in file /oracle/app/oracle/admin/db_name/diag/rdbms/db_name/db_name/trace/db_name_pr00_38701.trc:
ORA-00283: recovery session canceled due to errors
ORA-19755: could not open change tracking file
ORA-19750: change tracking file: '/dwprod1_db01/oracle/rman_change_trck_dwprod1.dbf'
ORA-27047: unable to read the header block of file
Linux-x86_64 Error: 25: Inappropriate ioctl for device
Additional information: 1
Slave exiting with ORA-283 exception
Errors in file /oracle/app/oracle/admin/db_name/diag/rdbms/db_name/db_name/trace/db_name_pr00_38701.trc:
ORA-00283: recovery session canceled due to errors
ORA-19755: could not open change tracking file
ORA-19750: change tracking file: '/dwprod1_db01/oracle/rman_change_trck_dwprod1.dbf'
ORA-27047: unable to read the header block of file
Linux-x86_64 Error: 25: Inappropriate ioctl for device
Additional information: 1
ORA-10877 signalled during: alter database recover logfile '/db_name_archredo1/redologs/arch0000079905_1_822846244.log'...


Then I realized that the recovery needs the real BCT file, so I re-enabled change tracking at the source:

SQL> alter database enable block change tracking using FILE '/source_db01/oracle/rman_change_trck_dwprod1.dbf';

Database altered.

I manually copied the file to the same path on the destination and started the recovery again.
That did the trick and the recovery proceeded.
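
The copy itself was nothing fancy; something like the following, where dest_host stands in for the auxiliary server (hostname hypothetical):

$ scp /source_db01/oracle/rman_change_trck_dwprod1.dbf dest_host:/source_db01/oracle/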

Note 1: Include the location of the BCT file in the DB_FILE_NAME_CONVERT parameter as well.
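
I have not verified that the conversion is applied to the BCT path, but the parameter syntax in the auxiliary pfile looks like this, with hypothetical path prefixes (first the source prefix, then its destination replacement):

*.db_file_name_convert='/source_db01/oracle/','/dest_db01/oracle/'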

Note 2: I don't know whether the BCT file is backed up by RMAN. If you know, please drop your answer in the comments.

Note 3: Another great piece of info about BCT that I found on another blog:
The size of the change tracking file is proportional to the size of the database and the number of enabled threads of redo. The size is not related to the frequency of updates to the database.

Typically, the space required for block change tracking is approximately 1/30,000 the size of the data blocks to be tracked. The following two factors may cause the file to be larger than this estimate suggests:
  • To avoid overhead of allocating space as database grows, the change tracking file size starts at 10MB, and new space is allocated in 10MB increments. Thus, for any database up to approximately 300GB the file size is no smaller than 10MB, for up to approximately 600GB the file size is no smaller than 20MB, and so on.
  • For each datafile, a minimum of 320K of space is allocated in the change tracking file, regardless of the size of the file. Thus, if you have a large number of relatively small datafiles, the change tracking file is larger than for databases with a smaller number of larger datafiles containing the same data.
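
As a rough worked example (my arithmetic, not from the quoted source): a 1 TB database works out to roughly 1,048,576 MB / 30,000 ≈ 35 MB of tracking data, which the 10 MB allocation increments round up to a 40 MB file, before the 320 KB per-datafile minimums are added on top.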
The source also included this query, which flags incremental backups that contained more than half of a datafile's blocks (a sign the incremental strategy is not saving much I/O):

SELECT file#, incremental_level, completion_time, blocks, datafile_blocks
FROM   v$backup_datafile
WHERE  incremental_level > 0
AND    blocks / datafile_blocks > .5
ORDER  BY completion_time;

Friday, May 15, 2015

GFS cluster commands to manage database resources

We have a few databases that use Red Hat's GFS cluster to provide high availability.
Here are a few commands to manage the cluster.

1. To check the status of the cluster

[root@host_dbs02 ~]# clustat
Cluster Status for host_dbsc1 @ Fri May 15 17:04:07 2015
Member Status: Quorate

 Member Name                      ID   Status
 ------ ----                      ---- ------
 host_dbs01_hb                    1    Online, rgmanager
 host_dbs02_hb                    2    Online, Local, rgmanager
 /dev/block/253:4                 0    Online, Quorum Disk

 Service Name                     Owner (Last)       State
 ------- ----                     ----- ------       -----
 service:db1                      host_dbs02_hb      started    [Z]
 service:db2                      host_dbs01_hb      started    [Z]
 service:db3                      host_dbs02_hb      started    [Z]
 service:db4                      host_dbs01_hb      started    [Z]
 service:db5                      host_dbs02_hb      started    [Z]
 service:db6                      host_dbs01_hb      started    [Z]
 service:my_test_oracle           host_dbs02_hb      started    [Z]
 service:odb1                     host_dbs02_hb      started    [Z]
 service:odb2                     host_dbs02_hb      started    [Z]
 service:oracle_database          host_dbs02_hb      started    [Z]
 service:odb3                     host_dbs02_hb      started    [Z]

(The [Z] flag against a service means it is frozen; see the freeze command below.)

2. To freeze a service
Freezing a service means no automatic failover will happen for that resource, and rgmanager stops its status checks.
This is the best option when DBAs want to perform maintenance on a database:
they can treat it as a standalone database without having to think about the cluster.

[root@host_dbs02 ~]# clusvcadm -Z dbname

Local machine freezing service:dbname...Success

3. To unfreeze (thaw) a service

[root@host_dbs02 ~]# clusvcadm -U dbname

Local machine unfreezing service:dbname...Success
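
Putting freeze and thaw together, a typical database maintenance window looks like this (service name hypothetical). One caution: leave the service in the same state it was in when you froze it, or the first status check after thawing may fail the service.

[root@host_dbs02 ~]# clusvcadm -Z dbname    # freeze: no failover, no status checks
(shut down, patch, and restart the database manually)
[root@host_dbs02 ~]# clusvcadm -U dbname    # thaw: normal cluster monitoring resumes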

4. To restart the service on the same node

[root@host_dbs02 ~]# clusvcadm -R dbname
Local machine trying to restart service:dbname...Success

5. To move the service to another node
-r takes the database/resource name and -m takes the destination node for the service:

[root@host_dbs02 ~]# clusvcadm -r odssten1 -m host_dbs02_hb


6. To continuously check the status
This checks the status of the cluster every 10 seconds:

clustat -i 10

Other options


[root@host_dbs02 ~]# clusvcadm
usage: clusvcadm [command]

Resource Group Control Commands:
  -v                     Display version and exit
  -d <group>             Disable <group>.  This stops a group
                         until an administrator enables it again,
                         the cluster loses and regains quorum, or
                         an administrator-defined event script
                         explicitly enables it again.
  -e <group>             Enable <group>
  -e <group> -F          Enable <group> according to failover
                         domain rules (deprecated; always the
                         case when using central processing)
  -e <group> -m <member> Enable <group> on <member>
  -r <group> -m <member> Relocate <group> [to <member>]
                         Stops a group and starts it on another
                         cluster member.
  -M <group> -m <member> Migrate <group> to <member>
                         (e.g. for live migration of VMs)
  -q                     Quiet operation
  -R <group>             Restart a group in place.
  -s <group>             Stop <group>.  This temporarily stops
                         a group.  After the next group or
                         cluster member transition, the group
                         will be restarted (if possible).
  -Z <group>             Freeze resource group.  This prevents
                         transitions and status checks, and is
                         useful if an administrator needs to
                         administer part of a service without
                         stopping the whole service.
  -U <group>             Unfreeze (thaw) resource group.  Restores
                         a group to normal operation.
  -c <group>             Convalesce (repair, fix) resource group.
                         Attempts to start failed, non-critical
                         resources within a resource group.
Resource Group Locking (for cluster Shutdown / Debugging):
  -l                     Lock local resource group managers.
                         This prevents resource groups from
                         starting.
  -S                     Show lock state
  -u                     Unlock resource group managers.
                         This allows resource groups to start.

Wednesday, May 13, 2015

QMON Session waiting for 'latch free'


We observed close to 30 sessions waiting on 'latch free', most of them logged on to the system around the same time. All of these sessions belonged to Streams, although we don't use Streams-based replication in our system; all of them were QMON slaves.

All the sessions were on node 1 of a 3-node RAC.
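
A quick, generic way to eyeball such sessions (not the exact query we ran; QMON slaves typically show up with a program name like 'oracle@host (Q001)'):

select inst_id, sid, program, event, logon_time
from   gv$session
where  event = 'latch free'
order  by logon_time;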


Oracle Support suggested running Note 273674.1 (Streams Configuration Report and Health Check Script) to check the Streams setup. Streams was never set up or used.

The next suspect was Data Pump, so we checked our Data Pump logs. A metadata export job had failed; it started at 23:00:01 on May 9, 2015:


Export: Release 11.2.0.4.0 - Production on Sat May 9 23:00:01 2015 

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved. 
;;; 
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production 
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, 
Data Mining and Real Application Testing options 
Starting "SYSTEM"."SYS_EXPORT_FULL_02": system/******** DIRECTORY=datapump_metadata DUMPFILE=0509_XXXXXXX_metadata.dmp FULL=Y LOGFILE=0509_XXXXXXXXX_metadata.log content=metadata_only 
Processing object type DATABASE_EXPORT/TABLESPACE 
... 
... 
.... 
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TABLE 
ORA-39097: Data Pump job encountered unexpected error -12805 
ORA-39065: unexpected master process exception in MAIN 
ORA-12805: parallel query server died unexpectedly 
Job "SYSTEM"."SYS_EXPORT_FULL_02" stopped due to fatal error at Sun May 10 02:30:27 2015 elapsed 0 03:30:20 


We checked the Data Pump job status in dba_datapump_jobs, and the jobs below were orphaned:

OWNER_NAME JOB_NAME             OPERATION  JOB_MODE   STATE        ATTACHED_SESSIONS
---------- -------------------- ---------- ---------- ------------ -----------------
SYSTEM     SYS_EXPORT_FULL_01   EXPORT     FULL       NOT RUNNING                  0
SYS        SYS_EXPORT_TABLE_01  EXPORT     TABLE      NOT RUNNING                  0
SYSTEM     SYS_EXPORT_FULL_02   EXPORT     FULL       NOT RUNNING                  0


We went ahead and purged the master table of the failed job:
drop table SYSTEM.SYS_EXPORT_FULL_02;
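
An alternative to dropping the master table directly is to attach to the orphaned job through the DBMS_DATAPUMP API and stop it with keep_master => 0, which drops the master table for you. A sketch of that approach (we did not use it here, so treat it as untested for this exact state):

DECLARE
  h NUMBER;
BEGIN
  -- attach to the orphaned job by name and owning schema
  h := DBMS_DATAPUMP.ATTACH(job_name  => 'SYS_EXPORT_FULL_02',
                            job_owner => 'SYSTEM');
  -- stop it immediately and drop its master table
  DBMS_DATAPUMP.STOP_JOB(h, immediate => 1, keep_master => 0);
END;
/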

Still, there was a buffered queue left over that had started with the Data Pump job:

select * from gv$buffered_queues;

INST_ID   QUEUE_ID QUEUE_SCHEMA QUEUE_NAME              STARTUP_TIME      NUM_MSGS SPILL_MSGS
------- ---------- ------------ ----------------------- ----------------- -------- ----------
      1   14217557 SYS          KUPC$S_1_20150509230004 09/05/15 23:00:05        0          0


To clean it up, we manually stopped and then dropped the queue:

SQL> exec dbms_aqadm.stop_queue('KUPC$S_1_20150509230004');

PL/SQL procedure successfully completed.

SQL>  exec dbms_aqadm.drop_queue('KUPC$S_1_20150509230004',TRUE); 

PL/SQL procedure successfully completed.

SQL> select INST_ID,QUEUE_ID,QUEUE_SCHEMA,QUEUE_NAME,STARTUP_TIME,NUM_MSGS,SPILL_MSGS from gv$buffered_queues;

no rows selected


All the sessions waiting on 'latch free' cleared, and the instance went back to normal.




Sunday, May 10, 2015

ORA-27086: unable to lock file - already in use

We have a database that uses NAS for its file storage. Due to a power outage the server rebooted and the database crashed.

When the server came back up and we tried to start the database, we got the error below:
ORA-27086: unable to lock file - already in use

It turned out the files were still locked at the Isilon end: since the shutdown was not clean, the NFS locks were still held on the Isilon side.
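
To double-check that the locks were stale on the NAS side rather than held by a leftover local process, fuser on the database host is useful (datafile mount path hypothetical); no output means nothing local has the file open:

[root@db_server ~]# fuser -v /oradata/NNM1/small_data2.dbf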

Below are the commands that helped resolve the issue.

To identify the client host and the NAS node on which the files are locked:

isi_classic nfs clients list |grep  <database hostname>

(The database hostname should be the hostname registered with the NAS; from the NAS frame's point of view, the database host is just a client.)

isi_classic nfs clients list |grep  db_server

17052@db_server/10.38.15.207 exclusive [all] /ifs/netview/prd/NNM1/oradata/NNM1/small_data2.dbf
17052@db_server/10.38.15.207 exclusive [all] /ifs/netview/prd/NNM1/oradata/NNM1/small_index2.dbf
17052@db_server/10.38.15.207 exclusive [all] /ifs/netview/prd/NNM1/oradata/NNM1/small_data4.dbf

To remove the locks:
 
isi_classic nfs clients remove <database hostname>/<NAS node holding the lock>



isi_classic nfs clients remove db_server/10.38.15.207


Database came up fine after this.

Friday, May 1, 2015

ORDIM DICOM repository has 0 documents.

While doing a database upgrade from 9.2.0.8 to 11.2.0.4, the Oracle interMedia (Multimedia) component became INVALID.

While troubleshooting the issue, I came across Oracle Multimedia Component Became INVALID After Upgrade (Doc ID 1497562.1), which addresses this problem.
It suggested executing the following:



SQL> set serveroutput on;
SQL> exec VALIDATE_ORDIM();
ORDIM DICOM repository has 0 documents.
The following default DICOM repository documents are not installed:
ordcman.xml
ordcmcmc.xml
ordcmcmd.xml
ordcmct.xml
ordcmmp.xml
ordcmpf.xml
ordcmpv.xml
ordcmsd.xml
ordcmui.xml

PL/SQL procedure successfully completed.



The validate_ordim procedure gave different output from what the note showed.
So I searched more on this issue and found it was because the DICOM XML schemas had not been loaded:

How To Reload Oracle Multimedia Related Information when XML Database (=XDB) Has Been Reinstalled (Doc ID 965892.1)

Running the reload steps from that note failed with:

declare
*
ERROR at line 1:
ORA-31050: Access denied
ORA-06512: at "ORDSYS.ORD_DICOM_ADMIN_PRV", line 6306
ORA-06512: at "ORDSYS.ORD_DICOM_ADMIN", line 125
ORA-06512: at line 67



That error led to ORA-31050 "Access denied" during Upgrade of Oracle Multimedia (ORDIM) to 11.2.0.2.0 (Doc ID 1274613.1).


That note explained that, even though the XDB component shows as VALID, there was an issue during the upgrade, and said to look for the following in the upgrade log:

  11 END;
  12 /
DECLARE
*
ERROR at line 1:
ORA-31061: XDB error : XML event error
ORA-19202: Error occurred in XML processing (
LSX-00023: unknown namespace URI "0/�Â"
)
ORA-06512: at "XDB.DBMS_XDB", line 201
ORA-06512: at line 9


Although dba_registry showed XDB as VALID, the component had not been completely upgraded due to an incorrect setting of the environment variable LD_LIBRARY_PATH (LIBPATH on AIX, SHLIB_PATH on HP-UX PA-RISC). The solution given was to redo the XDB installation:




  1. Confirm XML DB is not already installed.
    • SELECT username FROM DBA_USERS where username = 'XDB'; returns 0 rows.
    • DESCRIBE RESOURCE_VIEW fails.
  2. As SYS, perform the XML DB installation (a concrete invocation is sketched after this list) - Note: this will not be a complete build due to Oracle Bug 9818995:
    • @?/rdbms/admin/catqm.sql xdb <xml db tablespace> <xdb temp tablespace> <use secure files yes|no>
  3. As SYS, grant the relevant permissions to the XDB user:
    • GRANT EXECUTE ON utl_file TO xdb;
    • GRANT EXECUTE ON dbms_lob TO xdb;
  4. As SYS, ensure there are no invalid objects:
    • @?/rdbms/admin/utlrp.sql
  5. Assuming an spfile is in use, as SYS configure the XDB dispatcher:
    • ALTER SYSTEM SET DISPATCHERS="(PROTOCOL=TCP)(SERVICE=<SID>DB)" SCOPE=both;
    • If init.ora is in use, edit the init.ora file instead.
  6. As SYS, install the missing contents caused by Oracle Bug 9818995:
    • SHUTDOWN IMMEDIATE;
    • STARTUP UPGRADE
    • @?/rdbms/admin/xsdbmig.sql
    • SHUTDOWN IMMEDIATE;
    • STARTUP
  7. As SYS, ensure there are no invalid objects:
    • @?/rdbms/admin/utlrp.sql
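
For step 2, a concrete invocation might look like the following; the password and tablespace names are placeholders to adjust for your environment:

SQL> @?/rdbms/admin/catqm.sql xdb_passwd SYSAUX TEMP YES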


After the reinstall of XDB, I finally ran the interMedia scripts again:


set echo on
spool ord.log
alter session set current_schema="ORDSYS";
@?/ord/im/admin/imxreg.sql;
@?/ord/im/admin/impbs.sql;
@?/ord/im/admin/impvs.sql;
@?/ord/im/admin/imtyb.sql;
@?/ord/im/admin/implb.sql;
@?/ord/im/admin/imxrepos.sql;
spool off



Finally, that resolved the issue and Oracle Multimedia became VALID.
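
As a final sanity check, the component registry can be queried (a standard view; the output shown is what we expect after the fix, not a capture):

SQL> select comp_id, version, status from dba_registry where comp_id in ('XDB', 'ORDIM');

COMP_ID                        VERSION      STATUS
------------------------------ ------------ -------
XDB                            11.2.0.4.0   VALID
ORDIM                          11.2.0.4.0   VALID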