Quantcast
Channel: Jared Still – Official Pythian Blog
Viewing all articles
Browse latest Browse all 34

World Backup Day – Protect Your Oracle Databases

$
0
0

In recognition of World Backup Day, I will discuss backups as they pertain to Oracle databases.

There are a number of options for backing up, testing backups, and restoring them. In fact, the number of possibilities can be a little overwhelming. Backing up does not necessarily need to be too complex. It is best to start off simple, and then modify the backup procedures as needed.

This article is not a tutorial by any means, but just an overview of some of the methods for backing up your database, and testing those backups. For some readers, this may be a refresher, and for those new to Oracle, it is a brief introduction to some features of the Oracle RMAN utility.

Making a backup can be as simple as the RMAN command backup database. Most backups, however, are more complex than that. Rather than spend much time on backups though, I actually want to talk about testing backups.

Backups that do not work when a restore and recovery attempt is made are the things that nightmares are made of. Making sure the backups are actually useful is something that cannot be overstated.

My interest in backups

First a little background on why I tend to be rather particular about making backups and being certain that a backup can be used to successfully restore a database. The two incidents below are from long ago; pre-Pythian and pre-Oracle (at least for me). The technologies used will make this obvious.

Inventory System

Long ago I was responsible for an inventory system. The fact that this inventory software ran on 8-inch floppy disks will provide clues to a couple of things:

  • it was long ago
  • it could not have been a large inventory (thankfully, as you will see)

One day something happened to the inventory disk. I do not recall just what happened, but I do remember that the entire inventory had to be re-entered.

By hand.

By me.

You would think I would have learned my lesson and start making regular backups. But no. If you have never made backups on 8-inch floppy disks, you do not really understand the meaning of the word ‘tedium’.

But of course, there was another crash, the disk was trash, and I had to re-enter all of the inventory.

Again.

By hand.

That was powerful lesson. But wait, there’s more!

The Orders System

At a somewhat later stage, I was at a different job, working onsite at a client site where they ran our custom software. My job that day was to upgrade the version of Xenix (Google it!) on the client’s machine. And of course, before doing that I made a backup. This backup was to tape, and so fairly simple to make.

The backup was made via the Unix (Linux not yet invented) tar command, directly to the tape device. Pretty standard stuff for the time. When the backup was complete, I ran ‘tar tvf /dev/tape’ to get a directory listing of the backup as a means of verification. The command was successful, all the correct files were there.

Next steps:

  • format the system drive
  • install later version of Xenix
  • reload backup software and data files

The first two steps were fine. That final, and crucial step, however, was not so good. It seems a colleague from the employer had upgraded the RAM the previous day. One step in that process was missed. Everything appeared to be OK after that. The problem was the memory used by the tape driver was not handled correctly.

Every file on that backup was nothing but garbage, except for the header. The headers were fine, but the contents were not.

This was not a good day, week, or month.

Should I ever seem rather persnickety and nitpicky about backups and restores, there are good reasons for that.

Backups are not that important

I say that only in jest; of course backups are important. But are there some things more important than backups?

Restores: restores are very important. Even more important is the recovery, as the database cannot be opened until the recovery step completes.

How is recovery different than restore? Recovery is the process that brings the restored files to a point where the database can be opened. Often this point is a particular time or SCN (System Change Number) that is specified at the time the database is recovered. This is referred to as an ‘incomplete’ recovery.

A ‘complete recovery’ is one where ALL redo from the archive logs is applied to the datafiles, applying all changes to the database.

Sure, you need backups to do a restore. But if you never test your backups, you may as well just not make them. A backup that cannot be used when a restore is necessary can make life very difficult indeed. You want to do everything you can to protect your data.

There is much more to know about Oracle backups, but for the purpose of discussing the validation of backups, the backup will be kept simple.

Testing Oracle Backups

The best way to test a backup is to restore it.

While this is the ideal, it is not always practical, and sometimes not even possible due to the size of the database. Very few companies are willing to spend the money and time needed to have a server available just for testing a restore of a 152-terabyte database.

The next best thing is to validate the restores. Here, Oracle has you covered with RMAN methods to validate that backups can indeed be restored.

Backup Validate

First on the list is backup validate. I know, I said this would be about restoring databases. However, the backup validate command can help you determine the condition of your database, spotting physical corruption in the database, and if you choose, it can also locate and record logical discrepancies.

The command can be as simple as backup validate check logical database

This will take some time, depending on the size of the database, as Oracle will read all required database files. This operation can be performed online and is non-destructive, as no backup files are created, and no files are altered.

Read more here: Backup Validate

Make a Backup

The following is the RMAN script used to create a backup that is used for some following commands.

run {
  allocate channel ch1 device type DISK;
  allocate channel ch2 device type DISK;
  backup incremental level=0 tag "RMAN-TEST" format '/mnt/oracle-backups/rac19c01/%d_T%T_db_s%s_p%p_t%t' database filesperset 4 plus archivelog tag "RMAN-TEST";
  alter system switch logfile;
  backup format '/mnt/oracle-backups/rac19c01/%d_T%T_arch_s%s_p%p_t%t' archivelog all filesperset 4 tag "RMAN-TEST";
  backup format '/mnt/oracle-backups/rac19c01/%d_T%T_cf%s_p%p_t%t' current controlfile tag "RMAN-TEST";
}

Restore Preview

Restore preview is fast, very fast. It is fast because it really doesn’t do much. The preview simply reads the metadata from the backup catalogue. This is useful just to ensure that the catalogue has captured the files necessary for a restore. This command does not read any files, just the metadata.

RMAN> restore preview database;

Starting restore at 2020-03-02 17:30:50
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=300 instance=cdb1 device type=DISK


List of Backup Sets
===================


BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ -------------------
24      Incr 0  840.32M    DISK        00:00:20     2020-03-02 17:14:57
        BP Key: 24   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_db_s33_p1_t1034010877
  List of Datafiles in backup set 24
  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name
  ---- -- ---- ---------- ------------------- ----------- ------ ----
  1    0  Incr 80321492   2020-03-02 17:14:37              NO    +DATA/CDB/DATAFILE/system.259.1021126503
  4    0  Incr 80321492   2020-03-02 17:14:37              NO    +DATA/CDB/DATAFILE/undotbs1.261.1021126575
  9    0  Incr 80321492   2020-03-02 17:14:37              NO    +DATA/CDB/DATAFILE/undotbs2.274.1021127203

...


List of Archived Log Copies for database with db_unique_name CDB
=====================================================================

Key     Thrd Seq     S Low Time
------- ---- ------- - -------------------
810     1    457     A 2020-03-02 17:14:19
        Name: +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_457.346.1034010983

812     1    458     A 2020-03-02 17:16:23
        Name: +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_458.342.1034010991

813     1    459     A 2020-03-02 17:16:31
        Name: +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_459.340.1034010995

811     2    359     A 2020-03-02 17:14:19
        Name: +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_359.344.1034010985

814     2    360     A 2020-03-02 17:16:25
        Name: +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_360.339.1034010997

recovery will be done up to SCN 80321491
Media recovery start SCN is 80321491
Recovery must be done beyond SCN 80321782 to clear datafile fuzziness
Finished restore at 2020-03-02 17:30:52

At the end of the listing, you will find that media recovery must be done beyond SCN 80321782 for a successful recovery.

However, the listing does not include that SCN information. It is there, we just need to look at the archive logs explicitly:

RMAN> list backup of archivelog all;

List of Backup Sets
===================


BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
3       325.40M    DISK        00:00:06     2020-03-02 17:05:26
        BP Key: 3   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_db_s11_p1_t1034010320

  List of Archived Logs in backup set 3
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    451     80282070   2020-03-02 15:00:03 80302686   2020-03-02 16:22:34
  1    452     80302686   2020-03-02 16:22:34 80318226   2020-03-02 17:05:18
  2    355     80302690   2020-03-02 16:22:36 80318223   2020-03-02 17:05:17

...

  List of Archived Logs in backup set 43
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    457     80321365   2020-03-02 17:14:19 80321856   2020-03-02 17:16:23
  1    458     80321856   2020-03-02 17:16:23 80321897   2020-03-02 17:16:31
  1    459     80321897   2020-03-02 17:16:31 80321912   2020-03-02 17:16:34
  2    360     80321860   2020-03-02 17:16:25 80321989   2020-03-02 17:16:37

...

We see 80321860, which is well past 80321782 and so we are confident that at least the required files have been recorded in the backup catalogue.

Note: By default, RMAN uses a date formate of ‘YYYY-MON-DD’.

This behavior can be altered by setting the NLS_DATE_FORMAT shell variable:

$ export NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS'

$ rman target /
...

Restore Validate

Does the knowledge that your backup files are recorded in the backup catalogue give you a warm, fuzzy feeling of contentment?

It shouldn’t. A database cannot be restored with metadata alone. Not knowing that backups have been tested is the kind of thing that should keep a DBA awake at night.

Knowing that you need proper rest, let’s see another method to test backups, restore validate.

RMAN can validate the files used for a restore. This is not just a test to see if the files are available, but a read of all blocks in the backup files to ensure they are valid.

The following is an example, using the backup that was made previously:

RMAN> restore validate database;

Starting restore at 2020-03-02 17:52:37
using channel ORA_DISK_1

channel ORA_DISK_1: starting validation of datafile backup set
channel ORA_DISK_1: reading from backup piece /mnt/oracle-backups/rac19c01/CDB_T20200302_db_s32_p1_t1034010877
channel ORA_DISK_1: piece handle=/mnt/oracle-backups/rac19c01/CDB_T20200302_db_s32_p1_t1034010877 tag=RMAN-TEST
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01

...

channel ORA_DISK_1: starting validation of datafile backup set
channel ORA_DISK_1: reading from backup piece /mnt/oracle-backups/rac19c01/CDB_T20200302_db_s43_p1_t1034010975
channel ORA_DISK_1: piece handle=/mnt/oracle-backups/rac19c01/CDB_T20200302_db_s43_p1_t1034010975 tag=RMAN-TEST
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: validation complete, elapsed time: 00:00:07
Finished restore at 2020-03-02 17:54:11

This is a small test database; the validation required only about two minutes.

It’s okay to get some warm and fuzzies now, but don’t get too cozy. There are other types of files that must be restored for a recovery to take place.

Let’s check on the archive log files. We know that all the required archive logs were created after 13:00 today, as the backup was made at about 17:14, and the alert log shows several log switches between 17:00 and 17:14

2020-03-02T17:07:24.248403-08:00
Thread 1 advanced to log sequence 455 (LGWR switch)
  Current log# 1 seq# 455 mem# 0: +DATA/CDB/ONLINELOG/group_1.265.1021126651
  Current log# 1 seq# 455 mem# 1: +DATA/CDB/ONLINELOG/group_1.268.1021126653
2020-03-02T17:07:24.320618-08:00
ARC2 (PID:30093): Archived Log entry 805 added for T-1.S-454 ID 0x7f6be175 LAD:1

So now we can validate that archive logs can be read:

RMAN>  restore validate archivelog  from time "to_date('2020-03-02 17:00:00','yyyy-mm-dd hh24:mi:ss')";

Starting restore at 2020-03-02 18:09:58
using channel ORA_DISK_1

channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_452.341.1034010319
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_453.349.1034010441
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_454.352.1034010445
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_455.351.1034010445
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_456.347.1034010859
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_457.346.1034010983
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_458.342.1034010991
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_459.340.1034010995
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_355.353.1034010317
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_356.319.1034010441
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_357.350.1034010447
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_358.348.1034010859
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_359.344.1034010985
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_360.339.1034010997
Finished restore at 2020-03-02 18:10:12

Let’s also see if the controlfile is there:

RMAN>  restore validate controlfile;

Starting restore at 2020-03-02 18:13:09
using channel ORA_DISK_1

channel ORA_DISK_1: starting validation of datafile backup set
channel ORA_DISK_1: reading from backup piece +DATA/CDB/AUTOBACKUP/2020_03_02/s_1034011019.338.1034011019
channel ORA_DISK_1: piece handle=+DATA/CDB/AUTOBACKUP/2020_03_02/s_1034011019.338.1034011019 tag=TAG20200302T171659
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
Finished restore at 2020-03-02 18:13:11

Finally, a list of the backup by tag (you do use tags, don’t you?) can be used to show that all files are present:

RMAN> list backupset tag "RMAN-TEST";


List of Backup Sets
===================


BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
3       325.40M    DISK        00:00:06     2020-03-02 17:05:26
        BP Key: 3   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_db_s11_p1_t1034010320

...

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
44      4.32M      DISK        00:00:00     2020-03-02 17:16:54
        BP Key: 44   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s53_p1_t1034011014

  List of Archived Logs in backup set 44
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    453     80318226   2020-03-02 17:05:18 80318656   2020-03-02 17:07:20
  1    454     80318656   2020-03-02 17:07:20 80318683   2020-03-02 17:07:24
  2    356     80318223   2020-03-02 17:05:17 80318660   2020-03-02 17:07:21
  2    357     80318660   2020-03-02 17:07:21 80318695   2020-03-02 17:07:27

BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ -------------------
45      Full    19.08M     DISK        00:00:01     2020-03-02 17:16:58
        BP Key: 45   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_cf54_p1_t1034011017
  Control File Included: Ckp SCN: 80322102     Ckp time: 2020-03-02 17:16:57

Now you can rest easy, as the data files, archivelogs and controlfile are all present in the backup made earlier.

The only way to be more sure of the backup is to use it to restore and recover the database.

Docs here: RESTORE

Recover Validate Header

While recover validate may sound like restore validate on steroids, it really isn’t.

According to the documentation, this command is nearly the same as restore preview. The difference is that when recover validate is used, the file headers are read to ensure they match what is found in the backup catalogue.

This would provide a little more assurance than the restore preview, but not as much as the restore validate.

The problem here is that in this 19.3 database, recover database validate header is examining only the archive log backups:

MAN> recover database validate header;

Starting recover at 2020-03-02 18:43:11
using channel ORA_DISK_1
using channel ORA_DISK_1


List of Backup Sets
===================


BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
40      173.87M    DISK        00:00:04     2020-03-02 17:16:49
        BP Key: 40   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s49_p1_t1034011005

  List of Archived Logs in backup set 40
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    449     80240390   2020-03-02 12:18:51 80261546   2020-03-02 13:39:51

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
38      199.81M    DISK        00:00:05     2020-03-02 17:16:43
        BP Key: 38   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s47_p1_t1034010998

  List of Archived Logs in backup set 38
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  2    354     80240394   2020-03-02 12:18:51 80302690   2020-03-02 16:22:36
  1    450     80261546   2020-03-02 13:39:51 80282070   2020-03-02 15:00:03

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
39      174.02M    DISK        00:00:05     2020-03-02 17:16:43
        BP Key: 39   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s48_p1_t1034010998

  List of Archived Logs in backup set 39
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    451     80282070   2020-03-02 15:00:03 80302686   2020-03-02 16:22:34

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
41      151.39M    DISK        00:00:05     2020-03-02 17:16:50
        BP Key: 41   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s50_p1_t1034011005

  List of Archived Logs in backup set 41
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    452     80302686   2020-03-02 16:22:34 80318226   2020-03-02 17:05:18
  2    355     80302690   2020-03-02 16:22:36 80318223   2020-03-02 17:05:17

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
44      4.32M      DISK        00:00:00     2020-03-02 17:16:54
        BP Key: 44   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s53_p1_t1034011014

  List of Archived Logs in backup set 44
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  2    356     80318223   2020-03-02 17:05:17 80318660   2020-03-02 17:07:21
  1    453     80318226   2020-03-02 17:05:18 80318656   2020-03-02 17:07:20
  1    454     80318656   2020-03-02 17:07:20 80318683   2020-03-02 17:07:24
  2    357     80318660   2020-03-02 17:07:21 80318695   2020-03-02 17:07:27

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
42      15.46M     DISK        00:00:00     2020-03-02 17:16:53
        BP Key: 42   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s51_p1_t1034011013

  List of Archived Logs in backup set 42
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    455     80318683   2020-03-02 17:07:24 80318691   2020-03-02 17:07:24
  1    456     80318691   2020-03-02 17:07:24 80321365   2020-03-02 17:14:19
  2    358     80318695   2020-03-02 17:07:27 80321362   2020-03-02 17:14:19
  2    359     80321362   2020-03-02 17:14:19 80321860   2020-03-02 17:16:25

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
43      5.46M      DISK        00:00:00     2020-03-02 17:16:53
        BP Key: 43   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s52_p1_t1034011013

  List of Archived Logs in backup set 43
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    457     80321365   2020-03-02 17:14:19 80321856   2020-03-02 17:16:23
  1    458     80321856   2020-03-02 17:16:23 80321897   2020-03-02 17:16:31
  2    360     80321860   2020-03-02 17:16:25 80321989   2020-03-02 17:16:37
  1    459     80321897   2020-03-02 17:16:31 80321912   2020-03-02 17:16:34
validation succeeded for backup piece
recovery will be done up to SCN 2163919
Media recovery start SCN is 2163919
Recovery must be done beyond SCN 80321989 to clear datafile fuzziness
Finished recover at 2020-03-02 18:43:13

At this time I do not know if this is due to a bug in the code, or a bug in the documentation.

Either way, restore validate gives me the kind of assurance I need that my backups can be restored.

Learn more about Pythian’s Oracle Services.


Viewing all articles
Browse latest Browse all 34

Trending Articles