If you experience database integrity problems, there are three strategies to consider when approaching the recovery:
Recover with journaling
Restore from backup and redo any lost work
Repair the database
Correcting database errors requires careful planning to achieve the intended result. The strategies differ from one another in the scope of damage they can handle, in the skills needed, and in the database availability they require.
Journaling is generally the most attractive approach to recovery from integrity problems. It allows management of recovery using logical rather than physical constructs, including suppression of updates based on time and/or source and preservation of application-level logical transactions. Backward journal recovery is generally the fastest means of repair. The cost of journaling is the added load it imposes on normal operation to make and store the journal files. For more information on journaling, refer to the "GT.M Journaling" chapter.
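As a sketch only (the region list "*" and the timestamp are placeholders, and qualifier spellings should be checked against the "GT.M Journaling" chapter for your version), a backward recovery might look like:

```shell
# Backward recovery of every journaled region named in the current
# Global Directory; run only after all processes have released the
# database. "*" and the timestamp below are illustrative.
mupip journal -recover -backward "*"

# Recovery can also suppress updates after a known-good point in time:
mupip journal -recover -backward -before="2018/01/15 10:30:00" "*"
```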
Restoring the database from the backup is the least technically sophisticated approach to handling integrity problems. This strategy is most beneficial when the data in the database is static or can be recomputed. In other cases, it requires operational controls to identify, and people to reenter, the work performed between the backup and the failure. For more information on MUPIP BACKUP, RESTORE, EXTRACT, and LOAD, refer to the "MUPIP" chapter. You may also use UNIX utilities such as tar, dump, and restore.
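For illustration, the backup-based strategies can be sketched as follows (the region name, destination directory, and file paths are assumptions; verify the exact syntax in the "MUPIP" chapter):

```shell
# Take a backup of the DEFAULT region while the database is in use:
mupip backup DEFAULT /backups/

# Alternatively, extract globals to a portable format and later reload
# them into a freshly created database file:
mupip extract /backups/app.zwr
mupip load /backups/app.zwr
```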
Some database regions may be set up to hold only temporary data, typically only valid for the life of a GT.M process or even just during some operation performed by a process. Rather than restoring such a region, it is generally more appropriate to delete it and recreate it using MUPIP CREATE.
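A minimal sketch of deleting and recreating such a region (the region name and file path are illustrative; be certain no process still has the file open):

```shell
# Remove the damaged temporary region's database file, then recreate
# it empty from its definition in the Global Directory.
rm -f /var/myapp/scratch.dat
mupip create -region=SCRATCH
```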
Database repair with DSE requires more skill, and potentially more time, than the other approaches. Using DSE requires vigilant attention to, and a clear understanding of, GDS. DSE can generally access and change almost any data in the database file. When using DSE, you assume the responsibility that GT.M normally carries for ensuring the integrity of the database structure. Because DSE may be used concurrently with other processes, updates by concurrent processes may interfere with repair actions. When possible, prevent other users from accessing the region during repairs.
If you elect to repair the database, you may want to seek assistance from an available source of expertise such as FIS or your GT.M Value Added Reseller (VAR). If your organization plans to perform repairs beyond straightforward corrections to the file header, FIS strongly recommends that the responsible person(s) familiarize themselves with the material in the INTEG section of the MUPIP chapter, the GDS and DSE chapters, and this chapter. FIS recommends using DSE on test files, in advance of any work on production files.
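In that spirit, a minimal practice session on a test file might look like the following (the Global Directory path and region name are illustrative; gtmgbldir must point at a test environment, never at production):

```shell
# Inspect the file header of a test database with DSE.
export gtmgbldir=/tmp/test.gld
$gtm_dist/dse <<'EOF'
find -region=DEFAULT
dump -fileheader
exit
EOF
```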
Once you understand the cause of a database integrity problem, you can correct or improve the environment to prevent or minimize future damage. These changes may include hardware reconfiguration, such as improving the quality of power; changes to the operational procedures, such as implementing journaling; and/or changes to the Global Directories, such as balancing data assignment into files of more manageable sizes.
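For example, moving a heavily used global namespace into its own file can be sketched with GDE as follows (all names and paths are hypothetical, and any such change should first be made and verified on a copy of the Global Directory):

```shell
# Map the HIST* namespace to its own region and database file so it
# can be sized and managed independently.
$gtm_dist/mumps -run GDE <<'EOF'
add -segment HISTSEG -file=/var/myapp/hist.dat
add -region HISTREG -dynamic=HISTSEG
add -name HIST* -region=HISTREG
exit
EOF
mupip create -region=HISTREG
```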
Use the following tools to help determine the cause of a database integrity problem.
Knowledge of the application and how it is used
Context dumps produced by application programs
Core dumps produced by application programs
Core dumps produced by GT.M
Interviews with users to discover their actions
Review of all recent changes to hardware, UNIX, GT.M, the application, procedures, etc.
Copies of damaged files
The trail from DSE sessions in the form of notes, a script file recording the session, sequential files, and saved blocks.
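One way to capture such a trail is the standard UNIX script utility, which records an entire interactive terminal session to a file (the log file name is illustrative):

```shell
# Record a complete DSE session, including all commands and output,
# for later review or for transmission to support.
script dse_session.log
$gtm_dist/dse
# ... perform the repair, then EXIT from DSE ...
exit   # ends the recording, closing dse_session.log
```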
The following questions may help you understand the type of information required to determine the nature of a database integrity problem.
How seriously are operations affected?
What level of urgency do you assign to getting the problem resolved?
What were the circumstances under which the database became damaged or inaccessible?
How was the problem first recognized?
Examine the accounting logs for information about recent process terminations. Capture information about what functions were in use. Look for any information which might be helpful in establishing patterns in case the problem is repetitive.
Has the system crashed recently? If so, what caused the crash?
Is there database damage?
What region(s) are affected? What globals?
What are the error messages?
What do you see when you examine the database?
Are you comfortable with fixing the problem?
What version of GT.M are you using? What version of UNIX? What UNIX platform are you running?
Bring down the damaged application using appropriate utilities, then run MUPIP RUNDOWN -REGION region or MUPIP RUNDOWN -FILE file-name, naming the problem database, and restart the application. Consider writing programs or procedures to partially automate shutting down one or all applications, in order to reduce the chance of errors.
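A minimal sketch of the rundown step (the region name and file path are illustrative):

```shell
# Release shared resources for the problem database after stopping all
# processes that access it; name it either by region or by file.
mupip rundown -region DEFAULT
mupip rundown -file /var/myapp/mumps.dat
```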
Make sure to transfer any relevant files or reports to FIS. Please also communicate any information regarding the circumstances surrounding the problem, including the answers to the questions above. Consider the following:
Has any hardware or software component of your system recently changed?
Was anyone doing anything new or unusual?
Was the problem preceded or followed by any other notable events?
Did you have any unusual problems during the analysis or recovery?
Do you have any suggestions about this procedure?