Repairing the Database with DSE

When doing repairs with DSE, understanding the nature of the information in the database provides a significant advantage in choosing an appropriate and efficient repair design.

For example, if you know that certain data is purged weekly, and you find damage in some of this type of data that is already five or six days old, you may be able to discard rather than repair it. Similarly, you might find damage to a small cross-index global and have a program that can quickly rebuild it.

When you know what the data "looks" like, you are in a much better position to recognize anomalies and clues in both keys and data. For example, if you understand the format of a particular type of node, you might recognize a case where two pieces of data have been combined into a single GDS record.

Using the Proper Database File

Because DSE lets you perform arbitrary actions without imposing any logical constraints, you must ensure that they are applied to the proper file.

First, verify that GTM$GBLDIR names an appropriate Global Directory. Check the definition with the SHOW LOGICAL command. You may create or use Global Directories that differ from the "normal" Global Directory. For instance, you might create a Global Directory that maps all global names except a normally unused name to the file with integrity problems, and maps that unused name to a new file. Then you could use MUPIP to CREATE the new file and use DSE to SAVE blocks from the damaged file and RESTORE them to the new file for later analysis.

When you initiate DSE, it operates on the default region specified by the Global Directory. Once DSE is invoked, use FIND /REGION to determine the available regions, and then FIND /REGION= to select the appropriate region. The technique of creating a temporary Global Directory, with the target region for the repair as the default region, prevents accidental changes to the wrong region.

Locating Structures with DSE

DSE provides the FIND command and the RANGE command for locating information.

FIND /REGION= redirects DSE actions to a specified region.

FIND /BLOCK= locates a block by using the key in the first record of the block to try to look up that block through the B-tree index. If the block is not part of the tree, or the indexing of the block is damaged, DSE reports that the search failed.

FIND /SIBLING /BLOCK= operates like FIND /BLOCK=; however, it also reports the numbers of the blocks that logically fall before and after the specified block on the same level.

FIND /EXHAUSTIVE /BLOCK= locates a block by looking through the B-tree index for any pointer to the block. This should find the block in the case where the block is connected to the tree but the first key in the block does not match the index path. FIND /EXHAUSTIVE is useful in locating all paths to a "doubly allocated" block.
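
The difference between the key-guided search and the exhaustive search can be sketched with a toy model. The Python fragment below is illustrative only: it does not reflect the GDS on-disk format or the way DSE is implemented, and the block numbers, keys, and the function names find_by_first_key and find_exhaustive are hypothetical.

```python
# Toy model only: block numbers, keys, and layout are hypothetical.
# An index block holds (key, child) records; a key of None stands for a
# catch-all last record. A data block holds a sorted list of keys.
INDEX, DATA = "index", "data"
ROOT = 1
blocks = {
    1: (INDEX, [("m", 2), (None, 3)]),  # keys up to "m" -> block 2, the rest -> block 3
    2: (DATA, ["a", "f", "m"]),
    3: (DATA, ["c", "z"]),              # damaged: first key "c" belongs under block 2
}


def find_by_first_key(target):
    """Key-guided lookup: descend from the root using the target's first key.

    Returns the path of block numbers, or None when the descent ends
    somewhere other than the target (the search fails).
    """
    kind, records = blocks[target]
    key = records[0][0] if kind == INDEX else records[0]
    path = [ROOT]
    while path[-1] != target:
        kind, records = blocks[path[-1]]
        if kind == DATA:
            return None  # landed in a different data block
        for rec_key, child in records:
            if rec_key is None or (key is not None and key <= rec_key):
                path.append(child)
                break
        else:
            return None  # no record covers the key
    return path


def find_exhaustive(target):
    """Follow every index record reachable from the root; list each path to target."""
    found, stack = [], [[ROOT]]
    while stack:
        path = stack.pop()
        kind, records = blocks[path[-1]]
        if kind != INDEX:
            continue
        for _, child in records:
            if child == target:
                found.append(path + [child])
            elif child not in path:  # guard against pointer loops
                stack.append(path + [child])
    return found
```

In this model the key-guided search for block 3 fails, because its first key leads the descent into block 2, while the exhaustive search still finds the pointer from block 1. A block reachable by more than one path would appear in the exhaustive result once per path.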

FIND /KEY= uses the index to locate the level zero (0) block, or data block, containing the key. If the key does not exist, it uses the index to locate the block in which it would reside. Note that FIND only works with the index as currently composed. In other words, it cannot FIND the "right" place, only the place pointed to by the index at the time the command is issued. These two locations should be, and may well be, the same; however, remind yourself to search for and take into account all information describing the failure.

FIND /FREE /HINT locates the "closest" free block to the hint. This provides a tool for locating blocks to add to the B-tree, or to hold block copies created with SAVE that would otherwise be lost when DSE exits. FIND /FREE relies on the bitmaps to locate its target, so be sure to fix any blocks incorrectly marked "FREE" before using this command.
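
Why the bitmaps must be correct before using FIND /FREE can be seen in a small sketch. The Python fragment below is an assumption-laden illustration, not DSE's algorithm: it takes "closest" to mean the smallest numeric distance from the hint, breaks ties toward the lower block number, and uses a hypothetical list of busy flags in place of real bitmaps.

```python
def closest_free_block(bitmap, hint):
    """Return the free block number nearest to hint, or None if nothing is free.

    bitmap[n] is True when block n is marked busy (hypothetical representation).
    Assumption: "closest" means smallest distance; ties go to the lower number.
    """
    free = [n for n, busy in enumerate(bitmap) if not busy]
    if not free:
        return None
    return min(free, key=lambda n: (abs(n - hint), n))


# Block 5 is really in use but is incorrectly marked free in this toy bitmap.
bitmap = [True, True, True, True, True, False, True, True, True, False]
candidate = closest_free_block(bitmap, hint=4)
```

Here the search trusts the bitmap and proposes block 5, even though that block holds live data. Anything written there would overwrite it, which is why incorrectly marked blocks should be fixed first.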

The RANGE command sifts through blocks looking for keys. RANGE checks blocks without regard to whether they are in the B-tree, and without regard to whether they are marked free or busy in the bitmaps. RANGE provides a brute force way to find a key if it exists and can be very time consuming in a large database. Note that RANGE may report blocks that were previously used and were legitimately removed from the tree by an M KILL command.
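
The brute-force nature of RANGE, and the reason it can report stale blocks, can be sketched as follows. This Python fragment is a toy model with hypothetical block numbers and keys; the function name range_scan and the dictionary layout are assumptions made for illustration and say nothing about how DSE stores or scans blocks.

```python
# Toy model only: every block is examined, whatever the tree or bitmaps say.
blocks = {
    2: {"keys": ["a", "f"], "in_tree": True,  "busy": True},
    3: {"keys": ["p", "z"], "in_tree": True,  "busy": True},
    7: {"keys": ["f", "g"], "in_tree": False, "busy": False},  # left over from an earlier KILL
}


def range_scan(lower, upper):
    """Return every block number holding a key in [lower, upper].

    Deliberately ignores "in_tree" and "busy", so the cost grows with the
    total number of blocks rather than with the height of the tree.
    """
    return [
        number
        for number, block in sorted(blocks.items())
        if any(lower <= key <= upper for key in block["keys"])
    ]
```

Scanning for keys between "f" and "g" in this model returns both block 2 and block 7. Block 7 is no longer part of the tree, so a hit of this kind has to be checked against the index before it is treated as live data.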

Safety in Repairs

DSE is a powerful tool with few restrictions that places great responsibility on the user. Establishing the following habits can greatly increase the safety margin.

  • Plan your fallback strategy before starting repairs with DSE.

    This will enable you to make the best choice between repair and restore and/or recovery strategies as your analysis proceeds. In addition, you will be able to reasonably assess the potential risks of your decision.

  • Determine, at least approximately, the extent of the damage, and how much work has been done since the last backup.

  • Check the existence, dates, and sizes of all files; do not assume that everything is as it "should" be.

  • Estimate the time required to restore and redo the work. Determine if there are special circumstances, such as imminent deadlines.

  • Consider whether you have the disk space to pursue two courses in parallel.

  • Consider whether you should back up the damaged database for additional protection or for later analysis.

  • Before changing any block in the database, always use the DSE SAVE command to make an in-memory copy of that block.

    If a modification fails to accomplish its intended goal, you can use the DSE RESTORE command to get the block back to its previous state. For instance, a CHANGE /BSIZ= that specifies a smaller block size causes DSE to discard all information falling beyond the new size.

    An important aspect of this strategy is recognizing that testing some modifications requires using other tools such as MUPIP INTEG, but once you leave DSE to invoke MUPIP you lose anything saved in memory. To avoid this problem, use SPAWN to access those tools.

    To save a copy of the block for further analysis, SAVE it, and then RESTORE it to an empty block. The best place to put such a copy, using RESTORE /REGION=, is in a special region created just to receive such blocks.

    Alternatively, you can RESTORE it temporarily in a free block within the region, preferably near the end of the file. If you RESTORE the block to the original database, it may be overlaid when normal operation requires more blocks. You may prevent this overlay by using MAP /BUSY on the target block of the RESTORE. However, this causes INTEG to report "incorrectly marked busy" errors.

  • After changing a block, always check the quality of the result by using the DSE INTEG command.

    DSE INTEG does not check the placement of the block in the tree. It checks only the single block specified explicitly with the /BLOCK= qualifier or implicitly (the current block) when /BLOCK= is omitted. If you need to verify the index structure related to a block, SPAWN and use MUPIP INTEG /REGION /FAST, possibly with the /BLOCK or /SUBSCRIPT qualifiers.

    Specifying /BLOCK= tends to avoid incorrect assumptions about which block DSE last handled. Not specifying /BLOCK= tends to minimize typographical errors in identifying the block.
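
The save-before-change habit described above can be sketched in a few lines. The Python fragment below is a toy model, not DSE: a block is represented by a hypothetical bytearray, the block numbers are invented, and the functions save, change_bsiz, and restore only imitate the roles of the DSE SAVE, CHANGE /BSIZ=, and RESTORE commands.

```python
# Toy model only: a "block" is a bytearray and the block numbers are hypothetical.
database = {0x1A: bytearray(b"header|rec1|rec2|rec3")}
saved = {}  # in-memory copies; like blocks saved in DSE, gone when the session ends


def save(block_no):
    """Keep an immutable copy of the block as it is right now."""
    saved.setdefault(block_no, []).append(bytes(database[block_no]))


def change_bsiz(block_no, new_size):
    """Shrink the block; everything beyond new_size is discarded."""
    del database[block_no][new_size:]


def restore(block_no, target=None):
    """Write the most recent saved copy back, or into another block number."""
    destination = block_no if target is None else target
    database[destination] = bytearray(saved[block_no][-1])


save(0x1A)                   # copy first
change_bsiz(0x1A, 11)        # destructive change: rec2 and rec3 are gone
restore(0x1A, target=0x40)   # keep the original image in a spare block for analysis
restore(0x1A)                # the change did not help: put the original back
```

Without the initial save, the truncated records could not be recovered in this model. The copies also live only in the process's memory, which mirrors the reason for using SPAWN rather than leaving DSE to run other tools.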

Discarding Data

When you must discard a block or a record, take steps to preserve or create structures that have integrity.

DSE has no single command that discards a block. You must locate the last block in its path with FIND [/BLOCK] or FIND /EXHAUSTIVE and REMOVE the record that points to the block being discarded. Then MAP the deleted block /FREE.

When you discard the only record in any block you must MAP that block /FREE and REMOVE the record (up one level) that points to the deleted block. The only exception is when it is the only block pointed to by the root block of the tree. Leaving empty blocks (except as the data level of empty or undefined globals) violates standard operating assumptions of GDS databases.
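
The cascade described above, including the exception for the only block under the root, can be sketched with a toy model. The Python fragment below is illustrative only: the block numbers, the records dictionary, and the function names parent_of and discard_record are hypothetical, and the model ignores everything about real GDS blocks except which block points to which.

```python
# Toy model only: index blocks list child block numbers, data blocks list keys.
ROOT = 1
records = {
    1: [2],         # root of the tree, pointing to block 2
    2: [3, 4],      # index block
    3: ["a", "b"],  # data block
    4: ["x"],       # data block holding a single record
}
busy = {1, 2, 3, 4}  # blocks marked busy in the bitmap


def parent_of(block_no):
    """Return the block holding a pointer to block_no, or None for the root."""
    for candidate, recs in records.items():
        if block_no in recs:
            return candidate
    return None


def discard_record(block_no, record):
    """Remove one record; if that empties the block, unlink and free it,
    repeating upward. The root and the only block under the root stay."""
    log = [("REMOVE", block_no, record)]
    records[block_no].remove(record)
    while not records[block_no]:
        parent = parent_of(block_no)
        if parent is None or (parent == ROOT and records[ROOT] == [block_no]):
            break  # allowed to remain empty
        records[parent].remove(block_no)   # unlink first ...
        log.append(("REMOVE", parent, block_no))
        busy.discard(block_no)             # ... then mark the block free
        log.append(("MAP FREE", block_no))
        del records[block_no]
        block_no = parent
    return log
```

Discarding "x" from block 4 in this model also removes the pointer to block 4 from block 2 and frees block 4. Emptying block 3 afterwards leaves block 2 with no records, but block 2 is kept because it is the only block the root points to.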

When you must discard the top block in a Global Variable Tree, you can alternatively use the method employed by GT.M when it processes a KILL command. This method maintains a record of the global variable name. To use this method, use FIND /FREE to locate a free block, and MAP the new block /BUSY. Next, CHANGE the new block /BSIZ=header-size (7/8) /LEVEL=0. Finally, CHANGE the top level block /BSIZ=header-size (7/8) /LEVEL=1 and ADD /STAR /POINTER=the-new-block.

Never delete the only remaining record in block one (1). Block one (1) is the root block of the Directory Tree for the entire file.

Concurrent Repairs

DSE can operate concurrently with normal access by the GT.M run-time system. This lets you perform an investigation and some types of repairs with minimal disruption.

Some repairs should only be undertaken by a process that has standalone access to the database, while other repairs present no danger when performed with other users accessing the file. However, there is still some risk with the latter type of repairs, depending on the "placement" of the error and the likelihood of concurrent access to that area of the database.

Unless availability is a critical problem, FIS recommends performing all repairs in standalone mode to ensure the safety of data. For environments where availability is an issue, your knowledge of the application and how it is used are the best guides in assessing the risk of performing concurrent repairs. To help you assess the amount of risk, the following sections identify repairs that should only be undertaken with standalone access.

If you attempt concurrent repairs, plan the order of your updates carefully. Always REMOVE the index record that points to a block before using MAP /FREE on that block. Always MAP a block /BUSY and assure that it meets GDS design criteria and accomplishes the repair goal before using ADD to create an index record that points to that block.
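
The two ordering rules above lend themselves to a simple check of a planned sequence of updates. The Python fragment below is a hypothetical planning aid, not part of DSE or MUPIP: the operation names and the function check_plan are invented for illustration, and the check covers only the ordering, not whether a block meets GDS design criteria.

```python
def check_plan(plan):
    """Check a planned repair for the two ordering rules.

    plan is a list of (operation, block_number) pairs using the hypothetical
    operation names REMOVE_POINTER_TO, MAP_FREE, MAP_BUSY, and ADD_POINTER_TO.
    Returns a list of messages, empty when the ordering is acceptable.
    """
    problems = []
    unlinked = set()     # blocks whose index record has already been removed
    marked_busy = set()  # blocks already marked busy in this plan
    for step, (operation, block) in enumerate(plan, start=1):
        if operation == "REMOVE_POINTER_TO":
            unlinked.add(block)
        elif operation == "MAP_BUSY":
            marked_busy.add(block)
        elif operation == "MAP_FREE" and block not in unlinked:
            problems.append(
                f"step {step}: block {block:X} is mapped free before its index record is removed"
            )
        elif operation == "ADD_POINTER_TO" and block not in marked_busy:
            problems.append(
                f"step {step}: an index record is added for block {block:X} before it is mapped busy"
            )
    return problems


good = [("REMOVE_POINTER_TO", 0x2F), ("MAP_FREE", 0x2F),
        ("MAP_BUSY", 0x40), ("ADD_POINTER_TO", 0x40)]
bad = [("MAP_FREE", 0x2F), ("REMOVE_POINTER_TO", 0x2F),
       ("ADD_POINTER_TO", 0x40), ("MAP_BUSY", 0x40)]
```

The rationale in both directions is the same: at no point should a concurrent process be able to follow an index record to a block that the bitmaps consider free.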

Terminating Processes

In performing some types of repairs, you may have to stop one or more processes. You can choose from several methods.

  • If the process' principal device is not available, or the process does not respond to pressing <CTRL-C>, use MUPIP STOP. This allows GT.M to disengage the process from all shared resources, such as I/O devices and open database files.

  • The DSE command CRITICAL /INITIALIZE /RESET causes GT.M to terminate all images that are actively accessing the target database. Its effect on each process is similar to that of the DCL command STOP, except that it simultaneously terminates all processes actively using a database.

  • Finally, if the process does not respond to MUPIP STOP, use the DCL command STOP. This terminates the process abruptly and may leave database files improperly closed and require a MUPIP RUNDOWN. However, DCL STOP should not, by itself, damage database contents.

When you have terminated all processes, perform a MUPIP RUNDOWN on all database files.

To preserve database integrity, always verify that all GT.M images have terminated and all GDS databases are RUNDOWN before shutting down your system.