Sequential Master File Update


The Problem/Background

Businesses and organizations often keep data on their customers, inventory, employees, etc., and occasionally need to change that data. If revision of the data is NOT time-critical, or the size of the file makes processing the revisions (updates) in real time infeasible, the master file might be recorded in a sequential file and updated in batch mode.

A complete understanding of this problem includes understanding the nature of sequential files and a few COBOL features used below (in particular, the HIGH-VALUES figurative constant and the READ ... INTO construct).

The sequential master-file update process assumes several things about the files being used: in particular, that both the master file and the transaction file are sorted in ascending order on the same key, and that the keys in the master file are unique.

The transaction file can have records very similar to the master file (i.e., perhaps very large) or it could have a minimal amount of data. The minimal amount of data would include a key, a transaction code, and a new value for one of the data fields in the master file. In general there are three kinds of transactions: additions of new records, changes to existing records, and deletions of existing records.
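As a concrete illustration, minimal transaction records might look like the following Python sketch. (The field names and values are assumptions for illustration, not part of the original.)

```python
# Hypothetical minimal transaction records: a key, a transaction code,
# and (for adds and changes) the new field values.
transactions = [
    {"key": "1001", "code": "add",    "fields": {"name": "Ada", "balance": 0}},
    {"key": "1002", "code": "change", "fields": {"balance": 250}},
    {"key": "1003", "code": "delete"},
]

# Every transaction carries one of the three codes.
print(all(t["code"] in ("add", "change", "delete") for t in transactions))  # True
```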

We will discuss two algorithms for this problem. The first is an elegant approach (it is simple, straightforward, and appeals to my sense of aesthetics). It handles the easy stuff (simple additions, changes, & deletions) and also performs well when asked to make multiple changes to existing records. However, it cannot handle any transactions on a newly added record. Unfortunately, that situation is a reasonable one for many applications.

It might be possible to modify this algorithm and make it work, but I doubt it. The algorithm might be useful for other processes and certainly is useful for updates in which changes to new additions do not occur.

Here is the main logic. (Isn't it sweet?) It makes use of a trick when reading the files--when you get to the end of a file, place a value higher than any key value into the key field for that file. In COBOL, using this trick requires "reading into"--reading and then moving the data to a record in the WORKING-STORAGE SECTION. (Learned from George Struble while TAing at the University of Oregon.)

read-an-old-master-record
read-a-transaction-record

while  not-at-end-of-BOTH-files

    if  master-key < transaction-key
    then
	write-master-file-from-master-record
	read-an-old-master-record
    else
    if  master-key = transaction-key
    then
	attempt-to-apply-transaction-to-master-record
	read-a-transaction-record
    else 
    if  master-key > transaction-key
    then
	attempt-to-add-transaction-record
	read-a-transaction-record
    end-if end-if
end-while

Here are some of the routines called by the main logic. I went ahead and wrote the read routines in COBOL.

to read-an-old-master-record

    READ  MASTER-FILE  INTO  MASTER-RECORD-WS
	AT END
	    MOVE  HIGH-VALUES  TO  MASTER-KEY
    END-READ.


to read-a-transaction-record

    READ  TRANSACTION-FILE  INTO  TRANSACTION-RECORD-WS
	AT END
	    MOVE  HIGH-VALUES  TO  TRANSACTION-KEY
    END-READ.
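The same sentinel trick can be sketched in Python: a reader that, once the file is exhausted, returns a dummy record whose key sorts after every real key--the analogue of COBOL's HIGH-VALUES. (The dictionary record layout here is an illustrative assumption.)

```python
HIGH_VALUES = "\uffff"  # sorts after any ordinary key string

def make_reader(records):
    """Wrap a sequence of records in a read routine with the sentinel behavior."""
    it = iter(records)
    def read():
        try:
            return next(it)
        except StopIteration:
            # At end of file: return a record whose key is higher than any real key.
            return {"key": HIGH_VALUES}
    return read

read_master = make_reader([{"key": "A01"}, {"key": "B02"}])
print(read_master()["key"])                  # A01
print(read_master()["key"])                  # B02
print(read_master()["key"] == HIGH_VALUES)   # True
```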


to attempt-to-apply-transaction-to-master-record

    if  transaction-code = add
    then
	report-error(adding record with existing key)
    else 
    if  transaction-code = change
    then
	apply-the-change-to-the-master-record
    else 
    if  transaction-code = delete
    then
	read-an-old-master-record
    else
	report-error(invalid transaction code)
    end-if end-if end-if


to attempt-to-add-transaction-record

    if  transaction-code = add
    then
	write-master-file-from-transaction-record
    else 
    if  transaction-code = change
    then
	report-error(changing non-existent record)
    else 
    if  transaction-code = delete
    then
	report-error(deleting non-existent record)
    else
	report-error(invalid transaction code)
    end-if end-if end-if
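Putting the main logic and its routines together, the whole first algorithm can be sketched in Python. This is a hedged sketch, not the original COBOL: the record layout (dicts with a "key" field), the transaction codes, and the helper names are assumptions for illustration.

```python
HIGH_VALUES = "\uffff"  # sentinel key, the analogue of COBOL's HIGH-VALUES

def make_reader(records):
    it = iter(records)
    def read():
        try:
            return next(it)
        except StopIteration:
            return {"key": HIGH_VALUES}
    return read

def update_master(old_master, transactions):
    """First ("elegant") algorithm: merge sorted master and transaction files.
    Note its documented limitation: it cannot apply changes or deletes to a
    record added in the same run."""
    new_master, errors = [], []
    read_m = make_reader(old_master)
    read_t = make_reader(transactions)
    m, t = read_m(), read_t()
    while m["key"] != HIGH_VALUES or t["key"] != HIGH_VALUES:
        if m["key"] < t["key"]:
            new_master.append(m)            # no transaction: copy master record
            m = read_m()
        elif m["key"] == t["key"]:
            if t["code"] == "add":
                errors.append(("add of existing key", t["key"]))
            elif t["code"] == "change":
                m = {**m, **t["fields"]}    # apply change, keep record current
            elif t["code"] == "delete":
                m = read_m()                # drop the record by reading past it
            else:
                errors.append(("invalid code", t["key"]))
            t = read_t()
        else:                               # m["key"] > t["key"]
            if t["code"] == "add":
                new_master.append({"key": t["key"], **t["fields"]})
            elif t["code"] in ("change", "delete"):
                errors.append((t["code"] + " of non-existent key", t["key"]))
            else:
                errors.append(("invalid code", t["key"]))
            t = read_t()
    return new_master, errors

old = [{"key": "A"}, {"key": "B"}, {"key": "C"}]
txs = [{"key": "B", "code": "change", "fields": {"qty": 5}},
       {"key": "C", "code": "delete"},
       {"key": "D", "code": "add", "fields": {"qty": 1}}]
new_file, errs = update_master(old, txs)
print([r["key"] for r in new_file])  # ['A', 'B', 'D']
```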

An algorithm for overcoming the inability to modify newly added records is the balance-line algorithm (I don't know where the name came from). I think there are two notable differences between it and the one above. First, in the main logic, instead of repeatedly processing a file record, we repeatedly process a key (all the records with a given key). The second difference is that we use a "current record" to hold the data instead of either the record from the (old) master file or the record from the transaction file. Using this holding record requires quite a bit of maneuvering to handle all the situations that can arise.

This algorithm is not as elegant as the previous one -- I have to work much harder to understand it. It does have one advantage -- it works. First, the main logic. (Adapted from: Barry Dwyer, "One More Time--How to Update a Master File," Communications of the ACM, 24(1), January 1981.)

to update-a-sequential-master-file

    open-the-three-files

    read-a-transaction-record
    read-an-old-master-record

    choose-the-current-key

    while  current-key NOT= high-values
	process-transactions-with-current-key
	choose-the-current-key
    end-while

The "read" routines are the same as above. Listed below are some other significant modules in the algorithm.

to choose-the-current-key

    if  transaction-key < old-master-key
    then
	move  transaction-key  to  current-key
    else
	move  old-master-key  to  current-key
    end-if


to process-transactions-with-current-key

    establish-current-master-record

    while  transaction-key = current-key
	process-one-transaction
    end-while

    if  master-record-established
    then
	write-new-master-file-from-current-record
    end-if


to establish-current-master-record

    if  old-master-key = current-key
    then
	move  old-master-record  to  current-record
	move  true  to master-record-established
	read-an-old-master-record
    else
	move  false  to  master-record-established
    end-if


to process-one-transaction

    attempt-to-apply-the-transaction

    read-a-transaction-record

The "attempt-to-apply-the-transaction" module will deal with all transactions. It will also need to make sure that each transaction is actually valid (i.e., not adding a record with an existing key and not changing or deleting a non-existent record).
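Putting the balance-line modules together, here is a hedged Python sketch (the record layout, transaction codes, and helper names are assumptions, not the original COBOL). Note that, unlike the first algorithm, it correctly applies a change to a record added earlier in the same run, because all transactions for one key are applied to a single "current record" before it is written.

```python
HIGH_VALUES = "\uffff"  # sentinel key, the analogue of COBOL's HIGH-VALUES

def make_reader(records):
    it = iter(records)
    def read():
        try:
            return next(it)
        except StopIteration:
            return {"key": HIGH_VALUES}
    return read

def balance_line_update(old_master, transactions):
    """Balance-line algorithm: process all transactions for one key at a time
    against a 'current record'."""
    new_master, errors = [], []
    read_m = make_reader(old_master)
    read_t = make_reader(transactions)
    m, t = read_m(), read_t()
    while True:
        current_key = min(m["key"], t["key"])   # choose-the-current-key
        if current_key == HIGH_VALUES:
            break
        # establish-current-master-record
        if m["key"] == current_key:
            current, established = dict(m), True
            m = read_m()
        else:
            current, established = None, False
        # process-transactions-with-current-key
        while t["key"] == current_key:
            if t["code"] == "add":
                if established:
                    errors.append(("add of existing key", current_key))
                else:
                    current = {"key": current_key, **t["fields"]}
                    established = True
            elif t["code"] == "change":
                if established:
                    current.update(t["fields"])
                else:
                    errors.append(("change of non-existent key", current_key))
            elif t["code"] == "delete":
                if established:
                    established = False   # record will simply not be written
                else:
                    errors.append(("delete of non-existent key", current_key))
            else:
                errors.append(("invalid code", current_key))
            t = read_t()
        if established:
            new_master.append(current)
    return new_master, errors

old = [{"key": "A"}]
txs = [{"key": "D", "code": "add",    "fields": {"qty": 1}},
       {"key": "D", "code": "change", "fields": {"qty": 7}}]
new_file, errs = balance_line_update(old, txs)
print([r["key"] for r in new_file])  # ['A', 'D']
print(new_file[1]["qty"])            # 7
```

The change to the newly added record D succeeds here precisely because D is held in the current record until all transactions with key D have been seen.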

Please, let me know if you find problems with either of these algorithms or the discussion of them.