Businesses and organizations often keep data on their customers, inventory, employees, etc., and occasionally need to change that data. If revision of the data is NOT time-critical, or the size of the file makes processing the revisions (updates) in real time infeasible, the master file might be recorded in a sequential file and updated in batch mode.
A complete understanding of this problem includes understanding the nature of the files involved and some COBOL features:
The sequential master-file update process assumes several things about the files being used.
The transaction file can have records very similar to those in the master file (i.e., perhaps very large), or it can carry a minimal amount of data: a key, a transaction code, and a new value for one of the data fields in the master record. In general there are three kinds of transactions:
We will discuss two algorithms for this problem. The first is an elegant approach (it is simple, straightforward, and appeals to my sense of aesthetics). It handles the easy stuff (simple additions, changes, & deletions) and also performs well when asked to make multiple changes to existing records. However, it cannot handle any transactions on a newly added record. Unfortunately, that situation is a reasonable one for many applications.
It might be possible to modify this algorithm and make it work, but I doubt it. The algorithm might be useful for other processes and certainly is useful for updates in which changes to new additions do not occur.
Here is the main logic. (Isn't it sweet?) It makes use of a trick when reading the files--when you get to the end of a file, place a value higher than any key-value into the key field for that file. In COBOL, using this trick requires "reading into" or reading and then moving data to a record in the working storage section. (Learned from George Struble while TAing at the University of Oregon.)
    read-an-old-master-record
    read-a-transaction-record
    while not-at-end-of-BOTH-files
        if master-key < transaction-key then
            write-master-file-from-master-record
            read-an-old-master-record
        else if master-key = transaction-key then
            attempt-to-apply-transaction-to-master-record
            read-a-transaction-record
        else if master-key > transaction-key then
            attempt-to-add-transaction-record
            read-a-transaction-record
        end-if end-if
    end-while
Here are some of the routines called by the main logic. I went ahead and wrote the read routines in COBOL.
    to read-an-old-master-record
        READ MASTER-FILE INTO MASTER-RECORD-WS
            AT END MOVE HIGH-VALUES TO MASTER-KEY
        END-READ.

    to read-a-transaction-record
        READ TRANSACTION-FILE INTO TRANSACTION-RECORD-WS
            AT END MOVE HIGH-VALUES TO TRANSACTION-KEY
        END-READ.

    to attempt-to-apply-transaction-to-master-record
        if transaction-code = add then
            report-error(adding record with existing key)
        else if transaction-code = change then
            apply-the-change-to-the-master-record
        else if transaction-code = delete then
            read-an-old-master-record
        else
            report-error(invalid transaction code)
        end-if end-if end-if

    to attempt-to-add-transaction-record
        if transaction-code = add then
            write-master-file-from-transaction-record
        else if transaction-code = change then
            report-error(changing non-existent record)
        else if transaction-code = delete then
            report-error(deleting non-existent record)
        else
            report-error(invalid transaction code)
        end-if end-if end-if
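Putting the main loop and these routines together, here is a runnable Python sketch of the whole first algorithm. The record representation (dicts with a "key", transactions carrying "code" and "fields") is my assumption, not the original COBOL layout:

```python
HIGH = "\uffff"  # stands in for COBOL's HIGH-VALUES

def reader(records):
    """Yield each record in order, then an endless stream of sentinel records."""
    for rec in records:
        yield rec
    while True:
        yield {"key": HIGH}

def update_master(master, transactions, errors):
    """First algorithm: merge a sorted master file with a sorted transaction file."""
    new_master = []
    mit, tit = reader(master), reader(transactions)
    mrec, trec = next(mit), next(tit)
    while mrec["key"] != HIGH or trec["key"] != HIGH:
        if mrec["key"] < trec["key"]:
            new_master.append(mrec)                # no transaction: copy forward
            mrec = next(mit)
        elif mrec["key"] == trec["key"]:
            # attempt-to-apply-transaction-to-master-record
            if trec["code"] == "add":
                errors.append("adding record with existing key " + trec["key"])
            elif trec["code"] == "change":
                mrec = {**mrec, **trec["fields"]}  # stays current for more changes
            elif trec["code"] == "delete":
                mrec = next(mit)                   # dropped: never written out
            else:
                errors.append("invalid transaction code " + trec["code"])
            trec = next(tit)
        else:
            # attempt-to-add-transaction-record
            if trec["code"] == "add":
                new_master.append({"key": trec["key"], **trec["fields"]})
            elif trec["code"] in ("change", "delete"):
                errors.append(trec["code"] + " of non-existent record " + trec["key"])
            else:
                errors.append("invalid transaction code " + trec["code"])
            trec = next(tit)
    return new_master
```

The flaw shows up directly in this sketch: an added record is written to the new master immediately, so a later change to that same key falls into the "non-existent record" branch.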
An algorithm for overcoming the inability to modify newly added records is the balance-line algorithm (I don't know where the name came from). I think there are two notable differences between it and the one above. First, in the main logic, instead of repeatedly processing a file record, we repeatedly process a key (all the records with a given key). The second difference is that we use a "current record" to hold the data instead of either the record from the (old) master file or the record from the transaction file. Using this holding record requires quite a bit of maneuvering to handle all the situations that can arise.
This algorithm is not as elegant as the previous one -- I have to work much harder to understand it. It does have one advantage -- it works. First, the main logic. (Adapted from: Barry Dwyer, "One More Time--How to Update a Master File," Communications of the ACM, 24(1).)
    to update-a-sequential-master-file
        open-the-three-files
        read-a-transaction-record
        read-an-old-master-record
        choose-the-current-key
        while current-key NOT= high-values
            process-transactions-with-current-key
            choose-the-current-key
        end-while
The "read" routines are the same as above. Listed below are some other significant modules in the algorithm.
    to choose-the-current-key
        if transaction-key < old-master-key then
            move transaction-key to current-key
        else
            move old-master-key to current-key
        end-if

    to process-transactions-with-current-key
        establish-current-master-record
        while transaction-key = current-key
            process-one-transaction
        end-while
        if master-record-established then
            write-new-master-file-from-current-record
        end-if

    to establish-current-master-record
        if old-master-key = current-key then
            move old-master-record to current-record
            move true to master-record-established
            read-an-old-master-record
        else
            move false to master-record-established
        end-if

    to process-one-transaction
        attempt-to-apply-the-transaction
        read-a-transaction-record
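Under the same assumptions as before (dict records, a HIGH sentinel that sorts after every real key), a Python sketch of the balance-line algorithm looks like this. Note that it happily applies changes to a record added earlier in the same run, because the current record is not written until every transaction with its key has been seen:

```python
HIGH = "\uffff"  # stands in for COBOL's HIGH-VALUES

def reader(records):
    """Yield each record in order, then an endless stream of sentinel records."""
    for rec in records:
        yield rec
    while True:
        yield {"key": HIGH}

def balance_line_update(master, transactions, errors):
    """Balance-line algorithm: process one key at a time via a current record."""
    new_master = []
    mit, tit = reader(master), reader(transactions)
    mrec, trec = next(mit), next(tit)
    while True:
        current_key = min(trec["key"], mrec["key"])   # choose-the-current-key
        if current_key == HIGH:
            break
        # establish-current-master-record
        if mrec["key"] == current_key:
            current, established = dict(mrec), True
            mrec = next(mit)
        else:
            current, established = None, False
        # process every transaction carrying the current key
        while trec["key"] == current_key:
            code = trec["code"]
            if code == "add":
                if established:
                    errors.append("adding record with existing key " + current_key)
                else:
                    current, established = {"key": current_key, **trec["fields"]}, True
            elif code == "change":
                if established:
                    current = {**current, **trec["fields"]}
                else:
                    errors.append("changing non-existent record " + current_key)
            elif code == "delete":
                if established:
                    established = False
                else:
                    errors.append("deleting non-existent record " + current_key)
            else:
                errors.append("invalid transaction code " + code)
            trec = next(tit)
        if established:                               # write only if it survived
            new_master.append(current)
    return new_master
```

The maneuvering lives in the `established` flag: an add turns it on, a delete turns it off, and only records still established when the key is exhausted reach the new master file.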
The "attempt-to-apply-the-transaction" module will deal with all transactions. It will also need to make sure that the transactions are actually valid (i.e., not adding an existing record and not changing or deleting a non-existent record).
Please let me know if you find problems with either of these algorithms or the discussion of them.