ABSTRACT: This research describes and analyzes schemes for managing decision support databases that are extracted from a central database and "downloaded" to personal workstations. Unlike a true distributed database system, where updates are propagated to maintain consistency, these remote "snapshots" are updated only periodically ("refreshed") upon command of the remote workstation user. This approach to data management offers many of the advantages that a distributed database has over a centralized one (e.g., reduced communication costs, improved response time for retrievals, and reduced contention), but it avoids the high concurrency-control overhead associated with updating in a distributed database. The cost of this approach is reduced data consistency, since a snapshot reflects the central database only as of its last refresh. The schemes analyzed include full regeneration, the scheme used by System R*, and two new schemes. One new scheme, called modified regeneration, is a variation on simple full regeneration of the snapshot, but it transmits only the relevant changes to the snapshot. The other new scheme uses a difference table of relevant updates. Algorithm descriptions, models of processing and communication costs, analytical and numerical comparisons of performance, and a qualitative evaluation are included. We conclude that the difference-table approach is the most robust scheme; that the System R* scheme has the lowest cost only in a rather limited range of environments; and that the modified-regeneration scheme is attractive because of its simplicity and flexibility. The results and models presented here could be used by a DBMS "refresh optimizer" to determine the best scheme to employ as a function of refresh frequency, update rate, and various processing and communication cost parameters.
Key words and phrases: decision support systems, distributed databases, database snapshots, consistency in distributed databases