Derived Data Maintenance Architecture and Methodology Supported

The framework assumes the existence of one or more prime facts and dimensions that are maintained from external sources. Transformations and aggregations are encapsulated in defining views which in turn are materialized into physical tables. Two types of such derived structures are supported:

  • First tier data marts, aggregates or MQTs over the prime facts and dimensions.
  • Second tier MQTs or aggregations built on top of first tier objects.

Methods for maintaining the first tier of data marts and derived tables can be:

Full Refresh

Periodically refresh the target from the defining view over the source. This approach has the following characteristics

  • Simplicity
  • Robustness and reliability
  • Isolation from the mechanism that maintains the source. The process if fully autonomous and does not depend on knowledge or coordination with the source maintenance process
  • It is self-correcting (i.e. even if the target is inconsistent with the source prior to the maintenance operation it is consistent after its completion)
  • The disadvantage of this approach is its slow performance and lack of scalability.

Incremental pull (with full comparison)

This approach first compares the state of the target with the defining view over the source, and it only applies deletes, inserts and updates to the target. This approach has the following, advantages and disadvantages:

  • Medium complexity
  • Robustness and reliability
  • Isolation from the mechanism of source maintenance.
  • It is self correcting
  • It is relatively fast compared to the full refresh; however it still depends on a full comparison of target to defining view. The cost of this comparison is proportional to the size of the target.

Incremental delta push

This is also an incremental method; however the deltas are determined at the source and prepared as a side-effect of the ETL process maintaining the prime fact. As long as the delta set is true and complete, and it contains all differences then this method avoids the cost of comparison:

  • Higher complexity
  • Its correctness depends on the completeness of the externally defined delta set.
  • Coupling with the ETL process – the generation of the complete set of deltas is critical to the correctness of the process. Any changes in the ETL process, corrections of prime data, reruns and irregular conditions may cause incomplete sets of deltas to be generated.
  • Idempotent and restartable – the process can execute multiple times and can be restarted after failure without causing distortion or errors in the target. It is however not self-correcting, thus preserving preexisting inconsistencies.
  • It is very fast and scalable as the cost of maintenance is proportional to the size of the changes; not the source or the target.

The recommended approach for new projects is start with self-correcting processes i.e. full refresh or incremental pull with full comparison and delay the implementation of incremental push methodology until ETL processes are stable and operations have matured. Then incremental push methodology will offer best performance and by shortening its frequency can achieve near-time latency. Maintenance of the second tier MQTs, defined on top of the first tier tables, can be delegated to DB2 in the following circumstances:

  1. Replicated dimensions (replicated MQTs)
  2. MQTs that require no recasting
  3. MQTs where the impact of automatic maintenance to their based-on tables is tolerable

User-maintained techniques, supported by more complex templates are appropriate in the following circumstances:

  1. MQTs that require recasting and whose DB2 automatic maintenance overhead is not acceptable.
  2. MQTs on insert-only facts, VDMGEN-Release 1 templates provide the most efficient daily maintenance. If based on retroactively updateable facts, however, use release 2.2 incremental pull or full refresh templates.
  3. MQTs that cannot be defined as system-maintained use user-maintained full refresh or incremental templates.

Custom templates may be needed for cases that fall outside these categories, e.g. MQTs over large volatile facts that are not convertible to insert-only, and have to incur retroactive updates and deletes, and whose automatic maintenance impact to based-on tables is unacceptable require custom templates.

VDM Access: