Balanced Configuration Unit (BCU)

BCU Overview

The BCU provides a scalable performance ratio of disk I/O to memory to CPU to network. A balanced configuration minimizes the bottlenecks that limit the overall performance of a data warehouse. Of course, it is important to remember that bottlenecks exist in all computer systems. For example, if a server can drive a peak I/O capability of 3 GB per second yet it is attached to a storage subsystem capable of only half that throughput, the server might be up to 50% under-utilized when the workload on the system requires table scans.

When attempting to balance a system it is important to consider the intended workload. If the applications demand a high number of concurrent users or use complex queries, resulting in a workload that is more CPU intensive than I/O intensive, a higher CPU-to-database-partition ratio might be the correct balanced configuration. It is also true that configuring storage capable of supporting the full I/O demand a server can generate might be expensive, and is seldom required; in that case the I/O capability can be reduced, resulting in cost savings.

The standard BCUs recommended by IBM cover a wide spectrum of workload requirements and are based on the characteristics observed from many IBM data warehouse customers. BCUs are not monolithic: the I/O, memory, and CPU proportions can, to some extent, be adjusted depending on the type of workload and the concurrency requirements. Although these proportions can be tweaked, a drastic departure from the standard inevitably results in bottlenecks and costly under-utilization of resources. The system is sized by arriving at the appropriate number of BCUs based on active data size, type of workload, and concurrency. This leads to configurations that are likely to perform well and do not waste resources. As an example, consider the standard BCU 7000.2 offered by IBM® for enterprise warehouses, as defined in 2007, based on the following specification:

Processor and Memory Capacity

The CPU and memory requirements for the BCU are met by the following (a memory-per-core comparison is sketched after the list):

  • On 2008 technology, one server with eight P5 processors @ 2.2 GHz and 32 GB of memory
  • On 2010 technology, half a server with eight P6 processors @ 5 GHz and 64 GB of memory, which represents an upgrade in CPU capability
  • On 2012 technology, appliance-based solutions based on P7 servers with 48 cores and 384 GB of memory, handling a much denser partition configuration in terms of CPU capability
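
To see how the memory-to-CPU proportion has shifted across these generations, the short Python sketch below computes memory per core for each configuration in the list. It assumes that each P5/P6 "processor" corresponds to one core and that the P7 figures apply to one appliance server; both are reading assumptions, not part of the BCU specification.

    # Memory-per-core comparison for the three generations listed above.
    # Assumption: each P5/P6 "processor" in the list is one core, and the
    # P7 figures are per appliance server as described.
    generations = [
        ("2008 P5 BCU (one server)",     8,  32),   # (label, cores, memory in GB)
        ("2010 P6 BCU (half a server)",  8,  64),
        ("2012 P7 appliance server",    48, 384),
    ]

    for label, cores, mem_gb in generations:
        print(f"{label}: {mem_gb / cores:.0f} GB of memory per core")
    # -> 4, 8 and 8 GB of memory per core respectively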

Total Storage - 12 TB

This is the storage accessible by each BCU and includes not only the active data being queried, but also inactive data, temporary tables, system space, and so on. Later P7-based systems, using one to four cores per partition, can support 24 to 48 TB or more of disk storage per server.

Active Data - 3 to 4 TB after compression

This is the size of the data that is actually queried. It does not include system and temporary data, passive data used during loading, archived history, or other tables that are rarely or never accessed. Active data size is a good predictor of the I/O capability required. More recent P7 48-core based configurations push this limit to 24 TB and higher.
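
Because active data size drives the sizing exercise, a rough estimate of the number of BCUs can be computed directly from it. The sketch below is a minimal illustration using the 2007 BCU 7000.2 figure of 3-4 TB of active data per BCU; the workload multipliers, concurrency factor, and function name are hypothetical placeholders, not IBM guidance.

    import math

    # Rough BCU-count sizing sketch based on the figures quoted in this article.
    ACTIVE_DATA_PER_BCU_TB = 3.5          # midpoint of the 3-4 TB range
    WORKLOAD_FACTOR = {
        "simple_reporting": 0.8,          # I/O-light, low concurrency (assumed)
        "mixed": 1.0,
        "complex_adhoc": 1.3,             # CPU-heavy, high concurrency (assumed)
    }

    def estimate_bcus(active_data_tb, workload="mixed", concurrency_factor=1.0):
        """Estimate the number of BCUs from active data size, workload type,
        and a concurrency multiplier, rounding up to whole BCUs."""
        raw = active_data_tb / ACTIVE_DATA_PER_BCU_TB
        return math.ceil(raw * WORKLOAD_FACTOR[workload] * concurrency_factor)

    print(estimate_bcus(30))                    # 30 TB active data, mixed -> 9 BCUs
    print(estimate_bcus(30, "complex_adhoc"))   # CPU-intensive workload   -> 12 BCUs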

I/O Throughput of 1500 MB/sec (approximately 16 Gbps of link bandwidth)

This rate must be sustainable with the cache saturated. It roughly corresponds to 16 Gbps, or 4 x 4 Gbps FC ports fully utilized. This throughput must be supported by all the components in the path from server to disks, including the host adapters on the servers, the SAN, and the storage system adapters and controllers. As the number of BCUs increases, the controllers may become bottlenecks and the storage must be spread over multiple units. A lower-end BCU 7000.1 was defined with 750 MB/sec throughput and was intended to support departmental marts and simple workloads - e.g. call centers. A number of early installations have been built on this specification, but it is inappropriate for enterprise-level data warehouses. For data mining and higher-demand workloads, more recent BCU standards provide higher I/O throughput for comparable CPU capability. It is important to mention that I/O throughput has improved over time at a slower pace than CPU capacity, so other options, such as local disk cache, have been introduced in recent BCU models to accommodate the widening gap.
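
As a quick sanity check on the port arithmetic above, the sketch below shows that four fully utilized 4 Gbps Fibre Channel ports deliver roughly the 1500 MB/sec target. It assumes the standard 8b/10b link encoding of 4 Gbps FC and ignores any other protocol overhead.

    # Can 4 x 4 Gbps Fibre Channel ports carry ~1500 MB/sec?
    FC_PORT_GBPS = 4
    PORTS = 4
    ENCODING_EFFICIENCY = 8 / 10          # 8b/10b link encoding

    per_port_mb_s = FC_PORT_GBPS * 1000 / 8 * ENCODING_EFFICIENCY   # ~400 MB/s
    total_mb_s = per_port_mb_s * PORTS                              # ~1600 MB/s
    print(f"{per_port_mb_s:.0f} MB/s per port, {total_mb_s:.0f} MB/s over {PORTS} ports")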

I/O transfer rate - 50,000 IOPS (I/O transfers per second)

This is a different way of measuring I/O capability, and is appropriate for transactional activity where the amount of data moved with each request is small and the limit on I/O capability depends on the number of transfers, not their size. For data warehouse workloads, where large amounts of data per request are the norm, the IOPS limits imposed by the storage systems are less likely to be a bottleneck; consequently the dominant measure of I/O capability is throughput. Other workloads with more complex, less well-clustered, and smaller but higher-performance requests, such as those supporting decision support or reporting systems, can benefit from lower latency and higher IOPS - however, the cost of additional IOPS is also declining at a slower pace in the industry. Look for InfiniBand and newer Ethernet protocols to address some of these shortcomings.
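
To make the distinction concrete, the sketch below relates the two measures via throughput = IOPS x transfer size. The request sizes used are illustrative assumptions, not part of the BCU specification.

    # Throughput and IOPS are linked by the request size:
    #   throughput (MB/s) = IOPS x transfer size (MB)
    def throughput_mb_s(iops, transfer_kb):
        return iops * transfer_kb / 1024

    # Transactional/reporting pattern: many small transfers hit the IOPS ceiling first.
    print(throughput_mb_s(50_000, 8))      # ~391 MB/s at the 50,000 IOPS limit
    # Warehouse scan pattern: fewer, larger transfers hit the throughput ceiling first.
    print(throughput_mb_s(6_000, 256))     # ~1500 MB/s from only 6,000 IOPS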

I/O Discussion

The most stringent requirement for configuring the I/O subsystem is the sustained throughput. The "share nothing" requirement already imposes the use of a large number of expensive, small, fast drives to spread the work so that there are no bottlenecks. These disks can be packed onto dedicated, fully loaded storage units. Each such unit typically has two I/O controllers and adequate ports to sustain a certain level of I/O capability. This capability is usually limited, and in order to achieve the specified aggregate capability we need to use multiple storage units, each with its own controllers, ports, and so on, among which the disks must be divided. This leads to a more expensive solution because the units are only partially filled with disks.

In order to satisfy the minimum 1500 MB/sec per BCU for 10 BCUs (5 servers), we need to ensure that the storage units can sustain an aggregate of 15 GB/sec of throughput. We can scale the system by assigning two servers (4 BCUs) to one storage unit, as shown in the figure below, if each storage unit can sustain 6-8 GB/sec. If the limit is 3-4 GB/sec, then each server must have its own storage unit. An in-between solution (3 servers to 2 units) is also possible but more complex to grow.

Figure: BCU - two servers per storage unit example

It is important to point out that the 1500 MB/sec used here is according to the 2007 definition of a BCU and should be viewed as conservative for the 2010 P6 servers.
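
The arithmetic behind the server-to-storage-unit ratio can be summarized in a short sketch. It uses the figures from this section (1500 MB/sec per BCU under the 2007 definition, two BCUs per P6 server); the function name and the per-unit throughput values passed in are illustrative assumptions.

    import math

    # Sketch of the server-to-storage-unit sizing argument above.
    BCU_THROUGHPUT_MB_S = 1500     # 2007 BCU definition
    BCUS_PER_SERVER = 2            # P6 layout: half a server per BCU

    def storage_units_needed(servers, unit_sustained_mb_s):
        aggregate_mb_s = servers * BCUS_PER_SERVER * BCU_THROUGHPUT_MB_S
        return math.ceil(aggregate_mb_s / unit_sustained_mb_s)

    # 5 servers (10 BCUs) require an aggregate of 15,000 MB/s.
    print(storage_units_needed(5, 6000))   # units sustaining 6 GB/s -> 3 units
    print(storage_units_needed(5, 3000))   # units sustaining 3 GB/s -> 5 units (one per server)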

Quantifying the Risks

If the disk configuration limits the I/O capability of the system, repairing the configuration and adding the appropriate bandwidth will cause disruption and rework. By adopting a comfortable ratio of servers to storage units, we run less risk of having to correct the situation in the future. On the other hand, if we do not actually reach the utilization assumed by the BCU specification, we may have overpaid for a capability until we add the next application or expand the data warehouse; at that point we will be able to extend the units to a cost-effective level based on actual measurements. How much more is it going to cost? It depends on how we can break up the frames and still sustain the minimum required throughput. In one instance I am aware of, this difference ranged from as low as $100K to over $250K.

Attachment: BCU_io_SAN.jpg (92.85 KB)