Following a rigorous methodology is key to delivering customer satisfaction and expanding analytics use cases across the business.
The optimal planned configuration of database servers to support the operational workloads requires sizing. Database sizing involves estimating the types and sizes of the components of a data architecture to produce a sizing document. An effective sizing activity involves a number of different roles including data architects, database administrators, and business analysts. Database administrators may also require size estimates for database capacity planning.
The first step in database sizing is to review the system requirements to define:
One way to estimate projections of data growth over time is to use scenario analysis. As an example, for scenario analysis of a sales tracking data mart a number of sales transactions to be stored can be used as the basis for the sizing estimate. In the first year, 10 million sales transactions are expected; this equates to 10 million fact-table records.
Next, use the sales growth forecasts for the upcoming years for database growth calculations. That is, an annual sales growth rate of 10 percent translates into 11 million fact table records for the next year. At the end of five years, the fact table is likely to contain about 60 million records. Calculate other estimates based on five-percent annual sales growth (case 1) and 20-percent annual sales growth (case 2). Multiple projections for best and worst case scenarios can be very helpful.
Consider that growth may not be a linear percentage. Looking at Fibonacci or exponential growth number sets for a period of time may help plan and cost some aspects of data growth.
Oracle (10g and onwards) provides a mechanism to predict the growth of a database. This feature is useful for predicting table space requirements.
Oracle incorporates a table space prediction model in the database engine that provides projected statistics for space used by a table. The following Oracle 10g query returns projected space usage statistics for tables with data:
The QUALITY column indicates the quality of the output as follows:
Next, use the physical data models for the sources and the target architecture to develop a baseline sizing estimate. The administration guides for most databases contain sizing guidelines for the various database structures such as tables, indexes, sort space, data files, log files, and database cache.
Develop a detailed sizing using a worksheet inventory of the tables and indexes from the physical data model, along with field data types and field sizes. Various database products use different storage methods for data types. For this reason, be sure to use the database manuals to determine the size of each data type. Add up the field sizes to determine row size. Then use the data volume projections to determine the number of rows to multiply by the table size.
The default estimate for index size is to assume same size as the table size. Also estimate the temporary space for sort operations. For data warehouse applications where summarizations are common, plan on large temporary spaces. The temporary space can be as much as 1.5 times larger than the largest table in the database.
Another approach that is useful is to load the data architecture with representative data and determine the resulting database sizes. This test load can be a fraction of the actual data and is used only to gather basic sizing statistics. Then need to apply growth projections to these statistics. For example, after loading ten thousand sample records to the fact table, size is determined to be 10MB. Based on the scenario analysis, expect this fact table to contain 60 million records after five years. So, the estimated size for the fact table is about 60GB [i.e., 10 MB * (60,000,000/10,000)]. Do not forget to add indexes and summary tables to the calculations.
When there is not enough information to calculate an estimate as described above, use educated guesses and rules of thumb to develop as reasonable an estimate as possible.
It is very often the case that there is much more data than expected. Therefore Informatica recommends adding a reasonable margin of safety to the calculations. From a risk perspective it is better to over purchase some storage, especially if it is of the inexpensive kind, than to find the project is held up later and additional hardware related costs are incurred as storage has to again be acquired.
‘Database Sizing Model’ calculator is contained in ‘Database Capacity Planning’ accelerator.
Success
Link Copied to Clipboard