Successfully identify the need and scope of reusability. Create inventories of reusable objects within a folder, of shortcuts across folders (local shortcuts), or of shortcuts across repositories (global shortcuts).
Successfully identify and create inventories of mappings based on business rules.
Prior to creating an inventory of reusable objects or shortcut objects, be sure to review the business requirements and look for any common routines and/or modules that may appear in more than one data movement. These common routines are excellent candidates for reusable objects or shortcut objects. In PowerCenter, these objects can be created as:
Please note that shortcuts are not supported for workflow-level objects (Tasks).
Identify the need for reusable objects based on the following criteria:
Identify the Scope based on the following criteria:
Creating and testing common objects does not always save development time or simplify future maintenance. For example, if a simple calculation, such as subtracting a current rate from a budget rate, is going to be used in two different mappings, carefully consider whether the effort to create, test, and document the common object is worthwhile. Often, it is simpler to add the calculation to both mappings. However, if the calculation were performed in a number of mappings, if it were complex, and if all occurrences would need to be updated after any change or fix, then the calculation would be an ideal candidate for a reusable object. When you add instances of a reusable transformation to mappings, be careful that subsequent changes to the reusable object do not invalidate the mappings or generate unexpected data. The Designer stores each reusable transformation as metadata, separate from any mapping that uses the transformation.
The second criterion for a reusable object concerns the data that will pass through it. Developers often encounter situations where they perform the same type of high-level process (e.g., a filter, expression, or update strategy) in two or more mappings. For example, if several fact tables require a series of dimension keys, you can create a mapplet containing a series of Lookup transformations to find each dimension key, and then use the mapplet in each fact table mapping rather than recreating the same lookup logic in each one. This seems like an ideal candidate for a mapplet. However, after performing half of the mapplet work, the developers may realize that the actual data or ports passing through the high-level logic differ completely from case to case, making a mapplet impractical. Consider whether there is a practical way to generalize the common logic so that it can be successfully applied to multiple cases. Remember that when you create a reusable object, it is instantiated in one to many mappings; in each mapping that uses the mapplet or reusable transformation, the same number and size of ports must pass into and out of the reusable object.
Document the list of reusable objects that meet these criteria, providing a high-level description of what each object will accomplish. The detailed design will occur in a later subtask; at this point, the intent is to identify the number and functionality of the reusable objects to be built for the project. Keep in mind that it is impossible to identify every reusable object at this stage; the goal is to create an inventory of as many as possible, ideally including the most difficult ones. The remainder will be discovered while building the data integration processes.
In some cases, data must be read from different sources, pass through the same transformation logic, and be written to one or more destination databases. Also, depending on the availability of each source, these loads may have to be scheduled at different times. This is the ideal case for creating a reusable session and applying session overrides at the session-instance level for database connections, pre-session commands, and post-session commands.
Logging load statistics, failure criteria, and success criteria are usually common pieces of logic executed for multiple loads in most projects. Some of these common tasks include:
Reusable worklets can be developed to encapsulate the above-mentioned tasks and can be used in multiple loads. By passing workflow variable values to the worklets and assigning them to worklet variables, common workflow logic can easily be encapsulated.
A mapping is a set of source and target definitions linked by transformation objects that define the rules for data transformation. Mappings represent the data flow between sources and targets. In a simple world, a single source table would populate a single target table. However, in practice, this is usually not the case. Sometimes multiple sources of data need to be combined to create a target table, and sometimes a single source of data creates many target tables. The latter is especially true for mainframe data sources where COBOL OCCURS statements litter the landscape. In a typical warehouse or data mart model, each OCCURS statement decomposes to a separate table.
The goal here is to create an inventory of the mappings needed for the project. For this exercise, the challenge is to think in terms of individual components of data movement. While the business may consider a fact table and its three related dimensions a single object in the data mart or warehouse, five mappings may be needed to populate the corresponding star schema with data (i.e., one for each of the dimension tables and two for the fact table, each from a different source system).
Typically, when creating an inventory of mappings, the focus is on the target tables, with the assumption that each target table has its own mapping (or sometimes multiple mappings). While this is often true, efficiencies can sometimes be realized by loading multiple tables from a single source in one mapping. By focusing only on the target tables, these efficiencies can be overlooked.
A more comprehensive approach to creating the inventory of mappings is to create a spreadsheet listing all of the target tables. Create a column with a number next to each target table. For each of the target tables, in another column, list the source file or table that will be used to populate the table. In the case of multiple source tables per target, create two rows for the target, each with the same number, and list the additional source(s) of data.
The table would look similar to the following:
| Number | Target Table  | Source       |
| ------ | ------------- | ------------ |
| 1      | Customers     | Cust_File    |
| 2      | Products      | Items        |
| 3      | Customer_Type | Cust_File    |
| 4      | Orders_Item   | Tickets      |
| 4      | Orders_Item   | Ticket_Items |
When completed, the spreadsheet can be sorted either by target table or source table. Sorting by source table can help determine potential mappings that create multiple targets.
When using a single source to populate multiple tables at once for efficiency, be sure to keep restartability and reloadability in mind. The mapping will always load two or more target tables from the source, so there will be no easy way to rerun a single table. In this example, the Customers and Customer_Type tables can potentially be loaded in the same mapping.
When merging targets into one mapping in this manner, give both targets the same number. Then, re-sort the spreadsheet by number. For the mappings with multiple sources or targets, merge the data back into a single row to generate the inventory of mappings, with each number representing a separate mapping.
The resulting inventory would look similar to the following:
| Number | Target Table             | Source               |
| ------ | ------------------------ | -------------------- |
| 1      | Customers, Customer_Type | Cust_File            |
| 2      | Products                 | Items                |
| 4      | Orders_Item              | Tickets, Ticket_Items |
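The renumber-and-merge bookkeeping described above can be sketched in plain Python. This is purely illustrative (inventories are usually kept in a spreadsheet, and `merge_inventory` is a hypothetical helper, not a PowerCenter feature); the numbers, targets, and sources mirror the example tables.

```python
from collections import OrderedDict

# One row per target/source pair; rows sharing a number form one mapping.
rows = [
    (1, "Customers",     "Cust_File"),
    (2, "Products",      "Items"),
    (3, "Customer_Type", "Cust_File"),
    (4, "Orders_Item",   "Tickets"),
    (4, "Orders_Item",   "Ticket_Items"),
]

# Customers and Customer_Type share Cust_File, so renumber Customer_Type
# to 1 so that both targets load in one mapping.
rows = [(1 if number == 3 else number, target, source)
        for number, target, source in rows]

def merge_inventory(rows):
    """Collapse rows that share a mapping number into one row per mapping,
    accumulating distinct targets and sources in order of appearance."""
    merged = OrderedDict()
    for number, target, source in rows:
        targets, sources = merged.setdefault(number, ([], []))
        if target not in targets:
            targets.append(target)
        if source not in sources:
            sources.append(source)
    return [(number, " ".join(targets), " ".join(sources))
            for number, (targets, sources) in merged.items()]

merged = merge_inventory(rows)
for row in merged:
    print(row)   # one line per mapping in the final inventory
```

Each tuple in `merged` corresponds to one mapping in the resulting inventory, with each number representing a separate mapping.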
At this point, it is often helpful to record some additional information about each mapping to help with planning and maintenance.
First, give each mapping a name (applying naming standards). These names can then be used to distinguish mappings from one another and can also be put on the project plan as individual tasks.
Next, determine for the project a threshold for a high, medium, or low number of target rows. For example, in a warehouse where dimension tables are likely to number in the thousands of rows and fact tables in the hundreds of thousands, the following thresholds might apply:
Assign a likely row volume (high, medium, or low) to each mapping based on the expected volume of data to pass through it. These high-level estimates help determine how many mappings are high volume; those mappings will be the first candidates for performance tuning.
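The volume-rating step can be sketched as a small classification function. The thresholds, mapping names, and row counts below are hypothetical placeholders to be set per project, not PowerCenter defaults.

```python
def volume_rating(expected_rows, low_max=10_000, medium_max=100_000):
    """Classify a mapping's expected target-row volume as low/medium/high.
    Threshold defaults are illustrative and should be set per project."""
    if expected_rows <= low_max:
        return "low"
    if expected_rows <= medium_max:
        return "medium"
    return "high"

# Expected target rows per mapping (hypothetical figures).
inventory = {
    "m_CUSTOMERS":     8_000,
    "m_CUSTOMER_TYPE":   500,
    "m_ORDERS_ITEM": 750_000,
}
ratings = {name: volume_rating(rows) for name, rows in inventory.items()}

# High-volume mappings are the first candidates for performance tuning.
tuning_candidates = [name for name, rating in ratings.items()
                     if rating == "high"]
print(tuning_candidates)
```

Recording the rating as an extra column in the mapping inventory keeps the tuning candidates visible throughout the project.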
Add any other columns of information that might be useful to capture about each mapping, such as a high-level description of the mapping functionality, resource (developer) assigned, initial estimate, actual completion time, or complexity rating.