Tuning Mappings for Better Performance

Success

Manage your Success Plans and Engagements, gain key insights into your implementation journey, and collaborate with your CSMs

Success

Success Accelerators

Accelerate your Purchase to Value engaging with Informatica Architects for Customer Success

My Engagements

All your Engagements at one place
Communities

A collaborative platform to connect and grow with like-minded Informaticans across the globe

Communities

Product Communities

Connect and collaborate with Informatica experts and champions

Discussions

Have a question? Start a Discussion and get immediate answers you are looking for

User Groups

Customer-organized groups that meet online and in-person. Join today to network, share ideas, and get tips on how to get the most out of Informatica

Get Started

Community Guidelines
Knowledge Center

Troubleshooting documents, product guides, how to videos, best practices, and more

Knowledge Center

Knowledge Base

One-stop self-service portal for solutions, FAQs, Whitepapers, How Tos, Videos, and more

Support TV

Video channel for step-by-step instructions to use our products, best practices, troubleshooting tips, and much more

Documentation

Information library of the latest product documents

Velocity (Best Practices)

Best practices and use cases from the Implementation team
Learn

Rich resources to help you leverage full capabilities of our products

Learn

Trainings

Role-based training programs for the best ROI

Certifications

Get certified on Informatica products. Free, Foundation, or Professional

Product Learning Paths

Free and unlimited modules based on your expertise level and journey

Experience Lounge

Self-guided, intuitive experience platform for outcome-focused product capabilities and use cases
Resources

Library of content to help you leverage the best of Informatica products

Resources

Tech Tuesdays Webinars

Most popular webinars on product architecture, best practices, and more

Product Availability Matrix

Product Availability Matrix statements of Informatica products

SupportFlash

Monthly support newsletter

Support Documents

Informatica Support Guide and Statements, Quick Start Guides, and Cloud Product Description Schedule

Product Lifecycle

End of Life statements of Informatica products

Ideas

Events

Change Request Tracking

Marketplace

| Sign up

Velocity
Strategy

Strategy

Data Strategy

Centers of Excellence

Enterprise Data Governance

Enterprise Architecture

Program & Change Management
Solutions

Solutions

Cloud Data Warehouse & Data Lake

Data Lake

Data Warehouse Modernization

Analytics Modernization

Application Integration

360 Engagement

Multidomain MDM

Customer 360 SaaS

Product 360

Supplier 360

Reference 360

Data Governance & Privacy

Data Catalog & Metadata Management

Data Privacy

Regulatory Compliance

Data Quality

Data Access and Provisioning
Stages

Stages

Cloud Data Warehouse & Data Lake

360 Engagement

Data Governance & Privacy

Following a rigorous methodology is key to delivering customer satisfaction and expanding analytics use cases across the business.
More
- Success
  
  Manage your Success Plans and Engagements, gain key insights into your implementation journey, and collaborate with your CSMs
  
  Manage your Success Plans and Engagements, gain key insights into your implementation journey, and collaborate with your CSMs
  
  Success Accelerators
  
  Accelerate your Purchase to Value engaging with Informatica Architects for Customer Success
  
  My Engagements
  
  All your Engagements at one place
- Communities
  
  A collaborative platform to connect and grow with like-minded Informaticans across the globe
  
  A collaborative platform to connect and grow with like-minded Informaticans across the globe
  
  Product Communities
  
  Connect and collaborate with Informatica experts and champions
  
  Discussions
  
  Have a question? Start a Discussion and get immediate answers you are looking for
  
  User Groups
  
  Customer-organized groups that meet online and in-person. Join today to network, share ideas, and get tips on how to get the most out of Informatica
  
  Get Started
  
  Community Guidelines
- Knowledge Center
  
  Troubleshooting documents, product guides, how to videos, best practices, and more
  
  Troubleshooting documents, product guides, how to videos, best practices, and more
  
  Knowledge Base
  
  One-stop self-service portal for solutions, FAQs, Whitepapers, How Tos, Videos, and more
  
  Support TV
  
  Video channel for step-by-step instructions to use our products, best practices, troubleshooting tips, and much more
  
  Documentation
  
  Information library of the latest product documents
  
  Velocity (Best Practices)
  
  Best practices and use cases from the Implementation team
- Learn
  
  Rich resources to help you leverage full capabilities of our products
  
  Rich resources to help you leverage full capabilities of our products
  
  Trainings
  
  Role-based training programs for the best ROI
  
  Certifications
  
  Get certified on Informatica products. Free, Foundation, or Professional
  
  Product Learning Paths
  
  Free and unlimited modules based on your expertise level and journey
  
  Experience Lounge
  
  Self-guided, intuitive experience platform for outcome-focused product capabilities and use cases
- Resources
  
  Library of content to help you leverage the best of Informatica products
  
  Library of content to help you leverage the best of Informatica products
  
  Tech Tuesdays Webinars
  
  Most popular webinars on product architecture, best practices, and more
  
  Product Availability Matrix
  
  Product Availability Matrix statements of Informatica products
  
  SupportFlash
  
  Monthly support newsletter
  
  Support Documents
  
  Informatica Support Guide and Statements, Quick Start Guides, and Cloud Product Description Schedule
  
  Product Lifecycle
  
  End of Life statements of Informatica products
  
  Ideas
  
  Events
  
  Change Request Tracking
  
  Marketplace

Last Updated Date Mar 22, 2022 |

Stages Cloud Data Warehouse & Data Lake Best Practice

Challenge

In general, mapping-level optimization takes time to implement, but can significantly boost performance. Sometimes the mapping is the biggest bottleneck in the load process because business rules determine the number and complexity of transformations in a mapping.

Before deciding on the best route to optimize the mapping architecture some basic issues need to be resolved. Tuning mappings is a grouped approach. The first group can be of assistance almost universally, bringing about a performance increase in all scenarios. The second group of tuning processes may yield only small performance increase, or can be of significant value, depending on the situation.

Some factors to consider when choosing tuning processes at the mapping level include the specific environment, software/ hardware limitations, and the number of rows going through a mapping. This Best Practice offers some guidelines for tuning mappings.

Description

In addition to analyzing mappings for tuning, remember to tune the target and source for peak performance. To optimize mappings by reducing the number of transformations required, reduce the number of rows of data that must be processed as early in the mapping as possible. In addition to deleting unnecessary links between transformations ensure that transformations are performed in an optimal order to reduce I/O requirements.

There are other techniques, typically implemented during the design and build of the mapping to optimize performance from the start. These are items that should be reviewed when performance tuning. Tuning involves isolating the cause of a bottleneck, and then correcting the issue, if possible. Be cautious regarding techniques that change the logic in the mapping, as this could have unintended consequences.

Aggregators

For transformations that use data cache (such as Aggregator, Joiner, Rank, and Lookup transformations), limit connected input/output or output ports. Doing so can reduce the amount of data the transformations store in the data cache. Having too many Lookups and Aggregators can encumber performance because each requires index cache and data cache. Since both are fighting for memory space, decreasing the number of these transformations in a mapping can help improve speed. Splitting them up into different mappings is another option, as is adding memory to the session or task and to the individual transformations, which generally is a big boost to performance.

Limit the number of Aggregators in a mapping. A high number of Aggregators can increase I/O activity on the cache directory. Unless the seek/access time is fast on the directory itself, having too many Aggregators can cause a bottleneck. Similarly, too many Lookups in a mapping causes contention of disk and memory, which can lead to thrashing, leaving insufficient memory to run a mapping efficiently. Be sure to use Sorters before the Aggregator or Joiner to improve the performance.

Consider Single-Pass Reading

If several mappings use the same data source, consider a single-pass reading, where a single mapping is used to read the data rather than several mappings. If there are several sessions that use the same sources, consolidate the separate mappings with either a single Source Qualifier Transformation or one set of Source Qualifier Transformations as the data source for the separate data flows.

Similarly, if a function is used in several mappings, a single-pass reading reduces the number of times that function is called in the session. For example, to subtract percentage from the PRICE ports for both the Aggregator and Rank transformations, work can be minimized by subtracting the percentage before splitting the pipeline.

Optimize SQL Overrides

When SQL overrides are required in a Source Qualifier, Lookup Transformation, or in the Update Override of a target object, be sure the SQL statement is tuned. The extent to which and how SQL can be tuned depends on the underlying source or target database system.

Scrutinize Datatype Conversions

PowerCenter Server automatically makes conversions between compatible datatypes. When these conversions are performed unnecessarily, performance slows. For example, if a mapping moves data from an integer port to a decimal port, then back to an integer port, the conversion may be unnecessary.

In some instances, datatype conversions can help improve performance. This is especially true when integer values are used in place of other data types for performing comparisons using Lookup and Filter transformations.

Eliminate Transformation Errors

Large numbers of evaluation errors significantly slow performance of the PowerCenter Server. During transformation errors, the PowerCenter Server engine pauses to determine the cause of the error, removes the row causing the error from the data flow, and logs the error in the session log. Review the session log and correct the errors that are flagged.

Transformation errors can be caused by many things including: conversion errors, conflicting mapping logic, any condition that is specifically set up as an error, and so on. The session log can help point out the cause of these errors. If errors recur consistently for certain transformations, re-evaluate the constraints for these transformations. If a session is needed that generates many transformation errors, performance may be improved by setting a lower tracing level. However, this is not a long-term response to transformation errors. Any source of errors should be traced and eliminated.

Optimize Memory Settings

The session log is a treasure trove of information for performance tuning. In addition to transformation errors, there is information on memory used for lookups, sorting, joining, aggregations, etc. If messages indicate 2-pass sorting (or more) or that the algorithms used for aggregating is not sort-merge, then review the logic in the mapping to add sorters and to add memory to the transformations or to the Auto Memory settings for the session overall.

Optimize Lookup Transformations

There are a several ways to optimize lookup transformations that are set up in a mapping.

When to Cache Lookups

Cache small lookup tables. When caching is enabled, the PowerCenter Server caches the lookup table and queries the lookup cache during the session. When this option is not enabled, the PowerCenter Server queries the lookup table on a row-by-row basis.

Note: The tuning options in this Best Practice assume that memory and cache sizing for lookups are sufficient to ensure that caches will not page to disk.

A better rule of thumb than memory size, is to determine the size of the potential lookup cache by the number of rows expected to be processed. For example, consider the following example.

In Mapping X, the source and lookup contain the following number of records:

ITEMS (source):	5000 records
MANUFACTURER:	200 records
DIM_ITEMS:	100000 records

Number of Disk Reads

	Cached Lookup	Un-cached Lookup
LKP_Manufacturer
Build Cache	200	0
Read Source Records	5000	5000
Execute Lookup	0	5000
Total # of Disk Reads	5200	10000
LKP_DIM_ITEMS
Build Cache	100000	0
Read Source Records	5000	5000
Execute Lookup	0	5000
Total # of Disk Reads	105000	10000

Consider the case where MANUFACTURER is the lookup table. If the lookup table is cached, it will take a total of 5200 disk reads to build the cache and execute the lookup. If the lookup table is not cached, then it will take a total of 10,000 total disk reads to execute the lookup. In this case, the number of records in the lookup table is small in comparison with the number of times the lookup is executed. This lookup should be cached. This is the more likely scenario.

Consider the case where DIM_ITEMS is the lookup table. If the lookup table is cached, it will result in 105,000 total disk reads to build and execute the lookup. If the lookup table is not cached, then the disk reads would total 10,000. In this case the number of records in the lookup table is not small in comparison with the number of times the lookup will be executed. Thus, the lookup should not be cached.

Use the following eight step method to determine if a lookup should be cached:

Code the lookup into the mapping.
Select a standard set of data from the source. For example, add a "where" clause on a relational source to load a sample 10,000 rows.
Run the mapping with caching turned off and save the log.
Run the mapping with caching turned on and save the log to a different name than the log created in step 3.
Look in the cached lookup log and determine how long it takes to cache the lookup object. Note this time in seconds: LOOKUP TIME IN SECONDS = LS.
In the non-cached log, take the time from the last lookup cache to the end of the load in seconds and divide it into the number or rows being processed: NON-CACHED ROWS PER SECOND = NRS.
In the cached log, take the time from the last lookup cache to the end of the load in seconds and divide it into number or rows being processed: CACHED ROWS PER SECOND = CRS.
Use the following formula to find the breakeven row point:

(LS*NRS*CRS)/(CRS-NRS) = X

Where X is the breakeven point. If the expected source record is less than X, it is better to not cache the lookup. If the expected source record is more than X, it is better to cache the lookup.

For example:

Assume the lookup takes 166 seconds to cache (LS=166).
Assume with a cached lookup the load is 232 rows per second (CRS=232).
Assume with a non-cached lookup the load is 147 rows per second (NRS = 147).

The formula would result in: (166*147*232)/(232-147) = 66,603.

Thus, if the source has less than 66,603 records, the lookup should not be cached. If it has more than 66,603 records, then the lookup should be cached.

Sharing Lookup Caches

There are several methods for sharing lookup caches:

Within a specific session run for a mapping, if the same lookup is used multiple times in a mapping, the PowerCenter Server will re-use the cache for the multiple instances of the lookup. The lookup must be the same in order to reuse the cache. It must return the same data and have the same lookup columns. There may be a need to return additional data in the lookup in order to satisfy all the various lookup needs from this table, but it will improve performance over creating multiple sets of lookup cache with different subsets of columns.
Across sessions of the same mapping, the use of an unnamed persistent cache allows multiple runs to use an existing cache file stored on the PowerCenter Server. If the option of creating a persistent cache is set in the lookup properties, the memory cache created for the lookup during the initial run is saved to the PowerCenter Server. This can improve performance because the Server builds the memory cache from cache files instead of the database. This feature should only be used when the lookup table is not expected to change between session runs.
Across different mappings and sessions, the use of a named persistent cache allows sharing an existing cache file.

Reducing the Number of Cached Rows

There is an option to use a SQL override in the creation of a lookup cache. Options can be added to the WHERE clause to reduce the set of records included in the resulting cache.

Optimizing the Lookup Condition

In the case where a lookup uses more than one lookup condition, set the conditions with an equal sign first in order to optimize lookup performance.

Indexing the Lookup Table

The PowerCenter Server must query, sort, and compare values in the lookup condition columns. As a result, indexes on the database table should include every column used in a lookup condition. This can improve performance for both cached and un-cached lookups.

In the case of a cached lookup, an ORDER BY condition is issued in the SQL statement used to create the cache. Columns used in the ORDER BY condition should be indexed. The session log will contain the ORDER BY statement.
In the case of an un-cached lookup, since a SQL statement is created for each row passing into the lookup transformation, performance can be helped by indexing columns in the lookup condition.

Use a Persistent Lookup Cache for Static Lookups

If the lookup source does not change between sessions, configure the Lookup transformation to use a persistent lookup cache. The PowerCenter Server then saves and reuses cache files from session to session, eliminating the time required to read the lookup source.

Optimize Filter and Router Transformations

Filtering data as early as possible in the data flow improves the efficiency of a mapping. Instead of using a Filter Transformation to remove a sizeable number of rows in the middle or end of a mapping, use a filter on the Source Qualifier or a Filter Transformation immediately after the source qualifier to improve performance.

Avoid complex expressions when creating the filter condition. Filter transformations are most effective when a simple integer or TRUE/FALSE expression is used in the filter condition.

Filters or routers should also be used to drop rejected rows from an Update Strategy transformation if rejected rows do not need to be saved.

Replace multiple filter transformations with a router transformation. This reduces the number of transformations in the mapping and makes the mapping easier to follow.

Optimize Aggregator Transformations

Aggregator Transformations often slow performance because they must group data before processing it.

Use simple columns in the group by condition to make the Aggregator Transformation more efficient. When possible, use numbers instead of strings or dates in the GROUP BY columns. Also avoid complex expressions in the Aggregator expressions, especially in GROUP BY ports.

Use the Sorted Input option in the Aggregator. This option requires that data sent to the Aggregator be sorted in the order in which the ports are used in the Aggregator's group by. The Sorted Input option decreases the use of aggregate caches. When it is used, the PowerCenter Server assumes all data is sorted by group and, as a group is passed through an Aggregator, calculations can be performed, and information passed on to the next transformation. Without sorted input, the server must wait for all rows of data before processing aggregate calculations. Use of the Sorted Inputs option is usually accompanied by a Source Qualifier which uses the Number of Sorted Ports option.

Use incremental aggregation when capturing changes from the source that change less than half the target. When using incremental aggregation, apply captured changes in the source to aggregate calculations in a session. The PowerCenter Server updates the target incrementally, rather than processing the entire source and recalculating the same calculations every time the session runs.

Joiner Transformation

Joining Data from the Same Source

Data from the same source can be joined in the following ways:

Join two branches of the same pipeline.
Create two instances of the same source and join pipelines from these source instances.

There may be a need to join data from the same source in order to perform a calculation on part of the data and join the transformed data with the original data. When the data is joined using this method, the original data can be maintained and parts of that data can be transformed within one mapping.

When joining data from the same source, two branches of the pipeline can be created. When branching a pipeline, a transformation must be added between the Source Qualifier and the Joiner transformation in at least one branch of the pipeline. Sorted data must be joined and configure the Joiner transformation for sorted input.

To join unsorted data, create two instances of the same source and join the pipelines.

For example, for a source with the following ports:

Employee
Department
Total Sales

To be able to view in the target table the employees who generated sales that were greater than the average sales for their respective departments, create a mapping with the following transformations:

Sorter transformation. Sort the data.
Sorted Aggregator transformation. Average the sales data and group by department. When this aggregation is performed, the data for individual employees is lost. To maintain employee data, pass a branch of the pipeline to the Aggregator transformation and pass a branch with the same data to the Joiner transformation to maintain the original data. When both branches of the pipeline are joined, the aggregated data is joined with the original data.
Sorted Joiner transformation. Use a sorted Joiner transformation to join the sorted aggregated data with the original data.
Filter transformation. Compare the average sales data against sales data for each employee and filter out employees with less than above average sales.

Note: Data from output groups of the same transformation can also be joined such as the Custom transformation or XML Source Qualifier transformations. Place a Sorter transformation between each output group and the Joiner transformation and configure the Joiner transformation to receive sorted input.

Joining two branches can affect performance if the Joiner transformation receives data from one branch much later than the other branch. The Joiner transformation caches all the data from the first branch and writes the cache to disk if the cache fills. The Joiner transformation must then read the data from disk when it receives the data from the second branch. This can slow processing.

Same source data can also be joined by creating a second instance of the source. After the second source instance is created, join the pipelines from the two source instances.

Note: When joining data using this method, the PowerCenter Server reads the source data for each source instance, so performance can be slower than joining two branches of a pipeline.

Use the following guidelines when deciding whether to join branches of a pipeline or join two instances of a source:

Join two branches of a pipeline when there is a large source or if the source data can only be read once. For example, source data from a message queue can only be read once.
Join two branches of a pipeline when using sorted data. If the source data is unsorted and a Sorter transformation is used to sort the data, branch the pipeline after you sort the data.
Join two instances of a source to add a blocking transformation to the pipeline between the source and the Joiner transformation.
Join two instances of a source if one pipeline may process much more slowly than the other pipeline.

Performance Tips

Use the database to do the join when sourcing data from the same database schema. Database systems usually can perform the join more quickly than the PowerCenter Server, a join condition should be used when joining multiple tables from the same database schema. Note: Avoid using SQL Overrides, even though they can do some of the same joins, etc., as they hide metadata and might be difficult to maintain.

Use Normal joins whenever possible. Normal joins are faster than outer joins and the resulting set of data is also smaller.

Join sorted data when possible. Session performance can be improved by configuring the Joiner transformation to use sorted input. When configuring the Joiner transformation to use sorted data, the PowerCenter Server improves performance by minimizing disk input and output. The greatest performance improvement is seen when working with large data sets.

For the Joiner transformation, always designate as the master source the source with fewer rows. For optimal performance and disk storage, designate the master source as the source with the fewer rows. During a session, the Joiner transformation compares each row of the master source against the detail source. The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds the join process.

For a sorted Joiner transformation, designate as the master source the source with fewer duplicate key values. For optimal performance and disk storage, designate the master source as the source with fewer duplicate key values. When the PowerCenter Server processes a sorted Joiner transformation, it caches rows for one hundred keys at a time. If the master source contains many rows with the same key value, the PowerCenter Server must cache more rows, and performance can be slowed.

Optimizing sorted joiner transformations with partitions. When using partitions with a sorted Joiner transformation, optimize performance by grouping data and using n:n partitions.

Add a hash auto-keys partition upstream of the sort origin. To obtain expected results and get best performance when partitioning a sorted Joiner transformation, data must be grouped and sorted. To group data, ensure that rows with the same key value are routed to the same partition. The best way to ensure that data is grouped and distributed evenly among partitions is to add a hash auto-keys or key-range partition point before the sort origin. Placing the partition point before sorting the data ensures that grouping is maintained and that data is sorted within each group.

Use n:n partitions. Performance may be improved for a sorted Joiner transformation by using n:n partitions. When n:n partitions are used, the Joiner transformation reads master and detail rows concurrently and does not need to cache all the master data. This reduces memory usage and speeds processing. When 1:n partitions are used, the Joiner transformation caches all the data from the master pipeline and writes the cache to disk if the memory cache fills. When the Joiner transformation receives the data from the detail pipeline, it must then read the data from disk to compare the master and detail pipelines.

Optimize Sequence Generator Transformations. Sequence Generator transformations need to determine the next available sequence number; thus, increasing the “Number of Cached Values” property can increase performance. This property determines the number of values the PowerCenter Server caches at one time. If it is set to cache no values, then the PowerCenter Server must query the repository each time to determine the next number to be used. Consider configuring the Number of Cached Values to a value greater than 1000. Note that any cached values not used in a session are lost since the sequence generator value in the repository is set when it is called next time to give the next set of cache values. While this seems trivial, not setting the value high enough for the needs of the mapping can significantly slow performance.

Field-Level Transformation Optimization

As a final step in the tuning process, expressions used in transformations can be tuned. When examining expressions, focus on complex expressions and try to simplify them when possible.

To help isolate slow expressions, do the following:

Time the session with the original expression.
Copy the mapping and replace half the complex expressions with a constant.
Run and time the edited session.
Make another copy of the mapping and replace the other half of the complex expressions with a constant.
Run and time the edited session.

Processing field level transformations takes time. If the transformation expressions are complex, then processing is even slower. Likely candidates for optimization are the fields with the most complex expressions. Keep in mind that there may be more than one field causing performance problems.

Factoring Out Common Logic

Factoring out common logic can reduce the number of times a mapping performs the same logic. If a mapping performs the same logic multiple times, moving the task upstream in the mapping may allow the logic to be performed just once. For example, a mapping has five target tables. Each target requires a Social Security Number lookup. Instead of performing the lookup right before each target, move the lookup to a position before the data flow splits.

Similarly, within an Expression transformation, use a variable to hold the results of expression logic that is used multiple times in the expression, rather than computing the results each time.

Minimize Function Calls

Anytime a function is called it takes resources to process. There are several common examples where function calls can be reduced or eliminated.

Aggregate function calls can sometime be reduced. In the case of each aggregate function call, the PowerCenter Server must search and group the data. Thus, the following expression:

SUM(Column A) + SUM(Column B)

Can be optimized to:

SUM(Column A + Column B)

In general, operators are faster than functions, so operators should be used whenever possible. For example, for an expression that involves a CONCAT function such as:

CONCAT(CONCAT(FIRST_NAME, ), LAST_NAME)

It can be optimized to:

FIRST_NAME || LAST_NAME

Remember that IIF() is a function that returns a value, not just a logical test. This allows many logical statements to be written in a more compact fashion. For example:

IIF(FLG_A=Y and FLG_B=Y and FLG_C= Y, VAL_A+VAL_B+VAL_C,
IIF(FLG_A=Y and FLG_B=Y and FLG_C= N, VAL_A+VAL_B,
IIF(FLG_A=Y and FLG_B=N and FLG_C= Y, VAL_A+VAL_C,
   IIF(FLG_A=Y and FLG_B=N and FLG_C= N, VAL_A,
    IIF(FLG_A=N and FLG_B=Y and FLG_C= Y, VAL_B+VAL_C,
      IIF(FLG_A=N and FLG_B=Y and FLG_C= N, VAL_B,
       IIF(FLG_A=N and FLG_B=N and FLG_C= Y, VAL_C,
        IIF(FLG_A=N and FLG_B=N and FLG_C= N, 0.0))))))))

Can be optimized to:

IIF(FLG_A=Y, VAL_A, 0.0) + IIF(FLG_B=Y, VAL_B, 0.0) + IIF(FLG_C= Y, VAL_C, 0.0)'

The original expression had 8 IIFs, 16 ANDs and 24 comparisons. The optimized expression results in three IIFs, three comparisons, and two additions.

Be creative in making expressions more efficient. The following is an example of rework of an expression that eliminates three comparisons down to one:

IIF(X=1 OR X=5 OR X=9, 'yes', 'no')

Can be optimized to:

IIF(MOD(X, 4) = 1, 'yes', 'no')

Use of the DECODE function is more efficient than the IIF function, as the IIF analyzes the results of every part of the statement, even if the logic shows a result. The DECODE stops as soon as a condition is satisfied.

Calculate Once, Use Many Times

Avoid calculating or testing the same value multiple times. If the same sub-expression is used several times in a transformation, consider making the sub-expression a local variable. The local variable can be used only within the transformation in which it was created. Calculating the variable only once and then referencing the variable in following sub-expressions improves performance.

Choose Numeric vs. String Operations

The PowerCenter Server processes numeric operations faster than string operations. For example, if a lookup is performed on a large amount of data on two columns, EMPLOYEE_NAME and EMPLOYEE_ID, configuring the lookup around EMPLOYEE_ID improves performance.

Optimizing Char-Char and Char-Varchar Comparisons

When the PowerCenter Server performs comparisons between CHAR and VARCHAR columns, it slows each time it finds trailing blank spaces in the row. To resolve this, treat CHAR as the CHAR On Read option in the PowerCenter Server setup so that the server does not trim trailing spaces from the end of CHAR source fields.

Reduce the Number of Transformations in a Mapping

Because there is always overhead involved in moving data among transformations, try, whenever possible, to reduce the number of transformations. Also, resolve unnecessary links between transformations to minimize the amount of data moved. This is especially important with data being pulled from the Source Qualifier Transformation.

Use Pre- and Post-Session SQL Commands

Pre- and post-session SQL commands can be specified in the Properties tab of the Source Qualifier transformation and in the Properties tab of the target instance in a mapping. To increase the load speed, use these commands to drop indexes on the target before the session runs, then recreate them when the session completes.

Apply the following guidelines when using SQL statements:

Use any command that is valid for the database type. However, the PowerCenter Server does not allow nested comments, even though the database may.
Use mapping parameters and variables in SQL executed against the source, but not against the target.
Use a semi-colon (;) to separate multiple statements.
The PowerCenter Server ignores semi-colons within single quotes, double quotes, or within /* ...*/.
To use a semi-colon outside of quotes or comments, escape it with a back slash (\).
The Workflow Manager does not validate the SQL.

Use Environmental SQL

For relational databases, SQL commands can be executed in the database environment when connecting to the database. Use this for source, target, lookup, and stored procedure connections. For instance, isolation levels can be set on the source and target systems to avoid deadlocks. Follow the guidelines listed above for using the SQL statements.

Use Local Variables

Use local variables in Aggregator, Expression, and Rank transformations.

Temporarily Store Data and Simplify Complex Expressions

Rather than parsing and validating the same expression each time, define these components as variables. This also allows simplification of complex expressions. For example, the following expressions:

AVG( SALARY, ( ( JOB_STATUS = 'Full-time' ) AND (OFFICE_ID = 1000 ) ) )
SUM( SALARY, ( ( JOB_STATUS = 'Full-time' ) AND (OFFICE_ID = 1000 ) ) )

can use variables to simplify complex expressions and temporarily store data:

Store Values Across Rows

Variables can be used to store data from prior rows. This can help with procedural calculations. To compare the previous state to the state just read:

IIF( PREVIOUS_STATE = STATE, STATE_COUNTER + 1, 1 )

Capture Values from Stored Procedures

Variables also provide a way to capture multiple columns of return values from stored procedures.

Tuning Mappings for Better Performance

Cloud Data Warehouse & Data Lake

Challenge

Description

Aggregators

Consider Single-Pass Reading

Optimize SQL Overrides

Scrutinize Datatype Conversions

Eliminate Transformation Errors

Optimize Memory Settings

Optimize Lookup Transformations

When to Cache Lookups

Sharing Lookup Caches

Reducing the Number of Cached Rows

Optimizing the Lookup Condition

Indexing the Lookup Table

Use a Persistent Lookup Cache for Static Lookups

Optimize Filter and Router Transformations

Optimize Aggregator Transformations

Joiner Transformation

Joining Data from the Same Source

Performance Tips

Field-Level Transformation Optimization

Factoring Out Common Logic

Minimize Function Calls

Calculate Once, Use Many Times

Choose Numeric vs. String Operations

Optimizing Char-Char and Char-Varchar Comparisons

Reduce the Number of Transformations in a Mapping

Use Pre- and Post-Session SQL Commands

Use Environmental SQL

Use Local Variables

Temporarily Store Data and Simplify Complex Expressions

Store Values Across Rows

Capture Values from Stored Procedures

Table of Contents