Database Ingestion
Database Ingestion and Replication is part of the Intelligent Data Management Cloud (IDMC) Cloud Data Ingestion and Replication service. It allows large-scale data ingestion from common relational databases to various targets, including cloud-based and big-data targets. This feature requires a separate license and offers a user-friendly interface for setting up, deploying, running, and monitoring ingestion jobs.
There are three types of load operations that Database Ingestion and Replication can perform:
- Initial load: Transfers data from a source to a target at a single point in time. This is useful for migrating data to a cloud-based system, materializing targets for incremental updates, or adding data to data lakes or warehouses.
- Incremental load: Continuously updates the target with data changes since the last run or from a specified start point. This is ideal for keeping reporting, analytics, and online machine learning systems current.
- Initial and incremental load: Begins with an initial load and then switches to continuously replicating incremental changes from the same source tables, giving you the flexibility to perform both load types in a single job, as sketched below.
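To make the three load types concrete, the following is a minimal, purely conceptual Python sketch of a combined initial and incremental flow. All object and method names here (source, target, changes_since, and so on) are hypothetical placeholders, not part of the Informatica interface.

```python
import time

def initial_and_incremental_load(source, target, poll_interval_secs=60):
    """Conceptual flow: point-in-time copy, then continuous change replication."""
    # Initial load: copy the source tables as they exist right now and remember
    # the position from which change capture should resume.
    checkpoint = source.current_position()          # e.g. a log position or timestamp
    for table in source.tables():
        target.write_snapshot(table, source.read_all(table))

    # Incremental load: keep the target current by applying only the changes
    # made since the recorded start point, cycle after cycle.
    while True:
        changes = source.changes_since(checkpoint)  # inserts, updates, and deletes
        target.apply(changes)
        checkpoint = changes.end_position
        time.sleep(poll_interval_secs)
```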
The service automatically maps the source to target tables and fields based on name matching, with options to customize target table names through defined rules.
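For illustration, a simple prefix or suffix rule could be modeled as follows; the rule format shown is a hypothetical example, not Informatica's actual rule syntax.

```python
def target_table_name(source_table: str, prefix: str = "", suffix: str = "") -> str:
    """Default behavior matches the target name to the source name; optional
    prefix/suffix rules customize the generated name (hypothetical rule format)."""
    return f"{prefix}{source_table}{suffix}"

print(target_table_name("ORDERS"))                  # name matching: ORDERS -> ORDERS
print(target_table_name("ORDERS", prefix="STG_"))   # rename rule:   ORDERS -> STG_ORDERS
```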
Database Ingestion and Replication can be used to address multiple use cases. Some of the most common include:
- Offline reporting: Move user reporting activity from a mission-critical production database system to a separate reporting system to avoid degrading database performance.
- Machine Learning and Generative AI: Help build data warehouses and data lakes by transferring data from multiple databases, including on-premises databases. Keep them current by continuously replicating data after the initial load. This data warehouse or data lake can then be used to power enterprise Machine Learning and Generative AI flows.
- Migration to cloud-based systems: Migrate data from on-premises database systems to cloud-based systems.
Database Ingestion and Replication operates on a Secure Agent, which must be installed on a Linux or Windows machine. Once the Secure Agent is started for the first time, the Database Ingestion and Replication agent and packages are installed locally, allowing for the configuration, deployment, and monitoring of database ingestion tasks and jobs through the Intelligent Data Management Cloud (IDMC) Web-based interface.
Deploying a task creates an executable job on the Secure Agent system. When a database ingestion job is run, task metadata is sent from the IDMC Cloud instance to the Secure Agent, which processes the data accordingly.
Change data capture (CDC) allows users to detect and manage incremental changes at the data source. Data consumers can absorb changes in real-time with minimal impact on the data source or the transport system between the data source and the consumer. CDC captures changes from database transaction logs which are then published to a destination such as a cloud data lake, cloud data warehouse, or message hub. The benefits of CDC include:
- Greater efficiency: With CDC, only data that has changed is synchronized, which saves time and enhances the accuracy of data and analytics.
- Lower impact on production databases: CDC has minimal impact on the source. This facilitates high-volume data transfers to the analytics target.
- Improved time to value and lower TCO: CDC helps build data pipelines faster, saving time for data engineers and architects thereby reducing total cost of ownership (TCO).
Types of CDC
- Timestamp-based CDC: Leverages a timestamp column in the table and retrieves only the rows that have changed since the data was last extracted. This is the simplest method to extract incremental data with CDC, but it can slow down production performance by consuming source CPU cycles (see the sketch after this list).
- Trigger-based CDC: Defines triggers that fire before or after INSERT, UPDATE, or DELETE commands and are used to create change logs in a change table. This increases processing overhead and slows down source production operations.
- Log-based CDC: Transactional databases store all changes in a transaction log that helps the database recover in the event of a crash. With log-based CDC, new database transactions (inserts, updates, and deletes) are read from the source database's transaction log without making application-level changes and without having to scan operational tables.
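To make the contrast concrete, here is a minimal Python sketch of the timestamp-based approach; the connection object, table name, and last_modified column are assumptions made for illustration. Log-based CDC avoids queries like this entirely by reading changes from the transaction log instead.

```python
from datetime import datetime, timezone

def extract_changes_since(conn, table: str, last_extract_ts: datetime):
    """Timestamp-based CDC: pull only the rows modified since the last extraction.
    Simple to set up, but every poll queries the source table and consumes
    source CPU cycles (the drawback noted above)."""
    rows = conn.execute(
        f"SELECT * FROM {table} WHERE last_modified > ?",  # assumes a last_modified column
        (last_extract_ts,),
    ).fetchall()
    new_high_water_mark = datetime.now(timezone.utc)        # start point for the next poll
    return rows, new_high_water_mark
```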
Benefits of Log-based CDC:
- It is the fastest and most widely preferred CDC method
- It is non-intrusive and least disruptive for production database sources
- It adds no overhead to database server performance
Database ingestion supports a wide variety of sources and targets. These include relational database sources such as SQL Server and SAP HANA, as well as mainframe sources such as Db2. The supported targets include a wide range of data warehouses, data lakes, and event queues. Some of the most commonly used targets include Amazon Redshift, Amazon S3, Apache Kafka, Microsoft SQL Server, Databricks Delta, Google BigQuery, Microsoft Fabric OneLake, and Snowflake, among others. See the full list and the load types supported for each target below:
Before configuring a Database Ingestion task for initial, incremental, or combined initial and incremental operations, you need to prepare the source databases and targets to ensure they are ready for the Database Ingestion and Replication task and to avoid any unexpected results. Informatica Database Ingestion supports a wide variety of sources and targets. Detailed steps to prepare these sources and targets for a database ingestion job can be found below.
Apply Modes
For incremental load and combined initial and incremental load jobs, Apply Mode indicates how source DML changes (inserts, updates, and deletes) are applied to the target. Apply Mode options include:
- Standard: Accumulates the changes in a single apply cycle and intelligently merges them into fewer SQL statements before applying them to the target. For example, if an update followed by a delete occurs on the source row, no row is applied to the target. If multiple updates occur on the same column or field, only the last update is applied to the target. If multiple updates occur on different columns or fields, the updates are merged into a single update record before being applied to the target. (A simplified sketch of this merging behavior appears after this list.)
- Soft Deletes: Marks deleted rows as deleted on the target instead of removing them, writing a "D" in the INFA_OPERATION_TYPE column for each deleted record. Do not update the primary key in a source table when using soft deletes; otherwise, data corruption can occur on the target.
- Audit: This applies an audit trail of every DML operation made on the source tables to the target. A row for each DML change on a source table is written to the generated target table along with the audit columns you select under the Advanced section. The audit columns contain metadata about the change such as DML operation type, time, owner, transaction ID, generated ascending sequence number, and before image.
Please note: Audit modes are not supported for Query-based CDC.
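The merging behavior of Standard mode can be approximated with a short Python sketch. This is a simplified illustration of the examples above, not the actual implementation, and the change representation is assumed.

```python
def merge_apply_cycle(changes):
    """Collapse the DML changes of one apply cycle into at most one net
    operation per source row (simplified illustration of Standard mode)."""
    net = {}  # row key -> (operation, merged column values)
    for op, key, cols in changes:            # changes arrive in source commit order
        if op == "DELETE":
            if key in net:
                # e.g. an update followed by a delete in the same cycle:
                # no row is applied to the target
                del net[key]
            else:
                net[key] = ("DELETE", {})
        else:
            prev = net.get(key)
            if prev is None or prev[0] == "DELETE":
                net[key] = (op, dict(cols))
            else:
                prev_op, prev_cols = prev
                # Later updates to the same column win; updates to different
                # columns are merged into a single record before applying.
                net[key] = (prev_op, {**prev_cols, **cols})
    return net
```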
Configuration of a Database Ingestion task requires the following steps:
- Preliminary Checks
- Configuring Basic Task Information
- Configuring Source Information
- Configuring Target Information
- Configuring the Runtime Options
Click the resources below for detailed information about how to configure the Database Ingestion task for your choice of source and target.
Deploying a Database Ingestion task
After defining a database ingestion task and saving it, you can deploy the task to create an executable job instance on the on-premises system that contains the Secure Agent and the Database Ingestion agent service and packages. You must deploy the task before you can run the job. The deployment process also validates the task definition. If the deployment process fails, the job status switches to Failed. You can then Undeploy the job and review the error logs from the Operational Insights page. After you resolve the problem, deploy the task again from the database ingestion and replication task wizard.
Running a Database Ingestion task
You can run a deployed database ingestion and replication job from one of the monitoring interfaces. Alternatively, when you create a database ingestion and replication initial load task, you can specify a schedule for running the job instances associated with the task. Please refer to the articles below:
Superpipe
For users with Snowflake Data Cloud targets, moving large-scale data involves computational challenges and can impact performance for high-scale, low-latency processing. This creates complexities in connecting data sources for analytics in Snowflake Data Cloud. Superpipe is a joint innovation between Informatica and Snowflake that enables customers to replicate and stream both initial loads and incremental data changes up to 3.5x faster than standard CDC approaches.
Superpipe leverages Snowflake's Snowpipe Streaming and deferred merge for high-performance, real-time ingestion into Snowflake Data Cloud. A real-time view of the data remains available regardless of the deferred merge interval that is set: the change table, together with a view created over the final target table, always gives users a near-real-time picture of the data. Deferred merge applies changes on a periodic basis rather than at transaction boundaries, which results in up to 40% lower Snowflake credit consumption.
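Conceptually, the deferred merge pattern resembles the Python sketch below. The object names are hypothetical, and in practice this behavior is provided by Snowpipe Streaming and Snowflake itself rather than by application code.

```python
import time

def deferred_merge_loop(change_stream, change_table, target_table,
                        merge_interval_secs=300):
    """Stream changes continuously, but merge them into the final target table
    on a periodic schedule instead of at every transaction boundary."""
    last_merge = time.monotonic()
    while True:
        change_table.append(change_stream.next_batch())    # low-latency streaming ingest
        if time.monotonic() - last_merge >= merge_interval_secs:
            target_table.merge_from(change_table)           # one set-based merge per interval
            change_table.clear()
            last_merge = time.monotonic()
        # Readers query a view that combines target_table with change_table,
        # so they see a near-real-time picture between merges.
```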