Informatica Data Engineering Integration (Big Data Management) delivers high-throughput data ingestion and data integration processing so business analysts can get the data they need quickly. Hundreds of prebuilt, high-performance connectors, data integration transformations, and parsers enable virtually any type of data to be quickly ingested and processed on big data infrastructure, such as Apache Hadoop, NoSQL, and MPP appliances. Beyond prebuilt components and automation, Informatica provides dynamic mappings, dynamic schema support, and parameterization for programmatic and templatized automation of data integration processes.
The Advanced Level will help you develop expertise in DEI. It consists of videos, documents, and articles that take you through performance tuning, monitoring, and troubleshooting for Sqoop, Blaze, Spark, the Data Integration Service, and more.
After completing this level, you will earn an Informatica Badge for Data Engineering Integration (Big Data Management). So continue with your product learning and earn your badge!
In this module, you learned about Sqoop performance tuning, Spark performance tuning, Data Integration Service tuning, and Model Repository Service tuning. You also gained a better understanding of monitoring the Blaze engine and its tab settings, configuring the monitoring settings so that the Model Repository Service stores historical data, using complex data types on Spark (arrays, structs, and nested ports), ML-based parsing with Intelligent Structure Discovery, and machine learning with the Python transformation in DEI.
You have also learned about the REST Operations Hub in DEI and executing REST queries to gather monitoring statistics, the Spark history server, stateful computing in Spark, deployment automation, creating and deploying an application, mass ingestion into Amazon S3 using Data Engineering Integration, and troubleshooting the Analyst Service, the Search Service, DIS on a grid, the Monitoring tool, Spark, and Blaze.
You have successfully completed all three levels of DEI onboarding and have earned your badge!
In this video, you will learn how to tune Sqoop for a sample Informatica mapping, starting by setting the mapper count, followed by a demo.
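As a concrete illustration (not taken from the video), the mapper count for a Sqoop import is controlled with the standard `--num-mappers` (or `-m`) and `--split-by` arguments; the connection details and table names below are placeholders only:

```shell
# Placeholder JDBC URL, credentials, and table -- substitute your own.
# --num-mappers controls the degree of parallelism; --split-by names the
# column Sqoop uses to partition the source rows across those mappers.
sqoop import \
  --connect "jdbc:oracle:thin:@//dbhost:1521/ORCL" \
  --username etl_user \
  --password-file /user/etl/.pw \
  --table SALES.ORDERS \
  --split-by ORDER_ID \
  --num-mappers 8 \
  --target-dir /data/raw/orders
```

Too many mappers can overload the source database, so the mapper count is usually tuned against what the database and network can sustain.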
This article provides sizing recommendations for the Hadoop cluster and the Informatica domain, tuning recommendations for various DEI components, best practices to design efficient mappings, and troubleshooting tips.
This article is intended for DEI users, such as Hadoop administrators, Informatica administrators, and Informatica developers.
This video will help you understand how to improve the performance of the Spark engine. You will also learn how to set the client-side configuration properties of the Spark engine that enable dynamic memory allocation, followed by a demo.
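For reference, dynamic executor allocation in Spark is governed by a handful of standard Spark properties; a typical combination looks like the fragment below. The values are illustrative, and in DEI such properties are usually supplied as Spark advanced properties on the Hadoop connection rather than in a standalone file:

```properties
# Enable dynamic allocation (requires the external shuffle service).
spark.dynamicAllocation.enabled=true
spark.shuffle.service.enabled=true
# Bounds on the executor pool -- tune to your cluster's capacity.
spark.dynamicAllocation.minExecutors=2
spark.dynamicAllocation.maxExecutors=50
# How long an executor may sit idle before being released.
spark.dynamicAllocation.executorIdleTimeout=60s
```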
Because the Data Integration Service is responsible for the main execution of jobs, it is important to tune parameters such as the maximum heap size for optimal performance. In this video, you will learn how to tune different properties related to the performance of the Data Integration Service, followed by a demo.
In this video, you will learn how to access and edit the heap size properties of the Model Repository Service. The maximum heap size can be configured to increase the performance of these services.
In this video, you will learn how to select Blaze as the run-time engine for a mapping, followed by a demo.
Informatica Administrator has a monitoring capability that enables you to view statistics and log events for mappings that run in the Hadoop environment. In this video, you will see how you can use the monitoring feature to view statistics of various jobs that you run in the environment, followed by a demo.
In this video, you will learn how you can configure the monitoring settings to enable Model Repository Service to store historical data, followed by a demo. After you configure monitoring settings, the Model Repository stores runtime statistics about the objects that are run in the Data Integration Service.
Learn how to work with complex types (arrays) in DEI 10.2.1 with the help of a demo.
Learn how to work with complex types (structs) in DEI 10.2.1 with the help of a demo.
Learn how to use Intelligent Structure Discovery to leverage machine learning for processing complex file formats and use the models in Big Data Management 10.2.1 for operationalization.
Learn how to work with nested complex types, such as an array of structs, in DEI 10.2.1.
This video showcases a demo of how to leverage machine learning with the Python transformation in DEI. The Python transformation is supported in DEI in Spark execution mode starting with 10.2.1 (Spring 2018 release).
REST queries can be used to gather monitoring statistics, advanced statistics, and execution steps. Learn how to execute REST queries to gather monitoring statistics.
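A minimal sketch of such a query with curl is shown below; the host, port, endpoint path, credentials, and parameters are all placeholders, so consult the REST Operations Hub documentation for the exact resource names in your version:

```shell
# Every angle-bracket value is a placeholder, not a real endpoint.
curl -k -u "<user>:<password>" \
  -H "Accept: application/json" \
  "https://<gateway-host>:<port>/<rest-operations-hub-resource>?jobId=<job-id>"
```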
This video introduces you to the Spark history server, its benefits, the steps to keep it always running, and common issues and their resolutions.
Stateful computing brings the ability to perform cross-row operations (such as comparing the current row with the previous row) using Spark's window functions.
In this video, learn how to perform stateful computing using Spark in DEI 10.2.
In this video, you will learn how to integrate DEI deployments with version control systems such as Git/Bitbucket, ticketing systems such as JIRA, and orchestration systems such as Jenkins.
Learn how to build completely automated CI/CD pipelines and improve your overall DevOps lifecycle.
Create and deploy an application that contains mappings, workflows, and other application objects to make the objects accessible to users who want to leverage the data outside of the Developer tool. You can deploy the application to a Data Integration Service to run the objects, or to an application archive file to save a copy of the application and deploy it to a Data Integration Service later.
If you make changes to an application object in the Developer tool, you can update the deployed application. This article describes how to create, deploy, and update an application.
Git Version Control
You can integrate the Model Repository with the Perforce, Subversion, or Git version control systems. This article discusses how to integrate Git as a version control system for the Model repository in version 10.2 HotFix 1.
This video demonstrates using the Mass Ingestion service to ingest relational data into Amazon S3. The Mass Ingestion service allows users to ingest thousands of relational tables into HDFS, Hive, Amazon S3, Azure Blob, and Azure ADLS without writing a single mapping.
The mass ingestion specification can be designed, developed, deployed, executed, and monitored from a single web-based interface, without ever opening Informatica Developer.
This video discusses Analyst Service startup issues and how to troubleshoot them. This includes an Analyst tool overview, finding the logs, an overview of common startup issues, and troubleshooting guidelines.
This video discusses search service basics and connectivity, search index creation, search results and performance, common issues, and troubleshooting tips.
This video discusses the steps to identify job logs for a job run by the Data Integration Service on a grid in Informatica Data Quality 10.x. It also walks through the steps to collect logs for Data Quality jobs such as mappings and workflows when Data Integration Services run on a grid, the options available to fetch job logs from the available Informatica Data Quality (IDQ) clients, and how to identify the node on which a job was dispatched.
This video discusses the monitoring tool and hang scenarios. This includes the Monitoring Service architecture, troubleshooting monitoring tool hangs, known issues, and recommended configurations.
This video will take you through the Spark architecture, Spark integration with BDM, Spark shuffle, Spark dynamic allocation, the journey from Hive and Blaze to Spark, Spark troubleshooting and self-service, Spark monitoring, and other references.
This video provides an overview of the Blaze architecture and components. It also discusses Blaze configuration, log locations and collection, common issues, and troubleshooting tips.
This webinar describes the ephemeral cluster task available in Informatica's DEI product. It gives an overview of this feature and how it works with cloud ecosystems such as Azure and AWS.
Informatica is constantly innovating tools and systems to help you leverage the full capabilities of our products. This section provides a list of all such tools that support DEI products.
- InfaDump: A shell script that collects jstack, pmstack, and heap dumps from a running process. It can collect dumps at regular intervals for 'n' iterations. Click here to view the document.
- pmstack: A tool that captures native thread stacks from a DTM or Java process, or any other process running on a Linux/AIX/Solaris server. Click here to view the document.
- PlatformLogCollector: A Java-based tool that collects logs from the Informatica server machine. Click here to view the document.
- InfaLogs: A tool that collects application service logs written to the file system, Log Service agent logs, and workflow and session logs. Click here to view the document.
- ssgodbc: An interactive query tool for testing ODBC connections. You can enter a SQL statement, such as a SELECT query, and view the results, or execute data definition language (DDL) statements to create tables and other objects. Click here to view the document.
- IA4J: On a tightly time-bound production server, it is inconvenient to get downtime to debug an issue. The IA4J tool gets you what you need with zero downtime. Click here to view the document.
- ysplit: This tool helps you split YARN logs into multiple files (based on container and log type) and builds an HTML index to quickly navigate through these files. Click here to view the document.
- KUtil: A program used to run other programs in a Kerberos context. Note that KUtil honors the configuration properties loaded into it and will not run the child utility/program in a Kerberos context if the property hadoop.security.authentication != kerberos. Click here to view the document.
- sysmon (System Health Tracker): Linux servers provide various system tools to capture diagnostic information. sysmon is a wrapper that invokes the needed commands to collect diagnostic information from a Linux server. Click here to view the document.
- InfaCoreFileDepPackager: This tool helps collect the dependent libraries associated with a core dump generated by an Informatica process. Click here to view the document.