Overview

Lake System scanners such as Amazon S3, ADLS Gen2, and Google Cloud Storage support profiling of the following file types:

  • Avro
  • Parquet
  • CSV

Data profiling in Cloud Data Governance and Catalog (CDGC) evaluates the quality of the metadata extracted from the source system. Profiling Avro or Parquet files in CDGC requires an advanced cluster configuration because of the complexity of these file formats. Both formats are optimized for big data processing: Avro is row-based and Parquet is columnar.
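To make the row-based versus columnar distinction concrete, here is a minimal sketch in plain Python. It does not read real Avro or Parquet files and is not how CDGC profiles data. The sample records and the `profile_column` helper are hypothetical, used only to show why per-column statistics are a natural fit for a layout that stores each column's values together.

```python
# Hypothetical sample data; not real Avro/Parquet and not CDGC's profiling logic.

# Row-oriented layout (Avro-like): each record keeps all of its fields together.
rows = [
    {"id": 1, "country": "US", "amount": 10.5},
    {"id": 2, "country": "DE", "amount": None},
    {"id": 3, "country": "US", "amount": 7.0},
]

# Column-oriented layout (Parquet-like): each column's values are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def profile_column(values):
    """Return simple profile statistics for one column of values."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }


# Row layout: every record must be visited to pull out one field.
amount_from_rows = profile_column([row["amount"] for row in rows])

# Column layout: the column is already grouped and can be profiled directly.
amount_from_columns = profile_column(columns["amount"])

print(amount_from_columns)
```

Both layouts yield the same statistics. The difference is how much data has to be read to get them, which matters as files grow.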

An advanced cluster is a Kubernetes cluster that provides the distributed processing needed to profile complex file types (Avro and Parquet) in CDGC. It distributes processing tasks across multiple nodes, which lets it handle the complexity and size of Avro and Parquet files efficiently.
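The general idea of distributing a profiling job across nodes can be sketched as follows. This is an illustration under stated assumptions, not the advanced cluster's actual implementation: worker threads stand in for cluster nodes, the `partitions` list stands in for splits of a large file, and `profile_partition` and `merge_profiles` are hypothetical helpers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions of one column, standing in for file splits on different nodes.
partitions = [
    [10.5, None, 7.0],
    [3.2, 8.8],
    [None, 12.0, 1.5],
]


def profile_partition(values):
    """Compute partial statistics for one partition."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }


def merge_profiles(partials):
    """Combine partial statistics into one profile for the whole column."""
    mins = [p["min"] for p in partials if p["min"] is not None]
    maxs = [p["max"] for p in partials if p["max"] is not None]
    return {
        "count": sum(p["count"] for p in partials),
        "nulls": sum(p["nulls"] for p in partials),
        "min": min(mins) if mins else None,
        "max": max(maxs) if maxs else None,
    }


# Worker threads stand in for cluster nodes; each one handles one partition.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(profile_partition, partitions))

profile = merge_profiles(partials)
print(profile)
```

Each partition is profiled independently and only the small partial results are merged, which is why this kind of work scales out across nodes.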

Pre-Requisites
Self-Service Resources
Goals
  • Learn how to set up an advanced cluster to profile Avro and Parquet files in CDGC based on the business requirement.
  • Learn how to configure data profiling for complex file types.
Outcome
  • Gain a comprehensive understanding of how to set up an advanced cluster and use it to profile complex file types such as Avro and Parquet in CDGC.
Required Roles/Personas
Engagement Details

Catalog Type: Ask An Expert
Engagement Category: Feature Clarity
Products: Cloud Data Governance and Catalog
Engagement Type: Ask An Expert
Adoption Stage: Configure, Implement
Focus Area: Adoption - Technical, Functional
Engagement ID: AAE-CDGC-029

Disclaimer

  • All the topics covered in the Success Accelerators/Ask An Expert sessions are intended for guidance and advisory only. This is implicit and it will not be called out under the scope of each engagement.
  • Customers need to include their relevant technical/business team members highlighted in each engagement topic to derive the best out of each engagement.
  • Customers need to perform any hands-on work by themselves leveraging the guidance from these engagements.
  • Customers need to work with Informatica Global Customer Support for any product bugs/issues and troubleshooting.
