Challenge
To minimize risk and downtime, an Enterprise Data Catalog (EDC) upgrade requires detailed planning before execution. The upgrade timeline varies with the type of upgrade: an EDC upgrade can be either an EBF, service pack, or hotfix upgrade, or a major GA release upgrade. This best practice briefly describes upgrade approaches, drivers, and other considerations.
Description
Factors Driving the EDC Upgrade
Below are potential driving factors for an EDC upgrade:
- Compatible scanner availability: A compatible version of the EDC scanner is not available in the current EDC version and is only available in a newer release.
- New feature requirement: A new feature is needed that is available only in the latest release. In some cases, a new feature can also reduce the scope of custom code.
- Dependent technology upgrade: An upgrade may be required to support dependent technology or infrastructure (e.g., an external cluster or Axon upgrade).
- Critical bug: If a critical bug is encountered in an older EDC release and has been fixed in a later release, Informatica Global Customer Support (GCS) may suggest an upgrade as a resolution option.
- Mergers & Acquisitions: A product version consistency requirement may trigger an upgrade project in the case of a merger or an acquisition.
Upgrade Team
Below are the roles needed to carry out the required tasks for an EDC upgrade.
- Project Sponsor
- EDC Platform Administrator
- Hadoop Cluster Administrator (if external cluster is used)
- Database Administrator
- System Administrator
- EDC Data Governance User (to validate after the upgrade)
Upgrade Type
There are three types of EDC upgrades:
- In-Place Upgrade: During an in-place upgrade, the existing hardware will be reused, and the application binaries will be upgraded to the newer version.
- Parallel Upgrade: In a parallel upgrade, new hardware will be procured for the new EDC platform. EDC will first be installed on the new hardware and then backups of the old EDC server will be restored on the new server.
- Clone Upgrade: In a clone upgrade, the EDC server, database, and cluster will be cloned on newly procured hardware. Once the existing EDC server is cloned, an in-place upgrade will be performed on top of it.
Upgrade Planning
Below are activities the admin team needs to perform during the planning phase of an upgrade:
- Review the compatibility of the newer version with the hardware and cluster. This should include OS version, patch version, database client, etc.
- If EDC is integrated with other data governance solutions, review version compatibility with Axon, Data Quality and DPM.
- Review EDC scanner compatibility with existing resources for any potential impact (check for older data source versions that may have been deprecated and plan accordingly).
- Review the existing disk structure and make sure that the new server will have similar disks mounted.
- Review the installation and upgrade guides for detailed steps.
- Review the cluster prerequisites for the newer version for any additional packages or a newer required version of Postgres.
- Review the EDC platform-side changes.
- Review the release notes of all the versions in between the current and future EDC version.
- If Informatica GCS provided any EBFs for the current version, ensure that all of them are ported to the newer version.
- Review the upgrade path and identify the upgrade type best suited for the upgrade.
- Prepare the upgrade plan with a detailed list of steps, including a backup strategy, upgrade processes, a testing strategy, and rollback options to mitigate risk.
- If any client software depends on the EDC server (e.g., native client, ODBC client, PowerExchange client), add it to the upgrade plan and assign the task to the respective admins.
- If any security related changes are needed, add those additional tasks into the plan.
- Assign an owner to each task in the project plan and share the plan with all stakeholders.
- If SSO authentication is enabled for EDC users and a parallel/clone upgrade approach is selected, plan to work with the infosec team to configure the new EDC servers for SSO.
- Review the timeline for the upgrade and set appropriate expectations for downtime with stakeholders.
- If custom models, synonym files, or custom lineage files are used, back up those files before the upgrade.
- If SSO is enabled in the EDC domain and the upgrade process is a parallel upgrade, add additional tasks for the SSO setup in the new domain.
- If any application uses EDC’s internal APIs, ensure that those APIs have not changed in the new version.
- Notify developers to upgrade their client software after the upgrade.
- In the case of an in-place upgrade, take screenshots of critical resources and lineages so they can be validated after the upgrade.
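Reviewing the release notes of every version between the current and target release is easy to get wrong when version strings have different numbers of components (for example, 10.4 and 10.4.0). The following Python sketch lists the intermediate releases whose notes should be read. It assumes purely numeric dotted version strings (hotfix or service pack labels would need extra parsing) and that the caller supplies the list of releases; the function names are hypothetical and not part of any Informatica tool.

```python
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '10.4.1' into a comparable tuple.

    Trailing zero components are dropped so that '10.4' and '10.4.0' compare equal.
    """
    parts = [int(part) for part in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def versions_to_review(current: str, target: str, releases: List[str]) -> List[str]:
    """Return releases newer than `current` and up to and including `target`,
    ordered oldest first, i.e. the release notes that should be reviewed."""
    cur, tgt = parse_version(current), parse_version(target)
    in_range = [r for r in releases if cur < parse_version(r) <= tgt]
    return sorted(in_range, key=parse_version)
```

For example, with an illustrative release list, `versions_to_review("10.2.2", "10.5", ["10.2.2", "10.4.0", "10.4.1", "10.5", "10.5.1"])` returns `["10.4.0", "10.4.1", "10.5"]`: the current release and anything beyond the target are excluded.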
Upgrade Tips
- When executing the upgrade, back up everything that can be backed up. If the infrastructure is virtual, check with the infrastructure team about VM snapshot backups. Back up all database schemas using Informatica-provided utilities, with the help of the database administrators.
- Back up the Informatica application binary mount point.
- Back up the keystore, truststore, keytab, and krb5.conf files from the EDC and cluster servers.
- Back up the catalog contents using “infacmd.sh ldm backupContents”, and also take a snapshot of the HDFS directory on the local machine.
- If the upgrade requires installing a newer version of the cluster (e.g., HDP 2.6.5 to HDP 3.1), read the prerequisites for the newer version and prepare a list of the additional packages needed.
- If the existing Postgres installation needs to be removed and replaced with a newer version (e.g., Postgres 9.2 to Postgres 9.6 or higher), remove the binaries carefully so that dependent binaries such as ambari-server are not deleted. If necessary, use “rpm -e --nodeps” to remove the Postgres packages.
- If the existing environment has EBFs applied and the admin team cannot determine their status in the newer release, raise a case with Informatica GCS and attach the ebfhistory.info file.
- If any pre-validation check is disabled in the previous version of EDC, disable it in the newer version as well.
- If the existing environment has a custom JDBC jar added after the initial deployment, add the custom JDBC jar in the new INFA_HOME location as well.
- If CA-signed certificates are planned as part of the upgrade, add the corresponding steps to the plan. Create the CA-signed keystore and truststore before starting the upgrade to avoid delays.
- If UCF resources are used in the earlier version, follow these steps:
  - Copy the jar from the existing installation to the newer version.
  - Obtain a new UCF license from Informatica GCS.
  - Apply the license in EDC and restart the catalog service.
- For an internal Kerberized cluster, remove the HTTP keytab entry for the existing host from the existing merged keytab and add the HTTP keytab of the new host server.
- Informatica recommends not changing the custom.properties files located under $INFA_HOME/services/CatalogService/Binaries during the upgrade period.
- After the upgrade, if there are any changes suggested by Informatica GCS for the custom.properties file, then apply those changes to the new server as well.
- If EDC for Cloud Data Integration is set up in IICS, or EDC data provisioning is configured in the old environment, install the agent on the new EDC server and update the settings in IICS accordingly.
- Notify the owners of applications integrated with EDC through the REST API (e.g., Axon, integration tools such as DEI, and the Chrome Plugin).
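Several of the tips above amount to copying a fixed set of files (for example, keystore, truststore, keytab, and krb5.conf files) before the upgrade starts. As an illustration only, the following Python sketch archives a list of files into a timestamped tar.gz and writes a SHA-256 manifest next to it, so the copies can be verified before they are restored. The file paths, archive naming, and function names are assumptions; the sketch does not replace infacmd, database utilities, or VM snapshots.

```python
import hashlib
import json
import tarfile
import time
from pathlib import Path
from typing import Dict, Iterable


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_files(files: Iterable[str], dest_dir: str) -> Path:
    """Archive `files` into a timestamped tar.gz under `dest_dir` and write a
    JSON manifest mapping each source path to its SHA-256 digest.

    Raises FileNotFoundError before writing anything if any file is missing.
    """
    paths = [Path(f) for f in files]
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Refusing to create a partial backup, missing: {missing}")

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"edc-preupgrade-{stamp}.tar.gz"

    manifest: Dict[str, str] = {}
    with tarfile.open(str(archive), "w:gz") as tar:
        for p in paths:
            # Store member names without a leading "/", the usual tar convention.
            tar.add(str(p), arcname=str(p).lstrip("/"))
            manifest[str(p)] = sha256_of(p)

    manifest_path = dest / f"edc-preupgrade-{stamp}.sha256.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return archive
```

A call might look like `backup_files(["/etc/krb5.conf", "/path/to/infa_keystore.jks"], "/backup/edc")`, where the keystore path is a hypothetical placeholder. The function refuses to create an archive if any listed file is missing, on the reasoning that a silently incomplete backup is worse than a failed one.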
Testing Strategy
Identify key areas that will need to be tested after the EDC upgrade. Some of the testing activities include:
- Verify Resources: Verify all resources and compare them against the old environment.
- Re-run Resources: Re-run resources in the new environment and confirm that they succeed and are able to fetch metadata and profile the data as they did in the old environment.
- Verify Critical Lineages: After the re-run of jobs, verify critical lineages in the catalog.
- Verify User-Group Permissions on Resources: After the upgrade is complete, verify resource level permissions for the catalog admin.
- Verify the Inbound Applications: Verify all applications which are connected to EDC using the REST API, Chrome Plugin, IICS, Axon, etc.
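Comparing resources against the old environment can be partly automated if an inventory of resources and their object counts is exported from each environment. The Python sketch below assumes a hypothetical CSV export with `resource_name` and `object_count` columns; how that export is produced is outside the scope of this sketch, and the `tolerance` parameter is an illustrative choice for resources whose source metadata legitimately changed between runs.

```python
import csv
from typing import Dict, List, TextIO


def load_inventory(handle: TextIO) -> Dict[str, int]:
    """Read a CSV with header 'resource_name,object_count' into a dict."""
    return {
        row["resource_name"]: int(row["object_count"])
        for row in csv.DictReader(handle)
    }


def compare_inventories(
    old: Dict[str, int], new: Dict[str, int], tolerance: float = 0.0
) -> Dict[str, List[str]]:
    """Report resources missing after the upgrade, newly added resources, and
    resources whose object count moved by more than `tolerance`, expressed as
    a fraction of the old count (e.g. 0.05 for 5%)."""
    missing = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    changed = []
    for name in sorted(set(old) & set(new)):
        before, after = old[name], new[name]
        if abs(after - before) > before * tolerance:
            changed.append(f"{name}: {before} -> {after}")
    return {"missing": missing, "added": added, "changed": changed}
```

Empty `missing` and `changed` lists are only a first signal; critical lineages, permissions, and inbound applications still need the checks listed above.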