Address validation is a complex process that varies with the reference data available for each country. There are also many misconceptions about what address validation can actually do. This article discusses how best to use Address Validation (AV) within Informatica Data Quality (IDQ) and DaaS to achieve and maintain address quality.
Informatica’s Address Validation Transform lets a user validate a postal address against the postal standard of its originating country (for countries supported by DaaS). Refer to the DaaS website for current country-level coverage.
Given that there are over 253 countries in the world, each with multiple, differing, and changing ways of forming addresses, this logic is complex, and creating validations from scratch would be an expensive and daunting challenge.
Address Validation is an extremely powerful feature of Informatica Data Quality. However, there are often misperceptions about how it should be used. It is not the right fit for all purposes and is not intended to be. AV (out-of-the-box and properly configured) is best suited to:
The Address Validation Transform will figure it out and correct addresses as needed, supplying missing information and rearranging elements as required within the address block.
Not True: Address Validation expects the entire address to be provided in the format the country requires. Many countries have address formats that differ significantly from United States Postal Service (USPS) requirements. They include dependent localities, double dependent localities, dependent streets, and dependent places, which may need to be included and arranged differently in the address block. Common variations include the placement of the postal city, postal code, and province on the “geography” line of the address block. Although many countries require the lines of the address block to be ordered from the most specific detail to the widest location, some countries reverse the order (e.g., Japan), and some both reverse the order and require a single-line format (e.g., China). With the AV Transform, input data can be provided in multi-line, discrete-element, or hybrid format (address lines plus separate city, postal code, and province fields).
For international (non-US and non-Canadian) addresses, the best practice is to use the formatted multi-line (unstructured) address input unless every discrete element of the address has been identified for that country, which is extremely rare. Depending on the country, international addresses can contain multiple and dependent localities, multiple and dependent thoroughfares, places within places, and multiple provinces, or no locality or postal code at all. At times, some empty fields can be derived from other populated fields (for example, postal city and postal province from the postal code).
The challenge remains to format the input correctly for the country, because many applications either do not provide a field for every element or are not used in a way that stores the whole address block in the order it should appear on a mail piece, in accordance with Universal Postal Union (UPU) standards.
In some cases, the data needs to be expanded into other fields or split into additional lines where the boundaries can be identified. Most source applications support several address lines, with postal province, postal city, and postal code stored in separate fields. In this case, input that meets international mailing standards can be produced by grouping these elements into the multi-line input format according to each country's last-line formatting requirements (a much smaller set of logic).
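As a rough illustration of deriving empty fields from populated ones, the sketch below fills a missing postal city and province from the postal code. The `POSTAL_REFERENCE` table, its placeholder country code `ZZ`, and its entries are hypothetical stand-ins for country reference data; this is not how the AV Transform is implemented, only a minimal model of the idea that populated input is kept and only gaps are filled.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical reference data: (country, postal code) -> (postal city, province).
# Placeholder values only; real reference data comes from the postal authority.
POSTAL_REFERENCE = {
    ("ZZ", "10001"): ("SAMPLETOWN", "NORTH"),
    ("ZZ", "20002"): ("EXAMPLEVILLE", "SOUTH"),
}


@dataclass(frozen=True)
class Address:
    country: str
    postal_code: Optional[str] = None
    city: Optional[str] = None
    province: Optional[str] = None


def derive_missing_fields(addr: Address) -> Address:
    """Fill an empty city/province from the postal code when reference data allows.

    Populated input values are never overwritten.
    """
    if not addr.postal_code:
        return addr
    match = POSTAL_REFERENCE.get((addr.country, addr.postal_code))
    if match is None:
        # No reference data for this code: keep the input as provided.
        return addr
    city, province = match
    return replace(addr, city=addr.city or city, province=addr.province or province)


print(derive_missing_fields(Address("ZZ", "10001")))
# Address(country='ZZ', postal_code='10001', city='SAMPLETOWN', province='NORTH')
```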
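The grouping step described above might be sketched as follows, assuming the source stores free-form address lines plus separate city, province, and postal code fields. The `LAST_LINE_LAYOUTS` table is a simplified, illustrative assumption (US "CITY STATE ZIP", German "postcode city", UK post town and postcode on separate lines), not a complete or authoritative rule set; confirm each country's current requirements with its postal authority or UPU guidance. The function name is hypothetical.

```python
from typing import Dict, List, Tuple

# Illustrative last-line layouts: each tuple is one output line, built from the
# named fields in order. Simplified assumptions, not authoritative rules.
LAST_LINE_LAYOUTS: Dict[str, List[Tuple[str, ...]]] = {
    "US": [("city", "province", "postal_code")],
    "DE": [("postal_code", "city")],
    "GB": [("city",), ("postal_code",)],
}


def build_multiline_input(
    country: str,
    address_lines: List[str],
    city: str = "",
    province: str = "",
    postal_code: str = "",
) -> List[str]:
    """Group discrete last-line fields into a multi-line address block."""
    layout = LAST_LINE_LAYOUTS.get(country)
    if layout is None:
        raise ValueError(f"no last-line layout defined for country {country!r}")
    fields = {"city": city, "province": province, "postal_code": postal_code}
    # Keep the non-blank address lines in their original order.
    block = [line.strip() for line in address_lines if line.strip()]
    for field_names in layout:
        # Skip empty fields so they do not leave stray spaces or blank lines.
        parts = [fields[name].strip() for name in field_names if fields[name].strip()]
        if parts:
            block.append(" ".join(parts))
    return block


print(build_multiline_input("DE", ["Musterstraße 1"], city="Berlin", postal_code="10117"))
# ['Musterstraße 1', '10117 Berlin']
```

Fields a layout does not name (for example, province for DE) are simply not placed, which mirrors the point that the last-line logic is per country.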
Not True: Some postal systems (e.g., in the UK and Canada) accept and deliver to addresses that consist of a specific organization name and postal code. This usually applies to large-volume mail receivers.
A Blank Address Line, Blank City or Blank Organization is Not Sufficient for Mailing Purposes.
Not True: Although this is true more often than not, it is not always true. There are cases when you do not need an address line or organization name, and the postal code alone may suffice (e.g., the US Internal Revenue Service). Many US PO Boxes can be reached with just the 9-digit ZIP+4 code, because the last four digits can correspond to a specific PO Box.
Not True: Postal authorities limit the remediation that Address Validation is permitted to perform. For example, in countries with a postal certification program, such as the USA, AV would not be permitted to grade an address list for postal discount eligibility if it made substantial changes to the addresses. Commonly known interchangeable terms can be replaced with the standard terms the postal service publishes, such as ROAD to RD, but wholesale changes such as “Corner of California & Maple Streets; San Francisco CA” to “3700 CALIFORNIA ST; SAN FRANCISCO CA” are not permitted in the AV Transform.
(Note: This type of change can be made with other transforms in IDQ, as is done when correcting the source of record with automated remediation, but not as part of Address Validation for list-grading purposes.)
The original purpose of Address Validation is to qualify a provided address in terms of its sufficiency for handling by the postal service. Each postal service has its own rules for forming an address. Qualifying an address means grading it; AV may not change the address unless the change is deemed cosmetic rather than substantial.
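To make the cosmetic-versus-substantial distinction concrete, here is a minimal sketch that treats a change as cosmetic when both versions normalize to the same tokens after applying published standard terms. `STANDARD_TERMS` is a small illustrative subset of USPS Publication 28 street-suffix abbreviations, and `normalize` and `is_cosmetic_change` are hypothetical helpers; postal certification rules and the AV Transform do not necessarily decide this way.

```python
import re
from typing import List

# Small illustrative subset of USPS Publication 28 street-suffix abbreviations.
STANDARD_TERMS = {
    "ROAD": "RD",
    "STREET": "ST",
    "AVENUE": "AVE",
    "BOULEVARD": "BLVD",
    "DRIVE": "DR",
    "LANE": "LN",
}


def normalize(address_line: str) -> List[str]:
    """Uppercase, drop punctuation, and map known terms to their standard forms.

    Naive: ignores context (e.g., ST meaning SAINT) and multi-word terms.
    """
    tokens = re.findall(r"[A-Z0-9]+", address_line.upper())
    return [STANDARD_TERMS.get(token, token) for token in tokens]


def is_cosmetic_change(original: str, corrected: str) -> bool:
    """True when the lines differ only in case, punctuation, or standard terms."""
    return normalize(original) == normalize(corrected)


print(is_cosmetic_change("100 Main Road", "100 MAIN RD"))
# True
print(is_cosmetic_change("Corner of California & Maple Streets", "3700 CALIFORNIA ST"))
# False
```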
Not True: Postal authorities permit Address Validation to be wrong for some percentage of addresses, and some postal requirements mean it must be wrong in specific cases. Address Validation is only as accurate as the data the country's postal system provides, and that data changes regularly as address changes are introduced. Even postal authorities that run certification programs for AV software do not require the software to be perfect; certification requires a minimum percentage of correct results.
The Address Validation certification process requires that the address validation vendor's software pass a test, and a score below 100% can be a passing grade. Some level of non-compliance is therefore acceptable.
Not True: The Address Validation Transform applies only to addresses the postal service will attempt to deliver to. These may or may not include physical locations such as shipping destinations, certain business storefronts, or garage addresses. They may not include legal addresses defined by property boundaries, physical locations in hazardous places, or military addresses.
Sufficiency for handling by the postal service implies it must be an address that the postal service will deliver to. Many addresses may not be deliverable by the postal service. These often include: Shipping Addresses, Property Description Addresses, Garage Addresses, Internal Routing Addresses, Construction Addresses, and Military Addresses as well as Hazardous Area Addresses. Billing and Mailing addresses are usually deliverable by the postal service (but they do not have to be if the underlying activity is conducted electronically). However, some corporate addresses may not be deliverable by the postal service as a corporate campus may span multiple physical addresses with only one, if any, used to receive mail. It is usually a best practice to augment Shipping Addresses with geographic coordinates such as latitude and longitude to assist delivery.
Not True: Finding no fault with a mailing address during Address Validation does not mean the recipient will actually be found at the address. The recipient could have moved. The mailbox could be missing or destroyed. The road could be washed out. Depending on the level of validation requested and available, the house number may not actually exist.
Not True: Address Validation returns different types of data depending on what is available for each supported country, because not all countries provide postal data at the same level. For example, the United States provides postal data down to the delivery point, but South Africa provides data only down to the locality and postal code.
The quality of an address is best determined in terms of its fitness for use. For example, in the United States, mailing addresses can be verified at the range level and/or at the delivery point level. At the range level, a house or suite number is confirmed to fall within the expected range, but that specific number may not actually exist. That may be good enough for marketing mail but not for billing. At the delivery point level, the house or suite number is verified to exist for delivery purposes, which gives the higher degree of confidence you may want that a bill will be delivered.
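The fitness-for-use decision above can be expressed as a small policy check. This is a sketch under stated assumptions: `VerificationLevel` and its members are hypothetical labels (not actual AV output codes), and the `REQUIRED_LEVEL` mapping simply encodes the marketing and billing example from the paragraph; map your real AV results onto such levels before using anything like it.

```python
from enum import IntEnum


class VerificationLevel(IntEnum):
    """Hypothetical, ordered verification levels (not actual AV output codes)."""

    NOT_VERIFIED = 0
    RANGE = 1           # number falls within an expected range; may not exist
    DELIVERY_POINT = 2  # number confirmed to exist for delivery


# Assumed policy: the minimum verification level each purpose requires.
REQUIRED_LEVEL = {
    "marketing": VerificationLevel.RANGE,
    "billing": VerificationLevel.DELIVERY_POINT,
}


def is_fit_for_use(level: VerificationLevel, purpose: str) -> bool:
    """Return True when the achieved level meets the purpose's minimum.

    Raises KeyError for a purpose with no defined policy.
    """
    return level >= REQUIRED_LEVEL[purpose]


print(is_fit_for_use(VerificationLevel.RANGE, "marketing"))
# True
print(is_fit_for_use(VerificationLevel.RANGE, "billing"))
# False
```

Using an ordered enum keeps the policy to a single comparison, so adding a purpose only means adding one entry to the mapping.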
In India, for example, reference data may be lacking for particular elements of an address. When AV has no reference data for an element, the input address as provided is the most likely accurate source. The AV Transform provides the Element Relevance, Element Result Status, and Element Input Status columns, as well as an Address Type column, to help assess the results of address validation. Depending on the country, you may also receive country-specific supplementary qualifiers that help you judge whether the address is acceptable for your purpose.
Given these caveats, Address Validation is most trustworthy when it is properly configured for the input data and its output provides evidence that the address is good.
If errors or poor quality are detected, revisit the address in context and determine whether the error can be ignored, warrants automated remediation, or requires manual remediation.
If the error is to be remediated manually, determine whether it can be fixed once or must be handled on a recurring basis. If it can be fixed once and for all in the source of record, it is best to fix it there. If it recurs, it can be handled by a process that re-applies the same “manual” fix whenever the same data is received, minimizing manual maintenance.
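One way to act on per-element results is sketched below. The status strings (`VALIDATED`, `CORRECTED`, `NO_REFERENCE_DATA`, `NOT_FOUND`) and the `relevant` flag are hypothetical simplifications standing in for the Element Result Status and Element Relevance columns; the actual codes are documented in Informatica's AV reference and would need to be mapped onto them. The sketch keeps the AV output where it was validated, falls back to the input where no reference data exists, and flags relevant elements that could not be verified.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ElementResult:
    input_value: str
    output_value: str
    status: str     # hypothetical label; map real Element Result Status codes here
    relevant: bool  # stand-in for Element Relevance


TRUSTED = {"VALIDATED", "CORRECTED"}


def assess_elements(results: Dict[str, ElementResult]) -> Tuple[Dict[str, str], List[str]]:
    """Pick a value per element and list relevant elements that need review."""
    final: Dict[str, str] = {}
    review: List[str] = []
    for name, result in results.items():
        if result.status in TRUSTED:
            final[name] = result.output_value
        elif result.status == "NO_REFERENCE_DATA":
            # Without reference data, the input as provided is the best available source.
            final[name] = result.input_value
        else:
            # Checked against reference data but not confirmed: keep input, flag if relevant.
            final[name] = result.input_value
            if result.relevant:
                review.append(name)
    return final, review
```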
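The recurring-fix process described above could be sketched as a registry of manual corrections keyed by the normalized input address block. `ManualFixRegistry` and `_key` are hypothetical names, and the sketch assumes that an exact match after simple case and whitespace normalization is enough to recognize "the same data"; real matching may need more tolerance. Note that a wholesale correction like the one shown is the kind made outside AV, not by the AV Transform.

```python
import re
from typing import Dict, Optional, Tuple

AddressBlock = Tuple[str, ...]


def _key(block: AddressBlock) -> AddressBlock:
    """Normalize case and whitespace, dropping blank lines, so repeats match."""
    return tuple(re.sub(r"\s+", " ", line).strip().upper() for line in block if line.strip())


class ManualFixRegistry:
    """Hypothetical store of manual corrections, re-applied when the same input recurs."""

    def __init__(self) -> None:
        self._fixes: Dict[AddressBlock, AddressBlock] = {}

    def record(self, original: AddressBlock, corrected: AddressBlock) -> None:
        """Remember a reviewer's correction for this input address block."""
        self._fixes[_key(original)] = corrected

    def apply(self, incoming: AddressBlock) -> Optional[AddressBlock]:
        """Return the stored correction, or None if the input needs fresh review."""
        return self._fixes.get(_key(incoming))


registry = ManualFixRegistry()
registry.record(
    ("Corner of California & Maple Streets", "San Francisco CA"),
    ("3700 CALIFORNIA ST", "SAN FRANCISCO CA"),
)
print(registry.apply(("corner of california &  maple streets", "San Francisco CA")))
# ('3700 CALIFORNIA ST', 'SAN FRANCISCO CA')
```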