In my previous article, I touched on some high-level actions you can take after you've discovered data quality issues in your business analytics solution. In this article, I want to delve a little deeper into the causes of those issues. Specifically, I'd like to discuss the data quality symptoms you might encounter when multiple source systems contain the same information.
Understanding the root causes of data quality issues is essential to any data quality initiative, because you need to understand a cause in order to stop the flow of bad data. If root causes aren't addressed, you'll be forced to clean up the same issues perpetually. In the case of multiple source systems, many data quality issues can be avoided through careful planning during the design phase of a business analytics solution. Discrepancies can also be addressed through a master data management initiative.
Data Quality Issues Caused by Multiple Source Systems
Many companies these days have multiple ERPs to store information for different regions or different lines of business. Whatever the reason, it is very likely that the data contained within these systems has some level of redundancy, which can only lead to problems when attempting to create a business analytics platform.
The Complexities of Transaction Attribution
There are two big problems when dealing with data spread across multiple systems. The first is that transactions cannot be properly attributed to a single entity when that entity exists independently in each system. For the sake of example, let's assume a company has the same vendor listed in two separate ERP systems: in the first, it's Vendor A; in the second, it's Vendor 3.
When the data is consolidated into a single analytics solution without specific reconciliation steps, this vendor will be listed as two completely separate companies. The overall picture of the business may still be complete, but the values for any specific vendor cannot be trusted, as they may be significantly understated.
Imagine the additional negotiation leverage you might have if you could show a vendor that your company purchases 1 million total units of a specific part from them annually, rather than just the 335,000 units recorded in the first ERP system. The price per unit could very likely be reduced, which would have a significant impact on the bottom line.
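The attribution fix described above can be sketched in a few lines. This is a minimal illustration with hypothetical data and IDs (the "V-001" master ID, the ERP names, and the 665,000-unit figure for the second system are assumptions for the example): without a cross-reference table, per-vendor totals are understated; with one, both source IDs roll up to a single vendor.

```python
# Hypothetical consolidated transactions; the same vendor carries a
# different ID in each ERP system.
transactions = [
    {"vendor_id": "Vendor A", "source": "ERP1", "units": 335_000},
    {"vendor_id": "Vendor 3", "source": "ERP2", "units": 665_000},
]

# Naive aggregation: each source-system ID is treated as a separate vendor.
naive = {}
for t in transactions:
    naive[t["vendor_id"]] = naive.get(t["vendor_id"], 0) + t["units"]

# A cross-reference table maps each (system, local ID) pair to one
# master vendor record -- the output of the reconciliation effort.
xref = {("ERP1", "Vendor A"): "V-001", ("ERP2", "Vendor 3"): "V-001"}

# Reconciled aggregation: totals roll up to the master vendor.
reconciled = {}
for t in transactions:
    master = xref[(t["source"], t["vendor_id"])]
    reconciled[master] = reconciled.get(master, 0) + t["units"]
```

With the cross-reference in place, the reconciled view shows the full 1 million units for the single master vendor, while the naive view splits it into two understated figures.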
Reconciling Contradictory Information
The second problem that occurs when dealing with multiple source systems is contradictory information. In the vendor example above, the transactions are distinct by line of business or by geography (depending on the ERP implementation), so combining them is safe. But what about descriptive information about the vendor?
The address, for example, is not guaranteed to be the same. The vendor could have numerous site addresses, and very likely the site address in one system won't match the site address in the other.
Even if the site addresses were entered identically, what happens once the vendor moves and the address changes? Perhaps one of the ERP data entry teams is very diligent and the other is not. In that scenario, one of the ERP addresses would be updated while the other would not.
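Even before tackling stale addresses, simple formatting differences ("St." vs "STREET") will defeat an exact comparison. A crude normalization pass, sketched below under the assumption that casing, punctuation, and a handful of common abbreviations are the main offenders, makes many of these superficial mismatches comparable:

```python
import re

# A small, assumed abbreviation map; a real cleanup effort would use a
# much larger dictionary or a dedicated address-standardization service.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "ste": "suite"}

def normalize_address(addr):
    """Crude normalization: lowercase, strip punctuation, expand abbreviations."""
    addr = re.sub(r"[.,]", "", addr.lower())
    words = [ABBREVIATIONS.get(w, w) for w in addr.split()]
    return " ".join(words)
```

For example, `normalize_address("123 Main St.")` and `normalize_address("123 MAIN STREET")` both produce `"123 main street"`, so the two records can be compared directly. Normalization cannot fix a genuinely outdated address, though; that still requires the reconciliation work described next.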
Compounding Data Quality Issues
Notice that you likely cannot solve the attribution problem without first addressing the contradictory information. Very likely the address will need to be used to tie the two vendor records together; once that's done, the transactions for the two vendors can be properly attributed to the combined vendor record, providing a correct, comprehensive view of vendor activity.
The issue, as we already established, is that the address information is very likely out of sync and therefore won't allow the vendor records to be easily aligned. The real effort will be in cleaning up the address information so that entity alignment can take place cleanly and effectively.
Parts of the address cleanup can be automated, but there will be situations where manual review is required. Expect this process to grow more complicated with each system that is added: the greater the number of variables, the greater the likelihood that you won't be comparing apples to apples, and the larger the cleanup effort required.
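The split between automated matching and manual review can be sketched with a string-similarity score. This is an illustration, not a production matcher: the 0.90 threshold, the vendor data, and the addresses are all assumptions, and a real effort would tune the threshold against known-good matches.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Ratio of matching characters between the two strings; 1.0 means identical.
    return SequenceMatcher(None, a, b).ratio()

# Hypothetical vendor records keyed by each system's local ID, with
# already-normalized addresses that still differ slightly.
erp1 = {"Vendor A": "123 main street"}
erp2 = {"Vendor 3": "123 main st"}

AUTO_MATCH = 0.90  # assumed cutoff; anything below goes to a human

auto_links, review_queue = [], []
for id1, addr1 in erp1.items():
    for id2, addr2 in erp2.items():
        score = similarity(addr1, addr2)
        if score >= AUTO_MATCH:
            auto_links.append((id1, id2, score))
        else:
            review_queue.append((id1, id2, score))
```

Here the pair scores below the cutoff, so it lands in the review queue rather than being linked automatically, which mirrors the point above: automation handles the clear cases, and humans handle the rest. Note also that this naive pairwise comparison scales with the product of the record counts, which is why each additional system makes the cleanup markedly harder.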
In summary, storing the same information in multiple source systems can cause several data quality problems in downstream business analytics solutions. Two key problems must be considered when a business analytics solution is being created: first, properly attributing transactions to a single entity that exists in multiple source systems; second, identifying common entities across those systems in the face of contradictory records.
The two problems are closely related but distinct, and each must be carefully considered. Depending on their magnitude, a complete master data management solution may be required.
At a minimum, the business analytics development team will need to reconcile the systems at design time. Depending on complexity, it may be more prudent to implement a master data management solution to address these issues. One thing is certain: by properly resolving incorrect entity information, you address the transaction attribution problem as well.
Reconciling data between multiple source systems can be a complex activity. Success depends on fully understanding the issues at hand, and properly researching and planning before building a business analytics solution. Take care to address these issues in your business analytics solution and you can avoid these data quality pitfalls before they occur.
At my upcoming webinar on July 16th, I'll discuss how poor data quality is robbing businesses of their hard-earned resources, and how to address the root causes of data quality issues head on. This is part 1 of a 3-part series on data quality that I'll present this summer.
Good luck on your data quality initiatives, and thanks for reading.