It Grows All The Time
If you have ever worked with large information systems within your organization, you’ve most likely heard the term “data quality.” Big companies have decades of information stored in their databases. That information was usually accumulated over time, and the policies that dictate how the data should be stored have also mutated over time to reflect the company’s changing needs and goals. Sometimes these companies had the money and determination to update all historical data to comply with new policies, but oftentimes it made more sense to add a cheap, quick and effective workaround instead of trying to change the data itself. And often, different divisions within the same company had distinct policies for presenting the same type of data. Mergers and acquisitions meant adding new data sources, managed under a whole other set of policies. There are myriad reasons for such data issues, but the outcome is inevitably the same.
It all adds up. In the end, you’re looking at a huge maze of data sources, all related to one another, sometimes even representing the same information in entirely different ways. Along with the data, you get a whole shelf of policy documents, each governing part of that monstrosity. The larger the enterprise, the more complex the maze. Maintaining the maze costs an organization time and money, and it can be a source of frustration for the people working with all these data sources. To ease the pain, companies frequently invest IT department time and resources in building a variety of tools and applications that provide access to these data sources. Or, if the company doesn’t have an IT department, or IT is unable to perform the task, the company will pay an external software company to do it for them. Either way, users are still forced to work with multiple authorization systems and data access utilities. Companies are burning time, money and resources simply looking for information.
Finding Bad Apples
When a product is designed to let users see all their data in a single, consistent way, you usually discover incorrect data early on. A unified data view means users can easily spot many data issues. Some of the data might violate the specification or policy that was supposed to define it clearly. The specification itself might contain invalid requirements. The data might not be accurate, consistent, complete, or even relevant. And no one spotted any of this before, often because the tool or application responsible for presenting the data had hidden these issues behind useful workarounds, leading users to believe everything was OK.
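To make those four failure modes concrete, here is a minimal sketch of what record-level quality checks can look like. This is purely illustrative and not how EDL is implemented; the `PartRecord` fields, the `PN-` naming policy, and the check logic are all assumptions invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical part record; field names are illustrative, not from any real schema.
@dataclass
class PartRecord:
    part_number: str
    name: str
    weight_kg: Optional[float]

def check_quality(record: PartRecord, known_part_numbers: Set[str]) -> List[str]:
    """Return a list of data-quality issues found in a single record."""
    issues = []
    # Completeness: required fields must actually be populated.
    if record.weight_kg is None:
        issues.append("incomplete: missing weight")
    # Accuracy: values must be physically plausible.
    elif record.weight_kg <= 0:
        issues.append("inaccurate: non-positive weight")
    # Consistency: part numbers must follow the (assumed) naming policy.
    if not record.part_number.startswith("PN-"):
        issues.append("inconsistent: part number violates naming policy")
    # Relevance: a duplicate of an already-known part adds no information.
    if record.part_number in known_part_numbers:
        issues.append("irrelevant: duplicate part number")
    return issues

known = {"PN-1001"}
bad = PartRecord(part_number="1002", name="Bracket", weight_kg=None)
print(check_quality(bad, known))
```

Running the sketch on the `bad` record flags both a completeness and a consistency issue at once, which mirrors the point above: problems that were invisible inside siloed tools become obvious the moment every record passes through one shared set of checks.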
You could say this was to be expected, given the amount of data some enterprises have. Human mistakes are a fact of life that every organization should be aware of, and some of the data an organization works with will inevitably be flawed. That’s where the Encompass Data Loader (EDL) Framework is valuable (read Ryan’s post on the EDL). Not only do we integrate and extract meaningful information out of all the data sources in your organization, we can also detect when something is wrong with the data itself. Data aggregation matched with the ability to locate these bad apples means the enterprise can do an even better job of setting up company-wide standards for the naming and numbering of parts.
In my next blog, I’ll show you how Encompass deals with cases where data is not what our customers want it to be. Check back for the next installment in our three-part series on how Encompass can tackle the challenges of data quality, data cleaning and data integration.