In the previous installment of this miniseries, we talked about why Perception has taken a keen interest in data quality. We know that part numbering and part descriptions account for the majority of the problems with part data quality. So, let’s take a look at an actual example from our experience. Meet the part description string.
It is also known as the specification string. In short, this is a single string containing a comma-separated list of attribute values for the given part. Take these examples representing LED parts:
SINGLE COLOR LED, RED, 5.8 mm
T-1 3/4 SINGLE COLOR LED, PURE GREEN, 5 mm
4 mm, 1 ELEMENT, INFRARED LED, 950 nm
As you can see, a part description is a perfect example of unstructured data. It contains multiple part attribute values, and it’s not always obvious to the eye which attributes it represents or in which order. Manufacturers can rely on detailed specifications, which state precisely what can be expected in a part description for a specific type of part. These specifications may define the following characteristics:
1. Order of the attribute values and their respective attribute names.
2. Status of each expected attribute, required or optional.
3. List of allowed values for each attribute, which can be either a finite list of values or a more general formula.
4. Type of attribute value (string, numeric, other).
Some additional characteristics might also be defined, depending on the type of part and company needs.
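To make the four characteristics above concrete, here is a minimal Python sketch of how such a specification could be written down and applied. Everything in it is an assumption made for illustration: the `AttributeSpec` class, the `parse_description` function, and the `LED_SPEC` contents are hypothetical and do not come from any real manufacturer specification.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeSpec:
    """One expected attribute; its position in the spec tuple gives the order."""
    name: str
    required: bool = True                       # status: required or optional
    allowed_values: Optional[frozenset] = None  # finite list of allowed values
    pattern: Optional[re.Pattern] = None        # or a more general formula
    value_type: type = str                      # type of the attribute value


# Hypothetical LED specification, invented for this example.
LED_SPEC = (
    AttributeSpec("type", allowed_values=frozenset({"SINGLE COLOR LED", "INFRARED LED"})),
    AttributeSpec("color", allowed_values=frozenset({"RED", "PURE GREEN"})),
    AttributeSpec("size_mm", pattern=re.compile(r"(\d+(?:\.\d+)?) mm"), value_type=float),
)


def parse_description(description, spec):
    """Map comma-separated values to attribute names, validating each one.

    Simplification: values are matched to attributes purely by position,
    so only trailing optional attributes may be left out.
    """
    values = [value.strip() for value in description.split(",")]
    if len(values) > len(spec):
        raise ValueError("more values than the specification allows")
    result = {}
    for position, attr in enumerate(spec):
        if position >= len(values):
            if attr.required:
                raise ValueError(f"missing required attribute {attr.name!r}")
            continue
        raw = values[position]
        if attr.allowed_values is not None and raw not in attr.allowed_values:
            raise ValueError(f"{raw!r} is not an allowed value for {attr.name!r}")
        if attr.pattern is not None:
            match = attr.pattern.fullmatch(raw)
            if match is None:
                raise ValueError(f"{raw!r} does not match the formula for {attr.name!r}")
            raw = match.group(1)
        result[attr.name] = attr.value_type(raw)
    return result


print(parse_description("SINGLE COLOR LED, RED, 5.8 mm", LED_SPEC))
# {'type': 'SINGLE COLOR LED', 'color': 'RED', 'size_mm': 5.8}
```

Against this made-up specification, the second example string from earlier ("T-1 3/4 SINGLE COLOR LED, PURE GREEN, 5 mm") would be rejected, because its first value is not in the allowed list. That is the kind of mismatch between data and spec that tends to show up in practice.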
Read The Friendly Manual, Again
While this looks like a pretty solid foundation for working with part description strings, there are a few issues. One that you can probably see right away is that you need to be familiar with the specification in order to know which attributes are present in a given part description string. Furthermore, you may be dealing with multiple part types from multiple manufacturers, each with its own specification. Sometimes it’s easy. For example, if you are looking for voltage, chances are that, say, “12V” is exactly what you’re after. But what if all you can see is “12” and there are a few other plain numeric values within the same part description string? You need to go and take a look at the specification yourself. And here’s the fun part: even then, you might still need to consult someone else, because the spec is either not precise enough or the data you’re looking at is not perfectly compliant with the spec. Or, in the worst case, the spec might be outdated or even wrong. And all you wanted was to simply find the voltage for that part.
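The voltage problem is easy to reproduce in a few lines of Python. The sketch below is only an illustration: the `voltage_candidates` function and the two relay description strings are invented for this example and are not taken from real part data.

```python
import re

# A number followed by "V", e.g. "12V" or "12 V".
VOLTAGE_WITH_UNIT = re.compile(r"\d+(?:\.\d+)?\s*V")
# A number with no unit at all, e.g. "12".
PLAIN_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def voltage_candidates(description):
    """Return the values in a description string that could be the voltage."""
    values = [value.strip() for value in description.split(",")]
    with_unit = [value for value in values if VOLTAGE_WITH_UNIT.fullmatch(value)]
    if with_unit:
        return with_unit
    # No unit anywhere: every plain number is an equally plausible candidate.
    return [value for value in values if PLAIN_NUMBER.fullmatch(value)]


print(voltage_candidates("POWER RELAY, 12V, 2, 30"))  # ['12V']
print(voltage_candidates("POWER RELAY, 12, 2, 30"))   # ['12', '2', '30']
```

With the unit present, there is exactly one candidate. Without it, the code can only hand back three numbers, and deciding which one is the voltage still requires the specification.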
Frustrated by issues similar to the one described above, one of our customers laid out a request. What they needed from us was the ability to do this:
They gave us a ton of documentation and some time to figure out how to automate this process. Obviously, what they needed required a bit more than just splitting the string into pieces. In the next installment of this miniseries, we’ll take a look at how we solved this issue and how it helped us with the more general task of dealing with data quality.