How can you solve your data quality pain points using the international data quality standard, ISO 8000?
What is a Data Dictionary?
Abstract: Taken together, this series of articles explains how ISO 8000 can be part of your digital strategy. This International Standard can help to increase productivity in your organization and cut the cost of data cleansing or data onboarding projects. In this article, part one of the five-part series, we explain what a data dictionary is and what international registration data identifiers are.
There are numerous studies examining the cost of poor quality data to organizations. Harvard Business Review put the cost to the US economy at $3 trillion per year. Rule number 1 of any quality system is to get quality right the first time. It has been claimed that the cost of bad data grows tenfold at each stage, which is expressed as the 1-10-100 rule: if the cost to fix a data error at the time of entry is $1, the cost to fix it an hour after it has been entered escalates to $10. Fix it later, and the cost becomes $100. In terms of productivity losses, knowledge workers are estimated to waste 50% of their time in hidden data factories: hunting for data, finding and correcting errors, and searching for confirmatory sources for data they do not trust. These figures, of course, do not relate solely to materials data, but they are indicative of the wider issues caused by poor data quality.
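To make the arithmetic of the 1-10-100 rule concrete, here is a minimal Python sketch that applies it to a hypothetical batch of 1,000 errors. The stage names, the `RULE_1_10_100` table and the `cost_of_fixing` function are illustrative assumptions; the only inputs taken from the text are the $1, $10 and $100 unit costs.

```python
# Minimal sketch of the 1-10-100 rule quoted above.
# Assumption: three stages with the $1 / $10 / $100 unit costs from the rule;
# real costs vary by organization and sector.
RULE_1_10_100 = {
    "at entry": 1,       # fix the error when the data is entered
    "after entry": 10,   # fix it shortly after entry (the text says an hour later)
    "later": 100,        # fix it once it has propagated further
}


def cost_of_fixing(errors: int, stage: str) -> int:
    """Return the total cost, in dollars, of fixing `errors` data errors at one stage."""
    return errors * RULE_1_10_100[stage]


if __name__ == "__main__":
    for stage in RULE_1_10_100:
        print(f"1,000 errors fixed {stage}: ${cost_of_fixing(1_000, stage):,}")
```

Run as a script, it prints $1,000, $10,000 and $100,000 for the three stages: the same errors cost a hundred times more when they are fixed late.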
Costs vary across organizations and sectors, but a speaker from a global super-major oil and gas company stated at a conference more than ten years ago that it cost them $250 to create a master data record after accounting for all the internal and external costs. As you read this series of articles, consider how much it costs you to correct data, and how much time is wasted in your organization trying to find items that are not described correctly. In most cases, the wasted costs far outweigh the cost of implementing the processes we will outline in this series.
The data dictionary and international registration data identifiers
Rarely is there just one simple way to solve a particular problem. There are, however, established best practices that can help you on your journey. The best ISO standards represent a consensus of international best practice, and that includes ISO 8000, the data quality standard.
The data quality standard is published in a number of parts covering both data quality management and the exchange of characteristic data. This article is the first in a series to help you understand how to improve the quality of your master data, starting with the foundation of good data quality: the data dictionary.
The basic ISO 8000 data architecture is outlined in this model, which addresses universal data quality issues such as the provenance, accuracy, and completeness of the data:
Figure 1 – Data Architecture
At first glance, this may look like a complicated model, and it certainly uses terms that may be unfamiliar to many people who are otherwise familiar with data issues (links to definitions are included in this document). So a good place to start is to explain why this model is so useful if your organization struggles with multiple terms for the same item, and how the same solution can be used to manage multiple languages across your organization.
What is a data dictionary?
Ironically, this is the first area where there is often confusion. In most cases, when data cleaners refer to their "dictionary", they are in fact referring to a collection of specification templates. In the diagram above, a single template is referred to as a "data specification". Data specifications are discussed in more detail in part 2.
The type of data dictionary most commonly used to provide ISO 8000 compliant data is a concept dictionary, of which the most common specialization is an open technical dictionary. Each concept in a dictionary is referred to as a data dictionary entry. In the example below, the "concept" is "(bearing) bore diameter", which is a characteristic, or "property", of a bearing, for which the property value and unit of measure might be "20mm".
A concept links semantically equivalent words or phrases ("terms") together with their "definition". In the example below, you can see how the term (bearing) bore diameter, with the appropriate definition from the relevant international standard, is linked to terms in multiple languages.
A data dictionary can be: an open technical dictionary (OTD) as in ISO 22745; a concept dictionary as in ISO 29002; a parts library (PLIB) as in ISO 13584; a reference data library (RDL) as in ISO 15926; or any data dictionary that describes products and services by means of ontologies of classes and properties. To be a source for ISO 8000 compliant data, the data dictionary must include an international registration data identifier (IRDI) which is explained later in this article.
(bearing) bore diameter – Concept IRDI: 0194-1#02-05ZBLR#1
Concept Type: Property
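One way to picture a data dictionary entry like the one above is as a small record: an IRDI, a concept type, a definition, and a list of terms tagged by language. The Python sketch below is a hypothetical model, not the ISO 22745 or ISO 29002 schema. The class names are invented for illustration, the definition is a placeholder, and the German and French terms are illustrative rather than copied from any dictionary.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    text: str
    language: str  # assumed two-letter language code, e.g. "en"


@dataclass
class Concept:
    """Hypothetical model of one data dictionary entry (a concept)."""
    irdi: str          # internationally unique identifier of the concept
    concept_type: str  # e.g. "Property"
    definition: str    # authoritative definition, traceable to its source
    terms: list[Term] = field(default_factory=list)

    def terms_in(self, language: str) -> list[str]:
        """Return every term recorded for this concept in one language."""
        return [t.text for t in self.terms if t.language == language]


bore_diameter = Concept(
    irdi="0194-1#02-05ZBLR#1",
    concept_type="Property",
    definition="<definition quoted from the source international standard>",
    terms=[
        Term("(bearing) bore diameter", "en"),
        Term("Bohrungsdurchmesser", "de"),   # illustrative translation
        Term("diamètre d'alésage", "fr"),    # illustrative translation
    ],
)
```

Here `bore_diameter.terms_in("de")` returns `["Bohrungsdurchmesser"]`. Because every language variant hangs off the same IRDI, supporting another language means adding terms, not creating new concepts.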
Another common issue when deciding which term your organization should standardize on is that several different terms are often in common use in the same language. The example below, using the same concept, (bearing) bore diameter, illustrates how a concept dictionary deals with this scenario:
(bearing) bore diameter – Concept IRDI: 0194-1#02-05ZBLR#1
Concept Type: Property
The example above shows how multiple terms are linked to a single definition in a "concept". The concept dictionary therefore links the terms and provides an authoritative definition, tied to a source that gives the concept provenance, which all the stakeholders can agree on and easily refer to. The dictionary must also contain an internationally recognised identifier for each concept: an IRDI.
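Because every synonym points at the same concept, resolving whatever term a user types to a single IRDI becomes a simple lookup. The Python sketch below assumes a hypothetical extract of a concept dictionary in which several English terms, invented here for illustration, share the IRDI from the example; `build_term_index` and `resolve` are illustrative names, not part of any standard.

```python
from collections import defaultdict

# Hypothetical dictionary extract: IRDI -> terms in common use.
# Only the IRDI comes from the example; the terms are illustrative.
DICTIONARY = {
    "0194-1#02-05ZBLR#1": [
        "(bearing) bore diameter",
        "bore diameter",
        "bore",
        "inside diameter",
    ],
}


def build_term_index(dictionary: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized term (trimmed, case-folded) to the IRDI(s) it names."""
    index: dict[str, set[str]] = defaultdict(set)
    for irdi, terms in dictionary.items():
        for term in terms:
            index[term.strip().casefold()].add(irdi)
    return dict(index)


def resolve(term: str, index: dict[str, set[str]]) -> set[str]:
    """Return the IRDI(s) for a term, or an empty set if the term is unknown."""
    return index.get(term.strip().casefold(), set())
```

With `index = build_term_index(DICTIONARY)`, both `resolve("Inside Diameter", index)` and `resolve("bore", index)` return `{"0194-1#02-05ZBLR#1"}`. Returning a set rather than a single IRDI is a deliberate choice: a term that names more than one concept is surfaced as ambiguous instead of being silently resolved.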
What is an international registration data identifier?
To be a source for ISO 8000 compliant data, the data dictionary must include in its metadata an international registration data identifier (IRDI) that is globally unique for each data dictionary entry. IRDIs are assigned to class and property names, values, and units of measure. The protocol for IRDIs is documented in ISO/IEC standard 11179, Part 3. An example of an IRDI for a concept is shown in the figure above (0194-1#02-05ZBLR#1). The IRDIs come from the identification scheme in the data architecture model above.
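An IRDI is a structured string rather than an opaque number. The Python sketch below splits the example IRDI into its parts. It assumes the general shape registration authority identifier, then data identifier, then version, separated by "#", and assumes, as the example suggests, that the data identifier starts with a two-digit code-space prefix (here 02, which the example pairs with a property). The `IRDI` class, the `parse_irdi` function and the pattern are hypothetical; check the exact syntax against the relevant standard before relying on it.

```python
import re
from typing import NamedTuple


class IRDI(NamedTuple):
    registration_authority: str  # e.g. "0194-1"
    code_space: str              # e.g. "02" (paired with "Property" in the example)
    item_code: str               # e.g. "05ZBLR"
    version: str                 # e.g. "1"


# Assumed shape: RAI#CS-ITEM#VERSION, modelled on 0194-1#02-05ZBLR#1.
_IRDI_PATTERN = re.compile(
    r"(?P<rai>[0-9A-Z-]+)#(?P<cs>[0-9]{2})-(?P<item>[0-9A-Z]+)#(?P<ver>[0-9]+)"
)


def parse_irdi(text: str) -> IRDI:
    """Split an IRDI string into its parts, or raise ValueError if it does not fit."""
    match = _IRDI_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not an IRDI in the assumed format: {text!r}")
    return IRDI(match["rai"], match["cs"], match["item"], match["ver"])
```

For the example, `parse_irdi("0194-1#02-05ZBLR#1")` returns `IRDI(registration_authority='0194-1', code_space='02', item_code='05ZBLR', version='1')`, while a plain term such as "bore diameter" raises `ValueError`.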
For ISO 8000 compliant data, an IRDI serves as the key when exchanging data among information systems, organizations, or other parties who wish to share a specific administered item but who might not use the same names or contexts.
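The Python sketch below illustrates the idea: a supplier and an operator each keep their own property names, and only the IRDI travels between them. The two vocabularies, the local names and the helper functions are hypothetical; only the IRDI is taken from the example above.

```python
# Hypothetical local vocabularies: each party maps its own property name
# to the shared IRDI from the concept dictionary.
SUPPLIER_NAMES = {"Bore Dia.": "0194-1#02-05ZBLR#1"}
OPERATOR_NAMES = {"bearing inside diameter": "0194-1#02-05ZBLR#1"}


def to_exchange(record: dict[str, str], local_to_irdi: dict[str, str]) -> dict[str, str]:
    """Re-key a record from the sender's local property names to IRDIs."""
    return {local_to_irdi[name]: value for name, value in record.items()}


def from_exchange(message: dict[str, str], local_to_irdi: dict[str, str]) -> dict[str, str]:
    """Re-key an IRDI-keyed message into the receiver's local property names."""
    irdi_to_local = {irdi: name for name, irdi in local_to_irdi.items()}
    return {irdi_to_local[irdi]: value for irdi, value in message.items()}


message = to_exchange({"Bore Dia.": "20 mm"}, SUPPLIER_NAMES)
received = from_exchange(message, OPERATOR_NAMES)
```

Here `message` is `{"0194-1#02-05ZBLR#1": "20 mm"}` and `received` is `{"bearing inside diameter": "20 mm"}`: neither party needs to know the other's naming. An unmapped name raises `KeyError`, so a property that cannot be identified fails loudly instead of being dropped silently.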
Summary – part 1
The concept dictionary can therefore rightly be described as the basic building block of a data quality program. ISO 8000 makes it clear that the key to good quality data is managing data from the "bottom up", i.e., from the smallest meaningful elements: the property value and the unit of measure. In the example above, those are the numerical value and the unit of measure of the property "(bearing) bore diameter". This control is gained by assigning a data type to each property. Data types are assigned at the data specification stage, not the data dictionary stage; there will be more on this in part 2. Inconsistent property values and units of measure are a clear indicator of poor data quality practices: using a concept dictionary to normalize values and units of measure ensures that you are building quality into your master data.
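As an illustration of what a data type for "(bearing) bore diameter" might enforce, the Python sketch below parses free-text entries such as "20mm" or "2 cm" into a number and a single canonical unit. It assumes a numeric-with-unit data type normalized to millimetres and a small hand-written conversion table; both are assumptions for illustration, since data types are actually assigned at the data specification stage. `Decimal` is used so that conversions such as inches to millimetres do not pick up binary floating-point rounding.

```python
import re
from decimal import Decimal

# Hypothetical conversion table to the canonical unit, millimetres.
TO_MM = {"mm": Decimal("1"), "cm": Decimal("10"), "m": Decimal("1000"), "in": Decimal("25.4")}

# A number, optional whitespace, a unit, and an optional trailing full stop.
_VALUE_PATTERN = re.compile(r"\s*(?P<number>\d+(?:\.\d+)?)\s*(?P<unit>[a-zA-Z]+)\.?\s*")


def normalize_length(raw: str) -> tuple[Decimal, str]:
    """Parse strings such as '20mm', '20 MM' or '2 cm' into (value, 'mm')."""
    match = _VALUE_PATTERN.fullmatch(raw)
    if match is None:
        raise ValueError(f"no number and unit found in {raw!r}")
    unit = match["unit"].lower()
    if unit not in TO_MM:
        raise ValueError(f"unknown unit of measure: {unit!r}")
    return Decimal(match["number"]) * TO_MM[unit], "mm"
```

"20mm", "20 MM" and "2 cm" all normalize to `(Decimal('20'), 'mm')`, so three inconsistent entries become one comparable value. An unrecognized unit is rejected rather than guessed.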
Traditionally, most end-users have tackled the issue of normalizing master data by using third-party cataloguing companies that provide specification templates for creating items of supply. If you have been through this process, you will no doubt recall many "happy" hours spent discussing which of the many (semantically identical) terms used in your organization should appear in those templates. Frequently, the decisions are made by someone who is not the data user, and this often leads to a tense atmosphere in later discussions. You will probably also have spent time in the same meetings discussing the order in which the properties in the specification template should appear. As we explain in the various parts of this series, this type of discussion is, thankfully, consigned to history if you adopt a process based on the standards we are discussing, such as the concept dictionary.
In this article, part 1 of the series, we have explained the role that data dictionaries play in improving your data quality. In part 2, we will explain the role of the data specification in improving data quality. In part 3, we will explain how to create a data specification in a way that ensures data quality is built into the final master data record. In part 4, we will explain how to create a catalogue item, and how to render short and long descriptions from that catalogue item. In part 5, the last part of this series, we will explain how cataloguing at source can simultaneously cut costs and increase the quality of the output of your data cleansing or data onboarding project.
To sum up part one of this series, in order to improve the quality of your data:
- You shall use a data dictionary with an international registration data identifier (IRDI) for each data dictionary entry.
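The rule above can be checked mechanically. The Python sketch below flags data dictionary entries that have no IRDI, an IRDI that does not match the assumed shape of the example (0194-1#02-05ZBLR#1), or an IRDI already used by another entry. The entry layout and the function name are hypothetical, and the pattern is an assumption rather than the normative IRDI syntax.

```python
import re

# Assumed IRDI shape, modelled on the example 0194-1#02-05ZBLR#1.
IRDI_SHAPE = re.compile(r"[0-9A-Z-]+#[0-9]{2}-[0-9A-Z]+#[0-9]+")


def entries_without_valid_irdi(entries: list[dict]) -> list[str]:
    """Return the names of entries whose IRDI is missing, malformed or duplicated."""
    flagged: list[str] = []
    seen: set[str] = set()
    for entry in entries:
        irdi = entry.get("irdi") or ""
        if not IRDI_SHAPE.fullmatch(irdi) or irdi in seen:
            flagged.append(entry["name"])
        seen.add(irdi)
    return flagged


entries = [
    {"name": "(bearing) bore diameter", "irdi": "0194-1#02-05ZBLR#1"},
    {"name": "entry with no IRDI"},
    {"name": "duplicate entry", "irdi": "0194-1#02-05ZBLR#1"},
    {"name": "malformed entry", "irdi": "bore-diameter"},
]
```

For this sample, `entries_without_valid_irdi(entries)` returns `["entry with no IRDI", "duplicate entry", "malformed entry"]`. The duplicate check compares complete IRDIs, version included, so two versions of the same concept are treated as distinct identifiers.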
About the author
Chief Executive, MRO Insyte
Peter Eales is a subject matter expert on MRO (maintenance, repair, and operations) material management and industrial data quality. Peter is an experienced consultant, trainer, writer, and speaker on these subjects. Peter is recognised by BSI and ISO as an expert in the subject of industrial data. Peter is a member of ISO/TC 184/SC 4/WG 13, the ISO standards development committee that develops standards for industrial data and industrial interfaces, including ISO 8000, ISO 29002, and ISO 22745, and is also a committee member of ISO/TC 184/WG 6, which is developing the standard for Oil and Gas Interoperability, ISO 18101.
Peter has previously held positions as the global technical authority for materials management at a global EPC, and as the global subject matter expert for master data at a major oil and gas owner/operator. Peter is currently chief executive of MRO Insyte, and chairman of KOIOS Master Data.
ECCMA is a membership organization and is the project leader for ISO 22745 and ISO 8000.
KOIOS Master Data is a world-leading cloud MDM solution enabling ISO 8000 compliant data exchange.
MRO Insyte is an MRO consultancy advising organizations on all aspects of materials management.