What is Data Quality?
Monday, February 2nd, 2009
Sentinel RCM™, HealthBIT™, and Sentrex™ are all products that focus on providing business and clinical intelligence to hospitals, pharmacies, and other healthcare entities. To present this information to end users, we receive data feeds from a host of clinical, procurement, and financial systems. Problems occur quickly if you can’t determine the quality of the data you’re receiving, and as a result, grading data has proven to be absolutely critical to the performance of our products. But how do you grade data in a complex environment like healthcare?
It’s All About the Platform
First, you need a platform that accommodates your goals. At its core, our business intelligence infrastructure, Datanex™, provides several components that support downstream applications such as our products Sentinel RCM™, Sentrex™, and HealthBIT™:
- The classic Extract, Transform, and Load (ETL) components that every Business Intelligence platform utilizes.
- Application support functions, which include managing user accounts, providing a common Application Programming Interface (API), and standard interface support.
- Secure, redundant, highly available storage and physical infrastructure.
- Healthcare-specific high-order concepts such as managing patients, providers, clinical services, drugs, payors, procurement, and locations.
How We Grade Your Data during Extract, Transform and Load (ETL)
Quality needs to be considered for every component, as errors that creep in at any level can degrade overall performance. Focusing on the ETL process, we’ve defined several classes of quality that Datanex™ constantly monitors:
- Presence/Timeliness – do we have the files/feeds we expect, when we expect them? This is easy enough to check, but often ignored by overworked staff or systems that can’t be configured to provide notifications of failure.
- Format – the easiest determination to make, but still critical for catching errors. Whenever any sort of maintenance work is performed, formats can change unintentionally, causing downstream problems that often go unidentified.
- Type Validity – are we getting the correct type of information? For example, do we get a number in an age field, or garbage text? For predefined fields like sex or diagnosis codes, are we getting valid values?
- Link Quality – for those fields that are expected to be identifiers that link data across sources, are we getting the “hit rates” that we expect when matching identifiers from source A against identifiers from source B?
- Correctness – is what we’re seeing in this field consistent with what we’ve seen before? Anomalies can occur at this layer that can seriously impact performance of downstream applications.
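To make these classes a little more concrete, here is a minimal Python sketch of what one check per class might look like. It is an illustration only, not how Datanex™ implements its grading: every function name, the `VALID_SEX_CODES` set, the 24-hour window, the age ceiling of 130, and the three-standard-deviation tolerance are assumptions chosen for the example.

```python
import re
import statistics
from datetime import datetime, timedelta

# Hypothetical code set for a predefined "sex" field; real feeds define their own.
VALID_SEX_CODES = {"M", "F", "U"}


def is_feed_timely(last_received, now, max_age_hours=24):
    """Presence/Timeliness: did the feed arrive within the expected window?

    Both arguments are datetime objects; last_received is None if nothing arrived.
    """
    if last_received is None:
        return False
    return now - last_received <= timedelta(hours=max_age_hours)


def has_expected_format(row, expected_columns):
    """Format: does the record carry exactly the expected columns, in order?"""
    return list(row.keys()) == list(expected_columns)


def is_valid_age(value):
    """Type Validity: a whole number in a plausible range, not garbage text."""
    text = str(value).strip()
    return re.fullmatch(r"[0-9]{1,3}", text) is not None and int(text) <= 130


def type_validity_rate(rows):
    """Fraction of rows whose age and sex fields both hold valid values."""
    if not rows:
        return 0.0
    valid = sum(
        1 for row in rows
        if is_valid_age(row.get("age", "")) and row.get("sex") in VALID_SEX_CODES
    )
    return valid / len(rows)


def link_hit_rate(ids_a, ids_b):
    """Link Quality: fraction of distinct source A identifiers found in source B."""
    distinct_a = set(ids_a)
    if not distinct_a:
        return 0.0
    return len(distinct_a & set(ids_b)) / len(distinct_a)


def is_anomalous(value, history, tolerance=3.0):
    """Correctness: is the value far from what this field has shown before?

    Compares against the mean of a non-empty history of earlier values.
    """
    mean = statistics.mean(history)
    spread = statistics.pstdev(history)
    return abs(value - mean) > tolerance * spread
```

For instance, `link_hit_rate(["p1", "p2", "p3", "p4"], ["p2", "p3", "p4", "p9"])` returns `0.75`, since three of the four identifiers from source A are found in source B. In practice, rates like this would be compared against the hit rate you expect for that pair of sources, and a drop below it is what would trigger a notification.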




