By Shannon Shallcross on September 15, 2016
Big Da·ta noun
An overused buzzword, which, despite its lofty sound, basically means "lots and lots of data." A Mount Everest of tangled data.
The term “Big Data” gets thrown around all too often these days, but anyone who works closely with healthcare data is intimately aware of its shortcomings. From lack of sharing patient data between providers to inconsistencies with recording patient data, the more we know about the problem, the more impossible it seems to unlock the powerful potential that lies in healthcare data. But at the heart of the issue, there are 2 main reasons why people don’t get accurate insights from their data.
Reason #1 Your Data Lies: It’s Dirty
Software expert Hollis Tibbetts, formerly the Global Director of Marketing at Dell, estimated that duplicate data and bad data combined cost the U.S. economy over $3 trillion every year. This staggering number is just about two times the national deficit.
Unfortunately, the healthcare industry in particular is a breeding ground for duplicate data. The U.S. Attorney's office estimated that 14% of healthcare spending is wasted due to dirty data; this includes duplicate and/or incomplete data. With 16% of the U.S. Gross Domestic Product attributed to healthcare spending (or $2.14 trillion in total spend), that would mean that duplicate and dirty data costs the healthcare industry roughly $300 billion every year. And the sad reality of this issue is that 50% of IT budgets are spent on data rehabilitation.
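For readers who want to check the arithmetic, here is a minimal sketch that multiplies the two figures quoted above. The variable names are illustrative only; the 14% share and the $2.14 trillion total are taken directly from the paragraph.

```python
# Figures quoted in the paragraph above.
healthcare_spend = 2.14e12   # total U.S. healthcare spend, in dollars
waste_share = 0.14           # share estimated to be wasted due to dirty data

# Estimated annual cost of duplicate and dirty data in healthcare.
estimated_waste = healthcare_spend * waste_share

print(f"Estimated annual waste: ${estimated_waste / 1e9:.1f} billion")
# prints "Estimated annual waste: $299.6 billion"
```

The result lands at about $299.6 billion, which is where the roughly $300 billion figure comes from.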
Larry English, an acclaimed information quality expert and creator of the Total Information Quality Methodology (TIQM), has estimated that 15-20% of a company's operating budget can be wasted due to dirty data. This number reflects the exhaustive effort to extract, manipulate, append and scrub data via SQL, Excel or other means. And this estimate is independent of the fact that 30% of healthcare provider records are inaccurate or missing information due to inconsistent entry of codes and inaccurate transposition of metrics or patient identifiers.
Reason #2 Your Data Lies: It’s Interpreted by People Who Do Not Understand It
A study by McKinsey has projected that “by 2018, the U.S. alone may face a 50 percent to 60 percent gap between supply and requisite demand of deep analytic talent.” The shortage is already taking hold across industries, including healthcare, finance, aerospace, insurance, and pharmaceuticals. In April 2014, the consulting firm Accenture surveyed its clients on their big-data strategies, and more than 90 percent said they planned to hire more employees with expertise in data science—most within a year. However, 41 percent of the more than 1,000 survey respondents said a lack of talent was their main hurdle.
Data Scientists are important in the process of data cleansing, appending and analysis because they work with unstructured data. These are the people who write algorithms to extract insights from the mounds of disparate data sources, including e-mails, text notes, photos and other user-generated content. They sort through the mess of dirty (messy, incomplete, and inaccurate) data and neatly append it to uncover the true insights.
All analytics must start with data investigation. Since data is inherently messy, the analysis process must start with a multi-faceted cleansing process led by someone who, when working with health data, has deep clinical understanding. This knowledge enables them to identify and appropriately treat negative values, reversals, duplication, and adjustments, and to handle data anomalies. This experience also enables them to check for clues throughout the process as to why data may not make sense. For example, thoroughly examining data may reveal that patient IDs have been recycled, inadvertently mixing different patients' data together. Yes, this happens. Dirty data is not to be trusted…ever.
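As a rough illustration of what a first investigation pass might look like, the sketch below flags three of the issues mentioned above: negative amounts (possible reversals or adjustments), exact duplicate records, and patient IDs tied to more than one birth date (a hint that an ID may have been recycled). The records, field names, and the `investigate` function are all hypothetical, invented for this example; real claims data would need checks chosen by someone who understands its clinical context.

```python
from collections import Counter, defaultdict

# Hypothetical claim records; every field name here is an assumption.
claims = [
    {"claim_id": "C1", "patient_id": "P100", "birth_date": "1950-03-02", "amount": 120.00},
    {"claim_id": "C2", "patient_id": "P100", "birth_date": "1950-03-02", "amount": -120.00},
    {"claim_id": "C3", "patient_id": "P200", "birth_date": "1982-11-15", "amount": 75.50},
    {"claim_id": "C3", "patient_id": "P200", "birth_date": "1982-11-15", "amount": 75.50},
    {"claim_id": "C4", "patient_id": "P300", "birth_date": "1975-06-30", "amount": 40.00},
    {"claim_id": "C5", "patient_id": "P300", "birth_date": "2009-01-12", "amount": 55.00},
]

def investigate(records):
    """Return claim and patient IDs that deserve a closer look."""
    # Negative amounts often signal reversals or adjustments.
    negatives = [r["claim_id"] for r in records if r["amount"] < 0]

    # Records that are identical in every field are likely duplicates.
    counts = Counter(tuple(sorted(r.items())) for r in records)
    duplicates = sorted(
        {dict(key)["claim_id"] for key, n in counts.items() if n > 1}
    )

    # One patient ID with several birth dates may be a recycled ID.
    births = defaultdict(set)
    for r in records:
        births[r["patient_id"]].add(r["birth_date"])
    recycled = sorted(pid for pid, dates in births.items() if len(dates) > 1)

    return {
        "negative_amounts": negatives,
        "duplicate_claims": duplicates,
        "possibly_recycled_ids": recycled,
    }

report = investigate(claims)
for issue, ids in report.items():
    print(issue, ids)
```

A pass like this only surfaces candidates. Deciding whether a negative amount is a legitimate reversal, or whether two birth dates reflect a typo rather than two patients, still takes someone who understands the data.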
Bring Truth Out of Data
It is easy to get caught up in the buzz of “Big Data.” You may have a strategy for collecting data…and maybe even an analytics department. But neither of these efforts means your data is telling the truth. If a significant part of your data management strategy is not allocated to 1) scrubbing data and 2) ensuring those who work with the data truly understand it, your data’s actionable insights (read: truth) may still be hiding.
Shannon Shallcross is the CEO of BetaXAnalytics, a company that leverages data insights to improve clinical outcomes, improve patient well-being and decrease health care costs. The company delivers custom tools and data analytics to managed care organizations, providers and employers to reduce costs and improve the quality of healthcare and pharmacy services.
 Tibbetts, H., 2011. $3 Trillion Problem: Three Best Practices for Today's Dirty Data Pandemic. [Online] Available at: http://hollistibbetts.sys-con.com/node/1975126.
 A Business Case for Fixing Provider Data Issues: Save Money, Reduce Waste and Improve Member Services: Proactive Provider Data Management. [Online] Available at: https://www.lexisnexis.com/risk/downloads/whitepaper/fixing-provider-data-issues-whitepaper-wp.pdf.
 Orihuela, Rodrigo and Dina Bass. Help Wanted: Black Belts in Data. [Online] Available at: http://www.bloomberg.com/news/articles/2015-06-04/help-wanted-black-belts-in-data.