Businesses have always relied on data, but the past several years have been marked by advances in the technology that drives data analysis, along with an ever-rising volume of collected information.
As a result, International Data Corporation has projected a 9.4 percent annual growth rate for the business analytics software market through 2018.
As investment in these tools increases, many more employees across a range of organizations will have access to analytics software.
While this is promising for increasingly data-driven organizations, it comes with a major caveat: many analytics-oriented initiatives fail because of problems with data quality, thus leaving software users with inaccurate or outdated information.
Bad data is common in many organizations, and the costs can be far reaching. These challenges can lead to unnecessary IT spending, lower business user productivity, and unreliable business intelligence.
Without accurate data, it’s difficult—sometimes even impossible—for your business to make good decisions, leading to mistakes and missed opportunities.
What's Bad Data?
Bad data is incomplete or inaccurate information, and it stems from a variety of sources.
Growing data volumes have made it difficult for many businesses to ensure accuracy. Meanwhile, business data is often stored in different platforms or in different formats. If the information isn't consistent across departments, it's impossible to leverage it for business intelligence.
Purchase order information may be stored in one database, and related product sales information in another. It is common for different units of measure to be used in each area and, in many cases, product numbers (SKUs) differ as well.
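As a minimal sketch of why this matters (the SKUs, conversion factors, and table shapes below are hypothetical, not drawn from any particular system), the purchasing and sales records cannot be compared until SKUs are mapped and units are normalized:

```python
# Hypothetical reconciliation: purchase orders record quantities in cases,
# while the sales system counts individual units, and each system uses its
# own product identifier.
UNITS_PER_CASE = {"PO-1001": 12, "PO-1002": 24}   # purchasing SKU -> units per case
SKU_MAP = {"PO-1001": "S-77", "PO-1002": "S-78"}  # purchasing SKU -> sales SKU

def normalize_purchase(po_sku: str, cases: int) -> tuple:
    """Convert a purchase-order line into the sales system's SKU and unit count."""
    return SKU_MAP[po_sku], cases * UNITS_PER_CASE[po_sku]

sku, units = normalize_purchase("PO-1001", 5)
# sku == "S-77", units == 60 -- now directly comparable with sales records
```

Without an explicit mapping step like this, a join between the two databases silently compares cases to units and one SKU scheme to another, producing exactly the kind of misinterpretation described above.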
In a manufacturing scenario, it can be even more complex, with little correlation between the raw materials and finished product. This complexity significantly complicates any analytic effort and often results in errors in the interpretation of the data.
Additionally, this may mean that many errors will not be readily apparent. It may be months or even years before such issues can be identified and corrected.
Furthermore, almost all data has both known and unknown data quality problems. It's easy to notice a problem when it affects the normal, intended use of a piece of data. But it's difficult to detect a problem when everyday processes aren't affected. Many data quality issues occur beyond these critical processes and, as such, are less likely to be identified—until the data is used for a new purpose, such as analytics.
Unfortunately, many organizations blindly trust their data. And that just leads to disaster.
Bigger Data = Bigger Problems
The majority of organizations that undertake big data initiatives run into challenges that can lead to big data becoming bad data.
For instance, a July 2014 report from Paradigm4 found that 71 percent of data scientists experienced difficulty with their analytics initiatives because of big data. Utilizing data from a variety of different data sources was cited as a central challenge.
Bringing data together from many different sources can be time consuming and lead to errors that go unnoticed. Another challenge is that big data may also require investment in new database technology and changes to how the data is stored, as relational databases are often ill equipped to handle the volume of big data.
Challenges like these can lead to inaccurate or incomplete data.
As a 2015 Experian Marketing Services report showed, U.S. organizations estimate that about a quarter of their data, on average, is inaccurate, and 91 percent believe this problem directly affects revenue.
In the report, revenue impact came from several primary sources:
- Wasted resources from incorrect decision making
- Lost productivity
- Wasted marketing or communications spend
Furthermore, these issues can often go unnoticed because they may cause numerous, but small, losses in productivity or resources.
Inaccurate customer contact information forces employees to spend time tracking down the appropriate information. Without appropriate organization-wide insight into critical data, it is also easy for more severe issues to emerge. Inaccuracies in an inventory database, for instance, can lead to overinvestment in product stock or supplies.
Although problems like these can affect all types of organizations, research from Dun & Bradstreet found that the problem could be even larger among business-to-business (B2B) organizations. Researchers estimated that as much as 75 percent of the information contained in B2B databases may be inaccurate, indicating that the bad data problem could be even more prevalent among some organizations.
Causes of Bad Data
Bad data can stem from numerous sources, ranging from manual data entry errors to hardware failure. Inaccurate data often persists for long periods of time, especially when no one is actively using it. With the increasing sophistication of automation, many organizations are making decisions based on such data without knowing it.
Below are some of the common causes of bad data.
Data Entry Errors
One of the biggest problem areas for data health is data entry.
Many of the processes for inputting data require manual input and are prone to error. This problem is made worse by the variety of data entry tasks that non-technical users may be required to do.
If a user doesn’t know the correct value for a required field, he or she may input garbage information in order to progress. This bad information then trickles throughout the organization, since the user who entered it has no idea it's wrong.
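A simple guard at the point of entry can stop much of this. As an illustrative sketch (the field and its allowed values are hypothetical), rejecting unrecognized input and offering an explicit "unknown" option is far better than letting users invent a value to move past a required field:

```python
# Hypothetical required field with a fixed set of valid values,
# including an explicit UNKNOWN so users never need to enter garbage.
VALID_REGIONS = {"NORTH", "SOUTH", "EAST", "WEST", "UNKNOWN"}

def validate_region(value: str) -> str:
    """Normalize and validate a required 'region' field at entry time."""
    cleaned = value.strip().upper()
    if cleaned not in VALID_REGIONS:
        # Reject bad input here, rather than letting it trickle downstream.
        raise ValueError(
            f"'{value}' is not a valid region; choose one of {sorted(VALID_REGIONS)}"
        )
    return cleaned
```

Validation at entry is cheap; tracking the same bad value through reports and downstream systems months later is not.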
Hardware Failure
Another common problem many businesses face stems from data loss or corruption due to hardware failure.
In a 2014 Kroll Ontrack survey, hardware failure was the leading cause of data loss, with nearly 70 percent of respondents reporting they had lost data from a hard-disk drive failure. Nearly 30 percent of businesses also said they lost revenue from this disruption. For some organizations, data loss actually prevented them from delivering products or services to their customers.
Although storage failure typically signals a need to review critical data, the combination of other factors that influence data quality (e.g., incomplete or inaccurate data) makes it more difficult and time intensive to determine what information is affected in the event of a hardware failure.
Outdated Information
Even when data is entered with complete accuracy, certain types of information have expiration dates. Contact information, for example, can easily become outdated when a contact switches phone numbers or changes addresses.
While it is important to ensure that vital information is kept up-to-date, many businesses struggle to do so.
According to Experian’s “The State of Data Quality” white paper, outdated data is among the top three causes of poor data health among organizations.
Insufficient or Inaccurate Metadata
Metadata provides context to its related data, and can range from technical information such as the data’s source to business information that helps users understand what the data is for.
Inaccurate or incomplete metadata creates situations in which it is easy to follow incorrect processes for working within a database.
Because metadata also provides insight into the environment the information resides in, poor metadata can also make compliance and auditing activities more time consuming.
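What useful metadata looks like varies by platform, but as a minimal illustration (the fields shown are assumptions for the example, not any particular standard), even a small record of source, meaning, and freshness makes a data element far easier to use correctly and to audit:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ColumnMetadata:
    """A minimal metadata record combining technical and business context."""
    name: str
    source: str        # technical context: where the value originates
    description: str   # business context: what the value means
    last_refreshed: date

meta = ColumnMetadata(
    name="net_revenue",
    source="orders database, nightly ETL load",
    description="Invoice total minus returns and discounts, in USD",
    last_refreshed=date(2015, 6, 1),
)
```

A user who sees only a column named `net_revenue` must guess at currency, refresh cadence, and whether returns are included; with this record, none of that is guesswork.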
How Data Warehouses Solve the Challenge of Bad Data
One solution to many of the challenges associated with bad data is a modern, well-designed data warehouse.
Businesses use data warehouses to consolidate organizational data into a single version of the truth, giving users information they can rely on without fear that it is inaccurate or incomplete.
Consider a common use case: collecting relevant customer information can require pulling data from numerous databases. This not only costs business users time but also forces IT to spend resources maintaining multiple redundant databases.
By centralizing data, your organization can significantly reduce IT spending on database management.
Businesses with disjointed data sources experience productivity losses, as a significant amount of technical users’ time is spent making critical information usable. However, this problem also leads to missed opportunities, particularly as so many organizations increase their focus on analytics initiatives.
A data warehouse unifies information in one place, making your business insight more accurate and more fruitful because you can leverage more of your data.
Why Organizations Can’t Rest on Bad Data
It does not take big data to cause bad data; it is a problem that all organizations can—and often do—struggle with. In addition to the right technological solutions, it is important to consider the business processes that will help lead to more accurate and reliable data.
The exact return on investment for building an environment ideal for good data will vary by organization. However, potential advantages start with improved business user productivity and reduced IT spending. Additional value comes from the ability to bring organizational data together and use it to power more robust insight.
An effective data warehouse, like Sequel Data Warehouse, can solve your bad data problems. With this IBM i data warehouse, organizations will be able to:
- Allow users to easily build and maintain data warehouse tables, indexes, and views
- Detect common errors from ETL processes automatically
- Allow users to define unlimited business rules to implement their own data quality checks
- Identify the issues with bad data via detailed reports accessible by users—as bad data is not loaded to the warehouse environment
- Handle complex calculations and transformations with ease
- Include extensive metadata to ensure that all users can understand the data and how to use it
- Include all the tools needed to manage the development, deployment, and ongoing maintenance of the data warehouse
- Benefit from a solution that is native to IBM i
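Sequel's rule definitions and ETL checks are product-specific, but the underlying idea—user-defined quality rules that quarantine failing rows and report them instead of loading them—can be sketched generically (everything below is an illustration, not Sequel syntax):

```python
# Generic sketch of rule-based quality checks applied before a warehouse load.
# Each rule pairs a human-readable name with a predicate over a row.
rules = [
    ("quantity must be positive",       lambda row: row["quantity"] > 0),
    ("unit_price must be non-negative", lambda row: row["unit_price"] >= 0),
    ("sku must be present",             lambda row: bool(row["sku"])),
]

def split_rows(rows):
    """Return (rows that pass every rule, rejected rows with their failure reasons)."""
    good, rejected = [], []
    for row in rows:
        failures = [name for name, check in rules if not check(row)]
        if failures:
            rejected.append((row, failures))   # reported, never loaded
        else:
            good.append(row)
    return good, rejected

clean, bad = split_rows([
    {"sku": "S-77", "quantity": 3,  "unit_price": 9.99},
    {"sku": "",     "quantity": -1, "unit_price": 9.99},
])
# 'clean' holds the first row; 'bad' holds the second along with the
# names of the rules it violated, ready for a data-quality report.
```

The key design point mirrored here is that bad rows are not silently dropped or silently loaded: they are kept out of the warehouse and surfaced with a reason, so the source data can be corrected.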