Unlocking the power of analytics through data pipelines

By Amitabh Mathur, Digital Transformation Adviser at DXC Technology

The old adage of looking for a needle in a haystack perfectly describes an almost impossible task. But what if you’re faced with a huge haystack and don’t even know what you’re looking for? In many ways this describes the challenge businesses face when trying to reveal insight from business data: analytics often requires an organisation to notice things it isn’t necessarily looking for. To compound the challenge, the rate of data growth increases every year, driven by connected services and the growth of the Internet of Things. It’s a sobering thought that an estimated 90% of the world’s data was generated in the past two years alone!

Data, a diamond in the rough

So how can businesses locate vital pieces of business intelligence in their haystack of data? There are lessons to be learned from other industries; take diamond mining. Just like businesses searching for data insights, miners must work through mountains of ore to find the gems they are after. Diamonds are mined from huge carrot-shaped ore deposits called kimberlite pipes; digging out these deposits means a great deal of excess material must be extracted alongside the diamonds. On average, 1,750 metric tons of material must be processed to find a single carat of diamond. To deal with these huge quantities, diamond miners use processing pipelines that mechanically crush, sort and filter the ore until the “roughs” have been separated out and can be hand-finished. The roughs are checked and graded, and only the best are selected for the grinding and polishing that ultimately turns a dull, translucent rock into a gleaming diamond.

Learning to use pipelines

At this point you might be asking how we can adapt this model to extract the hidden gems of insight from our data. One approach is a copycat processing pipeline that works on data in multiple stages, converting it successively into information, intelligence, decisions and actions. At each stage, the volume of data drops by an order of magnitude or more, which makes the whole process far easier to manage. The data processing pipeline is not a new concept, and most organisations already use it in some shape or form. What is new is the sheer amount of data, and our inability so far to extract competitive advantage from it.
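As a loose illustration, the staged reduction can be sketched as a chain of functions, each passing a smaller volume on to the next. The stage names follow the data-to-actions progression above; the readings, thresholds and function names are invented for the sketch, not part of any real product.

```python
# Toy staged pipeline: data -> information -> intelligence -> decision.
# Each stage reduces the volume handed to the next. All names and
# thresholds here are illustrative only.

def to_information(raw_readings):
    """Data -> information: discard obviously invalid readings."""
    return [r for r in raw_readings if r is not None and r >= 0]

def to_intelligence(readings):
    """Information -> intelligence: summarise into a few statistics."""
    return {
        "count": len(readings),
        "mean": sum(readings) / len(readings),
        "max": max(readings),
    }

def to_decision(summary, alert_threshold=100):
    """Intelligence -> decision: flag whether action is needed."""
    return "investigate" if summary["max"] > alert_threshold else "ignore"

raw = [12, None, 34, -5, 150, 27]    # six raw values arrive
info = to_information(raw)           # four valid readings remain
decision = to_decision(to_intelligence(info))
print(decision)                      # 150 exceeds the alert threshold
```

Each function stands in for what would, in practice, be a far larger processing step; the point is only that every stage shrinks the data while increasing its meaning.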

Letting AI do the heavy lifting

Dealing with this data deluge requires systems that look not only for known patterns, but also for unusual or previously unknown ones. Traditional data processing is not up to the task. People are great at it, but the sheer volume of data makes using people prohibitively expensive; not to mention that there is a shortage of trained data scientists to do the work. Artificial intelligence (AI) could provide the solution: it is ideal for handling large volumes of data and capable of sifting through mountains of it to identify anomalous patterns. It may not understand what those patterns represent, but it can scan the data and surface new or unusual patterns for further analysis.
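To make that role concrete, here is a deliberately simple sketch of automated pattern-spotting: a z-score scan that flags readings far from the norm. Real AI pipelines use far more sophisticated models (isolation forests, autoencoders and the like), but the job is the same: surface the anomalies for a human analyst to interpret. The function name, sample readings and threshold are assumptions made for the sketch.

```python
import statistics

def flag_anomalies(values, z_threshold=2.5):
    """Flag values more than z_threshold standard deviations from the mean.

    A minimal stand-in for the anomaly-detection role AI plays in a
    data pipeline: it surfaces unusual points without knowing what
    they mean.
    """
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > z_threshold]

# A hypothetical sensor stream with one oddity buried in routine readings.
readings = [10, 11, 9, 10, 12, 10, 11, 95, 10, 9]
print(flag_anomalies(readings))  # the 95 is flagged for human review
```

Note that the scan says nothing about *why* 95 is unusual; deciding what the anomaly means remains a job for the analyst, which is exactly the division of labour described above.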

So, what does this all mean for data scientists? Wouldn’t it make them redundant? Far from it; they are becoming more valuable than ever. While an AI-driven data processing pipeline can extract intelligence from data and even make some routine discoveries, identifying new insights and making decisions will continue to require data specialists. The difference is that their work will be far more engaging: dealing with pre-sifted data lets them focus on the most interesting information.

Data is one of the most important resources businesses have access to. Those that focus on making it usable, by processing it with AI and breaking the analysis down into stages, stand to gain the competitive advantage they are looking for. Putting data to work does not have to be an expensive or resource-intensive task, but it does require businesses to adopt new strategies. A data processing pipeline built with AI is a great place to start.