Many businesses have achieved what I like to think of as “basic analytics”: they can access and analyze select segments of operational data to make data-driven business decisions. That is a great start toward competing more effectively.
Yet it’s not enough to analyze what already happened (basic, forensic analytics) on only portions of data; you need to become more proactive, able to identify what will happen in the future (predictive, advanced analytics) by understanding all your data.
This forward-looking ability is the next frontier of enterprise analytics. The shift is already well underway, and it could make or break your business. But before you start, beware these three common obstructions that can kill any advanced analytics project you undertake—and consider what you can do instead.
Killer #1: Building a data lake.
Many companies rely on a data warehouse as the backbone of their forensic analytics efforts, despite its high cost and time-consuming upkeep. Extract, Transform, and Load (ETL) is the first phase of populating that warehouse: when developers following a traditional approach build an application that consumes data from different databases, the data must first be transformed—or “normalized”—so algorithms can perform analytics on it efficiently.
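As a minimal sketch of what that “T” in ETL looks like in practice—the source systems, field names, and schemas here are hypothetical, not Incorta's or any particular vendor's:

```python
# Hypothetical ETL "transform" step: records extracted from two source
# databases use different field names and date formats, so each one is
# normalized into a single shared schema before being loaded for analytics.

from datetime import date

# Rows as they might arrive from two different source systems (hypothetical)
crm_rows = [{"cust_name": "Acme Corp", "signup": "2019-03-01"}]
erp_rows = [{"customer": "Acme Corp", "created_on": date(2019, 3, 1)}]

def normalize_crm(row):
    # CRM stores dates as ISO strings; parse them into date objects
    return {"name": row["cust_name"], "created": date.fromisoformat(row["signup"])}

def normalize_erp(row):
    # ERP already uses date objects; only the field names need mapping
    return {"name": row["customer"], "created": row["created_on"]}

# The transform phase: map every source into the shared target schema
normalized = [normalize_crm(r) for r in crm_rows] + [normalize_erp(r) for r in erp_rows]
print(normalized[0]["name"])  # every record now shares one schema
```

Multiply this mapping work across dozens of sources and hundreds of tables and it becomes clear why warehouse ETL is so costly to build and maintain.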
But—in addition to their many other downsides—data warehouses don’t easily support advanced analytics, like predictive analytics or machine learning. As a result, companies tackling advanced analytics have to invest in new technical infrastructure, such as a data lake, which holds a vast amount of raw data in its native format until needed.
Both data lakes and data warehouses are expensive to build and maintain, and they don’t easily integrate. So, in essence, you complicate your already complex and costly data warehouse infrastructure when you add a costly, complex data lake layer to it. Trying to tackle advanced analytics with this overly complex architecture only exacerbates the issues already inherent within it.
Thankfully, this isn’t the only way to get advanced analytics anymore.
My company Incorta developed a modern analytics platform that seamlessly integrates forensic/basic analytics with advanced capabilities, such as machine learning and predictive analytics. Our approach uses Apache Spark instead of traditional ETLs, so complex data doesn’t need to be moved, pre-aggregated, and normalized. This means you no longer need to move data between the data warehouse and the data lake. In fact, you don’t need a data warehouse or a data lake at all!
Killer #2: Over-burdening and de-focusing scarce data scientists.
An “insight” is meaningful, actionable information that enables more effective strategic, tactical, and operational decision-making. Finding effective insights in your organizational data requires real data science. And real data science requires a rare combination of computer science, mathematics, and business acumen.
This skillset combination is hard to find. And the technologies required to employ these capabilities are difficult to use, requiring much manual work by data scientists. As a result, very few data scientists exist who possess the right skillset blend and have access to the tools they need to uncover meaningful insights.
We at Incorta overcame this issue, too. With Incorta, it’s easier to pull together and work with data for the purposes of advanced analytics. Your IT team can tease out predictive and machine learning insights—and expose them to analysts for consumption—by applying out-of-the-box, advanced enrichment “cartridges” to data brought into the Incorta platform. These cartridges feature pre-built scripts that automate common predictive and machine learning functions, accelerating time to value (TTV) for your advanced enterprise analytics efforts.
Incorta also leverages the open-standard Parquet format to seamlessly integrate with Spark for advanced data transformations, machine learning, and predictive analytics. This way, data scientists can focus on uncovering insights—where they truly add value—rather than be consumed by tedious data collection and updates.
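To see why a columnar format like Parquet suits analytical workloads, here is a toy model of the layout idea only—real Parquet adds compression, encodings, and row-group statistics on top, and this is not Incorta-specific code:

```python
# Toy illustration of row-oriented vs. column-oriented storage.
# Aggregating one field from a columnar layout touches only that
# column's values; a row layout walks past every field of every record.

# Row-oriented layout: all fields of each record stored together
rows = [
    {"region": "EMEA", "revenue": 120.0, "units": 3},
    {"region": "APAC", "revenue": 95.5, "units": 2},
    {"region": "EMEA", "revenue": 40.0, "units": 1},
]
row_total = sum(r["revenue"] for r in rows)

# Column-oriented layout (the Parquet idea): each column stored contiguously
columns = {
    "region": ["EMEA", "APAC", "EMEA"],
    "revenue": [120.0, 95.5, 40.0],
    "units": [3, 2, 1],
}
col_total = sum(columns["revenue"])  # reads only the revenue column

print(row_total == col_total)  # same answer, far less data scanned
```

On disk, that column-at-a-time layout is what lets engines like Spark scan only the fields a query actually needs.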
Killer #3: Re-shaping data prior to analysis.
According to Forbes, data scientists spend about 80 percent of their time preparing and managing data for analysis—things like flattening, pre-joining, and “re-shaping” data into a workable format.[1] This same approach has historically been used with advanced analytics, too, and it’s a huge time drain on already-scarce data scientist resources.
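As a concrete, hypothetical example of that re-shaping work, here is the kind of flatten-and-pre-join step traditionally needed to turn nested operational records into the single flat table most analytics tools expect (all names and data invented for illustration):

```python
# Hypothetical prep step: flatten nested order records and pre-join them
# against a customer lookup, producing one flat row per order line.

# Lookup table from one source system (hypothetical)
customers = {101: "Acme Corp", 102: "Globex"}

# Nested operational records from another (hypothetical)
orders = [
    {"customer_id": 101, "lines": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]},
    {"customer_id": 102, "lines": [{"sku": "A1", "qty": 5}]},
]

# Flatten each order's line items and join in the customer name
flat = [
    {"customer": customers[o["customer_id"]], "sku": line["sku"], "qty": line["qty"]}
    for o in orders
    for line in o["lines"]
]
print(len(flat))  # 3 flat rows produced from 2 nested orders
```

Scripts like this must be written, debugged, and re-run every time the source schemas change, which is where so much data scientist time disappears.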
Solving this problem is Incorta’s core strength. With Incorta, you can analyze data without re-shaping it first. That means you can run advanced analytics against data in its original format, which drastically reduces the time required for data prep.
Incorta also makes it simple to filter down to a meaningful subset of data, for targeted data science. Both of these innovative approaches empower data scientists to focus on insights, rather than operations.
Conclusion
Avoid the three common obstructions that kill advanced analytics and prevent you from gaining the very insights you desire. Now that you’re ready to reach beyond basic analytics, it’s time to consider a new and better approach—Incorta can help.
[1] Forbes, “Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says,” Mar. 23, 2016.
Want to learn more about how Incorta drives advanced enterprise analytics? Contact me directly at mike.rogers@incorta.com.