Hadoop is Dead. DataFlow is Alive!

Lars Fiedler

We’ve given Hadoop almost 10 years to mature and invested billions, yet very few companies are seeing a return on that investment. Several companies have tried to make Hadoop a real-time analytical platform by layering SQL-like facades on top, but the latency is still not where it needs to be for interactive applications. Even Google, a true big data user, has moved on to dataflow and flow-based programming approaches. Why? It just makes sense…

Why should I store all my data in HDFS?

How many adapters do I need to write to get data into HDFS?

Why is my data stagnant when my business process is fluid?

I already have massive amounts of data in my existing triple stores and relational dbs… you want me to do what?

Instead of saving all data into HDFS and then running full parallel table scans to reduce it, alternative systems can answer the same questions while the data is in motion. We live in an extremely chaotic data environment: data is everywhere, exists in many different formats, and comes at us from many different directions. It’s unforeseeable that we’ll ever be able to duplicate all of this data into a single HDFS.
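To make "answering questions while the data is in motion" a little more concrete, here is a minimal Python sketch. It is only an illustration: the `events` stream, its `source` and `bytes` fields, and the `running_totals` function are all made up for this example and are not part of any particular platform. The point is that the aggregate is updated as each record arrives, so an answer is available at any moment without first storing everything and scanning it.

```python
from collections import defaultdict


def events():
    """Hypothetical event stream; a real one might be a socket or a message queue."""
    yield {"source": "web", "bytes": 512}
    yield {"source": "db", "bytes": 2048}
    yield {"source": "web", "bytes": 64}
    yield {"source": "web", "bytes": 1024}


def running_totals(stream, min_bytes=0):
    """Filter and aggregate records one at a time, yielding the current answer."""
    totals = defaultdict(int)
    for event in stream:
        # Filter: drop records that are too small to matter.
        if event["bytes"] < min_bytes:
            continue
        # Aggregate: update the per-source total in place.
        totals[event["source"]] += event["bytes"]
        # Emit a snapshot so downstream consumers always see a fresh answer.
        yield dict(totals)


if __name__ == "__main__":
    for snapshot in running_totals(events(), min_bytes=100):
        print(snapshot)
```

Each snapshot is the answer "so far", which is what an interactive application would read instead of waiting for a batch job to finish.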

By defining your processing scheme as functional blocks with inputs and outputs, and describing the dependencies between those blocks as dataflow connections, you let a platform systematically parallelize the execution of that scheme. As data flows, it can be filtered, routed, aggregated, and disseminated to other systems. The result is a system that makes decisions and gets answers out of its data in real time.
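The idea of blocks plus connections can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Composable Analytics or any other platform is implemented: the block functions, the `GRAPH` layout, and the `run` scheduler are all hypothetical. What it shows is that once dependencies are declared explicitly, a scheduler can work out by itself which blocks are independent and run them in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical functional blocks: each takes its upstream outputs, returns one output.
def read_orders():
    return [120, 45, 300, 80]


def read_refunds():
    return [45]


def filter_large(orders):
    return [o for o in orders if o >= 100]


def total(values):
    return sum(values)


def report(large_total, refund_total):
    return {"large_orders": large_total, "refunds": refund_total}


# Dataflow connections: block name -> (function, names of upstream blocks).
GRAPH = {
    "orders": (read_orders, []),
    "refunds": (read_refunds, []),
    "large": (filter_large, ["orders"]),
    "large_total": (total, ["large"]),
    "refund_total": (total, ["refunds"]),
    "report": (report, ["large_total", "refund_total"]),
}


def run(graph):
    """Run blocks in waves; every block whose inputs are ready runs concurrently."""
    results = {}
    pending = dict(graph)
    with ThreadPoolExecutor() as pool:
        while pending:
            ready = [name for name, (_, deps) in pending.items()
                     if all(d in results for d in deps)]
            if not ready:
                raise ValueError("cycle or missing dependency in graph")
            # Submit the whole wave before waiting, so independent blocks overlap.
            futures = {
                name: pool.submit(pending[name][0],
                                  *(results[d] for d in pending[name][1]))
                for name in ready
            }
            for name, future in futures.items():
                results[name] = future.result()
                del pending[name]
    return results


if __name__ == "__main__":
    print(run(GRAPH)["report"])
```

Here `orders` and `refunds` have no dependencies, so they run in the same wave; `large` and `refund_total` follow together. Nothing in the block definitions mentions threads: the parallelism falls out of the declared connections.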

Platforms like Composable Analytics (https://composableanalytics.com) are going to be the big players in this next big data wave. Flow-based programming approaches provide agile, fluid mechanisms for processing data.

Data got you overwhelmed? Go with the flow!

Lars Fiedler

Lars has comprehensive expertise building large, complex software systems, and has served as a Software Engineer at MIT’s Lincoln Laboratory since 2010, where he began developing Composable Analytics. Prior to joining Lincoln Laboratory, Lars worked as a Software Engineer at Microsoft Corporation from 2006 to 2010. Lars received his MS in Computer Science from the Georgia Institute of Technology in 2004, and his BS in Computer Science from the same institution in 2003.
