The big data market rises and must converge -- just not yet

The diverging big data market, Google's new 'search,' Spark versus MapReduce: The Data Mill reports from the MIT Sloan CIO Symposium.

Is the big data market converging? That was one of the first questions Tom Davenport, fellow for the MIT Center for Digital Business, asked the big data panel he moderated at the MIT Sloan CIO Symposium. The resounding answer from panelists? Not yet.

For all the big data hype and the urgency companies feel to capitalize on big data, the vendor market is immature, the panelists agreed. The field is replete with specialized tools that don't always play nice with megavendor stacks and legacy systems. Plus, even when businesses find an off-the-shelf product from big software vendors -- Oracle, IBM and SAP -- the technology tends to work well for some industries but not others.

The panelists said the current state of market flux is to be expected. "It's what happens in our industry when you've got fundamental cultural changes. And that's what you've got here," said Barry Morris, founder and CEO of cloud database provider NuoDB Inc. Cultural change -- which big data represents -- leads to innovation, Morris said, which invariably leads to market segmentation and even fragmentation. Ultimately, that's an unsustainable situation because it leaves businesses "to build these vertical stacks at enormous economic costs," he said.

Or at least most of the market isn't converging. Puneet Batra, a data scientist and co-founder of LevelTrigger, a startup based in Cambridge, Massachusetts, illustrated the growing complexity of the term "big data" when he talked about how the market appears to be responding to a "bifurcation" of use cases.

For operational use cases, the market is fragmented by necessity. "There's always going to be a need for very customized solutions that can serve up quick performances," he said, pointing to technology like Hadapt. But with exploratory use cases, "we're starting to see some convergence," he said. Certain technologies such as a database or a NoSQL store are bubbling up to the surface as preferred -- and soon to be standard? -- features.

Spark vs. MapReduce

Speaking of big data technology, Morris and Batra both observed that MapReduce users are beginning "to migrate" to Apache Spark, an open source, in-memory cluster computing framework for analytics.

My colleague, Jack Vaughan, wrote about Spark in March and explained that, like the original MapReduce processing engine, the technology sits on top of the Hadoop Distributed File System and other Hadoop data sources. But unlike MapReduce, Spark processes data faster and "supports more generalized computing methods."

Spark was developed at the AMPLab at UC Berkeley. It came out of stealth mode in June 2013 and became an Apache top-level project in February of this year.

Google's new 'search'

Google's leap from Internet search to driverless cars isn't as big as it sounds. Instead of computer code, Google is just crawling a different terrain. "I think of it as physical search engines -- search of the physical world," said John Leonard, a professor of mechanical and ocean engineering at MIT's Computer Science and Artificial Intelligence Laboratory, speaking at the symposium.

Doing so creates precise maps that can help driverless cars prepare for the expected. But the bigger challenge lies in reacting to the unexpected. "The ability to understand what's happening in the physical world and to predict what might happen next, say at a busy intersection, is really challenging for machines," he said.

Will Google be the company to figure it out? Leonard's on the fence about that. "I have two visions of the future: One is that Google's going to win," he said. "And the other is we're not quite there yet."

The digital dynamic

The days of competing within a single vertical are over, and you can thank digitization for that. An example? Peter Weill, senior researcher and chairman of the MIT Center for Information Systems Research, pointed to an Australian supermarket chain to illustrate how a company is using technology to upend another industry. "Coles sold more insurance -- or at least an equal amount of insurance -- than any of the leading insurance companies for the first time" this year, he said.

While digitization is flattening the competitive landscape, it's also inspiring "interesting, new product development," Weill said. Just look at Orange Money by France Telecom. In Africa, where more than 50% of residents have a mobile phone but fewer than 10% have a bank account, Orange Money enables customers "to open an account, transfer funds, receive their salary, pay bills," according to the company website.

Say what?!?

"The world is becoming programmable. It's not just digital products; it's all products." -- Narinder Singh, president of CloudSpokes and TopCoder; co-founder and chief strategy officer at Appirio

"You have to be willing to be wrong. You should never, ever grasp on to any of your own ideas as religious in nature that you would never let them go." -- Mark Holst-Knudsen, president, ThomasNet

"My first job is to put people like me out of business." -- Puneet Batra, data scientist, co-founder of LevelTrigger

"The problem is law is always written with slop in there and human judgment in there. And that human judgment is disappearing as everything becomes datafied." -- Alex "Sandy" Pentland, professor, MIT Media Lab

"Technology is moving fast and people don't change as fast, it's pretty clear. Turns out the group that's most flexible tends to be 30 to 40 year olds because, in many respects, they're doing this for the second time." -- Peter Burris, vice president and research director, Forrester Research Inc.

Welcome to The Data Mill, a weekly column devoted to all things data. Heard something newsy (or gossipy)? Email me or find me on Twitter at @TT_Nicole.