Cloudera open source start-up offers Hadoop cloud software for 'mere mortals'

Well-backed Silicon Valley start-up Cloudera has now released a free, private cloud-oriented distribution of a Linux software environment first built by major Web enterprises for "big data."

"Hadoop offers capabilities for capturing, storing and analyzing data that are unmatched. But it's something that enterprises have shied away from until now," said Michael Olson, a former VP at Oracle and now CEO and co-founder of Burlingame, CA-based Cloudera, in a briefing with Betanews.

"Before, you practically needed to have a bunch of Ph.D.s around to use Hadoop. Hospitals and banks, for example, don't tend to have these guys at hand. Hadoop configuration and management could be a real pain. But what we're offering is a lot easier for 'mere mortals' to use," according to Olson.

Web sites like Facebook, Google and Yahoo originally developed Hadoop as a way to pull together text and log data across thousands of computers on their sites and make observations about user behavior, Olson said. But any company dealing with multiple terabytes of unstructured information can also use Hadoop for any of a wide range of purposes.

One early customer, for example, is using the new Cloudera Distribution for Hadoop to analyze gene sequences, Christophe Bisciglia noted during the same briefing. Another company co-founder, Bisciglia formerly led an initiative at Google that teamed up with the National Science Foundation (NSF) to make Google-hosted Hadoop clusters available for research and education.

Cloudera plans to earn its revenues from services and support around the software, as opposed to selling the software itself, according to Olson. "What Red Hat did for Linux, we will do for Hadoop," he said.

The company is distributing the software in the RPM packages already familiar to most Linux administrators and developers -- as well as in Amazon EC2 images -- under an Apache software license, Bisciglia said.

Major components of the software include the Hadoop Distributed File System (HDFS), a fault-tolerant file system designed to cope with failures of commodity hardware; a data warehousing infrastructure known as Hive; and an implementation of MapReduce, software for dividing applications into many small blocks of work that are automatically parallelized and executed across large clusters.
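For readers unfamiliar with MapReduce, here is a toy, single-process Python sketch of the idea, written for illustration only; it is not Hadoop's own Java API, and the function names are hypothetical. A map step turns each input record (here, a line of log text) into key/value pairs, a shuffle step groups those pairs by key, and a reduce step combines each group into a result. In Hadoop itself, the framework handles the grouping and runs many map and reduce tasks in parallel across the machines in a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Hypothetical mapper: emit (word, 1) for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group intermediate values by key; in Hadoop the framework does this
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Hypothetical reducer: combine all values seen for one key.
    return key, sum(values)


def word_count(lines):
    # Each map_phase call is independent, which is what lets a real
    # cluster spread the work over many machines. Here it runs sequentially.
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    logs = ["GET /index GET /about", "get /index"]
    print(word_count(logs))  # {'get': 3, '/index': 2, '/about': 1}
```

Because each map call and each per-key reduce call touches only its own slice of the data, the same program can, in principle, be split into the "many small blocks of work" described above.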

To ease installation and configuration of its Hadoop distribution, Cloudera has set up a new portal -- also available free of charge -- where people can use a Web-based configuration tool to produce custom packages optimized for their specific requirements. Users can also set and save cluster settings for automatic updates.

Also for free, Cloudera is providing basic training in implementing the software, along with a downloadable VMware image for testing the software on a Linux, Windows, or Macintosh desktop.

Other company co-founders include Jeff Hammerbacher, who previously launched and led a Hadoop data team at Facebook; and Dr. Amr Awadallah, former director of engineering at Yahoo.

Also this week, Cloudera announced that it has closed a $5 million round of Series A funding led by Accel Partners. The start-up is also financed by a dozen private investors, including Mike Abbott, senior VP at Palm; Diane Greene, former CEO of VMware; Dr. Qi Lu, president of the Online Services Group at Microsoft and former executive VP at Yahoo; and Gideon Yu, CFO at Facebook and former senior VP at Yahoo.

[ERRATUM: Last week, we attributed Cloudera CEO Michael Olson's comments to Jeff Hammerbacher. We have made corrections above and regret the error.]


© 1998-2022 BetaNews, Inc. All Rights Reserved.