Companies looking to handle big data on the cheap have far more options than they used to, and that's helping firms store, organize, search, and analyze large volumes of information, often in unstructured form. Blog posts, videos, weather reports, customer surveys, Twitter feeds, PDFs, network use statistics, medical records: the stuff just piles up.
Here are some of the alternatives that have emerged to help IT cope.
As prices fall for flash memory, expect to see more vendors offering products specifically meant to address the storage and management of big data, such as the tiered storage wares now on the market. In this approach, more expensive (and faster) storage is provided for the most in-demand information. The data needed least is moved to less expensive but slower alternatives. For example, you can use flash memory for the data needed immediately, disks for data needed soon, and tape storage for data that you might need someday.
You can set the system up manually, or you can use tools to allocate data dynamically to the storage option that fits it best. All the major vendors, including EMC, IBM, HP, and Hitachi, offer tiered storage solutions.
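To make the tiering idea concrete, here is a minimal Python sketch of a rule that assigns data to flash, disk, or tape based on how often it is read. It is an illustration only, not how any particular vendor's product works: the `DataSet` class, the function names, and the access thresholds are all hypothetical, and commercial tools weigh many more factors.

```python
from dataclasses import dataclass

# Hypothetical thresholds, in reads per day. Real tiering products
# apply their own (usually configurable) policies.
FLASH_MIN_READS_PER_DAY = 100
DISK_MIN_READS_PER_DAY = 1


@dataclass
class DataSet:
    name: str
    reads_per_day: float


def choose_tier(item: DataSet) -> str:
    """Pick the cheapest tier that is still fast enough for the demand."""
    if item.reads_per_day >= FLASH_MIN_READS_PER_DAY:
        return "flash"  # needed immediately: fastest, most expensive
    if item.reads_per_day >= DISK_MIN_READS_PER_DAY:
        return "disk"   # needed soon: middle ground
    return "tape"       # might be needed someday: slowest, cheapest


def plan_placement(items: list[DataSet]) -> dict[str, str]:
    """Map each data set's name to the tier it should live on."""
    return {item.name: choose_tier(item) for item in items}


if __name__ == "__main__":
    catalog = [
        DataSet("live_orders", 5000),
        DataSet("last_quarter_reports", 12),
        DataSet("2009_archive", 0.01),
    ]
    print(plan_placement(catalog))
```

Rerunning a rule like this on fresh usage numbers is the simplest form of the dynamic allocation described above: as demand for a data set drops, it moves down to a cheaper tier.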
Falling storage prices aren't limited to traditional hardware vendors. The cloud storage providers are also cutting their rates. In the spring, for example, both Amazon and Microsoft cut their prices. And Rackspace recently got into the game by offering cloud storage using the open-source OpenStack platform.
- Hadoop: This open-source project is the engine that drives a lot of big-data initiatives, taming information and bringing it under control while scaling to astounding sizes. The Hadoop market is forecast to grow at a compound annual rate of 58 percent to $2.2 billion in 2018. Hadoop is supported by the major vendors, including IBM and Microsoft.
- Splunk: This general-purpose data analysis software allows companies to process large amounts of data in real time. It's already used by thousands of corporate customers, including giants like Bank of America, Comcast, Viacom, and Zynga. Splunk went public in the spring. There are 350 prebuilt modules that companies can use, including one for enterprise security and one for Web intelligence. Like many other lower-cost data analytics tools, Splunk plays well with Hadoop.
- Platfora: Another startup trying to make big data easier to use is Platfora. Its product is a front end that sits on top of Hadoop and lets business analysts without a technical background use Hadoop in real time.
As big-data projects explode across major corporations, so do the wars over talent. The McKinsey Global Institute estimates that the US will face a shortfall of 140,000 to 190,000 qualified big-data analysts by 2018. As a result, some colleges are adding data science to their computer science curricula.
Courses are also available online, such as the Introduction to Data Science course from the University of Washington. This course and several related ones can be taken through Coursera for free.
— Maria Korolov is president of Trombly International, an editorial services company that provides coverage of emerging technologies and markets. She has been a journalist for more than 20 years.