Companies looking to deal with big-data on the cheap have a lot more options than they used to, and that's helping firms deal with storing, organizing, searching, and analyzing a large volume of information, often in unstructured form. Blog posts, videos, weather reports, customer surveys, Twitter feeds, PDFs, network use statistics, medical records -- the stuff just piles up.
Here are some of the alternatives that have emerged to help IT cope.
Storage hardware
As prices fall for flash memory, expect to see more vendors offering products specifically meant to address the storage and management of big-data, such as the tiered storage wares now on the market. In this approach, more expensive (and faster) storage is provided for the most in-demand information. Data that is needed the least is moved to less expensive but slower alternatives. For example, you can use flash memory for the data needed immediately, disks for data needed soon, and tape storage for data that you might need someday.
You can set the system up manually, or you can use tools to allocate data dynamically to the storage option that fits it best. All the major vendors, including EMC, IBM, HP, and Hitachi, offer tiered storage solutions.
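To make the tiering idea concrete, here is a minimal sketch of the kind of rule a dynamic allocation tool might apply. It is an illustration only: the seven-day and 90-day windows and the function names `choose_tier` and `plan_placement` are assumptions made for this example, not any vendor's actual policy or API.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds, chosen only for illustration.
# Real tiering products apply their own (often tunable) policies.
HOT_WINDOW = timedelta(days=7)
WARM_WINDOW = timedelta(days=90)


def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Pick a storage tier from how recently the data was last read."""
    age = now - last_accessed
    if age <= HOT_WINDOW:
        return "flash"  # data needed immediately
    if age <= WARM_WINDOW:
        return "disk"   # data needed soon
    return "tape"       # data that might be needed someday


def plan_placement(datasets: dict[str, datetime], now: datetime) -> dict[str, str]:
    """Map each dataset name to the tier the rule above assigns it."""
    return {name: choose_tier(ts, now) for name, ts in datasets.items()}
```

A manual setup amounts to writing that mapping by hand. A dynamic tool would re-run something like `plan_placement` on a schedule and move data whenever its assigned tier changes.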
Cloud storage
Falling storage prices aren't limited to traditional hardware vendors. The cloud storage providers are also cutting their rates. In the spring, for example, both Amazon and Microsoft cut their prices. And Rackspace recently got into the game by offering cloud storage using the open-source OpenStack platform.
Management tools
Hadoop: This open-source project is the engine that drives a lot of big-data initiatives, taming information and bringing it under control while scaling to astounding sizes. The Hadoop market is forecast to grow at a compound annual rate of 58 percent to $2.2 billion in 2018. Hadoop is supported by the major vendors, including IBM and Microsoft.
Splunk: This general-purpose data analysis software lets companies process large volumes of data quickly, in real time. It's already used by thousands of corporate customers, including giants like Bank of America, Comcast, Viacom, and Zynga. Splunk went public in the spring. Companies can choose from 350 prebuilt modules, including one for enterprise security and one for Web intelligence. Like many other lower-cost data analytics tools, Splunk plays well with Hadoop.
Platfora: This startup is also trying to make big-data easier to use. Its front end sits on top of Hadoop and lets business analysts use Hadoop in real time, without requiring a technical background.
New talent
As big-data projects explode across major corporations, so do the wars over talent. The McKinsey Global Institute estimates that the US will face a shortfall of 140,000 to 190,000 qualified big-data analysts by 2018. As a result, some colleges are adding data science to their computer science curricula.
Courses are also available online, such as the Introduction to Data Science course from the University of Washington. This course and several related ones can be taken through Coursera for free.
— Maria Korolov is president of Trombly International, an editorial services company that provides coverage of emerging technologies and markets. She has been a journalist for more than 20 years.
The number of opportunities in the big data space is incredibly exciting - and incredibly challenging for organizations to overcome. Simultaneously, many IT positions across the board are seeing pay increases: Software and networking pay grew 3.1% on average, according to a Kenexa study. The average salary for software engineers was $101K, Dr. Dobb's found in its annual salary report. I believe that's about the starting salary for big data specialists... pretty healthy!
How right you are, Mitch! My current client has about 1.8 petabytes on some of the biggest arrays that money can buy from vendors like EMC and IBM. Compare and contrast that with a former employer, who used commodity servers and Hadoop to house about 1.1 petabytes of data. The client's bill for all that storage runs upwards of $15 million. If the total cost of the Hadoop solution ran over a quarter million dollars, I would be very surprised (the owners, bless their hearts, didn't get rich by throwing their money away). While some wouldn't consider a $350 million a year company all that small, compared to the client, they're a gnat on an elephant's back. Completely different cultures, too. I guess it's all about what you need to do with the data, and what it will cost you to hold and manipulate it. Either way, it's a scary amount of data, and an interesting insight into what companies are doing with the information they gather.
If we're lucky, the data is used to improve products, make better predictions about the weather, the economy, and fashion trends, or to develop new drugs and therapies.
I just hope they're not using brain scans to figure out ways to manipulate us into buying more stuff we don't need. :-)
And now I'm suspicious of those mind-reading headsets for video games... what data are they collecting?
Maria, I won't worry much about cost alone right now, since the data is more important to me than the cost. If my data is protected properly and is in a manageable state, I'm fine with the cost.
It's really the unstructured data that seems to be the problem. If data could be automatically structured into nice schemes, then accessing and analyzing wouldn't be as big a problem... but the trick is getting data to self-organize... :P
The one problem with storing big data in the cloud is the privacy issue. Storing all that public information in the cloud so that it can be mined later can cause a headache. Even with the new privacy-preserving techniques on the market for data mining on big data, there are still those who'd rather not risk it and go in another direction that costs a little more.
The talent shortage is already evident. Quality people are expensive, and many companies, particularly in ecommerce and big data, are finding the need to headhunt. Signing bonuses are becoming more commonplace, and retention techniques like shares with vesting periods of 4 to 5 years are also making a difference.
Retention bonuses are a great idea, @swijeyakumar. I wonder whether this is something midsize companies are able to offer, or only a tool available to enterprises. Also, can government agencies offer this perk to big-data experts or is it out of their reach as well? It's going to be interesting to see how organizations get creative to attract and retain big-data experts. I'd imagine we'll see some fantastic internship opportunities, as smaller companies try to lure students to work on projects during their education, too. Sure, that's a BandAid approach, but it will at least get some projects done.
Big Data management and analytics are a top priority for most institutions today, and with growth rates reaching 100 percent annually, infrastructure capacity and associated costs become a strain. However, business users, from online trading to equity trading and many other application environments, continue to demand analytics against richer and broader data sets for better business insights.
The ThinkerNet does not reflect the views of TechWeb. The ThinkerNet is an informal means of communication to members and visitors of the Internet Evolution site. Individual authors are chosen by Internet Evolution to blog. Neither Internet Evolution nor TechWeb assume responsibility for comments, claims, or opinions made by authors and ThinkerNet bloggers. They are no substitute for your own research and should not be relied upon for trading or any other purpose.