The Macrosite for News, Analysis and Opinion about the Future of the Internet
Kim Davis

Unstructured Data: The Elephant Enters the Room

Written by Kim Davis
10/25/2011 4 comments

One of the primary challenges facing business analytics is tackling large sets of unstructured data. To get a feel for just how tricky this is, first imagine the opposite: a strictly structured data set. Data entered into a well-designed template, with rules specifying the content of each field, is rigorously structured. Simple programming can search, sort, and sift data of this kind and extract its value.
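To make "simple programming" concrete, here is a minimal sketch in Python. The Order record, its field names, and the sample values are hypothetical, invented purely for illustration; the point is only that when every record follows the same template, searching, sorting, and totaling take a line or two each.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """A hypothetical structured record: every field has a fixed name and type."""
    customer: str
    region: str
    amount: float


# Illustrative sample data, not taken from any real system.
ORDERS = [
    Order("Acme", "East", 1200.0),
    Order("Globex", "West", 450.0),
    Order("Initech", "East", 300.0),
]


def search(orders, region):
    """Return only the orders whose region field matches exactly."""
    return [o for o in orders if o.region == region]


def sort_by_amount(orders):
    """Return the orders sorted from largest to smallest amount."""
    return sorted(orders, key=lambda o: o.amount, reverse=True)


def total(orders):
    """Sum the amount field across the given orders."""
    return sum(o.amount for o in orders)


if __name__ == "__main__":
    east = search(ORDERS, "East")
    print([o.customer for o in sort_by_amount(east)])
    print(total(east))
```

None of this works on an x-ray image or a video clip, which have no "region" or "amount" field to query. That is the gap the rest of this post is about.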

"Unstructured data" implies hybrid collections of text, image, video, and other kinds of files. Medical databases, for example, can contain written medical records, charts and graphs, x-rays, and other images. Web pages frequently host video and Flash video, images and animation, as well as text. Data from such unstructured sources, especially on a large scale (think Twitter or Facebook, of course), is very hard to handle.

The problems involved in analyzing unstructured data can be set out concisely:

  • Not only is the data heterogeneous; techniques for integrating it into one analytics environment may be heterogeneous, too. APIs are only part of the solution; enterprises also need to consider how to define unconventional data, and in particular how to identify and define what is of value in it.

  • Traditional business intelligence environments are designed precisely to handle structured data (the simple database architectures described above). It's unlikely that an existing environment can straightforwardly be adapted to assimilate unstructured data. New tools and thinking are required.

  • Advance identification of analytical goals is required if large quantities of "messy" and irrelevant data are not to be imported into the intelligence environment, reducing visibility of valued information.

In short, know what you're looking for and why you're looking for it -- and be prepared to innovate.

Vendors offer a range of tools designed to address unstructured data. NAS (network-attached storage) systems, for example, provide solutions for searching and sharing hybrid file content. Their analytical capacity has historically been limited, but there may be prospects for scaling it out. Commercially developed solutions specifically geared to extract value from unstructured data are increasingly available.

Enter the elephant.

Hadoop the elephant, that is, a framework for storing data on a distributed file system, potentially based across hundreds or thousands of servers, and running operations across the servers. It was named after the developer's son's toy elephant.
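For readers who want a feel for what "running operations across the servers" means, here is a rough sketch in Python of the map, shuffle, and reduce pattern associated with Hadoop's MapReduce model. This is an assumption-laden illustration, not Hadoop's actual API: it runs in a single process, the function names are made up for this example, and each string in the input list merely stands in for a block of data that a real cluster would hold on a separate server.

```python
from collections import defaultdict


def map_phase(chunk):
    """Turn one chunk of text into (word, 1) pairs.

    On a real cluster, each server would do this for its own chunk.
    """
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all values by key so each word's counts end up together."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each word's list of counts into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    """Run map, shuffle, and reduce over a list of text chunks."""
    pairs = []
    for chunk in chunks:
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    # Two chunks stand in for data stored on two different servers.
    print(word_count(["big data big", "data elephant"]))
```

The appeal of the pattern is that the map step needs no knowledge of any other chunk, so in principle it can be spread over as many servers as there are pieces of data.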

In a development announced today, IBM is leveraging Hadoop to support InfoSphere BigInsights, an unstructured data analytics tool sitting on the SmartCloud platform. Both free and pay versions are preconfigured and can be operated by clients almost immediately to analyze mixed collections of text, video, images, and social media content.

The launch dovetails with IBM's recent acquisition of Hadoop specialists Platform Computing. It also underlines the importance of IBM's decision, announced in May, to support the open-source Apache Hadoop project rather than create its own version of Hadoop.

InfoSphere BigInsights will allow clients to interact with Hadoop-generated analytics through a user-friendly, browser-based interface known as "BigSheets." In launching a ready-to-use, Hadoop-based solution for unstructured data, IBM is ahead of Microsoft, which is planning to launch a beta service at year's end.

These are early days, of course, for business analytics in general, and the analysis of large, unstructured data sets in particular, but IBM's deployment of the elephant is a big step in the direction of reliable, real-time analytics with the versatility to master today's hybrid and rapidly evolving information landscape.

-- Kim Davis, Community Editor, Internet Evolution

hounhosp
Thinkernetter
Wednesday October 26, 2011 9:39:33 AM

"In launching a ready-to-use, Hadoop-based solution for unstructured data, IBM is ahead of Microsoft, which is planning to launch a beta-service at year's end."

With Watson, IBM had already demonstrated its lead in information processing and data mining. The launch of the new "Hadoop-based solution for unstructured data" will certainly put the company substantially ahead of its competitors.

Mashka
Researcher
Tuesday October 25, 2011 9:36:59 PM

Kim, maybe I didn't understand well, but as far as I know, there are special search programs that can help with unstructured data. Something like www.splunk.com

Nicole Ferraro
IQ Crew
Tuesday October 25, 2011 5:36:56 PM

This is a great topic, Kim, and it's especially great that we'll be dealing with the issue of unstructured data next week when we kick off our 7DEE series on the subject of analytics. Sign up today, one and all!

Mary Jander
Thinkernetter
Tuesday October 25, 2011 5:08:52 PM

Unstructured data is one of the most significant areas in which to deploy analytics because social media, email, and other items not formerly organized for perusal are turning out to be industry-changing market gauges. The more big players like IBM get going in this space, the more interesting solutions we'll see -- with real-time information gleaned from all kinds of unstructured information becoming a reality.
