Now that more of the big data sites on the Web are getting on board the semantic Web bandwagon and applying machine-readable RDF tags to their published content, we're starting to see the outlines of "Web 3.0" beginning to emerge in the form of the linked data movement.
But will all those semantic smarts really ease the information overload problem, or just find new ways to annoy us?
The first phase in the development of the semantic Web essentially aims at defining the technology standards for making digital content "meaningful" to computers in ways that approximate the intuitions of human beings. That way, software could identify connections between different content that's specifically relevant to the context in which it is being retrieved. No more "your search returned 1,100,095 entries"; no more frustration trying to find online info about that guy who has to introduce himself as "Jon Hamm, but not the famous one."
The research community working on semantic Web technologies has put enough of a framework in place that more and more closed data systems are marking up their content using the richer RDF semantic metadata tags. The utility of semantic search within any given system depends on a critical mass of data being properly tagged, and that takes time. But the process has been ongoing and is now starting to bear fruit.
The linked data movement takes the semantic concept to the next level by automatically creating linkages between data sets, not just between data elements in the same system. Thomson Reuters Calais Release 4.0, for example, promises to deliver up unstructured data from all over the Web, including Wikipedia, DBpedia, the Internet Movie Database, and Shopping.com, as relevant results to natural language queries.
Linked data is becoming the hot new thing partly because it's being talked up by Web honcho Sir Tim Berners-Lee, and is the subject of an increasing number of industry conclaves and academic conferences. It's also cool because it promises to solve a genuine problem caused by the first and second generations of Web technology -- the flood of information that forces people to become the integration point between literally thousands of disconnected systems.
My question is, what new frustrations and unintended consequences will Web 3.0 bring? I'm all for better search results and automating electronic chimp-work, and I haven't used any of the latest-and-greatest tools, but my limited experience with anticipatory, context-sensitive "helper" systems leaves me cold.
Part of the problem is that the results are literally correct -- that is, I can appreciate why the algorithm returned those particular references and linkages, or recommended certain others -- but the association of concepts is just a little bit off in ways I could never describe properly to a machine. The deep connection between concepts is too subjective and experiential to define with metadata, and too complicated to capture in a static taxonomy.
Oddly, the gross imprecision and frustrations of Google no longer bother me. My expectations are set, and I've adjusted my workstyle to fit the idiosyncrasies of the tool. Incremental improvements are welcome. Smart, contextual solutions for very specific problems, such as BingTravel (formerly FareCast, which combs through airline pricing information to provide optimum fares and predictions about whether prices will go up or down), strike me as breakthroughs.
The promises of semantic search are vastly greater. I have no doubt that the information and patterns that can be exposed by mashing up linked semantic data will be profound, interesting, and more relevant by orders of magnitude than what we have now. But I also suspect that the frustrations will be much more insidious: like speaking to someone who understands your literal words (not just your grammar), but not your metaphors or idiomatic usage.
At the margins of natural language search, weird moments and unintentional humor will abound; and yet somehow, I don't think all those PhD scholars would be amused to think that they are creating the computer equivalent of Borat.
Oh well. I guess we have to leave something for them to discuss at the Web 4.0 conferences in 2015.
You and I both know of countless ideas that were supposed to change the way we use and access the Internet, but so many of them ended up on the scrap heap of Internet rejects.
Will the semantic Web (in its current form) join that list?
Ultimately it's the crowds and the money behind it that decide in which direction the Internet goes.
Great example, EliteC! Hopefully, we won't be asking ourselves "Where's the beef?" when some of the old/new products hit the market to help us find ourselves online.
There will always be someone who introduces a product as new when, most of the time, it is an item already on the market with a minor change. In a sense the change makes it new, but not in the way it is presented. For example, Hardee's is introducing its new Big Carl in competition with McDonald's Big Mac; the difference is the grill and an extra patty. However, it is still a burger: one just has three pieces of bread and the other three pieces of meat.
@ Root Maniac - That trust element is exactly where the semantic web concept may disappoint. The social life of information is complicated - we as human beings apply all kinds of subjective and experiential filters to make instantaneous assessments of trustworthiness, linkages between concepts, etc. I don't doubt that networked IT systems will eventually have the horsepower to make the requisite number of computational steps (if they don't already), but I find it very hard to believe that the process outcome would resemble human judgement. Or if it did, it would do so with just enough imprecision to annoy us, confuse us, or mislead us just often enough to compromise the integrity of the process.
With Google-type search, you are asking a deaf-mute for directions, and your expectation is that it will point to stuff in broad gestures, based on its limited understanding of what you are asking. Semantic systems are supposed to speak your language and answer you with relevant information, which implies a higher expectation of trust. If that trust does not materialize because it is a) limited or b) being manipulated, the level of frustration will be a lot higher.
Any semantic analysis system is going to need a reliable mechanism of knowing which data can be trusted, otherwise any results it returns will be tainted by marketers gaming the system, ignorance, and outright lies. It's hard enough for a human browsing through Google results to decide which ones are relevant and reliable, let alone a machine. What mechanisms do these developers propose for ensuring the links between data sets will reliably reflect actual correlations, and not spurious ones created by mercantile manipulation, boneheaded bloviating, or plain old maliciousness?
Back in the early 1990s, a few vendors were trying to make hay by advertising their wares as artificial intelligence. Then, when it became apparent that the science was still highly flawed, the term AI became marketing anathema.
Here we have AI re-emerging for the Web.
My question is whether suppliers will look to avoid using the term and instead attempt to demonstrate that they've reinvented the entire concept using "new" technology.
Perhaps they have; but more likely developers have built on the AI techniques that have long been behind many different products and services.
Harrison Ford has not only appeared in both silent films and contemporary films, but has been remarkably well preserved and active for somebody who died in 1957. (There was an actor named Harrison Ford who did silent films and died in 1957; he is no relation to the Harrison Ford (think Star Wars) we can all immediately recall.)
There will be a proof that 1+1 = 0, due to a typo, and all sorts of weird things will now be possible. I hope such a system is not hooked up to a vital or threatening system (like a doomsday device).
Many more people will not be able to fly internationally due to their names matching the list of "people of interest".
I'm sure we can come up with other possibilities, but I think the creation of links is the proper first step. The next big hurdle will be linking them together and resolving the various identifiers (and disambiguating the entities they represent). One interesting startup, FreeBase, is trying to build up such a list with a common set of URIs (Uniform Resource Identifiers) that can be shared and reused. They have a large set of stuff and it looks promising.
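To make the identifier problem concrete, here is a minimal Python sketch of the Harrison Ford case. It is only an illustration under stated assumptions: the `example.org` URIs, the `Entity` record, and the `candidates` and `resolve` helpers are all hypothetical, and none of it reflects how FreeBase or any real linked data service actually works.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entity:
    uri: str             # hypothetical identifier, not a real linked data URI
    label: str           # human-readable name, which may be ambiguous
    born: int
    died: Optional[int]  # None means still living


# Two distinct people who share one label; only the URI tells them apart.
ENTITIES = [
    Entity("http://example.org/person/harrison_ford_1884", "Harrison Ford", 1884, 1957),
    Entity("http://example.org/person/harrison_ford_1942", "Harrison Ford", 1942, None),
]


def candidates(label: str) -> List[Entity]:
    """Return every entity whose label matches, ambiguous or not."""
    return [e for e in ENTITIES if e.label == label]


def resolve(label: str, year: int) -> List[Entity]:
    """Narrow the candidates to those alive in the given year (a crude context filter)."""
    return [
        e for e in candidates(label)
        if e.born <= year and (e.died is None or year <= e.died)
    ]


if __name__ == "__main__":
    for entity in resolve("Harrison Ford", 1977):
        print(entity.uri)
```

A label alone matches both people; adding even one piece of context (here, a year) narrows the match to a single URI. Real systems would need far richer context than this, which is exactly where the weird results are likely to creep in.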
Of course, Semantic Web technology is attempting to solve two problems:
What is the context of your search/request/data, and how can I present it to you?
Developing/evolving machine intelligence to make computers more responsive to normal human input (like speech).
One interesting problem will be how to present the results of your semantic-based queries when there are a large number of results (if I ask "what are the various viewpoints of the works of William Shakespeare?", I expect to see a LOT of answers, even if they are grouped and summarized along some common-sense lines).
A last note: IBM is working with Jeopardy (yes...Alex Trebek and those folks) to have a Blue Gene system be a live participant on their show. It will listen to the answer like the two other contestants and respond with the question verbally. No date has been announced, but they are working on it to demonstrate how far semantic technology and AI have come, as well as speech recognition and speech synthesis. I don't know how it will buzz in to provide the question, though.
The semantic Web will have its own improvement curve, and its own share of mistakes and blunders. But in the end we all (including our companion machines) will come out wiser. I am watching my young nephews and nieces grow and learn, and it's interesting how they make innocent, intelligent mistakes while learning. It's exciting to see how the Web evolves beyond an information highway into a 'global human knowledge base'. Even more exciting will be the transition to getting the succinct wisdom out of terabytes of RDFs and ontologies.
® 2001-2009 CWH Dubai Techn, Int'l USA/UAE has explored this technology and is in the alpha testing phase.
Furthermore, with the current mandates for computerization of U.S. national health care records, the necessity is very apparent: direct and instant communication between all medical devices and servers, for actual triage and treatment by attending medical doctors anywhere in the world, including every prescribed medicine. Even the major business communities will require it in the very near future.
Moreover, this is twofold, as upgraded security will be a must!
Therefore, more spending is needed to secure the global networks with more layered security, such as CWH's 23rd-century designs and software implementations!
The ThinkerNet does not reflect the views of TechWeb. The ThinkerNet is an informal means of communication to members and visitors of the Internet Evolution site. Individual authors are chosen by Internet Evolution to blog. Neither Internet Evolution nor TechWeb assume responsibility for comments, claims, or opinions made by authors and ThinkerNet bloggers. They are no substitute for your own research and should not be relied upon for trading or any other purpose.