
De-Mystifying LSI & Other Search Engine Arcana

Written by Mike Moran
10/6/2008 2 comments

LSI: It might sound like a TV crime show, but it's actually a well-known technique of text analysis that makes search results better.

OK, it's well known to search geeks like me, anyway. You might not be familiar with Latent Semantic Indexing, but if you think you need to understand it -- or if your vendor or consultant is boasting about it -- read on.

I first heard of Latent Semantic Analysis (LSA) back in the 1990s. It’s a text analysis technique patented in the 80s that can be applied to many computer science problems, of which search indexing is one (hence Latent Semantic Indexing).

LSA is one of many text analysis techniques that look at the tendencies of certain words to be near each other in text. When you think about it, it’s very obvious (as so many great ideas are). Words don’t occur randomly -- language is a highly patterned activity, and those patterns can help computers better understand the meaning of documents.
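To make the "words near each other" idea concrete, here is a minimal sketch that counts how often pairs of words turn up in the same document. It is not LSA itself, only the raw co-occurrence signal that techniques like LSA start from. The mini-corpus and the names `DOCS`, `cooccurrence_counts`, and `together` are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mini-corpus, invented for illustration only.
DOCS = [
    "the jaguar dealer lowered prices on every new car",
    "a jaguar is a big cat that hunts in the jungle",
    "the car dealer listed prices for a used jaguar",
    "the jungle cat stalks prey at night",
]

def cooccurrence_counts(docs):
    """Count, for each unordered word pair, how many documents contain both."""
    counts = Counter()
    for doc in docs:
        # Use each word once per document, sorted so pairs have a stable order.
        words = sorted(set(doc.split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts

COUNTS = cooccurrence_counts(DOCS)

def together(a, b):
    """Number of documents in which both words appear."""
    return COUNTS[tuple(sorted((a, b)))]

print(together("dealer", "prices"))  # words from the car documents
print(together("cat", "jungle"))     # words from the animal documents
print(together("dealer", "jungle"))  # words that never share a document
```

Even in this toy corpus, "dealer" keeps company with "prices" and "cat" keeps company with "jungle," while "dealer" and "jungle" never meet. Those are the patterns a computer can exploit.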

Consider how difficult it is to correctly identify which of several meanings of a word might be the right one for a searcher. When someone searches for “jaguars,” for instance, are they looking for the animal, the car, or even the football team? When searchers type in just one word, there’s no way for the search engine to know, but the moment a second word is entered, it’s often quite clear.

For example, when someone enters “jaguar prices,” you know it’s the car. And “Mexican jaguar” is about the animal, and “jaguars quarterback” is about the football team. A human being easily understands which meaning is intended each time, but semantic analysis is one way for computers to figure it out, too.

Now, often a computer could guess right without semantic analysis, because those two-word phrases appear in the right documents. But what about a document that refers to a “Mexican Jaguar dealer”? People who search for “Mexican jaguar” would certainly not be interested, but a typical text search might turn it up, just because it contains the matching phrase. A computer that uses semantic analysis would likely not be tricked so easily, because it detects that the “Mexican jaguar” search is for animal information. Based on the language used, the search engine can tell that the document about the “Mexican Jaguar dealer” is not about animals at all.
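The difference between the two kinds of search can be sketched in a few lines. This is a simplified stand-in rather than LSI: the topic vocabularies in `TOPIC_WORDS` are written by hand, whereas a real semantic analysis system would learn such associations from word patterns in a large collection of text. The searcher's intent is also passed in as an assumption rather than detected. All names and sample documents here are hypothetical.

```python
# Hypothetical topic vocabularies. A real system would learn these from
# word co-occurrence; they are hand-written here to keep the sketch short.
TOPIC_WORDS = {
    "animal": {"cat", "jungle", "habitat", "prey", "species", "wild"},
    "car": {"dealer", "prices", "cars", "sedan", "lease", "showroom"},
}

def tokenize(text):
    """Lowercase the text and strip simple punctuation from each word."""
    return [w.strip(".,!?\"'").lower() for w in text.split()]

def topic_of(text):
    """Return the topic whose vocabulary overlaps the text most, or None."""
    words = set(tokenize(text))
    scores = {topic: len(words & vocab) for topic, vocab in TOPIC_WORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def phrase_match(query, docs):
    """Plain text search: keep every document containing the query phrase."""
    q = " ".join(tokenize(query))
    return [d for d in docs if q in " ".join(tokenize(d))]

def semantic_match(query, intent, docs):
    """Phrase match, then drop documents whose topic conflicts with the intent."""
    return [d for d in phrase_match(query, docs) if topic_of(d) == intent]

# Invented sample documents.
DOCS = [
    "The Mexican jaguar is a wild cat that stalks prey in the jungle.",
    "Visit our Mexican Jaguar dealer for low prices on new cars.",
]

print(phrase_match("Mexican jaguar", DOCS))              # both documents match
print(semantic_match("Mexican jaguar", "animal", DOCS))  # only the animal page
```

The plain phrase search returns the dealer's page because it contains the matching words. The second search drops it because the surrounding language ("dealer," "prices," "cars") points to a different topic.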

Internet search engines certainly use many semantic analysis techniques, of which LSI is just one. Should you care which one is used? Is LSI better?

I'd argue that it isn't necessarily better. Indeed, if someone is peddling LSI as a key feature, get out your snake oil detector.

While this stuff is exciting to propeller heads, all searchers should care about is whether they find what they are looking for. Even if you are a search marketer concerned with getting your pages ranked as highly as possible by the search engines, I'd still tell you to stop worrying about such technical arcana.

There simply isn't sufficient evidence that one semantic analysis technique is better than another. There are many variables in a search engine, and, in most cases, I've found that the content trumps the search technology.

Too many people waste their time looking for tricks and secrets to outsmart Google's ranking algorithm, but the smart folks are remembering that they must appeal to people, too. Just write naturally, using the words that make the most sense to your readers. If your work is interesting, it will be found. If you know what your customers are looking for, that will be enough.

— Mike Moran, author of Do It Wrong Quickly, is a speaker and consultant on Internet marketing

Tags: Search
Mike Moran
Thinkernetter
Monday October 6, 2008 4:07:41 PM

Hi Paul,

No one knows how much Google is using LSI unless they are employed there, and they ain't telling. My point is that I think you are best served coming up with the content that will appeal to searchers and assuming that the better you do that, the more it will be found. You can spend a lot of time trying to reverse engineer the search algorithm, and by the time you do, Google will have moved on to something else.

Trying to create thematic silos is a way to appeal to an LSI-heavy algorithm, but Google uses hundreds of factors in its algorithm, of which LSI is but one. Unless you are in an extremely competitive environment ("digital cameras"), you'd be better off creating more pages with information that searchers are looking for than staring deeply into any algorithm to over-optimize the pages you already have.

Paul Whyte
Researcher
Monday October 6, 2008 3:36:05 PM

Hi Mike,

Thanks for the post! My first question is: how extensively is Google using LSI in its search algorithms? If Google is rapidly shifting to semantic indexing, then we must start taking LSI seriously, even though you suggested otherwise.

Secondly, you did not offer websites much help in optimizing for LSI. What's your take on the following advice for taking advantage of LSI technology:

"Optimising your website for Latent Semantic Indexing algorithms necessitates excellent design of your website structure and architecture. Of the utmost importance is the proper use of keywords as part of your internal link anchor text that properly support your top-tier keywords.

The best way to optimise your website for Latent Semantic Indexing is to create what are known as thematic silos. This entails creating a top level page for your particular keywords and then creating pages under this page for related complementary keywords in the same theme. Looking at a practical example let’s say we have a hostel in Sydney - a silo focussing on Sydney might look like this":

 
