The Macrosite for News, Analysis and Opinion about the Future of the Internet

De-Mystifying LSI & Other Search Engine Arcana

Written by Mike Moran
10/6/2008 2 comments

LSI: It might sound like a TV crime show, but it's actually a well-known text analysis technique that makes search results better.

OK, it's well known to search geeks like me, anyway. You might not be familiar with Latent Semantic Indexing, but if you think you need to understand it -- or if your vendor or consultant is boasting about it -- read on.

I first heard of Latent Semantic Analysis (LSA) back in the 1990s. It’s a text analysis technique patented in the 1980s that can be applied to many computer science problems, of which search indexing is one (hence Latent Semantic Indexing).

LSA is one of many text analysis techniques that look at the tendencies of certain words to be near each other in text. When you think about it, it’s very obvious (as so many great ideas are). Words don’t occur randomly -- language is a highly patterned activity, and those patterns can help computers better understand the meaning of documents.
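
To make the idea of words tending to appear near each other concrete, here is a minimal sketch in Python. It is not LSA itself (which goes further and factors a term-document matrix); it only counts which words share a document. The four-sentence corpus, the stopword list, and the function names are all invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Toy corpus invented for illustration; not taken from any real index.
DOCS = [
    "the jaguar is a big cat that hunts in the jungle",
    "a jaguar cat stalks prey in the jungle at night",
    "the jaguar dealer lowered prices on the new sedan",
    "compare jaguar sedan prices before visiting a dealer",
]

# Hypothetical stopword list, just large enough for this corpus.
STOPWORDS = {"the", "a", "is", "in", "at", "on", "that", "before", "new"}


def cooccurrence_counts(docs):
    """Count how many documents each unordered word pair shares."""
    counts = Counter()
    for doc in docs:
        words = sorted(set(doc.split()) - STOPWORDS)
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


def neighbours(counts, word):
    """Return the words that co-occur with `word`, with their counts."""
    result = Counter()
    for (first, second), n in counts.items():
        if first == word:
            result[second] += n
        elif second == word:
            result[first] += n
    return result


counts = cooccurrence_counts(DOCS)
print(neighbours(counts, "jungle").most_common(3))
print(neighbours(counts, "dealer").most_common(3))
```

Even on four sentences, "jungle" keeps company with "cat" while "dealer" keeps company with "prices" and "sedan", although every sentence mentions a jaguar. Those are the patterns that techniques like LSA exploit at much larger scale.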

Consider how difficult it is to correctly identify which of several meanings of a word might be the right one for a searcher. When someone searches for “jaguars,” for instance, are they looking for the animal, the car, or even the football team? When searchers type in just one word, there’s no way for the search engine to know, but the moment a second word is entered, it’s often quite clear.

For example, when someone enters “jaguar prices,” you know it’s the car. “Mexican jaguar” is about the animal, and “jaguars quarterback” is about the football team. For human beings, it’s simple to understand which meaning is intended each time, but semantic analysis is one way for computers to figure it out, too.

Now, often a computer could guess right without semantic analysis, because those two-word phrases appear in the right documents. But what about a document that refers to a “Mexican Jaguar dealer”? People who search for “Mexican jaguar” would certainly not be interested, but a typical text search might turn it up, just because it contains the matching phrase. A computer that uses semantic analysis would likely not be tricked so easily, because it detects that the “Mexican jaguar” search is for animal information. Based on the language used, the search engine can tell that the document about the “Mexican Jaguar dealer” is not about animals at all.
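
The "Mexican Jaguar dealer" problem can be sketched in a few lines of Python. This is a deliberately crude stand-in for semantic analysis, not how any real search engine works: the cue-word lists are hand-picked assumptions, the two sample pages are made up, and the intended sense of the query is passed in rather than inferred.

```python
import re

# Hypothetical cue words per sense, hand-picked for this illustration.
# A real engine would learn such associations from a large corpus.
SENSE_CUES = {
    "animal": {"cat", "habitat", "jungle", "prey", "species", "wildlife"},
    "car": {"dealer", "prices", "sedan", "lease", "warranty", "showroom"},
    "team": {"quarterback", "touchdown", "season", "coach", "nfl"},
}


def tokens(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def phrase_match(query, document):
    """Naive text search: does the query appear as a contiguous phrase?"""
    q, d = tokens(query), tokens(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


def guess_sense(text):
    """Pick the sense whose cue words overlap the text most, or None."""
    words = set(tokens(text))
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def semantic_match(query, query_sense, document):
    """Keep a phrase match only when the document's sense agrees."""
    return phrase_match(query, document) and guess_sense(document) == query_sense


# Made-up sample pages.
DEALER_PAGE = "Visit our Mexican Jaguar dealer showroom for sedan prices and lease offers."
WILDLIFE_PAGE = "The Mexican jaguar is a rare cat species that stalks prey in jungle habitat."

for page in (DEALER_PAGE, WILDLIFE_PAGE):
    # First value: naive phrase match. Second: match after the sense check.
    print(phrase_match("Mexican jaguar", page),
          semantic_match("Mexican jaguar", "animal", page))
```

Both pages contain the phrase, so the naive search returns both. Only the wildlife page survives the sense check, because the words surrounding "jaguar" on the dealer page point to cars.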

Internet search engines certainly use many semantic analysis techniques, of which LSI is just one. Should you care which one is used? Is LSI better?

I'd argue that it isn't necessarily better. Indeed, if someone is peddling LSI as a key feature, get out your snake-oil detector.

While this stuff is exciting to propeller heads, all searchers should care about is whether they find what they are looking for. Even if you are a search marketer concerned with getting your pages ranked as highly as possible by the search engines, I'd still tell you to stop worrying about such technical arcana.

There simply isn't sufficient evidence that one semantic analysis technique is better than another. There are many variables in a search engine, and, in most cases, I've found that the content trumps the search technology.

Too many people waste their time looking for tricks and secrets to outsmart Google's ranking algorithm, but the smart folks are remembering that they must appeal to people, too. Just write naturally, using the words that make the most sense to your readers. If your work is interesting, it will be found. If you know what your customers are looking for, that will be enough.

— Mike Moran, author of Do It Wrong Quickly, is a speaker and consultant on Internet marketing

Tags: Search
Mike Moran
Thinkernetter
Monday October 6, 2008 4:07:41 PM

Hi Paul,

No one knows how much Google is using LSI unless they are employed there, and they ain't telling. My point is that I think you are best served coming up with the content that will appeal to searchers and assuming that the better you do that, the more it will be found. You can spend a lot of time trying to reverse engineer the search algorithm, and by the time you do, Google will have moved on to something else.

Trying to create thematic silos is a way to appeal to an LSI-heavy algorithm, but Google uses hundreds of factors in its algorithm, of which LSI is but one. Unless you are in an extremely competitive environment ("digital cameras"), you'd be better off creating more pages with information that searchers are looking for than staring deeply into any algorithm to over-optimize the pages you already have.

Paul Whyte
Researcher
Monday October 6, 2008 3:36:05 PM

Hi Mike,

Thanks for the post! My first question is: how extensively is Google using LSI in its search algorithms? If Google is quickly shifting to semantic indexing, then we must start taking LSI seriously, even though you suggested otherwise.

Secondly, you did not offer websites much help with ways to optimize for LSI. What's your take on the following advice for taking advantage of LSI technology:

"Optimising your website for Latent Semantic Indexing algorithms necessitates excellent design of your website structure and architecture. Of the utmost importance is the proper use of keywords as part of your internal link anchor text that properly support your top-tier keywords.

The best way to optimise your website for Latent Semantic Indexing is to create what are known as thematic silos. This entails creating a top level page for your particular keywords and then creating pages under this page for related complementary keywords in the same theme. Looking at a practical example let’s say we have a hostel in Sydney - a silo focussing on Sydney might look like this":

 

The ThinkerNet does not reflect the views of TechWeb. The ThinkerNet is an informal means of communication to members and visitors of the Internet Evolution site. Individual authors are chosen by Internet Evolution to blog. Neither Internet Evolution nor TechWeb assume responsibility for comments, claims, or opinions made by authors and ThinkerNet bloggers. They are no substitute for your own research and should not be relied upon for trading or any other purpose.