LSI: It might sound like a TV crime show, but it's actually a well-known text analysis technique that can make search results better.
OK, it's well known to search geeks like me, anyway. You might not be familiar with Latent Semantic Indexing, but if you think you need to understand it -- or if your vendor or consultant is boasting about it -- read on.
I first heard of Latent Semantic Analysis (LSA) back in the 1990s. It’s a text analysis technique patented in the 1980s that can be applied to many computer science problems, search indexing among them (hence the name Latent Semantic Indexing).
LSA is one of many text analysis techniques that exploit the tendency of certain words to appear near each other. When you think about it, it’s obvious (as so many great ideas are). Words don’t occur randomly: language is a highly patterned activity, and those patterns can help computers better understand what documents mean.
Consider how difficult it is to identify which of a word’s several meanings a searcher intends. When someone searches for “jaguars,” for instance, are they looking for the animal, the car, or even the football team? When searchers type in just one word, the search engine has no way to know, but as soon as they enter a second word, the meaning is often quite clear.
For example, when someone enters “jaguar prices,” you know it’s the car. “Mexican jaguar” is about the animal, and “jaguars quarterback” is about the football team. People can easily tell which meaning is intended each time; semantic analysis is one way for computers to figure it out, too.
Often a computer can guess right without semantic analysis, because those two-word phrases tend to appear in the right documents. But what about a document that refers to a “Mexican Jaguar dealer”? People who search for “Mexican jaguar” would almost certainly not be interested, but a typical text search might turn it up, simply because it contains the matching phrase. A computer that uses semantic analysis would be less easily fooled, because it detects that the “Mexican jaguar” search is for animal information. Based on the surrounding language, the search engine can tell that the document about the “Mexican Jaguar dealer” is not about animals at all.
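For readers who want to peek under the hood anyway, here is a minimal sketch of the LSI idea applied to the “Mexican Jaguar dealer” problem above. Everything in it is a labeled assumption: the four-page corpus, the function names (term_doc_matrix, top_term_vectors, lsi_rank) and the choice of two latent dimensions are hypothetical, and the singular value decomposition is done with plain power iteration rather than a production linear algebra library. It builds a term-document count matrix, keeps its top two singular directions, and compares the query with each page by cosine similarity in that reduced space. Real search engines would add term weighting, vastly larger vocabularies, and many more dimensions.

```python
import math
import random

# Hypothetical toy corpus: each "page" is a pre-tokenized bag of keywords.
DOCS = {
    "animal-1": "mexican jaguar jungle prey",
    "animal-2": "mexican jaguar jungle habitat",
    "car-1": "jaguar dealer prices",
    "dealer": "mexican jaguar dealer prices",
}


def term_doc_matrix(docs):
    """Rows are terms, columns are documents, cells are raw term counts."""
    vocab = sorted({w for text in docs.values() for w in text.split()})
    names = list(docs)
    matrix = [[docs[n].split().count(t) for n in names] for t in vocab]
    return vocab, names, matrix


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def top_term_vectors(matrix, k, iters=300):
    """Approximate the top-k left singular vectors of the term-document
    matrix A by power iteration on A * A^T, re-orthogonalizing each
    candidate against the vectors already found (Gram-Schmidt)."""
    n_terms, n_docs = len(matrix), len(matrix[0])
    rng = random.Random(0)  # fixed seed so the sketch is deterministic
    found = []
    for _ in range(k):
        v = normalize([rng.random() for _ in range(n_terms)])
        for _ in range(iters):
            # w = A^T v (document space), then v = A w (back to term space)
            w = [sum(matrix[t][d] * v[t] for t in range(n_terms)) for d in range(n_docs)]
            v = [sum(matrix[t][d] * w[d] for d in range(n_docs)) for t in range(n_terms)]
            for u in found:
                c = dot(v, u)
                v = [x - c * y for x, y in zip(v, u)]
            v = normalize(v)
        found.append(v)
    return found


def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def lsi_rank(query, docs, k=2):
    """Rank documents by cosine similarity to the query in a k-dim latent space."""
    vocab, names, matrix = term_doc_matrix(docs)
    basis = top_term_vectors(matrix, k)

    def project(counts):
        # Coordinates of a term-count vector along each latent direction.
        return [dot(u, counts) for u in basis]

    q = project([query.split().count(t) for t in vocab])
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in matrix]
        scores[name] = cosine(q, project(column))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


for name, score in lsi_rank("mexican jaguar", DOCS):
    print(f"{name:10s} {score:.3f}")
```

Both animal pages and the dealer page contain “mexican” and “jaguar,” so simple word matching can’t tell them apart. In this toy corpus, though, “mexican” appears on two animal pages and only one car page, and that co-occurrence pattern pulls the query toward the animal side: the two animal pages rank first, the dealer page third, and the plain “jaguar dealer prices” page last. Change the corpus and the ranking can change, too.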
Internet search engines certainly use many semantic analysis techniques, of which LSI is just one. Should you care which one is used? Is LSI better?
I'd argue that it isn't necessarily better. Indeed, if someone is peddling LSI as a key feature, get out your snake oil detector.
While this stuff is exciting to propellerheads, all that searchers should care about is whether they find what they are looking for. Even if you are a search marketer concerned with getting your pages ranked as highly as possible by the search engines, I'd still tell you to stop worrying about such technical arcana.
There simply isn't sufficient evidence that one semantic analysis technique is better than another. There are many variables in a search engine, and, in most cases, I've found that the content trumps the search technology.
Too many people waste their time looking for tricks and secrets to outsmart Google's ranking algorithm, but the smart ones remember that they must appeal to people, too. Just write naturally, using the words that make the most sense to your readers. If your work is interesting, it will be found. If you know what your customers are looking for, that will be enough.
— Mike Moran, author of Do It Wrong Quickly, is a speaker and consultant on Internet marketing