How Does Google Use Latent Semantic Indexing (LSI)?
The ranking algorithms for Google and other search engines are constantly changing. Rumor mills on the various search engine optimization forums give conflicting advice. So I turned to my friend Mike Grehan, author of Search Engine Marketing: The essential best practice guide, whom I consider an expert in the field.
My Question: What effect does Latent Semantic Indexing have on Google ranking?
Initially, search engines looked solely at the presence and frequency of keywords on a webpage to determine relevancy. But such an approach can produce poor results. For example, it treats synonyms such as “car” and “automobile” as unrelated terms, and it fails to distinguish between the senses of polysemous words (words with multiple meanings), such as “apple,” which may refer to a fruit or to a computer maker. Latent Semantic Indexing (LSI) is an approach that interprets keywords in the context of all the words on a webpage.
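To see the failure mode concretely, here is a minimal sketch of pure keyword matching (the function name, example page, and scoring are my own illustration, not any engine's actual code). A page entirely about cars scores zero for the query “automobile”:

```python
# Naive keyword matching: count how often each query word appears on the page.
# A page about "cars" scores zero for the query "automobile", even though
# the topics are identical -- the words never match as strings.
def keyword_score(query, page_words):
    return sum(page_words.count(w) for w in query.split())

page = "the car has a powerful engine and the car handles well".split()

print(keyword_score("car", page))         # 2
print(keyword_score("automobile", page))  # 0
```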
“Latent Semantic Indexing is often misunderstood in its true purpose. (It is based on the vector space model of document classification.) Fundamentally, it operates at some level in a ranking algorithm to help alleviate issues with ranking pages purely by text pattern matching, by adding context.
Using statistical analysis, LSI can discover that certain words tend to occur in the same contexts. For example, documents containing “apple” and “computer” will often also contain “Mac OS,” so those terms become relevant to one another. The same thing applies with “windows” as an operating system as opposed to an invention for looking through walls. It’s all about trying to understand more about the nature and intent of the user query and returning information in context with the user’s search, even when they give little clue as to the actual nature of the search. Incidentally, LSI is used by other search engines besides Google.”
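The statistical machinery behind this is a truncated singular value decomposition of a term-document matrix. Below is a toy sketch (the six-term corpus and the choice of two latent dimensions are my own invented example): “car” and “automobile” never appear in the same document, yet they end up close in the latent space because both co-occur with “engine.”

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents.
# Documents d0, d1 are about cars; d2, d3 are about Apple computers.
terms = ["car", "automobile", "engine", "apple", "mac", "os"]
A = np.array([
    [1, 0, 0, 0],  # "car" appears only in d0
    [0, 1, 0, 0],  # "automobile" appears only in d1
    [1, 1, 0, 0],  # "engine" appears in both car documents
    [0, 0, 1, 1],  # "apple"
    [0, 0, 1, 0],  # "mac"
    [0, 0, 1, 1],  # "os"
], dtype=float)

# LSI: keep only the k strongest latent "topic" dimensions of the SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]  # each term as a point in the latent space

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

i = {t: n for n, t in enumerate(terms)}
# "car" and "automobile" never co-occur, but share a topic via "engine":
print(cos(term_vecs[i["car"]], term_vecs[i["automobile"]]))  # close to 1.0
print(cos(term_vecs[i["car"]], term_vecs[i["apple"]]))       # close to 0.0
```

In raw keyword space the “car” and “automobile” vectors are orthogonal; the truncated SVD collapses them onto a shared latent dimension, which is exactly the context-adding effect Grehan describes.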
You can learn more about LSI if you’re willing to wade through some technical papers:
Latent Semantic Indexing (LSI), by Clara Yu, et al., National Institute for Technology and Liberal Education, January 1, 2002.
MOLE: Text Analysis Group, THOR Center for Neuroinformatics, Section for Digital Signal Processing. http://isp.imm.dtu.dk/thor/projects/multimedia/textmining/