Archive for January, 2011
Consider what you are doing when you enter terms into a search engine. You are asking for the best page, among the billions that exist, that has the information you are interested in. The perfect search engine would understand both what each page’s content is and what you are asking for. Everything we have now is a heuristic, a shortcut that can be gamed.
PageRank is a brilliant hack, but it is basically useless now. The idea of PageRank is to use the existing links to a page to determine how useful that page is. Why did this work? It worked because in the old days, people wrote articles and linked from them to pages they found useful. Looked at this way, PageRank is a distributed Yahoo directory. Everyone was categorizing the web pages they found useful by linking to them. PageRank then harnessed that crowd intelligence to make the web searchable.
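To make the idea concrete, here is a minimal sketch of a PageRank-style calculation. It is not Google's actual implementation: the function name, the tiny made-up link graph, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for illustration. Each page splits its score among the pages it links to, so a page that many others point at ends up ranked highest.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rough PageRank sketch. `links` maps each page to the pages it links to."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores

    for _ in range(iterations):
        # Every page gets a small baseline score regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A link is a vote: split this page's score among its targets.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: three bloggers each link to a tutorial they found useful.
web = {
    "blog_a": ["tutorial"],
    "blog_b": ["tutorial"],
    "blog_c": ["tutorial", "blog_a"],
    "tutorial": [],
}
ranks = pagerank(web)
best = max(ranks, key=ranks.get)
print(best)
```

Nobody in this toy web had to fill out a directory entry for the tutorial. The ranking falls out of links people were already writing, which is the "distributed directory" effect described above.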
So why doesn’t the algorithm work anymore? Content farms and spammers are creating more and more of the web’s pages, so a link to a page is no longer an endorsement of that page’s content by a real person. Crowdsourcing no longer works when most of the crowd is spammers and bots.
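The gaming is easy to see in a toy model. The sketch below uses a simplified link-based score in the spirit of PageRank, not any real search engine's code. The page names, the 20-page bot farm, and the damping value are invented for illustration. Once the farm's pages all link to a spam page, it outscores an article that three real bloggers linked to.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified link-based scoring; `links` maps page -> pages it links to."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = dict.fromkeys(pages, 1.0 / n)
    for _ in range(iterations):
        # Score held by pages with no outgoing links is shared by everyone.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = dict.fromkeys(pages, (1 - damping) / n + damping * dangling / n)
        for p, targets in links.items():
            for t in targets:
                new[t] += damping * rank[p] / len(targets)
        rank = new
    return rank


# Hypothetical honest web: three real bloggers endorse one article.
honest = {
    "blog_a": ["good_article"],
    "blog_b": ["good_article"],
    "blog_c": ["good_article"],
    "good_article": [],
    "spam_page": [],
}
# Hypothetical link farm: 20 machine-generated pages that all link to the spam page.
farm = {f"bot_{i}": ["spam_page"] for i in range(20)}

before = pagerank(honest)
after = pagerank({**honest, **farm})

print("before farm:", before["good_article"] > before["spam_page"])
print("after farm: ", after["spam_page"] > after["good_article"])
```

The scoring cannot tell a blogger's link from a bot's link. Both count as votes, so whoever can manufacture the most pages controls the result.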