Introduction to Google Ranking
Google的搜索排名算法 from Official Google Blog by Karen
Posted by Amit Singhal, Google Fellow
In May, Udi Manber introduced our search quality group, the group responsible for the ranking of search results. He introduced various teams within "Quality" (as we like to call the group) including Core Ranking, International Search, User Interfaces, Evaluation, Webspam, and other teams. In this post, I want to tell you more about one of these: the Core Ranking team.
五月份Udi Manber向大家介绍了我们的搜索质量小组,这个小组负责搜索结果的排名。他还提到“质量”(我们喜欢这个叫法)下面各种各样的团队,核心排名,国际搜索,用户界面,评估,网络欺骗等团队。在这个帖子里我会详细介绍其中的一个:核心排名团队。
Let me introduce myself. My name is Amit Singhal. I'm a Google Fellow in charge of the ranking team at Google. I've worked in the field of search for the past eighteen years, having been introduced to search in 1990 as a graduate student in computer science. In the academic world, the field of search is known as Information Retrieval (or IR). After spending a decade as an IR researcher, I came to Google in 2000, and have worked on Google ranking ever since.
首先允许我介绍一下自己,我的名字是Amit Singhal。我是负责排名团队的Google人。从1990年从计算机科学系毕业以来就进入搜索领域,到现在已经有十八个年头,搜索又被称为信息检索。在信息检索领域研究了十来年后,我于2000年来到Google,从那时起一直在Google的排名领域工作。
Google ranking is a collection of algorithms used to find the most relevant documents for a user query. We do this for hundreds of millions of queries a day, from a collection of billions and billions of pages. These algorithms are run for every query entered into most of Google's search services. While our web search is the most used Google search service and the most widely known, the same ranking algorithms are also used - with some modifications - for other Google search services, including Images, News, YouTube, Maps, Product Search, Book Search, and more.
Google的排名是一系列算法的组合,这些算法被用来找与用户搜索关键词最相关的文件。每天我们都要在数亿万网页上处理上亿的搜索关键词。几乎所有的Google搜索服务上进行的搜索都会用到这些算法。我们的网页搜索是Google最常用也是最广为人知的搜索服务,和Google别的搜索服务,使用的是同样的排名算法-只是经过一些修改。
The most common question I get asked about Google's ranking is "how do you do it?" Of course, there is a lot that goes into building a state-of-the-art ranking system like ours, and I will delve deeper into the technology behind it in a later post. Today, I would like to briefly share the philosophies behind Google ranking:
关于Google的排名人们问我最多的问题是你们到底怎么做的?当然要创建一个像我们这样的顶级排名系统比较的复杂,在接下来的帖子里我会深入介绍排名背后的技术。今天我要分享的是Google排名算法的理念。
1) Best locally relevant results served globally.
最好的与当地相关的搜索结果全球都有用
2) Keep it simple.
尽可能简单
3) No manual intervention.
没有人工参与
The first one is obvious. Given our passion for search, we absolutely want to make sure that every user query gets the most relevant results. We often call this the "no query left behind" principle. Whenever we return less than ideal results for any query in any language in any country - and we do (search is by no means a solved problem) - we use that as an inspiration for future improvements.
第一条很明显。以我们对搜索的热情,我们完全希望每位用户的搜索关键词都能得到最相关的结果。这也是我们常说的不落下任何一个搜索关键词的原则。如果有关键词搜到的结果没有预期的那么多,不管是用什么语言,在哪个国家,这种情况有时候确实会出现(搜索远不是一个已经解决的问题)-这种不理想的搜索会激励我们在将来不断的改进。
The second principle seems obvious. Isn't it the desire of all system architects to keep their systems simple? Well, as search systems go, given the wide variety of user queries we have to respond to in multiple languages, it is easy to go down the path where more and more complexity creeps into the system to serve the next incremental fraction of the queries. We work very hard to keep our system simple without compromising on the quality of results. This is an ongoing effort, and a worthy one. We make about ten ranking changes every week and simplicity is a big consideration in launching every change. Our engineers understand exactly why a page was ranked the way it was for a given query. This simple, understandable system has allowed us to innovate quickly, and it shows. The "keep it simple" philosophy has served us well.
第二条也是比较明晰。保持系统的简单难道不是所有系统设计师的渴望吗?在搜索系统运行的过程中,想一想我们要处理的那么广泛的领域,来自那么多语言的搜索关键词,要保证处理好数量不断增长的搜索关键词,系统会变得越来越复杂。在保证搜索结果质量的情况下我们尽量保持系统的简洁。这需要长期的努力,但是非常的值得。每个星期我们大概会对搜索排名做十个改动,每次修改的发布简洁都是重点考虑的对象。我们的工程师可以理解一个搜索关键词下的某个页面为什么会得到这样的排名。有这套易懂的系统,我们的创新更快,也更容易反映出来。保持简单的理念用于我们非常的恰当。
No discussion of Google's ranking would be complete without asking the common - but misguided! :) - question: "Does Google manually edit its results?" Let me just answer that with our third philosophy: no manual intervention. In our view, the web is built by people. You are the ones creating pages and linking to pages. We are using all this human contribution through our algorithms. The final ordering of the results is decided by our algorithms using the contributions of the greater Internet community, not manually by us. We believe that the subjective judgment of any individual is, well ... subjective, and information distilled by our algorithms from the vast amount of human knowledge encoded in the web pages and their links is better than individual subjectivity.
如果不提那个常见的,但是误导的问题:)任何关于Google搜索排名的讨论都是不完整的:“Google手动调整搜索结果吗?”我就用上面的第三条理念来回答这个问题:没有人工参与。在我们看来网络是大家建造起来的。你就是创建这些网页链接这些网页的人。我们的算法中就用到了所有这些大家的贡献。最终结果的排序是由我们的算法决定的,而不是人工调整的,这些算法利用的是巨大的互联网社区的贡献。我们认为任何个人的主观判断,都是主观的,我们的算法从包含在网页和其链接里的浩大的人类知识中提取出来的信息总是胜过个人的主观判断的。
The second reason we have a principle against manually adjusting our results is that often a broken query is just a symptom of a potential improvement to be made to our ranking algorithm. Improving the underlying algorithm not only improves that one query, it improves an entire class of queries, and often for all languages. I should add, however, that there are clear written policies for websites recommended by Google, and we do take action on sites that are in violation of our policies or for a small number of other reasons (e.g. legal requirements, child porn, viruses/malware, etc).
我们反对人为调整搜索结果理念的另一个原因是一个坏掉的关键词链接是可能会改进我们的排名算法的征兆。改进潜在的排名算法不仅对一个关键词的搜索结果有帮助,而且对一系列的关键词搜索结果都会有帮助,这种改进带来的好处还常常会体现在所有的语言上。然而我要补充一点,Google关于网站的条款非常清楚,对于触犯这些条款的网站和一些别的原因(比如:司法要求,儿童色情,病毒或恶意软件等)我们也会采取一些干预措施。
Stay tuned for my followup post, where I will discuss in detail the technologies behind our ranking and show examples of several state-of-the-art ranking techniques in action. Let me just conclude this post by saying, our passion for search is stronger than ever - and as a search researcher, I have the best job in the world :-).
请继续留意下面的帖子,我会详细的讨论排名背后的的技术并且会展示一些正在使用的顶尖的排名技术的例子。在这篇帖子结束的时候我想说,我们对搜索的热情比任何时候都强烈-作为一个搜索研究员,我有世界上最好的工作:)。