
Bing Search: An Engine in the Clouds

Munich – Rablstr. / Munich – Gewürzmühlstr. / London – Cardinal Place
• Founded in January 2009
• Offices in London and Munich
• More than 70 employees in total
• Close collaborations with Microsoft Research in Redmond and Cambridge
• Collaborations with various MSFT Product Groups (incl. Office, Skype, Windows, Xbox etc.)
• STC-E Munich: web ranking
• Other STCs in Beijing, Hyderabad, and Silicon Valley

Applied Research branch of MSR working on:
• IT-Security
• Data-privacy
• Mobility
• Mobile Solutions
• Web-Services
Enabling acquisition of data on all classes of devices/sensors, and providing industry-leading analytics and processing capabilities from the edge to cloud, leveraging:
• Deep technical expertise (8 years of experience in platform and small devices)
• Relationships with engineering groups (Windows, Office 365, Windows Azure)
• Strategic customer scenarios (Telemetry, Stream Analytics, Early fault detection, Predictive maintenance)
http://research.microsoft.com/en-us/labs/atle/default.aspx


• Content Gathering
  • Crawling
  • Indexing
• Matching Query Words to Content
  • Query modifications needed?
• Ranking Results
  • Features used
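The indexing and matching steps in this outline can be illustrated with a toy inverted index. This is a minimal sketch, not a description of Bing's index: the function names `build_index` and `match_all`, the whitespace tokenization, and the three sample documents are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def match_all(index, query):
    """Return ids of documents that contain every query word (AND semantics)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample documents.
docs = {
    1: "spaghetti western films",
    2: "western spaghetti recipe",
    3: "pasta recipe",
}
index = build_index(docs)
matches = match_all(index, "spaghetti western")
```

Matching only narrows the candidate set: documents 1 and 2 both contain the query words, and deciding which one should come first is the job of the ranking step.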

• Comprehensiveness
  • Serving & discovery of hundreds of billions of documents
• Frequency
  • Optimized towards freshness & politeness
• Balance
  • Depth vs. breadth in the processing of document content
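One way to picture the politeness constraint is a schedule that never hits the same host twice within a minimum delay. The sketch below is an assumption-laden illustration: the function name `schedule`, the fixed `delay_seconds` value, and the example URLs are hypothetical, and the input list is assumed to be already ordered by crawl priority (for example, by expected freshness gain).

```python
from urllib.parse import urlparse


def schedule(urls, delay_seconds=10.0):
    """Assign each URL the earliest fetch time that keeps requests to the
    same host at least delay_seconds apart. Input order is the priority."""
    next_free = {}  # host -> earliest time the host may be contacted again
    plan = []
    for url in urls:
        host = urlparse(url).netloc
        fetch_time = next_free.get(host, 0.0)
        plan.append((fetch_time, url))
        next_free[host] = fetch_time + delay_seconds
    # Stable sort: URLs with equal fetch times keep their priority order.
    plan.sort(key=lambda item: item[0])
    return plan


# Hypothetical URLs, highest priority first.
urls = [
    "http://a.example/1",
    "http://a.example/2",
    "http://b.example/1",
]
plan = schedule(urls)
```

Here the second page on `a.example` is pushed back by the delay, so the page on `b.example` is fetched before it even though it had lower priority.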

• Result counts are misleading
• Intelligent document selection matters


• Most components are easily parallelizable, such as crawling and document processing
• Developed on top of Private Cloud
• Leveraging Cosmos as a highly scalable storage and computing system
• Running under highly performant Datacenter Management System

• Petabyte Store and Computation System
  • About 62 physical petabytes stored (~275 logical petabytes stored in 2011)
  • Tens of thousands of computers across many datacenters
• Massively parallel processing based on Dryad
  • Similar to MapReduce but can represent arbitrary DAGs of computation
  • Automatic computation placement with data
• SCOPE (Structured Computation Optimized for Parallel Execution)
  • SQL-like language with set-oriented record and column manipulation
  • Automatically compiled and optimized for execution over Dryad
• Management of hundreds of “Virtual Clusters” for computation allocation
Source: http://research.microsoft.com/en-us/events/fs2011/helland_cosmos_big_data_and_big_challenges.pdf
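The difference between a fixed map/reduce pipeline and an arbitrary DAG of computation can be sketched in a few lines. The example below is not Dryad's or SCOPE's API; it is a single-process illustration using Python's standard `graphlib` module, and the stage names, the `run_dag` helper, and the toy data are all hypothetical.

```python
from graphlib import TopologicalSorter


def run_dag(stages, deps):
    """Run every stage after its upstream stages, passing their outputs in
    the order listed in deps. Executes serially in one process."""
    results = {}
    graph = {name: set(deps.get(name, ())) for name in stages}
    for name in TopologicalSorter(graph).static_order():
        upstream = [results[d] for d in deps.get(name, ())]
        results[name] = stages[name](*upstream)
    return results


# Hypothetical stages: two independent branches that meet in a join,
# a shape that a single linear map -> reduce chain does not express directly.
stages = {
    "load_queries": lambda: ["bing", "search", "bing"],
    "load_clicks": lambda: ["bing", "maps"],
    "count_queries": lambda qs: {q: qs.count(q) for q in set(qs)},
    "count_clicks": lambda cs: {c: cs.count(c) for c in set(cs)},
    "join": lambda qc, cc: {k: (qc[k], cc.get(k, 0)) for k in qc},
}
deps = {
    "count_queries": ["load_queries"],
    "count_clicks": ["load_clicks"],
    "join": ["count_queries", "count_clicks"],
}
results = run_dag(stages, deps)
```

A real system would additionally place each stage near its data and run independent branches in parallel; the sketch only shows the dependency-ordered execution.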

• Linguistic alterations
  • Stemming, morphological variations, plurals
• Spell Corrections
  • Approximately one third of queries contain misspellings
• Synonyms
  • Used sparingly, only in high confidence situations

• Web represents Knowledge
  • Approximately one third of queries contain misspellings
  • Users often correct themselves
  • Patterns of common (and not so common) misspellings are discovered
• Linguistic Depth
  • Less useful
  • Rules like “i” before “e” except after “c” do not scale
  • Instead develop computer models of correct spelling
  • Statistics & usage data help
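A statistics-driven corrector, as opposed to hand-written spelling rules, can be sketched as follows. This is the well-known edit-distance-plus-frequency approach, swapped in here for illustration; it is not the model the slides describe. The `counts` table stands in for word frequencies mined from usage data and its numbers are made up.

```python
from collections import Counter
import string


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return the most frequent known word within one edit, else the input."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return word
    return max(candidates, key=lambda w: counts[w])


# Hypothetical frequencies, standing in for counts derived from query logs.
counts = Counter({"receive": 120, "recipe": 80, "believe": 60})
fixed = correct("recieve", counts)
```

No "i before e" rule is encoded anywhere: "recieve" and "beleive" are both repaired purely because the corrected forms are frequent in the (assumed) usage data.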

• Heavy dependence on interaction logs (queries, sessions …)
  • User behavior is king
• Leveraging Cosmos for processing billions of entries
  • Deriving query histogram
  • Deriving query reformulation graph
• Leveraging Machine Learning for ranking suggestions
  • Training word-level/contextual-level models
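The two derived structures named above, a query histogram and a query reformulation graph, can be sketched on a handful of sessions. This is an in-memory illustration under stated assumptions: a session is assumed to be an ordered list of query strings, the function names are hypothetical, and the sample sessions are invented.

```python
from collections import Counter, defaultdict


def query_histogram(sessions):
    """Count how often each query string occurs across all sessions."""
    return Counter(query for session in sessions for query in session)


def reformulation_graph(sessions):
    """Count transitions between consecutive, distinct queries in a session.
    graph[a][b] is the number of times users went from query a to query b."""
    graph = defaultdict(Counter)
    for session in sessions:
        for prev, nxt in zip(session, session[1:]):
            if prev != nxt:
                graph[prev][nxt] += 1
    return graph


# Hypothetical sessions, each an ordered list of queries from one user.
sessions = [
    ["spagetti western", "spaghetti western"],
    ["spagetti western", "spaghetti western", "spaghetti western films"],
    ["spaghetti western"],
]
hist = query_histogram(sessions)
graph = reformulation_graph(sessions)
```

Heavy edges such as "spagetti western" to "spaghetti western" are the kind of self-correction signal that suggestion and spelling models can learn from.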

1. Create lots of features (attributes)
2. Calculate them per query/document pair
3. Let the computer learn how to rank with millions/billions of examples
4. Rinse & Repeat

• Do the words appear in the title of the document?
• Do the words appear in the order specified?
• Are the words a substantial part of the title?
• Are the words excessively repeated?
• Is the title uncommonly long?

• Good: Title: Spaghetti Western - Wikipedia
• Not so good: Title: Western Spaghetti Recipe – Taste of Home
• Not so good: Title: The Spaghetti Western Orchestra Tickets 2013 - The Spaghetti Western Orchestra Concert tour 2013 Tickets
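The title questions can be turned into per query/document features along the following lines. This is a rough sketch: the feature names, the punctuation stripping, and the reading of "in the order specified" as "appears as a contiguous phrase" are assumptions, not the definitions used in production. The query "spaghetti western" is inferred from the example titles.

```python
def title_features(query, title):
    """Compute a few illustrative title features for one query/title pair."""
    q = query.lower().split()
    # Strip surrounding punctuation and drop tokens that were pure punctuation.
    t = [w.strip(".,:;-–") for w in title.lower().split()]
    t = [w for w in t if w]
    return {
        # Do the words appear in the title?
        "all_present": all(w in t for w in q),
        # Do they appear in the order specified (here: as a contiguous phrase)?
        "in_order": any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1)),
        # Are the words a substantial part of the title?
        "coverage": sum(1 for w in t if w in q) / len(t) if t else 0.0,
        # Are the words excessively repeated?
        "max_repeats": max((t.count(w) for w in q), default=0),
        # Is the title uncommonly long?
        "title_length": len(t),
    }


query = "spaghetti western"
good = title_features(query, "Spaghetti Western - Wikipedia")
recipe = title_features(query, "Western Spaghetti Recipe – Taste of Home")
orchestra = title_features(
    query,
    "The Spaghetti Western Orchestra Tickets 2013 - "
    "The Spaghetti Western Orchestra Concert tour 2013 Tickets",
)
```

On these inputs the first title matches the phrase in a short title, the second contains both words but in the wrong order, and the third is long and repeats the query words.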

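[Figure: example decision tree over title features. Nodes: “Words in Title in Correct Order?”, “Is Title uncommonly long?”, “Are words repeated excessively?”, each with yes/no branches; leaf scores +4, -1, +2; annotations: “1000s of other features…” and “1000s of other trees…”]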

• The document with highest cumulative score is ranked highest
• Improvements to algorithms & features are made constantly & shipped frequently
• Machine Learning allows for scalability across many dimensions
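Cumulative scoring over many trees can be sketched as a sum of per-tree outputs. Everything in this example is hypothetical: the tree shapes, the thresholds, the 0.0 fallback leaves, and the feature values are made up for illustration, with only the leaf values +4, -1 and +2 borrowed from the slide's example tree. Real rankers learn thousands of trees from data.

```python
def title_tree(f):
    """Hand-written stand-in for one learned tree (structure is assumed)."""
    if f["in_order"]:
        if f["title_length"] > 10:  # "uncommonly long" threshold is a guess
            return -1.0 if f["max_repeats"] > 1 else 2.0
        return 4.0
    return 0.0  # placeholder for the branch using other features


def coverage_tree(f):
    """Second hypothetical tree: reward titles dominated by query words."""
    return 1.0 if f["coverage"] >= 0.5 else 0.0


def score(features, trees):
    """Cumulative score: the sum of every tree's output for this document."""
    return sum(tree(features) for tree in trees)


def rank(docs, trees):
    """Order document names by descending cumulative score."""
    return sorted(docs, key=lambda name: score(docs[name], trees), reverse=True)


# Hypothetical precomputed features for three candidate documents.
docs = {
    "wikipedia": {"in_order": True, "title_length": 3, "max_repeats": 1, "coverage": 2 / 3},
    "recipe": {"in_order": False, "title_length": 6, "max_repeats": 1, "coverage": 1 / 3},
    "orchestra": {"in_order": True, "title_length": 14, "max_repeats": 2, "coverage": 2 / 7},
}
trees = [title_tree, coverage_tree]
ranking = rank(docs, trees)
```

Because the final score is just a sum, adding a new feature or a new tree does not require touching the existing ones, which is part of what makes the approach easy to extend.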

• Titles
• Images
• Document Content
• Visual Layout
• Links
• Co-occurrences of words
• Clicks
• Freshness
• Word Frequencies
• Word Proximity
… and many more

• Leveraging Hybrid Cloud Computing Platform
• Built on top of heterogeneous computing resources:
  • Single box
  • Cosmos: Map/Reduce
  • MPI
  • …
• Easy deployment/management of applications (modules)
• Powerful graphical layer of abstraction defining the data workflow
• Batch mode that allows scaling across dimensions such as markets
• Leveraging Machine Learning as a Service

Accelerating feature computation through Field Programmable Gate Arrays (FPGAs)
Sources:
http://www.altera.com/technology/system-design/articles/2014/cpu-architects.html
http://www.enterprisetech.com/2014/09/03/microsoft-using-fpgas-speed-bing-search/

• Feature Fundamentals
  • Which features are “universal”, which are market-specific?
• Multiple Queries
  • When does it make sense to issue multiple (altered) queries?
  • How does one merge the results for the user?
• Anchor & Link Signals
  • Which links and which anchor text are informative?
  • What features need to be leveraged in this context?
• Knowledge Modeling
  • View the task of ranking as a translation from “query language” to “document language”

• Global Ranking
  • Everything mentioned before … for International (no English-US, no CJK)
  • Improving, Evaluating and Shipping Rankers in hundreds of markets
• Universal Ranker
  • One Team and One Ranker

http://www.bingiton.com/
