Rapid Advances in Computer Science and Opportunities for Society European CS Presentation, October 2010 Alfred Spector VP, Research and Special Initiatives
Rapid Advances in Computer Science & Opportunities for Society Abstract Information and Communication Technologies have had a rapid impact on society, and, amazingly, the pace of innovation continues to accelerate. This innovation is catalyzed by ever-increasing hardware and networking capabilities, the growth in internet usage, and important advances in basic and applied computer science. In this talk, I will describe some of the research that Google is undertaking (for example, in machine translation, semantic processing, and information management) and discuss some of the likely beneficial impacts on our society, for example in science, the humanities, education, philanthropic activities, and more. I'll conclude my presentation with some interesting challenges from both a technology and a policy point of view.
Outline Google Prodigiousness Advances in the Field: examples (Translation, Speech, Vision, Cloud-based collaboration around structured data, Operations Research, Semantic Processing) Beneficial Societal Impacts: examples (Earth Engine, Google Health, Other Health Efforts, Crisis Response, Digital Humanities, Education) Technical Themes Challenges
Mission Organizing the world's information and making it universally accessible and useful.
Google and Commerce Over 1 million AdWords advertisers worldwide Over 1 million AdSense publishers worldwide Via the Google Ad Network, AdSense publishers reach over 80% of global internet users in 100 countries and 20 languages YouTube is monetizing over a billion video views per week globally In 2009, Google generated $54 billion of economic activity for American businesses, website publishers, and non-profits
Prodigiousness Giga 10^9, Tera 10^12, Peta 10^15, Exa 10^18, Zetta 10^21. Publicized: Bigtable of 70 petabytes, 10M ops/sec. Warehouse computing possibilities? 100 x 10 x 20 x 20 x 40 = 16,000,000 nodes... Some representative numbers: Storage: 10^18 -> 10^20-21; Users: 10^9 -> 10^10; Devices: 10^? -> 10^12; Network: 10^20 now -> 10^21/yr (32 KB/sec for 1B people); Apps: 10^5 -> 10^6-7 or more. E.g., embedded car systems: 30-50 ECUs, 100M lines of code.
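A quick back-of-the-envelope check of two figures on this slide; a minimal sketch in Python, assuming only the numbers stated above (the 16M-node product, ~10^21 bytes/year of network traffic, ~1B users):

```python
# Back-of-the-envelope check of the "Prodigiousness" figures above.
# All inputs are taken from the slide; nothing else is assumed.

nodes = 100 * 10 * 20 * 20 * 40            # hypothetical warehouse dimensions
print(f"{nodes:,} nodes")                  # 16,000,000 nodes

bytes_per_year = 1e21                      # ~10^21 bytes/year of network traffic
users = 1e9                                # ~1 billion people
seconds_per_year = 365 * 24 * 3600
per_user = bytes_per_year / users / seconds_per_year
print(f"~{per_user / 1024:.0f} KB/sec per person")   # roughly 31-32 KB/sec
```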
A variety of science and engineering challenges
Focus on Innovation that Benefits our Users Focus on Research and Engineering Commitment to advancing technology Rich domain of work due to our mission Grand challenge problems Internal consensus that production issues are often as challenging/fun as pure invention Technical leverage 1. Google Common Distributed System 2. A Focus on Services 3. Empiricism and a Holistic Approach to Design
Our Innovation Culture Focus on talent, distributed across the organization. Impacting Google necessitates broad, diverse involvement in science and engineering. Research is done both in our research team and in our engineering organization, organized opportunistically. Teams benefit greatly: from mutual talent, from Google's comparative advantages due to our scale and broad use, and from our service-based architecture (the "ease" of working in vivo).
Ideal Distributed Computing Devices
Research Challenges in Ideal Distributed Computing
Alternative designs that would give better energy efficiency at lower utilization
Server O.S. design aimed at many highly-connected machines in one building
Unifying abstractions for exploiting parallelism beyond inter-transaction parallelism and map-reduce
Latency reduction
A general model of replication, including consistency choices, explained and codified
Machine learning techniques applied to monitoring/controlling such systems
Automatic, dynamic world-wide placement of data & computation to minimize latency and/or cost, given constraints on ...
Building retrieval systems that efficiently and usably deal with ACLs
Holistic models of privacy
The user interface to the user's diverse processing and state
Totally Transparent Processing For all d in D, all l in L, all m in M, and all c in C:
D: the set of all end-user access devices (Personal Computers, Phone, Media Players/Readers, Telematics, Set-top Boxes, Appliances, Health devices, ...)
L: the set of all human languages (Current languages, Historical languages, Other forms of human notation, Possible language specialization, Formal languages, ...)
M: the set of all modalities (Text, Image, Audio, Video, Graphics, Other sensor-based data, ...)
C: the set of all corpora (The normal web, The deep web, Periodicals, Books, Catalogs, Blogs, Geodata, Scientific datasets, Health data, ...)
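A minimal sketch of the quadruple space the slide describes; the four sets below are tiny illustrative samples, not the actual enumerations:

```python
# Sketch of the (device, language, modality, corpus) space from the slide.
# The set contents here are small illustrative samples only.
from itertools import product

D = ["personal computer", "phone", "set-top box"]   # end-user access devices
L = ["English", "Russian", "Latin"]                 # human languages
M = ["text", "image", "audio"]                      # modalities
C = ["the web", "books", "geodata"]                 # corpora

# "Totally transparent processing" aspires to serve every combination.
for d, l, m, c in product(D, L, M, C):
    print(f"serve {m} from {c} in {l} on a {d}")
```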
Totally Transparent Processing
“Hybrid” Intelligence To extend the capability of people, not to operate in isolation. Aggregation of empirical signal is exceedingly valuable. Ex: feedback in information retrieval, e.g., in ranking or spelling correction; machine learning, e.g., image content analysis, speech recognition with semi-supervised learning.
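A minimal sketch of how aggregated user signal can drive spelling correction, assuming a toy query log; the queries and counts below are invented for illustration, and production systems use far richer signals:

```python
# Toy spelling corrector driven by aggregated query frequencies.
# The query-log counts below are invented for illustration only.
from collections import Counter

query_log = Counter({"google maps": 9000, "google mpas": 12, "google mail": 300})

def edits1(s):
    """All strings one simple edit (delete, transpose, replace, insert) away."""
    letters = "abcdefghijklmnopqrstuvwxyz "
    splits = [(s[:i], s[i:]) for i in range(len(s) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(query):
    """Prefer the most frequently observed query within one edit of the input."""
    candidates = edits1(query) | {query}
    return max(candidates, key=lambda q: query_log.get(q, 0))

print(correct("google mpas"))   # -> "google maps"
```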
Research Challenges in Transparent Computing & Hybrid Intelligence Endless applications, with very new user interface implications; addressing limits to data; techniques to integrate user feedback in acceptable fashions; approaches to new signal; explanation, scale, and variance minimization in machine learning; information fusion/learning across diverse signals (the Combination Hypothesis, more generally); usability: devices and subpopulations; privacy
Domains of Application Search engines, translation, speech recognition, vision, remedial education, personal health, epidemiology, economic prediction, societal/environmental optimization, social networking in ever more clever/useful ways, humanities and social sciences, multi-player gaming
Translation
Machine Translation @ Google Statistical Machine Translation: model the translation process with a statistical model, learning from data, both monolingual & bilingual. More data: better translation quality. Computationally expensive approach: models comprise many hundreds of gigabytes of data (Moore's law helps here). Applying syntax information as a signal. Results: much better translation quality, ongoing progress, more research groups, ... 58 languages (so far); recently: Haitian Creole, Urdu, Georgian, ..., Latin
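As a worked equation, the textbook noisy-channel formulation behind statistical MT (the slide does not spell this out, so this is a standard summary rather than the exact production model): given a source sentence f, choose the target sentence e that maximizes

```latex
\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(e)\, P(f \mid e)
```

where P(e) is the monolingual language model and P(f | e) is the translation model learned from bilingual data; production systems typically generalize this to a weighted log-linear combination of many such feature functions.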
Grand Challenges
Long-distance reordering: 'simple' case: SVO, SOV; one approach: parse source & reorder; issue: parsing accuracy for out-of-domain texts
Morphology: translating into morphologically rich languages, e.g. Russian, Hungarian; need: morphology-aware translation models
Reliability: some translation mistakes are more severe than others (hotel - Montreal, Heath Ledger - Tom Cruise); research: how to detect 'crazy' translations?
Finding all training data
How about Poetry? Paper at the EMNLP 2010 conference: "Poetic" Statistical Machine Translation: Rhyme and Meter, D. Genzel, J. Uszkoreit, F. Och, EMNLP, 2010. Approach: enforce meter and rhyme as extra constraints (similar to a language model), e.g. iambic pentameter: stress pattern 0101010101. Produce the most 'probable' translation that obeys the constraints ("function follows form"). Example output (couplet in amphibrachic tetrameter): "An officer stated that three were arrested / and that the equipment is currently tested."
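A minimal sketch of such a meter constraint: checking a candidate line against a target stress pattern. The tiny per-word stress dictionary is an illustrative assumption; the actual system derives stress from pronunciation lexica and applies the constraint during decoding:

```python
# Toy meter check: does a candidate line match a target stress pattern?
# The per-word stress patterns below are a small illustrative dictionary.
STRESS = {
    "an": "0", "officer": "100", "stated": "10", "that": "0",
    "three": "1", "were": "0", "arrested": "010",
}

def stress_pattern(line):
    """Concatenate per-word stress patterns (1 = stressed, 0 = unstressed)."""
    return "".join(STRESS[w] for w in line.lower().split())

def obeys_meter(line, target):
    return stress_pattern(line) == target

# Amphibrachic tetrameter is the pattern "010" repeated four times.
print(obeys_meter("An officer stated that three were arrested", "010" * 4))  # True
```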
Speech
Goals for Speech Technology at Google Much of the world’s information is spoken – we need to recognize it before we can organize it: YouTube transcription and translation (breaking the language barrier for YouTube access) Voicemail transcription Mobile is the fastest growing and most widespread platform for communication and services that has ever existed Spoken input and output is key to usability Our goal is completely ubiquitous availability of speech i/o (every application/service, every usage scenario, every language) How do we get there? Delivery from the cloud – support constant iteration and refinement Operating at large scale – train huge statistical models on huge amounts of data
Training Acoustic Models w/Unsupervised Learning Learning from use - without human transcription Challenges: How do we grow the model to take advantage of the data? (richer models of accent, speaker, noise, etc.) Huge computational demands Infrastructure demands – parallelization – leverage Google software environment Supervised vs. unsupervised training - hours of data vs. error rate
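A minimal sketch of the unsupervised (self-training) loop this slide alludes to; recognize() and train() are hypothetical stand-ins, and the confidence threshold is an illustrative assumption, not a description of the production pipeline:

```python
# Sketch of confidence-filtered self-training for acoustic models.
# recognize() and train() are hypothetical toy stand-ins for the real system.

def recognize(model, utterance):
    """Return a (hypothesized transcript, confidence score) pair."""
    return model.get(utterance, ("", 0.0))

def train(labeled_pairs):
    """Return a new toy 'model' fit on (utterance, transcript) pairs."""
    return {utt: (text, 1.0) for utt, text in labeled_pairs}

def self_train(model, unlabeled, threshold=0.9, rounds=3):
    """Fold high-confidence automatic transcripts back into training data."""
    for _ in range(rounds):
        confident = []
        for utt in unlabeled:
            transcript, score = recognize(model, utt)
            if score >= threshold:          # keep only trusted hypotheses
                confident.append((utt, transcript))
        if not confident:
            break
        model = train(confident)            # retrain on the expanded set
    return model

seed = train([("utt-001", "call me tomorrow")])
print(self_train(seed, ["utt-001", "utt-002"]))
```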
Vision
Computer Vision Advance the state of the art in 3 key areas of image/audio/video analysis and apply the results to our multimedia products. Semantic Interpretation: generate human-understandable descriptions of content (e.g., auto-tagging videos on YouTube, image annotation, porn classification, etc.). Matching: find similar entities in a large corpus (e.g., "find similar" on image search, video fingerprinting for YouTube, etc.). Synthesis: generate better images/video by understanding the statistics of a large corpus of images (e.g., better facades on 3D buildings in Google Earth, automatic shadow removal from aerial images, etc.).
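A minimal sketch of the matching task as nearest-neighbor search over feature vectors; the random descriptors and plain cosine similarity are illustrative assumptions, whereas production systems use learned descriptors and approximate indexing at far larger scale:

```python
# Toy "find similar" over image descriptors via cosine similarity.
# Descriptors are random stand-ins for real image/video features.
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.normal(size=(10_000, 128))           # 10k items, 128-d descriptors
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

def find_similar(query, k=5):
    """Return indices of the k corpus items most similar to a 128-d query."""
    q = query / np.linalg.norm(query)
    scores = corpus @ q                           # cosine similarity of unit vectors
    return np.argsort(-scores)[:k]

print(find_similar(rng.normal(size=128)))
```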
Semantic Interpretation sample problem - Video Annotation Video metadata imposes a cognitive cost on the user: they have to type it in, be careful about what keywords they use, and in general try to make their video searchable. Many uploaders don't have the motivation or energy to provide proper metadata. Noisy metadata hurts everyone: spam, misspellings, 1337, acronyms, etc.