
Scalable Learning Technologies for Big Data Mining - PowerPoint PPT Presentation

DASFAA 2015 Hanoi Tutorial: Scalable Learning Technologies for Big Data Mining. Gerard de Melo, Tsinghua University


  1. Distant Supervision ● Sentiment Analysis: look for Twitter tweets with emoticons like ":)" or ":(" ● Remove the emoticons, then use the tweets as training data! Crimson Hexagon
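A minimal Python sketch of this distant-supervision step (the emoticon sets and example tweets are illustrative assumptions; the slide only mentions ":)" and ":("):

```python
# Distant supervision for sentiment: emoticons act as noisy labels.
POSITIVE = {":)", ":-)", ":D"}   # assumption: any set of positive emoticons
NEGATIVE = {":(", ":-("}         # assumption: any set of negative emoticons

def distant_label(tweet):
    """Label a tweet by its emoticons, then strip them so a classifier
    cannot simply memorize the label source."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos == has_neg:                     # no emoticon, or contradictory ones
        return None
    for e in sorted(POSITIVE | NEGATIVE, key=len, reverse=True):
        tweet = tweet.replace(e, "")           # remove the emoticons
    return (tweet.strip(), "positive" if has_pos else "negative")

tweets = ["great day :)", "ugh, delayed again :("]       # illustrative tweets
training_data = [ex for ex in map(distant_label, tweets) if ex]
# -> [('great day', 'positive'), ('ugh, delayed again', 'negative')]
```

The resulting (text, label) pairs can then be fed to any standard classifier as if they were hand-labeled training data.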

  2. Representation Learning to Better Exploit Big Data

  3. Representations. Image: David Warde-Farley via Bengio et al., Deep Learning book

  4. Representations ● Note the sharing between classes ● Inputs as bits: 0011001… Images: Marc'Aurelio Ranzato

  5. Representations ● Massive improvements in image object recognition (human-level?) and in speech recognition ● Good improvements in NLP- and IR-related tasks ● Inputs as bits: 0011001… Images: Marc'Aurelio Ranzato

  6. Example: Google's image. Source: Jeff Dean, Google

  7. Inspiration: The Brain ● Input: delivered via dendrites from other neurons ● Processing: synapses may alter the input signals; the cell then combines all input signals ● Output: if there is enough activation from the inputs, an output signal is sent through a long cable ("axon"). Source: Alex Smola

  8. Perceptron ● Input: features. Every feature f_i gets a weight w_i ● Example feature weights: dog 7.2, food 3.4, bank -7.3, delicious 1.5, train -4.2 ● The weighted features f_1 … f_4 feed into the neuron via the weights w_1 … w_4

  9. Perceptron ● Activation of the neuron: multiply the feature values of an object x with the feature weights: a(x) = Σ_i w_i f_i(x) = w^T f(x)

  10. Perceptron ● Output of the neuron: check whether the activation exceeds a threshold t = -b: output(x) = g(w^T f(x) + b) ● e.g. g could return 1 if its argument is positive, -1 otherwise ● e.g. 1 for "spam", -1 for "not spam"
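A minimal NumPy sketch of this scoring rule (the feature values, bias, and function names are illustrative assumptions; the weights are the example numbers from slide 8):

```python
import numpy as np

f_x = np.array([1.0, 1.0, 0.0, 1.0])   # f(x): dog, food, bank, delicious present/absent
w   = np.array([7.2, 3.4, -7.3, 1.5])  # feature weights from the slide
b   = -2.0                             # bias b = -t (threshold value is an assumption)

def perceptron_output(f_x, w, b):
    """a(x) = w^T f(x); output g(a(x) + b), where g is a sign threshold."""
    activation = w @ f_x
    return 1 if activation + b > 0 else -1   # e.g. 1 = "spam", -1 = "not spam"

print(perceptron_output(f_x, w, b))          # activation 12.1, so output 1
```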

  11. Decision Surfaces ● Linear classifiers (Perceptron, SVM): only straight decision surfaces; the Perceptron is not max-margin ● Kernel-based classifiers (Kernel Perceptron, Kernel SVM), decision trees, and the multi-layer perceptron: any decision surface. Images: Vibhav Gogate

  12. Deep Learning: Multi-Layer Perceptron ● Input layer (features f_1 … f_4) → hidden layer (Neuron 1, Neuron 2) → output layer (output neuron)

  13. Deep Learning: Multi-Layer Perceptron ● Input layer (f_1 … f_4) → hidden layer (Neuron 1, Neuron 2) → output layer

  14. Deep Learning: Multi-Layer Perceptron ● Input layer (f_1 … f_4) → hidden layer (Neuron 1, Neuron 2) → output layer (Output 1, Output 2)

  15. Deep Learning: Multi-Layer Perceptron ● Input layer (feature extraction): f(x) ● Single layer: output(x) = g(W f(x) + b) ● Three-layer network: output(x) = g_2(W_2 g_1(W_1 f(x) + b_1) + b_2) ● Four-layer network: output(x) = g_3(W_3 g_2(W_2 g_1(W_1 f(x) + b_1) + b_2) + b_3)
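A minimal NumPy sketch of the three-layer formula above (the layer sizes, random weights, and the choice of tanh and sign for g_1 and g_2 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
f_x = rng.normal(size=4)                        # f(x): extracted input features

# Three-layer network: output(x) = g2(W2 g1(W1 f(x) + b1) + b2)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)   # input layer -> hidden layer
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # hidden layer -> output layer
g1, g2 = np.tanh, np.sign                       # non-linearities (assumptions)

hidden = g1(W1 @ f_x + b1)
output = g2(W2 @ hidden + b2)
print(output)                                   # e.g. [1.] or [-1.]
```

A four-layer network simply wraps one more g(W · + b) around the same expression.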

  16. Deep Learning: Multi-Layer Perceptron

  17. Deep Learning: Computing the Output ● Simply evaluate the output function: for each node, compute an output based on the node's inputs. (Diagram: inputs x_1, x_2 → hidden nodes z_1, z_2, z_3 → outputs y_1, y_2)

  18. Deep Learning: Training ● Compute the error on the output; if it is non-zero, do a stochastic gradient step on the error function to fix it ● Backpropagation: the error is propagated back from the output nodes towards the input layer

  19. Deep Learning: Training ● We are interested in the gradient, i.e. the partial derivatives of the output function with respect to all [inputs and] weights, including those in deeper parts of the network ● For z = g(y) with y = f(x), exploit the chain rule to compute the gradient: ∂z/∂x = (∂z/∂y) · (∂y/∂x) ● Backpropagation: the error is propagated back from the output nodes towards the input layer
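A minimal sketch of that chain rule for a single neuron (the sigmoid activation and squared-error loss are assumptions; the slides do not fix a particular g or error function):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

x = np.array([1.0, 0.5])      # inputs
w = np.array([0.3, -0.2])     # weights to be learned
t = 1.0                       # target output

# Forward pass: y = f(x) = w.x, z = g(y) = sigmoid(y), error E = (z - t)^2 / 2
y = w @ x
z = sigmoid(y)

# Backward pass (chain rule): dE/dw = dE/dz * dz/dy * dy/dw
dE_dz = z - t                 # derivative of the squared error
dz_dy = z * (1.0 - z)         # derivative of the sigmoid
dy_dw = x                     # derivative of the dot product w.r.t. w
grad_w = dE_dz * dz_dy * dy_dw

w = w - 0.1 * grad_w          # one stochastic gradient step (learning rate 0.1)
```

In a deeper network the same pattern repeats layer by layer, which is exactly the backward propagation of the error described on the slide.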

  20. DropOut Technique ● Basic idea: while training, randomly drop inputs (set the feature to zero) ● Effect: training on variations of the original training data (an artificial increase of the training data size); the trained network relies less on the existence of specific features. Reference: Hinton et al. (2012). Also: Maxout Networks, Goodfellow et al. (2013)
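A minimal sketch of that idea (assuming a drop rate of 0.5 and the common inverted-dropout rescaling, which is an assumption beyond the slide):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(features, rate=0.5, training=True):
    """While training, randomly zero a fraction `rate` of the features and
    rescale the survivors so the expected activation stays unchanged."""
    if not training:
        return features
    mask = rng.random(features.shape) >= rate
    return features * mask / (1.0 - rate)

x = np.array([7.2, 3.4, -7.3, 1.5])
print(dropout(x))    # a randomly thinned, rescaled variation of x
```

Each training step sees a different random variation of the input, which is the artificial data-size increase mentioned above.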

  21. Deep Learning: Convolutional Neural Networks. Reference: Yann LeCun's work. Image: http://torch.cogbits.com/doc/tutorials_supervised/

  22. Deep Learning: Recurrent Neural Networks. Source: Bayesian Behavior Lab, Northwestern University

  23. Deep Learning: Recurrent Neural Networks ● One can then do backpropagation (through time) ● Challenge: vanishing/exploding gradients. Source: Bayesian Behavior Lab, Northwestern University
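A minimal sketch of an unrolled recurrence, illustrating where the vanishing/exploding-gradient problem comes from (all sizes and values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
W_h = rng.normal(scale=1.5, size=(4, 4))   # recurrent weights
W_x = rng.normal(size=(4, 3))              # input weights
h = np.zeros(4)

xs = rng.normal(size=(10, 3))              # a sequence of 10 input vectors
for x in xs:                               # unroll the recurrence over time
    h = np.tanh(W_h @ h + W_x @ x)

# Backpropagation through the unrolled network multiplies by W_h (times the
# local tanh derivatives) once per time step, so the gradient's magnitude
# tends to shrink or blow up with sequence length:
print([round(np.linalg.norm(np.linalg.matrix_power(W_h, t)), 2) for t in (1, 5, 10)])
```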

  24. Deep Learning: Long Short-Term Memory Networks. Source: Bayesian Behavior Lab, Northwestern University

  25. Deep Learning: Long Short-Term Memory Networks ● Deep LSTMs for sequence-to-sequence learning. Sutskever et al. 2014 (Google)

  26. Deep Learning: Long Short-Term Memory Networks ● French original: La dispute fait rage entre les grands constructeurs aéronautiques à propos de la largeur des sièges de la classe touriste sur les vols long-courriers, ouvrant la voie à une confrontation amère lors du salon aéronautique de Dubaï qui a lieu ce mois-ci. ● LSTM's English translation: The dispute is raging between large aircraft manufacturers on the size of the tourist seats on the long-haul flights, leading to a bitter confrontation at the Dubai Airshow in the month of October. ● Ground-truth English translation: A row has flared up between leading plane makers over the width of tourist-class seats on long-distance flights, setting the tone for a bitter confrontation at this month's Dubai Airshow. Sutskever et al. 2014 (Google)

  27. Deep Learning: Neural Turing Machines. Source: Bayesian Behavior Lab, Northwestern University

  28. Deep Learning: Neural Turing Machines. Source: Bayesian Behavior Lab, Northwestern University

  29. Deep Learning: Neural Turing Machines. Source: Bayesian Behavior Lab, Northwestern University

  30. Deep Learning: Neural Turing Machines. Source: Bayesian Behavior Lab, Northwestern University

  31. Deep Learning: Neural Turing Machines. Source: Bayesian Behavior Lab, Northwestern University

  32. Deep Learning: Neural Turing Machines ● Learning to sort! ● The vectors for the numbers are random

  33. Big Data in Feature Engineering and Representation Learning

  34. Web Semantics: Statistics from Big Data as Features ● Language models for autocompletion

  35. Word Segmentation. Source: Wang et al., An Overview of Microsoft Web N-gram Corpus and Applications

  36. Parsing: Ambiguity ● NP coordination. Source: Bansal & Klein (2011)

  37. Parsing: Web Semantics. Source: Bansal & Klein (2011)

  38. Adjective Ordering ● Lapata & Keller (2004): The Web as a Baseline (also: Bergsma et al. 2010) ● "big fat Greek wedding" but not "fat Greek big wedding". Source: Shane Bergsma

  39. Coreference Resolution. Source: Bansal & Klein (2012)

  40. Coreference Resolution. Source: Bansal & Klein (2012)

  41. Distributional Semantics ● Data sparsity: most words are rare (in the "long tail"), e.g. in the Brown Corpus, and are thus missing in training data (Source: Baroni & Evert) ● Solution (Blitzer et al. 2006, Koo & Collins 2008, Huang & Yates 2009, etc.): cluster together similar features, and use the clustered features instead of / in addition to the original features

  42. Spelling Correction ● Even worse: Arnold Schwarzenegger

  43. Vector Representations ● Put words into a vector space (e.g. with d = 300 dimensions) ● Nearby points in the space: bird, petronia, sparrow; parched, arid, dry

  44. Word Vector Representations ● Tomas Mikolov et al., Proc. ICLR 2013. Available from https://code.google.com/p/word2vec/
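A minimal sketch of how such vectors are used once trained, with cosine similarity over made-up 4-dimensional vectors (real word2vec vectors would have e.g. d = 300; all numbers below are illustrative assumptions):

```python
import numpy as np

# Toy word vectors (assumption; in practice these come out of word2vec training)
vectors = {
    "sparrow": np.array([0.9, 0.8, 0.1, 0.0]),
    "bird":    np.array([0.8, 0.9, 0.2, 0.1]),
    "arid":    np.array([0.0, 0.1, 0.9, 0.8]),
    "dry":     np.array([0.1, 0.0, 0.8, 0.9]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(vectors["sparrow"], vectors["bird"]))   # high: related words
print(cosine(vectors["sparrow"], vectors["arid"]))   # low: unrelated words
```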

  45. Wikipedia

  46. Text Simplification ● Exploit the edit history, especially on the Simple English Wikipedia ● "collaborate" → "work together", "stands for" → "is the same as"

  47. Answering Questions ● IBM's Jeopardy!-winning Watson system. Gerard de Melo

  48. Knowledge Integration

  49. UWN/MENTA: a multilingual extension of WordNet, with word senses and taxonomical information for over 200 languages. www.lexvo.org/uwn/

  50. WebChild: Common-Sense Knowledge ● WebChild: AAAI 2014, WSDM 2014, AAAI 2011

  51. Challenge: From Really Big Data to Real Insights. Image: Brett Ryder

  52. Big Data Mining in Practice

  53. Gerard de Melo (Tsinghua University, Beijing, China), Aparna Varde (Montclair State University, NJ, USA). DASFAA, Hanoi, Vietnam, April 2015

  54. Dr. Aparna Varde

  55. ● Cloud computing: Internet-based computing where shared resources, software & data are provided on demand, like the electricity grid ● Follows a pay-as-you-go model

  56. ● Several technologies, e.g., MapReduce & Hadoop ● MapReduce: a data-parallel programming model for clusters of commodity machines • Pioneered by Google • Processes 20 PB of data per day ● Hadoop: an open-source framework for distributed storage and processing of very large data sets • HDFS (Hadoop Distributed File System) for storage • MapReduce for processing • Developed by Apache

  57. ● Scalability - to large data volumes: scanning 100 TB on 1 node @ 50 MB/s takes 24 days; scanning on a 1000-node cluster takes 35 minutes ● Cost-efficiency: commodity nodes (cheap, but unreliable), commodity network (low bandwidth), automatic fault-tolerance (fewer admins), easy to use (fewer programmers)
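A quick back-of-the-envelope check of those scan times, assuming the data splits perfectly evenly across nodes (a sketch, not a benchmark):

```python
TB, MB = 10**12, 10**6

data = 100 * TB            # bytes to scan
rate = 50 * MB             # bytes per second per node

one_node = data / rate                 # seconds on a single node
print(one_node / 86400)                # ~23.1 days (the slide rounds to 24)
print(one_node / 1000 / 60)            # ~33.3 minutes on 1000 nodes (~35 on the slide)
```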

  58. ● Data type: key-value records ● Map function: (K_in, V_in) → list(K_inter, V_inter) ● Reduce function: (K_inter, list(V_inter)) → list(K_out, V_out)

  59. MapReduce Example ● Word count. Input lines: "the quick brown fox", "the fox ate the mouse", "how now brown cow" ● Map emits (word, 1) pairs; Shuffle & Sort groups them by word; Reduce sums the counts ● Output: ate 1, brown 2, cow 1, fox 2, how 1, mouse 1, now 1, quick 1, the 3
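A minimal pure-Python sketch of that word-count job, simulating the map, shuffle-and-sort, and reduce phases in a single process (function names are assumptions; a real job would run these on a Hadoop cluster):

```python
from collections import defaultdict

def map_fn(_, line):            # (K_in, V_in) -> list(K_inter, V_inter)
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):    # (K_inter, list(V_inter)) -> list(K_out, V_out)
    return [(word, sum(counts))]

lines = ["the quick brown fox", "the fox ate the mouse", "how now brown cow"]

# Shuffle & sort: group the intermediate (word, 1) pairs by word
groups = defaultdict(list)
for i, line in enumerate(lines):
    for k, v in map_fn(i, line):
        groups[k].append(v)

output = [pair for k in sorted(groups) for pair in reduce_fn(k, groups[k])]
print(output)   # [('ate', 1), ('brown', 2), ..., ('the', 3)]
```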

  60. ● 40 nodes/rack, 1000-4000 nodes per cluster ● 1 Gbps bandwidth within a rack, 8 Gbps out of the rack (aggregation switch, rack switch) ● Node specs (Facebook): 8-16 cores, 32 GB RAM, 8 × 1.5 TB disks, no RAID

  61. ● Files are split into 128 MB blocks ● Blocks are replicated across several data nodes (often 3) ● The name node stores metadata (file names, locations, etc.) ● Optimized for large files and sequential reads ● Files are append-only. (Diagram: Namenode tracking File1's blocks 1-4, replicated across Datanodes)

  62. ● Hive: a relational DB on Hadoop, developed at Facebook ● Provides an SQL-like query language

  63. ● Supports table partitioning, complex data types, sampling, and some query optimization ● These help discover knowledge via various tasks, e.g.: • Searching for relevant terms • Operations such as word count • Aggregates like MIN and AVG

  64. /* Find documents of the enron table with word frequencies within the range of 75 to 80 */
  SELECT DISTINCT D.DocID
  FROM docword_enron D
  WHERE D.count > 75 AND D.count < 80
  LIMIT 10;
  OK
  1853… 11578 16653
  Time taken: 64.788 seconds

  65. /* Create a view to find the count for WordID = 90 and DocID = 40 in the nips table */
  CREATE VIEW Word_Freq AS
  SELECT D.DocID, D.WordID, V.word, D.count
  FROM docword_Nips D JOIN vocabNips V
    ON D.WordID = V.WordID AND D.DocId = 40 AND D.WordId = 90;
  OK
  Time taken: 1.244 seconds

  66. /* Find documents which use the word "rational" in the nips table */
  SELECT D.DocID, V.word
  FROM docword_Nips D JOIN vocabnips V
    ON D.wordID = V.wordID AND V.word = "rational"
  LIMIT 10;
  OK
  434 rational
  275 rational
  158 rational
  ….
  290 rational
  422 rational
  Time taken: 98.706 seconds

  67. /* Find the average frequency of all words in the enron table */
  SELECT AVG(count) FROM docWord_enron;
  OK
  1.728152608060543
  Time taken: 68.2 seconds

  68. Query execution time for HQL & MySQL on big data sets (chart). Similar claims hold for other SQL packages.

  69. Server storage capacity: max storage per instance (chart).

