  1. DATA130006 Text Management and Analysis Sequence Labeling 魏忠钰 (Zhongyu Wei) School of Data Science, Fudan University November 15th, 2017

  2. Natural Language Processing Startup ▪ 深度好奇 (DeeplyCurious, an NLP startup)

  3. Joint Distributions ▪ A joint distribution over a set of random variables X_1, ..., X_n specifies a real number for each assignment (or outcome): P(x_1, ..., x_n) ▪ Must obey: P(x_1, ..., x_n) ≥ 0 and Σ_{x_1, ..., x_n} P(x_1, ..., x_n) = 1 ▪ Size of distribution if n variables with domain sizes d? d^n ▪ Impractical to write out! Example joint over temperature T and weather W:
T    W    P
hot  sun  0.4
hot  rain 0.1
cold sun  0.2
cold rain 0.3

  4. Marginal Distributions ▪ Marginal distributions are sub-tables which eliminate variables ▪ Marginalization (summing out): combine collapsed rows by adding, e.g. P(t) = Σ_w P(t, w)
Joint P(T, W):
T    W    P
hot  sun  0.4
hot  rain 0.1
cold sun  0.2
cold rain 0.3
Marginals:
T    P
hot  0.5
cold 0.5
W    P
sun  0.6
rain 0.4
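The summing-out operation above is mechanical enough to sketch in a few lines. A minimal illustration in Python; the dict encoding of the table and the function name are mine, not from the slides:

```python
from collections import defaultdict

# Joint distribution P(T, W) from the table above.
joint = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

def marginal(joint, axis):
    """Sum out every variable except the one at position `axis`."""
    out = defaultdict(float)
    for assignment, p in joint.items():
        out[assignment[axis]] += p
    return dict(out)

p_t = marginal(joint, 0)  # {"hot": 0.5, "cold": 0.5}
p_w = marginal(joint, 1)  # {"sun": 0.6, "rain": 0.4}
```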

  5. Conditional Probabilities ▪ A simple relation between joint and conditional probabilities ▪ In fact, this is taken as the definition of a conditional probability: P(a | b) = P(a, b) / P(b)
T    W    P
hot  sun  0.4
hot  rain 0.1
cold sun  0.2
cold rain 0.3
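Applying the definition to the table above, conditioning on T is a select-and-renormalize operation. A minimal sketch (the dict encoding and helper name are illustrative):

```python
# Joint distribution P(T, W) from the table above.
joint = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

def conditional_w_given_t(joint, t):
    """P(w | t) = P(t, w) / P(t), with P(t) = sum_w P(t, w)."""
    p_t = sum(p for (ti, _), p in joint.items() if ti == t)
    return {w: p / p_t for (ti, w), p in joint.items() if ti == t}

print(conditional_w_given_t(joint, "hot"))  # {'sun': 0.8, 'rain': 0.2}
```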

  6. Conditional Independence ▪ Unconditional (absolute) independence is very rare ▪ Conditional independence is our most basic and robust form of knowledge about uncertain environments. ▪ X is conditionally independent of Y given Z if and only if: P(x, y | z) = P(x | z) P(y | z) for all x, y, z, or, equivalently, if and only if: P(x | y, z) = P(x | z) for all x, y, z

  7. Outline ▪ Markov Model ▪ Hidden Markov Model ▪ Hidden Markov Model for Sequence Labeling ▪ Maximum Entropy Markov Model for Sequence Labeling

  8. Markov Model ▪ Value of X at a given time is called the state: X_1 → X_2 → X_3 → X_4 ▪ Parameters: called transition probabilities or dynamics, P(X_t | X_{t-1}), specify how the state evolves over time (also, initial state probabilities P(X_1)) ▪ Stationarity assumption: transition probabilities are the same at all times

  9. Joint Distribution of a Markov Model X_1 → X_2 → X_3 → X_4 ▪ Joint distribution: P(X_1, X_2, X_3, X_4) = P(X_1) P(X_2 | X_1) P(X_3 | X_2) P(X_4 | X_3) ▪ More generally: P(X_1, ..., X_T) = P(X_1) ∏_{t=2}^{T} P(X_t | X_{t-1})
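The factored joint above can be evaluated directly as a product of transition probabilities. A minimal Python sketch, using the sun/rain weather chain of the next slide as the example model:

```python
# Transition model P(X_t | X_{t-1}) for the sun/rain weather chain,
# with an initial distribution putting all mass on sun.
initial = {"sun": 1.0, "rain": 0.0}
transition = {
    ("sun", "sun"): 0.9, ("sun", "rain"): 0.1,
    ("rain", "sun"): 0.3, ("rain", "rain"): 0.7,
}

def sequence_probability(states):
    """P(x_1, ..., x_T) = P(x_1) * prod_t P(x_t | x_{t-1})."""
    p = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= transition[(prev, cur)]
    return p

print(round(sequence_probability(["sun", "sun", "rain", "rain"]), 6))  # 0.063
```

The sequence sun, sun, rain, rain scores 1.0 * 0.9 * 0.1 * 0.7 = 0.063.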

  10. Example Markov Chain: Weather ▪ States: X = {rain, sun} ▪ Initial distribution: 1.0 sun ▪ CPT P(X_t | X_{t-1}), also representable in two other ways: as a state diagram or as a trellis
X_{t-1} X_t  P(X_t | X_{t-1})
sun     sun  0.9
sun     rain 0.1
rain    sun  0.3
rain    rain 0.7

  11. Mini-Forward Algorithm ▪ Question: What's P(X) on some day t? X_1 → X_2 → X_3 → X_4 ▪ Forward simulation: P(x_t) = Σ_{x_{t-1}} P(x_t | x_{t-1}) P(x_{t-1}), starting from P(x_1)

  12. Example Run of Forward Algorithm
X_{t-1} X_t  P(X_t | X_{t-1})
sun     sun  0.9
sun     rain 0.1
rain    sun  0.3
rain    rain 0.7
▪ From initial observation of sun: P(X_1), P(X_2), P(X_3), P(X_4), ..., P(X_∞) ▪ From initial observation of rain: P(X_1), P(X_2), P(X_3), P(X_4), ..., P(X_∞) ▪ From yet another initial distribution P(X_1): ..., P(X_∞)
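The forward simulation above can be sketched in a few lines. Starting from an initial observation of sun, repeated application of the update drives the distribution toward (0.75, 0.25); the same happens from any other start, which is the point of the next slides:

```python
# Transition model P(X_t | X_{t-1}) for the sun/rain weather chain.
transition = {
    ("sun", "sun"): 0.9, ("sun", "rain"): 0.1,
    ("rain", "sun"): 0.3, ("rain", "rain"): 0.7,
}
states = ["sun", "rain"]

def forward_step(p):
    """One mini-forward update: P(x_t) = sum_{x_{t-1}} P(x_t | x_{t-1}) P(x_{t-1})."""
    return {x: sum(transition[(prev, x)] * p[prev] for prev in states) for x in states}

p = {"sun": 1.0, "rain": 0.0}  # initial observation of sun
for _ in range(50):
    p = forward_step(p)
print(round(p["sun"], 4), round(p["rain"], 4))  # 0.75 0.25
```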

  13. Stationary Distributions ▪ For most chains: the influence of the initial distribution gets less and less over time, and the distribution we end up in is independent of the initial distribution ▪ Stationary distribution: the distribution we end up with is called the stationary distribution P_∞ of the chain ▪ It satisfies P_∞(X) = P_{∞+1}(X) = Σ_x P(X | x) P_∞(x)

  14. Example: Stationary Distributions ▪ Question: What's P(X) at time t = infinity? X_1 → X_2 → X_3 → X_4
X_{t-1} X_t  P(X_t | X_{t-1})
sun     sun  0.9
sun     rain 0.1
rain    sun  0.3
rain    rain 0.7
▪ Solve the stationary equations: P_∞(sun) = 0.9 P_∞(sun) + 0.3 P_∞(rain) and P_∞(rain) = 0.1 P_∞(sun) + 0.7 P_∞(rain) ▪ Also: P_∞(sun) + P_∞(rain) = 1, which gives P_∞(sun) = 3/4 and P_∞(rain) = 1/4
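For a two-state chain the stationary equations collapse to a closed form: only the two "switching" probabilities matter. A small sketch of that derivation (the variable names are mine):

```python
# Two-state chain: p_sr = P(rain | sun), p_rs = P(sun | rain).
p_sr, p_rs = 0.1, 0.3

# Stationary condition pi(sun) = (1 - p_sr) pi(sun) + p_rs pi(rain),
# i.e. p_sr * pi(sun) = p_rs * pi(rain); together with
# pi(sun) + pi(rain) = 1 this gives:
pi_sun = p_rs / (p_sr + p_rs)
pi_rain = p_sr / (p_sr + p_rs)
print(round(pi_sun, 4), round(pi_rain, 4))  # 0.75 0.25
```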

  15. Stationary Distribution for Web-link Analysis ▪ PageRank over a web graph ▪ Each web page is a state ▪ Initial distribution: uniform over pages ▪ Transitions: ▪ With prob. c, uniform jump to a random page (dotted lines, not all shown) ▪ With prob. 1-c, follow a random outlink (solid lines) ▪ Stationary distribution ▪ Will spend more time on highly reachable pages ▪ Somewhat robust to link spam ▪ Google 1.0 returned the set of pages containing all your keywords, ordered by decreasing rank; now all search engines use link analysis along with many other factors (rank is actually getting less important over time)
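The random-surfer stationary distribution described above can be approximated by iterating the transition model, just as in the weather example. A minimal sketch; the 4-page graph is made up for illustration:

```python
# PageRank as a stationary distribution: with prob. c jump to a uniformly
# random page, with prob. 1-c follow a random outlink of the current page.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
pages = list(links)
c = 0.15  # teleport probability

rank = {p: 1.0 / len(pages) for p in pages}  # start uniform
for _ in range(100):
    new = {p: c / len(pages) for p in pages}
    for p, outs in links.items():
        for q in outs:
            new[q] += (1 - c) * rank[p] / len(outs)
    rank = new

# The highly reachable page (here C, linked by everyone) gets the most mass.
best = max(rank, key=rank.get)
```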

  16. Text as a Graph ▪ Nodes stand for sentences ▪ Edges stand for similarity

  17. Centrality-based Summarization ▪ Assumption: the centrality of a node is an indication of its importance ▪ Representation: connectivity matrix based on inter-sentence cosine similarity ▪ Extraction mechanism: ▪ Compute the PageRank score for every sentence u ▪ Extract the k sentences with the highest PageRank scores
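A LexRank-style sketch of this pipeline: build a cosine-similarity matrix over bag-of-words sentence vectors, run a PageRank-style iteration on it, and keep the top-k sentences. The toy sentences, the damping value, and all names are illustrative, not from the slides:

```python
import math
from collections import Counter

sentences = [
    "cats like milk",
    "cats like mice",
    "mice like cheese",
]

def cosine(a, b):
    """Cosine similarity between bag-of-words vectors of two sentences."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb)

n = len(sentences)
sim = [[cosine(sentences[i], sentences[j]) for j in range(n)] for i in range(n)]
row_sum = [sum(row) for row in sim]

# Power iteration on the row-normalized similarity matrix, plus a uniform jump.
c = 0.15
score = [1.0 / n] * n
for _ in range(100):
    score = [
        c / n + (1 - c) * sum(sim[i][j] * score[i] / row_sum[i] for i in range(n))
        for j in range(n)
    ]

k = 2
top_k = sorted(range(n), key=lambda j: -score[j])[:k]
```

Sentence 1 bridges the other two, so it ends up most central.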

  18. Outline ▪ Markov Model ▪ Hidden Markov Model ▪ Hidden Markov Model for Sequence Labeling ▪ Maximum Entropy Markov Model for Sequence Labeling

  19. Hidden Markov Model ▪ Hidden Markov models (HMMs) ▪ Underlying Markov chain over states X ▪ You observe outputs (effects) at each time step: X_1 → X_2 → X_3 → X_4 → X_5, with an emission E_t from each state X_t

  20. Example: Weather HMM ▪ Hidden states Rain_t, observations Umbrella_t ▪ An HMM is defined by: ▪ Initial distribution: P(X_1) ▪ Transitions: P(X_t | X_{t-1}) ▪ Emissions: P(E_t | X_t)
Transition model:
R_t R_{t+1} P(R_{t+1} | R_t)
+r  +r      0.7
+r  -r      0.3
-r  +r      0.3
-r  -r      0.7
Emission model:
R_t U_t P(U_t | R_t)
+r  +u  0.9
+r  -u  0.1
-r  +u  0.2
-r  -u  0.8

  21. Conditional Independence ▪ HMMs have two important independence properties: ▪ Markov hidden process: future depends on past via the present ▪ Current observation independent of all else given current state ▪ Does this mean that evidence variables are guaranteed to be independent? ▪ [No, they tend to be correlated by the hidden state]

  22. Chain Rule and HMMs ▪ From the chain rule, every joint distribution over X_1, E_1, ..., X_T, E_T can be written as: P(X_1, E_1, ..., X_T, E_T) = P(X_1) P(E_1 | X_1) ∏_{t=2}^{T} P(X_t | X_1, E_1, ..., X_{t-1}, E_{t-1}) P(E_t | X_1, E_1, ..., X_t, E_{t-1}) ▪ Assuming that for all t: ▪ State is independent of all past states and all past evidence given the previous state, i.e.: P(X_t | X_1, E_1, ..., X_{t-1}, E_{t-1}) = P(X_t | X_{t-1}) ▪ Evidence is independent of all past states and all past evidence given the current state, i.e.: P(E_t | X_1, E_1, ..., X_t, E_{t-1}) = P(E_t | X_t) ▪ So, we have: P(X_1, E_1, ..., X_T, E_T) = P(X_1) P(E_1 | X_1) ∏_{t=2}^{T} P(X_t | X_{t-1}) P(E_t | X_t)

  23. Tasks for HMM ▪ Filtering ▪ Computing the belief state (the posterior distribution over the most recent state) given all evidence to date: P(X_t | e_{1:t}) ▪ Prediction ▪ Computing the posterior distribution over a future state, given all evidence to date: P(X_{t+k} | e_{1:t}) ▪ Smoothing ▪ Computing the posterior distribution over a past state, given all evidence up to the present: P(X_k | e_{1:t}) for 1 ≤ k < t ▪ Most Likely Explanation ▪ Given a sequence of observations, find the sequence of states that is most likely to have generated those observations.

  24. Real HMM Examples ▪ Speech recognition HMMs: ▪ Observations are acoustic signals (continuous valued) ▪ States are specific positions in specific words (so, tens of thousands) ▪ Machine translation HMMs: ▪ Observations are words (tens of thousands) ▪ States are translation options ▪ Robot tracking: ▪ Observations are range readings (continuous) ▪ States are positions on a map (continuous)

  25. Filtering / Monitoring ▪ Filtering, or monitoring, is the task of tracking the distribution B_t(X) = P(X_t | e_1, ..., e_t) (the belief state) over time ▪ We start with B_1(X) in an initial setting, usually uniform ▪ As time passes, or we get observations, we update B(X) ▪ The Kalman filter was invented in the 1960s and first implemented as a method of trajectory estimation for the Apollo program

  26. Inference: Base Cases ▪ Two base cases: incorporating the first observation, P(X_1 | e_1) ∝ P(e_1 | X_1) P(X_1), and one step of time passing, P(X_2) = Σ_{x_1} P(x_1) P(X_2 | x_1)

  27. Passage of Time ▪ Assume we have current belief P(X | evidence to date): B(X_t) = P(X_t | e_{1:t}) ▪ Then, after one time step passes: P(X_{t+1} | e_{1:t}) = Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t}) ▪ Or compactly: B'(X_{t+1}) = Σ_{x_t} P(X_{t+1} | x_t) B(x_t)

  28. Observation ▪ Assume we have current belief P(X | previous evidence): B'(X_{t+1}) = P(X_{t+1} | e_{1:t}) ▪ Then, after evidence comes in: P(X_{t+1} | e_{1:t+1}) ∝ P(e_{t+1} | X_{t+1}) P(X_{t+1} | e_{1:t}) ▪ Or, compactly: B(X_{t+1}) ∝ P(e_{t+1} | X_{t+1}) B'(X_{t+1}), followed by renormalization

  29. The Forward Algorithm ▪ We are given evidence at each time and want to know P(X_t | e_{1:t}) ▪ We can derive the following update: P(x_t, e_{1:t}) = P(e_t | x_t) Σ_{x_{t-1}} P(x_t | x_{t-1}) P(x_{t-1}, e_{1:t-1}) ▪ We can normalize as we go if we want to have P(x | e) at each time step, or just once at the end

  30. Online Belief Updates ▪ Every time step, we start with current P(X | evidence) ▪ We update for time: P(x_t | e_{1:t-1}) = Σ_{x_{t-1}} P(x_{t-1} | e_{1:t-1}) P(x_t | x_{t-1}) ▪ We update for evidence: P(x_t | e_{1:t}) ∝ P(x_t | e_{1:t-1}) P(e_t | x_t) ▪ The forward algorithm does both at once

  31. In-class Quiz ▪ Starting from B(+r) = 0.5, B(-r) = 0.5 at Rain_0, compute the beliefs B(Rain_1) and B(Rain_2) after observing Umbrella_1 = +u and Umbrella_2 = +u
Transition model:
R_t R_{t+1} P(R_{t+1} | R_t)
+r  +r      0.7
+r  -r      0.3
-r  +r      0.3
-r  -r      0.7
Emission model:
R_t U_t P(U_t | R_t)
+r  +u  0.9
+r  -u  0.1
-r  +u  0.2
-r  -u  0.8

  32. Quiz: Weather HMM ▪ Answer (observing +u at each step):
B(+r) = 0.5, B(-r) = 0.5 at Rain_0
Time update: B'(+r) = 0.5, B'(-r) = 0.5; observe +u: B(+r) = 0.818, B(-r) = 0.182 at Rain_1
Time update: B'(+r) = 0.627, B'(-r) = 0.373; observe +u: B(+r) = 0.883, B(-r) = 0.117 at Rain_2
Transition model:
R_t R_{t+1} P(R_{t+1} | R_t)
+r  +r      0.7
+r  -r      0.3
-r  +r      0.3
-r  -r      0.7
Emission model:
R_t U_t P(U_t | R_t)
+r  +u  0.9
+r  -u  0.1
-r  +u  0.2
-r  -u  0.8
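The two updates of the online belief computation can be sketched directly from the umbrella HMM's tables; running them twice with a +u observation each time reproduces the quiz numbers (dict encodings and function names are illustrative):

```python
states = ["+r", "-r"]
# Transition P(R_{t+1} | R_t) and emission P(U_t | R_t) for the umbrella HMM.
T = {("+r", "+r"): 0.7, ("+r", "-r"): 0.3, ("-r", "+r"): 0.3, ("-r", "-r"): 0.7}
E = {("+r", "+u"): 0.9, ("+r", "-u"): 0.1, ("-r", "+u"): 0.2, ("-r", "-u"): 0.8}

def time_update(b):
    """B'(x_{t+1}) = sum_{x_t} P(x_{t+1} | x_t) B(x_t)"""
    return {x: sum(T[(prev, x)] * b[prev] for prev in states) for x in states}

def evidence_update(b, e):
    """B(x) proportional to P(e | x) B'(x), then renormalize."""
    unnorm = {x: E[(x, e)] * b[x] for x in states}
    z = sum(unnorm.values())
    return {x: p / z for x, p in unnorm.items()}

b = {"+r": 0.5, "-r": 0.5}                 # B(Rain_0)
b = evidence_update(time_update(b), "+u")  # B(Rain_1): +r -> 0.818
b = evidence_update(time_update(b), "+u")  # B(Rain_2): +r -> 0.883
```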

  33. Most Likely Explanation

  34. HMMs: MLE Queries ▪ HMMs defined by ▪ States X ▪ Observations E ▪ Initial distribution: P(X_1) ▪ Transitions: P(X_t | X_{t-1}) ▪ Emissions: P(E_t | X_t) ▪ New query: most likely explanation, argmax_{x_{1:t}} P(x_{1:t} | e_{1:t})

  35. HMMs: MLE Queries ▪ Graph of states and transitions over time: a trellis with the states sun and rain in each column ▪ Each arc represents some transition x_{t-1} → x_t ▪ Each arc has weight P(x_t | x_{t-1}) P(e_t | x_t) ▪ Each path is a sequence of states ▪ The product of weights on a path is that sequence's probability along with the evidence ▪ The forward algorithm computes sums of paths, Viterbi computes best paths

  36. HMMs: MLE Queries ▪ Same trellis of sun/rain states ▪ Viterbi Algorithm (max): m_t[x_t] = P(e_t | x_t) max_{x_{t-1}} P(x_t | x_{t-1}) m_{t-1}[x_{t-1}] ▪ Forward Algorithm (sum): f_t[x_t] = P(e_t | x_t) Σ_{x_{t-1}} P(x_t | x_{t-1}) f_{t-1}[x_{t-1}]
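Replacing the forward algorithm's sum with a max and keeping backpointers gives Viterbi decoding. A minimal sketch on the umbrella HMM with a hypothetical observation sequence (encodings and names are mine):

```python
states = ["+r", "-r"]
init = {"+r": 0.5, "-r": 0.5}
T = {("+r", "+r"): 0.7, ("+r", "-r"): 0.3, ("-r", "+r"): 0.3, ("-r", "-r"): 0.7}
E = {("+r", "+u"): 0.9, ("+r", "-u"): 0.1, ("-r", "+u"): 0.2, ("-r", "-u"): 0.8}

def viterbi(observations):
    """Most likely state sequence: forward recursion with max instead of
    sum, plus backpointers to recover the best path."""
    m = {x: init[x] * E[(x, observations[0])] for x in states}
    back = []
    for e in observations[1:]:
        prev_best = {x: max(states, key=lambda p: m[p] * T[(p, x)]) for x in states}
        m = {x: m[prev_best[x]] * T[(prev_best[x], x)] * E[(x, e)] for x in states}
        back.append(prev_best)
    # Follow backpointers from the best final state.
    path = [max(states, key=m.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["+u", "+u", "-u"]))  # ['+r', '+r', '-r']
```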

  37. Outline ▪ Markov Model ▪ Hidden Markov Model ▪ Hidden Markov Model for Sequence Labeling ▪ Maximum Entropy Markov Model for Sequence Labeling
