CS 4501 Machine Learning for NLP: Introduction
Yangfeng Ji
Department of Computer Science, University of Virginia
Overview
1. Course Information
2. Basic Linear Algebra
3. Basic Probability Theory
4. Statistical Estimation
About Online Lectures
◮ All lectures will be recorded and uploaded to Collab
◮ By default, participants are muted upon entry. If you have a question:
  ◮ Chime in
  ◮ Use the “Raise Hand” feature
  ◮ Send a message via Chat
◮ By default, video is off upon entry
◮ Create a Slack workspace for this course (?)
Course Information
Course Webpage
http://yangfengji.net/uva-nlp-course/
Instructors
◮ Instructor
  ◮ Yangfeng Ji
  ◮ Office hour: TBD
◮ TA
  ◮ Stephanie Schoch
  ◮ Office hour: TBD
Clarification
This is not the class for you if you want to
◮ learn programming
◮ learn basic machine learning
◮ learn how to use PyTorch
Goals of This Course
1. Explain the fundamental NLP techniques
   ◮ Text classification
   ◮ Language modeling
   ◮ Word embeddings
   ◮ Sequence labeling
   ◮ Machine translation
2. Advanced topics
   ◮ Discourse processing, text generation, interpretability in NLP
3. Opportunities to work on some NLP problems
   ◮ Final project
Assignments
◮ No exam
◮ Six homeworks
  ◮ 14% × 6 = 84%
◮ One final project
  ◮ 2–3 students per group
  ◮ Proposal: 4%
  ◮ Final presentation: 6%
  ◮ Final project report: 6%
Policy: late penalty
Homework submissions will be accepted up to 72 hours late, with a 20% deduction of the points per 24 hours as a penalty. For example,
◮ Deadline: August 30th, 11:59 PM
◮ Submission timestamp: September 1st, 9:00 AM (≤ 48 hours late)
◮ Original points of the homework: 7
◮ Actual points: 7 × (1 − 40%) = 4.2    (1)
It is usually better to just turn in what you have on time.
Policy: collaboration
◮ Homeworks
  ◮ Collaboration is not encouraged
  ◮ Students are allowed to discuss with their classmates
◮ Final project
  ◮ It should be a team effort
Policy: grades
Textbooks
◮ Textbook
  ◮ Eisenstein, Natural Language Processing, 2018
◮ Additional textbooks
  ◮ Jurafsky and Martin, Speech and Language Processing, 3rd Edition, 2019
  ◮ Smith, Linguistic Structure Prediction, 2011
  ◮ Shalev-Shwartz and Ben-David, Understanding Machine Learning: From Theory to Algorithms, 2014
  ◮ Goodfellow, Bengio and Courville, Deep Learning, 2016
All of them are free online.
Piazza
https://piazza.com/virginia/fall2020/cs4501003
◮ Course announcements
◮ Online Q&A
Questions?
Basic Linear Algebra
Linear Equations
Consider the following system of equations
\[
x_1 - x_2 = 1, \qquad x_1 + 2x_2 = 2
\qquad (2)
\]
Each equation represents a line in the following 2-D space.
[Figure: the two lines plotted in the (x_1, x_2) plane]
Linear Equations
Consider the following system of equations
\[
x_1 - x_2 = 1, \qquad x_1 + 2x_2 = 2
\qquad (3)
\]
In matrix notation, it can be written in a more compact form
\[
\mathbf{A}\boldsymbol{x} = \boldsymbol{b}
\qquad (4)
\]
with
\[
\mathbf{A} = \begin{bmatrix} 1 & -1 \\ 1 & 2 \end{bmatrix}, \quad
\boldsymbol{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad
\boldsymbol{b} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}
\qquad (5)
\]
Basic Notations
\[
\mathbf{A} = \begin{bmatrix} 1 & -1 \\ 1 & 2 \end{bmatrix}, \quad
\boldsymbol{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad
\boldsymbol{b} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}
\]
◮ $\mathbf{A} \in \mathbb{R}^{m \times n}$: a matrix with $m$ rows and $n$ columns
  ◮ The element on the $i$-th row and the $j$-th column is denoted as $a_{i,j}$
◮ $\boldsymbol{x} \in \mathbb{R}^{n}$: a vector with $n$ entries. By convention, an $n$-dimensional vector is often thought of as a matrix with $n$ rows and 1 column, known as a column vector.
  ◮ The $i$-th element is denoted as $x_i$
Problem: Solve a matrix-vector multiplication by hand and with PyTorch.
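A minimal PyTorch sketch for the second half of the problem, using A and b from Eq. (5) (variable names are my own):

```python
import torch

# Matrix A and vector b from Eq. (5)
A = torch.tensor([[1., -1.],
                  [1., 2.]])
b = torch.tensor([1., 2.])

# Matrix-vector multiplication: entry i of the result is the
# dot product of the i-th row of A with b
result = torch.mv(A, b)  # equivalently: A @ b
print(result)            # tensor([-1., 5.])
```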
ℓ2 Norm
The ℓ2 norm of a vector $\boldsymbol{x} \in \mathbb{R}^{n}$ is defined as
\[
\|\boldsymbol{x}\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}
\qquad (6)
\]
[Figure: a vector x in the (x_1, x_2) plane; its length is ‖x‖_2]
ℓ1 Norm
The ℓ1 norm of a vector $\boldsymbol{x} \in \mathbb{R}^{n}$ is defined as
\[
\|\boldsymbol{x}\|_1 = \sum_{i=1}^{n} |x_i|
\qquad (7)
\]
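Both norms are a single call in PyTorch. A quick sanity check (a minimal sketch; torch.linalg.norm requires a reasonably recent PyTorch release):

```python
import torch

x = torch.tensor([3., -4.])

# l2 norm: sqrt(3^2 + (-4)^2) = 5
print(torch.linalg.norm(x, ord=2).item())  # 5.0

# l1 norm: |3| + |-4| = 7
print(torch.linalg.norm(x, ord=1).item())  # 7.0
```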
Dot Product
The dot product of $\boldsymbol{x}, \boldsymbol{y} \in \mathbb{R}^{n}$ is defined as
\[
\langle \boldsymbol{x}, \boldsymbol{y} \rangle = \boldsymbol{x}^{\top}\boldsymbol{y} = \sum_{i=1}^{n} x_i y_i
\qquad (8)
\]
where $\boldsymbol{x}^{\top}$ is the transpose of $\boldsymbol{x}$.
◮ $\|\boldsymbol{x}\|_2^2 = \langle \boldsymbol{x}, \boldsymbol{x} \rangle$
◮ If $\boldsymbol{x} = (0, 0, \ldots, \underbrace{1}_{x_i}, \ldots, 0)$, then $\langle \boldsymbol{x}, \boldsymbol{y} \rangle = y_i$
◮ If $\boldsymbol{x}$ is a unit vector ($\|\boldsymbol{x}\|_2 = 1$), then $\langle \boldsymbol{x}, \boldsymbol{y} \rangle$ is the projection of $\boldsymbol{y}$ onto the direction of $\boldsymbol{x}$
[Figure: projection of y onto the direction of x]
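A short sketch of these properties in PyTorch (the vectors are my own examples):

```python
import torch

x = torch.tensor([1., 0.])  # a unit vector along the first axis
y = torch.tensor([2., 3.])

# <x, y> picks out the first coordinate of y, i.e., the projection
# of y onto the direction of x
print(torch.dot(x, y).item())       # 2.0

# <y, y> equals the squared l2 norm of y: 2^2 + 3^2 = 13
print(torch.dot(y, y).item())       # 13.0
print((torch.norm(y) ** 2).item())  # 13.0 (up to floating-point error)
```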
Frobenius Norm
The Frobenius norm of a matrix $\mathbf{A} = [a_{i,j}] \in \mathbb{R}^{m \times n}$, denoted by $\|\cdot\|_F$, is defined as
\[
\|\mathbf{A}\|_F = \Big( \sum_{i} \sum_{j} a_{i,j}^2 \Big)^{1/2}
\qquad (9)
\]
◮ The Frobenius norm can be interpreted as the ℓ2 norm of a vector when treating $\mathbf{A}$ as a vector of size $mn$.
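The vector interpretation is easy to verify in PyTorch (a minimal sketch reusing the matrix from Eq. (5)):

```python
import torch

A = torch.tensor([[1., -1.],
                  [1., 2.]])

# Frobenius norm of A
print(torch.linalg.norm(A, ord='fro').item())          # sqrt(7) ≈ 2.6458

# The same value: l2 norm of A flattened into a vector of size m*n
print(torch.linalg.norm(A.reshape(-1), ord=2).item())  # sqrt(7) ≈ 2.6458
```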
Two Special Matrices
◮ The identity matrix, denoted as $\mathbf{I} \in \mathbb{R}^{n \times n}$, is a square matrix with ones on the diagonal and zeros everywhere else:
\[
\mathbf{I} = \begin{bmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{bmatrix}
\qquad (10)
\]
◮ A diagonal matrix, denoted as $\mathbf{D} = \mathrm{diag}(d_1, d_2, \ldots, d_n)$, is a matrix where all non-diagonal elements are 0:
\[
\mathbf{D} = \begin{bmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{bmatrix}
\qquad (11)
\]
Inverse
The inverse of a square matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ is denoted as $\mathbf{A}^{-1}$, which is the unique matrix such that
\[
\mathbf{A}^{-1}\mathbf{A} = \mathbf{I} = \mathbf{A}\mathbf{A}^{-1}
\qquad (12)
\]
◮ Non-square matrices do not have inverses (by definition)
◮ Not all square matrices are invertible
◮ The solution of the linear equations in Eq. (3) is $\boldsymbol{x} = \mathbf{A}^{-1}\boldsymbol{b}$
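A minimal sketch of solving Eq. (3) in PyTorch (torch.linalg requires a reasonably recent release; in practice, solving the system directly is preferable to forming the inverse):

```python
import torch

# A and b from Eq. (5)
A = torch.tensor([[1., -1.],
                  [1., 2.]])
b = torch.tensor([1., 2.])

# x = A^{-1} b via the explicit inverse
x = torch.linalg.inv(A) @ b

# The numerically preferable route: solve A x = b directly
x_solve = torch.linalg.solve(A, b)

print(x, x_solve)  # both tensor([1.3333, 0.3333]), i.e., x = (4/3, 1/3)
```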
Orthogonal Matrices
◮ Two vectors $\boldsymbol{x}, \boldsymbol{y} \in \mathbb{R}^{n}$ are orthogonal if $\langle \boldsymbol{x}, \boldsymbol{y} \rangle = 0$
[Figure: two perpendicular vectors x and y]
◮ A square matrix $\mathbf{U} \in \mathbb{R}^{n \times n}$ is orthogonal if all its columns are orthogonal to each other and normalized (orthonormal):
\[
\langle \boldsymbol{u}_i, \boldsymbol{u}_j \rangle = 0, \quad \|\boldsymbol{u}_i\| = 1, \quad \|\boldsymbol{u}_j\| = 1
\qquad (13)
\]
for $i, j \in [n]$ and $i \neq j$
◮ Furthermore, $\mathbf{U}^{\top}\mathbf{U} = \mathbf{I} = \mathbf{U}\mathbf{U}^{\top}$, which further implies $\mathbf{U}^{-1} = \mathbf{U}^{\top}$
Problem: Create special matrices using PyTorch.
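A minimal sketch for the problem above (the rotation matrix is my own choice of an orthogonal matrix):

```python
import math
import torch

# Identity matrix I (Eq. 10)
I = torch.eye(3)

# Diagonal matrix D = diag(1, 2, 3) (Eq. 11)
D = torch.diag(torch.tensor([1., 2., 3.]))

# A 2x2 rotation matrix is orthogonal: its columns are orthonormal
theta = math.pi / 4
U = torch.tensor([[math.cos(theta), -math.sin(theta)],
                  [math.sin(theta),  math.cos(theta)]])

# Verify U^T U = I up to floating-point error
print(torch.allclose(U.T @ U, torch.eye(2), atol=1e-6))  # True
```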
Symmetric Matrices
A symmetric matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ is defined as
\[
\mathbf{A}^{\top} = \mathbf{A}
\qquad (14)
\]
or, in other words,
\[
a_{i,j} = a_{j,i} \quad \forall\, i, j \in [n]
\qquad (15)
\]
Comments
◮ The identity matrix $\mathbf{I}$ is symmetric
◮ A diagonal matrix is symmetric
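A small illustrative sketch (the symmetrization trick is my own addition, not from the slides): any square matrix can be made symmetric by averaging it with its transpose.

```python
import torch

A = torch.randn(3, 3)  # a random, generally non-symmetric matrix

# Symmetrize: S = (A + A^T) / 2, so s_ij == s_ji for all i, j
S = (A + A.T) / 2

print(torch.equal(S, S.T))  # True
```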
Quiz
The identity matrix $\mathbf{I}$ is
◮ a diagonal matrix? ✓
◮ a symmetric matrix? ✓
◮ an orthogonal matrix? ✓
Further reference: [Kolter, 2015]
Basic Probability Theory
What is Probability?
The probability of landing heads is 0.52.
Two interpretations
Frequentist: probability represents the long-run frequency of an event
◮ If we flip the coin many times, we expect it to land heads about 52% of the time
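A minimal simulation of the frequentist view in PyTorch (the coin bias 0.52 is from the slide; the sample size and seed are arbitrary):

```python
import torch

torch.manual_seed(0)

# Flip a coin with P(heads) = 0.52 many times; heads = 1, tails = 0
flips = torch.bernoulli(torch.full((100_000,), 0.52))

# The empirical frequency of heads approaches 0.52 as the number
# of flips grows
print(flips.mean().item())  # approximately 0.52
```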