CS 4501 Machine Learning for NLP

CS 4501 Machine Learning for NLP: Introduction, Yangfeng Ji

  1. CS 4501 Machine Learning for NLP: Introduction
     Yangfeng Ji, Department of Computer Science, University of Virginia

  2. Overview
     1. Course Information
     2. Basic Linear Algebra
     3. Basic Probability Theory
     4. Statistical Estimation

  4. About Online Lectures
     ◮ All lectures will be recorded and uploaded to Collab
     ◮ By default, participants are muted upon entry. If you have a question:
       ◮ Chime in
       ◮ Use the “Raise Hand” feature
       ◮ Send a message via Chat
     ◮ By default, video is off upon entry
     ◮ Create a Slack workspace for this course (?)

  5. Course Information

  6. Course Webpage: http://yangfengji.net/uva-nlp-course/

  8. Instructors
     ◮ Instructor: Yangfeng Ji
       ◮ Office hour: TBD
     ◮ TA: Stephanie Schoch
       ◮ Office hour: TBD

  9. Clarification
     This is not the class for you if you want to
     ◮ learn programming
     ◮ learn basic machine learning
     ◮ learn how to use PyTorch

  10. Goal of This Course
      1. Explain the fundamental NLP techniques
         ◮ Text classification
         ◮ Language modeling
         ◮ Word embeddings
         ◮ Sequence labeling
         ◮ Machine translation
      2. Advanced topics
         ◮ Discourse processing, text generation, interpretability in NLP
      3. Opportunities to work on some NLP problems
         ◮ Final project

  13. Assignments
      ◮ No exam
      ◮ Six homeworks
        ◮ 14% × 6 = 84%
      ◮ One final project
        ◮ 2–3 students per group
        ◮ Proposal: 4%
        ◮ Final presentation: 6%
        ◮ Final project report: 6%

  14. Policy: late penalty
      Homework submissions are accepted up to 72 hours late, with a penalty of 20% of the points for each 24 hours (or part thereof) past the deadline. For example,
      ◮ Deadline: August 30th, 11:59 PM
      ◮ Submission timestamp: September 1st, 9:00 AM (≤ 48 hours late)
      ◮ Original points of a homework: 7
      ◮ Actual points: $7 \times (1 - 40\%) = 4.2$ (1)
      It is usually better to turn in what you have on time.
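
The late-penalty arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `late_adjusted_points`, the year 2020, and the choice to return 0 points after the 72-hour window are assumptions, not part of the slides.

```python
from datetime import datetime
from math import ceil


def late_adjusted_points(points, deadline, submitted,
                         rate_percent=20, max_late_hours=72):
    """Apply a deduction for every started 24-hour block after the deadline."""
    late_seconds = (submitted - deadline).total_seconds()
    if late_seconds <= 0:
        return points  # on time: no penalty
    if late_seconds > max_late_hours * 3600:
        return 0  # assumption: work submitted after the window earns nothing
    late_blocks = ceil(late_seconds / 86400)  # e.g. 33 hours late -> 2 blocks
    return points * (100 - rate_percent * late_blocks) / 100


# The year is an assumption; the slide gives only the month and day.
deadline = datetime(2020, 8, 30, 23, 59)
submitted = datetime(2020, 9, 1, 9, 0)
print(late_adjusted_points(7, deadline, submitted))  # 4.2
```

The submission is about 33 hours late, which falls in the second 24-hour block, so the deduction is 40% as in the slide's example.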

  15. Policy: collaboration
      ◮ Homeworks
        ◮ Collaboration is not encouraged
        ◮ Students are allowed to discuss with their classmates
      ◮ Final project
        ◮ It should be a team effort

  16. Policy: grades

  18. Textbooks
      ◮ Textbook
        ◮ Eisenstein, Natural Language Processing, 2018
      ◮ Additional textbooks
        ◮ Jurafsky and Martin, Speech and Language Processing, 3rd Edition, 2019
        ◮ Smith, Linguistic Structure Prediction, 2009
        ◮ Shalev-Shwartz and Ben-David, Understanding Machine Learning: From Theory to Algorithms, 2014
        ◮ Goodfellow, Bengio and Courville, Deep Learning, 2016
      All free online

  19. Piazza: https://piazza.com/virginia/fall2020/cs4501003
      ◮ course announcements
      ◮ online Q&A

  20. Questions?

  21. Basic Linear Algebra

  23. Linear Equations
      Consider the following system of equations
      $$\begin{aligned} x_1 - x_2 &= 1 \\ x_1 + 2x_2 &= 2 \end{aligned} \tag{2}$$
      Each equation represents a line in the following 2-D space.
      [Figure: 2-D plane with axes $x_1$ and $x_2$]

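  24. Linear Equations
      Consider the following system of equations
      $$\begin{aligned} x_1 - x_2 &= 1 \\ x_1 + 2x_2 &= 2 \end{aligned} \tag{3}$$
      In matrix notation, it can be written in a more compact form
      $$\mathbf{A}\boldsymbol{x} = \boldsymbol{b} \tag{4}$$
      with
      $$\mathbf{A} = \begin{bmatrix} 1 & -1 \\ 1 & 2 \end{bmatrix}, \quad \boldsymbol{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad \boldsymbol{b} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \tag{5}$$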

  26. Basic Notations
      $$\mathbf{A} = \begin{bmatrix} 1 & -1 \\ 1 & 2 \end{bmatrix}, \quad \boldsymbol{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad \boldsymbol{b} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$
      ◮ $\mathbf{A} \in \mathbb{R}^{m \times n}$: a matrix with $m$ rows and $n$ columns
        ◮ The element on the $i$-th row and the $j$-th column is denoted as $a_{i,j}$
      ◮ $\boldsymbol{x} \in \mathbb{R}^{n}$: a vector with $n$ entries. By convention, an $n$-dimensional vector is often thought of as a matrix with $n$ rows and 1 column, known as a column vector.
        ◮ The $i$-th element is denoted as $x_i$
      Problem: Solve a matrix-vector multiplication by hand and with PyTorch
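
As a rough sketch of the "by hand" half of this problem, the product $\mathbf{A}\boldsymbol{x}$ can be computed row by row in plain Python. The helper `matvec` and the input vector `[2, 1]` are illustrative choices, not taken from the slides.

```python
def matvec(A, x):
    """Multiply an m x n matrix (a list of rows) by a vector with n entries."""
    assert all(len(row) == len(x) for row in A), "each row needs len(x) entries"
    # Entry i of the result is the dot product of row i of A with x.
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


A = [[1, -1],
     [1, 2]]
x = [2, 1]  # hypothetical input vector, not from the slides

y = matvec(A, x)
print(y)  # [1, 4]
```

With PyTorch installed, `torch.tensor(A) @ torch.tensor(x)` should give the same entries as a tensor.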

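  27. $\ell_2$ Norm
      The $\ell_2$ norm of a vector $\boldsymbol{x} \in \mathbb{R}^{n}$ is defined as
      $$\|\boldsymbol{x}\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2} \tag{6}$$
      [Figure: vector $\boldsymbol{x}$ in the plane with axes $x_1$ and $x_2$, with its length labeled $\|\boldsymbol{x}\|_2$]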

  28. $\ell_1$ Norm
      The $\ell_1$ norm of a vector $\boldsymbol{x} \in \mathbb{R}^{n}$ is defined as
      $$\|\boldsymbol{x}\|_1 = \sum_{i=1}^{n} |x_i| \tag{7}$$
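
Both norm definitions translate directly into code. The sketch below uses only the standard library; the function names and the example vector `[3, -4]` are illustrative assumptions.

```python
import math


def l2_norm(x):
    """Square root of the sum of squared entries, as in Eq. (6)."""
    return math.sqrt(sum(x_i ** 2 for x_i in x))


def l1_norm(x):
    """Sum of absolute values of the entries, as in Eq. (7)."""
    return sum(abs(x_i) for x_i in x)


x = [3, -4]  # hypothetical example vector
print(l2_norm(x))  # 5.0
print(l1_norm(x))  # 7
```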

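  31. Dot Product
      The dot product of $\boldsymbol{x}, \boldsymbol{y} \in \mathbb{R}^{n}$ is defined as
      $$\langle \boldsymbol{x}, \boldsymbol{y} \rangle = \boldsymbol{x}^{\mathsf{T}} \boldsymbol{y} = \sum_{i=1}^{n} x_i y_i \tag{8}$$
      where $\boldsymbol{x}^{\mathsf{T}}$ is the transpose of $\boldsymbol{x}$.
      ◮ $\|\boldsymbol{x}\|_2^2 = \langle \boldsymbol{x}, \boldsymbol{x} \rangle$
      ◮ If $\boldsymbol{x} = (0, 0, \dots, \underbrace{1}_{x_i}, \dots, 0)$, then $\langle \boldsymbol{x}, \boldsymbol{y} \rangle = y_i$
      ◮ If $\boldsymbol{x}$ is a unit vector ($\|\boldsymbol{x}\|_2 = 1$), then $\langle \boldsymbol{x}, \boldsymbol{y} \rangle$ is the projection of $\boldsymbol{y}$ on the direction of $\boldsymbol{x}$
      [Figure: vectors $\boldsymbol{x}$ and $\boldsymbol{y}$]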
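
The three properties listed on this slide can be checked numerically. The following is a minimal sketch with made-up vectors; `dot` is a hypothetical helper that implements Eq. (8).

```python
def dot(x, y):
    """Dot product of two equal-length vectors, as in Eq. (8)."""
    assert len(x) == len(y), "vectors must have the same length"
    return sum(x_i * y_i for x_i, y_i in zip(x, y))


# Property 1: the squared l2 norm equals <x, x>.
x = [3, -4]
print(dot(x, x))  # 25, i.e. 5 squared

# Property 2: a vector with a single 1 at position i picks out y_i.
e = [0, 1, 0]
y = [5, 6, 7]
print(dot(e, y))  # 6

# Property 3: for a unit vector u, <u, v> is the length of the
# projection of v onto the direction of u.
u = [0.6, 0.8]  # unit vector: 0.36 + 0.64 = 1
v = [5, 0]
print(dot(u, v))  # approximately 3
```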

  32. Frobenius Norm
      The Frobenius norm of a matrix $\mathbf{A} = [a_{i,j}] \in \mathbb{R}^{m \times n}$, denoted by $\|\cdot\|_F$, is defined as
      $$\|\mathbf{A}\|_F = \Big( \sum_{i} \sum_{j} a_{i,j}^2 \Big)^{1/2} \tag{9}$$
      ◮ The Frobenius norm can be interpreted as the $\ell_2$ norm of a vector when treating $\mathbf{A}$ as a vector of size $mn$.
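
To illustrate the interpretation above, the sketch below computes the Frobenius norm of the matrix from Eq. (5) and compares it with the $\ell_2$ norm of the same entries flattened into a vector. The function names are illustrative.

```python
import math


def frobenius_norm(A):
    """Square root of the sum of all squared entries, as in Eq. (9)."""
    return math.sqrt(sum(a_ij ** 2 for row in A for a_ij in row))


def l2_norm(x):
    return math.sqrt(sum(x_i ** 2 for x_i in x))


A = [[1, -1],
     [1, 2]]
flattened = [a_ij for row in A for a_ij in row]  # vector of size m * n

print(frobenius_norm(A))   # sqrt(1 + 1 + 1 + 4) = sqrt(7), about 2.6458
print(l2_norm(flattened))  # same value
```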

  34. Two Special Matrices
      ◮ The identity matrix, denoted as $\mathbf{I} \in \mathbb{R}^{n \times n}$, is a square matrix with ones on the diagonal and zeros everywhere else.
        $$\mathbf{I} = \begin{bmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{bmatrix} \tag{10}$$
      ◮ A diagonal matrix, denoted as $\mathbf{D} = \operatorname{diag}(d_1, d_2, \dots, d_n)$, is a matrix where all non-diagonal elements are 0.
        $$\mathbf{D} = \begin{bmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{bmatrix} \tag{11}$$

  35. Inverse
      The inverse of a square matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ is denoted as $\mathbf{A}^{-1}$, which is the unique matrix such that
      $$\mathbf{A}^{-1}\mathbf{A} = \mathbf{I} = \mathbf{A}\mathbf{A}^{-1} \tag{12}$$
      ◮ Non-square matrices do not have inverses (by definition)
      ◮ Not all square matrices are invertible
      ◮ The solution of the linear equations in Eq. (3) is $\boldsymbol{x} = \mathbf{A}^{-1}\boldsymbol{b}$
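
As a worked instance of the last bullet, the system in Eq. (3) can be solved with the closed-form inverse of a 2 × 2 matrix. This sketch uses exact fractions from the standard library and assumes integer entries; `inverse_2x2` and `matvec` are hypothetical helper names, and the closed form applies only to the 2 × 2 case.

```python
from fractions import Fraction


def inverse_2x2(A):
    """Inverse of [[a, b], [c, d]] via (1/det) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular, so it has no inverse")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def matvec(A, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


A = [[1, -1],
     [1, 2]]
b = [1, 2]

x = matvec(inverse_2x2(A), b)
print(x)             # [Fraction(4, 3), Fraction(1, 3)]
print(matvec(A, x))  # [Fraction(1, 1), Fraction(2, 1)], which recovers b
```

Substituting back confirms the result: $4/3 - 1/3 = 1$ and $4/3 + 2 \cdot 1/3 = 2$.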

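  38. Orthogonal Matrices
      ◮ Two vectors $\boldsymbol{x}, \boldsymbol{y} \in \mathbb{R}^{n}$ are orthogonal if $\langle \boldsymbol{x}, \boldsymbol{y} \rangle = 0$
        [Figure: vectors $\boldsymbol{x}$ and $\boldsymbol{y}$]
      ◮ A square matrix $\mathbf{U} \in \mathbb{R}^{n \times n}$ is orthogonal if all its columns $\boldsymbol{u}_1, \dots, \boldsymbol{u}_n$ are orthogonal to each other and normalized (orthonormal)
        $$\langle \boldsymbol{u}_i, \boldsymbol{u}_j \rangle = 0, \quad \|\boldsymbol{u}_i\| = 1, \quad \|\boldsymbol{u}_j\| = 1 \tag{13}$$
        for $i, j \in [n]$ and $i \neq j$
      ◮ Furthermore, $\mathbf{U}^{\mathsf{T}}\mathbf{U} = \mathbf{I} = \mathbf{U}\mathbf{U}^{\mathsf{T}}$, which further implies $\mathbf{U}^{-1} = \mathbf{U}^{\mathsf{T}}$
      Problem: Create special matrices using PyTorch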
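
Before reaching for PyTorch, it may help to build these matrices from their definitions. The sketch below uses plain nested lists; the helper names and the 90-degree rotation matrix used as the orthogonal example are illustrative choices, not from the slides.

```python
def identity(n):
    """n x n matrix with ones on the diagonal and zeros elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def diag(values):
    """Diagonal matrix with the given values on the diagonal."""
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]


def transpose(A):
    return [list(column) for column in zip(*A)]


def matmul(A, B):
    B_columns = transpose(B)
    return [[sum(a * b for a, b in zip(row, column)) for column in B_columns]
            for row in A]


I = identity(2)
D = diag([2, 5])
U = [[0, -1],
     [1, 0]]  # rotation by 90 degrees, a hypothetical orthogonal matrix

print(I)  # [[1, 0], [0, 1]]
print(D)  # [[2, 0], [0, 5]]
print(matmul(transpose(U), U) == I)  # True
print(matmul(U, transpose(U)) == I)  # True
```

In PyTorch, `torch.eye(n)` and `torch.diag(torch.tensor([2.0, 5.0]))` build the corresponding identity and diagonal tensors.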

  39. Symmetric Matrices
      A symmetric matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ is defined as
      $$\mathbf{A}^{\mathsf{T}} = \mathbf{A} \tag{14}$$
      or, in other words,
      $$a_{i,j} = a_{j,i} \quad \forall\, i, j \in [n] \tag{15}$$
      Comments
      ◮ The identity matrix $\mathbf{I}$ is symmetric
      ◮ A diagonal matrix is symmetric

  41. Quiz
      The identity matrix $\mathbf{I}$ is
      ◮ a diagonal matrix? ✓
      ◮ a symmetric matrix? ✓
      ◮ an orthogonal matrix? ✓
      Further reference [Kolter, 2015]
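
The quiz answers can also be checked mechanically. The predicates below are a sketch written from the definitions on the preceding slides; their names are hypothetical, and the exact equality tests are only reliable for integer entries (floating-point matrices would need a tolerance).

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def transpose(A):
    return [list(column) for column in zip(*A)]


def matmul(A, B):
    B_columns = transpose(B)
    return [[sum(a * b for a, b in zip(row, column)) for column in B_columns]
            for row in A]


def is_diagonal(A):
    """All entries off the main diagonal are zero."""
    return all(A[i][j] == 0
               for i in range(len(A)) for j in range(len(A[i])) if i != j)


def is_symmetric(A):
    """A equals its transpose, as in Eq. (14)."""
    return A == transpose(A)


def is_orthogonal(A):
    """Both U^T U and U U^T equal the identity."""
    I = identity(len(A))
    return matmul(transpose(A), A) == I and matmul(A, transpose(A)) == I


I3 = identity(3)
print(is_diagonal(I3), is_symmetric(I3), is_orthogonal(I3))  # True True True
```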

  42. Basic Probability Theory

  43. What is Probability?
      The probability of landing heads is 0.52

  44. Two interpretations
      Frequentist: Probability represents the long-run frequency of an event
      ◮ If we flip the coin many times, we expect it to land heads about 52% of the time
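
The frequentist reading can be illustrated with a small simulation: flip a biased coin many times and watch the fraction of heads settle near 0.52. This is only a sketch; the function name, the seed, and the sample sizes are arbitrary choices.

```python
import random


def heads_frequency(p_heads, n_flips, seed=0):
    """Fraction of heads in n_flips simulated flips of a biased coin."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    heads = sum(rng.random() < p_heads for _ in range(n_flips))
    return heads / n_flips


# The observed frequency should approach 0.52 as the number of flips grows.
for n_flips in (100, 10_000, 100_000):
    print(n_flips, heads_frequency(0.52, n_flips))
```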

