Controlling Linguistic Style Aspects in Neural Language Generation

Jessica Ficler and Yoav Goldberg, ISCOL 2017. Our goal is to


  1. Model. A conditioned language model: $P(w_1, \ldots, w_n \mid c) = \prod_{t=1}^{n} P(w_t \mid w_1, \ldots, w_{t-1}, c)$. Condition each word on the history, as well as on a context c.

  2. Model. In our case, c is a concatenation of the embedding vectors of the parameter values. c: Personal:False Sentiment:Positive Length:≤10 Descriptive:True Professional:True Theme:Plot

  9. Model (generation, built up one word per slide over slides 3-9). With c fixed to the values on slide 2, the model is first fed the start symbol and then each word it has just generated: "start" yields "An", "An" yields "entertaining", and so on until the final "." is produced. Words generated on the completed slide: An, entertaining, family-friendly, and, visually, attractive, story, "." (listed without asserting their order beyond "An entertaining").
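The construction of c on these slides can be sketched as follows. This is a minimal sketch, not the authors' implementation: the embedding size `EMBED_DIM`, the random initialization, and the helper names `make_embedding_table` and `build_context` are assumptions made for illustration. Appending c to the word embedding at every time step is one common way to condition on a context, and is assumed here as well.

```python
import random

# Hypothetical embedding size per parameter value (not given on the slides).
EMBED_DIM = 4

# The six parameters and their values, as they appear on the slides.
PARAMETER_VALUES = {
    "Professional": ["True", "False"],
    "Personal": ["True", "False"],
    "Descriptive": ["True", "False"],
    "Sentiment": ["Positive", "Neutral", "Negative"],
    "Length": ["≤10", "11-20", "21-40", ">40"],
    "Theme": ["Plot", "Acting", "Production", "Effects", "Other"],
}


def make_embedding_table(rng):
    """Create one randomly initialised vector per (parameter, value) pair."""
    return {
        (name, value): [rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)]
        for name, values in PARAMETER_VALUES.items()
        for value in values
    }


def build_context(table, settings):
    """Concatenate the embeddings of the requested values into c."""
    c = []
    for name in PARAMETER_VALUES:  # fixed parameter order
        c.extend(table[(name, settings[name])])
    return c


rng = random.Random(0)
table = make_embedding_table(rng)
c = build_context(table, {
    "Personal": "False", "Sentiment": "Positive", "Length": "≤10",
    "Descriptive": "True", "Professional": "True", "Theme": "Plot",
})

# Assumed conditioning scheme: every step sees [word embedding ; c].
word_embedding = [0.0] * 8  # placeholder word vector
step_input = word_embedding + c
print(len(c), len(step_input))
```

In a trained model the embedding table would be learned jointly with the language model rather than left at its random initial values.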

  10. The model is simple, but… we need training data annotated with the appropriate values.

  13. Pipeline diagram (built up over slides 11-13): the Parameters are extracted from the Text and its Meta data using Heuristics, and the model is then trained on them.

  14. Rotten Tomatoes website. 7,500 movies. 1,002,625 movie reviews. (Pipeline diagram labels: Text, Meta data, extract, Heuristics, Parameters, train.)

  18. Professional. On Rotten Tomatoes, the critic reviews are separated from the audience reviews: critic reviews are labeled Professional and audience reviews Non-Professional.

  19. Some of the non-professional reviewers are considered “super reviewers”; these are also labeled professional.
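The Professional rule from these slides could be written roughly as below. The function name, the `reviewer_type` strings, and the `is_super_reviewer` flag are hypothetical; the slides do not say how the site metadata is represented.

```python
def is_professional(reviewer_type, is_super_reviewer=False):
    """Return the Professional value for a review.

    Critics are professional; audience reviewers are professional only
    when they are "super reviewers". The argument names are illustrative.
    """
    if reviewer_type == "critic":
        return True
    if reviewer_type == "audience":
        return is_super_reviewer
    raise ValueError(f"unknown reviewer type: {reviewer_type!r}")


print(is_professional("critic"))
print(is_professional("audience"))
print(is_professional("audience", is_super_reviewer=True))
```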

  22. Sentiment. Sentiment is derived from the review scores. We normalized the critics' scores to a 0-5 scale: Negative 0-2, Neutral 3, Positive 4-5.
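As a small illustration of the bucketing on this slide, the sketch below maps a score that is already on the 0-5 scale to a sentiment value. How the critics' scores were normalized is not described, so that step is left out; restricting the input to integers is an assumption, since the slide only lists the values 0-2, 3 and 4-5.

```python
def sentiment_label(score):
    """Map an integer score on the 0-5 scale to a sentiment value."""
    if not isinstance(score, int) or not 0 <= score <= 5:
        raise ValueError("score must be an integer between 0 and 5")
    if score <= 2:
        return "Negative"
    if score == 3:
        return "Neutral"
    return "Positive"


for s in range(6):
    print(s, sentiment_label(s))
```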

  24. Rotten Tomatoes website. 7,500 movies. 1,002,625 movie reviews. Pipeline diagram, now showing what is read from the Text: content words, function words and POS tags. (Other labels: Meta data, extract, Heuristics, Parameters, train.)

  25. Theme (content words). To determine the value for the theme parameter we searched for words that are related to the 4 topics and are common in our data set. Plot: Story, Storytelling, Plot, Script, Manuscript, Tale, Scene. Acting: Acting, Cast, Performance, Play, Role, Miscasting, Actor. Production: Director, Directed, Production, co-production. Effects: Effects, Song, Music, Voice, Visual, Soundtrack, Shot.

  26. Theme (labeling rule). Each sentence was labeled with the category that has the most words in the sentence. Sentences that do not include any words from our lists are labeled as other.
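The labeling rule for the theme parameter can be sketched as a simple count over word lists. Several things here are assumptions: the assignment of words to the four topics is reconstructed from the flattened slide table, the tokenization is a plain regular expression, and ties are broken by the order Plot, Acting, Production, Effects because the slides do not say how ties were handled.

```python
import re

# Word lists reconstructed from the slide (column assignment is assumed).
THEME_WORDS = {
    "Plot": {"story", "storytelling", "plot", "script",
             "manuscript", "tale", "scene"},
    "Acting": {"acting", "cast", "performance", "play",
               "role", "miscasting", "actor"},
    "Production": {"director", "directed", "production", "co-production"},
    "Effects": {"effects", "song", "music", "voice",
                "visual", "soundtrack", "shot"},
}


def theme_label(sentence):
    """Return the theme with the most matching words, or "Other"."""
    # Lowercase words, keeping hyphenated forms such as "co-production".
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())
    counts = {
        theme: sum(tok in words for tok in tokens)
        for theme, words in THEME_WORDS.items()
    }
    # max() keeps the first theme on ties (assumed tie-breaking order).
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "Other"


print(theme_label("The plot and the script are weak"))
print(theme_label("I fell asleep"))
```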

  27. Personal Voice (personal pronouns). To determine whether a review is written in a personal voice, we search for words that express subjectivity. Personal is True if the text contains "I" or "My", and False in other cases.
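A minimal sketch of this check is shown below. The slide lists only the words "I" and "My"; the case-insensitive matching and the regular-expression tokenization (which also splits contractions such as "I'm" into "i" and "m") are assumptions.

```python
import re

# Words that mark personal voice, as listed on the slide.
PERSONAL_WORDS = {"i", "my"}


def is_personal(sentence):
    """Return True if the sentence contains "I" or "my" as a word."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return any(tok in PERSONAL_WORDS for tok in tokens)


print(is_personal("I loved every minute of it"))
print(is_personal("The film is fine"))
```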

  28. Descriptiveness (distribution of part-of-speech tags). We assume that descriptive texts make heavy use of adjectives. Descriptive is True if % JJ ≥ 35, and False in other cases.
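The adjective threshold can be illustrated with a function that takes an already tagged sentence. The slides do not name a tagger, so the input here is simply a list of POS tags; reading "% JJ" as the share of tokens tagged exactly `JJ` (not `JJR` or `JJS`) is an assumption.

```python
def is_descriptive(pos_tags, threshold=0.35):
    """Return True if at least `threshold` of the tags are JJ."""
    if not pos_tags:
        return False  # an empty sentence has no adjectives
    adjectives = sum(1 for tag in pos_tags if tag == "JJ")
    return adjectives / len(pos_tags) >= threshold


# "a funny clever film" tagged by hand for illustration.
print(is_descriptive(["DT", "JJ", "JJ", "NN"]))
print(is_descriptive(["DT", "JJ", "NN", "VBZ", "RB"]))
```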

  29. Length. Four bins: ≤ 10 words, 11-20 words, 21-40 words, > 40 words.
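The four length bins translate directly into a small function. Counting words by splitting on whitespace is an assumption; the slide does not say how words were counted.

```python
def length_bin(sentence):
    """Return the length bin for a sentence, counted in words."""
    n = len(sentence.split())  # whitespace tokenization (assumed)
    if n <= 10:
        return "≤10"
    if n <= 20:
        return "11-20"
    if n <= 40:
        return "21-40"
    return ">40"


print(length_bin("A short and sweet review"))
```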

  30. Dataset Statistics. Our final data set includes 2,773,435 sentences. We divided the data set into training (~2.7M), development (~2K) and test (~2K) sets. Each sentence is labeled with the 6 parameters.

  36. Diagram (built up over slides 31-36): going from Text to Parameter Values is the easy direction (extract); going from Parameter Values to Text is the hard direction (Conditioned Language Model). Does this work?

  46. Examples of Generated Sentences. Example 1: “Ultimately, I can honestly say that this movie is full of stupid stupid and stupid stupid stupid stupid stupid.” Parameters: Professional False, Personal True, Length 11-20, Descriptive True, Theme Other, Sentiment Negative. Example 2: “The film’s simple, and a refreshing take on the complex family drama of the regions of human intelligence.” Parameters: Professional True, Personal False, Length 11-20, Descriptive False, Theme Other, Sentiment Positive. We would like to quantitatively measure our model's capabilities.

  47. Evaluation • Evaluating LM Quality (Perplexity) • Evaluating the Generated Sentences

  48. Evaluating LM Quality

  50. Sanity check (1): Conditioned vs. Unconditioned. Does knowing the parameters indeed help in achieving better language modeling results? Perplexity, Not-conditioned: Dev 25.8, Test 24.4. Perplexity, Conditioned: Dev 24.8, Test 23.3. Knowing the correct parameter values indeed results in better perplexity!
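For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-probability the model assigns to each token, so lower is better. The sketch below computes it from a list of per-token probabilities; the function name and the toy probabilities are illustrative and are not taken from the slides.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence given the model's per-token probabilities."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))


# A model that gives every token probability 1/4 has perplexity 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

For the conditioned model, each probability would be the model's estimate of a word given its history and the context c; for the unconditioned model, the same quantity without c.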

  53. Baseline (2): Conditioned vs. Dedicated LMs. Is our model effective compared to training a separate unconditioned LM on a subset of the data (a dedicated LM)? Diagram: the Data Set is split into Sentiment:Positive, Sentiment:Neutral and Sentiment:Negative. When generating text, we would choose the model that corresponds to the requested value.
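The dedicated-LM baseline amounts to splitting the data by a parameter value, training one model per subset, and picking the matching model at generation time. The sketch below shows only that bookkeeping: a unigram word counter stands in for a real unconditioned LM, the three sentences are made-up toy data, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_dedicated_models(labelled_sentences, parameter):
    """Build one toy unigram model per value of `parameter`."""
    subsets = defaultdict(Counter)
    for text, labels in labelled_sentences:
        # Each sentence only contributes to the model of its own value.
        subsets[labels[parameter]].update(text.lower().split())
    return dict(subsets)


def choose_model(models, requested_value):
    """Pick the dedicated model for the requested parameter value."""
    return models[requested_value]


# Toy data for illustration only.
data = [
    ("a refreshing and entertaining story", {"Sentiment": "Positive"}),
    ("a dull and tedious story", {"Sentiment": "Negative"}),
    ("an average story", {"Sentiment": "Neutral"}),
]
models = train_dedicated_models(data, "Sentiment")
positive_lm = choose_model(models, "Positive")
print(sorted(models))
```

Each dedicated model sees only its own subset of the data, whereas the single conditioned model is trained on all of it.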
