A Framework for Political Portmanteau Decomposition
Nabil Hossain, Minh Tran, Henry Kautz
nhossain@cs.rochester.edu
Dept. of Computer Science, University of Rochester, NY
Political Portmanteau
• Portmanteau
  • a word formed by combining the sounds and meanings of two words
  • brunch = br(eakfast) + (l)unch; motel = mo(tor) + (ho)tel
• Political portmanteau (PP)
  • a portmanteau in which at least one root word refers to a political entity
  • libtard = lib(eral) + (re)tard; repugnican = repugn(ant) + (republ)ican
  • offensive; used for political framing
  • creative, humorous, sticky
  • novel slang; can be used in hate speech
Contributions
• A framework for identifying political portmanteaus from the web
• An algorithm for PP detection and decomposition into root words
• The first shared dataset of PPs
Method
Pipeline: Reddit comments → slang detection (ICWSM 2018) → potential slang → PP detection (expert annotators) → PP (e.g., libtard) or not-PP (e.g., repub) → PP decomposition
• Extract words from Reddit news comments
• Apply the slang detection algorithm
• Classify the detected words into PP vs. not-PP
• Decompose each detected PP into root words:
  • X + Y → PP, where X or Y is a political term
  • Political entities E (liberal, cruz, …), found by prefix/suffix match: lib + C = libtard
  • Comment wordlist candidates: C = {retard, dotard, custard, …}
  • Context classifier over the candidates: E + C → PP or C + E → PP
Hossain, Nabil, Thanh Thuy Trang Tran, and Henry Kautz. "Discovering Political Slang in Readers' Comments." In ICWSM 2018.
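The prefix/suffix match step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `split_candidates`, the `min_part` parameter, and the tiny word lists are assumptions made for illustration. It only generates candidate root pairs; choosing among them (the context classifier on the slide) is left out.

```python
def split_candidates(pp, entities, wordlist, min_part=2):
    """Return (left_root, right_root) pairs such that some prefix of
    left_root followed by some suffix of right_root spells `pp`.

    `min_part` (hypothetical parameter) is the minimum number of
    characters each root must contribute to the blend.
    """
    results = []
    # Try both orders from the slide: E + C -> PP and C + E -> PP.
    for left_pool, right_pool in ((entities, wordlist), (wordlist, entities)):
        for left in left_pool:
            for right in right_pool:
                if left == right:
                    continue
                # Slide the cut point across the portmanteau.
                for cut in range(min_part, len(pp) - min_part + 1):
                    head, tail = pp[:cut], pp[cut:]
                    if left.startswith(head) and right.endswith(tail):
                        results.append((left, right))
                        break  # one valid cut is enough for this pair
    return results


# Illustrative inputs taken from the slide examples.
entities = ["liberal", "cruz"]
wordlist = ["retard", "dotard", "custard"]
print(split_candidates("libtard", entities, wordlist))
# [('liberal', 'retard'), ('liberal', 'dotard'), ('liberal', 'custard')]
```

All three wordlist entries survive the string match for "libtard", which is why a further step using comment context is needed to pick the intended root.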
Model Details
• β distribution model: no contextual features
  • Edit distances, word length, usage frequency
  • These capture sound blending and word popularity
• XGBoost: uses pre-trained GloVe word vector features from comments
  • Also uses the β distribution model features
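To make the non-contextual features concrete, the sketch below computes edit distances, word lengths, and usage frequencies for one candidate decomposition. The exact feature set and how the slides' model combines the features are not given here, so the function `blend_features`, its dictionary keys, and the usage counts are all hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def blend_features(pp, left, right, usage_counts):
    """Non-contextual features for the candidate decomposition left + right -> pp.

    `usage_counts` is a hypothetical word -> count mapping, e.g. built
    from a comment corpus; missing words count as 0.
    """
    return {
        "edit_left": edit_distance(pp, left),
        "edit_right": edit_distance(pp, right),
        "len_pp": len(pp),
        "len_left": len(left),
        "len_right": len(right),
        "freq_left": usage_counts.get(left, 0),
        "freq_right": usage_counts.get(right, 0),
    }


# Made-up counts, for illustration only.
counts = {"liberal": 120, "retard": 45}
print(blend_features("libtard", "liberal", "retard", counts))
```

Edit distance serves here as a rough proxy for how closely the blend sounds like each root, and the counts stand in for word popularity.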
Results
[Figures: PP Decomposition Accuracy; PP Detection Accuracy]
Questions: nhossain@cs.rochester.edu
Website: https://cs.rochester.edu/u/nhossain