Modeling lexical semantic shifts during ad-hoc coordination

  1. Modeling lexical semantic shifts during ad-hoc coordination. Alexandre Kabbach 1,2, Aurélie Herbelot 2. 18.05.2020 – GeCKo 2020. 1 University of Geneva, 2 CIMeC – University of Trento

  2. Problem

  3. Conceptual variability and communication. Speakers form conceptual representations for words based on different background experiences (Connell and Lynott, 2014). How can speakers nonetheless communicate with one another if the words they utter do not refer to the exact same concepts?

  5. Coordination: a possible solution? Speakers coordinate with one another during each communication instance in order to settle on specific word meanings (Clark, 1992, 1996). In doing so, they contextualize their generic conceptual representations during communication.

  6. Question. How can we integrate coordination into standard Distributional Semantic Models (DSMs; Turney and Pantel, 2010; Clark, 2012; Erk, 2012; Lenci, 2018)? Problems:
  1. DSMs do not distinguish background linguistic stimuli from active coordination in their acquisition process
  2. DSMs consider conceptual representations to remain invariant during communication

  7. Proposal

  8. Model. We distinguish background experience from ad-hoc coordination in a standard count-based PPMI-weighted DSM:
  • background experience = corpus data fed to the DSM
  • ad-hoc coordination = singular vector sampling in the SVD
  We replace the variance-preservation bias in the SVD of the DSM with an explicit coordination bias, sampling the set of d singular vectors that maximizes the correlation with a particular similarity dataset (MEN and SimLex).

  12. Assumptions
  1. A single DSM can capture different kinds of semantic relations from the same corpus, so that a collection of possible meaning spaces could coexist within the same set of data.
  2. Aligning similarity judgments across sets of word pairs provides a reasonable approximation of ad-hoc coordination between two speakers who originally disagree and ultimately converge to a form of agreement with respect to some lexical decision.

  15. Results
  1. Replacing the variance-preservation bias with an explicit sampling bias actually reduces the variability across models generated from different corpora.
  2. DSMs generated from different corpora can be aligned in different ways. Alignment does not necessarily equate to conceptual agreement but, in some cases, to mere compatibility, so that coordinating one's conceptual spaces might simply be the cooperative act of avoiding conflict, rather than being in full agreement.

  18. Model

  19. PPMI-weighted DSM
  PMI(w, c) = log ( P(w, c) / (P(w) · P(c)) )
  PPMI(w, c) = max(PMI(w, c), 0)
  W = U · Σ · V⊤
  W_d = U_d · Σ_d^α, with α ∈ [0, 1]
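As a concrete illustration of the two steps above, here is a minimal NumPy sketch: PPMI weighting of a raw word-by-context count matrix, followed by a truncated SVD with the eigenvalue-weighting exponent α. The function names and the dense-matrix setup are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ppmi(counts):
    """PPMI-weight a raw word-by-context co-occurrence matrix (dense, for illustration)."""
    total = counts.sum()
    p_wc = counts / total                              # joint probabilities P(w, c)
    p_w = counts.sum(axis=1, keepdims=True) / total    # marginals P(w)
    p_c = counts.sum(axis=0, keepdims=True) / total    # marginals P(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0                       # zero-count cells give PMI = -inf
    return np.maximum(pmi, 0.0)                        # PPMI = max(PMI, 0)

def reduce_svd(W, d, alpha=1.0):
    """Factor W = U Σ V⊤ and build the d-dimensional space W_d = U_d · Σ_d^α."""
    U, S, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :d] * S[:d] ** alpha                   # broadcast Σ_d^α over columns
```

With these two pieces, a DSM would be obtained by feeding a corpus-derived count matrix to ppmi and then to reduce_svd.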

  20. Singular vector sampling. Replace the variance-preservation bias with the following add-reduce algorithm:
  • add: iterate over all singular vectors and select only those that increase performance on a given lexical similarity dataset
  • reduce: iterate over the set of added singular vectors and remove those whose removal does not degrade performance on that dataset
  W_d = U_d · Σ_d^α, with α ∈ [0, 1]
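Below is a hedged sketch of how such an add-reduce sampler might look, assuming the U and S factors from the SVD above and a similarity dataset given as word-index pairs with gold scores. The greedy Spearman-based selection criterion and all function names are assumptions made for illustration, not the authors' exact procedure.

```python
import numpy as np
from scipy.stats import spearmanr

def similarity_score(dims, U, S, alpha, pairs, gold):
    """Spearman correlation on a similarity dataset using only the selected singular dims."""
    if not dims:
        return -1.0
    W = U[:, dims] * S[dims] ** alpha
    preds = [
        np.dot(W[i], W[j]) / (np.linalg.norm(W[i]) * np.linalg.norm(W[j]) + 1e-12)
        for i, j in pairs
    ]
    return spearmanr(preds, gold).correlation

def add_reduce(U, S, alpha, pairs, gold):
    """Add step: keep singular vectors that improve the score.
    Reduce step: drop added vectors whose removal does not hurt the score."""
    dims, best = [], -1.0
    for k in range(S.shape[0]):                        # add
        score = similarity_score(dims + [k], U, S, alpha, pairs, gold)
        if score > best:
            dims.append(k)
            best = score
    for k in list(dims):                               # reduce
        score = similarity_score([d for d in dims if d != k], U, S, alpha, pairs, gold)
        if score >= best:
            dims.remove(k)
            best = score
    return dims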

  24. Conceptual similarity. We model structural similarity between two DSMs as the minimized Root Mean Square Error (RMSE) between them. Models are aligned using absolute orientation with scaling (Dev et al., 2018), which minimizes the RMSE while applying a cosine-similarity-preserving linear transformation (rotation + scaling).
  RMSE(A, B) = √( (1/|A|) Σ_{i=1}^{|A|} ‖a_i − b_i‖² )
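A minimal NumPy sketch of this alignment is given below, under the assumption that the two models A and B are row-aligned matrices over a shared vocabulary. Centering/translation is omitted since the slide mentions rotation and scaling only, and the function names are illustrative.

```python
import numpy as np

def align_absolute_orientation(A, B):
    """Align B onto A with a rotation plus one isotropic scaling factor,
    which preserves cosine similarities within B (orthogonal Procrustes + scaling)."""
    U, _, Vt = np.linalg.svd(B.T @ A)      # optimal rotation from the cross-covariance SVD
    B_rot = B @ (U @ Vt)
    s = np.sum(A * B_rot) / np.sum(B_rot * B_rot)   # scale minimizing ||A - s * B_rot||_F
    return s * B_rot

def rmse(A, B):
    """Root Mean Square Error between row-aligned embedding matrices."""
    return np.sqrt(np.mean(np.sum((A - B) ** 2, axis=1)))
```

rmse(A, align_absolute_orientation(A, B)) would then give the structural similarity score between the two models.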

  27. Experimental setup: corpora
  Table 1: Corpora used to generate DSMs
  Corpus  | Details                                    | Word count
  OANC    | Open American National Corpus              | 17M
  WIKI07  | .7% of the English Wikipedia               | 19M
  WIKI2   | 2% of the English Wikipedia                | 53M
  ACL     | ACL anthology reference corpus             | 58M
  WIKI4   | 4% of the English Wikipedia                | 106M
  BNC     | British National Corpus                    | 113M
  WIKI    | Full English Wikipedia of January 20, 2019 | 2,600M
