(with-personality-traits (*writer-big-five*) (with-global-constraints ((all-echo)(all-different)) (with-pervasive-predicates (#'syllable-bonus-few) (with-typographic-style (poem) (bind ((w1 (<choose> verb-cognition :+sense [know certain])) (w2 (snow noun-substance :rhyme [though])) (w3 (<choose> adj :+sense [queer odd unusual weird demented stupid silly])) (w4 (or (year noun-quantity) (week noun-quantity) (month noun-quantity) (season noun-quantity))) (w5 (<choose> verb-motion :+sense [cause move back forth shake])) (w6 (<choose> noun-object :+sense [small fragment broken break whole flake])) (here (here adj)) (near (near adj)) (mile (mile noun-quantity pl)) (sleep (sleep verb)) (woods (<choose> noun-plant pl :+sense [forest trees plants wooded area] :-sense [wood]))) "Whose (ref woods) these are I (<choose> verb-cognition :+sense [think guess surmise]) I (ref w1). His (house noun) is in the (village noun) though; He will not see me (stop verb gerund) (ref here :rhyme near) To (watch verb-perception) his (ref woods) (fill verb) up with (ref w2 :different w1 :rhyme w1). ...
My (<choose> adj :sense little-sense) (horse noun-animal) must (<choose> verb-cognition :sense think-sense) it (ref w3 :echo w1) To (stop verb) without a (<choose> noun :sense farmhouse-sense) (binding near :rhyme w3) Between the (ref woods) and (frozen adj) (<choose> noun-object :sense lake-sense) The (<choose> adj -est :sense darkest-sense) (<choose> noun-time :sense evening-sense) of the (ref w4 :different w3 :rhyme w3). He gives his (harness noun) (bell noun pl) a (ref w5 :echo w3) To (ask verb) if there is some (mistake noun :rhyme w5). The only other (<choose> noun :sense sound-sense) is the (<choose> noun-act :sense arc-sense) Of (<choose> ADJ :sense easy-sense) (wind noun) and (<choose> ADJ :sense downy-sense) (ref w6 :different w5 :rhyme w5). The (ref woods) are (<choose> adj :sense lovely-sense), (<choose> adj :sense dark-sense), and (<choose> adj :sense deep-sense), But I have (promise noun pl) to (keep verb :rhyme sleep), And (ref mile) to go before I (ref sleep), And (ref mile) to go before I (sleep verb :different sleep :rhyme sleep)."))))
little-sense little small diminuitive tiny puny
farmhouse-sense farm house farmhouse shed
lake-sense body fresh water surrounded surround land lake
darkest-sense devoid deficient light brightness shadowed shadow black color have hue dark
evening-sense latter day period decreasing decrease daylight late afternoon nightfall early part night dinner bedtime spent spend special way evening
sound-sense particular auditory effect produce give given cause subjective sensation hear hearing mechanical vibration transmitted transmit an elastic medium sound
arc-sense movement arc sweep
easy-sense pose posing difficulty require little effort hurried hurry forced force free worry anxiety easy
lovely-sense lovely pretty appealing
dark-sense dark devoid light black dismal dejected unilluminated
deep-sense deep depth penetration extreme intense strong
InkWell • expanded WordNet synonym dictionary: 160,000 words • 20,000 most common words • expanded CMU phonetic dictionary: 220,000 words • rhyming dictionary: 42,000 words • stem dictionary: 163,000 entries (+ Porter Stemmer +
InkWell • n-grams: 50 million from general literature (heavily processed Google 2–5-grams); 100,000–1,000,000 per writer (~75 writers pre-compiled) • ~40,000 lines of (Common Lisp) code • 15 GB image when all this is loaded and running
Static Bonuses • *writer-word-bonus* • *common-word-bonus* • *halo-word-bonus* • *avoid-word-penalty* • *synonym-proximity-bonus* • *local-halo-bonus* • *local-sense-bonus* • *local-predicates-bonus*
Dynamic Bonus Types • general constraint • all-different • echo • numeric-constraint • rhyme • general n-grams (4) • all-rhyme • writer-n-grams (4) • all-echo • personality traits (10)
Personality: Big Five • Openness to experience • Conscientiousness vs. Unconscientiousness • Extraversion vs. Introversion • Agreeableness vs. Disagreeableness • Emotional stability vs. Neuroticism
NIH Public Access Author Manuscript. J Res Pers. Author manuscript; available in PMC 2011 June 1. Published in final edited form as: J Res Pers. 2010 June 1; 44(3): 363–373. doi:10.1016/j.jrp.2010.04.001.
Personality in 100,000 Words: A large-scale analysis of personality and word use among bloggers
Tal Yarkoni, University of Colorado at Boulder
Abstract: Previous studies have found systematic associations between personality and individual differences in word use. Such studies have typically focused on broad associations between major personality domains and aggregate word categories, potentially masking more specific associations. Here I report the results of a large-scale analysis of personality and word use in a large sample of blogs (N=694). The size of the dataset enabled pervasive correlations with personality to be identified for a broad range of lexical variables, including both aggregate word categories and individual English words. The results replicated category-level findings from previous offline studies, identified numerous novel associations at both a categorical and single-word level, and underscored the value of complementary approaches to the study of personality and word use.
People differ considerably from each other in their habitual patterns of thought, feeling and action. Not surprisingly, these differences are reflected not only in what people think, feel, and do, but also in what they say about what they think, feel, or do. Recent studies have identified systematic associations between personality and language use in a variety of different contexts,
Sentiment Analysis: Linguistic Inquiry and Word Count (LIWC)
 1 All pronouns          18 Anger                  35 Family       52 Home
 2 1st person singular   19 Sadness                36 Humans       53 Sport/exercise
 3 1st person plural     20 Cognition              37 Time         54 TV/movies
 4 Total 1st person      21 Causation              38 Past         55 Music
 5 Total 2nd person      22 Insight                39 Present      56 Money
 6 Total 3rd person      23 Discrepancy            40 Future       57 Metaphysical
 7 Negations             24 Inhibition             41 Space        58 Religion
 8 Assents               25 Tentativeness          42 Up           59 Death
 9 Articles              26 Certainty              43 Down         60 Physical states/factors
10 Prepositions          27 Sensation/perception   44 Inclusion    61 Symptoms & sensations
11 Numbers               28 Seeing                 45 Exclusion    62 Sexual
12 Affect                29 Hearing                46 Motion       63 Eating/drinking
13 Positive affect       30 Touching               47 Occupation   64 Sleeping/dreaming
14 Positive feelings     31 Social                 48 School       65 Grooming
15 Optimism              32 Communication          49 Job          66 Swear words
16 Negative affect       33 Reference to others    50 Achievement  67 Non-fluencies
17 Anxiety               34 Friends                51 Leisure      68 Fillers
$$Tr = \sum_{i=1}^{n} Y_i P_i$$
• Scan text, count instances of categories
• $P_i$ is the percentage of words in category $i$
• $Y_i$ is the Yarkoni coefficient for category $i$
• $n$ is the number of categories
• $Tr$ is the resulting strength of trait
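The weighted sum on this slide can be sketched in a few lines of Common Lisp. This is only an illustration: the two-category lexicon, the coefficient values, and the function names are invented here, and neither the real LIWC word lists nor Yarkoni's published coefficients are reproduced.

```lisp
;; Hypothetical mini-lexicon: (category word...). Real LIWC lists are far larger.
(defparameter *toy-categories*
  '((:positive-affect "love" "happy" "nice")
    (:anger "hate" "furious")))

;; Hypothetical coefficients Y_i for a single trait (not Yarkoni's values).
(defparameter *toy-coefficients*
  '((:positive-affect . 0.10) (:anger . -0.20)))

(defun category-percentage (words members)
  "P_i: percentage of WORDS that appear in MEMBERS, ignoring case."
  (if (null words)
      0
      (* 100 (/ (count-if (lambda (w) (member w members :test #'string-equal))
                          words)
                (length words)))))

(defun trait-strength (words categories coefficients)
  "Tr = sum over categories i of Y_i * P_i."
  (loop for (category . members) in categories
        for y = (or (cdr (assoc category coefficients)) 0)
        sum (* y (category-percentage words members))))
```

With these made-up numbers, a text in which half the words fall in the positive-affect category scores 0.10 × 50 = 5 for the trait.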
I (<choose> verb-emotion :+sense [like favor love]) (or (<choose> adj :+sense [happy]) (<choose> adj :+sense [sad]) (<choose> adj :+sense [angry])) (<choose> noun-animal pl :+sense [dog wolf]).
I love happy dogs. (No Personality Target)
I love passionate bird dogs. (High Extraversion: 46.15%)
I yearn for humiliated attack dogs. (High Neuroticism: 32.36%)
I relish unclean housedogs. (Low Openness: -54.20%)
WordNet ABBREVIATION-FOR NIL ALSO-SEE NIL ANTONYM NIL ATTRIBUTE NIL BASIC-SYNS (("canis_familiaris" NOUN) ("dog" NOUN)) CAUSE NIL DERIVATIONALLY-RELATED-FORM NIL DOMAIN-OF-SYNSET-REGION NIL DOMAIN-OF-SYNSET-TOPIC NIL DOMAIN-OF-SYNSET-USAGE NIL ENTAILMENT NIL GLOSS "a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds; ‘the dog barked all night’ " GLOSS-WORDS ("breed" "many" "in" "occur" "time" "prehistoric" "man" "by" "domesticate" "domesticated" “be" "have" "wolf" "common" "descend" "probably" "canis" "genus" "member" "a") HYPERNYM (("domestic_animal" NOUN 1317541) ("domesticated_animal" NOUN 1317541) ("canine" NOUN 2083346) ("canid" NOUN 2083346)) HYPERNYM-CHAIN ("domestic_dog" "domestic" "dog" "physical_entity" "entity" "physical_object" "object" "whole" "animate_thing" "thing" "being" "domesticated_animal" "carnivore" "chordate" "vertebrate" "mammalian" "eutherian" "canine" "animal") HYPONYM (("puppy" NOUN 1322604) ("bow-wow" NOUN 2084732) ("barker" NOUN 2084732) ("doggy" NOUN 2084732) ("doggie" NOUN 2084732) ("pooch" NOUN 2084732) ("mutt" NOUN 2084861) ("mongrel" NOUN 2084861) ("cur" NOUN 2084861) ("lapdog" NOUN 2085272) ("toy" NOUN 2085374) ("toy_dog" NOUN 2085374) ("hunting_dog" NOUN 2087122) ("working_dog" NOUN 2103406) ("carriage_dog" NOUN 2110341) ("coach_dog" NOUN 2110341) ("dalmatian" NOUN 2110341) ("basenji" NOUN 2110806) ("pug-dog" NOUN 2110958) ("pug" NOUN 2110958) ("leonberg" NOUN 2111129) ("newfoundland_dog" NOUN 2111277) ("newfoundland" NOUN 2111277) ("great_pyrenees" NOUN 2111500) ("spitz" NOUN 2111626) ("belgian_griffon" NOUN 2112497) ("brussels_griffon" NOUN 2112497) ("griffon" NOUN 2112497) ("welsh_corgi" NOUN 2112826) ("corgi" NOUN 2112826) ("poodle_dog" NOUN 2113335) ("poodle" NOUN 2113335) ("mexican_hairless" NOUN 2113978))
WordNet INSTANCE-HYPERNYM NIL INSTANCE-HYPONYM NIL MEMBER-HOLONYM (("genus_canis" NOUN 2083863) ("canis" NOUN 2083863) ("pack" NOUN 7994941)) MEMBER-MERONYM NIL MEMBER-OF-THIS-DOMAIN-REGION NIL MEMBER-OF-THIS-DOMAIN-TOPIC NIL MEMBER-OF-THIS-DOMAIN-USAGE NIL ORDER 1 ORIGINAL-WORD "domestic_dog" PART-HOLONYM NIL PART-MERONYM (("flag" NOUN 2158846)) PARTICIPLE-OF-VERB NIL PERTAINYM-PERTAINS-TO-NOUN NIL POINTER-SYNS NIL PRIMARY-SYNONYMS ("genus_canis" "canis" "pack" "flag" "domestic_animal" "domesticated_animal" "canine" "canid" "puppy" "bow-wow" "barker" "doggy" "doggie" "pooch" "mutt" "mongrel" "cur" "lapdog" "toy" "toy_dog" "hunting_dog" "working_dog" "carriage_dog" "coach_dog" "dalmatian" "basenji" "pug-dog" "pug" "leonberg" "newfoundland_dog" "newfoundland" "great_pyrenees" "spitz" "belgian_griffon" "brussels_griffon" "griffon" "welsh_corgi" "corgi" "poodle_dog" "poodle" "mexican_hairless" "canis_familiaris" "dog" "domestic_dog") RELATED-TERM NIL SEMANTIC-TYPE NOUN-ANIMAL SENSE-DATA NIL SIMILAR-TO NIL SUBSTANCE-HOLONYM NIL SUBSTANCE-MERONYM NIL SYNSET-ID (2084071 NOUN) TYPE NOUN VERB-GROUP NIL WORD "domestic_dog" WORD-SENSE "domestic_dog%1"
Weird WordNet Entries • hypernym: more general (hypernym(“tent”) = “shelter”) • hyponym: more specific, inverse of above • instance-hyponym: specific example (instance-hyponym(“river”) = “Mississippi River”) • instance-hypernym: inverse of above (instance-hypernym(“Mississippi River”) =
Weird WordNet Entries • holonym: whole that contains the specified part (part-holonym(“fuselage”) =“airplane”) • meronym: inverse of above
CMU Phonetic Dictionary "fiction" (((F) (IH 1) (K) (SH) (AH 0) (N))) "fictional" (((F) (IH 1) (K) (SH) (AH 0) (N) (AH 0) (L))) "fictional_animal" (((F) (IH 1) (K) (SH) (AH 0) (N) (AH 0) (L) (AE 1) (N) (AH 0) (M) (AH 0) (L))) "fictional_character" (((F) (IH 1) (K) (SH) (AH 0) (N) (AH 0) (L) (K) (EH 1) (R) (IH 0) (K) (T) (ER 0))) "fictionalize" (((F) (IH 1) (K) (SH) (AH 0) (N) (AH 0) (L) (AY 2) (Z))) "fictionalized" (((F) (IH 1) (K) (SH) (AH 0) (N) (AH 0) (L) (AY 2) (Z) (D))) "fictions" (((F) (IH 1) (K) (SH) (AH 0) (N) (Z))) "fictitious" (((F) (IH 0) (K) (T) (IH 1) (SH) (AH 0) (S))) "fictitious_character" (((F) (IH 0) (K) (T) (IH 1) (SH) (AH 0) (S) (K) (EH 1) (R) (IH 0) (K) (T) (ER 0))) "fictitious_name" (((F) (IH 0) (K) (T) (IH 1) (SH) (AH 0) (S) (N) (EY 1) (M))) "fictitious_place" (((F) (IH 0) (K) (T) (IH 1) (SH) (AH 0) (S) (P) (L) (EY 1) (S))) "fictitiously" (((F) (IH 0) (K) (T) (IH 1) (SH) (AH 0) (S) (L) (AY 1))) "ficus" (((F) (IY 1) (S) (IY 1) (Y) (UW 1) (EH 1) (S)) ((F) (IY 1) (K) (AH 1) (S)) ((F) (AY 1) (S) (IY 1) (Y) (UW 1) (EH 1) (S)) ((F) (AY 1) (K) (AH 1) (S))) (IH 1) = (<sound> <stress>)
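Since each phoneme is written as `(<sound>)` for a consonant or `(<sound> <stress>)` for a vowel, syllable counts and stress patterns fall out of the representation directly. The following is a minimal sketch; the function names are illustrative and are not taken from InkWell.

```lisp
;; One pronunciation in the format shown above.
(defparameter *fictional*
  '((F) (IH 1) (K) (SH) (AH 0) (N) (AH 0) (L)))

(defun vowel-phoneme-p (phoneme)
  "True when PHONEME carries a stress number, i.e. it is a vowel."
  (not (null (second phoneme))))

(defun syllable-count (pronunciation)
  "Count syllables as the number of vowel phonemes."
  (count-if #'vowel-phoneme-p pronunciation))

(defun stress-pattern (pronunciation)
  "Return the stress numbers of the vowels, front to back."
  (loop for phoneme in pronunciation
        when (vowel-phoneme-p phoneme)
          collect (second phoneme)))
```

For "fictional" this gives three syllables with stress pattern (1 0 0).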
Algorithmic Rhyming • rhyme computation for words and phrases • how much do vowels and consonants contribute to rhyme? • how much do early syllables in a word or phrase contribute to rhyminess?
Vowels Rhyme This Much
0.5: [AA, AO] [AE, AH]
0.4: [AA, AW]
0.35: [AH, EH] [AH, AO]
0.3: [AA, AH] [AE, EH] [AA, AE]
0.25: [AH, IH] [AH, UH]
0.2: [AH, AW] [AA, EH] [AY, EY] [OW, UW] [AE, IH] [EH, IH] [IH, UH]
0.15: [UH, UW] [AA, IH] [AO, UH] [AA, OW]
0.13: [AH, OW]
0.1: [AO, AW] [EH, UW] [AE, AO]
Consonants Rhyme This Much
0.97: [G, K]
0.95: [CH, JH] [F, V] [DH, TH]
0.9: [SH, ZH] [B, P] [S, Z]
0.7: [N, NG]
0.6: [G, T] [D, P] [D, T] [B, K] [B, G] [K, P] [K, T] [P, T] [D, K] [D, G] [B, D] [B, T] [G, P]
0.4: [M, N] [M, NG] [L, R]
0.2: [F, S] [F, TH] [S, TH] [SH, TH] [TH, ZH] [DH, ZH] [V, ZH] [TH, Z] [F, ZH] [DH, Z] [DH, SH] [TH, V] [S, ZH] [DH, V] [V, Z] [F, Z] [DH, F] [F, SH] [DH, S] [SH, Z] [S, SH] [S, V] [SH, V] [Z, ZH]
0.1: [W, Y]
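A pairwise table like this is naturally read through an order-independent lookup. The sketch below copies a handful of entries from the consonant table; scoring identical sounds as 1.0 and unlisted pairs as 0.0 is an assumption, though it agrees with the L/L and N/K lines in the worked "fictional / critical" example. The names are hypothetical.

```lisp
;; A few entries copied from the consonant table; the rest are omitted.
(defparameter *consonant-rhyminess*
  '(((G K) . 0.97) ((CH JH) . 0.95) ((B P) . 0.9) ((M N) . 0.4) ((W Y) . 0.1)))

(defun phoneme-rhyminess (a b &optional (table *consonant-rhyminess*))
  "How much sounds A and B rhyme, regardless of argument order.
Identical sounds score 1.0; pairs missing from TABLE score 0.0 (assumed)."
  (if (eq a b)
      1.0
      (let ((entry (find-if (lambda (pair)
                              (or (equal (car pair) (list a b))
                                  (equal (car pair) (list b a))))
                            table)))
        (if entry (cdr entry) 0.0))))
```

A linear search is fine for a table this small; a hash table keyed on a sorted pair would be the obvious alternative for a hot path.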
fictional / critical • get phonetic spelling for each word • scan from back to front • as you scan left, reduce the strength of possible rhymes - syllables near the front of words count less
fictional / critical • compare rhyminess phoneme by phoneme - vowel/vowel: lookup rhyminess, count similarity of stresses - consonant/consonant: lookup rhyminess - vowel/consonant: realign and compute offset distance • when one word ends, compute how many phonemes are left in the other word
fictional / critical • compute a complicated expression that takes into account - how many opportunities there were to compare sounds (phoneme alignments) - how much those opportunities sounded alike - how well the stresses lined up - how many offsets were required to align the words - how much final consonants sounded alike (ends of words)
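The back-to-front scan with front-of-word discounting can be sketched in a much reduced form. This version pairs phonemes strictly by position from the end and does no realignment, no stress comparison, and no final combination step, so it is not InkWell's algorithm; the decay factor of 0.9 per step and the all-or-nothing similarity function are assumptions made for illustration.

```lisp
(defun sound (phoneme)
  "The sound symbol of a phoneme such as (IH 1) or (K)."
  (first phoneme))

(defun toy-similarity (a b)
  "Stand-in for the rhyminess tables: 1.0 for identical sounds, else 0.0."
  (if (eq (sound a) (sound b)) 1.0 0.0))

(defun discounted-overlap (pron1 pron2
                           &key (decay 0.9) (similarity #'toy-similarity))
  "Scan both pronunciations from back to front, pairing phonemes by position,
and sum their similarities with a weight that shrinks by DECAY at each step
toward the front. Returns the sum and the number of leftover phonemes."
  (let ((total 0.0)
        (weight 1.0))
    (loop for a in (reverse pron1)
          for b in (reverse pron2)
          do (incf total (* weight (funcall similarity a b)))
             (setf weight (* weight decay)))
    (values total (abs (- (length pron1) (length pron2))))))
```

For "fictional" and "critical" this simplified scan only credits the final L and the (AH 0) before it, giving about 1.9 with no leftover phonemes; the realignment step described on the slides is what lets the real system also credit the stressed (IH 1) vowels.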
Rhyme score (the formula is garbled in this copy; the recoverable pieces are the factor M, the constant .98 together with |F|, the terms 2V and 2S, the sums (p1 + p2) and (y1 + y2), and the constant 1.1)
• M is the rhyminess of the last consonant
• F is the number of offsets needed to align the words
• V is the total amount of overlap (phonemes that lined up)
• p_i is the number of phonemes in word i
• S is the stress similarity
• y_i is the number of syllables in word i
((F) (IH 1) (K) (SH) (AH 0) (N) (AH 0) (L))
((K) (R) (IH 1) (T) (IH 0) (K) (AH 0) (L))
• L/L: similarity 1.0
• (AH 0)/(AH 0): similarity 1.0, discounted to 0.9
• N/K: similarity 0.0
• (AH 0)/(IH 0): similarity 0.25, discounted to 0.18
• K-SH/T: similarity 0.12, discounted to 0.08
• (IH 1)/(IH 1): similarity 1.0, discounted to 0.6
• F/K-R: similarity 0.0
((F) (IH 1) (K) (SH) (AH 0) (N) (AH 0) (L))
((K) (R) (IH 1) (T) (IH 0) (K) (AH 0) (L))
• offsets: 2
• comparable phonemes w/discounting: 2.75
• stress similarity: 3.0
• leftover phonemes: 0
• result = 50%
Rhyme Cache
"$boatspot: 0.5924662
"$chipwhip: 0.67664785
"'draw_upfuckup: 0.6016158
"&bundletrouble: 0.7353128
"%brookprick: 0.7495342
"$tripskip: 0.6957108
"%stompslip: 0.5133457
• clever representation to speed up hashing
Echoes • counts the number of common sounds, weighted by their sonic similarity • takes into account similar ordering of common sounds: how close do the words come to rhyming? • echo scores are larger than rhyme scores (because of ordering in rhymes); echo and rhyme scores are not commensurable • “fictional” and “critical” echo this much: 70%
Echo Cache "$paleplague: 0.9090182 "$sortwrought: 0.9025195 ")propheticdecrepit: 0.91340137 "$deemmeat: 0.9029019 "+neighboringtapering: 0.9001467 "%wholehollow: 0.9090182 "(skeletalselect: 0.9000825 "'presentpristine: 0.9177177 "'load_uphold_up: 0.94193936 "'destinediagnose: 0.90495605 "'presentresident: 0.92621255
Best Rhymes in King James for “programming” • protesting (64%) • the crackling (54%) • no reckoning (54%)
Notable Echoes of “computer programmer” • provoked me to anger (King James Bible) • most persecuted by man (Origin of Species, Darwin) • compound for grey amber (Moby Dick, Melville) • come from yond poor girl (Sonnets & Plays, Shakespeare) • program a computer (Patterns of Software, rpg)
What is an N-Gram? are huge and 0.14455404 one copy of 0.380291 sooner rather than 0.3726525 was wrong and 0.32407436 be candid with 0.12155247 decided to sleep 0.12155247 had appeared as 0.20447502 up the remnants 0.11795887 its opinion the 0.121112496 answer is absolutely 0.1 which are irrelevant 0.122132495 of bare ground 0.10501691
n-Grams "$!\"!iamajournalist": #(("$!\"!iamajournalist" . 0.146445)) "$!\"!iamalawyer": #(("$!\"!iamalawyer" . 0.166102)) "$!\"!iamaliber": #(("$!\"!iamaliberal" . 0.107989)) "$!\"!iamalifelong": #(("$!\"!iamalifelong" . 0.122386)) "$!\"!iamamarri": #(("$!\"!iamamarried" . 0.104837)) "$!\"!iamaperson": #(("$!\"!iamaperson" . 0.238564))
Counting 2-Grams [Most of] [of the] [the big] [big shore] [shore places] [places were] [were closed] [closed now] [now and] [and there] [there were] [were hardly] [hardly any] [any lights] [lights except] [except the] [the shadowy] [shadowy moving] [moving glow] [glow of] [of a] [a ferryboat] [ferryboat across] [across the] [the Sound] [Sound And] [And as] [as the] [the moon] [moon rose] [rose higher] [higher the] [the inessential] [inessential houses] [houses began] [began to] [to melt] [melt away] [away until] [until gradually] [gradually I] [I became] [became aware] [aware of] [of the] [the old] [old island] [island here] [here that] [that flowered] [flowered once] [once for] [for Dutch] [Dutch sailors] [sailors eyes] [eyes a] [a fresh] [fresh green] [green breast] [breast of] [of the] [the new] [new world] 53/63
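Splitting a word list into 2-grams and tallying them takes only a sliding window and a hash table. This is a generic sketch with illustrative names; it does not reproduce InkWell's stemming, scoring, or packed-string keys.

```lisp
(defun n-grams (words n)
  "Return every run of N consecutive WORDS, in order."
  (loop for tail on words
        while (nthcdr (1- n) tail)   ; stop when fewer than N words remain
        collect (subseq tail 0 n)))

(defun count-n-grams (words n)
  "Return a hash table mapping each n-gram (a list of strings) to its count."
  (let ((counts (make-hash-table :test #'equal)))
    (dolist (gram (n-grams words n) counts)
      (incf (gethash gram counts 0)))))
```

For the words "of the moon of the" this yields four 2-grams, three of them distinct, with [of the] counted twice.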
I (<choose> verb-emotion :+sense [like favor love]) (or (<choose> adj :+sense [happy]) (<choose> adj :+sense [sad]) (<choose> adj :+sense [angry])) (<choose> noun-animal pl :+sense [dog wolf]).
I hero-worship easygoing working dogs.
High Conscientiousness: 30.67%
High Music, Echoes: 300
Unusual wording: Common Word Bonus: -200.0 • 2-Gram Bonus: -100.0 • 3-Gram Bonus: -100.0 • 4-Gram Bonus: -100.0 • 5-Gram Bonus: -100.0
Final Word Adjustments • tenses, plurals, possessives, comparatives, superlatives • words and phrases • lots of code • large cache
Adjust Word Cache ("hard_to_please" ADJ (-ER)): "harder to please" ("hard_to_please" ADJ (-EST)): "hardest to please" ("hard_to_please" ADJ (CAP)): "Hard to please" ("hard_to_please" ADJ NIL): "hard to please"
(RHYME? (WORD-ADJUSTED-WORDS (SENTENCE-WORD WA-WORD 128)) (WORD-ADJUSTED-WORDS (SENTENCE-WORD WA-WORD 137)))) 100.0) (* (MAX (RHYME? (WORD (SENTENCE-WORD WA-WORD 128)) (WORD (SENTENCE-WORD WA-WORD 119))) (RHYME? (WORD-ADJUSTED-WORDS (SENTENCE-WORD WA-WORD 128)) (WORD-ADJUSTED-WORDS (SENTENCE-WORD WA-WORD 119)))) 100.0) (* (MAX (RHYME? (WORD (SENTENCE-WORD WA-WORD 99)) (WORD (SENTENCE-WORD WA-WORD 75))) (RHYME? (WORD-ADJUSTED-WORDS (SENTENCE-WORD WA-WORD 99)) (WORD-ADJUSTED-WORDS (SENTENCE-WORD WA-WORD 75)))) 1000) (* (MAX (RHYME? (WORD (SENTENCE-WORD WA-WORD 75)) (WORD (SENTENCE-WORD WA-WORD 83))) (RHYME? (WORD-ADJUSTED-WORDS (SENTENCE-WORD WA-WORD 75)) (WORD-ADJUSTED-WORDS (SENTENCE-WORD WA-WORD 83)))) 100.0) (* (MAX (RHYME? (WORD (SENTENCE-WORD WA-WORD 65)) (WORD (SENTENCE-WORD WA-WORD 44))) (RHYME? (WORD-ADJUSTED-WORDS (SENTENCE-WORD WA-WORD 65)) (WORD-ADJUSTED-WORDS (SENTENCE-WORD WA-WORD 44)))) 1000) (* (MAX (RHYME? (WORD (SENTENCE-WORD WA-WORD 51)) (WORD (SENTENCE-WORD WA-WORD 44))) (RHYME? (WORD-ADJUSTED-WORDS (SENTENCE-WORD WA-WORD 51)) (WORD-ADJUSTED-WORDS (SENTENCE-WORD WA-WORD 44)))) 100.0) (* (MAX (RHYME? (WORD (SENTENCE-WORD WA-WORD 34)) (WORD (SENTENCE-WORD WA-WORD 7))) (RHYME? (WORD-ADJUSTED-WORDS (SENTENCE-WORD WA-WORD 34)) (WORD-ADJUSTED-WORDS (SENTENCE-WORD WA-WORD 7)))) 1000) (* (MAX (RHYME? (WORD (SENTENCE-WORD WA-WORD 25)) (WORD (SENTENCE-WORD WA-WORD 51))) (RHYME? (WORD-ADJUSTED-WORDS (SENTENCE-WORD WA-WORD 25)) (WORD-ADJUSTED-WORDS (SENTENCE-WORD WA-WORD 51)))) 100.0))) (- 200.0 (+ (IF (STRING-DIFFERENT (WORD (SENTENCE-WORD WA-WORD 128)) (WORD (SENTENCE-WORD WA-WORD 137))) 50.0 0.0) (IF (STRING-DIFFERENT (WORD (SENTENCE-WORD WA-WORD 99)) (WORD (SENTENCE-WORD WA-WORD 75))) 50.0 0.0) (IF (STRING-DIFFERENT (WORD (SENTENCE-WORD WA-WORD 65)) (WORD (SENTENCE-WORD WA-WORD 44))) 50.0 0.0) (IF (STRING-DIFFERENT (WORD (SENTENCE-WORD WA-WORD 34)) (WORD (SENTENCE-WORD WA-WORD 7))) 50.0 0.0))) (- 200.0 (+ (* (MAX (ECHO? 
(WORD (SENTENCE-WORD WA-WORD 75)) (WORD (SENTENCE-WORD WA-WORD 44))) (ECHO? (WORD-ADJUSTED-WORDS (SENTENCE-WORD WA-WORD 75)) (WORD-ADJUSTED-WORDS (SENTENCE-WORD WA-WORD 44)))) 100.0) (* (MAX (ECHO? (WORD (SENTENCE-WORD WA-WORD 44)) (WORD (SENTENCE-WORD WA-WORD 7))) (ECHO? (WORD-ADJUSTED-WORDS (SENTENCE-WORD WA-WORD 44)) (WORD-ADJUSTED-WORDS (SENTENCE-WORD WA-WORD 7)))) 100.0))) (- 100.0 (+ (* (ALL-ECHO? (STRING-APPEND (WORD (SENTENCE-WORD WA-WORD 1)) (WORD (SENTENCE-WORD WA-WORD 5)) (WORD (SENTENCE-WORD WA-WORD 7)) (WORD (SENTENCE-WORD WA-WORD 11)) (WORD (SENTENCE-WORD WA-WORD 15)) (WORD (SENTENCE-WORD WA-WORD 24)) (WORD (SENTENCE-WORD WA-WORD 25)) (WORD (SENTENCE-WORD WA-WORD 28)) (WORD (SENTENCE-WORD WA-WORD 30)) (WORD (SENTENCE-WORD WA-WORD 31)) (WORD (SENTENCE-WORD WA-WORD 34)) (WORD (SENTENCE-WORD WA-WORD 39)) (WORD (SENTENCE-WORD WA-WORD 40)) (WORD (SENTENCE-WORD WA-WORD 42)) (WORD (SENTENCE-WORD WA-WORD 47)) (WORD (SENTENCE-WORD WA-WORD 50)) (WORD (SENTENCE-WORD WA-WORD 51)) (WORD (SENTENCE-WORD WA-WORD 55)) (WORD (SENTENCE-WORD WA-WORD 57)) (WORD (SENTENCE-WORD WA-WORD 58)) (WORD (SENTENCE-WORD WA-WORD 61)) (WORD (SENTENCE-WORD WA-WORD 62)) (WORD (SENTENCE-WORD WA-WORD 72)) (WORD (SENTENCE-WORD WA-WORD 73)) (WORD (SENTENCE-WORD WA-WORD 75)) (WORD (SENTENCE-WORD WA-WORD 78)) (WORD (SENTENCE-WORD WA-WORD 83)) (WORD (SENTENCE-WORD WA-WORD 89)) (WORD (SENTENCE-WORD WA-WORD 92)) (WORD (SENTENCE-WORD WA-WORD 95))
For a Very Short Run • optimization function invoked 200,000 times • algorithmic rhyme invoked 3,200,000 times • algorithmic echo invoked 181,400,000 times • n-gram computation 400,000 times • personality computation 1,000,000 times • all-different 200,000 times
Performance? • Parallel word and phrase choice • Precompute lots of stuff • Parallel simulated annealing • Clever representations • Algorithm hacking • Big caches & lots of them • Minimize synchronization
Parallel Word and Phrase Choice • Obvious • Includes parallel computation of wording alternatives and static constraints
Parallel Simulated Annealing • Run n independent copies of the SA engine • Collect best from each & combine • Not proven “correct,” but is effective
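The "n independent copies, keep the best" idea can be sketched as follows. Standard Common Lisp has no portable threads, so this version runs the copies one after another; a threading library would be needed to run them in parallel. The cooling schedule, step count, and the choice to simply keep the single best result (rather than combine results) are all assumptions for illustration.

```lisp
(defun accept-p (delta temp)
  "Always accept improvements; accept a worse move with probability e^(-delta/temp)."
  (or (<= delta 0)
      (let ((ratio (/ delta temp)))
        ;; Past this ratio the probability is negligible; skipping EXP also
        ;; avoids floating-point underflow on some implementations.
        (and (< ratio 50)
             (< (random 1.0) (exp (- ratio)))))))

(defun anneal (energy neighbor start
               &key (steps 2000) (start-temp 10.0) (cooling 0.995))
  "One annealing run. Returns the best state seen and its energy."
  (let* ((current start)
         (current-e (funcall energy current))
         (best current)
         (best-e current-e)
         (temp start-temp))
    (loop repeat steps
          do (let* ((candidate (funcall neighbor current))
                    (candidate-e (funcall energy candidate)))
               (when (accept-p (- candidate-e current-e) temp)
                 (setf current candidate
                       current-e candidate-e))
               (when (< current-e best-e)
                 (setf best current
                       best-e current-e))
               (setf temp (* temp cooling))))
    (values best best-e)))

(defun best-of-independent-runs (n energy neighbor start)
  "Run N independent annealing copies (sequentially here) and keep the best."
  (let ((best start)
        (best-e (funcall energy start)))
    (loop repeat n
          do (multiple-value-bind (state e) (anneal energy neighbor start)
               (when (< e best-e)
                 (setf best state
                       best-e e))))
    (values best best-e)))
```

Because each run tracks its own best-so-far state, the combined result can never be worse than the starting point, which is the only guarantee this sketch offers.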
Clever Representations • 2-gram representation (e.g.): a string laid out as char1 char2 word1 word2 • char1: number of words • char2: length of word1 • use a string= hash table indexed by stem(word1)·stem(word2)
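One way to read this layout, consistent with the sample keys on the n-Grams slide, is that each count is stored as a single printable character. The sketch below encodes a count as `(code-char (+ 32 count))`; that offset of 32 is inferred from keys such as `"$!\"!iamajournalist"` and is not stated in the source, and the function name is hypothetical.

```lisp
(defun n-gram-key (words)
  "Pack WORDS into one string: a character for the word count, one character
for the length of each word except the last, then the words run together.
Counts are stored as (code-char (+ 32 count)); the offset is an inference."
  (flet ((count-char (n) (code-char (+ 32 n))))
    (with-output-to-string (out)
      (write-char (count-char (length words)) out)
      (dolist (w (butlast words))
        (write-char (count-char (length w)) out))
      (dolist (w words)
        (write-string w out)))))
```

Packed keys like this can live in an ordinary `equal` hash table, which compares strings the same way `string=` does, so a lookup costs one string hash instead of a walk over a list of words.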
Caches • *adjustword-cache* • *all-different-cache* • *all-echo-cache* • *all-rhyme-cache* • *dynamic-2-gram-cache* • *dynamic-3-gram-cache* • *dynamic-4-gram-cache* • *dynamic-5-gram-cache* • *dynamic-writer-2-gram-cache* • *dynamic-writer-3-gram-cache* • *dynamic-writer-4-gram-cache* • *dynamic-writer-5-gram-cache* • *echo-cache* • *marshall-alternatives-cache* • *morphy-cache* • *rhyme-cache* • *rhyme-group-cache*
For rhymes, steady state is 12x improvement
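Each of these caches follows the same compute-once, look-up-afterwards pattern. A minimal sketch of that pattern is below; the variable and function names are invented, the "slow" function is a meaningless placeholder for an expensive computation, and the key format is not InkWell's packed representation.

```lisp
(defvar *toy-rhyme-cache* (make-hash-table :test #'equal)
  "Maps a \"word1 word2\" key to a previously computed score.")

(defvar *slow-calls* 0
  "Counts how often the expensive function actually ran.")

(defun slow-rhyme-score (word1 word2)
  "Placeholder for an expensive computation; the score itself is meaningless."
  (incf *slow-calls*)
  (/ (float (min (length word1) (length word2)))
     (max 1 (length word1) (length word2))))

(defun cached-rhyme-score (word1 word2)
  "Return the score for the pair, computing it only on a cache miss."
  (let ((key (concatenate 'string word1 " " word2)))
    (multiple-value-bind (score found) (gethash key *toy-rhyme-cache*)
      (if found
          score
          (setf (gethash key *toy-rhyme-cache*)
                (slow-rhyme-score word1 word2))))))
```

Once the cache is warm, repeated queries for the same pair never reach the expensive function, which is the steady state the slide refers to.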
Minimize Synchronization • Each SA process fills common caches • Hash tables are atomic, so unsynchronized access might be nutty at times but not “wrong”—the steady state is all-reads
Compilers • Template -> minimization function + constraints -> Lisp code • string patterns -> Lisp code • Text -> writer personalities + language models (n-grams) • LIWC phrases -> Lisp code • constructed hash tables -> fast loadable hash tables • touchup transformations -> Lisp code
Senses + Semantic Types I like (<choose> noun-animal pl :+sense [dog wolf]). • semantic type: noun-animal • sense: [dog wolf]
Semantic Types • help select synonym sets (synsets) • used in synonym spreading • when absent, heuristics are used to guess the sense of a word
WordNet
dog, NOUN
Senses:
1: domestic_dog, NOUN-ANIMAL: a member of the genus Canis…
2: dog, NOUN-PERSON: a dull unattractive unpleasant girl or woman…
3: dog, NOUN-PERSON: informal term for a man; "you lucky dog"
4: cad, NOUN-PERSON: someone who is morally reprehensible; "you dirty dog"
5: wiener, NOUN-FOOD: a smooth-textured sausage of minced beef or pork…
6: detent, NOUN-ARTIFACT: a hinged catch that fits into a notch…
7: dog, NOUN-ARTIFACT: metal supports for logs in a fireplace…
No semantic help: I like (dog noun pl) • I like detents • I like franks • I like clicks • I like cads • I like wieners • I like heels • I like hounds • I like frankfurters • I like weenies • I like wienerwursts • I like hotdogs • I like bounders • I like blackguards • I like firedogs • I like frumps • I like andirons • I like dog-irons • I like pawls • I like domestic dogs • I like hot dogs
Semantic type: I like (dog noun-animal pl) • I like curs • I like animals • I like Pekes • I like toys • I like beasts • I like creatures • I like puppies • I like wolves • I like foxes • I like barkers • I like Newfoundlands • I like canines • I like brutes • I like strays • I like corgis • I like lapdogs • I like doggies
Senses • data structure consisting of words + relevances, based on decaying synonym-network spreading • can be used to compute the semantic similarity of words (similarity of meaning), kind of like a cosine distance • like a magnet passed over all the words InkWell knows: it lifts the words attracted to it to various heights
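If a sense is treated as a sparse vector of word relevances, the cosine comparison mentioned here can be sketched directly. The slide only says the measure is "kind of like" a cosine distance, so plain cosine similarity is used below as a stand-in; the alist representation and the names are assumptions.

```lisp
(defun sense-dot (a b)
  "Sum of relevance products over the words present in both senses.
Each sense is an alist of (word . relevance)."
  (loop for (word . relevance) in a
        for other = (cdr (assoc word b :test #'string=))
        when other
          sum (* relevance other)))

(defun sense-similarity (a b)
  "Cosine of the angle between two senses viewed as sparse word vectors."
  (let ((norms (* (sqrt (sense-dot a a)) (sqrt (sense-dot b b)))))
    (if (zerop norms)
        0.0
        (/ (sense-dot a b) norms))))
```

Two senses with the same words and relevances score about 1, and senses with no words in common score 0.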
Relevance 1: dog, wolf
Relevance 0.75: arctic_wolf, barker, basenji, belgian_griffon, bow-wow, brush_wolf, brussels_griffon, canis_familiaris, canis_latrans, canis_lupus, canis_lupus_tundrarum, canis_niger, canis_rufus, carriage_dog, coach_dog, corgi, coyote, cur, dalmatian, doggie, doggy, domestic_dog, gray_wolf, great_pyrenees, grey_wolf, griffon, hunting_dog, lapdog, leonberg, maned_wolf, mexican_hairless, mongrel, mutt, newfoundland, newfoundland_dog, pooch, poodle, poodle_dog, prairie_wolf, pug, pug-dog, puppy, red_wolf, spitz, timber_wolf, toy, toy_dog, welsh_corgi, white_wolf, wolf_cub, wolf_pup, working_dog
Relevance 0.56: coydog
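The relevances in this table (1, 0.75, 0.56) look like a decay of 0.75 applied once per hop away from the seed words. A sketch of such decaying spreading over a toy synonym network follows; the decay value is read off the table rather than stated in the source, and the network, hop limit, and names are hypothetical.

```lisp
(defparameter *toy-links*
  '(("dog" "puppy" "coyote")
    ("wolf" "coyote")
    ("coyote" "coydog")
    ("puppy")
    ("coydog"))
  "Hypothetical synonym network: each entry is (word neighbor...).")

(defun spread-sense (seeds links &key (decay 0.75) (max-hops 2))
  "Return a hash table word -> relevance. Seeds get 1; each hop away from a
seed multiplies relevance by DECAY; a word keeps its highest relevance."
  (let ((relevance (make-hash-table :test #'equal))
        (frontier seeds))
    (dolist (seed seeds)
      (setf (gethash seed relevance) 1))
    (loop repeat max-hops
          while frontier
          do (let ((next '()))
               (dolist (word frontier)
                 (let ((r (* decay (gethash word relevance))))
                   (dolist (neighbor (cdr (assoc word links :test #'string=)))
                     (when (> r (gethash neighbor relevance 0))
                       (setf (gethash neighbor relevance) r)
                       (push neighbor next)))))
               (setf frontier next)))
    relevance))
```

Seeding this toy network with "dog" and "wolf" gives "coyote" and "puppy" a relevance of 0.75 and "coydog", two hops out, 0.5625.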
I like (<choose> noun-animal pl :+sense [dog wolf])
I like dogs • I like toys • I like Newfoundlands • I like huskies • I like retrievers • I like hounds • I like cubs • I like puppies • I like pups • I like whelps • I like Canidaes • I like wolfhounds • I like Carnivoras • I like Canises • I like jackals • I like curs • I like Pekes • I like Leonbergs • I like coydogs • I like Pomeranians
toys: NOUN-ANIMAL: any of several breeds of very small dogs kept purely as pets
coydogs: NOUN-ANIMAL: offspring of a coyote and a dog
n-Grams (<choose> noun-animal pl :+sense [dog wolf]) (or is are) animals. n-grams select right verb
(<choose> noun-animal pl :+sense [dog wolf]) (or is are) animals.
Dogs are animals. Toys are animals. Huskies are animals. Hounds are animals. Cubs are animals. Pups are animals. Puppies are animals. Coyotes are animals.
(in-3-grams "dogs" "are" "animals"): 0.0
(in-3-grams "dogs" "is" "animals"): 0.0
(in-2-grams "dogs" "are"): 0.36829454
(in-2-grams "dogs" "is"): 0.23805899
(in-2-grams "are" "animals"): 0.27010748
(in-2-grams "is" "animals"): 0.13000134
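How 2-gram scores can settle the choice between "is" and "are" is easy to sketch with the four lookups shown on this slide. Summing the scores of adjacent pairs is an assumption made for illustration; the source does not say how InkWell combines n-gram scores, and the function names are hypothetical.

```lisp
;; 2-gram scores copied from the lookups on this slide.
(defparameter *two-gram-scores*
  '((("dogs" "are") . 0.36829454)
    (("dogs" "is") . 0.23805899)
    (("are" "animals") . 0.27010748)
    (("is" "animals") . 0.13000134)))

(defun two-gram-score (w1 w2)
  "Score for the pair W1 W2, or 0.0 when the pair is unknown."
  (or (cdr (assoc (list w1 w2) *two-gram-scores* :test #'equal)) 0.0))

(defun sentence-score (words)
  "Sum the scores of adjacent word pairs (an assumed way of combining them)."
  (loop for (w1 w2) on words
        while w2
        sum (two-gram-score w1 w2)))

(defun best-alternative (before alternatives after)
  "Pick the alternative that scores highest between BEFORE and AFTER."
  (let ((best nil)
        (best-score nil))
    (dolist (alt alternatives best)
      (let ((score (sentence-score (list before alt after))))
        (when (or (null best-score) (> score best-score))
          (setf best alt
                best-score score))))))
```

With these numbers "dogs are animals" outscores "dogs is animals" on both of its pairs, so "are" is selected.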
<null-word> (<choose> noun-animal pl :+sense [dog wolf]) (or is are) (or a an <null-word>) animals. n-grams eliminate stupid word
(<choose> noun-animal pl :+sense [dog wolf]) (or is are) (or a an <null-word>) animals.
Pups are animals. Dogs are animals. Coyotes are animals. Puppies are animals. Cubs are animals. Hounds are animals. Huskies are animals. Toys are animals. Wolves are animals. Foxes are animals.
Text                   n-gram score
---------------------------------
Dogs are a animals:    5.11
Dogs are an animals:   5.35
Dogs is a animals:     5.99
Dogs is an animals:    6.66
Dogs is animals:       7.89
Dogs are animals:      8.79
Halos The woods are lovely, dark, and deep The woods are bright, light, and high Happiness Halo Delighted, Ebullient, Ecstatic, Elated, Energetic, Enthusiastic, Euphoric, Excited, Exhilarated, Overjoyed, Thrilled, Tickled pink, Turned on, Vibrant, Zippy
The woods are lovely, dark, and deep The woods are hot, rough, and cold Anger Halo Affronted, Belligerent, Bitter, Burned up, Enraged, Fuming, Furious, Heated, Incensed, Infuriated, Intense, Outraged, Provoked, Seething, Storming, Truculent, Vengeful, Vindictive, Wild
Halos • Anger.halo • Caring.halo • Confusion.halo • Depression.halo • Fear.halo • Frost.halo • Happiness.halo • Hurt.halo • Inadequateness.halo • Loneliness.halo • Remorse.halo