
Sorting for WordClouds / Text Processing Data Visualization (PowerPoint presentation)



  1. + Sorting for WordClouds

  2. + Text Processing
     - Data Visualization Process
       - Acquire - Obtain the data from some source
       - Parse - Give the data some structure, clean up
       - Filter - Remove all but the data of interest
       - Mine - Use the data to derive interesting properties
       - Represent - Choose a visual representation
       - Refine - Improve to make it more visually engaging
       - Interact - Make it interactive
     - Text Visualization
       - Source = Document
       - Parse = Words
       - Filter = Word Set with counts
       - Mine = Get relevant words
       - Represent = Fonts/Placement
       - Refine/Interact

  3. + Acquire data: Source = Document
         // Sketch 7-1: Parsing an input text file
         String inputTextFile = "Obama.txt";
         String[] fileContents;
         fileContents = loadStrings(inputTextFile);
     - fileContents has the source!
     - What next?
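Processing's loadStrings() returns one String per line of a text file. The sketch below is a plain-Java approximation of that Acquire step. It assumes a UTF-8 file on the local filesystem and reads it with java.nio.file.Files.readAllLines. The class name Acquire and method name loadLines are hypothetical. Unlike loadStrings(), this version does not look in a sketch's data folder or load URLs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical plain-Java stand-in for Processing's loadStrings():
// returns one array element per line of a UTF-8 text file.
class Acquire {
    static String[] loadLines(String fileName) throws IOException {
        Path path = Paths.get(fileName);
        List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
        return lines.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        // "Obama.txt" is the file name used on the slide; it must exist locally.
        String[] fileContents = loadLines(args.length > 0 ? args[0] : "Obama.txt");
        System.out.println(fileContents.length + " lines read");
    }
}
```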

  4. + Parse
     - How do we turn fileContents into words?
     - Join the array into one long string:
           String rawText;
           rawText = join(fileContents, " ");
     - Make everything the same case:
           rawText = rawText.toLowerCase();
     - Remove symbols and split the string into words:
           String delimiters = " ,./?<>;:'\"[{]}\\|=+-_()*&^%$#@!~";
           String[] tokens;
           tokens = splitTokens(rawText, delimiters);
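The three Parse steps above can be sketched in plain Java as follows. String.join plays the role of join() and toLowerCase() is unchanged. java.util.StringTokenizer stands in for splitTokens(): it treats every character in the delimiter string as a separator and skips empty tokens. The class name Parse and method name tokenize are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Plain-Java sketch of the Parse step: join lines, lowercase, and split on
// any delimiter character (same delimiter set as the slide).
class Parse {
    static final String DELIMITERS = " ,./?<>;:'\"[{]}\\|=+-_()*&^%$#@!~";

    static String[] tokenize(String[] fileContents) {
        String rawText = String.join(" ", fileContents); // join lines with spaces
        rawText = rawText.toLowerCase();                  // normalize case
        StringTokenizer st = new StringTokenizer(rawText, DELIMITERS);
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {                      // empty tokens are skipped
            tokens.add(st.nextToken());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] lines = { "Yes, we can!", "Yes -- we CAN." };
        System.out.println(String.join("|", tokenize(lines))); // yes|we|can|yes|we|can
    }
}
```

Because the apostrophe is one of the delimiters, a contraction such as "don't" is split into "don" and "t".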

  5. + Display the words
     - Let's start by displaying all of the words:
           for (String t : tokens) {
             //textSize(15);
             if (random(100) > 40) { // more red than green
               fill(random(150, 250), 0, 0, 190); // make red
             } else {
               fill(0, random(150, 250), 0, 190); // make green
             }
             text(t, random(0, width - 50), random(20, height));
           } // for

  6. + Count the words (second way)
     - Use a HashMap (a dictionary from words → counts):
           HashMap<String, Integer> wordCountSet =
             new HashMap<String, Integer>();
     - To add a new word:
           wordCountSet.put(word, 1); // initial count is 1
     - To get the frequency of a word:
           Integer frequency =
             wordCountSet.get(word); // if null, then none
     - To update the frequency of a word:
           wordCountSet.put(word, frequency + 1);

  7. + Count the words (second way)
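One way to combine the three HashMap operations from the previous slide into a single counting loop is sketched below in plain Java. It is not the code shown in class. The class name CountWords is hypothetical. A null result from get() means the word has not been seen yet.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: build a word -> count map from an array of tokens, using the
// put/get pattern from the slide (null means "not seen yet").
class CountWords {
    static HashMap<String, Integer> countWords(String[] tokens) {
        HashMap<String, Integer> wordCountSet = new HashMap<String, Integer>();
        for (String word : tokens) {
            Integer frequency = wordCountSet.get(word);
            if (frequency == null) {
                wordCountSet.put(word, 1);             // first occurrence
            } else {
                wordCountSet.put(word, frequency + 1); // seen before
            }
        }
        return wordCountSet;
    }

    public static void main(String[] args) {
        String[] tokens = { "yes", "we", "can", "yes", "we", "can", "yes" };
        Map<String, Integer> counts = countWords(tokens);
        System.out.println(counts.get("yes") + " " + counts.get("we")); // 3 2
    }
}
```

In Java 8 and later, `wordCountSet.merge(word, 1, Integer::sum)` performs the same insert-or-increment in one call.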

  8. + Display the UNIQUE words
     - Instead of tokens, we want the keys of the HashMap: wordCountSet.keySet()
           for (String t : wordCountSet.keySet()) {
             //Let's change the text size based on the frequency
             //textSize(<what goes here?>);
             if (random(100) > 40) { // more red than green
               fill(random(150, 250), 0, 0, 190); // make red
             } else {
               fill(0, random(150, 250), 0, 190); // make green
             }
             text(t, random(0, width - 50), random(20, height));
           } // for

  9. + Display the UNIQUE words
     - Instead of tokens, we want the keys of the HashMap: wordCountSet.keySet()
           for (String t : wordCountSet.keySet()) {
             //Let's change the text size based on the frequency
             textSize(wordCountSet.get(t));
             if (random(100) > 40) { // more red than green
               fill(random(150, 250), 0, 0, 190); // make red
             } else {
               fill(0, random(150, 250), 0, 190); // make green
             }
             text(t, random(0, width - 50), random(20, height));
           } // for

  10. + Display the most frequent words
      - Lazy way
        - Check the frequencies of all words in the set and only display words above a threshold frequency.
        - First find the threshold (loop once).
        - Next use the threshold (loop a second time).
      - Systematic way
        - Sort the word set by frequency.
        - Only display the top N words.
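Both strategies can be sketched in plain Java over the word-count map. The slide does not say how the lazy threshold is chosen. This sketch assumes it is a fraction of the highest count, found in the first loop. The "systematic" version sorts the map entries by descending count with a comparator and keeps the first n. The class, method, and parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FrequentWords {
    // Lazy way: pass 1 finds the maximum count, pass 2 keeps words whose
    // count reaches a fraction of it. The 'fraction' rule is an assumption.
    static List<String> aboveThreshold(Map<String, Integer> counts, double fraction) {
        int maxCount = 0;
        for (int c : counts.values()) {                          // loop once: find threshold
            maxCount = Math.max(maxCount, c);
        }
        double threshold = fraction * maxCount;
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) { // loop again: use it
            if (e.getValue() >= threshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Systematic way: sort all entries by descending count, keep the first n.
    static List<String> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("yes", 5);
        counts.put("we", 3);
        counts.put("can", 1);
        System.out.println(topN(counts, 2));             // [yes, we]
        System.out.println(aboveThreshold(counts, 0.5)); // yes and we, in map order
    }
}
```

HashMap iteration order is unspecified, so words with equal counts may come out in any order.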

  11. + Code from class (max frequency)

  12. + Filter and size text using max frequency and map()
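Filtering and sizing by maximum frequency can be sketched in plain Java. The helper map() re-implements the linear re-mapping that Processing's map(value, start1, stop1, start2, stop2) performs: start2 + (stop2 - start2) * ((value - start1) / (stop1 - start1)). The cutoff minCount and the font-size range minSize..maxSize are hypothetical parameters. The sketch maps counts from 0 (rather than 1) up to the maximum, which avoids dividing by zero when every word occurs once.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SizeByFrequency {
    // Same linear re-mapping that Processing's map() performs.
    static float map(float value, float start1, float stop1, float start2, float stop2) {
        return start2 + (stop2 - start2) * ((value - start1) / (stop1 - start1));
    }

    // Drop words seen fewer than minCount times; size the rest between
    // minSize and maxSize in proportion to count / maxCount.
    static Map<String, Float> textSizes(Map<String, Integer> counts, int minCount,
                                        float minSize, float maxSize) {
        int maxCount = 0;
        for (int c : counts.values()) {        // first pass: max frequency
            maxCount = Math.max(maxCount, c);
        }
        Map<String, Float> sizes = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int c = e.getValue();
            if (c < minCount) continue;        // filter
            sizes.put(e.getKey(), map(c, 0, maxCount, minSize, maxSize));
        }
        return sizes;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("yes", 10);
        counts.put("we", 5);
        counts.put("can", 1);
        System.out.println(textSizes(counts, 2, 10f, 60f)); // {yes=60.0, we=35.0}
    }
}
```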

  13. + Sorting n Any process of arranging items in sequence n Build-in sort() n Works on arrays of simple types, i.e. int , float and String n float[] a = { 3.4, 3.6, 2, 0, 7.1 }; n a = sort(a); // sort all elements in place n String[] s = { "deer", "elephant", "bear", "aardvark", "cat" }; n s = sort(s, 3); // sort the first three elements n Convenient, but not very flexible

  14. + Sorting (implement your own)
      - Easy to code (but slow)
        - Selection Sort
        - Bubble Sort
        - Insertion Sort
      - Animations
        - https://www.cs.usfca.edu/~galles/visualization/ComparisonSort.html
        - http://www.sorting-algorithms.com/

  15. + Selection sort
      - Basic idea:
        - Step forward through the array one item at a time, starting with the first. If a smaller item exists later in the array than the item being stepped on, swap the smallest such item with it. Repeat until you've stepped on every item.
      - Implementation:
        - Nested loop
        - The outer loop marks the current item.
        - The inner loop finds the smallest item between the current item and the last item (inclusive); then the two items are swapped.
      - Time Complexity?
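A minimal Java sketch of the nested-loop structure described above follows; int[] is used for simplicity. The inner loop always scans the whole unsorted remainder, so selection sort makes about n²/2 comparisons regardless of the input order, which is O(n²). The class name SelectionSort is hypothetical.

```java
import java.util.Arrays;

class SelectionSort {
    // For each position i, find the smallest item in a[i..n-1]
    // and swap it into position i.
    static void selectionSort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {     // outer loop: current item
            int smallest = i;
            for (int j = i + 1; j < a.length; j++) { // inner loop: find smallest
                if (a[j] < a[smallest]) {
                    smallest = j;
                }
            }
            int tmp = a[i];                          // swap it into place
            a[i] = a[smallest];
            a[smallest] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] a = { 5, 2, 9, 1, 5, 6 };
        selectionSort(a);
        System.out.println(Arrays.toString(a)); // [1, 2, 5, 5, 6, 9]
    }
}
```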

  16. + Bubble sort
      - Basic idea:
        - Start with the first item in the array and compare adjacent items; if they are out of order, swap them. Move to the next item and repeat until you reach the end.
        - Repeat the above pass until the array is sorted.
      - Implementation:
        - Nested loop
        - The outer loop checks whether the array is sorted.
        - The inner loop compares and swaps.
      - Time Complexity?
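The structure above can be sketched in Java as shown below; the class name BubbleSort is hypothetical. The outer loop's "is it sorted?" check is a swapped flag: a pass with no swaps means the array is sorted. Each pass also moves the largest remaining item to the end, so the inner loop can stop one position earlier each time. The worst case is O(n²). An already-sorted array finishes after a single O(n) pass.

```java
import java.util.Arrays;

class BubbleSort {
    // Repeat passes that swap out-of-order neighbours until a full pass
    // makes no swaps.
    static void bubbleSort(int[] a) {
        boolean swapped = true;
        int end = a.length - 1;                 // last index still unsorted
        while (swapped) {                       // outer loop: sorted yet?
            swapped = false;
            for (int i = 0; i < end; i++) {     // inner loop: compare neighbours
                if (a[i] > a[i + 1]) {
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                    swapped = true;
                }
            }
            end--;                              // largest item has bubbled to the end
        }
    }

    public static void main(String[] args) {
        int[] a = { 5, 2, 9, 1, 5, 6 };
        bubbleSort(a);
        System.out.println(Arrays.toString(a)); // [1, 2, 5, 5, 6, 9]
    }
}
```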

  17. + Insertion Sort
      - Basic idea:
        - Start with a sorted subarray, and insert the next item from the unsorted part into its correct position in the sorted part.
        - When you reach the end of the unsorted part, you are done.
      - Implementation:
        - Nested loop
        - The outer loop gets the next item to insert.
        - The inner loop compares items and copies larger ones one position to the right to make space.
        - Insert the item into that space.
      - Time Complexity?
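A Java sketch of these steps follows; the class name InsertionSort is hypothetical. Everything left of index i is already sorted. The inner loop shifts larger items one slot right until it finds where the saved item belongs. In the worst case (reverse order) that is O(n²) work. On an already-sorted array the inner loop never runs, so the sort takes O(n).

```java
import java.util.Arrays;

class InsertionSort {
    // a[0..i-1] is the sorted part; insert a[i] into it on each pass.
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {   // outer loop: next item to insert
            int item = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > item) {    // inner loop: compare and copy right
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = item;                   // insert into the space
        }
    }

    public static void main(String[] args) {
        int[] a = { 5, 2, 9, 1, 5, 6 };
        insertionSort(a);
        System.out.println(Arrays.toString(a)); // [1, 2, 5, 5, 6, 9]
    }
}
```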
