Announcements
• Visual Studio Express
  - Free hobbyist version of VS 2005
• Presentations, Tuesday Jan 23
  - 15 minute presentation + 3 minutes discussion
  - PowerPoint slides
  - Group order: A, B, C, D

Ink Analysis
Richard Anderson
CSE 481b
Winter 2007

Today's lecture
• Handwriting Recognition
• Structure Recognition
• Classification
• Annotation
• JNT Note Format

Ink Analysis for Search
• Output
  - Mapping of search results to source
  - Reflect underlying structure
• Handle different types of search queries
  - Raw text
  - Boolean
  - Typed queries ("481 as a course number")
  - Object queries (Course numbers)
  - Environment (List, Prose, Mathematics, ...)

Handwriting Recognition
• Identify the following words

Ink Analysis Pipeline
Filter → Structure → Recognize → Classify → Annotate
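To make the search goals above concrete, here is a minimal C# sketch (C# 9 or later, for records). It assumes a hypothetical `InkWord` record that stores a recognized word's alternates, best first, together with its page and stroke range. Matching a query term against every alternate rather than only the top string tolerates recognition errors, and returning the `InkWord` itself keeps the mapping from a search hit back to the source ink. `InkWord`, `InkSearch`, `Find`, and `PagesMatchingAll` are illustrative names, not part of the Tablet PC API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record: one recognized word, its recognizer alternates
// (best first), and where it came from in the source notes.
public record InkWord(int Page, int StrokeStart, int StrokeEnd, IReadOnlyList<string> Alternates);

public static class InkSearch
{
    // Words matching `term` in ANY alternate (case-insensitive), not just the
    // top one. Each hit still carries its page and stroke range, so a result
    // can be mapped back to the ink it came from.
    public static List<InkWord> Find(IEnumerable<InkWord> words, string term) =>
        words.Where(w => w.Alternates.Any(a =>
                string.Equals(a, term, StringComparison.OrdinalIgnoreCase)))
             .ToList();

    // Boolean AND query: pages on which every term matches at least one word.
    public static List<int> PagesMatchingAll(IReadOnlyList<InkWord> words, params string[] terms)
    {
        IEnumerable<int> pages = words.Select(w => w.Page).Distinct();
        foreach (string term in terms)
        {
            // Pages containing this term; intersect with the pages kept so far.
            var hits = new HashSet<int>(Find(words, term).Select(w => w.Page));
            pages = pages.Where(hits.Contains);
        }
        return pages.OrderBy(p => p).ToList();
    }
}
```

The AND query intersects, term by term, the sets of pages with a hit; OR and NOT would follow the same pattern using union and set difference.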
Recognizer Architecture
[Figure: recognizer pipeline with components Ink, Segments, TDNN, Output Matrix, Lexicon, and Beam Search, producing a Top 10 List of recognition results with scores: dog 68, clog 57, dug 51, doom 42, divvy 37, ooze 35, cloy 34, doxy 29, client 22, dozy 13]
Slide from Jay Pittman, Microsoft

Recognizer Training
• Collect large set of training data
  - Samples of known inputs that can be used to set "weights" in reco engine
• Needed to build a recognizer
  - Dictionary
  - Language samples
• Commercial recognizers based on massive data sets

Tablet PC Recognition API
• Basic idea:
  - Ink In, Text Out

Recognition Code I

    using System;
    using Microsoft.Ink;   // Tablet PC SDK

    private InkCollector inkCollector;   // collects ink drawn on inkPanel
    private Recognizers recognizers;     // recognizers installed on the machine
    private Recognizer recognizer;       // the one this form uses

    public Form1() {
        InitializeComponent();
        // Attach ink collection to the panel and start collecting
        this.inkCollector = new InkCollector(this.inkPanel.Handle);
        this.inkCollector.Enabled = true;
        // Use the system's default handwriting recognizer
        this.recognizers = new Recognizers();
        this.recognizer = recognizers.GetDefaultRecognizer();
    }

Recognition Code II

    private void OnRecoClick(object sender, EventArgs e) {
        // A context holds one recognition request: settings plus strokes
        using (RecognizerContext recoContext = this.recognizer.CreateRecognizerContext()) {
            recoContext.Factoid = GetFactoid();   // GetFactoid(): form helper, not shown
            recoContext.Strokes = this.inkCollector.Ink.Strokes;
            recoContext.EndInkInput();            // no more ink will be added

            RecognitionStatus recoStatus;
            RecognitionResult recoResult = recoContext.Recognize(out recoStatus);
            if (recoStatus != RecognitionStatus.NoError || recoResult == null)
                return;

            string result = recoResult.TopString;                   // best guess as text
            RecognitionAlternate topAlt = recoResult.TopAlternate;  // best guess with details
        }
    }
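The beam search stage in the architecture figure can be illustrated with a small C# sketch. It assumes a hypothetical output matrix `scores[t][c]` holding a log-probability for letter `'a' + c` at segment `t`, and it simplifies by treating each segment as exactly one character; the recognizer in the figure is more elaborate than this. The lexicon is used twice: its prefixes prune partial hypotheses that cannot become words, and only complete words are reported, ranked like the top 10 list. `LexiconBeamSearch.Search` is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LexiconBeamSearch
{
    // scores[t][c]: hypothetical log-probability that segment t is letter ('a' + c).
    // Simplifying assumption: each segment holds exactly one character.
    public static List<(string Word, double Score)> Search(
        double[][] scores, IEnumerable<string> lexicon, int beamWidth)
    {
        var words = new HashSet<string>(lexicon);

        // Every prefix of every lexicon word, so dead-end paths are pruned early.
        var prefixes = new HashSet<string>();
        foreach (string w in words)
            for (int i = 0; i <= w.Length; i++)
                prefixes.Add(w.Substring(0, i));

        var beam = new List<(string Word, double Score)> { ("", 0.0) };
        foreach (double[] column in scores)
        {
            var next = new List<(string Word, double Score)>();
            foreach (var (prefix, score) in beam)
                for (int c = 0; c < column.Length; c++)
                {
                    string extended = prefix + (char)('a' + c);
                    if (prefixes.Contains(extended))
                        next.Add((extended, score + column[c]));   // log-probs add
                }
            // Keep only the best `beamWidth` partial hypotheses.
            beam = next.OrderByDescending(h => h.Score).Take(beamWidth).ToList();
        }

        // Only complete lexicon words survive, best score first.
        return beam.Where(h => words.Contains(h.Word)).ToList();
    }
}
```

Keeping only `beamWidth` hypotheses per step trades accuracy for speed: a word whose prefix falls out of the beam early can never be recovered, even if its later letters score well.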
Factoids
• Bias the recognizer towards certain types of content
  - DEFAULT
  - CURRENCY
  - NUMBER
  - TELEPHONE
  - EMAIL
  - UPPERCHAR

Reading Journal Notes
• Journal Reader to import .JNT
  - .JNT -> XML -> Custom Format
• Journal format gives an initial parsing
  - You may want to undo this parsing and work with ink at the page level

JNT Format
• Journal Document
  - List of Journal Pages
• Journal Page
  - List of Content
• Content
  - Journal Drawing, Journal Paragraph, other stuff

Journal Drawing
• Uninterpreted Ink
  - Base64String

    // childNode: the current XmlNode while walking the Journal XML;
    // ink: an Ink variable declared by the enclosing code
    if (childNode.Name.Equals("inkobject", StringComparison.OrdinalIgnoreCase)) {
        // The ink is stored as a Base64 string; decode it and load the bytes
        string base64Ink = childNode.InnerText;
        ink = new Ink();
        ink.Load(Convert.FromBase64String(base64Ink));
    }

Text Structure
• JournalParagraph
  - List of JournalLines
• JournalLine
  - List of JournalInkWords
• JournalInkWord
  - Alternate List
  - Uninterpreted Ink

Shape recognition
• Surprisingly challenging because of drawing artifacts
  - Open figures
  - Multiple strokes
  - Imprecise corners
  - Arrows
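The text hierarchy above maps naturally onto nested types. The following C# sketch (C# 9 or later) uses hypothetical records named after the Journal elements; they are not the Journal Reader's actual classes, and the per-word uninterpreted ink is omitted. It renders a paragraph from each word's top alternate and also collects every alternate, which is what a search index would want.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory model mirroring the JNT text hierarchy:
// paragraph -> lines -> ink words, each word carrying its alternate list.
public record JournalInkWord(IReadOnlyList<string> Alternates);
public record JournalLine(IReadOnlyList<JournalInkWord> Words);
public record JournalParagraph(IReadOnlyList<JournalLine> Lines);

public static class JournalText
{
    // Render a paragraph using each word's top (first) alternate, one output
    // line per JournalLine; "?" stands in for a word with no alternates.
    public static string ToText(JournalParagraph paragraph) =>
        string.Join("\n", paragraph.Lines.Select(line =>
            string.Join(" ", line.Words.Select(w =>
                w.Alternates.Count > 0 ? w.Alternates[0] : "?"))));

    // Every distinct alternate in the paragraph, for indexing notes for search.
    public static ISet<string> AllCandidates(JournalParagraph paragraph) =>
        new HashSet<string>(paragraph.Lines
            .SelectMany(l => l.Words)
            .SelectMany(w => w.Alternates));
}
```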
Basic approach for structure recognition
• Grouping by rectangular region
• Heuristics for separating regions
  - White space
  - Separating lines

Structure recognition

General approach to recognition/classification
• Extraction of features
• Construct mapping from features to classes
  - Learning
  - Heuristic
    - Programmatically determine classification based on features

Clustering
• Objects become points in high-dimensional space
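The white-space heuristic for separating regions can be sketched in C# (C# 9 or later) as follows. It assumes each stroke has already been reduced to a hypothetical `Box` holding the vertical extent of its bounding box, and it starts a new horizontal region whenever the vertical gap exceeds a caller-chosen `minGap`. Splitting on separating lines, or on horizontal gaps within a band, would be additional passes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical vertical extent of a stroke's bounding box (Top <= Bottom).
public record Box(double Top, double Bottom);

public static class WhitespaceGrouping
{
    // Groups boxes into horizontal bands: a new region starts whenever the
    // white space between the current region's bottom and the next box's top
    // exceeds minGap.
    public static List<List<Box>> GroupByVerticalGap(IEnumerable<Box> boxes, double minGap)
    {
        var regions = new List<List<Box>>();
        double regionBottom = double.NegativeInfinity;

        foreach (Box b in boxes.OrderBy(x => x.Top))
        {
            if (regions.Count == 0 || b.Top - regionBottom > minGap)
                regions.Add(new List<Box>());

            regions[regions.Count - 1].Add(b);
            // Running maximum: a tall stroke extends its region downward.
            regionBottom = Math.Max(regionBottom, b.Bottom);
        }
        return regions;
    }
}
```

Because boxes are sorted by their top edge and the region's bottom is a running maximum, a tall stroke such as a bracket keeps everything beside it in the same region.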
Classification
• Identify different types of text
  - Mathematics
  - Prose
  - Lists
  - Brainstorming
  - Code
• Domains
  - Chemistry, Physics, Algorithms, ...

Annotation
• Identify annotation marks
  - Highlighted text
  - Circles
  - Check marks
  - Cross out
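The general feature-based approach can be applied to classifying text types such as Mathematics, Prose, and Lists. The C# sketch below is a nearest-centroid classifier, one simple way to learn a mapping from features to classes; it is not claimed to be the method used in the course. The feature vectors are hypothetical (for example, the fraction of recognized words found in a dictionary, the fraction of math symbols, and the fraction of lines that begin with a bullet mark), so each region becomes a point in that feature space.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NearestCentroid
{
    // "Learning": average the labeled feature vectors of each class.
    // Assumes every vector has the same length.
    public static Dictionary<string, double[]> Train(
        IEnumerable<(string Label, double[] Features)> examples) =>
        examples.GroupBy(e => e.Label)
                .ToDictionary(
                    g => g.Key,
                    g => Enumerable.Range(0, g.First().Features.Length)
                                   .Select(i => g.Average(e => e.Features[i]))
                                   .ToArray());

    // Classify a region as the class whose centroid is closest
    // (squared Euclidean distance) in feature space.
    public static string Classify(Dictionary<string, double[]> centroids, double[] features) =>
        centroids.OrderBy(kv => kv.Value.Zip(features, (a, b) => (a - b) * (a - b)).Sum())
                 .First().Key;
}
```

Replacing the learned centroids with hand-written rules over the same features would give the heuristic alternative listed under the general approach.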