GATE APIs
Track II, Module 6
Second GATE Training Course, May 2010

Outline

1 Using Java in JAPE
    Basic JAPE
    Java on the RHS
    Common idioms
2 The GATE Ontology API
    5 minute guide to ontologies
    Ontologies in GATE Embedded

JAPE: Pattern matching over annotations

JAPE is a language for doing regular-expression-style pattern matching over annotations rather than text.
Each JAPE rule consists of:
- a left hand side (LHS) specifying the patterns to match
- a right hand side (RHS) specifying what to do when a match is found
JAPE rules combine to create a phase.
Phases combine to create a grammar.

An Example JAPE Rule

    Rule: University1
    (
      {Token.string == "University"}
      {Token.string == "of"}
      {Lookup.minorType == city}
    ):orgName
    -->
    :orgName.Organisation =
      {kind = "university", rule = "University1"}

The left hand side specifies annotations to match, optionally labelling some of them for use on the right hand side.

LHS Patterns: Elements

The left hand side of the rule specifies the pattern to match, in various ways:
- Annotation type:
    {Token}
- Feature constraints:
    {Token.string == "University"}
    {Token.length > 4}
  Also supports <, <=, >=, != and the regular expression operators =~, ==~, !~, !=~.
- Negative constraints:
    {Token.length > 4, !Lookup.majorType == "stopword"}
  This matches a Token of more than 4 characters that does not start at the same location as a "stopword" Lookup.
- Overlap constraints:
    {Person within {Section.title == "authors"}}

LHS Patterns: Combinations

Pattern elements can be combined in various ways:
- Sequencing: {Token}{Token}
- Alternatives: {Token} | {Lookup}
- Grouping with parentheses
- Usual regular expression multiplicity operators:
    zero-or-one: ({MyAnnot})?
    zero-or-more: ({MyAnnot})*
    one-or-more: ({MyAnnot})+
    exactly n: ({MyAnnot})[n]
    between n and m (inclusive): ({MyAnnot})[n,m]

LHS Patterns: Labelling

Groups can be labelled. This has no effect on the matching process, but makes matched annotations available to the RHS.

    (
      {Token.string == "University"}
      {Token.string == "of"}
      ({Lookup.minorType == city}):uniTown
    ):orgName

RHS Actions

On the RHS, you can use the labels from the LHS to create new annotations:

    -->
    :uniTown.UniversityTown = {},
    :orgName.Organisation =
      {kind = "university", rule = "University1"}

The :label.AnnotationType = {features} syntax creates a new annotation of the given type whose span covers all the annotations bound to the label, so the Organisation annotation will span from the start of the “University” Token to the end of the Lookup.
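
The covering-span rule can be illustrated without GATE. The sketch below is plain Java, not the GATE API: the Span class and the character offsets (for the assumed text "University of Sheffield") are made up for illustration. It takes the smallest start and the largest end over the annotations bound to a label, which is the span the new annotation would receive.

```java
import java.util.List;

public class CoveringSpanDemo {
    /** Hypothetical stand-in for an annotation's start and end offsets. */
    static final class Span {
        final long start;
        final long end;

        Span(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Returns the smallest span that covers every span in the list. */
    static Span cover(List<Span> bound) {
        long start = Long.MAX_VALUE;
        long end = Long.MIN_VALUE;
        for (Span s : bound) {
            start = Math.min(start, s.start);
            end = Math.max(end, s.end);
        }
        return new Span(start, end);
    }

    public static void main(String[] args) {
        // Assumed offsets within the text "University of Sheffield"
        List<Span> orgName = List.of(
            new Span(0, 10),    // Token "University"
            new Span(11, 13),   // Token "of"
            new Span(14, 23));  // Lookup "Sheffield"
        Span organisation = cover(orgName);
        System.out.println(organisation.start + ".." + organisation.end); // prints 0..23
    }
}
```

The same calculation applied to the uniTown label would yield only the Lookup's own offsets, since that label binds a single annotation.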

Beyond Simple Actions

It’s often useful to do more complex operations on the RHS than simply adding annotations, e.g.
- Set a new feature on one of the matched annotations.
- Delete annotations from the input.
- More complex feature value mappings, e.g. concatenate several LHS features to make one RHS one.
- Collect statistics, e.g. count the number of matched annotations and store the count as a document feature.
- Populate an ontology (later).
JAPE has no special syntax for these operations, but allows blocks of arbitrary Java code on the RHS.

Java on the RHS

    Rule: HelloWorld
    (
      {Token.string == "Hello"}
      {Token.string == "World"}
    ):hello
    -->
    {
      System.out.println("Hello world");
    }

The RHS of a JAPE rule can have any number of :bind.Type = {} assignment expressions and blocks of Java code, separated by commas.

How JAPE Rules are Compiled

For each JAPE rule, GATE creates a Java class:

    package japeactionclasses;
    // various imports, see below

    public class /* generated class name */
        implements RhsAction {
      public void doit(
          Document doc,
          Map<String, AnnotationSet> bindings,
          AnnotationSet annotations, // deprecated
          AnnotationSet inputAS,
          AnnotationSet outputAS,
          Ontology ontology) throws JapeException {
        // ...
      }
    }

JAPE Action Classes

- Each block or assignment on the RHS becomes a block of Java code.
- These blocks are concatenated together to make the body of the doit method.
- Local variables are local to each block, not shared.
- At runtime, whenever the rule matches, doit is called.
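
Since each RHS element becomes its own block inside doit, ordinary Java block scoping applies. The stand-alone sketch below is plain Java with no GATE dependency, and its method body and variable names are invented for illustration. Two sibling blocks each declare a variable called count without clashing, and neither block can see the other's variable.

```java
public class BlockScopeDemo {
    /** Mimics a method body assembled from two concatenated blocks. */
    static int doit() {
        int total = 0;
        {
            // First block: count exists only until the closing brace
            int count = 2;
            total += count;
        }
        {
            // Second block: the earlier count is out of scope here,
            // so the same name can be declared again
            int count = 3;
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(doit()); // prints 5
    }
}
```

The practical consequence for JAPE is that a variable declared in one RHS block cannot be read from a later block of the same rule.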

Java Block Parameters

The parameters available to Java RHS blocks are:
- doc: The document currently being processed.
- inputAS: The AnnotationSet specified by the inputASName runtime parameter to the JAPE transducer PR. Read or delete annotations from here.
- outputAS: The AnnotationSet specified by the outputASName runtime parameter to the JAPE transducer PR. Create new annotations in here.
- ontology: The ontology (if any) provided as a runtime parameter to the JAPE transducer PR.
- bindings: The bindings map...

Bindings

bindings is a Map from String to AnnotationSet.
- Keys are labels from the LHS.
- Values are the annotations matched by the label.

    (
      {Token.string == "University"}
      {Token.string == "of"}
      ({Lookup.minorType == city}):uniTown
    ):orgName

- bindings.get("uniTown") contains one annotation (the Lookup)
- bindings.get("orgName") contains three annotations (two Tokens plus the Lookup)
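
To make the shape of the bindings map concrete, here is a plain-Java model of it. This is a sketch only: real bindings hold GATE AnnotationSet values, whereas the model uses sets of strings as hypothetical stand-ins for annotations, and "Sheffield" is an assumed example of a matched Lookup.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class BindingsModel {
    /** Builds a stand-in for the bindings produced by one match of the pattern. */
    static Map<String, Set<String>> match() {
        Map<String, Set<String>> bindings = new LinkedHashMap<>();
        // The outer label binds everything matched inside its parentheses
        bindings.put("orgName", new LinkedHashSet<>(Arrays.asList(
            "Token:University", "Token:of", "Lookup:Sheffield")));
        // The inner label binds only the Lookup
        bindings.put("uniTown", new LinkedHashSet<>(Arrays.asList(
            "Lookup:Sheffield")));
        return bindings;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Set<String>> entry : match().entrySet()) {
            System.out.println(entry.getKey() + " -> "
                + entry.getValue().size() + " annotation(s)");
        }
    }
}
```

Note that the Lookup appears under both labels: an annotation matched inside nested labelled groups is bound to every enclosing label.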

Hands-on exercises

- The easiest way to experiment with JAPE is to use GATE Developer.
- The hands-on directory contains a number of sample JAPE files for you to modify, which will be described for each individual exercise.
- There is an .xgapp file for each exercise to load the right PRs and documents.
- It is a good idea to disable session saving using Options → Configuration → Advanced (or GATE 5.2 → Preferences → Advanced on Mac OS X).

Exercise 1: A simple JAPE RHS

- Start GATE Developer.
- Load hands-on/jape/exercise1.xgapp
- This is the default ANNIE application with an additional JAPE transducer “exercise 1” at the end.
- This transducer loads the file hands-on/jape/resources/simple.jape, which contains a single simple JAPE rule.
- Modify the Java RHS block to print out the type and features of each annotation the rule matches.
- You need to right click the “Exercise 1 Transducer” and reinitialize after saving the .jape file.
- Test it by running the “Exercise 1” application.

Exercise 1: Solution

A possible solution:

    Rule: ListEntities
    ({Person}|{Organization}|{Location}):ent
    -->
    {
      AnnotationSet ents = bindings.get("ent");
      for (Annotation e : ents) {
        System.out.println("Found " + e.getType()
            + " annotation");
        System.out.println(" features: "
            + e.getFeatures());
      }
    }

Imports

By default, every action class imports java.io.*, java.util.*, gate.*, gate.jape.*, gate.creole.ontology.*, gate.annotation.*, and gate.util.*, so classes from these packages can be used unqualified in RHS blocks.
You can add additional imports by putting an import block at the top of the JAPE file, before the Phase: line:

    Imports: {
      import my.pkg.*;
      import static gate.Utils.*;
    }

You can import any class available in the GATE core or in any loaded plugin.