Part-of-Speech Tagging Guidelines for the Penn Treebank Project
(3rd Revision, 2nd printing) [1]
Beatrice Santorini
June 1990

[1] Second printing (February 1995) updated and slightly reformatted by Robert MacIntyre. The text of this version appears to be the same as the first printing, but subtle differences may exist. The tags for proper noun and personal pronoun were altered in late 1992 in order to avoid conflicts with bracketing tags; this version reflects the new tag names.
Contents
1 Introduction
2 List of parts of speech with corresponding tag
3 List of tags with corresponding part of speech
4 Problematic cases
  4.1 Confusing parts of speech
  4.2 Specific words and collocations
5 General tagging conventions
  5.1 Part of speech and syntactic function
  5.2 Vertical slash convention
  5.3 Capitalized words
  5.4 Abbreviations
1 Introduction

This section addresses the linguistic issues that arise in connection with annotating texts by part of speech ("tagging"). Section 2 is an alphabetical list of the parts of speech encoded in the annotation system of the Penn Treebank Project, along with their corresponding abbreviations ("tags") and some information concerning their definition. This section allows you to find an unfamiliar tag by looking up a familiar part of speech. Section 3 recapitulates the information in Section 2, but this time the information is alphabetically ordered by tags. This is the section to consult in order to find out what an unfamiliar tag means.

Since the parts of speech are probably familiar to you from high school English, you should have little difficulty in assimilating the tags themselves. However, it is often quite difficult to decide which tag is appropriate in a particular context. The two sections 4.1 and 4.2 therefore include examples and guidelines on how to tag problematic cases. If you are uncertain about whether a given tag is correct or not, refer to these sections in order to ensure a consistently annotated text. Section 4.1 discusses parts of speech that are easily confused and gives guidelines on how to tag such cases, while Section 4.2 contains an alphabetical list of specific problematic words and collocations. Finally, Section 5 discusses some general tagging conventions.

One general rule, however, is so important that we state it here. Many texts are not models of good prose, and some contain outright errors and slips of the pen. Do not be tempted to correct a tag to what it would be if the text were correct; rather, it is the incorrect word that should be tagged correctly.

If you have questions that you do not find covered, be sure to let us know so that we can incorporate a discussion of them into updates of this guide.
2 List of parts of speech with corresponding tag

Adjective - JJ
Hyphenated compounds that are used as modifiers are tagged as adjectives (JJ).
EXAMPLES: happy-go-lucky/JJ one-of-a-kind/JJ run-of-the-mill/JJ
Ordinal numbers are tagged as adjectives (JJ), as are compounds of the form n-th X-est, like fourth-largest.

Adjective, comparative - JJR
Adjectives with the comparative ending -er and a comparative meaning are tagged JJR. More and less when used as adjectives, as in more or less mail, are also tagged as JJR. More and less can also be tagged as JJR when they occur by themselves; see the entries for these words in Section 4.2. Adjectives with a comparative meaning but without the comparative ending -er, like superior, should simply be tagged as JJ. Adjectives with the ending -er but without a strictly comparative meaning ("more X"), like further in further details, should also simply be tagged as JJ.

Adjective, superlative - JJS
Adjectives with the superlative ending -est (as well as worst) are tagged as JJS. Most and least when used as adjectives, as in the most or the least mail, are also tagged as JJS. Most and least can also be tagged as JJS when they occur by themselves; see the entries for these words in Section 4.2. Adjectives with a superlative meaning but without the superlative ending -est, like first, last or unsurpassed, should simply be tagged as JJ.
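The three adjective entries above can be read as a small decision procedure. The following Python sketch is only an illustration of that reading, not part of the Treebank tooling: the function name `adjective_tag` and the flag `genuine_degree_ending` are hypothetical, and the flag stands in for the annotator's judgement that an -er or -est ending is a real comparative or superlative ending with the matching meaning, which a string test alone cannot decide. The word is assumed to be used as an adjective already.

```python
# Hypothetical sketch of the JJ / JJR / JJS entries; not an official tagger.
# Assumption: `word` is already known to be used as an adjective.

COMPARATIVE_WORDS = {"more", "less"}            # tagged JJR when used as adjectives
SUPERLATIVE_WORDS = {"most", "least", "worst"}  # tagged JJS per the superlative entry


def adjective_tag(word, genuine_degree_ending=False):
    """Return JJ, JJR or JJS for an adjective.

    genuine_degree_ending is the annotator's judgement (an assumption of this
    sketch) that a final -er / -est is a true comparative / superlative ending
    with the matching meaning, e.g. False for "further" in "further details".
    """
    w = word.lower()
    # Hyphenated modifiers, including n-th X-est compounds, are plain JJ.
    if "-" in w:
        return "JJ"
    if w in COMPARATIVE_WORDS:
        return "JJR"
    if w in SUPERLATIVE_WORDS:
        return "JJS"
    if genuine_degree_ending and w.endswith("est"):
        return "JJS"
    if genuine_degree_ending and w.endswith("er"):
        return "JJR"
    # Comparative or superlative meaning without the ending (superior, first,
    # last, unsurpassed) and -er forms without comparative meaning fall here.
    return "JJ"


for w, flag in [("fourth-largest", False), ("further", False), ("worst", False)]:
    print(f"{w}/{adjective_tag(w, flag)}")
```

Passing the judgement in as a flag keeps the sketch honest about what the guidelines leave to the annotator; a real system would need a lexicon or context to supply it.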
Adverb - RB
This category includes most words that end in -ly as well as degree words like quite, too and very, posthead modifiers like enough and indeed (as in good enough, very well indeed), and negative markers like not, n't and never.

Adverb, comparative - RBR
Adverbs with the comparative ending -er but without a strictly comparative meaning, like later in We can always come by later, should simply be tagged as RB.

Adverb, superlative - RBS

Article - DT (see "Determiner")

Cardinal number - CD

Common noun, plural - NNS (see "Noun, plural")

Common noun, singular or mass - NN (see "Noun, singular or mass")

Comparative adjective - JJR (see "Adjective, comparative")

Comparative adverb - RBR (see "Adverb, comparative")

Conjunction, coordinating - CC (see "Coordinating conjunction")

Conjunction, subordinating - IN (see "Preposition or subordinating conjunction")

Coordinating conjunction - CC
This category includes and, but, nor, or, yet (as in Yet it's cheap, cheap yet good), as well as the mathematical operators plus, minus, less, times (in the sense of "multiplied by") and over (in the sense of "divided by"), when they are spelled out. For in the sense of "because" is a coordinating conjunction (CC) rather than a subordinating conjunction (IN).
EXAMPLE: He asked to be transferred, for/CC he was unhappy.
So in the sense of "so that," on the other hand, is a subordinating conjunction (IN).

Determiner - DT
This category includes the articles a(n), every, no and the, the indefinite determiners another, any and some, each, either (as in either way), neither (as in neither decision), that, these, this and those, and instances of all and both when they do not precede a determiner or possessive pronoun (as in all roads or both times).
(Instances of all or both that do precede a determiner or possessive pronoun are tagged as predeterminers (PDT).) Since any noun phrase can contain at most one determiner, the fact that such can occur together with a determiner (as in the only such case) means that it should be tagged as an adjective (JJ), unless it precedes a determiner, as in such a good time, in which case it is a predeterminer (PDT).
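The Determiner entry amounts to a choice among DT, PDT and JJ that depends on the following word. The Python sketch below is one possible reading of it, under stated assumptions: the helper name `tag_determiner_like` is hypothetical, the determiner list is copied from the entry, the possessive pronoun list is an assumed, non-exhaustive one, and the function covers only determiner-like uses of each word (it does not handle, for example, that as a subordinating conjunction).

```python
# Hypothetical sketch of the Determiner entry; not an official tagger.

# Determiners named in the entry (articles, indefinite determiners, etc.).
DETERMINERS = {
    "a", "an", "every", "no", "the", "another", "any", "some", "each",
    "either", "neither", "that", "these", "this", "those",
}
# Assumed, non-exhaustive list of possessive pronouns.
POSSESSIVES = {"my", "your", "his", "her", "its", "our", "their"}


def tag_determiner_like(word, next_word):
    """Return DT, PDT or JJ, or None if the entry does not cover `word`."""
    w = word.lower()
    nxt = next_word.lower() if next_word else ""
    if w in ("all", "both"):
        # Predeterminer only when a determiner or possessive pronoun follows.
        return "PDT" if nxt in DETERMINERS or nxt in POSSESSIVES else "DT"
    if w == "such":
        # "such a good time" -> PDT; "the only such case" -> JJ.
        return "PDT" if nxt in DETERMINERS else "JJ"
    if w in DETERMINERS:
        return "DT"
    return None


for pair in [("all", "roads"), ("all", "the"), ("such", "a"), ("such", "case")]:
    print(f"{pair[0]}/{tag_determiner_like(*pair)}")
```

Looking only at the next word is a simplification; it is enough for the examples in the entry but would miss cases where other material intervenes.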