Storing morphology information in a wiki - Radovan Garabík

Aug 26, 2023

Storing morphology information in a wiki
Radovan Garabík
Ľ. Štúr Institute of Linguistics
Panská 26, 813 64 Bratislava, Slovakia
e-mail: korpus@korpus.juls.savba.sk
www: http://korpus.juls.savba.sk

Morphology analysers
● different ways of describing morphological information
● Slavic languages: (prefix)+root+affix
● changes in the root, morphing of suffixes
● paradigm classes: common root (or lemma) modifications
● special treatment to either: reduce the number of paradigms, allow guessing of unknown words, or accommodate different linguistic premises
● partial paradigms
● our approach: no paradigms at all; for each word the paradigm is spelt out in full

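With every paradigm spelt out in full, analysing a wordform reduces to a dictionary lookup. The following is a minimal sketch of that idea, not the authors' implementation: the names `ENTRIES`, `build_index` and `analyse` are hypothetical, and the data is a small subset of the forms of "ucho" taken from the example entry in this talk.

```python
from collections import defaultdict

# Hypothetical in-memory form of a few entries: lemma -> {tag: [wordforms]}.
# Only a subset of the "ucho" paradigm is listed here.
ENTRIES = {
    "ucho": {
        "SSns1": ["ucho"],
        "SSns2": ["ucha"],
        "SSns3": ["uchu"],
        "SSns4": ["ucho"],
        "SSnp1": ["uši", "uchá"],
    },
}


def build_index(entries):
    """Build a reverse index: wordform -> list of (lemma, tag) pairs."""
    index = defaultdict(list)
    for lemma, paradigm in entries.items():
        for tag, forms in paradigm.items():
            for form in forms:
                index[form].append((lemma, tag))
    return index


def analyse(index, wordform):
    """Return all (lemma, tag) analyses of a wordform, or an empty list."""
    return index.get(wordform, [])


index = build_index(ENTRIES)
print(analyse(index, "ucho"))   # both tags under which "ucho" is listed
print(analyse(index, "uchá"))
```

The trade-off is size for simplicity: no paradigm classes have to be designed or maintained, but every wordform has to be stored explicitly.
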
Wiki
● to store all the information: a wiki, with easy collaborative editing and tracking of changes
● software of choice: MoinMoin http://moinmo.in/
● Python http://www.python.org/
● everything in UTF-8: one big problem fewer
● plugins
● built-in full-text search engine, or the more efficient Xapian search engine bindings
● ~70 kwords (pages), ~2.5·10⁶ wordforms
● design: easily computer-parseable, but also human-readable

    == Lema ==
    ucho
    == Paradigma ==
    SSns1: ucho
    SSns2: ucha
    SSns3: uchu
    SSns4: ucho
    SSns5: ucho
    SSns6: uchu
    SSns7: uchom
    SSnp1: uši, uchá
    SSnp2: úch, ušú, uší
    SSnp3: ušiam, uchám
    SSnp4: uši, uchá
    SSnp5: uši, uchá
    SSnp6: uchách, ušiach
    SSnp7: ušami, uchami
    ----
    [[Kategória:Substantíva]]

● sections: Lema (lemma), Paradigma (paradigm), kategórie (categories)

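The page layout above is meant to be easy to parse by machine. A minimal sketch of such a parser follows, assuming one `tag: form, form` pair per line, `== ... ==` section headings and `[[Kategória:...]]` category links as shown in the example page. The function name `parse_page` and the returned dictionary layout are hypothetical, not part of the wiki software.

```python
import re

# Shortened copy of the example page from the slide.
SAMPLE_PAGE = """\
== Lema ==
ucho
== Paradigma ==
SSns1: ucho
SSns2: ucha
SSnp1: uši, uchá
SSnp2: úch, ušú, uší
----
[[Kategória:Substantíva]]
"""

HEADING = re.compile(r"^==\s*(.+?)\s*==$")
CATEGORY = re.compile(r"\[\[Kategória:([^\]]+)\]\]")


def parse_page(text):
    """Parse one entry page into its lemma, paradigm and categories."""
    entry = {"lemma": None, "paradigm": {}, "categories": []}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "----":
            continue  # skip blank lines and the horizontal rule
        heading = HEADING.match(line)
        if heading:
            section = heading.group(1)
            continue
        categories = CATEGORY.findall(line)
        if categories:
            entry["categories"].extend(categories)
            continue
        if section == "Lema":
            entry["lemma"] = line
        elif section == "Paradigma":
            # "SSnp1: uši, uchá" -> tag "SSnp1", forms ["uši", "uchá"]
            tag, _, forms = line.partition(":")
            entry["paradigm"][tag.strip()] = [
                form.strip() for form in forms.split(",") if form.strip()
            ]
    return entry


entry = parse_page(SAMPLE_PAGE)
print(entry["lemma"], entry["paradigm"]["SSnp2"], entry["categories"])
```

Variant forms stay in the order they appear on the page, so any ordering convention the editors follow is preserved.
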
● homonymy: special page names: mať (V), mať (S)
● disambiguation pages

    == Lema ==
    mať
    == Pozri ==
    [[mať_(S)]]
    [[mať_(V)]]
    ----
    [[Kategória:Dezambiguácia]]

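The naming convention for homonyms can be handled mechanically. The sketch below assumes page names of the form `lemma (X)` or `lemma_(X)`, as on the slide, and generates a disambiguation page in the layout shown. Both helper names, `split_page_name` and `disambiguation_page`, are hypothetical.

```python
import re

# "mať (V)" or "mať_(V)" -> lemma "mať", part-of-speech code "V"
PAGE_NAME = re.compile(r"^(?P<lemma>.+?)[ _]\((?P<pos>[^()]+)\)$")


def split_page_name(name):
    """Split a page name into (lemma, pos); pos is None for plain names."""
    match = PAGE_NAME.match(name)
    if match:
        return match.group("lemma"), match.group("pos")
    return name, None


def disambiguation_page(lemma, pos_codes):
    """Return the text of a disambiguation page linking to each homonym."""
    lines = ["== Lema ==", lemma, "== Pozri =="]
    lines += ["[[%s_(%s)]]" % (lemma, pos) for pos in sorted(pos_codes)]
    lines += ["----", "[[Kategória:Dezambiguácia]]"]
    return "\n".join(lines) + "\n"


print(split_page_name("mať (V)"))
print(disambiguation_page("mať", ["V", "S"]))
```
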
Quirks
● reflexive verbs: very efficient solution: we just ignore them :-)
● reflexive particle/pronoun tag R
● analytical forms: we ignore them too
● conditional particle tag Y
● analytical verbs: hey, it's just byť + infinitive or L-participle
● words cannot contain spaces/hyphens

    28163  verbs
    26061  substantives
    13100  adjectives
     5069  adverbs
     1297  abbreviations
     1104  participles
      656  interjections
      369  particles
      369  pronouns
      311  numerals
      123  prepositions
      110  conjunctions
       72  citation elements
       26  part of multiword expression
        2  sa/si
        1  by
      716  disambiguation pages

Scalability
● each page in its own directory (several files)
● tens of thousands of directory entries in the main directory
● needs a filesystem capable of efficiently handling such an amount of data
● all the major contemporary Linux filesystems
● but the winner is...
● reiserfs (B-trees, tail packing)

Issues
● built-in full-text search engine cannot cope with such an amount of data: searches take several minutes
● Xapian is fine
● category pages do not work conveniently: formatting of moderately long pages
● solution: hide category pages from the users
● otherwise everything works fine

To be continued...
● design interwiki links: easy
● design interwiki data transfer: tricky
● design data transfer to/from external data sources: ???
● XML-RPC?
● macros for easier editing (new entries)
