Ling 555: Programming for Linguists
Version control, edit distance and nltk
Robert Albert Felty
Speech Research Laboratory, Indiana University
Nov. 10, 2008
Outline
1. Homework
   - questions and comments
2. Version Control
   - intro
   - example
   - concepts
   - Two types of version control
   - subversion
3. Edit Distance
   - theory
   - Edit distance usage
4. Natural language toolkit
   - NLTK intro
   - NLTK demo
Version Control intro

Definition: Version control is an essential tool for programmers, providing several key functions:
1. The ability to track code changes
2. The ability to collaborate easily
3. The ability to create and potentially merge different versions of the same project

Also referred to as:
- RCS (revision control system)
- SCM (source control management)
A small example

Suppose Joe the programmer and I are working on developing a Python module.

Joe's copy:
```python
""" this module does X"""
import re,sys,os,time
foo = 1
bar = 2
```

My copy:
```python
""" this module does X"""
import re,sys,os
foo = 1
bar = 2
another = 23
```
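Version control tools report differences like these as a "diff". As a rough illustration (this is Python's standard library, not Subversion itself), `difflib.unified_diff` prints the two copies above in the same unified format that `svn diff` uses. The file names `joe.py` and `mine.py` are made up for the example.

```python
import difflib

# Joe's copy of the module, split into lines (line endings kept)
joes_copy = '''""" this module does X"""
import re,sys,os,time
foo = 1
bar = 2
'''.splitlines(keepends=True)

# My copy of the module
my_copy = '''""" this module does X"""
import re,sys,os
foo = 1
bar = 2
another = 23
'''.splitlines(keepends=True)

# Lines starting with "-" are only in Joe's copy, "+" only in mine,
# and lines starting with a space are shared context.
diff = list(difflib.unified_diff(joes_copy, my_copy, fromfile="joe.py", tofile="mine.py"))
print("".join(diff))
```

The diff shows exactly two changes: Joe's copy imports `time`, and mine adds `another = 23`.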
Basic concepts
- revision: Every time someone commits something new to the repository, a new revision is created, which is like a snapshot of the project at one particular point in time.
- repository: The repository contains all of the project's files and, most importantly, a history of all the changes to them.
- working copy: A working copy is your own personal copy of the repository. It contains only one revision of the repository.
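A toy sketch of how these three concepts relate, assuming a hypothetical in-memory `ToyRepository` class (this is not how Subversion actually stores its data): the repository keeps every revision as a full snapshot, a commit appends a new one, and a checkout hands you a private working copy of exactly one revision. As in Subversion, revision 0 is the empty project, so the first commit creates revision 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Snapshot = Dict[str, str]  # file name -> file contents


@dataclass
class ToyRepository:
    """Hypothetical model: revisions[n] is the snapshot for revision n."""
    revisions: List[Snapshot] = field(default_factory=lambda: [{}])  # revision 0: empty

    def commit(self, working_copy: Snapshot) -> int:
        """Record a new revision and return its number."""
        self.revisions.append(dict(working_copy))  # copy, so later edits cannot alter history
        return len(self.revisions) - 1

    def checkout(self, revision: Optional[int] = None) -> Snapshot:
        """Return a working copy of one revision (default: the latest)."""
        if revision is None:
            revision = len(self.revisions) - 1
        return dict(self.revisions[revision])


repo = ToyRepository()
wc = repo.checkout()            # empty working copy of revision 0
wc["mod.py"] = "foo = 1\n"
r1 = repo.commit(wc)            # creates revision 1
wc["mod.py"] += "bar = 2\n"
r2 = repo.commit(wc)            # creates revision 2
print(r1, r2, repo.checkout(1))  # the history still contains revision 1
```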
Version Control types

Centralized
All the code is stored on a central server. Whenever developers want to download the newest version or upload some changes, they must use the server.
- RCS
- CVS (Concurrent Versions System)
- Subversion

Distributed
Every person gets a complete copy of the code, including all the history and changes.
- git
- mercurial
- bazaar
subversion intro

Download from subversion.tigris.org

Why subversion?
- Subversion is designed as a replacement for CVS, which was the most widely-used version control system.
- Subversion is becoming the most widely-used, and it fixes lots of problems with CVS.
- free
- available for almost every operating system
- well documented
- relatively easy
subversion commands
- svn help: Get help on using subversion.
- svn checkout: Download a fresh copy of a repository.
- svn update: Get the latest updates for your working copy.
- svn commit: Commit changes you have made to the repository.
- svn add: Add a file or directory to svn (it goes into the repository the next time you commit).
- svn mv: Move or rename a file.
- svn diff: Show the changes you have made in your working copy since you last checked out or updated.
L555 repository

I have created a subversion repository for the class:
```shell
svn checkout svn://robfelty.com/home/robfelty/svn/l555 myl555
```
- There is a subdirectory for each student.
- You have read-only permissions on everything.
- You have read-write permissions on your own directory.
Edit distance

Definition: Edit distance (also known as Levenshtein distance) is the minimal number of insertions, deletions, and/or substitutions needed to change one string into another.
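Edit distance is usually computed by dynamic programming (the Wagner-Fischer algorithm): the distance between the first i characters of one string and the first j characters of the other is the cheapest of a deletion, an insertion, or a substitution (free when the characters match) applied to an already-solved smaller prefix pair. A minimal Python sketch with a cost of 1 for every operation; the name `edit_distance` is our own choice.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions
    needed to turn source into target (each operation costs 1)."""
    # Row 0: turning an empty prefix of source into target[:j] takes j insertions.
    prev = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        # Column 0: turning source[:i] into an empty string takes i deletions.
        curr = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            curr.append(min(
                prev[j] + 1,         # delete s_char
                curr[j - 1] + 1,     # insert t_char
                prev[j - 1] + cost,  # substitute s_char with t_char (free if equal)
            ))
        prev = curr  # only the previous row is needed, so memory is O(len(target))
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Here kitten becomes sitting by two substitutions (k to s, e to i) and one insertion (g).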
Edit distance usage
- DNA sequencing
- plagiarism detection
- measuring phonological similarity
- spell checking
- speech recognition
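A sketch of the spell-checking use: rank the words in a word list by their edit distance to a misspelling. The function accepts any sequences, so comparing lists of words instead of strings gives the word-level distance commonly used to score speech recognition output. The `suggest` helper, its `max_distance` cutoff of 2, and the tiny `LEXICON` are hypothetical choices for illustration.

```python
def edit_distance(source, target):
    """Levenshtein distance between two sequences (strings, or lists of words)."""
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        curr = [i]
        for j, t in enumerate(target, start=1):
            # deletion, insertion, substitution (costs 0 when the items are equal)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (s != t)))
        prev = curr
    return prev[-1]


def suggest(word, lexicon, max_distance=2):
    """Hypothetical helper: lexicon words within max_distance edits of word, closest first."""
    scored = sorted((edit_distance(word, entry), entry) for entry in lexicon)
    return [entry for distance, entry in scored if distance <= max_distance]


LEXICON = ["language", "linguist", "linguistics", "lingual"]  # hypothetical toy word list

print(suggest("lingust", LEXICON))  # ['linguist', 'lingual']

# Word-level distance: "the" -> "a" is one substitution, "down" is one insertion.
print(edit_distance("the cat sat".split(), "a cat sat down".split()))  # 2
```

Current NLTK releases also include a ready-made version, `nltk.metrics.distance.edit_distance`.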
NLTK intro
- Extensive toolkit for doing / learning computational linguistics
- Written in Python
- Includes many corpora
- Has a variety of tools for NLP, such as tagging, building trees, and writing grammars
- open-source
- Extensible
- Well-documented
- Actively maintained
NLTK demo

For some nltk demos, look on the delicious page: delicious.com/robfelty/l555