LoonyBin: Keeping Language Technologists Sane through Automated Management of (Hyper)Workflows Jonathan Clark and Alon Lavie Carnegie Mellon University LREC 2010 Thursday, May 20, 2010
Outline • Empirical NLP Research • Day-to-day issues • Current problems • LoonyBin’s solutions • Workflows • HyperWorkflows 2
Empirical NLP • Plumbing: Gluing (Linux) tools together • Recording results • Sanity checking • Running variations • Moving between clusters & schedulers 3
Proposed Solution: HyperWorkflow Management 4
LoonyBin • Define the tools (inputs/outputs/parameters → shell commands) • Define the workflow (DAG of steps and dependencies) • Generate & run a shell script 5
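The three steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not LoonyBin's implementation: the step names, the `echo` commands, and the `generate_script` helper are all hypothetical, and the ordering relies on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tools: each step name maps to the shell command it runs.
TOOL_COMMANDS = {
    "tokenize": "echo 'tokenize corpus'",
    "align": "echo 'align words'",
    "decode": "echo 'decode test set'",
}

# Hypothetical workflow: step -> set of steps it depends on (a DAG).
WORKFLOW = {
    "tokenize": set(),
    "align": {"tokenize"},
    "decode": {"align"},
}


def generate_script(workflow, tool_commands):
    """Emit a bash script that runs every step after its dependencies."""
    lines = ["#!/usr/bin/env bash", "set -e  # stop at the first failing step"]
    for step in TopologicalSorter(workflow).static_order():
        lines.append(f"# step: {step}")
        lines.append(tool_commands[step])
    return "\n".join(lines) + "\n"


script = generate_script(WORKFLOW, TOOL_COMMANDS)
print(script)
```

Keeping the tool definitions separate from the workflow graph means the same tool can appear in several workflows without being redefined.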
[Screenshot with callouts: Available Tools; Drag and Drop; Tooltips for Params; Machine Assignment] 6
Generating a Script for step W (inputs from steps A and B) • INPUTS: foreignCorpus, from A’s output “x”, at …/inputs/f; nativeCorpus, from B’s output “y”, at …/inputs/n • OUTPUTS: alignments, at …/outputs/wa • PARAMETERS: fertility = 0.01 • Parameters & dependencies come from the workflow • LoonyBin assigns paths • The Python Tool Descriptor produces the command: java edu.cmu.Tokenizer ../inputs/f ../inputs/n > ../outputs/wa 7
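As a rough sketch of how a descriptor could turn assigned paths into the command on this slide, consider the class below. The class name, attribute names, and the `get_commands` signature are assumptions made for illustration and are not LoonyBin's actual descriptor API; only the input, output, and parameter names and the final command string come from the slide.

```python
class WordAlignerDescriptor:
    """Hypothetical descriptor for the tool on the slide (not LoonyBin's real API)."""

    input_names = ["foreignCorpus", "nativeCorpus"]
    output_names = ["alignments"]
    param_names = ["fertility"]

    def get_commands(self, params, inputs, outputs):
        # The slide's command uses only the assigned paths; how `fertility`
        # reaches the tool is not shown there, so it is left unused here.
        return [
            f"java edu.cmu.Tokenizer {inputs['foreignCorpus']} "
            f"{inputs['nativeCorpus']} > {outputs['alignments']}"
        ]


# Values the workflow would supply: parameters from the user, paths from the manager.
commands = WordAlignerDescriptor().get_commands(
    params={"fertility": "0.01"},
    inputs={"foreignCorpus": "../inputs/f", "nativeCorpus": "../inputs/n"},
    outputs={"alignments": "../outputs/wa"},
)
print(commands[0])
```

Because the descriptor never chooses paths itself, the same descriptor works unchanged wherever the workflow manager decides to place its inputs and outputs.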
So far... • Complaints about current implementation of empirical NLP experiments • Define the tools (inputs/outputs/parameters) • Define the workflow (DAG of steps and dependencies) • Generate & run a shell script 8
HyperWorkflows • HyperWorkflows: Shared substructure in experiments • Encode small variations in a HyperDAG • [Figure: HyperDAG for a machine translation experiment. Steps: Parallel Corpus, Word Alignment, Filter Sentences, Stanford Parser, Charniak Parser, Moses Phrase Table Training, Build Syntactic Translation Model, Target Language Corpus, Build Language Model, Decode, Minimum Error Rate Training. Branch labels: moses, syntax, st, ch, {st,ch}. Realizations: {syntax-st, syntax-ch, moses}. Callouts: Packing Node; Realizations; Don’t re-run; Organized directory structure & easy-to-parse logs] 9
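The idea of realizations and of not re-running shared steps can be sketched as follows. The branch labels (moses, syntax, st, ch) come from the slide; the nested-dictionary encoding, the simplified four-step pipeline, and both helper functions are assumptions for illustration, not LoonyBin's data model.

```python
def enumerate_realizations(packing):
    """Expand nested branch choices into realization names such as 'syntax-st'."""
    names = []
    for label, nested in packing.items():
        if nested is None:
            names.append(label)
        else:
            names.extend(f"{label}-{rest}" for rest in enumerate_realizations(nested))
    return names


def plan_runs(realizations, step_variant):
    """Collect each distinct (step, variant) pair once across all realizations."""
    runs = set()
    for realization in realizations:
        for step, variant_of in step_variant.items():
            variant = variant_of(realization)
            if variant is not None:  # None means this realization skips the step
                runs.add((step, variant))
    return sorted(runs)


# Branch labels from the slide: moses vs. syntax, then st vs. ch under syntax.
PACKING = {"moses": None, "syntax": {"st": None, "ch": None}}

# Assumed, simplified step list: which variant of each step a realization needs.
STEP_VARIANT = {
    "word_alignment": lambda r: "shared",
    "language_model": lambda r: "shared",
    "parse": lambda r: r.split("-")[1] if r.startswith("syntax-") else None,
    "translation_model": lambda r: r,
}

realizations = enumerate_realizations(PACKING)
runs = plan_runs(realizations, STEP_VARIANT)
# Cost of running every realization as an independent workflow.
naive = sum(
    1 for r in realizations for f in STEP_VARIANT.values() if f(r) is not None
)
print(realizations)
print(f"{len(runs)} step runs instead of {naive}")
```

Under these assumptions the shared steps are scheduled once, so three realizations cost fewer step runs than three separate workflows would.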
Multiple Machines and Schedulers • Design Machine (Java) produces a Bash script • Script is manually copied to the Home Execution Machine (UNIX) • Home machine reaches Remote Execution Machines (UNIX) via passwordless SSH • Execution under Bash, Condor, or Sun Grid Engine 10
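One way to picture the hand-off from the home machine to a remote scheduler is the sketch below. It only builds an argument list and never opens a connection. The host name, file name, and `remote_argv` helper are hypothetical; `qsub` and `condor_submit` are the usual submit commands for Sun Grid Engine and Condor, but how LoonyBin itself invokes them is not stated on the slide.

```python
import shlex

# Assumed submit commands; `condor_submit` expects a submit description file
# rather than the bash script itself, so `job_path` must point at one.
SUBMIT_COMMANDS = {
    "bash": ["bash"],
    "sge": ["qsub"],
    "condor": ["condor_submit"],
}


def remote_argv(host, job_path, scheduler="bash"):
    """Build (but do not run) an ssh command that starts a job on `host`."""
    if scheduler not in SUBMIT_COMMANDS:
        raise ValueError(f"unknown scheduler: {scheduler}")
    remote = shlex.join(SUBMIT_COMMANDS[scheduler] + [job_path])  # Python 3.8+
    # BatchMode=yes makes ssh fail instead of prompting, which suits key-based login.
    return ["ssh", "-o", "BatchMode=yes", host, remote]


argv = remote_argv("cluster.example.org", "workflow.sh", scheduler="sge")
print(argv)
```

The returned list can be passed to `subprocess.run` once passwordless SSH keys are in place on both machines.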
Other Things to Make Life Easier • Sanity checking at each step (embedded in Tool Descriptors) • Copying of files (including to HDFS) • Text-based workflow definition (in SVN) • Open-source LGPL License 11
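A sanity check of the kind mentioned in the first bullet might look like the sketch below, which verifies that the two sides of a parallel corpus are non-empty and have the same number of lines. The function names and the specific check are illustrative assumptions; the slide does not say which checks LoonyBin's Tool Descriptors perform.

```python
import tempfile
from pathlib import Path


def count_lines(path):
    """Count lines without loading the whole file into memory."""
    with Path(path).open(encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_parallel_corpus(foreign_path, native_path):
    """Raise ValueError if either side is empty or the line counts differ."""
    foreign, native = count_lines(foreign_path), count_lines(native_path)
    if foreign == 0 or native == 0:
        raise ValueError("corpus side is empty")
    if foreign != native:
        raise ValueError(f"line count mismatch: {foreign} vs {native}")
    return foreign


# Small demonstration on throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    foreign_file = Path(tmp) / "f"
    native_file = Path(tmp) / "n"
    foreign_file.write_text("uno\ndos\n", encoding="utf-8")
    native_file.write_text("one\ntwo\n", encoding="utf-8")
    checked = check_parallel_corpus(foreign_file, native_file)
    print(checked)
```

Running a check like this right after a step finishes catches a truncated or misaligned corpus before later, more expensive steps consume it.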
WANTED: Users & Contributors • Machine Translation Toolpack (released) • Corpus Processing Toolpack? • Parsing Toolpack? • Question Answering Toolpack? • Resource Directory Toolpack? • Speech Recognition Toolpack? 12