
From Natural Language Specifications to Program Input Parsers

  1. From Natural Language Specifications to Program Input Parsers
     Tao Lei, Fan Long, Regina Barzilay, Martin Rinard
     CSAIL, MIT

  3. Translating Natural Language to Input Parser
     Input Specification: defines the format of input data
     - The input starts with a line containing two integers n and r.
     - This is followed by n lines, each containing two integers xi, yi, giving the coordinates of the polygon vertices.
     Input Parser: part of a program that reads and stores data
         int n, r, x[], y[];
         Scanner scanner = new Scanner(new File("input.txt"));
         n = scanner.nextInt();
         r = scanner.nextInt();
         x = new int[n];
         y = new int[n];
         for (int i = 0; i < n; i++) {
             x[i] = scanner.nextInt();
             y[i] = scanner.nextInt();
         }
     Two Input Examples:
         4 10        3 6
         -8 2        0 4
         8 14        0 0
         0 14        5 1
         0 6
     Goal: generating input parser by reading natural language

  4. Motivation
     • Reading and processing data is a common task
     • Writing input parsers is mechanical, tedious and time-consuming
     MST dependency data format:
         John ate an apple
         NN VB DT NN
         SUBJ ROOT MOD OBJ
         2 0 4 2
         The dog barks
         DT NN VB
         MOD SUBJ ROOT
         2 3 0
     POS tagger data format:
         This DT
         is VBZ
         a DT
         short JJ
         sentence NN
         . .
         So RB
         is VBZ
         this DT
     CoNLL dependency data format:
         1 Cathy Cathy N N … 2 su
         2 zag zie V V … 0 ROOT
         3 hen hen Pron Pron … 2 obj1
         4 wild wild Adj Adj … 5 mod
         5 zwaaien zwaai N N … 2 vc
         6 . . Punc Punc … 5 punct
         …

  6. Motivation
     • Reading and processing data is a common task
     • Writing input parsers is mechanical, tedious and time-consuming
     Input Specification: "The input is one integer followed by a list of strings."
     Input Example: 10 abc xyz uvw efg …
     Parser Generator (our model): allows natural language as the interface to specify input
     Input Parser (in C++, Java, …)
     Advantage: reducing programming effort and the chance of making code mistakes

  7. How to Translate NL to Input Parser?
     • Need an abstraction that connects NL and input parser
     Input Specification:
         The input consists of multiple sentences.
         • The first line of each sentence is the list of words in the sentence;
         • The second line of each sentence contains the POS tokens;
         • The third line are dependency labels;
         • The last line are integers representing the positions of each word's parent.
     Input Parser:
         sentence = []
         with open("input.txt") as fin:
             line = fin.readline().strip()
             while line:
                 if line != "":
                     word = line.split()
                     pos = fin.readline().split()
                     label = fin.readline().split()
                     parent = fin.readline().split()
                     parent = [int(x) for x in parent]
                     sentence.append((word, pos, label, parent))
                 line = fin.readline().strip()

  15. How to Translate NL to Input Parser?
     • Need an abstraction that connects NL and input parser
     • Specification tree of nested input formats
     Input Example:
         John ate an apple
         NN VB DT NN
         SUBJ ROOT MOD OBJ
         2 0 4 2
         The dog barks
         DT NN VB
         MOD SUBJ ROOT
         2 3 0
         …
     Specification Tree:
         Input
             Sentences
                 Words
                 POS Tokens
                 Labels
                 Position Integers

  17. How to Translate NL to Input Parser?
     • Need an abstraction that connects NL and input parser
     • Specification tree of nested input formats
     Input Specification → Specification Tree → Input Parser
     The input parser is deterministically generated from the specification tree.
     Focus: translating input specifications into specification trees
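
The step from specification tree to input parser can be illustrated with a minimal Python sketch. It is not the authors' generator: it interprets a tree directly instead of emitting source code, and the `Node` class, the `kind` values, and the `read` function are hypothetical names introduced only for this illustration. The tree mirrors the dependency-format example from the earlier slides.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class Node:
    """One node of a hypothetical specification tree."""
    name: str
    kind: str = "group"      # "group", "str_line", or "int_line" (assumed kinds)
    repeat: bool = False     # True if the group repeats until the input ends
    children: List["Node"] = field(default_factory=list)


def read(node: Node, lines: List[str], pos: int = 0) -> Tuple[Any, int]:
    """Read the data described by `node`, starting at line `pos`.

    Returns the parsed value and the index of the next unread line.
    """
    if node.kind == "str_line":
        return lines[pos].split(), pos + 1
    if node.kind == "int_line":
        return [int(tok) for tok in lines[pos].split()], pos + 1

    def read_once(start: int) -> Tuple[dict, int]:
        # A group reads each child in order, exactly as the tree lists them.
        record = {}
        for child in node.children:
            record[child.name], start = read(child, lines, start)
        return record, start

    if not node.repeat:
        return read_once(pos)
    items = []
    while pos < len(lines):
        item, pos = read_once(pos)
        items.append(item)
    return items, pos


# Input > Sentences > {Words, POS Tokens, Labels, Position Integers}
TREE = Node("input", children=[
    Node("sentences", repeat=True, children=[
        Node("words", "str_line"),
        Node("pos_tokens", "str_line"),
        Node("labels", "str_line"),
        Node("parents", "int_line"),
    ]),
])

EXAMPLE = """John ate an apple
NN VB DT NN
SUBJ ROOT MOD OBJ
2 0 4 2
The dog barks
DT NN VB
MOD SUBJ ROOT
2 3 0"""

parsed, _ = read(TREE, EXAMPLE.splitlines())
print(parsed["sentences"][0]["parents"])
```

Because `read` makes no choices beyond what the tree dictates, the same tree always yields the same reading behavior, which is the sense of "deterministic" this sketch tries to convey.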

  18. How to Translate NL to Specification Tree?
     Input Specification → Specification Tree
     Specification tree is a dependency tree over noun phrases in the NL specification.
     Input Specification:
         The input consists of multiple sentences.
         • The first line of each parse is the list of words in the sentence;
         • The second line of each parse contains the POS tokens;
         • The third line are dependency labels;
         • The last line are integers representing the positions of each word's parent.
     Specification Tree:
         Input
             Sentences
                 Words
                 POS Tokens
                 Labels
                 Position Integers
     Task: translation as an NLP problem

  20. Learning Scenario
     Input:
     • N input specifications, $\mathbf{w} = w_1, \ldots, w_N$, for example:
         The input consists of a single test case. A test case consists of two lines. The first line contains an integer n indicating the number of molecule types. The second line contains n eight-character strings, each describing a single type of molecule, separated by single spaces. Each string consists of four two-character connector labels
     • Some input examples for each specification, for example:
         3
         A+00A+A+ 00B+D+A- B-C+00C+
     • No human annotation
     Specification trees: $\mathbf{t} \sim P(\mathbf{t} \mid \mathbf{w})$, with $\mathbf{t} = t_1, \ldots, t_N$
     Corresponding input parsers
     Idea: learning from feedback by testing the input parser on the input examples
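
The feedback signal on this slide can be sketched in Python as follows. This shows only the test "does a candidate parser read every input example?", not the learning model itself. The two candidate parsers and the helper `reads_all` are hypothetical names made up for this illustration; the example input is the molecule example from the slide.

```python
from typing import Callable, List


def parse_count_then_strings(tokens: List[str]):
    """Candidate 1: an integer n followed by exactly n strings."""
    n = int(tokens[0])
    items = tokens[1:1 + n]
    if len(items) != n or len(tokens) != 1 + n:
        raise ValueError("token count does not match n")
    return n, items


def parse_two_ints(tokens: List[str]):
    """Candidate 2 (a wrong reading of the spec): exactly two integers."""
    if len(tokens) != 2:
        raise ValueError("expected exactly two tokens")
    return int(tokens[0]), int(tokens[1])


def reads_all(parser: Callable[[List[str]], object], examples: List[str]) -> bool:
    """Feedback: True only if the parser reads every example without error."""
    for text in examples:
        try:
            parser(text.split())
        except (ValueError, IndexError):
            return False
    return True


EXAMPLES = ["3\nA+00A+A+ 00B+D+A- B-C+00C+"]

print(reads_all(parse_count_then_strings, EXAMPLES))
print(reads_all(parse_two_ints, EXAMPLES))
```

A candidate that fails on any example can be ruled out without human annotation, which is what makes this kind of feedback usable as a training signal.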

  21. Key Intuitions
     • A correct tree should read all input examples successfully
     • Necessary but NOT sufficient condition
         False-positive parsers
     Input Example: 5 -8 8 0 0 … -8
     Possible Interpretations:
         a list of integers?
         a list of integer pairs?
         a list of strings?
     Many input parsers can read the same input
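
A small Python sketch makes the "necessary but not sufficient" point concrete. The three interpretation functions and `accepted_by` are hypothetical names, and the test inputs are invented for illustration rather than taken from the slide.

```python
from typing import Callable, Dict, List


def as_integers(tokens: List[str]) -> List[int]:
    """Interpretation 1: a list of integers."""
    return [int(tok) for tok in tokens]


def as_integer_pairs(tokens: List[str]):
    """Interpretation 2: a list of integer pairs."""
    if len(tokens) % 2:
        raise ValueError("odd number of tokens")
    nums = [int(tok) for tok in tokens]
    return list(zip(nums[0::2], nums[1::2]))


def as_strings(tokens: List[str]) -> List[str]:
    """Interpretation 3: a list of strings (accepts anything)."""
    return list(tokens)


INTERPRETATIONS: Dict[str, Callable[[List[str]], object]] = {
    "integers": as_integers,
    "integer pairs": as_integer_pairs,
    "strings": as_strings,
}


def accepted_by(text: str) -> List[str]:
    """Return the names of all interpretations that read `text` without error."""
    tokens = text.split()
    names = []
    for name, parse in INTERPRETATIONS.items():
        try:
            parse(tokens)
        except ValueError:
            continue
        names.append(name)
    return names


print(accepted_by("4 -8 8 0"))   # all three interpretations succeed
print(accepted_by("4 -8 8"))     # the pair interpretation is ruled out
```

When every interpretation succeeds, reading the examples alone cannot tell the correct parser from the false positives, so the examples have to be combined with evidence from the text of the specification.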
