Automatic Extraction of Conceptual Interoperability Constraints from API Documentation
Master Thesis
Mohammed Abujayyab
First supervisor: Prof. Dr. Dr. h.c. H. Dieter Rombach
Second supervisor: Hadil Abukwaik, MSc
13.04.2016
Outline
- Background
- Motivation scenario
- Problem
- Research methodology
- Research part one
- Research part two
- Conclusion and future work
Background
Conceptual Interoperability Constraints (COINs) are the restrictions on interoperable software units and their related data elements at different conceptual levels (i.e., syntax, semantics, structure, dynamics, context, and quality) [1].
[Figure: Conceptual Interoperability Constraints [1]]
Motivation scenario
- Input: SoundCloud API documentation.
- Process: software architects/analysts read the documentation and find the COINs manually.
- Output: a COIN class for each sentence: 1. Not-COIN, 2. Dynamic, 3. Semantic, 4. Syntax, 5. Structure, 6. Context, 7. Quality.
Motivation scenario
Example: https://developers.soundcloud.com/docs/api
Problem
- Time: for example, it took one of the authors more than 10 hours just to browse (read) the documentation of the eBay web service operations [7].
- Mental effort: the task demands linguistic and analytical skills as well as API reading experience.
- Accuracy: human analysis can be error-prone, leading to missing COINs and wrong COINs.
Goal & Research Questions
Goal
To: support the conceptual interoperability analysis task.
For the purpose of: improvement.
With respect to: effectiveness and efficiency of detecting COINs.
From the viewpoint of: software architects and analysts.
In the context of: analyzing text in API documentation within integration projects.
- RQ1: What are the observed patterns in specifying the conceptual interoperability constraints (COINs) in the NL text of API documentation?
- RQ2: How effective and efficient would it be to use Natural Language Processing (NLP) along with Machine Learning (ML) technologies to automate the extraction of COINs from the text in API documentation?
Idea (overview)
1. Manually:
   Input: API document text.
   Process: pattern identification (extraction of keywords and sentence structure).
   Output: a COIN classification for each sentence, collected as a corpus.
2. Automatically:
   Input: API document text.
   Process: a classification model built with Machine Learning (ML) and Natural Language Processing (NLP).
   Output: a COIN classification for each sentence.
Research methodology
- Research Part One (Multiple-Case Study): answers RQ1.
- Research Part Two (Utilizing ML for Identifying the COINs): answers RQ2.
Research part one (Study design)
Holistic multiple-case study (action research) with literal replication of cases from different domains.
[Figure: Holistic multiple-case study [3]]
Research part one (Study Execution)
Study protocol: three main activities:
1. Case selection
2. Case execution
3. Cross-case analysis
Research part one (Study Execution)
Case selection criteria (six cases):
- API type: Platform API, Web-Service API
- API popularity
- API domain: music, maps, development

API Document        Total sentences  Document filtering (min)  Manual sentence classification (h)  Total effort (h)  Total effort (min)
SoundCloud          219              40                        7                                   7.7               460
GoogleMaps          473              60                        5.5                                 6.5               390
AppleWatch          360              60                        7                                   8.0               480
Eclipse Plugin Dev  651              60                        11                                  12.0              720
Skype               325              30                        4                                   4.5               270
Instagram           253              20                        4.5                                 4.8               290
Total               2281             270                       39                                  43.5              2610
Research part one (Study Execution)
Manual classification (building the corpus):
- Input: API document.
- Output: COIN corpus.
  - Seven-COIN corpus: not-COIN, dynamic, semantic, syntax, structure, context, quality.
  - Two-COIN corpus: COIN, not-COIN.
Example from the SoundCloud API, classified as Semantic: "Our API gives you the ability to upload, manage and share sounds on the web."
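The relation between the two corpora can be sketched as a simple relabeling: every sentence carrying one of the six conceptual-level classes collapses into a single COIN class, while not-COIN sentences stay as they are. A minimal illustration (the thesis built both corpora during manual classification; the function name here is only for demonstration):

```python
# The six conceptual levels from the seven-COIN corpus; any sentence
# labeled with one of these maps to the single "COIN" class.
COIN_CLASSES = {"syntax", "semantic", "structure", "dynamic", "context", "quality"}

def to_two_class(label):
    """Map a seven-COIN corpus label to the two-COIN corpus (COIN / not-COIN)."""
    return "COIN" if label.lower() in COIN_CLASSES else "not-COIN"
```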
Research part one (Study Execution)
Pattern identification (snapshot from the GoogleMaps API documentation; detected pattern types per sentence: 1. Conditional statement, 2. Technical terms, 3. Structure terms, 4. Method call, 5. Input/Output, 6. Explanation):
- "These web services use HTTP requests to specific URLs, passing URL parameters as arguments to the services." -> Input/Output: requests; Technical terms: HTTP
- "For example, ? is used within URLs to indicate the beginning of the query string." -> Explanation: for example; Structure terms: query
- "When processing XML responses, you should use an appropriate query language for selecting nodes within the XML document, rather than assume the elements reside at absolute positions within the XML markup." -> Conditional statement: when; Technical terms: XML; Structure terms: nodes, elements, document
- "By default, XPath expressions match all elements." -> Technical terms: XPath; Structure terms: elements
- "This object can then process passed XML and evaluate XPath expressions using the evaluate() method." -> Technical terms: XML, XPath; Method call: evaluate()
Research part one (Data analysis and findings)
[Figure: COINs distribution across the six cases]
Research part one (Data analysis and findings)
RQ1: What are the observed patterns in specifying the conceptual interoperability constraints (COINs) in the NL text of API documentation?
Answer: the pattern table.

COIN      Pattern                Example                                                     %
Not-COIN  Technical keywords     XML, iOS, XPath, JSON, OSGi, SDK, HTTP, GET, POST, etc.     30.7%
Dynamic   Action verbs           create, use, request, access, plug, lock, include,          35.8%
                                 set-up, run, start, call-up, redirect
Dynamic   Conditional statement  if, when, once, while, as long as, unless                   24.0%
Dynamic   Output/Input verbs     return, receive, display, response, send                    18.8%
Semantic  Supporting verbs       support, provide, suggest, give, propose                    16.4%
Semantic  Admission verbs        allow, enable, admit, grant, permit, facilitate,            13.5%
                                 authorize, prevent
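The pattern table hints at how a rule-based classifier over these keyword lists might look. A minimal sketch, assuming a simple prefix match on abridged keyword lists and a fixed check order; the actual rule set and matching logic used in the thesis may differ:

```python
import re

# Abridged keyword lists taken from the pattern table; mapping each list to
# exactly one COIN class is a simplification for illustration.
PATTERNS = {
    "dynamic":  ["if", "when", "once", "while", "unless",
                 "return", "receive", "display", "response", "send"],
    "semantic": ["support", "provide", "suggest", "give", "propose",
                 "allow", "enable", "admit", "grant", "permit"],
    "not-coin": ["xml", "ios", "xpath", "json", "osgi", "sdk", "http"],
}

def classify(sentence):
    """Return the first COIN class whose keyword list matches, else 'unknown'.

    Prefix matching ('gives' matches 'give') is a crude stand-in for
    stemming and can over-match; good enough for a sketch.
    """
    words = re.findall(r"[a-z\-]+", sentence.lower())
    for label, keywords in PATTERNS.items():
        if any(w.startswith(k) for w in words for k in keywords):
            return label
    return "unknown"
```

Note that the SoundCloud example sentence from the corpus ("Our API gives you the ability to upload, manage and share sounds on the web.") is caught by the supporting-verb rule, matching its manual classification as Semantic.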
Research part one (Threats to validity)
- Generalizability: we decided to include multiple cases (six cases).
- Completeness: we selected inclusive parts of the large API documentations (e.g., the Eclipse API document with 651 sentences).
- Researcher bias: the classification was replicated by another researcher.
Research part two (Utilizing ML for Identifying the COINs)
- Research Part One (Multiple-Case Study): answers RQ1.
- Research Part Two (Utilizing ML for Identifying the COINs): answers RQ2.
Research part two (Utilizing ML for Identifying the COINs)
Feature selection (alternatives):
1. Rule-based: using the manually identified patterns.
2. Bag-of-Words (BOW) [5]: automatic.
[Figure: 'Process flow' of the classification model]
BOW [5] is a simple technique for text classification: each word in a sentence is treated as a feature, and a document is represented as a matrix of weighted values using a weighting method such as TF-IDF (term frequency / inverse document frequency).
Research part two (Utilizing ML for Identifying the COINs)
Explored ML classification algorithms:
- Logistic Regression
- Naïve Bayes
- Complement Naïve Bayes
- Decision Tree (J48)
- Neural Network
- Random Forest Tree
- KNN (k=18)
- Support Vector Machine
Research part two (Utilizing ML for Identifying the COINs)
Configuring and running tests for the ML classification algorithms (using Weka 3.7.13):
- K-fold cross-validation [4] for training and testing: k=10, i.e., 9 folds for training and 1 for testing over 10 rounds; the results are averaged over the 10 rounds.
- The experimental results are evaluated in terms of: precision, recall, F-measure.
Weka is a collection of machine learning algorithms for data mining tasks. URL: http://www.cs.waikato.ac.nz/ml/weka
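The evaluation setup above can be sketched as two small helpers: one that produces the k train/test splits, and one that turns confusion counts into the three reported metrics. A minimal sketch, assuming a round-robin split (Weka's own stratified folding differs in detail):

```python
def kfold_indices(n, k=10):
    """Split indices 0..n-1 into k folds; each fold serves once as the test set."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def prf(tp, fp, fn):
    """Precision, recall, and F-measure from true/false positive and false
    negative counts (per class or micro-averaged)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```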
Research part two (Evaluation)
Answering RQ2: effectiveness of using ML for automated COIN identification.

1. Evaluation of the first approach (rule-based):

Corpus      Classification Algorithm  Recall  Precision  F-Measure
Seven-COIN  Logistic Regression       47.0%   51.7%      47.6%
Two-COIN    Logistic Regression       65.7%   66.5%      66.1%

2. Evaluation of the second approach (BOW):

Corpus      Classification Algorithm  Recall  Precision  F-Measure
Seven-COIN  Complement Naïve Bayes    70.0%   70.4%      70.2%
Two-COIN    Complement Naïve Bayes    81.9%   81.9%      82.0%
Technical support
Classifier Ensemble Plugin - COIN (CEP-COIN):
1. The software architect enters a sentence into the CEP-COIN tool (input).
2-3. The tool sends an HTTP request to the web server, asking for the sentence's COIN class.
4-5. The HTTP response returns the COIN class, which the tool shows to the architect (output).
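The request/response cycle above could be driven by a client along these lines. This is a hypothetical sketch: the endpoint URL and the JSON field names are assumptions for illustration, not documented parts of the CEP-COIN tool:

```python
import json
from urllib import request

def build_coin_request(sentence, endpoint="http://localhost:8080/coin-class"):
    """Build the HTTP POST request (steps 2-3) asking the web server
    for a sentence's COIN class."""
    body = json.dumps({"sentence": sentence}).encode("utf-8")
    return request.Request(endpoint, data=body,
                           headers={"Content-Type": "application/json"})

# Sending the request and reading the COIN class from the response
# (steps 4-5) would then look like:
#   with request.urlopen(build_coin_request("Our API gives you ...")) as resp:
#       coin_class = json.load(resp)["coin_class"]
```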
Technical support
Practical use of the tool.