Technical Aspects of the Paper: Improving Code Readability Models with Textual Features
Deeksha Arya, COMP762
Key Concepts
´ Previous work as mentioned in the paper:
  ´ QALP tool (to compute similarity between comments and code)
  ´ Entropy
  ´ Halstead's volume metric
  ´ Area Under the Curve (AUC)
´ Concepts used in the paper's experiments:
  ´ Center selection (used to get 200 representative code snippets)
  ´ Cronbach's alpha (to evaluate agreement between participants regarding readability values)
  ´ Logistic regression with a wrapper strategy (binary classification algorithm)
  ´ Wilcoxon test
  ´ Cliff's delta
QALP Score (Quality Assessment using Language Processing)
´ Measures the correlation between the natural language used in program code (mainly identifiers) and that of its documentation (in this case, its comments), hence identifies well-documented code
´ Pre-processing involves:
  ´ Removing stop words (custom-defined for code – includes keywords, library functions and predefined variable names)
  ´ Stemming (elimination of word suffixes)
  ´ Atomic splitting of identifiers from code (splits compound identifiers into multiple atomic terms using a lex-based scanner built on an island grammar)
  ´ Weighting the words using tf-idf (high weight to terms which occur more often than average in a document but are rarer in the entire collection)
´ Considers each word as a separate dimension in an n-dimensional vector space – vectorizes comments and code separately
´ Calculates the cosine similarity between the comment and code vectors
´ A greater QALP score indicates that both document models in question describe concepts using the same vocabulary
Ref: Increasing diversity: Natural language measures for software fault prediction
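The vector-space comparison above can be sketched as follows. This is a minimal illustration of cosine similarity between two bags of words, not the actual QALP tool: the tf-idf weighting, stop-word removal and identifier splitting steps are omitted, and the token lists are invented examples.

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bags of words in a shared term space."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    terms = set(a) | set(b)
    dot = sum(a[t] * b[t] for t in terms)  # Counter returns 0 for absent terms
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical comment vs. identifier tokens: shared vocabulary raises the score
comment_tokens = ["sort", "list", "ascending"]
code_tokens = ["sort", "list", "item"]
score = cosine_similarity(comment_tokens, code_tokens)
```

Identical vocabularies yield 1.0 and disjoint vocabularies yield 0.0, matching the slide's interpretation of a high QALP score.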
Entropy
´ Measures the complexity, the degree of disorder, or the amount of information in a data set
´ Let x_i be a term in document X, and p(x_i) the ratio of the count of occurrences of x_i to the total number of words in the document. Then the entropy H(X) is given by:
  H(X) = −Σ_i p(x_i) log_2 p(x_i)
´ Higher entropy indicates a more uniform distribution; lower entropy indicates a highly skewed distribution
Ref: A Simpler Model of Software Readability
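A minimal sketch of this definition in Python, computing H(X) from a document's token list (the tokens here are illustrative):

```python
import math
from collections import Counter

def entropy(tokens):
    """Shannon entropy H(X) = -sum_i p(x_i) * log2 p(x_i) over the terms of a document."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A uniform distribution over four distinct terms gives the maximum 2 bits, while a document repeating a single term gives 0, matching the slide's interpretation.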
Halstead's Volume
´ Similar to the idea of entropy
´ Represents the minimum number of bits needed to naively represent the program, or the number of mental comparisons needed to write the program
´ Program length N = total number of operators + total number of operands
´ Program vocabulary n = number of distinct operators + number of distinct operands
´ Halstead Volume: V = N log_2 n
´ Greater volume indicates greater complexity
Ref: A Simpler Model of Software Readability
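The definition translates directly to code. This sketch assumes the operators and operands have already been extracted from the program (tokenization itself is out of scope here); the example tokens correspond to the statement `a = b + b`:

```python
import math

def halstead_volume(operators, operands):
    """V = N * log2(n), where N counts all occurrences and n counts distinct tokens."""
    N = len(operators) + len(operands)          # program length
    n = len(set(operators)) + len(set(operands))  # program vocabulary
    return N * math.log2(n)

# Tokens for `a = b + b`: operators {=, +}, operands {a, b, b}
volume = halstead_volume(["=", "+"], ["a", "b", "b"])  # N = 5, n = 4, V = 10.0
```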
AUC – Area Under the (ROC) Curve
´ Receiver Operating Characteristic (ROC) curve
´ True Positive Rate (Sensitivity): TP/(TP+FN)
´ False Positive Rate (1−Specificity): FP/(FP+TN)
´ The ROC curve is plotted by varying the discrimination threshold
´ All such curves pass through (0,0) and (1,1)
´ The point (0,1) represents perfect classification, and points on the ROC curve close to (0,1) represent good classifiers
Ref: A Simpler Model of Software Readability
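A compact way to compute AUC without tracing the curve is the rank interpretation: AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one. This sketch uses that equivalence (the labels and scores are invented, and this O(P·N) loop is for illustration, not efficiency):

```python
def roc_auc(labels, scores):
    """AUC as P(score of random positive > score of random negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfectly separated scores give AUC = 1.0 (the (0,1) corner of ROC space)
auc = roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```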
Binary Classification with Logistic Regression
´ Supervised learning algorithm
´ Binary classification: "Not Readable" (0), "Readable" (1)
´ Takes real-valued inputs of some dimension n and predicts the probability of the input belonging to the default class (1). If the probability > 0.5, the predicted class is 1, else 0.
´ Probability = sigmoid(θ_0 + θ_1·x_1 + θ_2·x_2 + … + θ_n·x_n), where sigmoid(z) = 1/(1 + e^(−z))
Training with Logistic Regression
´ Training involves finding the gradient of the error and updating the coefficient vector θ to better fit the model and improve accuracy over a number of iterations
´ Gradient Descent Step:
  θ_j := θ_j − (α/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) · x_j^(i)
´ Here, m = total number of training examples, h_θ(x) = predicted output, y = actual labelled output, α = learning rate
´ When an optimal set of coefficients is found, the model is used to predict the class of previously unseen data points
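The prediction rule and the gradient descent step above can be sketched together in a few lines. This is a toy implementation on an invented one-feature dataset, not the paper's actual training setup (which uses many readability features and a wrapper strategy):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(theta, x):
    """P(y=1 | x) = sigmoid(theta_0 + theta_1*x_1 + ... + theta_n*x_n)."""
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return sigmoid(z)

def gradient_descent_step(theta, X, y, alpha):
    """theta_j := theta_j - (alpha/m) * sum_i (h_theta(x_i) - y_i) * x_ij."""
    m = len(X)
    errors = [predict_proba(theta, x) - yi for x, yi in zip(X, y)]
    new_theta = list(theta)
    new_theta[0] -= alpha / m * sum(errors)                      # bias term, x_0 = 1
    for j in range(1, len(theta)):
        new_theta[j] -= alpha / m * sum(e * x[j - 1] for e, x in zip(errors, X))
    return new_theta

# Toy separable data: class 1 for larger feature values
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
theta = [0.0, 0.0]
for _ in range(2000):
    theta = gradient_descent_step(theta, X, y, 0.5)
predictions = [1 if predict_proba(theta, x) > 0.5 else 0 for x in X]
```

After training, the learned decision boundary sits between the two classes, so `predictions` recovers the labels.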
Overfitting
´ To reduce overfitting: reduce the number of features used to model the data
Feature Selection using the Wrapper Method
´ Create all possible subsets of size k from the feature vector
  ´ k is determined via cross-validation
´ Perform classification on each subset of features
´ The feature subset on which classification achieves the highest accuracy is chosen as the best feature representation
Ref: Large Scale Attribute Selection Using Wrappers
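The steps above can be sketched as an exhaustive search over size-k subsets. The classifier is abstracted into an `evaluate(subset) -> accuracy` callback; the feature names and the toy scoring function below are hypothetical stand-ins, not the paper's features or accuracies:

```python
from itertools import combinations

def wrapper_select(features, k, evaluate):
    """Score every size-k feature subset with evaluate() and keep the best one."""
    best_subset, best_score = None, float("-inf")
    for subset in combinations(features, k):
        score = evaluate(subset)  # in practice: cross-validated classifier accuracy
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score

# Hypothetical per-feature "usefulness" standing in for classifier accuracy
useful = {"lines": 0.7, "entropy": 0.6, "comments": 0.2}
evaluate = lambda subset: sum(useful[f] for f in subset)
best, score = wrapper_select(list(useful), 2, evaluate)
```

Note the cost: the number of subsets grows combinatorially in k, which is why wrapper methods are usually paired with small k or greedy search in practice.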
Center Selection
´ Used to select the 200 most representative methods for evaluation
´ Repeatedly draw an edge between the closest pair of points based on distance
  ´ In this case Euclidean distance – the square root of the sum of the squared differences between the vector components
´ Do not create edges between two points which are already in the same cluster -> hence single-link clusters
´ Once there are k connected components, stop the procedure
Ref: Algorithm Design by J. Kleinberg and É. Tardos
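The procedure above is Kruskal-style single-linkage clustering: sort all pairwise distances, merge the closest pair of clusters repeatedly, and stop at k connected components. A minimal sketch on invented 2-D points (the paper applies this to feature vectors of code snippets, not coordinates):

```python
import math
from itertools import combinations

def euclidean(p, q):
    """Square root of the sum of squared component differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_link_clusters(points, k):
    """Merge the closest pair of points in different clusters until k clusters remain."""
    parent = list(range(len(points)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(combinations(range(len(points)), 2),
                   key=lambda e: euclidean(points[e[0]], points[e[1]]))
    clusters = len(points)
    for i, j in edges:
        if clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:          # skip edges inside an existing cluster
            parent[ri] = rj
            clusters -= 1
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

points = [(0, 0), (0, 1), (5, 5), (5, 6)]
clusters = single_link_clusters(points, 2)  # two well-separated pairs
```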
Cronbach's alpha
´ Measures reliability – how well a test measures what it should
´ Measure of how closely items within a group are related
´ Used to measure the level of agreement among annotators on what readable code is
´ Can be written as a function of the number of items and the average inter-correlation among the items:
  α = (N · c̄) / (v̄ + (N − 1) · c̄)
´ N: number of items, c̄: average inter-item covariance, v̄: average variance
Ref: https://stats.idre.ucla.edu/spss/faq/what-does-cronbachs-alpha-mean
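A minimal sketch of that formula, assuming the ratings are given as one list per item with one score per respondent (the data layout and sample values are illustrative):

```python
def cronbach_alpha(items):
    """alpha = (N * c_bar) / (v_bar + (N - 1) * c_bar).

    items: N equal-length lists of scores, one list per item.
    """
    N = len(items)
    m = len(items[0])

    def mean(xs):
        return sum(xs) / len(xs)

    def cov(x, y):
        mx, my = mean(x), mean(y)
        return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (m - 1)

    v_bar = mean([cov(x, x) for x in items])  # average item variance
    c_bar = mean([cov(items[i], items[j])     # average inter-item covariance
                  for i in range(N) for j in range(N) if i != j])
    return (N * c_bar) / (v_bar + (N - 1) * c_bar)

# Two items that move in perfect lockstep -> maximal internal consistency
alpha = cronbach_alpha([[1, 2, 3], [1, 2, 3]])
```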
Wilcoxon Test
´ Used when comparing two related samples, matched samples, or repeated measurements on a single sample to assess whether their mean ranks differ
´ Used to determine if the classification accuracy of the proposed model is significantly different from that of the other models
´ Algorithm:
  ´ Find the difference between each pair of values
  ´ Rank the absolute values of these differences, ignoring any "0" differences. Give the lowest rank to the smallest absolute difference. If two or more differences are equal, this is a "tie": tied scores get the average of the ranks those scores would have obtained had they been different from each other.
  ´ Re-apply the negative sign to the ranks of negative differences and add together all the rank scores – this sum is the test statistic W
  ´ N = number of non-zero differences
  ´ Look up the critical value W_c in a Wilcoxon table for α = 0.05 and the given N
  ´ If |W| is less than the critical value, the two samples are similar; if it is greater, they are significantly different
Ref: https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test
Worked example:
´ Test statistic: |W| = |1.5 + 1.5 − 3 − 4 − 5 − 6 + 7 + 8 + 9| = 9
´ Critical value: W_c(α = 0.05, N = 9) = 6
´ Since |W| > W_c, the two datasets are significantly different
Ref: https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test
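The ranking-and-summing steps can be sketched as follows. This computes only the signed-rank statistic W, with tie-averaged ranks and zero differences dropped; looking up W_c in the table is left out, and the paired values are invented:

```python
def wilcoxon_w(pairs):
    """Signed-rank statistic W for paired samples.

    Drop zero differences, rank |differences| (ties share the average rank),
    re-apply the signs, and sum the signed ranks.
    """
    diffs = [a - b for a, b in pairs if a != b]
    ordered = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(ordered):
        end = pos
        # extend over a run of tied absolute differences
        while (end + 1 < len(ordered)
               and abs(diffs[ordered[end + 1]]) == abs(diffs[ordered[pos]])):
            end += 1
        avg = (pos + end) / 2 + 1  # average of 1-based ranks pos+1 .. end+1
        for idx in ordered[pos:end + 1]:
            ranks[idx] = avg
        pos = end + 1
    return sum(r if d > 0 else -r for d, r in zip(diffs, ranks))

# Differences 1, -1, 2: the tie at |1| yields ranks 1.5, 1.5; W = 1.5 - 1.5 + 3
w = wilcoxon_w([(2, 1), (1, 2), (5, 3)])
```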
Cliff's Delta
´ Measure of how often the values in one distribution are larger than the values in a second distribution
´ Used to perform pairwise comparisons between the all-features model and the other models
´ δ = (#{x_1 > x_2} − #{x_1 < x_2}) / (n_1 · n_2), where x_1 and x_2 are scores within group 1 and group 2, and n_1 and n_2 are the sizes of the sample groups respectively
´ Ranges from 1, when all values from one group are higher than the values from the other group, to −1 when the reverse is true. Completely overlapping distributions have a Cliff's delta of 0.
Ref: http://www.scielo.org.co/scielo.php?script=sci_arttext&pid=S1657-92672011000200018
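The definition maps directly onto a pairwise count. A minimal sketch (the sample groups are invented; the O(n_1·n_2) double loop is for clarity, not efficiency):

```python
def cliffs_delta(xs, ys):
    """delta = (#{x > y} - #{x < y}) / (n1 * n2), ranging over all cross-group pairs."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))

# Fully separated groups hit the +1 / -1 extremes; identical groups give 0
delta = cliffs_delta([2, 3], [0, 1])
```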
P-value
´ The p-value is defined as the probability of obtaining, under the null hypothesis (H_0), a result equal to or more extreme than what was actually observed
´ The null hypothesis is a prediction of no difference – for example, "adding a particular feature to the input set makes no difference in determining readability"
´ The smaller the p-value, the higher the significance, because it tells the investigator that the null hypothesis under consideration may not adequately explain the observation
´ The hypothesis is rejected if this probability is less than or equal to a pre-defined threshold value α, referred to as the level of significance
Ref: https://www.statsdirect.com/help/basics/p_values.htm