Mining Software Repositories Session 1 Infrastructure and - PowerPoint PPT Presentation

Feb 21, 2023

Mining Software Repositories
Session 1: Infrastructure and extraction
Discussion Leader: Daniel M. German

The Stages
1. Data Extraction
2. Data Mining/Facts Finding/Change Patterns/System Understanding
3. Integration and Presentation

The Extraction Stage
• The dirty work, but somebody has to do it
• Lots of raw data out there
  – Usually open source
  – Difficult to gain access to closed-source data

The Issues
• Why do we need to extract historical data?
• Without a purpose, this data might have no value

The Issues...
• What to extract? (software trails)
  – Code
    ∗ Releases
    ∗ Versioning history
  – Defects
  – Documentation
    ∗ Explicit (man pages, help system, design documents)
    ∗ Implicit (email messages)
    ∗ Web site

The Issues...
• From where?
  – What projects to select?
  – The software process might have an impact on the way the historical data gets recorded
  – It is necessary to understand this process
  – Different projects store data in different ways

The Papers
• The Perils and Pitfalls of Mining SourceForge by James Howison and Kevin Crowston
• Their experiences mining SourceForge
• What they learnt spidering the site
• Some potential mistakes in the analysis of the extracted data

The Papers...
• Text is Software Too by Alexander Dekhtyar, Jane Huffman Hayes and Tim Menzies
• Mining of textual requirements documents
• “Text mining from software engineering text is a high risk, high return adventure.”

The Papers...
• Mining CVS Repositories, the softChange experience by Daniel German
• The revision history of the source code says a lot about the project:
  – it highlights the process, the architecture evolution, hidden relationships between files...
• The Concurrent Versions System (CVS) is a major source of historical data

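To make the idea of CVS as a source of historical data concrete, here is a minimal sketch of turning `cvs log` output into flat revision records. It is not the softChange implementation. The names `Revision`, `parse_cvs_log` and `SAMPLE` are hypothetical, the sample log is invented for illustration, and the parser assumes the usual layout of a "revision" line followed by a "date: ...; author: ...;" line and dashed or "=" separator lines. It ignores branches and other details a real extractor would need.

```python
import re
from dataclasses import dataclass

# Assumed shape of the metadata line that follows each "revision X.Y" line.
META = re.compile(r"date: (?P<date>[^;]+);\s+author: (?P<author>[^;]+);")

@dataclass
class Revision:
    path: str
    number: str
    date: str
    author: str
    message: str

def parse_cvs_log(text):
    """Turn `cvs log` text into a flat list of Revision records (sketch only)."""
    revisions = []
    path = None
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith("Working file: "):
            path = line[len("Working file: "):]
        elif line.startswith("revision ") and i + 1 < len(lines):
            meta = META.match(lines[i + 1])
            if meta:
                number = line.split()[1]
                i += 2
                message = []
                # The log message runs until the next separator line
                # (assumed to start with a run of '-' or '=').
                while i < len(lines) and not lines[i].startswith(("-----", "=====")):
                    message.append(lines[i])
                    i += 1
                revisions.append(Revision(
                    path, number,
                    meta["date"].strip(), meta["author"].strip(),
                    "\n".join(message).strip(),
                ))
                continue
        i += 1
    return revisions

# Invented sample input, shaped like `cvs log` output for one file.
SAMPLE = """\
RCS file: /cvsroot/demo/src/main.c,v
Working file: src/main.c
head: 1.2
description:
----------------------------
revision 1.2
date: 2004/03/02 10:15:00;  author: alice;  state: Exp;  lines: +3 -1
Fix off-by-one in loop
----------------------------
revision 1.1
date: 2004/03/01 09:00:00;  author: bob;  state: Exp;
Initial revision
=============================================================================
"""

for rev in parse_cvs_log(SAMPLE):
    print(rev.path, rev.number, rev.author, rev.date)
```

Records like these are the raw material for the later stages: grouping them by author, date, or file is one way to start looking for process patterns or files that tend to change together.
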
The Papers...
• Research Infrastructure for Empirical Science of F/OSS by Les Gasser, Gabriel Ripoche and Robert Sandusky
• Preprocessing CVS Data for Fine-Grained Analysis by Thomas Zimmermann and Peter Weissgerber

Discussion: the Issues, revisited
• Several people are working on the same problems
  – Comparison?
  – Collaboration? (Avoid reinventing the wheel)
• Nomenclature?
• Choosing projects for analysis?
• Sharing data?
• Sharing the extractors?
