Challenges in Mining Whole Software Universe
Katsuro Inoue
Osaka University
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Analyzing Evolution of kern_malloc
[Figure: scatter plot of cover ratio (y-axis, 0.4 to 1) against last modified time (x-axis, 1993/01/31 to 2012/04/01). Numbered points 1 to 67 identify copies of the file found in projects such as Lites, CMU Mach, Kame, SimOS, Netnice, NetBSD, OpenBSD, OSKit, Pmon, and GNU/Hurd. Each entry is labeled with the search engine that returned it: G (Google Code Search) or K (Koders). Markers distinguish files under the New BSD License from files under the Original BSD License.]
Analyzing Reuse of Outdated Libraries
Vulnerability of 50 OSS Projects Using libpng
[Figure: bar chart (y-axis 0 to 4) over the libpng versions in use, from v1.0.11 to v1.5.13. Versions are marked either "Vulnerabilities reported" or "No defects reported". Result from Google Code Search and Koders.]
Experience and Concern
Mining source code repositories, e.g., SourceForge, GitHub, Open Hub, Google Code, Maven, ... (BlackDuck)
– Outcomes heavily depend on repository contents
– Aren't we mining a small world?
– There may be many other source code contents in the universe
Whole Software Universe 𝑉
• Whole Software Universe 𝑉 ≡ Collection of All Software Developed by Humans in the Past
  – Open source software
  – Personally-developed software
  – Proprietary software
  ... any others
• 𝑄 : Set of all meaningful software (a countably infinite set)
• 𝑉 ⊆ 𝑄
Questions for 𝑉
A) How do we get 𝑉 ?
B) What do we mine from 𝑉 ?
C) How do we mine 𝑉 ?
D) Why do we mine 𝑉 ?
A) How Do We Get 𝑉 ?
• No one knows actual 𝑉
• So we would collect many repositories, and construct a subset 𝑉′ ⊆ 𝑉
• 𝑉′ should be as large as possible, of course
• 𝑉′ should reflect characteristics of 𝑉
• Challenges
  – Collecting and unifying different repositories into 𝑉′
    • Duplication, coherence, ...
  – Performance and capacity for 𝑉′
  – Updating and maintaining 𝑉′
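The duplication challenge in unifying repositories can be illustrated with a minimal sketch. It assumes that two files count as duplicates when their text matches after trailing whitespace is trimmed, which is only one possible definition. The repository names, paths, and the `content_key` and `unify` helpers are hypothetical and not part of the talk.

```python
import hashlib
from collections import defaultdict


def content_key(source: str) -> str:
    """Hash file content after trimming trailing whitespace on each line."""
    normalized = "\n".join(line.rstrip() for line in source.splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def unify(repositories):
    """Merge repositories into one collection keyed by content hash.

    `repositories` maps a repository name to {path: source text}.
    Each value lists every (repository, path) where that content occurs.
    """
    unified = defaultdict(list)
    for repo, files in repositories.items():
        for path, source in files.items():
            unified[content_key(source)].append((repo, path))
    return dict(unified)


# Hypothetical input: two repositories that share one file.
repos = {
    "repo_a": {"kern/malloc.c": "int x;\n", "README": "hello\n"},
    "repo_b": {"sys/malloc.c": "int x;  \n"},
}
unified = unify(repos)
duplicates = {k: v for k, v in unified.items() if len(v) > 1}
```

Exact hashing only catches identical copies. Near-duplicates such as modified clones would need a similarity measure instead, which is a separate design choice.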
B) What Do We Mine from 𝑉 ?
Examples
• Simple metrics of 𝑉 over history
  – Size |𝑉| at times 𝑢1, 𝑢2, …
  – Language usage
  …
• Density of 𝑉 with respect to 𝑄
• History and evolution of code 𝑑 in 𝑉
  – Origin version of 𝑑
  – Closely related code 𝑑′ (clone, variation, family, ...)
  – Future prediction for 𝑑
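The simple metrics over history can be sketched as follows, assuming each file in the collected subset carries a language label and a first-seen date. The records, field layout, and function names are hypothetical, chosen only for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (path, language, date the file first appeared).
files = [
    ("a.c", "C", date(1995, 10, 28)),
    ("b.c", "C", date(2001, 4, 19)),
    ("c.py", "Python", date(2009, 7, 6)),
]


def size_at(records, t):
    """Number of files that already existed at time t (the size metric)."""
    return sum(1 for _, _, seen in records if seen <= t)


def language_usage(records, t):
    """Count of files per language among those that existed at time t."""
    return Counter(lang for _, lang, seen in records if seen <= t)


early = size_at(files, date(2000, 1, 1))
late = size_at(files, date(2012, 4, 1))
usage = language_usage(files, date(2012, 4, 1))
```

Evaluating `size_at` at a sequence of dates gives the size history. The density with respect to 𝑄 is not computed here because 𝑄 is infinite and would need its own definition of density.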
C) How Do We Mine 𝑉 (𝑉′) ?
1. Direct mining
  – Good model
  – Powerful machine
2. Indirect mining
  – Use external services
  – Reconstruct mining result from those external services
Direct Mining
[Diagram: the subset 𝑉′ of 𝑉 is copied ("Copy of 𝑉′") and the copy is mined directly.]
Indirect Mining
[Diagram: a question about 𝑉′ ("Want to know about 𝑉′") goes to a mashup engine, which performs query decomposition and result composition against 𝑉′ inside 𝑉.]
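Query decomposition and result composition can be sketched as below. This is only one possible reading of the mashup engine: a query is split into terms, each term is sent to every external service, results are unioned across services and intersected across terms. The in-memory indexes stand in for real code search services, and all names and paths are hypothetical.

```python
def make_service(index):
    """Wrap an in-memory index as a stand-in for an external search service."""
    return lambda term: set(index.get(term, set()))


def decompose(query):
    """Split a query into terms that are sent to the services separately."""
    return query.split()


def mashup(query, services):
    """Answer a query using only the external services."""
    per_term = []
    for term in decompose(query):
        hits = set()
        for service in services:
            hits |= service(term)  # union of results across services
        per_term.append(hits)
    # Compose: keep only files that matched every term.
    return set.intersection(*per_term) if per_term else set()


# Hypothetical indexes for two services.
service_g = make_service({
    "kern_malloc": {"lites/kern_malloc.c", "netbsd/kern_malloc.c"},
    "BSD": {"netbsd/kern_malloc.c"},
})
service_k = make_service({
    "kern_malloc": {"openbsd/kern_malloc.c"},
    "BSD": {"openbsd/kern_malloc.c", "netbsd/kern_malloc.c"},
})

result = mashup("kern_malloc BSD", [service_g, service_k])
```

The composition rule is the main design choice here. Intersecting across terms favors precision, while unioning across services compensates for each service covering only part of 𝑉′.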
D) Why Do We Mine 𝑉 ?
Objectives of mining 𝑉
• Reuse and knowledge transfer
  – We do not want to reinvent the wheel
• Historical Archive
  – Frontier's wisdom
...
Discussion!
• Is it an interesting research topic?
• Can we get useful research results?
• Is it a feasible research target?
Thank you