
Evolution in Open Source Software: A Case Study



  1. Evolution in Open Source Software: A Case Study
     Michael W. Godfrey, Qiang Tu
     Software Architecture Group, University of Waterloo

     Overview
     - What is software evolution?
     - Why should we care?
     - Previous research
     - A case study: The Linux OS kernel
     - Observations, hypotheses, and future research

     What is software evolution?
     "Evolution is what happens while you're busy making other plans."
     - Usually, we consider evolution to begin once the first version has been delivered:
       - Maintenance is the planned set of tasks to effect changes.
       - Evolution is what actually happens to the software.

     Previous research
     - Lehman's laws
     - Parnas on software geriatrics
     - Eick et al. on code decay (10 MLOC telecom)
     - Gall et al. (10 MLOC telecom)
     - Munro, Burd et al. (2 MLOC gcc)

     Lehman's Laws in a nutshell
     - Observations:
       - (Most) useful software must evolve or die.
       - As a software system gets bigger, its resulting complexity tends to limit its ability to grow.
       - Development progress/effort is (more or less) constant; growth is at best constant.
     - Advice:
       - Need to manage complexity.
       - Do periodic redesigns.
       - Treat software and its development process as a feedback system (and not as a passive theorem).

     Lehman's examples
     [Figure not recovered.]

  2. A case study in evolution: The Linux OS kernel
     It's Linux!
     - Large system, very stable, many releases over several years, many developers
     - Growing mainstream adoption
     - Open source development model
     - Interesting phenomenon in itself
     - Easy to track, can publish results, many experts
     - Not much previous study

     Linux background
     - Linux kernel v1.0 released March 1994
       - 487 source files, 165 KLOC, i386 only
     - Linux kernel v2.3.39 released January 2000
       - 4854 source files, 2.2 MLOC, 10 hardware architectures supported, over 300 developers credited
     - Maintained along two parallel paths:
       - development and stable

     Methodology
     - Examined 96 versions of Linux kernel
       - 34 of the 67 stable releases
       - 62 of the 369 development releases
     - All measures considered only .c/.h files contained in the tarball
     - Counted LOC using "wc -l" and an awk script that ignored comments and blank lines
     - Counted # of fcns/vars/macros using ctags
     - Architectural model (SSs hierarchy) based on default directory structure
     - We plotted growth against calendar time
       - Lehman suggests plotting growth against release number

     [Figure: Growth of compressed tar file. Size in bytes against calendar time (Jan 1993 to Apr 2001). Series: development releases (1.1, 1.3, 2.1, 2.3) and stable releases (1.0, 1.2, 2.0, 2.2).]
     [Figure: Growth of # of source files. # of source code files (*.[ch]) against calendar time (Jan 1993 to Apr 2001). Series: development releases (1.1, 1.3, 2.1, 2.3) and stable releases (1.0, 1.2, 2.0, 2.2).]
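The LOC measurement described in the methodology can be approximated as follows. This is only a minimal sketch: the authors' awk script is not shown in the slides, so the function name `count_loc`, the regular expression, and the sample input are all assumptions made for illustration. The first number follows the `wc -l` convention (count of newline characters); the second counts lines that remain non-blank once C comments are stripped.

```python
import re

# Matches /* ... */ block comments (possibly multi-line) and // line comments.
# Simplification (assumption): comment markers inside string literals are
# not treated specially, so this is an approximation, not a C parser.
_COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def _keep_newlines(match):
    # Replace a comment with only its newlines so line structure is preserved.
    return "\n" * match.group(0).count("\n")


def count_loc(source):
    """Return (total_lines, uncommented_lines) for one C source string."""
    total = source.count("\n")  # same convention as `wc -l`
    stripped = _COMMENT.sub(_keep_newlines, source)
    uncommented = sum(1 for line in stripped.splitlines() if line.strip())
    return total, uncommented


# Hypothetical sample file, not taken from the Linux kernel.
sample = """/* header
 * comment */
#include <stdio.h>

int main(void) {   // entry point
    return 0;      /* done */
}
"""

if __name__ == "__main__":
    print(count_loc(sample))  # prints (7, 4)
```

Applying such a function to every .c and .h file in a release tarball and summing the results would give per-release totals comparable in spirit to the "Total LOC" and "Total LOC uncommented" series, although exact counts depend on how the original script treated edge cases.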

  3. [Figure: Growth of # of global fcns, variables, and macros, against calendar time (Jan 1993 to Apr 2001). Series: development releases (1.1, 1.3, 2.1, 2.3) and stable releases (1.0, 1.2, 2.0, 2.2).]
     [Figure: Growth of Lines of Code (LOC). Total LOC against calendar time (Jan 1993 to Apr 2001). Series: total LOC ("wc -l") and total LOC uncommented, each for development and stable releases.]
     [Figure: Average/median .c file size. Uncommented LOC against calendar time (Jan 1993 to Apr 2001). Series: average and median .c file size, each for development and stable releases.]
     [Figure: Average/median .h file size. Uncommented LOC against calendar time (Jan 1993 to Apr 2001). Series: average and median .h file size, each for development and stable releases.]
     [Figure: Growth of major SSs (dev. releases). Total uncommented LOC against calendar time (Jan 1993 to Apr 2001). Series: drivers, arch, include, net, fs, kernel, mm, ipc, lib, init.]
     [Figure: SS LOC as percentage of total system. Percentage of total system uncommented LOC against calendar time (Jan 1993 to Apr 2001). Series: drivers, arch, include, net, fs, kernel, mm, ipc, lib, init.]

  4. [Figure: SS LOC as percentage of total system (ignoring drivers). Percentage of total system uncommented LOC against calendar time (Jan 1993 to Apr 2001). Series: arch, include, net, fs, kernel, mm, ipc, lib, init.]
     [Figure: Growth of small core SSs. Total uncommented LOC against calendar time (Jan 1993 to Apr 2001). Series: kernel, mm, ipc, lib, init.]
     [Figure: Growth of arch SSs. Total uncommented LOC against calendar time (Jan 1993 to Apr 2001). Series: arch/ppc/, arch/sparc/, arch/sparc64/, arch/m68k/, arch/mips/, arch/i386/, arch/alpha/, arch/arm/, arch/sh/, arch/s390/.]
     [Figure: Growth of drivers SSs. Total uncommented LOC against calendar time (Jan 1993 to Apr 2001). Series: drivers/net, drivers/scsi, drivers/char, drivers/video, drivers/isdn, drivers/sound, drivers/acorn, drivers/block, drivers/cdrom, drivers/usb, drivers/"others".]

     Observations and hypotheses
     - Growth along devel. path is super-linear
         y = 0.21*x^2 + 252*x + 90,055      r^2 = 0.997
         y = size in LOC, x = days since v1.0
         r^2 is the "coefficient of determination" using least squares
         [Lehman/Turski's model: y' = y + E/y^2 < y0 + x*E/y0^2]
     - Linux's strong growth is continuing.
     - This is stronger growth at MLOC level than observed by others (Lehman, Gall), even for other OSs.

     Why has Linux been able to continue its geometric growth?
     - Core code quality is carefully maintained
     - Architecture/problem domain
       - It's largely drivers
       - Much of the code is "parallel"
       - It's not as big as you might think
         - Vanilla configuration used only 15% of files
     - Development model (OSD) and its sociology
       - Popularity and visibility have encouraged outsiders (both hackers and industry) to contribute
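To make the contrast between the two growth models concrete, the sketch below evaluates the quadratic fit quoted on the slide and iterates the Lehman/Turski inverse-square rule. The coefficients come from the slide; the function names and the starting size, effort constant E, and number of releases used in the demonstration are hypothetical values chosen only to show the shape of each curve, not data from the study.

```python
def quadratic_loc(days):
    """Fitted model from the slide: size in LOC as a function of days since v1.0."""
    return 0.21 * days**2 + 252 * days + 90_055


def inverse_square_growth(y0, effort, releases):
    """Iterate y' = y + E / y^2 and return the size after each release."""
    sizes = [y0]
    for _ in range(releases):
        y = sizes[-1]
        sizes.append(y + effort / y**2)
    return sizes


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Quadratic model: the increment per 1000 days keeps getting larger.
    for days in (0, 1000, 2000):
        print(days, round(quadratic_loc(days)))

    # Inverse-square model with hypothetical y0 and E: increments shrink.
    sizes = inverse_square_growth(y0=100.0, effort=1e6, releases=5)
    print([round(s, 1) for s in sizes])
```

Under the quadratic fit each successive interval adds more code than the previous one, whereas under the inverse-square rule each release adds less than the one before, which is why the final size stays below the linear bound y0 + x*E/y0^2 given on the slide.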

  5. [Figure: Growth of pine (email client).]
     [Figure: Growth of fetchmail [Raymond]. # of modules against calendar time (Jan 1993 to Apr 2001).]
     [Figure: Growth of gcc/g++/egcs. # of modules against calendar time. Series: gcc, g++, egcs.]
     [Figure: Growth of X Windows. # of modules against calendar time. Labelled releases: X10R3, X10R4, X11R1, X11R2, X11R3, X11R5, X11R6, X11R6.1, X11R6.3, X11R6.4.]
     [Figure: Growth of vim (text editor). Total LOC against calendar time (May 1990 to Apr 2001). Series: total LOC ("wc -l") and total LOC (ignoring comments and blank lines).]
     [Figure: vim avg % comments and blank lines per file. Average percent comments + blank lines against calendar time (May 1990 to Apr 2001).]

  6. [Figure: vim avg/median file size. Uncommented LOC against calendar time (May 1990 to Apr 2001). Series: average and median uncommented LOC per source file.]
     [Figure: vim's architecture.]

     Hypotheses
     Factors affecting evolution include
     - Size and age of system
     - Use of traditional sw. eng. principles during development
     PLUS
     - Problem domain
     - Problem complexity, multi-platform, multi-features
     - Software architecture
     - Process model
     - Sociology, market forces, and acts-of-God

     Software evolution research: What next?
     So far, we have examined only growth.
     - More case studies needed
       - Qualitative and quantitative
       - Industrial and open source systems
       - Different problem domains, architectures
     - Supporting tools to aid analysing, visualizing, and querying program evolution
       - More than just RCS and perl
       - Support for architecture repair
     - Codified knowledge: Why and how does software change?
       - Build catalogue of change patterns and evolutionary narratives

     Codified knowledge
     - Mature engineering disciplines codify knowledge and experience.
     - Arguably, this is lacking in software engineering.
       - Software architecture styles [Shaw]
       - Design patterns [GoF]
     - Codified knowledge of how and why programs evolve:
       - Evolutionary narratives [Godfrey]
         - Long term, coarse granularity
       - Change patterns
         - Short term, fine granularity

     Change patterns and evolutionary narratives
     - Phenomena observed in Linux evolution
       - Bandwagon effect
       - Contributed third party code
       - "Mostly parallel" enables sustained growth
       - Clone and hack
       - Careful control of core code; more flexibility on contributed drivers, experimental features
