Linear Predictive Coding and Cepstrum coefficients for mining time variant information from software repositories
G. Antoniol, F. Rollo and G. Venturi
RCOST – University of Sannio - Italy
LPC Idea
• Model a time series with a polynomial approximation
• LPC Cepstrum: smooth the spectrum
• Define the distance between two time series as the distance between their polynomial approximations
• Use distance to cluster time series with identical or similar evolutions
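The idea on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the LPC coefficients are estimated with the autocorrelation method and the Levinson-Durbin recursion, the cepstrum is derived with the standard LPC-to-cepstrum recursion, and the distance is assumed to be Euclidean. The function names and the default order of 12 are hypothetical choices made for the example.

```python
import math


def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of the series x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc(x, order):
    """LPC predictor coefficients a_1..a_order (Levinson-Durbin).

    Convention: x[n] is approximated by sum_k a_k * x[n - k].
    """
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]               # prediction error energy
    for i in range(1, order + 1):
        if err <= 0.0:       # degenerate series: stop, keep remaining zeros
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def lpc_cepstrum(a):
    """Cepstrum c_1..c_p of the all-pole model defined by a_1..a_p."""
    p = len(a)
    c = [0.0] * (p + 1)      # c[0] is unused
    for n in range(1, p + 1):
        c[n] = a[n - 1] + sum((k / n) * c[k] * a[n - k - 1] for k in range(1, n))
    return c[1:]


def cepstral_distance(x, y, order=12):
    """Euclidean distance between the LPC cepstra of two series (assumed metric)."""
    cx = lpc_cepstrum(lpc(x, order))
    cy = lpc_cepstrum(lpc(y, order))
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(cx, cy)))
```

One consequence of this sketch is that the predictor coefficients do not change when a series is multiplied by a constant, so two series that differ only by a scale factor end up at distance close to zero. Whether that is desirable for file-size histories depends on the analysis.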
LPC and Linux Kernel
• 211 Linux releases and about 1700 files
• Study the influence of the number of coefficients
• Study the influence of distance thresholds
• Mine files with similar evolution: create groups of files with the same or very similar size evolution
[Figure: Similar pairs for different thresholds. x-axis: coefficients used (12, 16, 20, 32); y-axis: 100 to 10000; series: thresholds 1E-3, 1E-4, 1E-5]
[Figure: Similar pair of evolving files. x-axis: 1 to 248; y-axis: 0 to 800]
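The mining step can be illustrated as follows. This is a hedged sketch only: the slide does not say how groups are formed, so the example assumes that files are linked whenever their feature vectors (for instance LPC cepstra) lie within a distance threshold, and that groups are the connected components of those links. The file names and vectors are made up for the example.

```python
import math
from itertools import combinations


def similar_pairs(features, threshold):
    """Pairs of files whose feature vectors are within the threshold."""
    pairs = []
    for a, b in combinations(sorted(features), 2):
        if math.dist(features[a], features[b]) <= threshold:
            pairs.append((a, b))
    return pairs


def group_files(features, threshold):
    """Group files by linking similar pairs (connected components, assumed)."""
    parent = {name: name for name in features}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in similar_pairs(features, threshold):
        parent[find(b)] = find(a)

    groups = {}
    for name in sorted(features):
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Hypothetical feature vectors for three files.
features = {
    "a.c": [0.5, 0.125],
    "b.c": [0.5004, 0.125],
    "c.c": [-0.5, 0.125],
}

for threshold in (1e-3, 1e-4, 1e-5):
    print(threshold, len(similar_pairs(features, threshold)), group_files(features, threshold))
```

Running the loop over several thresholds mirrors the study on the slide: a looser threshold yields more similar pairs and larger groups, a tighter one yields fewer.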