Predicting octane content of gasoline using near-infrared (NIR) spectra
Data from: Kalivas, John H., "Two Data Sets of Near Infrared Spectra," Chemometrics and Intelligent Laboratory Systems, v.37 (1997) pp.255-259
Example courtesy of The MathWorks, Inc. (https://www.mathworks.com/help/stats/examples/partial-least-squares-regression-and-principal-components-regression.html)
NIR spectra and octane content for 60 gasolines.
Why principal components (P.C.) regression?
• Large number of variables (401 wavelengths)
• Highly correlated variables
• Complex relationship: we expect multiple peaks to correlate with octane.
PCR model with 2 principal components
• Low predictive value: R² < 0.2
• We can use more components since, unlike in exploratory PCA, we do not need to interpret them graphically.
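A minimal sketch of PCR in Python with NumPy, for readers who want to try the idea outside MATLAB. The gasoline data are not bundled here, so `make_synthetic_spectra` is a hypothetical stand-in that builds 60 "spectra" of 401 points from six Gaussian peaks; the response is deliberately tied to two low-variance peaks so that two components predict poorly, as on the slide. The function names and all numeric settings are assumptions for illustration, not part of the original example.

```python
import numpy as np

def make_synthetic_spectra(rng, n_samples=60, n_wavelengths=401):
    """Hypothetical stand-in for the NIR data: six Gaussian peaks plus noise."""
    grid = np.arange(n_wavelengths)
    centers = np.linspace(50, 350, 6)
    peaks = np.exp(-0.5 * ((grid[None, :] - centers[:, None]) / 15.0) ** 2)
    scales = np.array([10.0, 5.0, 2.0, 1.0, 0.3, 0.1])
    latent = rng.normal(size=(n_samples, 6)) * scales
    X = latent @ peaks + 0.01 * rng.normal(size=(n_samples, n_wavelengths))
    # Assumption: the response depends only on two low-variance peaks,
    # chosen to mimic the slide, where two components predict poorly.
    y = 87.0 + latent[:, 2] + latent[:, 3] + 0.2 * rng.normal(size=n_samples)
    return X, y

def pcr_fit(X, y, n_components):
    """Return (x_mean, y_mean, beta) for PCR with the given number of PCs."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Rows of Vt are the principal component loadings of the centered data.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                  # loadings, (n_wavelengths, k)
    scores = Xc @ V                          # scores, (n_samples, k)
    # Scores are mutually orthogonal, so least squares reduces to a division.
    gamma = (scores.T @ yc) / s[:n_components] ** 2
    return x_mean, y_mean, V @ gamma         # coefficients per wavelength

def pcr_predict(model, X):
    x_mean, y_mean, beta = model
    return y_mean + (X - x_mean) @ beta

def r_squared(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
X, y = make_synthetic_spectra(rng)
r2_by_k = {k: r_squared(y, pcr_predict(pcr_fit(X, y, k), X)) for k in (2, 10)}
for k, r2 in r2_by_k.items():
    print(f"{k:2d} components: in-sample R^2 = {r2:.3f}")
```

These are in-sample R² values, so they can only go up as components are added; they say nothing yet about how many components are actually needed.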
Selecting 2 principal components is not enough
Our first two principal components captured only 85% of the variance. The remaining 15% appears to be important for predicting octane.
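The captured-variance figure comes from the squared singular values of the centered data matrix. The sketch below shows one way to compute the cumulative fraction with NumPy. The data are hypothetical (60 samples, 401 correlated columns built from five random factors), so the printed numbers will not match the 85% quoted for the gasoline spectra.

```python
import numpy as np

def cumulative_explained_variance(X):
    """Cumulative fraction of the total variance in X captured by PCs 1..r."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
    var = s ** 2                              # proportional to PC variances
    return np.cumsum(var) / var.sum()

# Hypothetical data: 5 latent factors mixed into 401 correlated columns.
rng = np.random.default_rng(1)
factors = rng.normal(size=(60, 5)) * np.array([10.0, 5.0, 2.0, 1.0, 0.5])
mixing = rng.normal(size=(5, 401))
X = factors @ mixing + 0.1 * rng.normal(size=(60, 401))

cum = cumulative_explained_variance(X)
print("variance captured by first 2 PCs:", round(float(cum[1]), 3))
print("components needed for 99%:", int(np.searchsorted(cum, 0.99) + 1))
```

Note that this measures variance in the spectra only. A component that carries little spectral variance can still carry much of the information about the response, which is why a variance threshold alone is a weak guide for PCR.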
10 principal components are highly predictive
Ten components work well, but that number was chosen arbitrarily. How many components do we need? Cross-validation can help.
Cross-validating with an increasing number of components
We repeat the PCR using 0 to 10 principal components. Four components seem to be sufficient: adding more gives little further improvement, so this is the most “parsimonious” model.
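A sketch of the cross-validation loop in Python with NumPy, again on hypothetical synthetic spectra rather than the gasoline data. It uses 5-fold cross-validation and picks the smallest number of components whose error is within 10% of the minimum; both the fold count and the 10% tolerance are assumptions made here, since the slides do not state how the choice of four was made.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Return (x_mean, y_mean, beta); n_components=0 is the mean-only model."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    gamma = ((Xc @ V).T @ yc) / s[:n_components] ** 2
    return x_mean, y_mean, V @ gamma

def cv_mse_by_components(X, y, max_components, n_folds, rng):
    """Cross-validated mean squared error for 0..max_components PCs."""
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    sq_err = np.zeros(max_components + 1)
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        for k in range(max_components + 1):
            # Centering and PCA are refit on the training fold only.
            x_mean, y_mean, beta = pcr_fit(X[train], y[train], k)
            pred = y_mean + (X[test_idx] - x_mean) @ beta
            sq_err[k] += np.sum((y[test_idx] - pred) ** 2)
    return sq_err / len(y)

# Hypothetical synthetic spectra: six Gaussian peaks; the response depends
# on the third and fourth peaks only (an assumption for illustration).
rng = np.random.default_rng(2)
grid = np.arange(401)
centers = np.linspace(50, 350, 6)
peaks = np.exp(-0.5 * ((grid[None, :] - centers[:, None]) / 15.0) ** 2)
latent = rng.normal(size=(60, 6)) * np.array([10.0, 5.0, 2.0, 1.0, 0.3, 0.1])
X = latent @ peaks + 0.01 * rng.normal(size=(60, 401))
y = 87.0 + latent[:, 2] + latent[:, 3] + 0.2 * rng.normal(size=60)

cv_mse = cv_mse_by_components(X, y, max_components=10, n_folds=5, rng=rng)
tolerance = 1.10  # assumed rule: accept errors within 10% of the minimum
best_k = int(np.argmax(cv_mse <= tolerance * cv_mse.min()))
for k, mse in enumerate(cv_mse):
    print(f"{k:2d} components: CV MSE = {mse:.3f}")
print("most parsimonious choice under the assumed rule:", best_k)
```

Refitting the PCA inside each fold matters: computing the components once on all samples would let the held-out spectra influence the model and make the error estimate optimistic.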
What is loaded onto the four components?
The loadings of the first four principal components contain only a few “peaks”. These loading curves are easier to interpret than the original NIR spectra.
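A short sketch of how the loadings can be extracted and inspected with NumPy. The data are hypothetical: four Gaussian peaks at made-up positions with decreasing amplitude, so each of the first four loadings should be dominated by a single peak. On real spectra the loadings would normally be plotted against wavelength; here the position of the largest absolute loading is printed instead.

```python
import numpy as np

def pca_loadings(X, n_components):
    """Return the first n_components loading vectors as rows."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_components]

# Hypothetical spectra: four peaks at assumed positions on a 401-point grid.
grid = np.arange(401)
centers = np.array([80, 160, 240, 320])
peaks = np.exp(-0.5 * ((grid[None, :] - centers[:, None]) / 15.0) ** 2)
rng = np.random.default_rng(3)
amplitudes = rng.normal(size=(60, 4)) * np.array([8.0, 4.0, 2.0, 1.0])
X = amplitudes @ peaks + 0.01 * rng.normal(size=(60, 401))

loadings = pca_loadings(X, 4)
for i, row in enumerate(loadings, start=1):
    print(f"PC{i}: largest |loading| at grid index {int(np.argmax(np.abs(row)))}")
```

The sign of each loading vector is arbitrary, which is why the absolute value is used when locating its peak.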