Time series Decomposing a series into meaningful components R.W. Oldford
Time series data - Sunspot activity Recall the measurements of monthly mean relative sunspot numbers from 1749 to 1983 plotted as Monthly sunspot activity 200 Sunspot numbers 100 50 0 1750 1800 1850 1900 1950 Date The principal feature here was the rise and fall of sunspot activity roughly following an eleven year cycle.
Time series data - Sunspot activity Working with the square root of the sunspot numbers, the plot can be coloured according to 11 year blocks: Sunspot activity 15 Mean sqrt monthly sunspots 10 5 0 1750 1800 1850 1900 1950 Time There are 22 different colours (randomized order here) each corresponding to a different 11 year cycle.
Time series data - Sunspot activity And the putative 11 year cycles overlaid to see how well they match: 11 year sequences 15 Mean sqrt monthly sunspots 10 5 0 0 20 40 60 80 100 120 Month number in 11 year sequence One reason we could overlap these is that the sunspot pattern is relatvely consistent, or stationary, over time. It does not seem to be the case, for example, that the number of sunspots is growing (or declining) dramatically over the past few centuries. This is not always the case for time series.
Atmospheric CO 2 concentrations Let’s consider another physical time series, the concentration of carbon dioxide or CO 2 (a “greenhouse gas”) over over the Hawaii Island volcano Mauna Loa. These measurements have been made since at least March 1958. The R monthly time series co2 contains atmospheric concentrations (in parts per million) over Mauna Loa of CO 2 for the years 1959 to 1997. Three months (Feb., March, April, 1964) were missing and have simply had their values interpolated. plot (co2) 360 co2 340 320 1960 1970 1980 1990 Time What happens when you change the aspect ratio? (Try with zoom in RStudio, or with interactive loon plots as shown on the next slide.)
Atmospheric CO 2 concentrations We could again explore this interactively, using loon # loon requires at least R 3.4.1 # install loon from CRAN install.packages("loon") or # follow install instructions at # http://waddella.github.io/loon/install.html library (loon) # First a helper function to get nice dates out of a monthly timeseries getDates <- function (ts) { mapply (FUN= function (x, y) { paste (x, y)}, floor ( time (co2)), month.name)} data <- co2 p <- l_plot (data, title="Mauna Loa Atmospheric CO2", ylabel="concentration (ppm)", xlabel = "Date", linkingGroup="co2", color = "grey20", showScales = TRUE, showGuides = TRUE, itemLabel = getDates (data), showItemLabels=TRUE) # Another way to get the x and y coordinates xy.raw <- xy.coords (data) l_layer_line (p, x= xy.raw $ x, y= xy.raw $ y, color = "grey50", linewidth = 3, index="end", label="connect the dots") Unlike the sunspots data, there are clearly two patterns in the CO 2 data: an overall rising trend, and a shorter pattern repeating within each year .
Atmospheric CO 2 concentrations We could try to capture these patterns by looking at averages: ◮ first over all months for each year ◮ second over all years for each month and then plot these averages over time. # First we need a couple of helper functions # As before we need to get the years from our time-series object # We can use time(...) head ( time (co2)) ## [1] 1959.000 1959.083 1959.167 1959.250 1959.333 1959.417 getYears <- function (ts) { unique ( floor ( time (ts)))} getYears (co2) ## [1] 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 ## [16] 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 ## [31] 1989 1990 1991 1992 1993 1994 1995 1996 1997 # And get the number of the month. # This will actually work more generally using frequency() getMonthNos <- function (ts) {1 :frequency (ts)} getMonthNos (co2) ## [1] 1 2 3 4 5 6 7 8 9 10 11 12
Atmospheric CO 2 concentrations We can now compute the averages: ◮ first over all months for each year ◮ second over all years for each month as follows: getAves <- function (x, by = "year" ){ years <- getYears (x) monthNos <- getMonthNos (x) nyears <- length (years) nmonths <- length (monthNos) if ( ! (by %in% c ("year", "month"))) { stop ("unknown value for by = ", by, "; by must be one of {\"year\", \"month\"}") } if (by == "year"){ aves <- sapply (1 : nyears, FUN= function (i) { mean ( window (x, start = c (years[i], monthNos[1]), end = c (years[i], monthNos[nmonths])))}) aves <- data.frame (aves = aves, row.names = years) } else { aves <- sapply (1 : nmonths, FUN= function (i) { mean ( window (x, start = c (years[1], monthNos[i]), end = c (years[nyears], monthNos[i]), frequency=1))}) aves <- data.frame (aves = aves, row.names = month.abb[monthNos]) } aves }
Atmospheric CO 2 concentrations We now compute the averages: first by year over all months # over months (by year) head ( getAves (co2, by = "year")) ## aves ## 1959 315.8258 ## 1960 316.7475 ## 1961 317.4850 ## 1962 318.2975 ## 1963 318.8325 ## 1964 319.4625 Now compute the averages: second by month over all years # over years (by month) head ( getAves (co2, by = "month")) ## aves ## Jan 336.4308 ## Feb 337.2033 ## Mar 338.0546 ## Apr 339.2944 ## May 339.8821 ## Jun 339.3282
Atmospheric CO 2 concentrations And we can plot them ◮ first by year (over all months) ◮ second by month (over all years) ## over months (by year) plot ( getYears (co2), getAves (co2, by = "year") $ aves, type="l", xlab="Year", ylab="Annual average over all months") ## over years (by month) plot ( getAves (co2, by = "month") $ aves, type="l", xlab="Month", xaxt="n", ylab="Monthly average over all years") axis (1, at= getMonthNos (co2), labels=month.abb)
Atmospheric CO 2 concentrations And we can plot them ◮ first by year(over all months) 360 Annual average over all months 350 340 330 320 1960 1970 1980 1990 Year We have smoothed out all of the bumps given each season to reveal the overall trend in the data.
Atmospheric CO 2 concentrations And we can plot them ◮ first by month (over all years) 340 339 Monthly average over all years 338 337 336 335 334 January February March April May June July August September October November December Month We have smoothed out the data over all years to show the average seasonal pattern.
Atmospheric CO 2 concentrations In some sense, we have decomposed the series into two pieces; an overall trend (across years), and a seasonal (within year) pattern. We could plot these together with the data as follows # Get the number of years nyears <- length ( getYears (co2)) # Reduce white space between plots savePar <- par (mfrow= c (3,1), mar= c (2, 4, 1.5, 0.5), oma= rep (0.1,4)) # Original data plot (co2, main= "Atmospheric CO2 (in ppm)") # The trend (annual averages) plot ( getYears (co2), getAves (co2) $ aves, type="l", ylab="Annual average") # The seasonal pattern (month average over all years) # We repeat the pattern here so that every season is shown (identically) seasonalAves <- getAves (co2, by = "month") $ aves plot ( time (co2), rep (seasonalAves, nyears), type="l", ylab="Monthly averages") par (savePar)
Atmospheric CO 2 concentrations Atmospheric CO2 (in ppm) 360 co2 340 320 1960 1970 1980 1990 360 Annual average 350 340 330 320 1960 1970 1980 1990 340 Monthly averages 338 336 334 1960 1970 1980 1990
Atmospheric CO 2 concentrations There exist functions in R that will produce this decomposition of a time series for you. # The default decomposition is an additive one # (multiplicative also available) co2_decomp <- decompose (co2) # having structure str (co2_decomp) # which can be plotted as plot (co2_decomp)
Atmospheric CO 2 concentrations Decomposition of additive time series 360 observed 340 320 360 trend 340 320 3 2 seasonal 1 −1 −3 0.5 random 0.0 −0.5 1960 1970 1980 1990 Time The three bottom pieces sum to the top one. Notice the y -axis labels on each and what this says about the range of possible values for each component.
Atmospheric CO 2 concentrations The visual decomposition is itself a model. Namely, that for time t Y ( t ) = Trend ( t ) + Seasonality ( t ) + Random ( t ) ◮ the time series is visually decomposed into the sum of several smooth components and one “random” component. ◮ each smooth component is interpretable and captures an important feature of the series ◮ the “random” component consists set of residuals from the fitted model; we expect to see no pattern in the residuals Questions: 1. The smooths are simple averages (over months within each year; over years within each month). ◮ As we did with density estimates, can we do better than simple averages? 2. This visual decomposition seems like a good idea: ◮ each piece summarizes important visual features of the data ◮ can it be used for more than time series? If so, how?
Summarizing a y on x relationship Relating y to x 1.0 0.5 0.0 y −0.5 −1.0 −1.5 0.0 0.2 0.4 0.6 0.8 1.0 x Often model the values y i as y i = µ ( x i ) + r i where µ ( x ) is a simple summary of the y i s at x = x i
Recommend
More recommend