example exploration

Example exploration: Old Faithful, R.W. Oldford



  1. Example exploration Old Faithful R.W. Oldford

  2. Old Faithful

     In Yellowstone National Park, Wyoming, USA, there is a famous geyser called “Old Faithful” which erupts with some regularity. The physical model is thought to be something like the following illustration (from Rinehart (1969), p. 572, via Azzalini and Bowman (1990)):

  3. Old Faithful: stages

     Azzalini and Bowman (1990) describe the stages as:

     Stage 1. “The tube is full of water which is heated by the surrounding rocks. The water is heated above the normal boiling temperature because of increased pressure. This is due to the mass of water which is on top: the deeper the water the higher the temperature required for boiling. Moreover, ‘whereas the water in the tube is superheated with respect to the ambient boiling point at the mouth of the geyser, the water temperature at depth is far below the boiling point curve that must be applied to a vertical column of water’.”

     Stage 2. “When the top water reaches the boiling temperature, it becomes steam and moves towards the surface. The pressure at the bottom then drops rapidly to the normal level and, by an induction effect, the bottom water rapidly becomes steam. This cascading mechanism is repeated several times: as water is converted into steam, the pressure on lower water is decreased, causing the production of more steam and triggering the eruption.”

     Stage 3a. “‘If at the time of cascading the temperature in the lower regions is lower than might be expected, cascading stops short of the bottom and the play is short.’”

     Stage 3b. “Alternatively, ‘when the temperature is comparatively high at these depths, cascading works itself down much farther and the play is long’.”

     Stage 4. “The geyser tube is completely or partly empty, ready to be filled with new water.”

     “We do not discuss geological reasons for the fact that sometimes the cascading effect works down to the bottom of the tube while at other times it stops earlier. We simply note the phenomenon and discuss its consequences. Stages 3a and 3b are associated with short and long waiting times for the next eruption. In stage 3a, the system starts a new cycle partially filled with hot water so that the following heating time is shorter; at the new eruption the entire tube will be emptied, since part of the water had already been heated in the previous cycle.”

  4. Old Faithful: Data collection

     For each eruption, the waiting time w between its beginning and the beginning of the previous eruption is recorded to the nearest minute, and the duration d of the eruption is recorded to fractions of a minute. Collected from August 1st until August 15th, 1985, the data record the 299 successive eruptions which occurred during this time. Though R. A. Hutchinson, the park geologist, collected similar data sets, it is not clear from the source whether or not this data set is one of them. Measurements had to be taken through the night, and duration times for the night-time eruptions were recorded only as being one of short, medium, or long (encoded here as 2, 3, or 4 minutes, respectively).

     Questions

     1. What might the scientific investigators have in mind for a target population/process?
     2. What might be the study population/process available to the scientific investigators? Why might there be study error?
     3. What is the sample in this case? Why might there be sample error?
     4. Imagine the process for selecting a sample. How might this process produce sampling bias?
     5. Imagine the measuring process. What problem(s) do you think might be associated with the measuring process? How might it manifest itself in terms of measuring bias and/or variability?
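The coding of some durations as exactly 2, 3, or 4 minutes can be checked directly in the data. The following is a minimal sketch, assuming the MASS package is installed and that the coded values are stored as (nearly) exact whole numbers. The names `coded_values` and `is_coded` are introduced here for illustration only. Note that a duration timed to fractions of a minute could also happen to equal a whole number, so the counts only give an upper bound on how many eruptions were coded.

```r
# Load the Old Faithful data shipped with the MASS package
data(geyser, package = "MASS")

# Assumption: the short/medium/long codes are stored as 2, 3 and 4 minutes
coded_values <- c(2, 3, 4)

# Round to guard against tiny floating-point differences before matching
is_coded <- round(geyser$duration, 6) %in% coded_values

# How many durations sit exactly on each coded value
table(geyser$duration[is_coded])

# Proportion of all eruptions whose duration equals a coded value
mean(is_coded)
```

A large proportion here would be relevant to question 5, since coded durations carry less information than timed ones.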

  5. Old Faithful: The data

     The data are available to us as a data.frame in R via the package MASS.

         data(geyser, package = "MASS")
         nrow(geyser)
         ## [1] 299
         names(geyser)
         ## [1] "waiting"  "duration"
         geyser[1:2, ]
         ##   waiting duration
         ## 1      80 4.016667
         ## 2      71 2.150000
         summary(geyser)
         ##     waiting          duration
         ##  Min.   : 43.00   Min.   :0.8333
         ##  1st Qu.: 59.00   1st Qu.:2.0000
         ##  Median : 76.00   Median :4.0000
         ##  Mean   : 72.31   Mean   :3.4608
         ##  3rd Qu.: 83.00   3rd Qu.:4.3833
         ##  Max.   :108.00   Max.   :5.4500

  6. Old Faithful: The data exploration

     We could look at each measurement to see how the values appear to be distributed. For example, simply sorted:

         # the sorted values
         plot(sort(geyser$duration),
              xlab = "Index: smallest to largest",
              ylab = "duration")

     [Figure: sorted duration values; x-axis "Index: smallest to largest", y-axis "duration".]

     What is this plot called? What do you learn? Repeat the above for the waiting times.
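As a possible starting point for repeating the sorted-value plot with the waiting times, here is a minimal sketch that assumes the MASS package is installed. The variable name `w_sorted` is introduced here for illustration.

```r
# Load the Old Faithful data shipped with the MASS package
data(geyser, package = "MASS")

# Sort the waiting times from smallest to largest
w_sorted <- sort(geyser$waiting)

# Plot the sorted values against their rank
plot(w_sorted,
     xlab = "Index: smallest to largest",
     ylab = "waiting")
```

Because waiting times are recorded to the nearest minute, this plot may show short horizontal runs of tied values.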

  7. Old Faithful: The data exploration

     We could look at each measurement to see how the values appear to be distributed. For example, by a histogram:

         hist(geyser$duration, xlab = "duration",
              col = "grey", main = "Old Faithful")

     [Figure: histogram titled "Old Faithful"; x-axis "duration", y-axis "Frequency".]

     What do you learn? How does this connect with the previous plot? Repeat the above for the waiting times.

  8. Old Faithful: The data exploration

     We could look at each measurement to see how the values appear to be distributed. For example, by a density estimate:

         # the density
         plot(density(geyser$duration),
              xlab = "duration", main = "Old Faithful")
         # fill it
         polygon(density(geyser$duration), col = "grey")

     [Figure: filled density estimate titled "Old Faithful"; x-axis "duration", y-axis "Density".]

     What do you learn? How does this connect with the previous plot?
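The appearance of a density estimate depends on its bandwidth, so it can be worth comparing more than one. The following is a minimal sketch, assuming the MASS package is installed. The choice of half the default bandwidth (`adjust = 0.5`) is arbitrary and only for illustration, as are the names `dens_default` and `dens_narrow`.

```r
# Load the Old Faithful data shipped with the MASS package
data(geyser, package = "MASS")
d <- geyser$duration

# Density estimates with the default bandwidth and with half of it
dens_default <- density(d)
dens_narrow  <- density(d, adjust = 0.5)

# Draw the narrower-bandwidth estimate first, then overlay the default one
plot(dens_narrow, lty = 2,
     xlab = "duration", main = "Old Faithful")
lines(dens_default, lty = 1)

# Mark the individual observations along the horizontal axis
rug(d)

legend("topleft", bty = "n", lty = c(1, 2),
       legend = c("default bandwidth", "half the default"))
```

The rug makes the observed values visible beneath the curves, which may help when relating the density estimate to the histogram.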

  9. Old Faithful: The data exploration

     For eruption i,
     ◮ d_i denotes its duration in minutes and
     ◮ w_i the time between its beginning and the beginning of the previous eruption.

     Questions

     1. Why might a plot of the pairs (w_i, d_i) be of interest? What do you notice?

            plot(geyser)

        [Figure: scatterplot of the data; x-axis "waiting", y-axis "duration".]

     2. What other pairs might be of interest?

  10. Old Faithful: The data exploration

     3. Why might each of the following be of interest?
        ◮ (d_i, w_{i+1})
        ◮ (d_{i-1}, d_i)
        ◮ (w_{i-1}, w_i)
        ◮ (i, w_i)
        ◮ (i, d_i)
     4. How do we plot each of these?
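One way to approach question 4 is to line up each eruption with the one that follows it by dropping the last element of one vector and the first element of the other. The following is a minimal sketch, assuming the MASS package is installed. The data frame `lagged` and its column names are hypothetical and introduced here only for illustration.

```r
# Load the Old Faithful data shipped with the MASS package
data(geyser, package = "MASS")
n <- nrow(geyser)

# Row i holds eruption i together with eruption i + 1, for i = 1, ..., n - 1
lagged <- data.frame(
  i      = seq_len(n - 1),
  d_now  = geyser$duration[-n],  # d_i
  w_now  = geyser$waiting[-n],   # w_i
  d_next = geyser$duration[-1],  # d_{i+1}
  w_next = geyser$waiting[-1]    # w_{i+1}
)

# Any of the lagged pairs can then be plotted from this one data frame
plot(lagged$d_now, lagged$w_next,
     xlab = "duration", ylab = "following waiting time",
     main = "Old Faithful")
```

Building the pairs once keeps the indexing in a single place, which may reduce the chance of mixing up which vector is lagged.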

  11. Old Faithful: The data exploration

     (d_i, w_{i+1})

         n <- nrow(geyser)
         with(geyser,
              plot(x = duration[-n], y = waiting[-1],
                   xlab = "duration", ylab = "following waiting time",
                   main = "Old Faithful")
         )

     [Figure: scatterplot titled "Old Faithful"; x-axis "duration", y-axis "following waiting time".]

     What do you learn here?
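The quoted description associates short and long eruptions (stages 3a and 3b) with short and long waits for the next eruption. A rough way to look at this in the pairs (d_i, w_{i+1}) is to split eruptions into two groups by duration and compare the following waiting times. This is a minimal sketch, assuming the MASS package is installed. The 3-minute cut-off and the name `eruption_type` are assumptions made here for illustration, not something taken from the source.

```r
# Load the Old Faithful data shipped with the MASS package
data(geyser, package = "MASS")
n <- nrow(geyser)

d_now  <- geyser$duration[-n]  # d_i
w_next <- geyser$waiting[-1]   # w_{i+1}

# Assumption: call an eruption "short" if it lasted under 3 minutes
eruption_type <- ifelse(d_now < 3, "short (< 3 min)", "long (>= 3 min)")

# Side-by-side boxplots of the following waiting time for each group
boxplot(w_next ~ eruption_type,
        xlab = "duration of eruption i",
        ylab = "following waiting time",
        main = "Old Faithful")

# Median following waiting time within each group
tapply(w_next, eruption_type, median)
```

A different cut-off would change the group membership, so it may be worth trying a few values before drawing conclusions.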

  12. Old Faithful: The data exploration

     (d_{i-1}, d_i)

         n <- nrow(geyser)
         with(geyser,
              plot(x = duration[-n], y = duration[-1],
                   xlab = "duration", ylab = "following duration",
                   main = "Old Faithful")
         )

     [Figure: scatterplot titled "Old Faithful"; x-axis "duration", y-axis "following duration".]

     What do you learn here?

  13. Old Faithful: The data exploration

     (w_{i-1}, w_i)

         n <- nrow(geyser)
         # x is the waiting time w_{i-1}; y is the following waiting time w_i
         with(geyser,
              plot(x = waiting[-n], y = waiting[-1],
                   xlab = "waiting time", ylab = "following waiting time",
                   main = "Old Faithful")
         )

     [Figure: scatterplot titled "Old Faithful"; x-axis "waiting time", y-axis "following waiting time".]

     What do you learn here?

  14. Old Faithful: The data exploration

     (i, w_i)

         with(geyser,
              plot(x = waiting, type = "l",
                   xlab = "index", ylab = "waiting time",
                   main = "Old Faithful")
         )

     [Figure: line plot titled "Old Faithful"; x-axis "index", y-axis "waiting time".]

     What do you learn here?

  15. Old Faithful: The data exploration

     (i, d_i)

         with(geyser,
              plot(x = duration, type = "b",
                   xlab = "index", ylab = "duration",
                   main = "Old Faithful")
         )

     [Figure: plot of points joined by lines, titled "Old Faithful"; x-axis "index", y-axis "duration".]

     Why was type “b” used? What do you learn here?

  16. Old Faithful: The data exploration

     (i, d_i), points only

         with(geyser,
              plot(x = duration, type = "p",
                   xlab = "index", ylab = "duration",
                   main = "Old Faithful")
         )

     [Figure: point plot titled "Old Faithful"; x-axis "index", y-axis "duration".]

     What do you learn here?

  17. Old Faithful: The data exploration

     We could add a “smooth” function estimating the mean response to any of the above plots. For example, (d_{i-1}, d_i):

         n <- nrow(geyser)
         x <- geyser$duration[-n]
         y <- geyser$duration[-1]
         smoothfit <- loess(y ~ x)
         xvals <- extendrange(x)
         xvals <- seq(min(xvals), max(xvals), length.out = 400)
         predictions <- predict(smoothfit, newdata = data.frame(x = xvals))
         plot(x, y, xlab = "duration", ylab = "following duration",
              main = "Old Faithful")
         lines(xvals, predictions, col = "steelblue")

     [Figure: scatterplot with smooth curve, titled "Old Faithful"; x-axis "duration", y-axis "following duration".]
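The smooth produced by `loess` depends on its `span` argument, which controls how much of the data is used in each local fit (the default is 0.75). The following is a minimal sketch comparing a few spans on the (d_{i-1}, d_i) pairs, assuming the MASS package is installed. The particular span values, colours, and the names `spans` and `fits_pred` are arbitrary choices made here for illustration.

```r
# Load the Old Faithful data shipped with the MASS package
data(geyser, package = "MASS")
n <- nrow(geyser)
x <- geyser$duration[-n]  # d_{i-1}
y <- geyser$duration[-1]  # d_i

# Evaluate each smooth on a grid covering the observed range of x
xvals <- seq(min(x), max(x), length.out = 400)

# Assumption: three illustrative spans, including the default 0.75
spans <- c(0.5, 0.75, 1)
cols  <- c("firebrick", "steelblue", "darkgreen")

plot(x, y, xlab = "duration", ylab = "following duration",
     main = "Old Faithful")

# Fit one loess smooth per span and overlay its predictions
fits_pred <- vector("list", length(spans))
for (k in seq_along(spans)) {
  fit <- loess(y ~ x, span = spans[k])
  fits_pred[[k]] <- predict(fit, newdata = data.frame(x = xvals))
  lines(xvals, fits_pred[[k]], col = cols[k])
}

legend("bottomleft", bty = "n", lty = 1, col = cols,
       legend = paste("span =", spans))
```

Smaller spans follow local features more closely, while larger spans give a smoother curve, so comparing them may show how sensitive any apparent pattern is to this choice.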

  18. Old Faithful: The data exploration

     Now do the same for the waiting times: add a “smooth” function estimating the mean response.

         n <- nrow(geyser)
         x <- geyser$waiting[-n]
         y <- geyser$waiting[-1]
         smoothfit <- loess(y ~ x)
         xvals <- extendrange(x)
         xvals <- seq(min(xvals), max(xvals), length.out = 400)
         predictions <- predict(smoothfit, newdata = data.frame(x = xvals))
         # axis labels name the waiting times being plotted
         plot(x, y, xlab = "waiting time", ylab = "following waiting time",
              main = "Old Faithful")
         lines(xvals, predictions, col = "steelblue")

     [Figure: scatterplot with smooth curve, titled "Old Faithful"; x-axis "waiting time", y-axis "following waiting time".]
