CSU Hayward, Statistics Department
Below we propose seven models for the occurrence of rain on a hypothetical island. All models deal with rain on a day-to-day basis: a day is either called rainy (R) or not (N).
A. Independently of the weather on other days, the probability of rain on any one day is 1/7.
B. Every Sunday is rainy, but it never rains on any other day.
C. It can rain only on Sundays and Mondays; independently of the weather on other days, the probability of rain on any Sunday or Monday is 1/2.
D. Rain is possible on any day of the week. If today is rainy the probability that tomorrow will be rainy is 0.4; if today is not rainy, the probability that tomorrow will be rainy is 0.1.
E. Rainy periods are always exactly two days long. If it is not raining today, the probability that it will rain tomorrow is 1/12.
F. Rainy periods are always at least two days long. If it rained yesterday and it is raining today, the probability that it will rain tomorrow is 1/2. If it is not raining today, the probability that it will rain tomorrow is 1/18.
G. The rain gods of this island have a sacred circle with seven positions numbered 0, 1, ..., 6 in order around the circle, and with 6 adjacent to 0. Each day one of them tosses a fair coin. If the result is Heads, a marker on the circle is moved clockwise one position; and counterclockwise one position if Tails. The only clue humans have about the position of the marker is that it rains when, and only when, the marker is on 1.
Part 1. In each model, there is a way to make sense of the idea that over the long run it rains on 1/7 of the days. In each case, give the appropriate interpretation of this idea. While it is possible to interpret some of the models in terms of Markov chains, try to defend the idea of rain 1/7 of the time by using arguments involving algebra, expectations, limits, etc. that do not require explicit use of Markov models. [Hint: For Models E and F you may want to consider patterns of weather for overlapping pairs of days. For example, the sequence NNRRNNN would yield pairs NN, NR, RR, RN, NN, NN.]
Part 2. For each model let Xn, n = 1, 2, 3, ..., be a stochastic process, where Xn is 0 or 1 depending on whether the nth day is observed to be N or R. In which cases can this X-process be modeled as a homogeneous Markov chain? If it can be modeled as a Markov chain, give the appropriate state space and transition matrix. If not, show that the rules for a homogeneous Markov chain are violated.
Part 3. The 500 observations immediately following Part 4 were generated according to one of the seven models. (Rainy days are indicated by 1 and non-rainy days by 0. Read down the columns in sequence. For example, the first four days are non-rainy, followed by a rainy day; there are no rainy days among the last 20.) Can you determine which model is consistent with these data?
Part 4. (Intermediate) Give a continuous-time Markov process with states 0 (for N) and 1 (for R) such that it rains 1/7 of the time (continuous time).
0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 1 0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 1 0 0 0 0 1 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 |
Note: You should be able to move these data from your browser to an empty Minitab worksheet: In your browser select the data (with the mouse) and "copy" the selection (CTRL+C). Then move to row 1, column 1 of the Minitab data window; paste the data (CTRL+V), selecting the "spaces as delimiters" option when prompted; and finally use Minitab's MANIP menu to "stack" the data into a single column, say c26. For other statistical packages you may be able to use a similar procedure, or you may need to make a text file of the data and then read it into your package.
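For Model A, we have independent, identically distributed Bernoulli trials with probability 1/7 of rain on each trial, so by the law of large numbers the proportion of rainy days among the first n days converges to 1/7 as n increases.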
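Model B is deterministic. If day 1 of the process is a Sunday and we express n as n = 7q + r, where q is the number of complete weeks and 0 ≤ r ≤ 6, then the first n days include q rainy days if r = 0 and q + 1 rainy days otherwise. Either way, the proportion of rainy days converges to 1/7 as n increases.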
For Model C consider a sequence of days that includes only Sundays and Mondays. Then the analysis is as for Model A with 1/7 replaced by 1/2. Hence, the long-run proportion of rainy days converges to 1 per week or 1/7 of all days.
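For Model D suppose there is a steady-state probability p of rain on any one day so that the unconditional probability of rain today and tomorrow is the same. Either it rains today (probability p) or it doesn't (probability 1 − p). Conditioning on today's weather, the probability of rain tomorrow is p = 0.4p + 0.1(1 − p), and solving this equation gives p = 1/7.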
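For Model E consider sequences of two days with weather patterns RR, NR, RN, and NN. For example, no rain yesterday and rain today would be an instance of NR. Denote the probabilities of these sequences as a, b, c, and d, respectively, and let p be the long-run probability of rain on any one day. Because every rainy period is exactly two days long, each NR pair is followed by exactly one RR pair and then one RN pair, so a = b = c. Because the second day of a pair is rainy only for RR and NR, p = a + b = 2b. Also,
1/12 = P(R tomorrow | N today)
     = P(N today and R tomorrow) / P(N today)
     = b/(1 − p),
so that 12b = 1 − 2b and b = 1/14.
Because these four probabilities must add to 1 the solution is a = b = c = 1/14 and d = 11/14, and hence p = 2b = 1/7.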
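The analysis for Model F is similar, again with pair probabilities a, b, c, and d for RR, NR, RN, and NN and with p = a + b. A rainy period can end only after an RR pair, and it then ends with probability 1/2, so c = a/2. Every rainy period has exactly one beginning (NR) and one end (RN), so b = c, a = 2b, and p = 3b. Finally, 1/18 = P(R tomorrow | N today) = b/(1 − p),
so that 18b = 1 − 3b and b = 1/21. The solution is a = 2/21, b = c = 1/21, and d = 17/21, and hence p = 3b = 1/7.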
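The overlapping-pairs bookkeeping suggested in the Part 1 hint can be sketched in Python as follows. The function name `overlapping_pairs` is hypothetical, introduced here only for illustration; the example string is the one given in the hint.

```python
from collections import Counter


def overlapping_pairs(weather):
    """Return the overlapping two-day patterns in a string of N/R days."""
    return [weather[i:i + 2] for i in range(len(weather) - 1)]


# The example from the hint: NNRRNNN yields NN, NR, RR, RN, NN, NN.
pairs = overlapping_pairs("NNRRNNN")
pair_counts = Counter(pairs)

print(pairs)
print(pair_counts)
```

Applied to a long simulated sequence, the relative frequencies in `pair_counts` would serve as estimates of the four pair probabilities used in the analysis of Models E and F.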
In Model G the symmetry of the "circle of the rain gods" suggests intuitively that over the long run the marker will be on 1 as often as on any other number and so it will rain 1/7 of the time. (Incidentally, notice that this model does not permit rain on two successive days.)
A slight variation of this model would be a circle with only 6 numbers
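The complete independence in Model A makes a Markov model unnecessary, but it is possible to construct a trivial one that works. For example, let the state space be {N, R} and let both rows of the transition matrix be (6/7, 1/7), so that the probability of rain tomorrow does not depend on today's state.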
Because Model B is deterministic any probability model will be trivial. One trivial Markov chain that does the job has state space {Sun, Mon, ..., Sat} and transition matrix
| 0 1 0 0 0 0 0 |
| 0 0 1 0 0 0 0 |
| 0 0 0 1 0 0 0 |
P = | 0 0 0 0 1 0 0 |
| 0 0 0 0 0 1 0 |
| 0 0 0 0 0 0 1 |
| 1 0 0 0 0 0 0 |
This chain is periodic, of period 7. Obviously, its stationary distribution is uniform. (Why "obviously"?) Does it have a limiting distribution?
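As a sketch rather than part of the original solution, the following Python fragment builds the cyclic matrix above and examines its powers and the uniform distribution. The helper names `mat_mul` and `mat_pow` are hypothetical, and states are simply numbered 0 through 6 in the row order of the matrix.

```python
from fractions import Fraction

SIZE = 7


def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(A, power):
    """Raise a square matrix to a nonnegative integer power."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, A)
    return result


# Each state moves to the next one with probability 1; the last returns to the first.
P = [[int(j == (i + 1) % SIZE) for j in range(SIZE)] for i in range(SIZE)]
identity = [[int(i == j) for j in range(SIZE)] for i in range(SIZE)]

# One step applied to the uniform distribution.
uniform = [Fraction(1, SIZE)] * SIZE
uniform_after_step = [sum(uniform[i] * P[i][j] for i in range(SIZE))
                      for j in range(SIZE)]

print(mat_pow(P, 7) == identity)      # the powers of P repeat with period 7
print(uniform_after_step == uniform)  # the uniform distribution is unchanged
```

Comparing `mat_pow(P, n)` for several values of n is one way to explore the question about a limiting distribution.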
Model C. If we pay attention only to the days on which rain can occur, Sunday and Monday, a satisfactory model has a two-element state space {N, R} and a transition matrix in which every entry is 1/2.
A model that includes all of the days needs to have state space {RSun, NSun, RMon, NMon, Tues, Wed, Thurs, Fri, Sat} and transition matrix
| 0 0 1/2 1/2 0 0 0 0 0 |
| 0 0 1/2 1/2 0 0 0 0 0 |
| 0 0 0 0 1 0 0 0 0 |
| 0 0 0 0 1 0 0 0 0 |
P = | 0 0 0 0 0 1 0 0 0 |
| 0 0 0 0 0 0 1 0 0 |
| 0 0 0 0 0 0 0 1 0 |
| 0 0 0 0 0 0 0 0 1 |
|1/2 1/2 0 0 0 0 0 0 0 |
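This chain has period 7 with subclasses {RSun, NSun}, {RMon, NMon}, {Tues}, ..., {Sat}. It has the stationary distribution (1/14, 1/14, 1/14, 1/14, 1/7, 1/7, 1/7, 1/7, 1/7), which puts total probability 2/14 = 1/7 on the two rainy states RSun and RMon.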
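Model D is quite simply represented as a Markov chain with state space {N, R} and a transition matrix P whose rows are (0.9, 0.1) for N and (0.6, 0.4) for R. Its stationary (limiting) distribution is (6/7, 1/7).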
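A minimal Python sketch of how the probability of rain settles down under Model D, using only the two transition probabilities stated in the model (0.4 after a rainy day, 0.1 after a non-rainy day). The function name `rain_probability` and its parameters are hypothetical, chosen for illustration.

```python
from fractions import Fraction

P_RAIN_AFTER_RAIN = Fraction(4, 10)  # P(R tomorrow | R today) in Model D
P_RAIN_AFTER_DRY = Fraction(1, 10)   # P(R tomorrow | N today) in Model D


def rain_probability(days, p_start=Fraction(0)):
    """Probability of rain `days` steps ahead, starting from P(rain) = p_start."""
    p = p_start
    for _ in range(days):
        # Condition on today's weather to get tomorrow's probability of rain.
        p = P_RAIN_AFTER_RAIN * p + P_RAIN_AFTER_DRY * (1 - p)
    return p


for n in (1, 2, 5, 10, 30):
    print(n, float(rain_probability(n)))
```

Starting from a rainy day instead (`p_start=Fraction(1)`) leads to the same limit, which is what "limiting distribution" means here.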
Because tomorrow's behavior can depend on the weather both yesterday and today, Model E cannot be a Markov chain with state space {N, R}. Because the dependence never goes back more than 2 days it can, however, be described as a Markov chain with state space {NN, NR, RN, RR}, where each state records the weather yesterday and today, and transition matrix
NN NR RN RR
NN | 11/12 1/12 0 0 |
P = NR | 0 0 0 1 |
RN | 11/12 1/12 0 0 |
RR | 0 0 1 0 |
This is an ergodic chain with stationary (limiting) distribution (11/14, 1/14, 1/14, 1/14) on the states NN, NR, RN, and RR, respectively.
Model F has the same state space as Model E, with the P-matrix changed slightly to accommodate the possibility of rainy periods extending beyond 2 days:
NN NR RN RR
NN | 17/18 1/18 0 0 |
P = NR | 0 0 0 1 |
RN | 17/18 1/18 0 0 |
RR | 0 0 1/2 1/2 |
This is also an ergodic chain. Its stationary (limiting) distribution is (17/21, 1/21, 1/21, 2/21) on the states NN, NR, RN, and RR, respectively.
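The stationary distributions of the two pair-state chains can be checked with exact arithmetic. The following Python sketch solves the balance equations together with the normalization condition by Gauss-Jordan elimination. The function name `stationary` is hypothetical, and the matrices are copied from the P-matrices shown for Models E and F, with states in the order NN, NR, RN, RR.

```python
from fractions import Fraction as F


def stationary(P):
    """Solve pi P = pi with sum(pi) = 1 exactly (assumes a unique solution)."""
    n = len(P)
    # Balance equations for columns 0..n-2, written as augmented rows.
    A = [[F(P[i][j]) - (1 if i == j else 0) for i in range(n)] + [F(0)]
         for j in range(n - 1)]
    # The remaining equation is the normalization sum(pi) = 1.
    A.append([F(1)] * n + [F(1)])
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


P_E = [[F(11, 12), F(1, 12), 0, 0],
       [0, 0, 0, 1],
       [F(11, 12), F(1, 12), 0, 0],
       [0, 0, 1, 0]]
P_F = [[F(17, 18), F(1, 18), 0, 0],
       [0, 0, 0, 1],
       [F(17, 18), F(1, 18), 0, 0],
       [0, 0, F(1, 2), F(1, 2)]]

pi_E = stationary(P_E)
pi_F = stationary(P_F)
# Today is rainy exactly in the states NR and RR (indices 1 and 3).
print(pi_E, pi_E[1] + pi_E[3])
print(pi_F, pi_F[1] + pi_F[3])
```

In both cases the probability that today is rainy is the sum of the NR and RR entries.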
In Model G the "circle of the rain gods" is a symmetric random walk on a circle with state space {0, 1, ..., 6} and transition matrix
| 0 1/2 0 0 0 0 1/2|
|1/2 0 1/2 0 0 0 0 |
| 0 1/2 0 1/2 0 0 0 |
P = | 0 0 1/2 0 1/2 0 0 |
| 0 0 0 1/2 0 1/2 0 |
| 0 0 0 0 1/2 0 1/2|
|1/2 0 0 0 0 1/2 0 |
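Because this matrix is doubly stochastic the stationary distribution is uniform, with probability 1/7 on each of the seven states, including the rainy state 1.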
However, the random walk on a circle is not the observed process. The observed process Xn takes only the values 0 (N) and 1 (R). The
|
|
The Markov property would require that these two probabilities be equal. However, the condition in the first probability implies that the random walk on the circle must be in either state 3 or state 6 at stage 2, so that it cannot possibly return to 1 at stage 3. On the other hand, the condition in the second probability implies that the random walk must be in either state 2 or 0 at stage 2, so that (either way) it has probability 1/2 of returning to 1.
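The contrast in this argument can be checked by exact enumeration. The Python sketch below computes the probability of rain on the next day given a short history of observed days. Two things are assumptions made for illustration and are not taken from the text: the marker is taken to start from the uniform (stationary) distribution, and the histories used are (R, N) and (R, N, N), which share the same present (N) but differ in the past. The function names are hypothetical.

```python
from fractions import Fraction

N_POS = 7      # positions 0..6 on the circle
RAIN_POS = 1   # it rains exactly when the marker is on 1


def step(weights):
    """Advance unnormalized position weights by one fair-coin move."""
    new = [Fraction(0)] * N_POS
    for pos, w in enumerate(weights):
        new[(pos + 1) % N_POS] += w / 2
        new[(pos - 1) % N_POS] += w / 2
    return new


def keep_consistent(weights, x):
    """Zero out positions that contradict the observation x (1 = rain, 0 = no rain)."""
    return [w if (pos == RAIN_POS) == (x == 1) else Fraction(0)
            for pos, w in enumerate(weights)]


def prob_rain_next(observed):
    """P(rain on the next day | observed 0/1 values on consecutive days)."""
    # Assumption: the marker starts from the uniform distribution.
    weights = keep_consistent([Fraction(1, N_POS)] * N_POS, observed[0])
    for x in observed[1:]:
        weights = keep_consistent(step(weights), x)
    return step(weights)[RAIN_POS] / sum(weights)


print(prob_rain_next((1, 0)))     # rain, then no rain
print(prob_rain_next((1, 0, 0)))  # rain, then two days without rain
```

If the observed process were a Markov chain, the probability of rain after a non-rainy day could not depend on what happened before that day.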
The
A casual inspection of the data eliminates several of the models:
That leaves Models A and D as contenders. We cut the data from our browser and put them into c26 of a Minitab worksheet (making sure to preserve the proper time sequence) to do some statistical analysis. First we checked to see what percentage of the days are rainy (1s).
MTB > tally c26;
SUBC> percent.
C26 PERCENT
0 85.60
1 14.40
From this we see that the proportion of 1s is not far from the 1/7 expected under either model.
The key issue is now whether the observations are independent as in Model A or Markov dependent as in Model D. A "run" is a sequence of identical observations. For example, in the first column of the data there are four complete runs (two of them consisting of single rainy days) and a fifth run (of non-rainy days) has started. If the observations were independent, the expected number of runs would be about 124, based on 72 rainy days and 428 non-rainy days. [Most books on nonparametric statistics and mathematical statistics discuss the exact distribution of runs. Here is a very crude intuitive "computation." If all 72 of the rainy days occurred in isolation, then there would be 72 runs of 1s separated by 71, 72, or 73 runs of 0s, for a total of 143, 144, or 145 runs depending on how the 1s were dispersed among the 0s. But we can expect about 1/7 of the 1s to be followed by another 1, so we can expect only about 62 runs of 1s interspersed with about an equal number of runs of 0s. (In this crude approximation we ignore the less-likely possibility of three or more rainy days in a row.)] The following Minitab printout shows the results of a formal runs test. Clearly, the observed number of runs, 95, is (very highly) significantly smaller than the expected number. Instead of independent alternation of 0s and 1s we have a "clumping" that very strongly indicates a dependent structure.
MTB > runs c26
C26
K = 0.1440
THE OBSERVED NO. OF RUNS = 95
THE EXPECTED NO. OF RUNS = 124.2640
72 OBSERVATIONS ABOVE K 428 BELOW
THE TEST IS SIGNIFICANT AT 0.0000
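The expected number of runs in the printout can be reproduced from the two counts alone. The Python sketch below uses the usual mean and variance of the number of runs under independence, together with a normal approximation for the significance level. Whether Minitab uses exactly this approximation is an assumption here, and the function name `runs_test` is hypothetical.

```python
from math import erfc, sqrt


def runs_test(n_ones, n_zeros, observed_runs):
    """Normal-approximation runs test from the counts of 1s, 0s, and observed runs."""
    n = n_ones + n_zeros
    product = 2 * n_ones * n_zeros
    expected = product / n + 1
    variance = product * (product - n) / (n ** 2 * (n - 1))
    z = (observed_runs - expected) / sqrt(variance)
    p_two_sided = erfc(abs(z) / sqrt(2))  # P(|Z| > |z|) for standard normal Z
    return expected, z, p_two_sided


# Counts taken from the Minitab printout: 72 ones, 428 zeros, 95 observed runs.
expected_runs, z_score, p_value = runs_test(72, 428, 95)
print(expected_runs, z_score, p_value)
```

A strongly negative `z_score` corresponds to fewer runs than independence would produce, which is the "clumping" described in the text.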
Thus, if any of the seven models is correct, it must be Model D. As a check, we copied c26 into c27 and introduced a missing observation symbol (*) as the first entry in c26. Thus c26 can be thought of as "Today" and c27 as "Tomorrow" as we step through the data. The second cross-tabulation below shows percentages by row. In the first row the proportions are (0.89, 0.11) and in the second row they are (0.65, 0.35). Considering that we have only 500 observations, most of them in the first row, these proportions agree quite well with the probabilities in the first row (0.9, 0.1) and the second row (0.6, 0.4) of the transition matrix for Model D. A chi-squared test for independence on the counts in the first table provides an additional confirmation that the data violate the independence assumed in Model A. (You should perform this test and interpret the results.)
MTB > table c26 c27
ROWS: C26 COLUMNS: C27
0 1 ALL
0 380 47 427
1 47 25 72
ALL 427 72 499
CELL CONTENTS --
COUNT
MTB > table c26 c27;
SUBC> rowp.
ROWS: C26 COLUMNS: C27
0 1 ALL
0 88.99 11.01 100.00
1 65.28 34.72 100.00
ALL 85.57 14.43 100.00
CELL CONTENTS --
% OF ROW
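For readers working outside Minitab, the chi-squared statistic suggested above can be computed directly from the four counts in the first cross-tabulation. This is a sketch of the ordinary Pearson statistic without a continuity correction; the function name `chi_squared_2x2` is hypothetical.

```python
def chi_squared_2x2(table):
    """Pearson chi-squared statistic for independence in a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Rows: today 0 or 1; columns: tomorrow 0 or 1 (counts from the first table).
counts = [[380, 47], [47, 25]]
chi2 = chi_squared_2x2(counts)
row_proportions = [[x / sum(row) for x in row] for row in counts]

print(chi2)             # compare with 3.84, the 5% critical value for 1 degree of freedom
print(row_proportions)  # compare with the rows of the Model D transition matrix
```

The interpretation of the statistic is left to the reader, as in the text.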
The simple
Compared to what could have been asked, Part 3 is relatively simple. Not all of the models would be so easy to distinguish from one another based on only 500 observations and using standard statistical tests.
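If these two rates are expressed in terms of days, then λ = 1/6 and μ = 1 would mean that non-rainy periods last 6 days on average and rainy periods last 1 day on average, so that it rains 1/7 of the time.
If the parameters λ and μ are reduced proportionately, the alternation between N and R is more leisurely; and if they are increased proportionately, the alternation becomes more frantic. But as long as the ratio λ/μ = 1/6 is maintained, it rains 1/7 of the time over the long run.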
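The two-state continuous-time process can be explored with a short Python sketch. It assumes that the first rate governs transitions from 0 (N) to 1 (R) and the second governs transitions from 1 back to 0, with exponentially distributed holding times; the function names, the seed, and the simulation length are hypothetical choices made for illustration.

```python
import random
from fractions import Fraction


def long_run_rain_fraction(rate_n_to_r, rate_r_to_n):
    """Long-run fraction of time in state 1 for a two-state continuous-time chain."""
    return rate_n_to_r / (rate_n_to_r + rate_r_to_n)


def simulate_rain_fraction(rate_n_to_r, rate_r_to_n, total_days, seed=1):
    """Simulate alternating exponential holding times; return the fraction of time in state 1."""
    rng = random.Random(seed)
    clock, rain_time, state = 0.0, 0.0, 0
    while clock < total_days:
        rate = rate_n_to_r if state == 0 else rate_r_to_n
        stay = min(rng.expovariate(rate), total_days - clock)
        if state == 1:
            rain_time += stay
        clock += stay
        state = 1 - state
    return rain_time / total_days


exact = long_run_rain_fraction(Fraction(1, 6), Fraction(1))
scaled = long_run_rain_fraction(Fraction(1, 3), Fraction(2))  # both rates doubled
simulated = simulate_rain_fraction(1 / 6, 1.0, 200000)
print(exact, scaled, simulated)
```

Doubling or halving both rates changes how quickly the weather alternates in the simulation but leaves the long-run fraction unchanged.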
' Q-BASIC Program To Generate Observations From
' A 2-State Markov Chain With State Space {0, 1}
' by B. Trumbo, June 1999
CLS 'clears screen
N% = 500 'number of observations
p01 = 0.1 : p11 = 0.4 'transition probabilities
OPEN "a:mcrain.dat" FOR OUTPUT AS #1
RANDOMIZE TIMER 'initialize rand numb gen
Today% = 0
RainCount% = 0 'for printing to screen
FOR i% = 1 TO N%
r = RND 'uniform on (0, 1)
Tomorrow% = 0
IF Today% = 1 AND r < p11 THEN Tomorrow% = 1
IF Today% = 0 AND r < p01 THEN Tomorrow% = 1
RainCount% = RainCount% + Tomorrow%
PRINT Tomorrow%; 'prints to screen in line
PRINT #1, Tomorrow% 'prints record to file
Today% = Tomorrow%
NEXT
PRINT
PRINT RainCount% / N% 'proportion of rainy days
'printed to screen
END