CSU Hayward

Statistics Department

Quiz Question 15:
Probability Models for Rain


Seven Models for Rain

Below we propose seven models for the occurrence of rain on a hypothetical island. All models deal with rain on a day-to-day basis—a day is either called rainy (R) or not (N).

A.   Independently of the weather on other days, the probability of rain on any one day is 1/7.

B.   Every Sunday is rainy, but it never rains on any other day.

C.   It can rain only on Sundays and Mondays; independently of the weather on other days, the probability of rain on any Sunday or Monday is 1/2.

D.   Rain is possible on any day of the week. If today is rainy the probability that tomorrow will be rainy is 0.4; if today is not rainy, the probability that tomorrow will be rainy is 0.1.

E.   Rainy periods are always exactly two days long. If it is not raining today, the probability that it will rain tomorrow is 1/12.

F.   Rainy periods are always at least two days long. If it rained yesterday and it is raining today, the probability that it will rain tomorrow is 1/2. If it is not raining today, the probability that it will rain tomorrow is 1/18.

G.   The rain gods of this island have a sacred circle with seven positions numbered 0, 1, ..., 6 in order around the circle, and with 6 adjacent to 0. Each day one of them tosses a fair coin. If the result is Heads, a marker on the circle is moved clockwise one position; and counterclockwise one position if Tails. The only clue humans have about the position of the marker is that it rains when, and only when, the marker is on 1.

Questions

Part 1. In each model, there is a way to make sense of the idea that over the long run it rains on 1/7 of the days. In each case, give the appropriate interpretation of this idea. While it is possible to interpret some of the models in terms of Markov chains, try to defend the idea of rain 1/7 of the time by using arguments involving algebra, expectations, limits, etc. that do not require explicit use of Markov models. [Hint: For Models E and F you may want to consider patterns of weather for overlapping pairs of days. For example, the sequence NNRRNNN would yield pairs NN, NR, RR, RN, NN, NN.]

Part 2. For each model let Xn, n = 1, 2, 3, ..., be a stochastic process, where Xn is 0 or 1 depending on whether the nth day is observed to be N or R. In which cases can this X-process be modeled as a homogeneous Markov chain? If it can be modeled as a Markov chain, give the appropriate state space and transition matrix. If not, show that the rules for a homogeneous Markov chain are violated.

Part 3. The 500 observations immediately following Part 4 were generated according to one of the seven models. (Rainy days are indicated by 1 and non-rainy days by 0. Read down the columns in sequence. For example, the first four days are non-rainy, followed by a rainy day; there are no rainy days among the last 20.) Can you determine which model is consistent with these data?

Part 4. (Intermediate) Give a continuous-time Markov process with states 0 (for N) and 1 (for R) such that it rains 1/7 of the time (continuous time).


Data for Part 3

  0  0  1  0  0  0  0  0  1  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  1  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  1  0  0  0  0  0  0  0  0
  1  1  0  0  0  0  0  0  1  0  0  1  0  0  1  0  1  0  0  1  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0
  0  1  1  0  0  0  1  0  0  1  1  0  0  0  1  0  0  0  1  0  0  0  0  0  0
  0  1  0  1  1  0  0  0  0  0  1  0  1  0  1  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  1  0  1  0  0  0  0  0  0  1  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  1  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  1  0  1  1  0  0  1  0  0
  0  1  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  1  1  0  0  0  0  0
  1  1  0  0  0  0  0  0  0  0  1  0  1  0  0  0  0  0  1  0  0  1  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  1  0  0  0  0  0  0  0
  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  1  0  0  0  1  0  1  0  0  1  1  0  0  0  0  1  0  0
  0  1  0  0  1  1  0  0  0  0  0  0  1  0  0  0  1  0  0  0  0  0  0  0  0
  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1  0  0  1  0
  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  1  0  0  0  0  0  0  0  0

Note: You should be able to move these data from your browser to an empty Minitab worksheet. In your browser, select the data (with the mouse) and copy the selection (CTRL+C). Then move to row 1, column 1 of the Minitab data window; paste the data (CTRL+V), selecting the "spaces as delimiters" option when prompted; and finally use Minitab's MANIP menu to "stack" the data into a single column, say c26. For other statistical packages you may be able to use a similar procedure, or you may need to make a text file of the data and then read it into your package.
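The Minitab "stack" step amounts to reading the grid column by column. Here is a minimal Python sketch of the same step, assuming the grid has been saved as whitespace-separated text; the function name `stack_columns` is hypothetical and not part of any package.

```python
def stack_columns(text):
    """Return the entries of a whitespace-separated grid read down the columns."""
    rows = [[int(tok) for tok in line.split()] for line in text.splitlines() if line.strip()]
    n_cols = len(rows[0])
    if any(len(row) != n_cols for row in rows):
        raise ValueError("every row must have the same number of entries")
    # Column 1 top to bottom, then column 2, and so on (the reading order used in Part 3).
    return [row[j] for j in range(n_cols) for row in rows]


grid = """
0 0 1
1 0 0
"""
print(stack_columns(grid))   # [0, 1, 0, 0, 1, 0]
```

Applied to the full 20 x 25 table, this should yield the 500 observations in time order.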


Answers

Part 1

For Model A, we have independent identically distributed Bernoulli trials with P(R) = P(Xi = 1) = 1/7, and hence E(Xi) = 1/7. Roughly speaking, the Law of Large Numbers ensures that it rains 1/7 of the time. More precisely, define Sn = X1 + X2 + ... + Xn and Rn = Sn/n. Thus Rn is the proportion of rainy days among the first n days. Then the Weak Law of Large Numbers says that Rn converges in probability to 1/7, and the Strong LLN says that Rn converges almost surely to 1/7.
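As a rough illustration, here is a minimal Python sketch that simulates Model A and reports Rn for increasing n. It assumes Python's `random` module as the source of the Bernoulli trials; the function name and seed are arbitrary choices.

```python
import random


def running_proportion(n_days, p=1 / 7, seed=1):
    """Simulate Model A for n_days days and return R_n = S_n / n."""
    rng = random.Random(seed)
    s = 0                                # S_n, the number of rainy days so far
    for _ in range(n_days):
        s += rng.random() < p            # X_i = 1 with probability p, independently
    return s / n_days


for n in (100, 10_000, 1_000_000):
    print(n, running_proportion(n))      # drifts toward 1/7 = 0.142857... as n grows
```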

Model B is deterministic. If day 1 of the process is a Sunday and we express n as n = 7(k – 1) + j, where k is the week number and j is the day of the week, then the proportion Rn of rainy days up through day n is the deterministic sequence Rn = k/n, which clearly converges to 1/7.
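Because n = 7(k – 1) + j with 1 ≤ j ≤ 7, we have 7k – 6 ≤ n ≤ 7k, so 1/7 ≤ Rn = k/n ≤ k/(7k – 6), and the upper bound also tends to 1/7. A short Python check counts Sundays directly and compares the count with k/n (the function name is hypothetical):

```python
from fractions import Fraction


def rain_proportion_model_b(n):
    """R_n for Model B when day 1 is a (rainy) Sunday: count the Sundays among days 1..n."""
    sundays = sum(1 for day in range(1, n + 1) if day % 7 == 1)   # days 1, 8, 15, ...
    return Fraction(sundays, n)


for n in (1, 7, 8, 70, 71, 7000):
    k = (n + 6) // 7          # week number k, where n = 7(k - 1) + j with 1 <= j <= 7
    assert rain_proportion_model_b(n) == Fraction(k, n)
    print(n, float(rain_proportion_model_b(n)))
```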

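For Model C consider the subsequence of days that includes only Sundays and Mondays. On these days the analysis is as for Model A with 1/7 replaced by 1/2, so in the long run half of the Sundays and Mondays are rainy. Because two of every seven days are Sundays or Mondays, this amounts to an average of (1/2)(2) = 1 rainy day per week, or 1/7 of all days.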
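A simulation sketch of Model C (the seed and number of weeks are arbitrary choices) shows the overall proportion of rainy days settling near 1/7:

```python
import random


def simulate_model_c(n_weeks, seed=2):
    """Model C: only Sundays and Mondays can be rainy, each with probability 1/2 independently."""
    rng = random.Random(seed)
    rainy = 0
    for _ in range(n_weeks):
        for day in range(7):             # day 0 = Sunday, day 1 = Monday, days 2-6 = Tuesday-Saturday
            if day < 2 and rng.random() < 0.5:
                rainy += 1
    return rainy / (7 * n_weeks)


print(simulate_model_c(100_000))         # close to (1/2)(2/7) = 1/7
```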

For Model D suppose there is a steady-state probability p of rain on any one day, so that the unconditional probability of rain is the same today and tomorrow. Either it rains today (probability p) or it doesn't (probability 1 – p). From the information given, the total probability of rain tomorrow must be p = (0.4)p + (0.1)(1 – p), so that 0.7p = 0.1 and p = 1/7.
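The same fixed-point equation can be solved exactly with rational arithmetic. Here is a minimal sketch using Python's `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

p_rain_after_rain = Fraction(2, 5)   # P(R tomorrow | R today) = 0.4
p_rain_after_dry = Fraction(1, 10)   # P(R tomorrow | N today) = 0.1

# p = 0.4 p + 0.1 (1 - p)  rearranges to  p (1 - 0.4 + 0.1) = 0.1
p = p_rain_after_dry / (1 - p_rain_after_rain + p_rain_after_dry)
assert p == p_rain_after_rain * p + p_rain_after_dry * (1 - p)   # p really is a fixed point
print(p)                                                         # 1/7
```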

For Model E consider sequences of two days with weather patterns RR, NR, RN, and NN. For example, no rain today and rain tomorrow would be an instance of NR. Denote the probabilities of these sequences as r = P(RR), b = P(NR), c = P(RN), and s = P(NN). If there is a steady-state probability p of rain, then today's total rain probability p = r + c must equal tomorrow's rain probability p = r + b. From this it is obvious that b = c. This equality makes intuitive sense because every rainy period that begins (NR) must also end (RN). Intuitively, it is clear that r also has this value because every rainy period must contain exactly one instance of RR. By the definition of the given conditional probability, we have

1/12 = P(R tomorrow | N today)
     = P(N today and R tomorrow) / P(N today)
     = b/(1 – p),

so that b = r = c = (1/12)(1 – p). Similarly s = (11/12)(1 – p), so that

s : b : c : r = 11 : 1 : 1 : 1.

Because these four probabilities must add to 1, the solution is s = 11/14 and b = r = c = 1/14, so that p = r + c = 1/14 + 1/14 = 1/7.
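Following the hint in Part 1, the pair frequencies can be estimated by simulating Model E and counting overlapping pairs of days. In this sketch the starting day, seed, and sequence length are arbitrary choices.

```python
import random
from collections import Counter


def simulate_model_e(n_days, seed=3):
    """Model E: rainy periods last exactly two days; P(R tomorrow | N today) = 1/12."""
    rng = random.Random(seed)
    days = ["N"]                                  # assumption: start on a dry day
    while len(days) < n_days:
        today = days[-1]
        yesterday = days[-2] if len(days) > 1 else "N"
        if today == "R":
            days.append("R" if yesterday == "N" else "N")   # first rainy day -> rain; second -> dry
        else:
            days.append("R" if rng.random() < 1 / 12 else "N")
    return days


days = simulate_model_e(280_000)
pairs = Counter(a + b for a, b in zip(days, days[1:]))   # overlapping (today, tomorrow) pairs
total = sum(pairs.values())
for pattern in ("NN", "NR", "RN", "RR"):
    print(pattern, round(pairs[pattern] / total, 4))     # compare with 11/14, 1/14, 1/14, 1/14
```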

The analysis for Model F is similar, again with b = c. In this model, however, after the one guaranteed RR, the number of additional RRs in a rainy period has a geometric distribution on {0, 1, 2, ...} (each further rainy day occurs with probability 1/2), so its mean is 1. So the average number of RRs in a rainy period is 2. Thus, an analysis similar to the one for Model E shows that

s : b : c : r = 17 : 1 : 1 : 2,

so that s = 17/21, r = 2/21, and b = c = 1/21. Finally, p = b + r = 3/21 = 1/7.
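The ratio argument for Models E and F can be packaged in one small function. It is a sketch under the assumptions stated in its docstring: `q` is P(R tomorrow | N today), `mean_rr` is the mean number of RR pairs per rainy period, and the function name is hypothetical.

```python
from fractions import Fraction


def pair_probabilities(q, mean_rr):
    """Return (s, b, c, r) = (P(NN), P(NR), P(RN), P(RR)) for Models E and F.

    Since b = c = q(1 - p), s = (1 - q)(1 - p), and r = mean_rr * b,
    the ratios are s : b : c : r = (1 - q)/q : 1 : 1 : mean_rr.
    """
    weights = [(1 - q) / q, Fraction(1), Fraction(1), Fraction(mean_rr)]
    total = sum(weights)
    return tuple(w / total for w in weights)


s, b, c, r = pair_probabilities(Fraction(1, 18), mean_rr=2)   # Model F
print(s, b, c, r, b + r)                                      # 17/21 1/21 1/21 2/21 1/7
```

With `pair_probabilities(Fraction(1, 12), 1)` the same function reproduces the Model E values 11/14, 1/14, 1/14, 1/14.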

In Model G the symmetry of the "circle of the rain gods" suggests intuitively that over the long run the marker will be on 1 as often as on any other number, and so it will rain 1/7 of the time. (Incidentally, notice that this model does not permit rain on two successive days.)
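Here is a simulation sketch of Model G. The starting position, seed, and the choice of +1 for clockwise are assumptions that do not affect the conclusion.

```python
import random


def simulate_model_g(n_days, seed=4):
    """Model G: a fair-coin random walk on positions 0..6; it rains exactly when the marker is on 1."""
    rng = random.Random(seed)
    position = 1                                   # assumption: the marker starts on 1
    rain = []
    for _ in range(n_days):
        step = 1 if rng.random() < 0.5 else -1     # Heads: clockwise (+1); Tails: counterclockwise (-1)
        position = (position + step) % 7
        rain.append(1 if position == 1 else 0)
    return rain


rain = simulate_model_g(140_000)
print(sum(rain) / len(rain))                                    # close to 1/7
print(any(a == b == 1 for a, b in zip(rain, rain[1:])))         # False: no two rainy days in a row
```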

A slight variation of this model would be a circle with only 6 numbers 0, 1, ..., 5, on which it rains with probability 6/7 when the marker is at 1; the long-run proportion of rainy days is then still (1/6)(6/7) = 1/7. Suppose it rains on an initial day that we denote as "day 0." Then the periodicity of the movement of the marker ensures that rain can happen only on even-numbered days thereafter. (The periodicity is due to the even number of positions around the circle.) Mere mortals with no prior knowledge of the occult mechanism might have real difficulty inferring from observed rain data a mechanism that explains why it never rains on odd-numbered days, even though the wait between rainy days is quite variable.
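A simulation sketch of this 6-position variant (the seed and number of days are arbitrary choices) confirms both claims: rain falls only on even-numbered days, yet the long-run proportion of rainy days is still near 1/7.

```python
import random


def simulate_six_position_variant(n_days, seed=5):
    """Variant: positions 0..5; when the marker is on 1 it rains with probability 6/7.

    Day 0 is rainy, so the marker is on 1 on day 0.
    """
    rng = random.Random(seed)
    position = 1
    rainy_days = [0]
    for day in range(1, n_days):
        position = (position + (1 if rng.random() < 0.5 else -1)) % 6
        if position == 1 and rng.random() < 6 / 7:
            rainy_days.append(day)
    return rainy_days


rainy_days = simulate_six_position_variant(420_000)
print(len(rainy_days) / 420_000)                       # close to 1/7
print(all(day % 2 == 0 for day in rainy_days))         # True: rain only on even-numbered days
```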

Part 2

The complete independence in Model A makes a Markov model unnecessary, but it is possible to construct a trivial one that works. For example, let the state space be S = {N, R} = {0, 1}. Then the 2 x 2 transition matrix with two identical rows (6/7, 1/7) will do. This is an ergodic (i.e., finite, irreducible, aperiodic) chain and so it has a steady state distribution which is also a limiting distribution: (6/7, 1/7).

Because Model B is deterministic any probability model will be trivial. One trivial Markov chain that does the job has state space S = {Sun, Mon, ..., Sat} = {1, 2, ..., 7}, where we understand that 1 indicates R and that the other states correspond to N. The P-matrix corresponding to this state space simply ensures that the days of the week follow their natural order:


    | 0 1 0 0 0 0 0 |
    | 0 0 1 0 0 0 0 |
    | 0 0 0 1 0 0 0 |
P = | 0 0 0 0 1 0 0 |
    | 0 0 0 0 0 1 0 |
    | 0 0 0 0 0 0 1 |
    | 1 0 0 0 0 0 0 |

This chain is periodic, of period 7. Obviously, its stationary distribution is uniform. (Why "obviously"?) Does it have a limiting distribution?
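These questions can be explored numerically. This sketch uses exact fractions and numbers the states 0 to 6 instead of 1 to 7. It checks that the uniform distribution is stationary, shows that the distribution of the state started from Sunday is a point mass that keeps cycling, and shows that its Cesàro (running) averages are uniform.

```python
from fractions import Fraction

N = 7
# Model B as a cyclic chain on states 0..6 (the text's 1..7): state i moves to i + 1 (mod 7).
P = [[Fraction(1) if j == (i + 1) % N else Fraction(0) for j in range(N)] for i in range(N)]


def step(dist):
    """One step of the chain: returns the row vector dist * P."""
    return [sum(dist[i] * P[i][j] for i in range(N)) for j in range(N)]


uniform = [Fraction(1, N)] * N
assert step(uniform) == uniform                  # the uniform distribution is stationary

dist = [Fraction(1)] + [Fraction(0)] * (N - 1)   # start on Sunday
running_total = [Fraction(0)] * N
for n in range(1, 71):                           # 70 steps = 10 full weeks
    dist = step(dist)
    running_total = [t + d for t, d in zip(running_total, dist)]
    if n <= 3:
        print(n, [str(d) for d in dist])         # a point mass that keeps moving
print([str(t / 70) for t in running_total])      # Cesàro averages: 1/7 in every state
```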

Model C. If we pay attention only to the days on which rain can occur, Sunday and Monday, a satisfactory model has a 2 x 2 P-matrix with all four elements 1/2. This is an ergodic chain with limiting distribution (1/2, 1/2).

A model that includes all of the days needs to have state space S = {RSun, NSun, RMon, NMon, Tues, ..., Sat}, where it is understood that only RSun and RMon have rain. The transition matrix would be


    | 0   0  1/2 1/2  0   0   0   0   0 |
    | 0   0  1/2 1/2  0   0   0   0   0 |
    | 0   0   0   0   1   0   0   0   0 |
    | 0   0   0   0   1   0   0   0   0 |
P = | 0   0   0   0   0   1   0   0   0 |
    | 0   0   0   0   0   0   1   0   0 |
    | 0   0   0   0   0   0   0   1   0 |
    | 0   0   0   0   0   0   0   0   1 |
    |1/2 1/2  0   0   0   0   0   0   0 |

This chain has period 7 with subclasses {RSun, NSun}, {RMon, NMon}, {Tues}, ..., {Sat}. It has the stationary distribution (1/14, 1/14, 1/14, 1/14, 1/7, ..., 1/7). As in Model B, P(Xn = 1) does not have a limit in the usual sense, but it converges in the Cesàro sense to 1/7.
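A sketch with exact fractions builds the 9 x 9 matrix above, checks the stated stationary distribution, and tracks P(Xn = 1) starting from a Saturday (an arbitrary choice). It shows the repeating cycle 1/2, 1/2, 0, 0, 0, 0, 0 and its Cesàro average 1/7.

```python
from fractions import Fraction

# States in the order of the text: RSun, NSun, RMon, NMon, Tues, Wed, Thu, Fri, Sat (indices 0-8).
half, one, zero = Fraction(1, 2), Fraction(1), Fraction(0)
P = [[zero] * 9 for _ in range(9)]
for sun in (0, 1):
    P[sun][2] = P[sun][3] = half          # Sunday -> rainy or dry Monday
for mon in (2, 3):
    P[mon][4] = one                       # Monday -> Tuesday
for day in range(4, 8):
    P[day][day + 1] = one                 # Tuesday -> ... -> Saturday
P[8][0] = P[8][1] = half                  # Saturday -> rainy or dry Sunday


def step(dist):
    """One step of the chain: returns the row vector dist * P."""
    return [sum(dist[i] * P[i][j] for i in range(9)) for j in range(9)]


pi = [Fraction(1, 14)] * 4 + [Fraction(1, 7)] * 5
assert step(pi) == pi                     # the stationary distribution given in the text

dist = [zero] * 8 + [one]                 # start on a Saturday, so day 1 is a Sunday
rain_probs = []
for n in range(1, 71):
    dist = step(dist)
    rain_probs.append(dist[0] + dist[2])  # P(X_n = 1) = P(RSun) + P(RMon)
print([str(x) for x in rain_probs[:7]])   # ['1/2', '1/2', '0', '0', '0', '0', '0']
print(sum(rain_probs) / len(rain_probs))  # Cesàro average over 10 weeks: 1/7
```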

Model D is quite simply represented as a Markov chain with state space S = {N, R} and a 2 x 2 matrix with first row (0.9, 0.1) and second row (0.6, 0.4). This is an ergodic chain with stationary distribution (6/7, 1/7), which is also a limiting distribution. (Can you find other matrices with the same limiting distribution? One of them doesn't allow rain on two days in a row.)
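For any 2-state chain with P(N to R) = p01 and P(R to N) = p10, balancing the flow between the two states gives the stationary rain probability p01/(p01 + p10), so any matrix with p10 = 6 p01 gives 1/7. A short sketch (the function name is hypothetical) checks Model D, Model A written as a chain, and one matrix that forbids rain on two successive days:

```python
from fractions import Fraction


def rain_probability(p01, p11):
    """Stationary P(R) for a 2-state chain with P(N -> R) = p01 and P(R -> R) = p11."""
    p10 = 1 - p11
    return p01 / (p01 + p10)


print(rain_probability(Fraction(1, 10), Fraction(2, 5)))   # Model D: 1/7
print(rain_probability(Fraction(1, 7), Fraction(1, 7)))    # Model A as a chain: 1/7
print(rain_probability(Fraction(1, 6), Fraction(0)))       # never rains two days in a row: 1/7
```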

Because tomorrow's behavior can depend on the weather both yesterday and today, Model E cannot be a Markov chain with state space {N, R}. Because the dependence never goes back more than 2 days it can, however, be described as a Markov chain with state space S = {NN, NR, RN, RR}. With states in this order, the P-matrix is:


            NN    NR    RN   RR 
      NN | 11/12  1/12   0    0  |
P =   NR |   0     0     0    1  |
      RN | 11/12  1/12   0    0  |
      RR |   0     0     1    0  |

This is an ergodic chain with stationary (limiting) distribution (11/14, 1/14, 1/14, 1/14). If we go back to considering individual days, we note that any one rainy day is the first day of either the pair RN or the pair RR. Thus, at steady state, the total probability of rain on a given day is 1/14 + 1/14 = 1/7.
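The claimed stationary distribution can be checked exactly by multiplying it by the P-matrix. Here is a short sketch with Python fractions, with the states in the order NN, NR, RN, RR:

```python
from fractions import Fraction as F

# Model E, states ordered NN, NR, RN, RR (yesterday's weather, today's weather).
P = [
    [F(11, 12), F(1, 12), F(0), F(0)],
    [F(0), F(0), F(0), F(1)],
    [F(11, 12), F(1, 12), F(0), F(0)],
    [F(0), F(0), F(1), F(0)],
]
pi = [F(11, 14), F(1, 14), F(1, 14), F(1, 14)]

pi_next = [sum(pi[i] * P[i][j] for i in range(4)) for j in range(4)]
assert pi_next == pi and sum(pi) == 1         # pi is stationary
print(pi[1] + pi[3])                          # P(rain today) = P(NR) + P(RR) = 1/7
```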

Model F has the same state space as Model E, with the P-matrix changed slightly to accommodate the possibility of rainy periods extending beyond 2 days:

            NN    NR    RN    RR 
      NN | 17/18  1/18   0     0  |
P =   NR |   0     0     0     1  |
      RN | 17/18  1/18   0     0  |
      RR |   0     0    1/2   1/2 |

This is also an ergodic chain. Its stationary (limiting) distribution is (17/21, 1/21, 1/21, 2/21). The steady state probability that any single day will be rainy is 1/21 + 2/21 = 1/7.
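Because the chain is ergodic, repeatedly multiplying any starting distribution by P converges to the stationary distribution. Here is a floating-point sketch of this power iteration; the starting state RR and the 200 iterations are arbitrary choices.

```python
# Model F, states ordered NN, NR, RN, RR.
P = [
    [17 / 18, 1 / 18, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [17 / 18, 1 / 18, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]

dist = [0.0, 0.0, 0.0, 1.0]        # start in RR; any starting state works for an ergodic chain
for _ in range(200):
    dist = [sum(dist[i] * P[i][j] for i in range(4)) for j in range(4)]

target = [17 / 21, 1 / 21, 1 / 21, 2 / 21]
print([round(x, 6) for x in dist])
print(dist[1] + dist[3])           # P(rain today), close to 1/7
```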

In Model G the "circle of the rain gods" is a symmetric random walk on a circle with state space S = {0, 1, 2, 3, 4, 5, 6}. It is a Markov chain Un, n = 1, 2, 3, ..., unobservable by humans. Its P-matrix is

    | 0  1/2  0   0   0   0  1/2|
    |1/2  0  1/2  0   0   0   0 |
    | 0  1/2  0  1/2  0   0   0 |
P = | 0   0  1/2  0  1/2  0   0 |
    | 0   0   0  1/2  0  1/2  0 |
    | 0   0   0   0  1/2  0  1/2|
    |1/2  0   0   0   0  1/2  0 |

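Because this matrix is doubly stochastic, the U-process has the uniform stationary distribution on S. The chain is also ergodic: it is irreducible, and it is aperiodic because a return to any state is possible in 2 steps and in 7 steps (the circle has an odd number of positions). So the uniform distribution is also the limiting distribution. It rains if and only if the random walk on the circle is at state 1, so we can say that the steady state (limiting) probability of observing rain is 1/7.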
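Both facts used here, double stochasticity and convergence to the uniform distribution, can be checked numerically. In this sketch the starting position and the 200 iterations are arbitrary choices; with an even number of positions the same iteration would oscillate instead of converging.

```python
N = 7
# Symmetric random walk on the circle {0, ..., 6}: move +1 or -1 (mod 7) with probability 1/2 each.
P = [[0.0] * N for _ in range(N)]
for i in range(N):
    P[i][(i + 1) % N] += 0.5
    P[i][(i - 1) % N] += 0.5

# Doubly stochastic: every row and every column sums to 1.
assert all(abs(sum(row) - 1) < 1e-12 for row in P)
assert all(abs(sum(P[i][j] for i in range(N)) - 1) < 1e-12 for j in range(N))

dist = [1.0 if i == 1 else 0.0 for i in range(N)]      # start with the marker on 1
for _ in range(200):
    dist = [sum(dist[i] * P[i][j] for i in range(N)) for j in range(N)]
print([round(x, 6) for x in dist])                      # every entry close to 1/7
```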

However, the random walk on a circle is not the observed process. The observed process Xn takes only the values 0 (N) and 1 (R). The X-process is not Markovian, nor can it be made into a Markov process by considering sequences of states (as we did for Models E and F). To see that it is not Markov consider the following conditional probabilities:

P(X3 = 1 | X0 = 1, X1 = 0, X2 = 0) = 0,
P(X3 = 1 | X0 = 0, X1 = 1, X2 = 0) = 1/2.

The Markov property would require that these two probabilities be equal. However, the condition in the first probability implies that the random walk on the circle must be in either state 3 or state 6 at stage 2, so that it cannot possibly return to 1 at stage 3. On the other hand, the condition in the second probability implies that the random walk must be in either state 2 or 0 at stage 2, so that (either way) it has probability 1/2 of returning to 1.
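These conditional probabilities can also be computed mechanically by tracking which hidden positions are consistent with the observations (a simple forward filter for the hidden U-process). The sketch assumes the marker is uniformly distributed on day 0, although for these two observation sequences the answers do not depend on that choice; the function name is hypothetical.

```python
from fractions import Fraction

N_POS = 7                 # positions 0, ..., 6 on the circle
HALF = Fraction(1, 2)


def prob_rain_next(observed):
    """P(X_{k+1} = 1 | X_0, ..., X_k = observed) in Model G.

    Assumes the hidden marker U_0 is uniform (the stationary distribution).
    After filtering, weights[u] = P(U_k = u and X_0..X_k = observed).
    """
    weights = [Fraction(1, N_POS)] * N_POS
    for k, x in enumerate(observed):
        if k > 0:
            moved = [Fraction(0)] * N_POS
            for u, w in enumerate(weights):          # the coin moves the marker +1 or -1
                moved[(u + 1) % N_POS] += HALF * w
                moved[(u - 1) % N_POS] += HALF * w
            weights = moved
        # It rains if and only if the marker is on 1: drop inconsistent positions.
        weights = [w if (u == 1) == (x == 1) else Fraction(0) for u, w in enumerate(weights)]
    # The marker reaches 1 next day only from 0 or 2, each time with probability 1/2.
    p_next = HALF * (weights[0] + weights[2])
    return p_next / sum(weights)   # raises ZeroDivisionError if `observed` is impossible


print(prob_rain_next([1, 0, 0]))   # 0
print(prob_rain_next([0, 1, 0]))   # 1/2
```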

The X-process is a function of the U-process (the random walk on a circle). Specifically, X = 1 if U = 1, and X = 0 otherwise. This example illustrates that a function of a Markov chain need not be a Markov chain. Because we can never be sure how far the marker has wandered from position 1, there is no way to use sequences of states for the observable X-process to express it as a Markov process with "compound" states.

Part 3

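A casual inspection of the data eliminates several of the models:

Model B: rain does not recur exactly every seven days (for example, days 5 and 13 are both rainy).
Model C: rain falls on more than two days of the week (days 5, 13, and 25 fall on three different days of the week).
Models E and F: there are isolated rainy days (for example, day 5), so rainy periods are not always two or more days long.
Model G: there are rainy days in succession (for example, days 27 and 28).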
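These casual checks can be made mechanical. The sketch below encodes only the first 40 days of the data (columns 1 and 2 of the table, read down the columns: rain on days 5, 13, 25, 27, 28, 32, 33, and 37) and tests each model's hard constraint. Day numbers modulo 7 stand in for days of the week, since the weekday of day 1 is unknown; the function name is hypothetical.

```python
# First 40 days of the Part 3 data (columns 1 and 2 of the table, read down the columns).
rainy = {5, 13, 25, 27, 28, 32, 33, 37}
x = [1 if day in rainy else 0 for day in range(1, 41)]


def rainy_runs(x):
    """Lengths of the maximal runs of consecutive rainy days."""
    runs, length = [], 0
    for value in x + [0]:              # a trailing 0 closes a final run
        if value == 1:
            length += 1
        elif length:
            runs.append(length)
            length = 0
    return runs


runs = rainy_runs(x)
weekdays = {day % 7 for day in rainy}
print("B ruled out:", len(weekdays) > 1)           # rain not confined to one day of the week
print("C ruled out:", len(weekdays) > 2)           # rain on more than two days of the week
print("E ruled out:", any(r != 2 for r in runs))   # some rainy period is not exactly 2 days
print("F ruled out:", any(r < 2 for r in runs))    # some rainy period is shorter than 2 days
print("G ruled out:", any(r > 1 for r in runs))    # two rainy days in a row occur
```

Models A and D forbid none of these patterns, so no check of this kind can rule them out.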

That leaves Models A and D as contenders. We copied the data from our browser and put them into c26 of a Minitab worksheet (making sure to preserve the proper time sequence) to do some statistical analysis. First we checked what percentage of the days are rainy (1s).

 MTB > tally c26;
 SUBC> percent.
 
  C26   PERCENT
    0     85.60
    1     14.40

From this we see that the proportion of 1s is not far from the 1/7 expected under either model.

The key issue is now whether the observations are independent as in Model A or Markov dependent as in Model D. A "run" is a maximal sequence of consecutive identical observations. For example, in the first column of the data there are four complete runs (two of them consisting of single rainy days), and a fifth run (of non-rainy days) has started. If the observations were independent, the expected number of runs would be about 124, based on 72 rainy days and 428 non-rainy days. [Most books on nonparametric statistics and mathematical statistics discuss the exact distribution of runs. Here is a very crude intuitive "computation." If all 72 of the rainy days occurred in isolation, then there would be 143, 144, or 145 runs, depending on whether neither, one, or both ends of the sequence are non-rainy. But we can expect about 1/7 of the 1s to be followed by another 1, so we can expect only about 62 runs of 1s interspersed with about an equal number of runs of 0s, for a total of about 124 runs. (In this crude approximation we ignore the less likely possibility of three or more rainy days in a row.)] The following Minitab printout shows the results of a formal runs test. Clearly, the observed number of runs, 95, is (very highly) significantly smaller than the expected number. Instead of independent alternation of 0s and 1s we have a "clumping" that very strongly indicates a dependent structure.

 MTB > runs c26
 
   C26    
 
   K =   0.1440
 
 
   THE OBSERVED NO. OF RUNS =  95
   THE EXPECTED NO. OF RUNS = 124.2640
   72 OBSERVATIONS ABOVE K  428 BELOW 
         THE TEST IS SIGNIFICANT AT  0.0000
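The expected number of runs in the printout can be reproduced from the counts alone. The sketch below assumes the usual normal approximation for the number of runs (the Wald-Wolfowitz runs test); Minitab's exact procedure may differ in detail, and the function names are hypothetical. Applied to the stacked 500-day sequence, `count_runs` should return the observed 95.

```python
import math


def count_runs(x):
    """Number of maximal runs of identical values in the sequence x."""
    return 1 + sum(1 for a, b in zip(x, x[1:]) if a != b)


def runs_test(n0, n1, observed_runs):
    """Normal-approximation runs test for a sequence with n0 zeros and n1 ones."""
    n = n0 + n1
    mean = 1 + 2 * n0 * n1 / n
    var = 2 * n0 * n1 * (2 * n0 * n1 - n) / (n ** 2 * (n - 1))
    z = (observed_runs - mean) / math.sqrt(var)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return mean, z, p_two_sided


mean, z, p = runs_test(n0=428, n1=72, observed_runs=95)
print(round(mean, 4), round(z, 2), p)   # expected 124.264 runs; z is about -5.3
```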

Thus, if any of the seven models is correct, it must be Model D. As a check, we copied c26 into c27 and introduced a missing observation symbol (*) as the first entry in c26. Thus c26 can be thought of as "Today" and c27 as "Tomorrow" as we step through the data. The second cross-tabulation below shows percentages by row. In the first row the proportions are (0.89, 0.11) and in the second row they are (0.65, 0.35). Considering that we have only 500 observations, most of them in the first row, these proportions agree quite well with the probabilities in the first row (0.9, 0.1) and the second row (0.6, 0.4) of the transition matrix for Model D. A chi-squared test for independence on the counts in the first table provides an additional confirmation that the data violate the independence assumed in Model A. (You should perform this test and interpret the results.)

 MTB > table c26 c27

  ROWS: C26   COLUMNS: C27
  
            0    1    ALL
   
   0    380     47    427
   1     47     25     72
  ALL   427     72    499
  
   CELL CONTENTS --
           COUNT

 MTB > table c26 c27;
 SUBC> rowp.
   
  ROWS: C26   COLUMNS: C27
  
            0      1      ALL
   
   0    88.99  11.01   100.00
   1    65.28  34.72   100.00
  ALL   85.57  14.43   100.00
  
   CELL CONTENTS --
           % OF ROW
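The row percentages, and the chi-squared test for independence suggested above, can be computed directly from the counts in the first table. Here is a minimal sketch of the Pearson statistic without continuity correction; whether this matches Minitab's default output exactly is an assumption.

```python
# Counts from the Minitab cross-tabulation: rows = today (0, 1), columns = tomorrow (0, 1).
counts = [[380, 47],
          [47, 25]]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
n = sum(row_totals)

# Estimated transition probabilities P(tomorrow = 1 | today = i).
p01_hat = counts[0][1] / row_totals[0]
p11_hat = counts[1][1] / row_totals[1]

# Pearson chi-squared statistic for independence (1 degree of freedom).
chi2 = sum(
    (counts[i][j] - row_totals[i] * col_totals[j] / n) ** 2 / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
)
print(round(p01_hat, 3), round(p11_hat, 3))   # 0.11 0.347
print(round(chi2, 2))                          # 28.06, far above 3.84 (5% critical value, 1 df)
```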

The simple Q-Basic program shown at the end of these answers was used to generate the 500 observations from Model D shown in the statement of this part. Whether or not you have Q-Basic available, you can use the algorithm behind this program to generate your own data from Model D. (Try it.) Simple embellishments of the algorithm (including "Yesterday") can be used to simulate Models E and F. You can use Minitab to simulate Model A. (One way is to use the command random 500 c1; with the subcommand binomial 1 0.14286.)
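Following the suggestion to add "Yesterday" to the algorithm, here is a sketch for Model F. It is written in Python rather than Q-Basic; the function name and seed are arbitrary choices, and Python's `random.random()` plays the role of RND.

```python
import random


def simulate_model_f(n_days, seed=6):
    """Model F: rainy periods last at least two days.

    P(R tomorrow | N today) = 1/18; P(R tomorrow | R yesterday and R today) = 1/2;
    a first rainy day (N yesterday, R today) is always followed by rain.
    """
    rng = random.Random(seed)
    yesterday, today = 0, 0            # assumption: start with two dry days
    days = []
    for _ in range(n_days):
        r = rng.random()               # uniform on [0, 1), like RND
        if today == 0:
            tomorrow = 1 if r < 1 / 18 else 0
        elif yesterday == 0:           # today is the first day of a rainy period
            tomorrow = 1
        else:                          # rained yesterday and today
            tomorrow = 1 if r < 1 / 2 else 0
        days.append(tomorrow)
        yesterday, today = today, tomorrow
    return days


days = simulate_model_f(420_000)
print(sum(days) / len(days))           # close to 1/7
```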

Compared to what could have been asked, Part 3 is relatively simple. Not all of the models would be so easy to distinguish from one another based on only 500 observations and using standard statistical tests.

Part 4

We require a continuous-time Markov process for which the stationary distribution is (6/7, 1/7). Let λ be the rate of a Poisson process that governs transitions from N to R, and let μ be the rate of a Poisson process that governs transitions from R to N. The balance equation is (6/7)λ = (1/7)μ, so any choice of λ and μ with μ/λ = 6 would work. [The 2 x 2 Q-matrix of instantaneous transition intensities has first row (–λ, λ) and second row (μ, –μ).]

If these two rates are expressed in terms of days, then λ = 1/6 and μ = 1 would mean that dry spells last 6 days on average and rainy spells last 1 day on average, because the holding times in N and R are exponential with means 1/λ = 6 and 1/μ = 1.

If the parameters λ and μ are reduced proportionately, the alternation between N and R is more leisurely; and if they are increased proportionately, the alternation becomes more frantic. But as long as the ratio μ/λ = 6 is maintained, it rains 1/7 of the time over the long run.
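A continuous-time simulation makes the same point. The sketch below alternates exponential dry and rainy spells, with N-to-R rate `rate_nr` and R-to-N rate `rate_rn` (hypothetical parameter names), and reports the fraction of time spent in R. Python's `random.expovariate` draws the exponential holding times; the horizon and seed are arbitrary choices.

```python
import random


def time_raining(rate_nr, rate_rn, horizon, seed=7):
    """Two-state continuous-time chain: N -> R at rate rate_nr, R -> N at rate rate_rn.

    Returns the fraction of [0, horizon] spent in R; holding times are exponential.
    """
    rng = random.Random(seed)
    t, state, rain_time = 0.0, 0, 0.0     # start dry (state 0 = N)
    while t < horizon:
        rate = rate_nr if state == 0 else rate_rn
        end = min(t + rng.expovariate(rate), horizon)   # expovariate(rate) has mean 1/rate
        if state == 1:
            rain_time += end - t
        t = end
        state = 1 - state
    return rain_time / horizon


print(time_raining(1 / 6, 1.0, horizon=700_000.0))   # close to 1/7
print(time_raining(1 / 60, 0.1, horizon=700_000.0))  # slower alternation, same long-run fraction
```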


Appendix—BASIC Program To Simulate
2-State Markov Chain (Part 3)

'       Q-BASIC Program To Generate Observations From
'       A 2-State Markov Chain With State Space {0, 1}
'       by B. Trumbo, June 1999

CLS                                 'clears screen
N% = 500                            'number of observations
p01 = 0.1 : p11 = 0.4               'transition probabilities
OPEN "a:mcrain.dat" FOR OUTPUT AS #1
RANDOMIZE TIMER                     'initialize rand numb gen
Today% = 0
RainCount% = 0                      'for printing to screen

FOR i% = 1 TO N%
        r = RND                     'uniform on (0, 1)
        Tomorrow% = 0
        IF Today% = 1 AND r < p11 THEN Tomorrow% = 1
        IF Today% = 0 AND r < p01 THEN Tomorrow% = 1
        RainCount% = RainCount% + Tomorrow%
        PRINT Tomorrow%;            'prints to screen in line
        PRINT #1, Tomorrow%         'prints record to file
        Today% = Tomorrow%
NEXT
CLOSE #1                            'close the output file

PRINT
PRINT RainCount% / N%               'proportion of rainy days
                                    'printed to screen
END
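For readers without Q-Basic, here is a rough Python translation of the program above. It is a sketch, not the author's code: it returns the observations instead of writing them to a:mcrain.dat, and `random.Random` stands in for RANDOMIZE TIMER and RND.

```python
import random


def generate_markov_rain(n=500, p01=0.1, p11=0.4, seed=None):
    """Generate n days from a 2-state chain on {0, 1}, starting from a dry day.

    p01 = P(rain tomorrow | no rain today), p11 = P(rain tomorrow | rain today).
    """
    rng = random.Random(seed)        # seed=None seeds from the system, similar to RANDOMIZE TIMER
    today = 0
    days = []
    for _ in range(n):
        r = rng.random()             # uniform on [0, 1), like RND
        tomorrow = 1 if r < (p11 if today == 1 else p01) else 0
        days.append(tomorrow)
        today = tomorrow
    return days


days = generate_markov_rain(seed=1999)
print(sum(days) / len(days))         # proportion of rainy days
```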

Copyright © 1999 by Bruce E. Trumbo. All rights reserved. Intended for instructional use at California State University, Hayward. Please request permission in advance for other uses: btrumbo@csuhayward.edu.