Suppose we have two random variables $X$ and $Y$, where $\Omega$ is the joint sample space. If $X$ and $Y$ are defined on different experiments with sample spaces $\Omega_X$ and $\Omega_Y$, we may set $\Omega = \Omega_X \times \Omega_Y$. We may then analyse joint random variables in terms of $(X, Y) : \Omega \to \mathbb{R}^2$ (i.e. a mapping from outcomes to pairs of real numbers).
1. Joint Random Variables
For each pair $(x, y)$, let $A_{x,y}$ be the set $\{\omega \in \Omega : X(\omega) \le x,\ Y(\omega) \le y\}$. Then, the induced probability for $(x, y)$ is given by:

$$P(X \le x,\ Y \le y) = P(A_{x,y}).$$
1.1 Joint CDF
We can now define the joint cumulative distribution function as:

$$F_{X,Y}(x, y) = P(X \le x,\ Y \le y).$$
The marginal cumulative distribution function (i.e. the distribution of $X$ and $Y$ separately) is given by:

$$F_X(x) = \lim_{y \to \infty} F_{X,Y}(x, y), \qquad F_Y(y) = \lim_{x \to \infty} F_{X,Y}(x, y).$$
A joint cdf has the following properties:

$F_{X,Y}(-\infty, y) = F_{X,Y}(x, -\infty) = 0$ and $F_{X,Y}(\infty, \infty) = 1$.

It has monotonicity: if $x_1 \le x_2$ and $y_1 \le y_2$, then $F_{X,Y}(x_1, y_1) \le F_{X,Y}(x_2, y_2)$.
1.1.1 Interval Probabilities
Suppose that we are interested in whether the random variable $(X, Y)$ lies in the interval cross product $(a_1, a_2] \times (b_1, b_2]$: that is, if $a_1 < X \le a_2$ and $b_1 < Y \le b_2$. Since the quadrant event $\{X \le a_2,\ Y \le b_2\}$ decomposes into this rectangle together with the events $\{X \le a_1,\ Y \le b_2\}$ and $\{X \le a_2,\ Y \le b_1\}$ (which overlap in $\{X \le a_1,\ Y \le b_1\}$), we have:

$$P(a_1 < X \le a_2,\ b_1 < Y \le b_2) = F_{X,Y}(a_2, b_2) - F_{X,Y}(a_1, b_2) - F_{X,Y}(a_2, b_1) + F_{X,Y}(a_1, b_1).$$
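As a quick illustration, here is a minimal sketch of the rectangle formula in code, assuming $X$ and $Y$ are independent Uniform$(0, 1)$ variables so that $F_{X,Y}(x, y) = xy$ on the unit square (an illustrative choice, not one from these notes):

```python
# Rectangle probability from a joint CDF via inclusion-exclusion.
def rect_prob(F, a1, a2, b1, b2):
    """P(a1 < X <= a2, b1 < Y <= b2) computed from the joint CDF F."""
    return F(a2, b2) - F(a1, b2) - F(a2, b1) + F(a1, b1)

# Assumed example: independent Uniform(0, 1) variables, F(x, y) = x * y.
F_unif = lambda x, y: x * y
print(rect_prob(F_unif, 0.2, 0.5, 0.1, 0.4))   # (0.5 - 0.2) * (0.4 - 0.1) = 0.09
```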
1.2 Joint PMF
If $X$ and $Y$ are both discrete random variables, then we can define the joint probability mass function as:

$$p_{X,Y}(x, y) = P(X = x,\ Y = y).$$

We can recover the marginal pmf's by summing over the other variable:

$$p_X(x) = \sum_{y} p_{X,Y}(x, y), \qquad p_Y(y) = \sum_{x} p_{X,Y}(x, y).$$

A joint pmf has the following properties:

$p_{X,Y}(x, y) \ge 0$ for all $(x, y)$, and $\sum_{x} \sum_{y} p_{X,Y}(x, y) = 1$.

The previous definitions readily generalize to $n$ random variables $X_1, \ldots, X_n$.
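As an aside, a minimal sketch of working with a joint pmf stored as a table, recovering the marginals by summing over the other variable; the particular 2x3 table below is an assumed illustrative example:

```python
import numpy as np

# Joint pmf of (X, Y): rows indexed by values of X, columns by values of Y.
# The numbers are an assumed illustrative example, not taken from the notes.
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])

assert np.isclose(p_xy.sum(), 1.0)   # valid pmf: non-negative entries summing to 1

p_x = p_xy.sum(axis=1)               # marginal pmf of X: sum over y
p_y = p_xy.sum(axis=0)               # marginal pmf of Y: sum over x
print(p_x)                           # [0.4 0.6]
print(p_y)                           # [0.35 0.35 0.3]
```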
1.3 Joint PDF
In the continuous case, if there exists a function $f_{X,Y}(x, y) \ge 0$ such that:

$$F_{X,Y}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(u, v)\, dv\, du,$$

then we say $X$ and $Y$ are jointly continuous and $f_{X,Y}$ is the joint probability density function. Then, we also have:

$$f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}(x, y)}{\partial x\, \partial y}.$$

For $f_{X,Y}$ to be a valid pdf, we need to make sure:

$$f_{X,Y}(x, y) \ge 0 \quad \text{and} \quad \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx\, dy = 1.$$

We can also find the marginal pdf's by integrating over the other variable:

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx.$$
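A minimal numerical sketch of these marginalization integrals, assuming the textbook density $f_{X,Y}(x, y) = x + y$ on the unit square (again an assumed example, not from these notes):

```python
from scipy import integrate

# Assumed joint pdf: f(x, y) = x + y on [0, 1]^2.
f = lambda x, y: x + y

# Validity check: the pdf should integrate to 1 over its support.
total, _ = integrate.dblquad(lambda y, x: f(x, y), 0, 1, lambda x: 0.0, lambda x: 1.0)
print(total)          # ~1.0

# Marginal of X at a point: f_X(x) = integral of f(x, y) over y = x + 1/2.
f_x = lambda x: integrate.quad(lambda y: f(x, y), 0, 1)[0]
print(f_x(0.3))       # ~0.8
```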
2. Independence & Expectation
Two random variables $X$ and $Y$ are independent iff $P(X \in A,\ Y \in B) = P(X \in A)\, P(Y \in B)$ for all sets $A$ and $B$. This implies that:

$$F_{X,Y}(x, y) = F_X(x)\, F_Y(y).$$

For discrete random variables, they are independent iff $p_{X,Y}(x, y) = p_X(x)\, p_Y(y)$ for all $(x, y)$.

For continuous random variables, they are independent iff $f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$ for all $(x, y)$.
Immediately from these definitions, we have that:

If $Z = X + Y$, then $E[Z] = E[X] + E[Y]$ (independence is not required here).

If $Z = XY$ and $X$ and $Y$ are independent, then $E[Z] = E[X]\, E[Y]$.

More generally, $E[g(X)\, h(Y)] = E[g(X)]\, E[h(Y)]$ for all functions $g$ and $h$ iff $X$ and $Y$ are independent.
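A minimal sketch checking the product rule $E[XY] = E[X]E[Y]$ on an assumed discrete example, where the joint pmf is built as an outer product so that $X$ and $Y$ are independent by construction:

```python
import numpy as np

# Assumed marginals for X and Y.
x_vals, p_x = np.array([0, 1]), np.array([0.3, 0.7])
y_vals, p_y = np.array([1, 2, 3]), np.array([0.2, 0.5, 0.3])

# Independent by construction: p(x, y) = p_X(x) * p_Y(y).
p_xy = np.outer(p_x, p_y)

E_X  = (x_vals * p_x).sum()
E_Y  = (y_vals * p_y).sum()
E_XY = (np.outer(x_vals, y_vals) * p_xy).sum()

print(E_XY, E_X * E_Y)   # both 1.47, as expected under independence
```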
2.1 Covariance & Correlation
For a random variable $X$, $\mathrm{Var}(X) = E[(X - E[X])^2]$ is the variance. The bivariate extension of this is $E[(X - E[X])(Y - E[Y])]$, which is the covariance of $X$ and $Y$, $\mathrm{Cov}(X, Y)$:

$$\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]\, E[Y].$$

When $X$ and $Y$ are independent, $\mathrm{Cov}(X, Y) = 0$.

Covariance measures how two random variables change in relation to each other, and so is related to correlation. The correlation coefficient is defined as:

$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\, \mathrm{Var}(Y)}} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}, \qquad -1 \le \rho_{X,Y} \le 1.$$

When $X$ and $Y$ are independent, $\rho_{X,Y} = 0$ (the converse does not hold in general).
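A minimal sketch of estimating covariance and correlation from simulated data; the linear relationship $Y = 2X + \text{noise}$ is an assumed example chosen so that the true values are $\mathrm{Cov}(X, Y) = 2$ and $\rho_{X,Y} = 2/\sqrt{5} \approx 0.894$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 2.0 * x + rng.normal(size=10_000)   # Y depends on X, so they are correlated

cov_xy = np.cov(x, y)[0, 1]             # sample covariance
rho    = np.corrcoef(x, y)[0, 1]        # sample correlation coefficient
print(cov_xy, rho)                      # roughly 2.0 and 0.894
```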
2.2 Multivariate Normal Distribution
A random vector $\mathbf{X} = (X_1, \ldots, X_n)^T$ with means $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_n)^T$ is multivariate normal if it has joint pdf:

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} \lvert \Sigma \rvert^{1/2}} \exp\!\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right),$$

where $\Sigma$ is the positive definite covariance matrix of $\mathbf{X}$:

$$\Sigma_{ij} = \mathrm{Cov}(X_i, X_j) = E[(X_i - \mu_i)(X_j - \mu_j)].$$

Note that $X_1, \ldots, X_n$ need not be independent.
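A minimal sketch of sampling from a bivariate normal with an assumed mean vector and covariance matrix, then checking the empirical mean and covariance against the parameters:

```python
import numpy as np

mu    = np.array([1.0, -2.0])          # assumed mean vector
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])         # assumed positive definite covariance matrix

rng = np.random.default_rng(1)
samples = rng.multivariate_normal(mu, Sigma, size=50_000)

print(samples.mean(axis=0))            # close to mu
print(np.cov(samples, rowvar=False))   # close to Sigma
```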
3. Conditional Distributions
We can extend the concept of conditional probability to a conditional probability mass function:

$$p_{X \mid Y}(x \mid y) = P(X = x \mid Y = y) = \frac{p_{X,Y}(x, y)}{p_Y(y)}.$$

This is valid for any $y$ with $p_Y(y) > 0$. Bayes' theorem may now be recast as:

$$p_{X \mid Y}(x \mid y) = \frac{p_{Y \mid X}(y \mid x)\, p_X(x)}{p_Y(y)} = \frac{p_{Y \mid X}(y \mid x)\, p_X(x)}{\sum_{x'} p_{Y \mid X}(y \mid x')\, p_X(x')}.$$
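A minimal sketch of computing a conditional pmf and checking the recast Bayes' theorem on an assumed joint table (the same kind of illustrative numbers used earlier):

```python
import numpy as np

# Assumed joint pmf p(x, y): rows are values of X, columns are values of Y.
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

# Conditional pmf p_{X|Y}(x | y): divide each column of the joint table by p_Y(y).
p_x_given_y = p_xy / p_y
print(p_x_given_y[:, 0])             # conditional pmf of X given the first y value

# Bayes: p_{X|Y}(x|y) = p_{Y|X}(y|x) p_X(x) / p_Y(y) recovers the same table.
p_y_given_x = p_xy / p_x[:, None]
print(np.allclose(p_y_given_x * p_x[:, None] / p_y, p_x_given_y))   # True
```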
3.1 Conditional PDFs
For the continuous case, we have the conditional probability density function, defined for any $y$ with $f_Y(y) > 0$:

$$f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}.$$

Now, $X$ and $Y$ are independent iff $f_{X \mid Y}(x \mid y) = f_X(x)$ for all $x$ and $y$.

We can also rewrite Bayes' theorem as:

$$f_{X \mid Y}(x \mid y) = \frac{f_{Y \mid X}(y \mid x)\, f_X(x)}{f_Y(y)} = \frac{f_{Y \mid X}(y \mid x)\, f_X(x)}{\int_{-\infty}^{\infty} f_{Y \mid X}(y \mid x')\, f_X(x')\, dx'}.$$
3.2 Conditional CDFs
For drvs, $F_{X \mid Y}(x \mid y) = P(X \le x \mid Y = y) = \sum_{x' \le x} p_{X \mid Y}(x' \mid y)$.

For crvs, $F_{X \mid Y}(x \mid y) = \int_{-\infty}^{x} f_{X \mid Y}(x' \mid y)\, dx'$.

Conditional interval probabilities follow from the conditional cdf:

$$P(a < X \le b \mid Y = y) = F_{X \mid Y}(b \mid y) - F_{X \mid Y}(a \mid y).$$
3.3 Law of Total Probability
For drvs, $p_X(x) = \sum_{y} p_{X \mid Y}(x \mid y)\, p_Y(y)$.

For crvs, $f_X(x) = \int_{-\infty}^{\infty} f_{X \mid Y}(x \mid y)\, f_Y(y)\, dy$.

Equivalently, $P(X \in A) = \sum_{y} P(X \in A \mid Y = y)\, p_Y(y)$ in the discrete case, with the corresponding integral in the continuous case.
3.4 Conditional Expectation
For a drv, the conditional expectation of $X$ given $Y = y$ is:

$$E[X \mid Y = y] = \sum_{x} x\, p_{X \mid Y}(x \mid y).$$

For a crv, the conditional expectation of $X$ given $Y = y$ is:

$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x\, f_{X \mid Y}(x \mid y)\, dx.$$

In either case, the expectation is a function of $y$ but not of $x$.
3.5 Law of Total Expectation
We can define a random variable $E[X \mid Y]$, which is a function of $Y$, by $E[X \mid Y] = g(Y)$ where $g(y) = E[X \mid Y = y]$. Then:

$$E\big[ E[X \mid Y] \big] = E[X].$$
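A minimal sketch verifying the law of total expectation on an assumed discrete joint table: compute $g(y) = E[X \mid Y = y]$ for each $y$, then average it against $p_Y$:

```python
import numpy as np

x_vals = np.array([0.0, 1.0])
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])        # assumed joint pmf, rows = x, cols = y

p_y = p_xy.sum(axis=0)
p_x_given_y = p_xy / p_y

# g(y) = E[X | Y = y], one value per column of the table.
E_X_given_y = (x_vals[:, None] * p_x_given_y).sum(axis=0)

print((E_X_given_y * p_y).sum())             # E[E[X | Y]] = 0.6
print((x_vals[:, None] * p_xy).sum())        # E[X]        = 0.6
```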
4. Markov Chain
We have already learned that a coin tossing sequence is a realization of independent discrete random variables $X_0, X_1, X_2, \ldots$, each taking values in $\{\text{heads}, \text{tails}\}$.
Discrete Time Markov Chains (DTMCs) generalize this to support arbitrary and dependent random variables:

$S$ is the state space, and its elements $s \in S$ are the states.

$\{X_n : n = 0, 1, 2, \ldots\}$ is the chain, where $X_n$ takes values in $S$ and models the state at time $n$.

A realization of $X_0, X_1, X_2, \ldots$ is called a sample path.

The goal is to calculate $P(X_n = j)$, the probability that at time $n$ the system reaches state $j$.
DTMCs assume that the Markov Property holds:

$$P(X_{n+1} = j \mid X_n = i,\ X_{n-1} = i_{n-1},\ \ldots,\ X_0 = i_0) = P(X_{n+1} = j \mid X_n = i),$$

i.e. the choice of the next state depends only on the current state, not on the past.
Leveraging the Markov property, a DTMC specification requires:
An initial probability vector $\pi^{(0)}$, where $\pi^{(0)}_i = P(X_0 = i)$.

A transition probability matrix $P$, where $P_{ij} = P(X_{n+1} = j \mid X_n = i)$.
This gives rise to the following properties:

Each transition probability $P_{ij}$ is independent of the time $n$ (the chain is time-homogeneous).

Self loops are allowed; for instance, $P_{ii} = 1$ means that the DTMC can never leave state $i$.

$P$ is a non-negative matrix with rows that sum to $1$ (a stochastic matrix).
In general, we can perform transient analysis to compute the distribution $\pi^{(n)}$, where $\pi^{(n)}_j = P(X_n = j)$. By the law of total probability, we have that:

$$P(X_{n+1} = j) = \sum_{i} P(X_{n+1} = j \mid X_n = i)\, P(X_n = i), \qquad \text{i.e.} \quad \pi^{(n+1)} = \pi^{(n)} P,$$

and therefore $\pi^{(n)} = \pi^{(0)} P^n$.
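A minimal sketch of transient analysis for an assumed two-state chain, applying $\pi^{(n+1)} = \pi^{(n)} P$ step by step and checking against the matrix power $\pi^{(0)} P^n$:

```python
import numpy as np

# Assumed two-state DTMC: transition matrix P and initial distribution pi0.
P   = np.array([[0.9, 0.1],
                [0.5, 0.5]])
pi0 = np.array([1.0, 0.0])          # start in state 0 with probability 1

pi_n = pi0.copy()
for n in range(1, 6):
    pi_n = pi_n @ P                 # pi^(n) = pi^(n-1) P
    print(n, pi_n)

# Same result in one shot via the matrix power pi^(0) P^n.
print(pi0 @ np.linalg.matrix_power(P, 5))
```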
4.1 Long Term Behaviour
We may want to look at situations where the DTMC stabilizes. Two characterizations are the most common:
Limiting Distribution is a vector $\pi$ s.t. $\pi = \lim_{n \to \infty} \pi^{(0)} P^n$, i.e. $\pi_j = \lim_{n \to \infty} P(X_n = j)$.

Steady State Distribution is a vector $\pi$ that is invariant under the transition matrix, i.e. $\pi = \pi P$ (with $\sum_i \pi_i = 1$).
A limiting distribution, when it exists, is always a steady state distribution, but the converse is not true (i.e. a steady state distribution existing does not mean a limiting distribution exists).
Limiting and steady state distributions may not be unique.
4.2 Classifying DTMCs
A DTMC is irreducible if the directed graph associated to $P$ is strongly connected. This means that for any pair of states $(i, j)$, there exists some sample path such that, starting from state $i$, the DTMC eventually reaches state $j$.
A DTMC is periodic if its states can only be revisited at integer multiples of a fixed period $d > 1$. Otherwise, it is aperiodic.
If a DTMC is irreducible and aperiodic, then:

Limiting and steady state distributions both exist. They are unique and identical (i.e. the same vector $\pi$).

The elements of $\pi$ are strictly positive.

$\pi$ is the solution of $\pi = \pi P$ subject to $\sum_i \pi_i = 1$.

Without aperiodicity, an irreducible DTMC no longer has a valid limiting distribution. However, a steady state distribution exists and is the unique solution to $\pi = \pi P$ subject to $\sum_i \pi_i = 1$.
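To close, a minimal sketch of solving $\pi = \pi P$ subject to $\sum_i \pi_i = 1$ for the same assumed two-state chain, by stacking the normalization constraint onto the linear system $\pi (P - I) = 0$ and solving in the least-squares sense:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])          # assumed irreducible transition matrix

n = P.shape[0]
# pi (P - I) = 0 together with sum(pi) = 1, written as A @ pi = b.
A = np.vstack([(P - np.eye(n)).T, np.ones(n)])
b = np.zeros(n + 1)
b[-1] = 1.0

pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)         # [5/6, 1/6] for this P
print(pi @ P)     # equals pi, i.e. invariant under the transition matrix
```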