Suppose we have two random variables $X$ and $Y$, where $\Omega$ is the joint sample space. If $X$ and $Y$ are defined on different experiments with sample spaces $\Omega_X$ and $\Omega_Y$, we may set $\Omega = \Omega_X \times \Omega_Y$. We may then analyse joint random variables in terms of $(X, Y) : \Omega \to \mathbb{R}^2$ (i.e. a mapping from outcomes to pairs of real numbers).
1. Joint Random Variables
For each pair $(x, y)$, let $A_{x,y}$ be the set $\{\omega \in \Omega : X(\omega) \le x,\ Y(\omega) \le y\}$. Then, the induced probability for $(x, y)$ is given by:

$$P(X \le x,\ Y \le y) = P(A_{x,y}).$$
1.1 Joint CDF
We can now define the joint cumulative distribution function as:

$$F_{X,Y}(x, y) = P(X \le x,\ Y \le y).$$
The marginal cumulative distribution function (i.e. the distribution of $X$ and $Y$ separately) is given by:

$$F_X(x) = \lim_{y \to \infty} F_{X,Y}(x, y), \qquad F_Y(y) = \lim_{x \to \infty} F_{X,Y}(x, y).$$
A joint cdf has the following properties:

$F_{X,Y}(-\infty, y) = F_{X,Y}(x, -\infty) = 0$ and $F_{X,Y}(\infty, \infty) = 1$.

It has monotonicity: if $x_1 \le x_2$ and $y_1 \le y_2$, then $F_{X,Y}(x_1, y_1) \le F_{X,Y}(x_2, y_2)$.
1.1.1 Interval Probabilities
Suppose that we are interested in whether the random variable $(X, Y)$ lies in the interval cross product $(a_1, a_2] \times (b_1, b_2]$: that is, if $a_1 < X \le a_2$ and $b_1 < Y \le b_2$. Since the quadrant event $\{X \le a_2,\ Y \le b_2\}$ decomposes into this rectangle together with the events $\{X \le a_1,\ Y \le b_2\}$ and $\{X \le a_2,\ Y \le b_1\}$ (which overlap in $\{X \le a_1,\ Y \le b_1\}$), we have:

$$P(a_1 < X \le a_2,\ b_1 < Y \le b_2) = F_{X,Y}(a_2, b_2) - F_{X,Y}(a_1, b_2) - F_{X,Y}(a_2, b_1) + F_{X,Y}(a_1, b_1).$$
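As a quick illustration, here is a minimal sketch of the rectangle formula in code, assuming $X$ and $Y$ are independent Uniform$(0, 1)$ variables so that $F_{X,Y}(x, y) = xy$ on the unit square (an illustrative choice, not one from these notes):

```python
# Rectangle probability from a joint CDF via inclusion-exclusion.
def rect_prob(F, a1, a2, b1, b2):
    """P(a1 < X <= a2, b1 < Y <= b2) computed from the joint CDF F."""
    return F(a2, b2) - F(a1, b2) - F(a2, b1) + F(a1, b1)

# Assumed example: independent Uniform(0, 1) variables, F(x, y) = x * y.
F_unif = lambda x, y: x * y
print(rect_prob(F_unif, 0.2, 0.5, 0.1, 0.4))   # (0.5 - 0.2) * (0.4 - 0.1) = 0.09
```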
1.2 Joint PMF
If $X$ and $Y$ are both discrete random variables, then we can define the joint probability mass function as:

$$p_{X,Y}(x, y) = P(X = x,\ Y = y).$$

We can recover the marginal pmf's by summing over the other variable:

$$p_X(x) = \sum_{y} p_{X,Y}(x, y), \qquad p_Y(y) = \sum_{x} p_{X,Y}(x, y).$$

A joint pmf has the following properties:

$p_{X,Y}(x, y) \ge 0$ for all $(x, y)$, and $\sum_{x} \sum_{y} p_{X,Y}(x, y) = 1$.

The previous definitions readily generalize to $n$ random variables $X_1, \ldots, X_n$.
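As an aside, a minimal sketch of working with a joint pmf stored as a table, recovering the marginals by summing over the other variable; the particular 2x3 table below is an assumed illustrative example:

```python
import numpy as np

# Joint pmf of (X, Y): rows indexed by values of X, columns by values of Y.
# The numbers are an assumed illustrative example, not taken from the notes.
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])

assert np.isclose(p_xy.sum(), 1.0)   # valid pmf: non-negative entries summing to 1

p_x = p_xy.sum(axis=1)               # marginal pmf of X: sum over y
p_y = p_xy.sum(axis=0)               # marginal pmf of Y: sum over x
print(p_x)                           # [0.4 0.6]
print(p_y)                           # [0.35 0.35 0.3]
```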
1.3 Joint PDF
In the continuous case, if there exists a function $f_{X,Y}(x, y) \ge 0$ such that:

$$F_{X,Y}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(u, v)\, dv\, du,$$

then we say $X$ and $Y$ are jointly continuous and $f_{X,Y}$ is the joint probability density function. Then, we also have:

$$f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}(x, y)}{\partial x\, \partial y}.$$

For $f_{X,Y}$ to be a valid pdf, we need to make sure:

$$f_{X,Y}(x, y) \ge 0 \quad \text{and} \quad \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx\, dy = 1.$$

We can also find the marginal pdf's by integrating over the other variable:

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx.$$
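A minimal numerical sketch of these marginalization integrals, assuming the textbook density $f_{X,Y}(x, y) = x + y$ on the unit square (again an assumed example, not from these notes):

```python
from scipy import integrate

# Assumed joint pdf: f(x, y) = x + y on [0, 1]^2.
f = lambda x, y: x + y

# Validity check: the pdf should integrate to 1 over its support.
total, _ = integrate.dblquad(lambda y, x: f(x, y), 0, 1, lambda x: 0.0, lambda x: 1.0)
print(total)          # ~1.0

# Marginal of X at a point: f_X(x) = integral of f(x, y) over y = x + 1/2.
f_x = lambda x: integrate.quad(lambda y: f(x, y), 0, 1)[0]
print(f_x(0.3))       # ~0.8
```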
2. Independence & Expectation
Two random variables $X$ and $Y$ are independent iff $P(X \in A,\ Y \in B) = P(X \in A)\, P(Y \in B)$ for all sets $A$ and $B$. This implies that:

$$F_{X,Y}(x, y) = F_X(x)\, F_Y(y).$$

For discrete random variables, they are independent iff $p_{X,Y}(x, y) = p_X(x)\, p_Y(y)$ for all $(x, y)$.

For continuous random variables, they are independent iff $f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$ for all $(x, y)$.
Immediately from these definitions, we have that:

If $Z = X + Y$, then $E[Z] = E[X] + E[Y]$ (independence is not required here).

If $Z = XY$ and $X$ and $Y$ are independent, then $E[Z] = E[X]\, E[Y]$.

More generally, $E[g(X)\, h(Y)] = E[g(X)]\, E[h(Y)]$ for all functions $g$ and $h$ iff $X$ and $Y$ are independent.
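A minimal sketch checking the product rule $E[XY] = E[X]E[Y]$ on an assumed discrete example, where the joint pmf is built as an outer product so that $X$ and $Y$ are independent by construction:

```python
import numpy as np

# Assumed marginals for X and Y.
x_vals, p_x = np.array([0, 1]), np.array([0.3, 0.7])
y_vals, p_y = np.array([1, 2, 3]), np.array([0.2, 0.5, 0.3])

# Independent by construction: p(x, y) = p_X(x) * p_Y(y).
p_xy = np.outer(p_x, p_y)

E_X  = (x_vals * p_x).sum()
E_Y  = (y_vals * p_y).sum()
E_XY = (np.outer(x_vals, y_vals) * p_xy).sum()

print(E_XY, E_X * E_Y)   # both 1.47, as expected under independence
```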
2.1 Covariance & Correlation
For a random variable $X$, $\mathrm{Var}(X) = E[(X - E[X])^2]$ is the variance. The bivariate extension of this is $E[(X - E[X])(Y - E[Y])]$, which is the covariance of $X$ and $Y$, $\mathrm{Cov}(X, Y)$:

$$\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]\, E[Y].$$

When $X$ and $Y$ are independent, $\mathrm{Cov}(X, Y) = 0$.

Covariance measures how two random variables change in relation to each other, and so is related to correlation. The correlation coefficient is defined as:

$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\, \mathrm{Var}(Y)}} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}, \qquad -1 \le \rho_{X,Y} \le 1.$$

When $X$ and $Y$ are independent, $\rho_{X,Y} = 0$ (the converse does not hold in general).
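A minimal sketch of estimating covariance and correlation from simulated data; the linear relationship $Y = 2X + \text{noise}$ is an assumed example chosen so that the true values are $\mathrm{Cov}(X, Y) = 2$ and $\rho_{X,Y} = 2/\sqrt{5} \approx 0.894$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 2.0 * x + rng.normal(size=10_000)   # Y depends on X, so they are correlated

cov_xy = np.cov(x, y)[0, 1]             # sample covariance
rho    = np.corrcoef(x, y)[0, 1]        # sample correlation coefficient
print(cov_xy, rho)                      # roughly 2.0 and 0.894
```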
2.2 Multivariate Normal Distribution
A random vector $\mathbf{X} = (X_1, \ldots, X_n)^T$ with means $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_n)^T$ is multivariate normal if it has joint pdf:

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} \lvert \Sigma \rvert^{1/2}} \exp\!\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right),$$

where $\Sigma$ is the positive definite covariance matrix of $\mathbf{X}$:

$$\Sigma_{ij} = \mathrm{Cov}(X_i, X_j) = E[(X_i - \mu_i)(X_j - \mu_j)].$$

Note that $X_1, \ldots, X_n$ need not be independent.
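A minimal sketch of sampling from a bivariate normal with an assumed mean vector and covariance matrix, then checking the empirical mean and covariance against the parameters:

```python
import numpy as np

mu    = np.array([1.0, -2.0])          # assumed mean vector
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])         # assumed positive definite covariance matrix

rng = np.random.default_rng(1)
samples = rng.multivariate_normal(mu, Sigma, size=50_000)

print(samples.mean(axis=0))            # close to mu
print(np.cov(samples, rowvar=False))   # close to Sigma
```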
3. Conditional Distributions
We can extend the concept of conditional probability to a conditional probability mass function:

$$p_{X \mid Y}(x \mid y) = P(X = x \mid Y = y) = \frac{p_{X,Y}(x, y)}{p_Y(y)}.$$

This is valid for any $y$ with $p_Y(y) > 0$. Bayes' theorem may now be recast as:

$$p_{X \mid Y}(x \mid y) = \frac{p_{Y \mid X}(y \mid x)\, p_X(x)}{p_Y(y)} = \frac{p_{Y \mid X}(y \mid x)\, p_X(x)}{\sum_{x'} p_{Y \mid X}(y \mid x')\, p_X(x')}.$$
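A minimal sketch of computing a conditional pmf and checking the recast Bayes' theorem on an assumed joint table (the same kind of illustrative numbers used earlier):

```python
import numpy as np

# Assumed joint pmf p(x, y): rows are values of X, columns are values of Y.
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

# Conditional pmf p_{X|Y}(x | y): divide each column of the joint table by p_Y(y).
p_x_given_y = p_xy / p_y
print(p_x_given_y[:, 0])             # conditional pmf of X given the first y value

# Bayes: p_{X|Y}(x|y) = p_{Y|X}(y|x) p_X(x) / p_Y(y) recovers the same table.
p_y_given_x = p_xy / p_x[:, None]
print(np.allclose(p_y_given_x * p_x[:, None] / p_y, p_x_given_y))   # True
```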
3.1 Conditional PDFs
For the continuous case, we have the conditional probability density function, defined for any $y$ with $f_Y(y) > 0$:

$$f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}.$$

Now, $X$ and $Y$ are independent iff $f_{X \mid Y}(x \mid y) = f_X(x)$ for all $x$ and $y$.

We can also rewrite Bayes' theorem as:

$$f_{X \mid Y}(x \mid y) = \frac{f_{Y \mid X}(y \mid x)\, f_X(x)}{f_Y(y)} = \frac{f_{Y \mid X}(y \mid x)\, f_X(x)}{\int_{-\infty}^{\infty} f_{Y \mid X}(y \mid x')\, f_X(x')\, dx'}.$$
3.2 Conditional CDFs
For drvs, $F_{X \mid Y}(x \mid y) = P(X \le x \mid Y = y) = \sum_{x' \le x} p_{X \mid Y}(x' \mid y)$.

For crvs, $F_{X \mid Y}(x \mid y) = \int_{-\infty}^{x} f_{X \mid Y}(x' \mid y)\, dx'$.

Conditional interval probabilities follow from the conditional cdf:

$$P(a < X \le b \mid Y = y) = F_{X \mid Y}(b \mid y) - F_{X \mid Y}(a \mid y).$$
3.3 Law of Total Probability
For drvs, $p_X(x) = \sum_{y} p_{X \mid Y}(x \mid y)\, p_Y(y)$.

For crvs, $f_X(x) = \int_{-\infty}^{\infty} f_{X \mid Y}(x \mid y)\, f_Y(y)\, dy$.

Equivalently, $P(X \in A) = \sum_{y} P(X \in A \mid Y = y)\, p_Y(y)$ in the discrete case, with the corresponding integral in the continuous case.
3.4 Conditional Expectation
For a drv, the conditional expectation of $X$ given $Y = y$ is:

$$E[X \mid Y = y] = \sum_{x} x\, p_{X \mid Y}(x \mid y).$$

For a crv, the conditional expectation of $X$ given $Y = y$ is:

$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x\, f_{X \mid Y}(x \mid y)\, dx.$$

In either case, the expectation is a function of $y$ but not of $x$.
3.5 Law of Total Expectation
We can define a random variable $E[X \mid Y]$, which is a function of $Y$, by $E[X \mid Y] = g(Y)$ where $g(y) = E[X \mid Y = y]$. Then:

$$E\big[ E[X \mid Y] \big] = E[X].$$
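A minimal sketch verifying the law of total expectation on an assumed discrete joint table: compute $g(y) = E[X \mid Y = y]$ for each $y$, then average it against $p_Y$:

```python
import numpy as np

x_vals = np.array([0.0, 1.0])
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])        # assumed joint pmf, rows = x, cols = y

p_y = p_xy.sum(axis=0)
p_x_given_y = p_xy / p_y

# g(y) = E[X | Y = y], one value per column of the table.
E_X_given_y = (x_vals[:, None] * p_x_given_y).sum(axis=0)

print((E_X_given_y * p_y).sum())             # E[E[X | Y]] = 0.6
print((x_vals[:, None] * p_xy).sum())        # E[X]        = 0.6
```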
4. Markov Chain
We have already learned that a coin tossing sequence is a realization of independent discrete random variables $X_0, X_1, X_2, \ldots$, each taking values in $\{\text{heads}, \text{tails}\}$.
Discrete Time Markov Chains (DTMCs) generalize this to support arbitrary and dependent random variables:

$S$ is the state space, and its elements $s \in S$ are the states.

$\{X_n : n = 0, 1, 2, \ldots\}$ is the chain, where $X_n$ takes values in $S$ and models the state at time $n$.

A realization of $X_0, X_1, X_2, \ldots$ is called a sample path.

The goal is to calculate $P(X_n = j)$, the probability that at time $n$ the system reaches state $j$.
DTMCs assume that the Markov Property holds:

$$P(X_{n+1} = j \mid X_n = i,\ X_{n-1} = i_{n-1},\ \ldots,\ X_0 = i_0) = P(X_{n+1} = j \mid X_n = i),$$

i.e. the choice of the next state depends only on the current state, not on the past.
Leveraging the Markov property, a DTMC specification requires:
An initial probability vector $\pi^{(0)}$, where $\pi^{(0)}_i = P(X_0 = i)$.

A transition probability matrix $P$, where $P_{ij} = P(X_{n+1} = j \mid X_n = i)$.
This gives rise to the following properties:

Each transition probability $P_{ij}$ is independent of the time $n$ (the chain is time-homogeneous).

Self loops are allowed; for instance, $P_{ii} = 1$ means that the DTMC can never leave state $i$.

$P$ is a non-negative matrix with rows that sum to $1$ (a stochastic matrix).
In general, we can perform transient analysis to compute the distribution $\pi^{(n)}$, where $\pi^{(n)}_j = P(X_n = j)$. By the law of total probability, we have that:

$$P(X_{n+1} = j) = \sum_{i} P(X_{n+1} = j \mid X_n = i)\, P(X_n = i), \qquad \text{i.e.} \quad \pi^{(n+1)} = \pi^{(n)} P,$$

and therefore $\pi^{(n)} = \pi^{(0)} P^n$.
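A minimal sketch of transient analysis for an assumed two-state chain, applying $\pi^{(n+1)} = \pi^{(n)} P$ step by step and checking against the matrix power $\pi^{(0)} P^n$:

```python
import numpy as np

# Assumed two-state DTMC: transition matrix P and initial distribution pi0.
P   = np.array([[0.9, 0.1],
                [0.5, 0.5]])
pi0 = np.array([1.0, 0.0])          # start in state 0 with probability 1

pi_n = pi0.copy()
for n in range(1, 6):
    pi_n = pi_n @ P                 # pi^(n) = pi^(n-1) P
    print(n, pi_n)

# Same result in one shot via the matrix power pi^(0) P^n.
print(pi0 @ np.linalg.matrix_power(P, 5))
```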
4.1 Long Term Behaviour
We may want to look at situations where the DTMC stabilizes. Two characterizations are the most common:
Limiting Distribution is a vector $\pi$ s.t. $\pi = \lim_{n \to \infty} \pi^{(0)} P^n$, i.e. $\pi_j = \lim_{n \to \infty} P(X_n = j)$.

Steady State Distribution is a vector $\pi$ that is invariant under the transition matrix, i.e. $\pi = \pi P$ (with $\sum_i \pi_i = 1$).
A limiting distribution, when it exists, is always a steady state distribution, but the converse is not true (i.e. a steady state distribution existing does not mean a limiting distribution exists).
Limiting and steady state distributions may not be unique.
4.2 Classifying DTMCs
A DTMC is irreducible if the directed graph associated to $P$ is strongly connected. This means that for any pair of states $(i, j)$, there exists some sample path such that, starting from state $i$, the DTMC eventually reaches state $j$.
A DTMC is periodic if its states can only be revisited at integer multiples of a fixed period $d > 1$. Otherwise, it is aperiodic.
If a DTMC is irreducible and aperiodic, then:

Limiting and steady state distributions both exist. They are unique and identical (i.e. the same vector $\pi$).

The elements of $\pi$ are strictly positive.

$\pi$ is the solution of $\pi = \pi P$ subject to $\sum_i \pi_i = 1$.

Without aperiodicity, an irreducible DTMC no longer has a valid limiting distribution. However, a steady state distribution exists and is the unique solution to $\pi = \pi P$ subject to $\sum_i \pi_i = 1$.
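To close, a minimal sketch of solving $\pi = \pi P$ subject to $\sum_i \pi_i = 1$ for the same assumed two-state chain, by stacking the normalization constraint onto the linear system $\pi (P - I) = 0$ and solving in the least-squares sense:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])          # assumed irreducible transition matrix

n = P.shape[0]
# pi (P - I) = 0 together with sum(pi) = 1, written as A @ pi = b.
A = np.vstack([(P - np.eye(n)).T, np.ones(n)])
b = np.zeros(n + 1)
b[-1] = 1.0

pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)         # [5/6, 1/6] for this P
print(pi @ P)     # equals pi, i.e. invariant under the transition matrix
```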