Stochastic Process

Definitions

  • Probability theory is the branch of mathematics that deals with uncertainty. Its core concept is the random variable, and it provides the rules for combining, manipulating, and analyzing random variables.

  • Stochastic process (or random process) is a collection of random variables that are indexed by some set, typically representing time or space. The set used to index the random variables is called the index set.

  • Each random variable in the collection represents the state of the process at a particular time or location, and all of them take values in the same mathematical space, known as the state space. This state space can be, for example, the integers, the real line, or n-dimensional Euclidean space.

  • An increment is the amount that a stochastic process changes between two index values, often interpreted as two points in time.

  • A single outcome of a stochastic process is called a sample path/function or realization.

Classifications and Properties

  • discrete-time: the index set has a finite or countable number of elements (e.g., the set of integers, or the natural numbers)

    • In mathematics, the number of elements in a countable set could be infinite, as long as it can have a one-to-one correspondence with (or an injective map into) the set of natural numbers. A countable set that is not finite is said to be countably infinite.

  • continuous-time: the index set has an uncountable number of elements (e.g., the index set is some interval of the real line)

  • A stochastic process is said to have stationary increments if its change only depends on the time span of observation, but not on the time when the observation was started.

  • A stochastic process is said to have independent increments if the increments in disjoint intervals are independent random variables.

  • A stochastic process is said to have Markov property if the future of the process given the present is independent of the past.

Common Stochastic Processes

Random Walk

  • Let's begin with a few important discrete-time random processes before introducing random walks.

  • iid Random Process: a discrete-time random process Xn consisting of a sequence of independent, identically distributed (iid) random variables with common cdf F_X(x), mean m, and variance σ².

  • Bernoulli Random Process: an iid random process taking on values from the set {0, 1}.

  • Sum Process: obtained as the sum of a sequence of iid random variables. The sum process has independent increments, stationary increments, and satisfies Markov property.

  • Binomial Counting Process: the sum process of a Bernoulli random process; Sn, the sum of the first n Bernoulli random variables, counts the number of successes in n trials and has a binomial distribution.

  • Random Walk: also a special case of the sum process.

    • A Random Walk is a random process that describes a path consisting of a succession of random steps on some mathematical space. Each step is a random variable, which can be drawn from any distribution, such as the Gaussian or Poisson distribution.

    • The Binomial Counting Process models the number of successes in a given number of trials, while a Random Walk models the position after a given number of steps. Mapping each Bernoulli step Xi to 2Xi − 1 turns a p = 0.5 counting process into the simple symmetric random walk.

    • An elementary example of a random walk: starts at 0 on the integer number line, and at each step moves +1 or −1 with equal probability.

  • Lévy process: a stochastic process with independent, stationary increments.

    • It also satisfies Markov property.

    • It can be viewed as the continuous-time analog of a random walk.

    • The two best-known examples of Lévy processes are the Wiener Process and the Poisson Process.
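
The elementary ±1 walk described above can be simulated in a few lines. This is a minimal sketch; the function `simple_random_walk` and its parameters are illustrative, not from the references:

```python
import random

def simple_random_walk(n_steps, seed=0):
    """Simulate a simple symmetric random walk starting at 0:
    each step is +1 or -1 with equal probability."""
    rng = random.Random(seed)
    position = 0
    path = [position]
    for _ in range(n_steps):
        position += rng.choice((-1, 1))  # each step is an iid random variable
        path.append(position)
    return path

path = simple_random_walk(1000)
print(path[-1])  # final position after 1000 steps
```

Note that the path is the cumulative sum of the steps, which is exactly the sum-process view of a random walk.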

Gaussian Process

  • A random process X(t) is a Gaussian process if the samples X1 = X(t1), X2 = X(t2), ..., Xk = X(tk) are jointly Gaussian random variables for all k and all choices of t1, ..., tk.

    • Equivalently, for every finite set of indices t1, ..., tk in the index set T, (X1, X2, ..., Xk) is a multivariate Gaussian random variable.

    • Equivalently, every linear combination of (X1, X2, ..., Xk) has a univariate normal distribution.

    • This definition applies to both discrete-time and continuous-time processes.

  • Gaussian process can be seen as an infinite-dimensional generalization of multivariate normal distributions.

  • Gaussian process can be completely defined by a mean function and a covariance function.

    • The mean function corresponds to the expected value of the process.

    • The covariance function (also known as the kernel function) determines the dependency between different points in the process.

    • A linear operation (e.g., a sum, derivative, or integral) on a Gaussian process results in another Gaussian random process.

  • Whether a GP is stationary or not depends on the chosen covariance function, also known as the kernel function.

    • In GP, "stationary" translates to the property that the covariance between any pair of points depends only on the distance between the points, not on their absolute location. This is also known as "shift-invariance".

  • Applications

    • Robotics: estimate continuous trajectories; create a map of magnetic fields, WiFi signals, or other environmental features; multi-sensor fusion; learning from demonstration

    • Machine Learning: regression, classification, model-based reinforcement learning

    • The key is to choose a suitable kernel function; the computational cost can be high
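
As a concrete sketch of the definitions above, the snippet below draws sample paths from a zero-mean GP with a squared-exponential (RBF) kernel, a common stationary (shift-invariant) choice. The function name and hyperparameters are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance: stationary, since it
    depends only on the distance |x1 - x2|, not absolute location."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 50)                    # index set (time points)
mean = np.zeros_like(t)                          # zero mean function
cov = rbf_kernel(t, t) + 1e-9 * np.eye(len(t))   # kernel matrix + jitter for numerical stability

# Any finite set of samples from a GP is jointly Gaussian:
samples = rng.multivariate_normal(mean, cov, size=3)
print(samples.shape)  # (3, 50): three sample paths at 50 index points
```

This illustrates that a GP is fully specified by its mean and covariance functions: once those are fixed, any finite-dimensional slice is just a multivariate normal.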

Wiener Process (Brownian motion)

  • The Wiener Process can be thought of as a continuous-time analog of a random walk.

  • Wiener Process can be characterized by its four properties: 1) it starts at zero, 2) has independent increments, 3) has Gaussian increments, and 4) has continuous paths.

    • It has continuous paths that are nowhere differentiable, meaning that they change in a way that appears random at every scale.

    • The Wiener Process can be viewed as the integral of a white-noise Gaussian process, and is therefore itself a Gaussian process. Hence it can be described by a mean and a covariance function.

    • "white noise" in this context implies independent and identically distributed (i.i.d.) Gaussian random variables, and typically assumes zero mean.

    • The mean function: E[X(t)] = 0; the variance function: VAR[X(t)] = αt, which increases linearly with time.

  • Wiener Process is also a Lévy process, and a Markov process.

  • Applications: to model physical and financial phenomena, such as the motion of particles in a liquid (Brownian motion) and the price of a risky asset in financial mathematics.
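
A minimal simulation sketch, assuming the four properties above: approximate the process by cumulative sums of independent Gaussian increments and check empirically that the variance grows as αt. The function name and step sizes are illustrative:

```python
import numpy as np

def wiener_process(n_steps, dt=0.01, alpha=1.0, seed=0):
    """Approximate a Wiener process on [0, n_steps*dt]: starts at 0,
    with independent Gaussian increments of variance alpha*dt."""
    rng = np.random.default_rng(seed)
    increments = rng.normal(0.0, np.sqrt(alpha * dt), size=n_steps)
    return np.concatenate(([0.0], np.cumsum(increments)))

# Empirical check: VAR[X(t)] should be close to alpha*t at the final
# time t = 100 * 0.01 = 1.0
finals = np.array([wiener_process(100, seed=s)[-1] for s in range(2000)])
print(finals.var())  # close to alpha * t = 1.0
```

As dt shrinks, this discrete construction converges to the continuous-path process; at any fixed dt it is just a Gaussian random walk.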

Poisson Process

  • Poisson Process (or Poisson Point Process) can model the situation where events occur at random instants of time at an average rate of λ events per second.

  • Let N(t) be the number of event occurrences in the time interval [0, t]. N(t) is then a nondecreasing, integer-valued, continuous-time random process as shown in the figure.

  • Key properties: 1) the number of occurrences in each finite interval has a Poisson distribution; 2) the number of occurrences in disjoint intervals are independent random variables.

    • A Poisson process is fully characterized by its rate parameter λ.

    • The mean function: m(t) = E[N(t)] = λt; the variance function: VAR[N(t)] = λt.

    • The mean number of events in a given interval of length t is λt, and the variance of the number of events in a given interval of length t is also λt.

    • It is interesting to see that the Wiener process and the Poisson process have the same covariance function, despite the fact that the two processes have very different sample functions. This underscores the fact that the mean and auto-covariance functions are only partial descriptions of a random process.

  • Poisson Process is also a Lévy process, and a Markov process.

  • Applications: radioactive decay, telephone call arrivals, insurance mathematics, etc.
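
The properties above can be checked empirically. The sketch below simulates N(t) by summing exponential inter-arrival times (a standard construction for the Poisson process) and verifies that both the mean and the variance of N(t) come out close to λt; the function name and parameter values are illustrative:

```python
import numpy as np

def poisson_counts(rate, t, n_trials, seed=0):
    """Sample N(t), the number of events in [0, t], for a Poisson
    process with rate `rate`, using exponential inter-arrival times."""
    rng = np.random.default_rng(seed)
    counts = []
    for _ in range(n_trials):
        elapsed, n = 0.0, 0
        while True:
            elapsed += rng.exponential(1.0 / rate)  # time to next event
            if elapsed > t:
                break
            n += 1
        counts.append(n)
    return np.array(counts)

counts = poisson_counts(rate=3.0, t=2.0, n_trials=5000)
print(counts.mean(), counts.var())  # both close to λt = 6.0
```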

Markov Process (Markov Chain)

  • A Markov process (or a Markov chain) is a type of stochastic process that has the Markov property, meaning that the future state of the process depends only on the current state and not on the sequence of previous states.

  • Usually the term "Markov chain" is reserved for a process with a discrete set of times, and the term "Markov process" typically indicates a continuous-time stochastic process.

  • A Markov process can be characterized as a collection of random variables together with the probabilities of moving from one state to another.

  • A Markov process can be in either continuous-time or discrete-time, and whether it is stationary depends on its transition probabilities and initial state distribution.
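
As a small illustration of the Markov property in discrete time, the sketch below simulates a hypothetical two-state chain from its transition matrix; the states and probabilities are made up for the example:

```python
import numpy as np

# A two-state discrete-time Markov chain (hypothetical weather model):
# state 0 = "sunny", state 1 = "rainy".
P = np.array([[0.9, 0.1],    # P[i, j] = probability of moving from state i to j
              [0.5, 0.5]])

def simulate_chain(P, start, n_steps, seed=0):
    """The next state depends only on the current state (Markov property)."""
    rng = np.random.default_rng(seed)
    state, states = start, [start]
    for _ in range(n_steps):
        state = rng.choice(len(P), p=P[state])
        states.append(state)
    return states

states = simulate_chain(P, start=0, n_steps=10000)
print(states.count(0) / len(states))  # long-run fraction of "sunny"
```

For this transition matrix the stationary distribution is (5/6, 1/6), so the long-run fraction of time in state 0 should be close to 0.833.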

Markov Decision Process (MDP)

  • An MDP is a type of Markov process that incorporates actions and rewards, which are used for decision-making problems. Typically designed for discrete time steps.

    • In an MDP, an agent makes decisions by choosing actions, each action leads to a transition from one state to another, and each transition leads to a reward.

    • The goal of the agent is to find a policy (a mapping from states to actions) that maximizes the expected sum of rewards (also known as the return).

    • MDPs are a fundamental framework in reinforcement learning.

  • An MDP is defined by a tuple (S, A, P, R), where:

    • S is a finite set of states

    • A is a finite set of actions

    • P is the state transition probability matrix

    • R is a reward function
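
Given the tuple (S, A, P, R) and a discount factor, a policy maximizing the expected discounted return can be found by value iteration. The sketch below uses toy numbers that are illustrative, not from the notes:

```python
import numpy as np

# A toy MDP (S, A, P, R) with 2 states and 2 actions; all numbers
# here are made up for illustration.
n_states, n_actions = 2, 2
P = np.zeros((n_states, n_actions, n_states))  # P[s, a, s'] transition probs
P[0, 0] = [0.8, 0.2]; P[0, 1] = [0.1, 0.9]
P[1, 0] = [0.7, 0.3]; P[1, 1] = [0.4, 0.6]
R = np.array([[1.0, 0.0],                      # R[s, a] expected reward
              [0.0, 2.0]])
gamma = 0.9                                    # discount factor

# Value iteration: repeatedly back up values until they converge.
V = np.zeros(n_states)
for _ in range(500):
    Q = R + gamma * P @ V                      # Q[s, a]: action values
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)                      # greedy policy: states -> actions
print(policy, V)
```

The resulting `policy` is exactly the "mapping from states to actions that maximizes the expected sum of rewards" described above.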

Partially Observable Markov Decision Process (POMDP)

  • A POMDP is a generalization of an MDP to the case where the agent does not have complete knowledge of the state of the environment. Instead, the agent has to make decisions based on observations, which provide partial information about the state of the environment.

  • A POMDP is defined by a tuple (S, A, P, R, O, Z), where:

    • S, A, P, R are the same as for an MDP

    • O is a finite set of observations

    • Z is the observation function, which gives the probability of an observation given the state of the environment and the action taken by the agent

Markov Random Field (MRF)

  • An indexed collection of random variables is called a random field when the index has two or more dimensions.

  • A Markov random field (MRF), Markov network or undirected graphical model is a set of random variables having a Markov property described by an undirected graph.

Covariance and Correlation

  • Covariance: it is a measure to determine how much two variables are linearly related to each other. The magnitude depends on the units of the variables.

  • Correlation: it is a normalized version of covariance and ranges between -1 and 1, making it more interpretable than covariance. A correlation of 0 means the variables are uncorrelated (which does not in general imply independence), and 1 (or -1) indicates a perfect positive (or negative) linear relationship.

  • Auto-covariance: it is a special case of covariance where the two variables are the same variable at two different points in time.

  • Auto-correlation: it is a special case of correlation where the two variables are the same variable at two different points in time. It is a normalized version of auto-covariance.

  • Cross-covariance: a special case of covariance where two variables are from two time series.

  • Cross-correlation: a special case of correlation where the two variables are from two time series.
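
A quick numerical illustration of the covariance/correlation distinction (the data here are synthetic): rescaling one variable changes its covariance with another variable but leaves their correlation unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2.0 * x + rng.normal(size=1000)   # y linearly related to x, plus noise

cov = np.cov(x, y)[0, 1]              # magnitude depends on the units of x and y
corr = np.corrcoef(x, y)[0, 1]        # normalized to the range [-1, 1]
print(cov, corr)

# Scaling changes covariance but not correlation:
print(np.cov(10 * x, y)[0, 1])        # exactly 10x larger
print(np.corrcoef(10 * x, y)[0, 1])   # unchanged
```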

