Hawkes processes achieve burstiness through self-excitation: events cause events. But there is another mechanism: the arrival rate itself may be randomly varying, driven by an unobserved environment. A Cox process (also called a doubly stochastic Poisson process) captures exactly this — the intensity is a random process, and conditional on it, events are Poisson.
A Cox process directed by the random intensity process {Λ(t)}_{t≥0} is defined as:
Conditional on the realization {Λ(t) = λ(t)}, the event process N is an inhomogeneous Poisson process with intensity λ(t).
In other words, there are two levels of randomness:
1. Λ(t) is drawn from some distribution over functions.
2. Conditional on Λ(t) = λ(t), events occur as a standard NHPP.

A key property of Cox processes is overdispersion: the variance of event counts exceeds the mean.
For any interval (a, b]:
E[N(a,b]] = E[Λ(a,b]]
Var[N(a,b]] = E[Λ(a,b]] + Var[Λ(a,b]]
The first term is the Poisson variance (equidispersion); the second term is additional variance from the randomness of the intensity. Thus Var > Mean whenever the intensity is genuinely random, i.e., whenever Var[Λ(a,b]] > 0.
Fano factor: F = 1 + Var[Λ(a,b]] / E[Λ(a,b]] > 1.
This is the diagnostic for Cox processes: observe F > 1 in the data, and the Poisson model is inadequate. Note that Hawkes processes also produce overdispersion (via clustering), so F > 1 alone does not distinguish Cox from Hawkes — the ACF structure does.
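The overdispersion diagnostic is easy to check numerically. The following sketch (parameter choices are illustrative, not from the text) compares count windows from a homogeneous Poisson process against a simple Cox-type model in which the rate itself is Gamma-distributed per window, i.e., a mixed Poisson:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000  # number of observation windows

# Homogeneous Poisson counts: fixed rate of 5 events per window
poisson_counts = rng.poisson(5.0, size=n)

# Simple Cox counts: the rate is random, drawn fresh per window.
# Gamma(shape=2, scale=2.5) has mean 5 and variance 12.5.
rates = rng.gamma(shape=2.0, scale=2.5, size=n)
cox_counts = rng.poisson(rates)

def fano(x):
    """Fano factor: variance-to-mean ratio of counts."""
    return x.var() / x.mean()

print(fano(poisson_counts))  # close to 1 (equidispersed)
print(fano(cox_counts))      # close to 1 + Var[rate]/E[rate] = 1 + 12.5/5 = 3.5
```

The Cox counts reproduce the formula F = 1 + Var[Λ]/E[Λ] from above, while the Poisson counts sit at F ≈ 1.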
The most popular Cox process in applications is the Log-Gaussian Cox Process (LGCP), where:
log Λ(t) = G(t)
and G(t) is a Gaussian process with mean function m(t) and covariance kernel k(t, s).
Why log-Gaussian? The log transformation ensures Λ(t) > 0 for all t. The Gaussian process provides a flexible model for smooth random variation.
Common kernels:
- Squared exponential: k(t,s) = σ²·exp(−(t−s)²/(2ℓ²)) — smooth, infinitely differentiable
- Matérn 3/2: k(t,s) = σ²·(1+√3|t−s|/ℓ)·exp(−√3|t−s|/ℓ) — rougher, once differentiable
- Periodic: k(t,s) = σ²·exp(−2sin²(π|t−s|/p)/ℓ²) — for periodic intensity functions

Simulation is a two-step procedure:
```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def simulate_lgcp(T, n_grid=500, sigma=1.0, length_scale=5.0, mu_log=1.0):
    # Step 1: sample a GP path on a fine grid
    t_grid = np.linspace(0, T, n_grid).reshape(-1, 1)
    kernel = ConstantKernel(sigma**2) * RBF(length_scale)
    gp = GaussianProcessRegressor(kernel=kernel)
    log_lambda = gp.sample_y(t_grid, n_samples=1).flatten() + mu_log

    # Step 2: simulate NHPP via thinning
    lambda_grid = np.exp(log_lambda)
    lambda_bar = lambda_grid.max() * 1.05  # safety margin over the grid maximum
    dt = T / (n_grid - 1)                  # spacing of the linspace grid
    events = []
    t = 0.0
    while t < T:
        t += np.random.exponential(1.0 / lambda_bar)
        if t >= T:
            break
        idx = min(int(t / dt), n_grid - 1)
        if np.random.uniform() < lambda_grid[idx] / lambda_bar:
            events.append(t)
    return np.array(events), t_grid.flatten(), lambda_grid
```
Step 1 samples a random intensity path; Step 2 simulates events from that path. See code/09_cox_processes.py for the full implementation.
An alternative to the LGCP is the shot noise Cox process, where the intensity is driven by a superposition of decaying pulses:
Λ(t) = μ + Σᵢ h(t − sᵢ)
where {sᵢ} are the events of a Poisson process (the “shocks”), and h(t) is a deterministic response function (e.g., h(t) = a · exp(−b·t) · 1{t>0}).
This resembles the Hawkes process but with a key difference: the shocks {sᵢ} are not the observed events N. The shocks are latent (unobserved). The Cox process models an unobserved random environment; the Hawkes process models observed self-excitation.
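A shot-noise intensity path is straightforward to evaluate directly from its definition. The sketch below (function name and parameter values are illustrative; the shock rate ν is an assumption not fixed in the text) draws latent shocks from a homogeneous Poisson process and superposes exponential pulses h(t) = a·exp(−b·t)·1{t>0}; events could then be simulated by thinning exactly as in the LGCP code above:

```python
import numpy as np

def shot_noise_intensity(t_grid, T, nu=0.5, mu=0.2, a=2.0, b=1.0, rng=None):
    """Evaluate Lambda(t) = mu + sum_i h(t - s_i) on a grid, where the latent
    shocks {s_i} come from a homogeneous Poisson process of rate nu on [0, T]
    and h(t) = a * exp(-b * t) for t > 0."""
    rng = rng or np.random.default_rng()
    n_shocks = rng.poisson(nu * T)                 # Poisson number of shocks
    shocks = np.sort(rng.uniform(0.0, T, n_shocks))
    lam = np.full_like(t_grid, mu, dtype=float)    # baseline level
    for s in shocks:
        mask = t_grid > s
        lam[mask] += a * np.exp(-b * (t_grid[mask] - s))
    return lam, shocks

t_grid = np.linspace(0, 100, 2000)
lam, shocks = shot_noise_intensity(t_grid, T=100.0, rng=np.random.default_rng(1))
# Long-run mean intensity is mu + nu * (a / b): shock rate times pulse area
print(lam.mean())
```

Because each pulse is nonnegative, Λ(t) ≥ μ everywhere, so no log transform is needed to keep the intensity positive.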
Both Cox processes and Hawkes processes can produce overdispersion (F > 1) and clustering. The key difference lies in the autocorrelation function (ACF) of the counting process:
- Cox (LGCP): the ACF of binned counts N(t, t+h] at lag Δ reflects the GP kernel k(t, t+Δ) — typically smooth and slowly decaying.
- Hawkes (exponential kernel): the ACF decays like exp(−β(1−n*)·Δ) — governed by the excitation kernel, often faster-decaying and with a specific shape.

Non-parametric estimation of the second-order structure (Bartlett spectrum or pair correlation function) can distinguish the two.
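Computing the empirical ACF of binned counts is a useful first diagnostic before fitting either model. A minimal sketch (the helper name and bin settings are illustrative, not from the text), sanity-checked on a homogeneous Poisson process where the ACF should vanish at positive lags:

```python
import numpy as np

def count_acf(events, T, bin_width=1.0, max_lag=20):
    """Empirical autocorrelation of binned event counts at lags 0..max_lag."""
    n_bins = int(T / bin_width)
    counts, _ = np.histogram(events, bins=n_bins, range=(0.0, T))
    c = counts - counts.mean()                 # center the count series
    denom = np.sum(c * c)                      # lag-0 normalization
    return np.array([np.sum(c[:len(c) - k] * c[k:]) / denom
                     for k in range(max_lag + 1)])

# Homogeneous Poisson process, rate 5 on [0, 1000]
rng = np.random.default_rng(2)
events = np.sort(rng.uniform(0.0, 1000.0, rng.poisson(5 * 1000)))
acf = count_acf(events, T=1000.0)
print(acf[0])   # 1.0 by construction; positive lags fluctuate near 0
```

Applied to Cox or Hawkes data, the same function would reveal the slow kernel-shaped decay versus the faster exponential decay described above.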
- A Cox process has a random intensity Λ(t); conditional on it, events are Poisson.
- The variance decomposition Var[N(A)] = E[Λ(A)] + Var[Λ(A)] explains overdispersion.
- Both Cox and Hawkes processes give F > 1, but the ACF structure distinguishes them.

| ← Chapter 8 | Table of Contents | Chapter 10: Marked Point Processes → |