Given a sequence of observed event times, how do we fit a Hawkes process? The standard approach is maximum likelihood estimation (MLE). The exponential kernel admits a log-likelihood that can be computed in O(n) time using a recursive auxiliary variable — making MLE practical even for large datasets.
From chapter 1, the general log-likelihood for a point process on [0, T] is:
ℓ(θ) = Σᵢ log λ*(tᵢ; θ) − ∫₀ᵀ λ*(t; θ) dt
For the Hawkes process with exponential kernel φ(t) = α·exp(−βt), the two terms become:
Term 1 — Sum of log intensities at event times:
Σᵢ log(μ + α · Rᵢ)
where Rᵢ = Σ_{j<i} exp(−β(tᵢ − tⱼ)) is the recursive auxiliary variable.
Term 2 — Compensator integral:
∫₀ᵀ λ*(t) dt = μT + (α/β) · Σᵢ [1 − exp(−β(T − tᵢ))]
This integral has a closed form because the kernel is exponential: the contribution of each event i to the integral from tᵢ to T is (α/β)(1 − exp(−β(T−tᵢ))).
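As a quick sanity check, the closed form can be compared against numerical quadrature of the intensity. The parameter values and event times below are arbitrary, chosen only for this illustration:

```python
import numpy as np

# Arbitrary parameters and event times, chosen only for this check
mu, alpha, beta = 0.5, 0.3, 1.0
events = np.array([0.7, 1.2, 3.5, 4.1])
T = 10.0

def intensity(t):
    # lambda*(t) = mu + sum over past events of alpha * exp(-beta (t - t_j))
    past = events[events < t]
    return mu + alpha * np.sum(np.exp(-beta * (t - past)))

# Midpoint-rule quadrature of the intensity over [0, T]
m = 100_000
h = T / m
numeric = h * sum(intensity((k + 0.5) * h) for k in range(m))

# Closed-form compensator
closed = mu * T + (alpha / beta) * np.sum(1 - np.exp(-beta * (T - events)))
print(abs(numeric - closed))  # small: limited only by quadrature error
```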
Computing Rᵢ naively costs O(n²) (sum over all previous events for each event). With the recursion:
R₁ = 0
Rᵢ = exp(−β(tᵢ − tᵢ₋₁)) · (1 + Rᵢ₋₁)
each Rᵢ is computed in O(1) from Rᵢ₋₁. Total cost: O(n).
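The recursion can be verified against the naive quadratic definition on random timestamps (an illustrative check with an arbitrary decay rate, not part of the estimator):

```python
import numpy as np

beta = 1.3  # arbitrary decay rate for the check
events = np.sort(np.random.default_rng(0).uniform(0, 100, size=500))

# Naive O(n^2) sum: R_i = sum_{j<i} exp(-beta (t_i - t_j))
R_naive = np.array([np.sum(np.exp(-beta * (t - events[events < t])))
                    for t in events])

# O(n) recursion: R_1 = 0, R_i = exp(-beta dt) * (1 + R_{i-1})
R = np.zeros(len(events))
for i in range(1, len(events)):
    R[i] = np.exp(-beta * (events[i] - events[i-1])) * (1.0 + R[i-1])

print(np.max(np.abs(R - R_naive)))  # agreement to floating-point precision
```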
import numpy as np

def hawkes_loglik(params, events, T):
    mu, alpha, beta = params
    # Reject parameters outside the stationary region (n* = alpha/beta < 1)
    if mu <= 0 or alpha <= 0 or beta <= 0 or alpha >= beta:
        return -np.inf
    n = len(events)
    # Recursive auxiliary variable: R[0] = 0, R[i] = exp(-beta*dt)*(1 + R[i-1])
    R = np.zeros(n)
    for i in range(1, n):
        dt = events[i] - events[i-1]
        R[i] = np.exp(-beta * dt) * (1.0 + R[i-1])
    # Term 1: log intensities at event times
    intensities = mu + alpha * R
    loglik = np.sum(np.log(intensities))
    # Term 2: compensator (closed form for the exponential kernel)
    loglik -= mu * T + (alpha / beta) * np.sum(1 - np.exp(-beta * (T - events)))
    return loglik
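One easy correctness check: as α → 0 the process degenerates to a homogeneous Poisson process with rate μ, whose log-likelihood is n·log μ − μT. The event times below are arbitrary, and the function is repeated so the snippet runs standalone:

```python
import numpy as np

def hawkes_loglik(params, events, T):
    # Copy of the log-likelihood above, so this check is self-contained
    mu, alpha, beta = params
    if mu <= 0 or alpha <= 0 or beta <= 0 or alpha >= beta:
        return -np.inf
    R = np.zeros(len(events))
    for i in range(1, len(events)):
        R[i] = np.exp(-beta * (events[i] - events[i-1])) * (1.0 + R[i-1])
    ll = np.sum(np.log(mu + alpha * R))
    return ll - mu * T - (alpha / beta) * np.sum(1 - np.exp(-beta * (T - events)))

# With alpha ~ 0, the Hawkes log-likelihood should match n log(mu) - mu T
events = np.array([0.5, 1.1, 2.0, 3.7, 4.2])
T, mu = 5.0, 0.8
gap = hawkes_loglik([mu, 1e-12, 1.0], events, T) - (len(events) * np.log(mu) - mu * T)
print(abs(gap))  # negligible
```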
Maximize ℓ(θ) over θ = (μ, α, β) using scipy.optimize.minimize with the L-BFGS-B method (which supports box constraints):
from scipy.optimize import minimize

def neg_loglik(params):
    return -hawkes_loglik(params, events, T)

result = minimize(
    neg_loglik,
    x0=[0.5, 0.3, 1.0],  # initial guess
    method='L-BFGS-B',
    bounds=[(1e-6, None), (1e-6, None), (1e-6, None)],
)
mu_hat, alpha_hat, beta_hat = result.x
n_star_hat = alpha_hat / beta_hat
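An alternative to box constraints is to reparametrize so that stationarity holds by construction. A sketch (the helper name `to_natural` is ours): optimize over unconstrained coordinates and map them back to (μ, α, β).

```python
import numpy as np
from scipy.special import expit  # logistic sigmoid, maps R -> (0, 1)

# Map unconstrained z = (log mu, logit n*, log beta) to natural parameters.
# Any z yields mu > 0, beta > 0, and alpha = n* * beta < beta, so the
# optimizer can never leave the stationary region. (Helper name is ours.)
def to_natural(z):
    log_mu, logit_nstar, log_beta = z
    mu, beta = np.exp(log_mu), np.exp(log_beta)
    alpha = expit(logit_nstar) * beta
    return mu, alpha, beta

mu, alpha, beta = to_natural([0.3, 2.0, -0.5])
print(0 < alpha < beta)  # True for any z
```

One then minimizes `lambda z: -hawkes_loglik(to_natural(z), events, T)` with an unconstrained method such as BFGS and maps the optimum back to (μ, α, β).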
Practical tips:
- Keep alpha < beta during optimization to enforce stationarity (n* < 1); hawkes_loglik above returns −inf when the constraint is violated.
- The constraint alpha < beta can also be imposed via a reparametrization: alpha = beta · n* with n* ∈ (0, 1).
- Approximate standard errors come from the observed Fisher information (negative Hessian of the log-likelihood at the MLE):
I_obs = −∂²ℓ/∂θ² |_{θ̂}
Cov(θ̂) ≈ I_obs⁻¹
SE(θ̂ⱼ) = sqrt(I_obs⁻¹[j,j])
scipy.optimize.minimize exposes an inverse-Hessian approximation via result.hess_inv; with L-BFGS-B this is a LbfgsInvHessProduct (call .todense() for a matrix), and the limited-memory approximation can be too crude for standard errors. A more reliable option is numerical second differences:
# Forward-difference Hessian of the negative log-likelihood at the MLE
eps = 1e-5
eye = np.eye(3)
f0 = neg_loglik(result.x)
hess = np.array([[(neg_loglik(result.x + eps * (eye[i] + eye[j]))
                   - neg_loglik(result.x + eps * eye[i])
                   - neg_loglik(result.x + eps * eye[j]) + f0) / eps**2
                  for j in range(3)] for i in range(3)])
cov = np.linalg.inv(hess)
se = np.sqrt(np.diag(cov))
The branching ratio n* = α/β is often the quantity of greatest interest. A profile likelihood for n* fixes n* at a grid of values and maximizes over (μ, α) (or equivalently (μ, β)) at each grid point:
ℓ_profile(n*) = max_{μ,β} ℓ(μ, n*·β, β)
A 95% confidence interval is {n* : 2(ℓ_max − ℓ_profile(n*)) ≤ χ²(0.95, df=1) = 3.84}.
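A sketch of this computation, with hawkes_loglik repeated so the snippet runs standalone, and illustrative event times drawn from an arbitrary seed (not a calibrated example):

```python
import numpy as np
from scipy.optimize import minimize

def hawkes_loglik(params, events, T):
    # Copy of the log-likelihood defined earlier in this chapter
    mu, alpha, beta = params
    if mu <= 0 or alpha <= 0 or beta <= 0 or alpha >= beta:
        return -np.inf
    R = np.zeros(len(events))
    for i in range(1, len(events)):
        R[i] = np.exp(-beta * (events[i] - events[i-1])) * (1.0 + R[i-1])
    ll = np.sum(np.log(mu + alpha * R))
    return ll - mu * T - (alpha / beta) * np.sum(1 - np.exp(-beta * (T - events)))

def profile_loglik(nstar, events, T):
    # Fix n*, maximize over (mu, beta); alpha = n* * beta stays below beta
    obj = lambda p: -hawkes_loglik([p[0], nstar * p[1], p[1]], events, T)
    res = minimize(obj, x0=[0.5, 1.0], method='L-BFGS-B',
                   bounds=[(1e-6, None), (1e-6, None)])
    return -res.fun

# Illustrative data: arbitrary event times, for demonstration only
events = np.sort(np.random.default_rng(1).uniform(0, 100, 80))
T = 100.0
grid = np.linspace(0.05, 0.95, 19)
prof = np.array([profile_loglik(ns, events, T) for ns in grid])
ci = grid[2 * (prof.max() - prof) <= 3.84]  # 95% profile CI for n*
```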
A simulation study validates the estimator: simulate at known parameters, estimate, and check that the estimates are unbiased and the confidence intervals have correct coverage.
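For the simulation step, Ogata's thinning algorithm is the standard choice with the exponential kernel: propose candidate points at a dominating rate and accept each with probability λ*(t)/λ̄. A minimal sketch (the function name is ours; O(n²) per path for clarity, fine for moderate n):

```python
import numpy as np

def simulate_hawkes(mu, alpha, beta, T, seed=0):
    # Ogata thinning for the exponential-kernel Hawkes process.
    assert alpha < beta, "stationarity (n* < 1) required"
    rng = np.random.default_rng(seed)
    events = []

    def lam(t):  # conditional intensity given events accepted so far
        dt = t - np.asarray(events)
        return mu + alpha * np.sum(np.exp(-beta * dt[dt >= 0]))

    t = 0.0
    while True:
        # lambda is non-increasing between events, so lam(t) just after
        # the current time dominates it until the next accepted event
        lam_bar = lam(t)
        t += rng.exponential(1.0 / lam_bar)
        if t > T:
            break
        if rng.uniform() <= lam(t) / lam_bar:
            events.append(t)
    return np.array(events)

# Expected count is roughly mu * T / (1 - alpha/beta)
path = simulate_hawkes(0.5, 0.4, 1.0, T=200.0, seed=42)
```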
Key finding: when n* is close to 1, the MLE can be poorly identified. The log-likelihood surface becomes nearly flat along the n* direction, and estimates can have large variance. Short observation windows T exacerbate this. As a rule of thumb, aim for n · (1 − n*) ≫ 1, where n is the number of observed events, for reliable estimation.
See code/07_hawkes_estimation.py for the full implementation including a simulation study of bias and variance across n* values.
Summary:
- The exponential kernel admits O(n) evaluation of the log-likelihood via the recursive variable Rᵢ.
- Profile likelihood gives confidence intervals for the branching ratio n*.
- Estimation degrades as n* → 1: longer observation windows and more events are needed.

| ← Chapter 6 | Table of Contents | Chapter 8: Multivariate Hawkes Processes → |