Chapter 7: Estimating Hawkes Processes

Given a sequence of observed event times, how do we fit a Hawkes process? The standard approach is maximum likelihood estimation (MLE). The exponential kernel admits a log-likelihood that can be computed in O(n) time using a recursive auxiliary variable — making MLE practical even for large datasets.


7.1 The Log-Likelihood

From Chapter 1, the general log-likelihood for a point process on [0, T] is:

ℓ(θ) = Σᵢ log λ*(tᵢ; θ) − ∫₀ᵀ λ*(t; θ) dt

For the Hawkes process with exponential kernel φ(t) = α·exp(−βt), the two terms become:

Term 1 — Sum of log intensities at event times:

Σᵢ log(μ + α · Rᵢ)

where Rᵢ = Σ_{j<i} exp(−β(tᵢ − tⱼ)) is the recursive auxiliary variable.

Term 2 — Compensator integral:

∫₀ᵀ λ*(t) dt = μT + (α/β) · Σᵢ [1 − exp(−β(T − tᵢ))]

This integral has a closed form because the kernel is exponential: the contribution of each event i to the integral from tᵢ to T is (α/β)(1 − exp(−β(T−tᵢ))).
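The closed form can be checked against direct numerical quadrature of the intensity. The sketch below uses toy parameters and event times chosen purely for illustration:

```python
import numpy as np
from scipy.integrate import quad

mu, alpha, beta = 0.5, 0.4, 1.2          # toy parameters (illustrative)
events = np.array([0.3, 1.1, 2.0, 4.5])  # toy event times
T = 6.0

def intensity(t):
    # lambda*(t) = mu + alpha * sum over past events of exp(-beta * (t - t_j))
    past = events[events < t]
    return mu + alpha * np.sum(np.exp(-beta * (t - past)))

closed = mu * T + (alpha / beta) * np.sum(1 - np.exp(-beta * (T - events)))
numeric, _ = quad(intensity, 0, T, points=events.tolist(), limit=200)
print(abs(closed - numeric))  # agrees to quadrature tolerance
```

Passing the event times as `points` tells the quadrature routine where the intensity jumps, which keeps the numerical integral accurate.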


7.2 Recursive Computation in O(n)

Computing Rᵢ naively costs O(n²) (sum over all previous events for each event). With the recursion:

R₁ = 0
Rᵢ = exp(−β(tᵢ − tᵢ₋₁)) · (1 + Rᵢ₋₁)

each Rᵢ is computed in O(1) from Rᵢ₋₁. Total cost: O(n).
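The recursion can be verified against the naive O(n²) sum on synthetic event times (the uniform toy data below is only for the check):

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 1.5
events = np.sort(rng.uniform(0, 10, size=200))  # sorted toy event times

# Naive O(n^2): sum exp(-beta * (t_i - t_j)) over all previous events j < i
R_naive = np.array([np.sum(np.exp(-beta * (t - events[events < t])))
                    for t in events])

# Recursive O(n): each R_i from R_{i-1} in constant time
n = len(events)
R = np.zeros(n)
for i in range(1, n):
    R[i] = np.exp(-beta * (events[i] - events[i-1])) * (1.0 + R[i-1])

print(np.max(np.abs(R - R_naive)))  # should be at floating-point noise level
```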

import numpy as np

def hawkes_loglik(params, events, T):
    mu, alpha, beta = params
    # reject infeasible parameters; alpha >= beta would make the process non-stationary
    if mu <= 0 or alpha <= 0 or beta <= 0 or alpha >= beta:
        return -np.inf

    events = np.asarray(events)
    n = len(events)
    R = np.zeros(n)
    for i in range(1, n):
        dt = events[i] - events[i-1]
        R[i] = np.exp(-beta * dt) * (1.0 + R[i-1])

    # Term 1: sum of log intensities at the event times
    intensities = mu + alpha * R
    loglik = np.sum(np.log(intensities))

    # Term 2: compensator integral (closed form for the exponential kernel)
    loglik -= mu * T + (alpha / beta) * np.sum(1 - np.exp(-beta * (T - events)))
    return loglik

7.3 Maximum Likelihood Estimation

Maximize ℓ(θ) over θ = (μ, α, β) using scipy.optimize.minimize with the L-BFGS-B method (which supports box constraints):

from scipy.optimize import minimize

def neg_loglik(params):
    return -hawkes_loglik(params, events, T)

result = minimize(
    neg_loglik,
    x0=[0.5, 0.3, 1.0],            # initial guess
    method='L-BFGS-B',
    bounds=[(1e-6, None), (1e-6, None), (1e-6, None)]
)
mu_hat, alpha_hat, beta_hat = result.x
n_star_hat = alpha_hat / beta_hat

Practical tips:

- Try several starting points and keep the fit with the highest log-likelihood; the surface can be multimodal.
- Check result.success and that no estimate is pinned at a bound before trusting the result.
- The box bounds above do not enforce the stationarity constraint α < β; the likelihood function returns −∞ there, which L-BFGS-B handles poorly. Reparameterizing to optimize over (μ, n*, β) with n* ∈ (0, 1) is often more robust.
- Sanity-check that the fitted branching ratio n̂* = α̂/β̂ is comfortably below 1.


7.4 Standard Errors

Approximate standard errors come from the observed Fisher information (negative Hessian of the log-likelihood at the MLE):

I_obs = −∂²ℓ/∂θ² |_{θ̂}
Cov(θ̂) ≈ I_obs⁻¹
SE(θ̂ⱼ) = sqrt(I_obs⁻¹[j,j])

With L-BFGS-B, scipy.optimize.minimize returns result.hess_inv as a LbfgsInvHessProduct (a low-rank approximation); call result.hess_inv.todense() to obtain a matrix, but treat standard errors derived from it as rough. A direct finite-difference Hessian is more reliable:

eps = 1e-5
E = np.eye(3)
# forward-difference approximation to the Hessian of the negative log-likelihood
hess = np.array([
    [(neg_loglik(result.x + eps * E[i] + eps * E[j])
      - neg_loglik(result.x + eps * E[i])
      - neg_loglik(result.x + eps * E[j])
      + neg_loglik(result.x)) / eps**2
     for j in range(3)]
    for i in range(3)])
cov = np.linalg.inv(hess)          # approximate covariance of (mu, alpha, beta)
se = np.sqrt(np.diag(cov))         # standard errors

7.5 Profile Likelihood for the Branching Ratio

The branching ratio n* = α/β is often the quantity of greatest interest. A profile likelihood for n* fixes n* at a grid of values and maximizes over the remaining free parameters (μ, β) (equivalently (μ, α), since α = n*·β) at each grid point:

ℓ_profile(n*) = max_{μ,β} ℓ(μ, n*·β, β)

A 95% confidence interval is {n* : 2(ℓ_max − ℓ_profile(n*)) ≤ χ²(0.95, df=1) = 3.84}.
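A minimal sketch of the grid computation follows; profile_loglik is a hypothetical helper, and the uniform toy events are illustrative only (the likelihood from 7.2 is restated so the snippet is self-contained):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def hawkes_loglik(params, events, T):
    # O(n) log-likelihood, as in Section 7.2
    mu, alpha, beta = params
    if mu <= 0 or alpha <= 0 or beta <= 0:
        return -np.inf
    n = len(events)
    R = np.zeros(n)
    for i in range(1, n):
        R[i] = np.exp(-beta * (events[i] - events[i-1])) * (1.0 + R[i-1])
    loglik = np.sum(np.log(mu + alpha * R))
    loglik -= mu * T + (alpha / beta) * np.sum(1 - np.exp(-beta * (T - events)))
    return loglik

def profile_loglik(n_star, events, T):
    # maximize over (mu, beta) with alpha fixed at n_star * beta
    obj = lambda p: -hawkes_loglik([p[0], n_star * p[1], p[1]], events, T)
    res = minimize(obj, x0=[0.5, 1.0], method='L-BFGS-B',
                   bounds=[(1e-6, None), (1e-6, None)])
    return -res.fun

# toy event times for illustration
events = np.sort(np.random.default_rng(1).uniform(0, 50, size=100))
T = 50.0
grid = np.linspace(0.05, 0.95, 19)
prof = np.array([profile_loglik(ns, events, T) for ns in grid])
ell_max = prof.max()
cutoff = chi2.ppf(0.95, df=1) / 2       # 3.84 / 2
ci = grid[ell_max - prof <= cutoff]     # grid points inside the 95% CI
```

In practice a finer grid (and interpolation across the cutoff) gives smoother interval endpoints.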


7.6 Simulation Study: Checking Identifiability

A simulation study validates the estimator: simulate at known parameters, estimate, and check that the estimates are unbiased and the confidence intervals have correct coverage.

Key finding: when n* is close to 1, the MLE can be poorly identified. The log-likelihood surface becomes nearly flat along the n* direction, and estimates can have large variance. Short observation windows T exacerbate this. As a rule of thumb, aim for n · (1 − n*) ≫ 1 observed events for reliable estimation.
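Synthetic data for such a study can be generated with Ogata's thinning algorithm. The sketch below (simulate_hawkes and the toy parameters are assumptions for illustration) includes a quick sanity check: the empirical event rate should approach the stationary rate μ/(1 − n*):

```python
import numpy as np

def simulate_hawkes(mu, alpha, beta, T, rng):
    """Ogata's thinning algorithm for an exponential-kernel Hawkes process."""
    events, t = [], 0.0
    while True:
        ev = np.array(events)
        # upper bound on the intensity: it only decays until the next event
        lam_bar = mu + alpha * np.sum(np.exp(-beta * (t - ev)))
        t += rng.exponential(1.0 / lam_bar)
        if t > T:
            return np.array(events)
        # accept the candidate with probability lambda*(t) / lam_bar
        lam_t = mu + alpha * np.sum(np.exp(-beta * (t - ev)))
        if rng.uniform() * lam_bar < lam_t:
            events.append(t)

rng = np.random.default_rng(42)
mu, alpha, beta, T = 0.5, 0.8, 2.0, 2000.0   # n* = 0.4
events = simulate_hawkes(mu, alpha, beta, T, rng)
print(len(events) / T)  # should be near mu / (1 - n*) = 0.833...
```

Feeding such simulated event times back into the MLE routine, across many replications and a range of n* values, is the basis of the bias and coverage checks described above.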

See code/07_hawkes_estimation.py for the full implementation including a simulation study of bias and variance across n* values.


7.7 Key Takeaways

- The exponential kernel's recursion Rᵢ = exp(−β(tᵢ − tᵢ₋₁))·(1 + Rᵢ₋₁) reduces likelihood evaluation from O(n²) to O(n).
- The compensator integral has a closed form, so the log-likelihood is exact, not approximated.
- MLE via L-BFGS-B with positivity bounds is the standard fitting recipe; approximate standard errors come from the inverse observed Fisher information.
- Profile likelihood yields a confidence interval for the branching ratio n* = α/β directly.
- Estimation degrades as n* → 1 or T shrinks; a simulation study is the practical check on identifiability.
