We have seen the log-likelihood for individual models in earlier chapters. This chapter pulls together the full framework for likelihood-based inference: parameter estimation, uncertainty quantification, and model comparison. The tools here apply uniformly across all the point process families in this book.
For any simple point process with conditional intensity λ*(t; θ), the log-likelihood of observing events t₁ < ... < tₙ on [0, T] is:
ℓ(θ) = Σᵢ₌₁ⁿ log λ*(tᵢ; θ) − ∫₀ᵀ λ*(t; θ) dt
Derivation sketch: Partition [0, T] into bins of width δ. The probability of the observed event pattern is approximately:
∏ᵢ [λ*(tᵢ)δ] · ∏_{bins without events} [1 − λ*(t)δ]
≈ δⁿ · ∏ᵢ λ*(tᵢ) · exp(−∫₀ᵀ λ*(t) dt)
Taking logs and dropping the δ-dependent constant gives the formula.
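To make the formula concrete: for an HPP with constant rate λ, the log-likelihood reduces to ℓ(λ) = n log λ − λT, which is maximized at λ̂ = n/T. A minimal numerical check (the values here are illustrative, not from the book's code):

```python
import numpy as np

def hpp_loglik(lam, n, T):
    # HPP log-likelihood: each of the n events contributes log(lam),
    # and the compensator is lam * T
    return n * np.log(lam) - lam * T

# Hypothetical observation: n = 20 events on [0, T] with T = 10
n, T = 20, 10.0

# A grid search recovers the analytic MLE lam_hat = n / T = 2.0
grid = np.linspace(0.5, 5.0, 1001)
best = grid[np.argmax(hpp_loglik(grid, n, T))]
assert abs(best - n / T) < 0.01
```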
The compensator ∫₀ᵀ λ*(t) dt is re-evaluated at every step of the optimizer, so having a closed form matters for efficiency. Its form depends on the model:
| Model | Compensator |
|---|---|
| HPP(λ) | λ · T |
| NHPP(λ(t)) | ∫₀ᵀ λ(t) dt (numerical if no closed form) |
| Hawkes (exp kernel) | μT + (α/β) Σᵢ [1 − exp(−β(T−tᵢ))] |
| Renewal | No simple closed form; use numerical integration |
For the HPP and Hawkes (exponential kernel), the compensator is analytic, making optimization fast and stable.
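As an illustration, the Hawkes exponential-kernel log-likelihood can be computed in O(n) by combining the analytic compensator from the table with the standard recursion for the self-excitation sum. A sketch (the function name is mine, not from the book's code):

```python
import numpy as np

def hawkes_loglik(mu, alpha, beta, events, T):
    """Log-likelihood of a Hawkes process with kernel
    alpha * exp(-beta * (t - t_i)).

    Uses the O(n) recursion A_i = exp(-beta*(t_i - t_{i-1})) * (1 + A_{i-1})
    for the sum of kernel contributions at each event time."""
    events = np.asarray(events, dtype=float)
    A = 0.0
    log_sum = 0.0
    for i in range(len(events)):
        if i > 0:
            A = np.exp(-beta * (events[i] - events[i - 1])) * (1.0 + A)
        log_sum += np.log(mu + alpha * A)
    # Analytic compensator from the table above
    compensator = mu * T + (alpha / beta) * np.sum(1.0 - np.exp(-beta * (T - events)))
    return log_sum - compensator
```

A useful sanity check: with α = 0 this reduces to the HPP log-likelihood n log μ − μT.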
The negative log-likelihood −ℓ(θ) is minimized using scipy.optimize.minimize:

```python
import numpy as np
from scipy.optimize import minimize

def fit_model(neg_loglik_fn, x0, bounds):
    best = None
    for _ in range(10):  # multiple restarts from perturbed starting points
        x0_perturbed = x0 * np.exp(np.random.randn(len(x0)) * 0.5)
        result = minimize(neg_loglik_fn, x0_perturbed,
                          method='L-BFGS-B', bounds=bounds,
                          options={'maxiter': 1000, 'ftol': 1e-12})
        if best is None or result.fun < best.fun:
            best = result
    return best
```
Key practices:
- Enforce parameter constraints through bounds (positivity of rates; stability of the Hawkes model via n* < 1).
- Check result.success and compare result.fun across restarts to confirm the optimizer actually reached the optimum.

Given two fitted models with k₁ and k₂ parameters and maximized log-likelihoods ℓ̂₁ and ℓ̂₂:
AIC = −2ℓ̂ + 2k
BIC = −2ℓ̂ + k · log(n)
Lower AIC/BIC is better. AIC favors predictive accuracy; BIC penalizes parameters more heavily and is consistent (it selects the true model with probability approaching 1 as n → ∞, provided the true model is among the candidates).
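The formulas translate directly into a small helper (the function name is mine). As a check, ℓ̂ = −312.4 with k = 1 gives AIC = −2·(−312.4) + 2 = 626.8:

```python
import numpy as np

def aic_bic(loglik, k, n):
    """Information criteria from a maximized log-likelihood,
    k parameters, and n observed events."""
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * np.log(n)
    return aic, bic

aic, _ = aic_bic(-312.4, k=1, n=100)  # n is illustrative; AIC ignores it
assert abs(aic - 626.8) < 1e-9
```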
Example comparison:
| Model | Parameters k | log-likelihood ℓ̂ | AIC | BIC |
|---|---|---|---|---|
| HPP | 1 | -312.4 | 626.8 | 628.2 |
| NHPP (sinusoidal) | 3 | -287.1 | 580.2 | 586.4 |
| Hawkes | 3 | -261.8 | 529.6 | 535.8 |
The Hawkes model wins decisively here, justifying the extra complexity.
For nested models (model 0 is a special case of model 1 with k₁ − k₀ = d extra parameters), the likelihood ratio test (LRT) gives a formal hypothesis test:
H₀: restricted model (k₀ parameters)
H₁: full model (k₁ parameters)
LRT statistic: W = 2(ℓ̂₁ − ℓ̂₀) ~ χ²(d) under H₀
Example: Testing whether a Hawkes process reduces to a Poisson process (H₀: α = 0):
1. Under H₀, fit the HPP (1 parameter): ℓ̂₀
2. Under H₁, fit the Hawkes (3 parameters): ℓ̂₁
3. Compute W = 2(ℓ̂₁ − ℓ̂₀), which is ~ χ²(2) under H₀
4. Reject H₀ if W > 5.99 (p < 0.05)

```python
from scipy.stats import chi2

W = 2 * (loglik_hawkes - loglik_hpp)
p_value = 1 - chi2.cdf(W, df=2)
print(f"W = {W:.2f}, p = {p_value:.4f}")
```
Wald confidence intervals (using the observed Fisher information):
SE(θ̂ⱼ) ≈ sqrt([I_obs⁻¹]ⱼⱼ)
CI: θ̂ⱼ ± 1.96 · SE(θ̂ⱼ)
where I_obs = −∇²ℓ(θ̂) is the negative Hessian.
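When analytic second derivatives are unavailable, a finite-difference Hessian is often the most convenient route to I_obs. A sketch using central differences (the helper name is mine; in practice one might instead use the optimizer's own Hessian approximation):

```python
import numpy as np

def wald_se(neg_loglik_fn, theta_hat, eps=1e-5):
    """Standard errors from the observed Fisher information:
    build the Hessian of -loglik at theta_hat by central finite
    differences, invert it, and take sqrt of the diagonal."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    k = len(theta_hat)
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k); ei[i] = eps
            ej = np.zeros(k); ej[j] = eps
            H[i, j] = (neg_loglik_fn(theta_hat + ei + ej)
                       - neg_loglik_fn(theta_hat + ei - ej)
                       - neg_loglik_fn(theta_hat - ei + ej)
                       + neg_loglik_fn(theta_hat - ei - ej)) / (4 * eps**2)
    return np.sqrt(np.diag(np.linalg.inv(H)))
```

The 95% CI endpoints then follow as θ̂ⱼ ± 1.96 · SE(θ̂ⱼ).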
Profile likelihood intervals are more accurate near boundaries and for non-linear parameters like n* = α/β:
ℓ_profile(n*) = max_{μ, β} ℓ(μ, n*·β, β)
CI: {n* : 2(ℓ̂ − ℓ_profile(n*)) ≤ 3.84}
Profile likelihood CIs are preferred for the branching ratio because the Wald approximation can be poor near n* = 0 or n* = 1.
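One way to compute the profile interval for n* is a grid scan: fix n*, substitute α = n*·β, maximize over the remaining parameters, and keep the grid points within the χ² threshold. A rough sketch (it assumes a neg_loglik(mu, alpha, beta) function and illustrative grid endpoints; the helper name is mine):

```python
import numpy as np
from scipy.optimize import minimize

def profile_ci_branching(neg_loglik, nstar_grid, x0=(0.5, 1.0), crit=3.84):
    """For each fixed n*, maximize the log-likelihood over (mu, beta)
    with alpha = n* * beta, then keep the n* values satisfying
    2 * (max profile - profile) <= crit (95% chi-square(1) cutoff)."""
    prof = []
    for nstar in nstar_grid:
        obj = lambda x, s=nstar: neg_loglik(x[0], s * x[1], x[1])  # x = (mu, beta)
        res = minimize(obj, x0, method='L-BFGS-B',
                       bounds=[(1e-6, None), (1e-6, None)])
        prof.append(-res.fun)
    prof = np.array(prof)
    inside = np.where(2 * (prof.max() - prof) <= crit)[0]
    return nstar_grid[inside[0]], nstar_grid[inside[-1]]
```

The returned endpoints are only as fine as the grid; refine the grid near the boundary if more precision is needed.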
For small samples where asymptotic theory may not apply, use the parametric bootstrap:
1. Fit the model to the data to obtain θ̂.
2. Simulate B datasets from the fitted model.
3. Refit the model to each simulated dataset, giving θ̂₁*, ..., θ̂_B*.
4. Use the empirical distribution of the θ̂* to compute standard errors and confidence intervals.

```python
import numpy as np

B = 500
bootstrap_estimates = []
for _ in range(B):
    sim_events = simulate_hawkes(mu_hat, alpha_hat, beta_hat, T)
    result = fit_hawkes(sim_events, T)
    bootstrap_estimates.append(result.x)
bootstrap_estimates = np.array(bootstrap_estimates)
SE_bootstrap = bootstrap_estimates.std(axis=0)
```
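Beyond standard errors, percentile confidence intervals come directly from the empirical quantiles of the bootstrap estimates. A sketch, with synthetic draws standing in for the (B, k) array of refitted parameter vectors:

```python
import numpy as np

# Synthetic stand-in for bootstrap_estimates of shape (B, k)
rng = np.random.default_rng(1)
bootstrap_estimates = rng.normal(loc=[0.5, 0.8, 1.2], scale=0.1, size=(500, 3))

# Percentile 95% CI: empirical 2.5% and 97.5% quantiles per parameter
ci_lo, ci_hi = np.percentile(bootstrap_estimates, [2.5, 97.5], axis=0)
SE_boot = bootstrap_estimates.std(axis=0)
```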
See code/11_mle_model_selection.py for the full pipeline.
Key takeaways:
- The log-likelihood ℓ = Σ log λ*(tᵢ) − ∫ λ*(t) dt is universal; only the form of λ*(t) changes.
- Minimize −ℓ numerically with multiple restarts; enforce parameter constraints via bounds.

| ← Chapter 10 | Table of Contents | Chapter 12: Goodness-of-Fit → |