A Statistical Theory of
the Kalman Filter

Vivak Patel

Department of Statistics, University of Chicago
SIAM UQ, Lausanne, Switzerland
April 8, 2016

Outline

I. Kalman Filter

II. Single State Estimation

III. Selective Assimilation

IV. Incremental Estimator

V. Numerical Experiment

VI. Conclusions

Kalman Filter

Let $\lbrace \chi_t : t \in \mathbb{N} \rbrace$ be a sequence of states in $\mathbb{R}^d$ related by the stochastic difference equation $$ \chi_{t+1} = D_t \chi_t + \eta_t $$ where $D_t \in \mathbb{R}^{d \times d}$ and $\eta_t \sim \mathcal{N}(0, Q_M)$.

Our objective is to estimate $\chi_t$ given $D_t$ and a sequence of noisy observations $\lbrace Z_t : t \in \mathbb{N} \rbrace$ related to $\chi_t$ by $$ Z_t = O_t \chi_t + \xi_t $$ where $O_t \in \mathbb{R}^{d' \times d}$ and $\xi_t \sim \mathcal{N}(0, Q_O)$.

The Kalman Filter estimates $\chi_{t+1}$ and its covariance $C_{t+1}$ from the estimates $\hat{\chi}_t$ and $C_t$ using: $$\begin{aligned} \hat{\chi}_{t+1} &= \arg\min_{\chi} \left\lbrace \left\Vert Z_{t+1} - O_{t+1}\chi \right\Vert_{Q_O^{-1}}^2 + \left\Vert \chi - D_t\hat{\chi}_{t} \right\Vert_{(D_tC_tD_t^T + Q_M)^{-1}}^2 \right\rbrace \\ C_{t+1}^{-1} &= (D_t C_{t}D_t^T + Q_M)^{-1} + O_{t+1}^T Q_O^{-1} O_{t+1} \end{aligned}$$
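Purely as an illustration (not part of the original slides), here is a minimal NumPy sketch of one step of the update above; the function and variable names are chosen here to mirror the notation:

```python
import numpy as np

def kalman_update(chi_hat, C, D, O, Q_M, Q_O, z_next):
    """One Kalman Filter step in the information form shown above.

    chi_hat, C : current state estimate and covariance
    D, Q_M     : dynamics matrix and model-noise covariance
    O, Q_O     : observation matrix and observation-noise covariance
    z_next     : new observation Z_{t+1}
    """
    # Predict: propagate the estimate and covariance through the dynamics.
    chi_pred = D @ chi_hat
    P = D @ C @ D.T + Q_M                                   # D C D^T + Q_M

    # Covariance update: C_{t+1}^{-1} = P^{-1} + O^T Q_O^{-1} O.
    C_next_inv = np.linalg.inv(P) + O.T @ np.linalg.solve(Q_O, O)
    C_next = np.linalg.inv(C_next_inv)

    # Minimizer of the two weighted quadratics defining chi_hat_{t+1}.
    rhs = O.T @ np.linalg.solve(Q_O, z_next) + np.linalg.solve(P, chi_pred)
    chi_next = C_next @ rhs
    return chi_next, C_next
```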

Kalman Filter

Observability/Controllability

Exponential Convergence (Fixed State, Deterministic Observations)

Nonexpansive in Specialized Metrics

Kalman Filter

Questions important to statisticians:

  1. Q: How many observations are needed (with stochastic dynamics and noisy observations) to estimate a single state "well"?
    A: With probability approaching 1, the objective decays like $d/k$.

  2. Q: Does the covariance estimate actually estimate the asymptotic covariance of the parameter estimate?
    A: Yes, it does.

  3. Q: How important are the a priori parameter and covariance estimates?
    A: ...

Kalman Filter

Questions important to numerical optimizers:

  1. Q: Can we do "better" than the Kalman Filter?
    A: No.
  2. Q: What is the rate of convergence of the Kalman Filter?
    A: Sublinear to a single state. Presumably, we cannot do better.
  3. Q: Do we need to carry around the covariance estimate from state to state?
    A: ...
  4. Q: Can we parallelize computations over relevant dimensions?
    A: ...

Kalman Filter

Questions important to the UQ community (not covered in the other categories):

  1. Q: How well does the KF "perform" when the dynamics are marginally stable (eigenvalues of modulus 1) or unstable (eigenvalues of modulus > 1)?
    A: ...
  2. Q: How well does the KF "perform" when deterministic dynamics are misspecified?
    A: ...
  3. Q: How well does the KF "perform" when model noise is misspecified/correlated from state to state?
    A: ...
  4. Q: How well does the (E)KF "perform" when the observation function is misspecified?
    A: ...

  5. Q: How well does the KF "perform" when the observation model noise is misspecified/correlated from state to state?
    A: ...

Single State Estimation

To get to statistical notation, fix a single state $\beta^* \in \mathbb{R}^d$ (no dynamics: $D_t = I$, $Q_M = 0$) and write each scalar observation as $Y_j = X_j^T \beta^* + \varepsilon_j$ with $\varepsilon_j \sim \mathcal{N}(0, \sigma_j^2)$; that is, $Z_j \to Y_j$, $O_j \to X_j^T$, and $Q_O \to \sigma_j^2$.

Selective Assimilation

Question: Suppose $\hat{\beta}_k$ is the estimator of $\beta^*$ after $k$ observations are assimilated. How quickly does $\hat{\beta}_k$ converge to $\beta^*$?

General Assumptions:

(Iterative) Batch Estimator: $$\hat{\beta}_k = \arg\min_{\beta} \left\lbrace \sum_{j=1}^k \frac{1}{\sigma_j^2}(Y_j - X_j^T \beta)^2 + \left\Vert\beta - \beta_0\right\Vert_{M_0^{-1}}^2 \right\rbrace$$ If $\beta_0 \in \mathbb{R}^d$, $M_0 \succ 0$, and we condition on $X_1,\ldots,X_k$, then $\hat{\beta}_k \to \beta^*$ almost surely. However, the entire (growing) problem must be re-solved each time a new observation is assimilated.
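As an aside not on the original slide, the batch objective above has the standard regularized least-squares closed form, which makes the cost of re-solving explicit (a fresh $d \times d$ linear system at every $k$): $$\hat{\beta}_k = \left(M_0^{-1} + \sum_{j=1}^k \frac{X_j X_j^T}{\sigma_j^2}\right)^{-1}\left(M_0^{-1}\beta_0 + \sum_{j=1}^k \frac{X_j Y_j}{\sigma_j^2}\right)$$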

Incremental Estimator

Kalman Filter (Single State) $$\begin{aligned} \hat{\beta}_{k+1} &= \arg\min_{\beta}\left\lbrace \frac{1}{\gamma_k^2}(Y_{k+1} - X_{k+1}^T \beta)^2 + \left\Vert \beta - \hat{\beta}_k \right\Vert_{M_k^{-1}}^2 \right\rbrace \\ M_{k+1}^{-1} &= M_k^{-1} + \frac{1}{\gamma_k^2}X_{k+1}X_{k+1}^T \end{aligned}$$
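Although not written out on the slide, a standard Sherman–Morrison rewriting of this update gives the equivalent gain form, which updates $\hat{\beta}_k$ and $M_k$ directly using only matrix-vector products (no inversion per step): $$\begin{aligned} \hat{\beta}_{k+1} &= \hat{\beta}_k + \frac{M_k X_{k+1}}{\gamma_k^2 + X_{k+1}^T M_k X_{k+1}}\left(Y_{k+1} - X_{k+1}^T \hat{\beta}_k\right) \\ M_{k+1} &= M_k - \frac{M_k X_{k+1} X_{k+1}^T M_k}{\gamma_k^2 + X_{k+1}^T M_k X_{k+1}} \end{aligned}$$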

Assumptions:

Incremental Estimator

Tuning Parameter Requirements: $$0 < \delta^2 \leq \inf_k \gamma_k^2 \leq \sup_k \gamma_k^2 \leq \Delta^2 < \infty$$ $$\beta_0 \text{ is arbitrary and }M_0 \succ 0$$

Theorem 1: Conditioned on the sequence $X_1, X_2, \ldots$, $\left\Vert \hat{\beta}_k - \beta^* \right\Vert \to 0$ almost surely.

Theorem 2: Let $$\mathcal{M}_k = \mathbb{E}\left[\left.\left(\hat{\beta}_k - \beta^*\right)\left(\hat{\beta}_k - \beta^*\right)^T \right\vert X_1,\ldots,X_k \right]$$ If $\sigma_j^2 = \sigma^2 ~ \forall j \in \mathbb{N}$ then for all $\epsilon > 0$ asymptotically almost surely $$ \frac{1 - \epsilon}{\Delta^2}M_k \preceq \frac{1}{\sigma^2} \mathcal{M}_k \preceq \frac{1+\epsilon}{\delta^2} M_k$$
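A minimal Monte Carlo sketch (not from the slides; NumPy, with arbitrary illustrative sizes, a random design, and constant $\gamma_k^2 = \gamma^2$, $\sigma_j^2 = \sigma^2$) illustrating Theorem 2: over repeated noise realizations with the design held fixed, the generalized eigenvalues of $\frac{1}{\sigma^2}\mathcal{M}_k$ with respect to $M_k$ should concentrate near $1/\gamma^2$ for large $k$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k_max, n_rep = 5, 500, 1000       # assumed, illustrative sizes
sigma2, gamma2 = 1.0, 1.0            # constant observation variance and tuning parameter
beta_star = rng.standard_normal(d)
X = rng.standard_normal((k_max, d))  # design held fixed: we condition on X_1, ..., X_k

errors = np.zeros((n_rep, d))
for r in range(n_rep):
    beta_hat = np.zeros(d)           # beta_0 arbitrary
    M = np.eye(d)                    # M_0 positive definite
    for x in X:
        y = x @ beta_star + np.sqrt(sigma2) * rng.standard_normal()
        Mx = M @ x
        denom = gamma2 + x @ Mx
        beta_hat = beta_hat + Mx * (y - x @ beta_hat) / denom  # incremental estimate
        M = M - np.outer(Mx, Mx) / denom                       # covariance estimate M_k
    errors[r] = beta_hat - beta_star

# Empirical conditional covariance of beta_hat_k (Monte Carlo proxy for M_cal_k),
# compared with M_k: the generalized eigenvalues should cluster near 1 / gamma2.
M_cal = errors.T @ errors / n_rep
print(np.sort(np.linalg.eigvals(np.linalg.solve(M, M_cal / sigma2)).real))
```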

Numerical Experiment

Data Set

Linear Model

Tuning Parameter

Numerical Experiment

Convergence of Estimated Covariance.

Conclusions

  1. Q: How many observations are needed (with stochastic dynamics and noisy observations) to estimate a single state "well"?
    A: (The trace of) $M_k$ is sufficient for determining convergence.

  2. Q: Does the covariance estimate actually estimate the asymptotic covariance of the parameter estimate?
    A: Yes: $\frac{1-\epsilon}{\sup_k \gamma_k^2}M_k \preceq \frac{1}{\sigma^2}\mathcal{M}_k \preceq \frac{1+\epsilon}{\inf_k \gamma_{k}^2} M_k$.

  3. Q: Can we converge faster than the Kalman Filter?
    A: For a single state, and given that $M_k$ is estimating $\mathcal{M}_k$, no; the Kalman Filter already incrementally inverts the Hessian of the objective. In fact, we show that the conditioning of the problem has no impact on the rate of convergence.

Thank You

Acknowledgements

Slides
This slide deck can be found at galton.uchicago.edu/~vpatel.

Reference

Patel, Vivak. "Kalman-based Stochastic Gradient Method with Stop Condition and Insensitivity to Conditioning." arXiv preprint arXiv:1512.01139 (2015).
