# A Hidden Markov Model Approach to Statistical Estimation and Numerical Optimization under Uncertainty

## Vivak Patel

Department of Statistics, University of Chicago
April 21, 2017

# Problems & Solutions

## Linear Regression

Let $X,X_1,X_2,\ldots \in \mathbb{R}^{d}$ be independent, identically distributed, $l^2$ random variables such that $$0 \prec \mathbb{E}\left[XX'\right]$$

Let $\epsilon,\epsilon_1,\epsilon_2,\ldots$ be independent, identically distributed, mean zero, finite variance random variables.

Let $\beta^* \in \mathbb{R}^d$, and define $$Y = X'\beta^* + \epsilon \text{ and } Y_k = X_k'\beta^* + \epsilon_k$$

Problem: Given $\lbrace (Y_k,X_k):k=1,\ldots,N \rbrace$ find an estimator for $\beta^*$ whose computation scales well as $N \to \infty$.

# Problems & Solutions

## Linear Regression

Problem: Given $\lbrace (Y_k,X_k):k=1,\ldots,N \rbrace$ find an estimator for $\beta^*$ whose computation scales well as $N \to \infty$.

Solution 1: At each $N$ compute $$\beta_N = \arg \min_\beta \Vert Y - X\beta \Vert_2^2,$$ where $Y = (Y_1,\ldots,Y_N)'$ and $X$ is the matrix whose rows are $X_1',\ldots,X_N'$.

Solution 2: At each $N$ compute $$\beta_N = \beta_{N-1} + \alpha_{N-1}X_{N}(Y_N - X_{N}'\beta_{N-1}).$$
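Solution 2 is the classic stochastic gradient (Robbins-Monro) recursion. A minimal sketch on synthetic data, with the step size choice $\alpha_k = 1/k$ and the Gaussian design as illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 3, 50000
beta_star = rng.normal(size=d)  # unknown truth, used only to simulate data

beta = np.zeros(d)  # beta_0
for k in range(1, N + 1):
    x = rng.normal(size=d)                   # X_k
    y = x @ beta_star + 0.1 * rng.normal()   # Y_k = X_k' beta* + eps_k
    alpha = 1.0 / k                          # step size choice (assumed)
    beta = beta + alpha * x * (y - x @ beta)  # the Solution 2 update
```

Each update touches one observation, so the cost per step is $O(d)$ regardless of $N$.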

# Problems & Solutions

## Logistic Regression (Canonical Link, Linear Predictor)

Problem: Given $\lbrace (Y_k,X_k):k=1,\ldots,N \rbrace$ find an estimator for $\beta^*$ whose computation scales well as $N \to \infty$.

Solution 1: At each $N$ compute $$\beta_N = \arg \min_\beta \sum_{k=1}^N \log(1 + \exp(X_k'\beta)) - Y_k X_k'\beta.$$

Solution 2: At each $N$ compute $$\beta_N = \beta_{N-1} + \alpha_{N-1}X_N \left( Y_N - \frac{1}{1 + \exp(-X_N'\beta_{N-1})} \right).$$
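Solution 2 is the same stochastic gradient idea applied to the logistic likelihood: the residual $Y_N - (1 + \exp(-X_N'\beta_{N-1}))^{-1}$ replaces the linear one. A sketch on simulated Bernoulli data; the step size constant and the choice $\beta^* = (1,-1)'$ are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d, N = 2, 200000
beta_star = np.array([1.0, -1.0])  # truth, used only to simulate labels

beta = np.zeros(d)
for k in range(1, N + 1):
    x = rng.normal(size=d)
    p = 1.0 / (1.0 + np.exp(-x @ beta_star))
    y = float(rng.random() < p)              # Y_k ~ Bernoulli(p)
    alpha = 2.0 / k                          # step size choice (assumed)
    # the Solution 2 update: residual is Y_k minus the fitted probability
    beta = beta + alpha * x * (y - 1.0 / (1.0 + np.exp(-x @ beta)))
```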

# An HMM Formulation

## Statistical Filtering

If we view the estimation problem as a filtering problem, then we can use a statistical filter (Kalman, Extended Kalman, Unscented, Particle, etc.) to compute the estimate.

Kalman Filter for Linear Regression $$\beta_N = \beta_{N-1} + M_{N-1}X_{N}\frac{ Y_N - X_{N}'\beta_{N-1}}{\gamma + X_{N}'M_{N-1}X_{N}}$$ where $$M_{N} = (I - \frac{M_{N-1}X_{N}X_{N}'}{\gamma + X_{N}'M_{N-1}X_N})M_{N-1}.$$
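A direct transcription of this recursion on simulated data. The initializations $\beta_0 = 0$, $M_0 = I$, and the value $\gamma = 1$ are illustrative assumptions, not prescriptions from the slides:

```python
import numpy as np

rng = np.random.default_rng(2)
d, N, gamma = 3, 5000, 1.0
beta_star = rng.normal(size=d)  # truth, used only to simulate data

beta = np.zeros(d)  # beta_0 (assumed)
M = np.eye(d)       # M_0 (assumed)
for _ in range(N):
    x = rng.normal(size=d)
    y = x @ beta_star + 0.5 * rng.normal()
    Mx = M @ x
    denom = gamma + x @ Mx
    beta = beta + Mx * (y - x @ beta) / denom      # Kalman gain update
    M = M - np.outer(Mx, Mx) / denom               # (I - Mxx'/denom) M, M symmetric
```

Note the gain $M_{N-1}X_N/(\gamma + X_N'M_{N-1}X_N)$ adapts automatically, in contrast to the hand-tuned $\alpha_{N-1}$ of stochastic gradient descent.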

# Some Results

## Summary of Results for Linear Regression

1. The parameter estimator is consistent.
2. The covariance estimator is consistent up to a multiplicative factor.
3. There is a robust set of choices for the tuning parameter $\gamma$.
4. The estimator achieves the (information-theoretic) "lower bound" convergence rate.

## What You Will Not Find

1. Asymptotic distribution results
2. A framework for extending the Kalman Filter to other estimation problems

# Problems & Solutions

## Composite Objective Function Minimization

Problem: Consider the following optimization problem $$\min_\theta \sum_{i=1}^N f_i(\theta),$$ where $N$ is very large. For such problems, it is often infeasible to access all $N$ component functions at each iteration owing to memory and communication limitations, which rules out standard optimization methods such as gradient descent and quasi-Newton methods.

Solution 1: Subsample the objective function and use an incremental estimator.

Solution 2: Subsample the objective function and use a stochastic method designed for a discrete, finite probability space.
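Both solutions rest on the same subsampling step. A minimal sketch of mini-batch stochastic gradient descent on a least-squares instance of $\min_\theta \sum_i f_i(\theta)$; the batch size, step size, and quadratic components are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
N, d, batch = 100000, 4, 32
A = rng.normal(size=(N, d))
# Component functions f_i(theta) = (a_i' theta - b_i)^2 / 2, minimized near ones(d)
b = A @ np.ones(d) + 0.1 * rng.normal(size=N)

theta = np.zeros(d)
for k in range(1, 3001):
    idx = rng.integers(0, N, size=batch)      # subsample component functions
    grad = A[idx].T @ (A[idx] @ theta - b[idx]) / batch
    theta -= (1.0 / k) * grad                 # decaying step size (assumed)
```

Each iteration touches only `batch` of the $N$ components, which is the point of the subsampling.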

# Problems & Solutions

Problem: Consider the problem of optimizing an objective function $f(\theta)$ when any attempt to evaluate $f(\theta)$ returns only $$f(\theta) + \epsilon,$$ where $\epsilon$ is a random variable that may be strongly correlated with $f(\theta)$. Moreover, the gradient of $f(\theta)$ cannot be evaluated, even noisily.

Solution: ...

# Problems & Solutions

## Linear Stochastic Program with Recourse

Problem: Let $T$ and $h$ be matrix-valued and vector-valued random variables. Consider $$\begin{aligned} \min_{\theta} ~& c'\theta + \mathbb{E}\left[\min_{y} q'y \right] \\ \text{subject to} ~& A\theta = b \\ & T\theta + Wy = h \\ & \theta \in \Theta \\ & y \in Y \end{aligned}$$

Solution: Generate samples of $T,h$ and replace the expectation with an empirical estimate.
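This is the sample average approximation idea. A heavily simplified sketch on a scalar newsvendor-style special case (no $A\theta = b$ constraint, scalar $\theta$, recourse $y \geq h - \theta$, $y \geq 0$), where the empirical minimizer has a closed form as a quantile; the cost values `c`, `q`, the uniform distribution for $h$, and the sample size are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
c, q = 0.2, 1.0
S = 100000
h = rng.random(S)  # samples of the random right-hand side, h ~ Uniform(0, 1)

# Empirical objective after replacing the expectation:
#   g(theta) = c*theta + (1/S) sum_s min { q*y : y >= h_s - theta, y >= 0 }
#            = c*theta + (1/S) sum_s q*max(h_s - theta, 0)
def g(theta):
    return c * theta + q * np.mean(np.maximum(h - theta, 0.0))

# For this scalar recourse, the empirical minimizer is the
# (1 - c/q)-quantile of the sampled h (the newsvendor solution).
theta_hat = np.quantile(h, 1.0 - c / q)
```

With $h \sim \text{Uniform}(0,1)$ the true minimizer is $1 - c/q = 0.8$, and the empirical estimate converges to it as the sample count $S$ grows.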

# An HMM Formulation

Consider the following generic problem $$\min_{\theta}~ \mathbb{E}\left[ f(\theta,\xi) \right]$$ where $\xi$ is a random variable which we can independently sample from at each iteration to produce $\xi_1,\xi_2,\ldots$.

Suppose also that $$\nabla_\theta \mathbb{E}\left[ f(\theta,\xi) \right] = \mathbb{E}\left[ \nabla_\theta f(\theta,\xi) \right]$$

Finally, suppose that for given values of $\theta$ and $\xi$, $f(\theta,\xi)$ and $\nabla_\theta f(\theta,\xi)$ can be evaluated.

# An HMM Formulation

Hidden states: $\mathbb{E}[f]$ and $\mathbb{E}[\nabla f]$. Observed states: $f(\cdot,\xi_k)$ and $\nabla f(\cdot, \xi_k)$.

Transition Dynamics: $$\mathbb{E}[f(\theta,\xi)] = \mathbb{E}[f(\theta_k,\xi)] + \nabla_\theta \mathbb{E}[f(\theta_k,\xi)]'(\theta - \theta_k) + \epsilon \Vert \theta - \theta_k \Vert_2^2,$$ where $\epsilon$ is mean zero and has some finite variance.

Likelihood Model: $$f(\theta,\xi_k) = \mathbb{E}[f(\theta,\xi)] + \eta,$$ where $\eta$ is mean zero and has some finite variance.
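A toy scalar illustration of filtering with this transition and likelihood model. The filter tracks the hidden state $s \approx \mathbb{E}[f(\theta_k,\xi)]$; here $\theta$ is held fixed so the filter's behavior is visible in isolation, and the test function $f(\theta,\xi) = \theta^2 + \xi$, the noise variances `R` and `Q`, and the initializations are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

def f_sample(theta, xi):   # noisy evaluation: f(theta, xi) = E[f] + eta
    return theta**2 + xi   # hidden mean is E[f(theta, xi)] = theta^2

def grad_mean(theta):      # gradient of the hidden mean (assumed known here)
    return 2.0 * theta

R = 1.0                    # variance of the likelihood noise eta (assumed)
Q = 4.0                    # variance scale of the remainder epsilon (assumed)
theta, s, P = 2.0, 0.0, 10.0  # iterate, state estimate, state variance
for k in range(2000):
    theta_new = theta      # hold theta fixed to watch the filter converge
    # Predict through the transition dynamics (first-order Taylor step):
    s_pred = s + grad_mean(theta) * (theta_new - theta)
    P_pred = P + Q * (theta_new - theta)**4
    # Update with the likelihood model f(theta, xi_k) = E[f] + eta:
    y = f_sample(theta_new, rng.normal())
    K = P_pred / (P_pred + R)
    s = s_pred + K * (y - s_pred)
    P = (1.0 - K) * P_pred
    theta = theta_new
```

The squared-distance term $\epsilon \Vert \theta - \theta_k \Vert_2^2$ enters as process noise: moving $\theta$ inflates the predicted variance, so the filter discounts stale information exactly when the iterate moves far.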
