product of Gaussian probability densities

So lately I have been following Udacity’s wonderful course on robotics, and in particular was fascinated by the Kalman filter. My attempt here will be to expound on certain mathematical subtleties and missing deductive steps in the videos.

The first one has to do with updating a prediction by incorporating an observation. Somehow in my entire graduate education on probability the notion of a product of density functions was de-emphasized, or maybe I was too prematurely drawn into research to take advantage of the vast number of applied courses that do teach this stuff.

Imagine we have two independent observations of the same 1-dimensional quantity $x$, represented by $X_1$ and $X_2$, and suppose somehow we also know their variances $\sigma_1^2$ and $\sigma_2^2$. The question now is: what is the best estimate of $x$ based on these two observations? Here by estimate we can only mean a single number. As a perhaps stupid remark, for a single observation, the observed value is itself the unbiased minimum variance (UMV) estimate of the mean: the UMV estimator of $\mu_1$ given $X_1$ is in fact $X_1$.

Obviously some parametric assumption is needed. For instance one can assume the estimate takes the form $\hat{x} = \alpha X_1 + (1- \alpha) X_2$ for some $\alpha \in [0,1]$, that is, the final estimate is a convex combination of the two individual ones. This assumption is in fact more natural than I first thought: because the weights sum to one, the estimator is unbiased whenever $X_1$ and $X_2$ each have mean $x$.

So now we have to find a suitable $\alpha$. What property do we desire in the final estimate $\hat{x}$, treated as a random variable? Why not stipulate that its variance is minimized in addition to being unbiased (i.e., UMV)? This then becomes an interesting optimization problem:
$\min_\alpha \text{var}(\alpha X_1 + (1 -\alpha) X_2) = \min_\alpha \left[ \alpha^2 \sigma_1^2 + (1 -\alpha)^2 \sigma_2^2 \right]$, where the equality uses independence. Setting the derivative in $\alpha$ to zero, one easily obtains
$\alpha^* = \sigma_2^2 / (\sigma_1^2 + \sigma_2^2)$.
Plugging in, we also get the variance of the new estimate:
$\hat{\sigma}^2 = (\sigma_2^4 \sigma_1^2 + \sigma_1^4 \sigma_2^2) / (\sigma_1^2 + \sigma_2^2)^2 = 1/ (\sigma_1^{-2} + \sigma_2^{-2})$.
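
As a sanity check, here is a minimal Python sketch of the formulas above. The function names `fuse` and `combo_variance` and the numbers (observations 10 and 12 with variances 4 and 1) are made up for illustration; it only assumes the two observations are independent and unbiased.

```python
def fuse(x1, var1, x2, var2):
    """Combine two independent unbiased observations of the same quantity."""
    alpha = var2 / (var1 + var2)                 # optimal weight on x1
    estimate = alpha * x1 + (1 - alpha) * x2     # convex combination
    variance = 1.0 / (1.0 / var1 + 1.0 / var2)   # variance of the combination
    return alpha, estimate, variance


def combo_variance(alpha, var1, var2):
    """Variance of alpha*X1 + (1-alpha)*X2 for independent X1, X2."""
    return alpha ** 2 * var1 + (1 - alpha) ** 2 * var2


# Hypothetical observations: 10 with variance 4, and 12 with variance 1.
alpha, estimate, variance = fuse(10.0, 4.0, 12.0, 1.0)

# Brute-force check: scan alpha over [0, 1] and keep the variance minimizer.
grid = [i / 1000 for i in range(1001)]
best_alpha = min(grid, key=lambda a: combo_variance(a, 4.0, 1.0))

print(alpha, estimate, variance, best_alpha)
```

The less noisy observation gets the larger weight, and the combined variance comes out smaller than either individual variance.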

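So what does this have to do with the product of Gaussian pdfs? If you multiply the two Gaussian densities, with means $\mu_1$, $\mu_2$ and variances $\sigma_1^2$, $\sigma_2^2$, and renormalize, you get exactly the same result as above! First of all, it’s remarkable that the resulting product is again proportional to a Gaussian. The easiest way to see this is by multiplying things out and completing the square in the exponent (up to the common factor $-1/2$):
$(x-\mu_1)^2 / \sigma_1^2 + (x -\mu_2)^2 / \sigma_2^2 = (x - \hat{\mu})^2 / \hat{\sigma}^2 + (\mu_1 - \mu_2)^2 / (\sigma_1^2 + \sigma_2^2)$,
where $\hat{\mu} = (\sigma_2^2 \mu_1 + \sigma_1^2 \mu_2) / (\sigma_1^2 + \sigma_2^2)$ and $\hat{\sigma}^2 = 1/ (\sigma_1^{-2} + \sigma_2^{-2})$. The last term does not depend on $x$, so it is a constant absorbed by the normalization, and what remains is the exponent of a Gaussian with the above mean and variance.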
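
The same claim can be checked numerically. The following is a rough sketch, not a proof: it multiplies two Gaussian pdfs on a grid, renormalizes, and reads off the mean and variance of the result. The helper names `gaussian_pdf` and `product_moments`, the grid limits, and the example parameters are all my own choices for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def product_moments(mu1, var1, mu2, var2, lo=-50.0, hi=50.0, n=20001):
    """Mean and variance of the renormalized product of two Gaussian pdfs,
    approximated by a Riemann sum on an evenly spaced grid."""
    step = (hi - lo) / (n - 1)
    xs = [lo + i * step for i in range(n)]
    weights = [gaussian_pdf(x, mu1, var1) * gaussian_pdf(x, mu2, var2) for x in xs]
    total = sum(weights)  # renormalization constant (the grid step cancels out)
    mean = sum(w * x for w, x in zip(weights, xs)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, xs)) / total
    return mean, var


# Hypothetical densities: N(10, 4) and N(12, 1).
mean, var = product_moments(10.0, 4.0, 12.0, 1.0)

# Closed-form values from the weighted-average formulas.
expected_mean = (1.0 * 10.0 + 4.0 * 12.0) / (4.0 + 1.0)
expected_var = 1.0 / (1.0 / 4.0 + 1.0 / 1.0)

print(mean, expected_mean)
print(var, expected_var)
```

The numerical moments should agree with the closed-form ones up to discretization error, which is tiny here because the grid is fine compared to the width of the product.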

Conceptually, though, the Fourier transform of a Gaussian is still Gaussian, and a convolution of Gaussians is Gaussian; since the Fourier transform turns products into convolutions, the closure extends to products. Note also that convolution increases the variance, while the Fourier transform takes a big-variance Gaussian to a small-variance one, so the renormalized product gets skinnier.

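If one thinks about the maximum likelihood estimator: if $\mu$ is the true mean, then given two independent observations $X_1$ and $X_2$, their joint density is proportional to $e^{-(X_1 - \mu)^2/ (2\sigma_1^2) - (X_2 - \mu)^2/ (2\sigma_2^2)}$. The MLE of $\mu$ is precisely $\hat{x}$, as obtained by maximizing the above density over $\mu$: setting the derivative of the exponent to zero gives $(X_1 - \mu)/\sigma_1^2 + (X_2 - \mu)/\sigma_2^2 = 0$, i.e. $\mu = (\sigma_2^2 X_1 + \sigma_1^2 X_2)/(\sigma_1^2 + \sigma_2^2)$. The variance of $\hat{x}$ is similarly inherited from those of $X_1$ and $X_2$, and coincides with $\hat{\sigma}^2$ given above. In fact, if $X_1$ and $X_2$ are Gaussian, the distribution of $\hat{x}$ is Gaussian too, being a linear combination of independent Gaussians.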
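
To see the variance claim empirically, here is a small Monte Carlo sketch. It assumes Gaussian observations centered at a made-up true value, and the function name `simulate_fused_estimates`, the seed, and the trial count are arbitrary choices of mine.

```python
import random


def simulate_fused_estimates(x_true, var1, var2, trials=100_000, seed=0):
    """Draw many pairs (X1, X2) and return the sample mean and sample
    variance of the weighted estimate alpha*X1 + (1-alpha)*X2."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    alpha = var2 / (var1 + var2)         # weight that minimizes the variance
    sd1, sd2 = var1 ** 0.5, var2 ** 0.5  # random.gauss expects standard deviations
    estimates = []
    for _ in range(trials):
        x1 = rng.gauss(x_true, sd1)
        x2 = rng.gauss(x_true, sd2)
        estimates.append(alpha * x1 + (1 - alpha) * x2)
    mean = sum(estimates) / trials
    var = sum((e - mean) ** 2 for e in estimates) / trials
    return mean, var


# Hypothetical setup: true value 11, observation variances 4 and 1.
sample_mean, sample_var = simulate_fused_estimates(11.0, 4.0, 1.0)
predicted_var = 1.0 / (1.0 / 4.0 + 1.0 / 1.0)

print(sample_mean, sample_var, predicted_var)
```

The sample mean should land close to the true value (unbiasedness) and the sample variance close to $1/(\sigma_1^{-2} + \sigma_2^{-2})$, up to sampling noise.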

P.S. At first I was trying to derive the UMV estimator from the point of view of a Bayesian update, by letting the distribution of $X_1$ be the prior and that of $X_2$ the conditional likelihood function (in this case conditioning doesn’t do anything). I could not make sense of it. So if someone is an expert in this, please enlighten me!