## Graphical inference model lecture 2

The indexing and bookkeeping of notations in this class have always been a nightmare to me. I try my best to state things correctly. Whenever possible I copy directly from Andrea’s notes.

The moral of today’s lesson is that trees are the easiest objects on which to carry out computational complexity reduction. Here we talk about factor graphs that have a tree form, and we assume bounded factor node degrees so that the depth of the tree is nontrivial (there is essentially no reduction when the belief propagation algorithm is run on a star graph). The idea is to fix a variable node, say ${0}$, of our tree graph ${G}$, and consider all its factor node neighbors, labelled ${a,b,c}$. Since removing ${x_0}$ results in three disconnected components of ${G}$, which can be labelled by ${a,b,c}$ as well, the marginal ${\mu_0(x_0)}$ factors as follows:

$\displaystyle \mu_0(x_0) \sim \sum_{x_{V \setminus 0}} \prod_{e \in F_{a \rightarrow 0}} \psi_e(x_e) \prod_{e \in F_{b \rightarrow 0}} \psi_e(x_e)\prod_{e \in F_{c \rightarrow 0}} \psi_e(x_e) \\ \sim [\sum_{x_{v_{a\rightarrow 0} \setminus 0}} \prod_{e \in F_{a \rightarrow 0}} \psi_e(x_e)] [\sum_{x_{v_{b\rightarrow 0} \setminus 0}}\prod_{e \in F_{b \rightarrow 0}} \psi_e(x_e)][\sum_{x_{v_{c\rightarrow 0} \setminus 0}}\prod_{e \in F_{c \rightarrow 0}} \psi_e(x_e)]\\ = \mu_{a \rightarrow 0} (x_0) \mu_{b \rightarrow 0}(x_0) \mu_{c \rightarrow 0}(x_0) \ \ \ \ \ (9)$

where ${F_{a \rightarrow 0}}$ denotes the set of edges in the component of ${a}$, with the vertex ${0}$ included, and ${v_{a \rightarrow 0}}$ the corresponding set of vertices. The last equation defines the partial marginals ${\mu_{a \rightarrow 0}(x_0)}$, etc. They will be called beliefs in the case of general graphs, for reasons given below. Next we can perform a similar trick on ${\mu_{a \rightarrow 0}(x_0)}$ to express it in terms of partial marginals of the variables neighboring ${a}$. Say ${a}$ is further connected to two other variable nodes ${1,2}$; then

$\displaystyle \mu_{a \rightarrow 0}(x_0) \sim \sum_{x_{v_{a \rightarrow 0} \setminus 0}} \psi_a(x_a) \prod_{e \in F_{1 \rightarrow a}} \psi_e(x_e) \prod_{e \in F_{2 \rightarrow a}} \psi_e(x_e)\\ \sim \sum_{x_1,x_2} \psi_a(x_0,x_1,x_2) [\sum_{x_{v_{1 \rightarrow a} \setminus 1}} \prod_{e \in F_{1 \rightarrow a}} \psi_e(x_e)] [\sum_{x_{v_{2 \rightarrow a} \setminus 2}} \prod_{e \in F_{2 \rightarrow a}} \psi_e(x_e)]\\ =\sum_{x_1,x_2} \psi_a(x_0,x_1,x_2) \mu_{1 \rightarrow a}(x_1)\mu_{2 \rightarrow a}(x_2), \ \ \ \ \ (10)$

where the last equality defines ${\mu_{1 \rightarrow a}(x_1)}$ and ${\mu_{2 \rightarrow a}(x_2)}$. More abstractly, we have a system of pairs of equations relating factor nodes and variable nodes, known as belief propagation:

$\displaystyle \nu_{j \rightarrow a} (x_j) \sim \prod_{b \in \partial j \setminus a} \hat{\nu}_{b \rightarrow j}(x_j)\\ \hat{\nu}_{a \rightarrow j}(x_j) \sim \sum_{x_{\partial a \setminus j}} \psi_a(x_{\partial a}) \prod_{k \in \partial a \setminus j} \nu_{k \rightarrow a}(x_k) \ \ \ \ \ (11)$

where ${\nu}$ corresponds to the partial marginals sent by the variable nodes and ${\hat{\nu}}$ to those sent by the factor nodes. On trees, one easily sees that the above coupled system of nonlinear equations (watch for the normalization factor, since ${\sim}$ means proportional to, or equal up to the partition function) admits a unique solution: this can be shown by an inductive argument, starting from the leaves. Since the leaves have no descendants, their partial marginals are uniquely determined: they equal ${1}$, the empty product, if they are variable nodes; if they are factor nodes, the right-hand side of the second equation is a deterministic sum depending on no unknowns. Once the outer layer is determined, we can treat its nodes in the same way as the ${\psi}$‘s, and view the inner layer as the new leaves. The solution of the system above gives a way to compute the marginals of the variables attached to all nodes, consistent with the graphical structure. In the case of general graphs, the ${\nu}$ and ${\hat{\nu}}$ have no probabilistic interpretation, hence are called beliefs rather than partial marginals. One can also introduce a dynamical version of the system, starting with some random guess of the beliefs and feeding it into the following system:

$\displaystyle \nu_{j \rightarrow a}^{(t+1)} (x_j) \sim \prod_{b \in \partial j \setminus a} \hat{\nu}^{(t)}_{b \rightarrow j}(x_j)\\ \hat{\nu}^{(t)}_{a \rightarrow j}(x_j) \sim \sum_{x_{\partial a \setminus j}} \psi_a(x_{\partial a}) \prod_{k \in \partial a \setminus j} \nu^{(t)}_{k \rightarrow a}(x_k) \ \ \ \ \ (12)$

as ${\nu^{(0)}}$ and ${\hat{\nu}^{(0)}}$. This clearly converges on trees, because once the leaves are updated they no longer change in subsequent iterations, becoming constant factors just as described in the previous paragraph. In fact convergence is exact in ${n}$ steps, where ${n}$ is the depth of the tree (the distance from the chosen root to the farthest leaf). For general graphs, there is not even a unique fixed point of the operator ${T_{BP}:(\nu^{(t)},\hat{\nu}^{(t)}) \rightarrow (\nu^{(t+1)},\hat{\nu}^{(t+1)})}$, let alone convergence. But under very mild conditions, namely ${\psi_a > 0}$ for all ${a \in F}$, one can show that fixed points of ${T_{BP}}$ do exist (though they are not unique in general). The key tool is Brouwer’s fixed point theorem:
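To make the dynamics (12) concrete, here is a minimal sketch in Python, on a toy chain ${x_0 - a - x_1 - b - x_2}$ with random positive pairwise factors (the chain, the variable names such as `nu['1a']`, and the use of numpy are all my own illustrative choices, not anything from the lecture). It runs the parallel update of eq. (12) from a random initial guess, checks that the messages freeze after a few sweeps, and checks the resulting marginal against brute-force summation:

```python
import numpy as np

rng = np.random.default_rng(0)
q = 2  # alphabet size of each variable

# toy chain x0 - a - x1 - b - x2 with strictly positive pairwise factors
psi_a = rng.random((q, q)) + 0.1  # psi_a(x0, x1)
psi_b = rng.random((q, q)) + 0.1  # psi_b(x1, x2)

def norm(v):
    return v / v.sum()

# random initial guess nu^{(0)} for the variable-to-factor messages
nu = {k: norm(rng.random(q)) for k in ('0a', '1a', '1b', '2b')}

history = []
for t in range(6):  # parallel updates of eq. (12)
    hat = {  # factor-to-variable messages
        'a0': norm(psi_a @ nu['1a']),    # sum over x1
        'a1': norm(psi_a.T @ nu['0a']),  # sum over x0
        'b1': norm(psi_b @ nu['2b']),    # sum over x2
        'b2': norm(psi_b.T @ nu['1b']),  # sum over x1
    }
    nu = {
        '0a': np.ones(q) / q,  # leaf variable: empty product
        '1a': hat['b1'],       # the only factor neighbor of x1 besides a is b
        '1b': hat['a1'],
        '2b': np.ones(q) / q,
    }
    history.append(np.concatenate([nu[k] for k in sorted(nu)]))

# messages are frozen well before the last sweep, as the chain is shallow
assert np.allclose(history[-1], history[-2])

# BP marginal at the middle variable agrees with brute-force summation
mu1 = norm(hat['a1'] * hat['b1'])
joint = np.einsum('ij,jk->ijk', psi_a, psi_b)  # full unnormalized joint
mu1_bf = norm(joint.sum(axis=(0, 2)))
assert np.allclose(mu1, mu1_bf)
```

On this chain the messages stabilize after about as many sweeps as the depth, in line with the exact-convergence claim above.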

Theorem 3 If ${f:[0,1]^n \rightarrow [0,1]^n}$ is a continuous map, then it has a fixed point.

Andrea gave a vivid demonstration of the theorem, by crumpling (without tearing) a piece of paper into a ball, and observing that some part of the paper must have stayed in the same 2D position as before the crumpling. The condition on the ${\psi_a}$‘s is precisely what ensures continuity of ${T_{BP}}$. Apparently BP stands not only for belief propagation, but also for Bethe–Peierls. What a historic coincidence.
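To close the loop on the leaf-to-root computation of eqs. (9)–(11), here is a minimal Python sketch on a toy tree matching the setup of eq. (10): a ternary factor ${a}$ touches ${(x_0,x_1,x_2)}$, with pairwise factors ${c,d}$ hanging below ${x_1,x_2}$ down to leaves ${x_3,x_4}$ (the tree, the variable names, and the use of numpy are my own illustrative choices). It propagates the empty-product messages from the leaves up to the root and checks the resulting marginal against brute force:

```python
import numpy as np

rng = np.random.default_rng(1)
q = 2  # alphabet size of each variable

# toy tree: factor a touches (x0, x1, x2); pairwise factors c: (x1, x3)
# and d: (x2, x4) hang below, so x3 and x4 are leaf variables
psi_a = rng.random((q, q, q))  # psi_a(x0, x1, x2)
psi_c = rng.random((q, q))     # psi_c(x1, x3)
psi_d = rng.random((q, q))     # psi_d(x2, x4)

# leaf variables send the empty product, nu = 1 (first line of (11))
nu_3c = np.ones(q)
nu_4d = np.ones(q)

# factor-to-variable messages (second line of (11))
hat_c1 = psi_c @ nu_3c  # sums over x3
hat_d2 = psi_d @ nu_4d  # sums over x4

# variable-to-factor messages: c (resp. d) is the only other neighbor
nu_1a = hat_c1
nu_2a = hat_d2

# message into the root, i.e. eq. (10)
mu_a0 = np.einsum('ijk,j,k->i', psi_a, nu_1a, nu_2a)
mu0 = mu_a0 / mu_a0.sum()  # root marginal; (9) with a single factor neighbor

# brute-force check against the full joint distribution
joint = np.einsum('ijk,jl,km->ijklm', psi_a, psi_c, psi_d)
mu0_bf = joint.sum(axis=(1, 2, 3, 4))
mu0_bf = mu0_bf / mu0_bf.sum()
assert np.allclose(mu0, mu0_bf)
```

Each upward message only ever sums over the subtree below it, which is exactly where the complexity reduction promised at the start of the lecture comes from.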