## Graphical inference model: lecture 4. Max product algorithm and group testing

The max-product algorithm is closely related to the sum-product algorithm, which uses partial marginals to compute the total marginals. But here the objective is to find the mode of a distribution, i.e., ${\arg \max _x \mu(x)}$. It does so by computing the partial max-marginals, ${M_i(y_i) = \max_x \{ \mu(x): x_i = y_i\}}$, and then taking their argmaxes. These can be approximated by a BP (belief propagation) scheme similar to the one for computing partial marginals:
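To make the definition concrete, here is a brute-force computation of the max-marginals on a toy table (the distribution and its numbers are made up purely for illustration):

```python
# Brute-force max-marginals for a toy (unnormalized) distribution mu over
# three binary variables; the numbers are hypothetical. Normalization does
# not matter for the argmax.
mu = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.20, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.15, (1, 1, 0): 0.25, (1, 1, 1): 0.15,
}

def max_marginal(i, y):
    """M_i(y) = max over x with x_i = y of mu(x)."""
    return max(p for x, p in mu.items() if x[i] == y)

# When the mode is unique, the coordinate-wise argmaxes of the
# max-marginals recover it.
mode = tuple(max((0, 1), key=lambda y: max_marginal(i, y)) for i in range(3))
assert mode == max(mu, key=mu.get)   # both give (1, 1, 0)
```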

$\displaystyle \nu_{j \rightarrow a}^{(t+1)} (x_j) \sim \prod_{b \in \partial j \setminus a} \hat{\nu}^{(t)}_{b \rightarrow j}(x_j)\\ \hat{\nu}^{(t)}_{a \rightarrow j}(x_j) \sim \max_{x_{\partial a \setminus j}} [\psi_a(x_{\partial a}) \prod_{k \in \partial a \setminus j} \nu^{(t)}_{k \rightarrow a}(x_k)] \ \ \ \ \ (1)$
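A minimal sketch of the updates (1) on a small chain factor graph ${x_1 - a - x_2 - b - x_3}$ with binary variables; the factor tables are hypothetical numbers chosen for illustration, not from the lecture:

```python
import numpy as np

# Max-product BP on the chain x1 - a - x2 - b - x3; psi_a, psi_b are
# pairwise factor tables (hypothetical numbers).
psi = {
    'a': np.array([[1.0, 0.5], [0.2, 2.0]]),   # psi_a(x1, x2)
    'b': np.array([[1.5, 0.3], [0.4, 1.0]]),   # psi_b(x2, x3)
}
factors = {'a': (0, 1), 'b': (1, 2)}           # variables of each factor

# nu: variable -> factor messages; nuhat: factor -> variable messages
nu = {(j, f): np.ones(2) for f, js in factors.items() for j in js}
nuhat = {(f, j): np.ones(2) for f, js in factors.items() for j in js}

for _ in range(10):
    # factor -> variable: maximize out the other variable of the factor
    for f, (j, k) in factors.items():
        nuhat[(f, j)] = np.max(psi[f] * nu[(k, f)][None, :], axis=1)
        nuhat[(f, k)] = np.max(psi[f] * nu[(j, f)][:, None], axis=0)
    # variable -> factor: product over the *other* adjacent factors;
    # the normalization only guards against under/overflow
    for (j, f) in list(nu):
        others = [nuhat[(g, j)] for g, js in factors.items()
                  if j in js and g != f]
        m = np.prod(others, axis=0) if others else np.ones(2)
        nu[(j, f)] = m / m.max()

# max-marginals (up to scale) and their argmaxes recover the mode
mode = [int(np.argmax(np.prod([nuhat[(f, j)] for f, js in factors.items()
                               if j in js], axis=0))) for j in range(3)]
print(mode)   # [1, 1, 1], the argmax of psi_a * psi_b
```

Since this graph is a tree, the messages reach a fixed point after a few sweeps and the coordinate-wise argmaxes give the exact mode.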

It’s called max-product because computing its iterations requires only maximum and product operations. The normalization is actually unimportant here if we are only interested in the argmax, i.e., the mode, rather than the probability of the mode.

One thing about the existence of BP fixed points we forgot to mention last time: the reason ${\psi_a > 0}$ implies continuity of the BP operator ${T_{BP}}$ is that it guarantees that the map ${\{\nu^{(t)}_{j \rightarrow a} (x_j)\} \mapsto \{\hat{\nu}^{(t)}_{a \rightarrow j} (x_j)\}}$ is always well defined. The other map ${\{\hat{\nu}^{(t)}_{a \rightarrow j} (x_j)\}\mapsto \{\nu^{(t+1)}_{j \rightarrow a} (x_j)\}}$ is always continuous on ${[0,1]^n}$ because marginal probabilities add up to ${1}$, hence we don’t have a division-by-${0}$ issue. I don’t quite understand the max-product normalizations. Max-marginals can also be computed iteratively, by first labelling all the vertices ${x_1, \ldots, x_n}$.

The last part of the lecture is spent on a particular factor graph model with binary variables on the vertices. The setting is group testing and arises from the following WWII scenario: soldiers are tested for a certain number of diseases, but in order to reduce the cost of tests, the blood of groups of, say, 100 people is tested collectively; if a pooled sample tests positive for disease A, then we want to infer a certain probability of each individual having the disease. The diseases (variable nodes) are labelled 1, 2, 3, etc., whereas the symptoms (factor nodes) are labelled a, b, c, etc. This gives a factor graph. One could perhaps use orthogonal testing to minimize the number of symptom recordings, and correlations among different diseases play a role in the factor functions ${\psi_a}$‘s. The variables represent positive or negative results, hence are binary.
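The lecture leaves the ${\psi_a}$'s unspecified; one standard choice for noiseless group testing is an OR-style factor (a pooled test is positive iff at least one individual in the pool is infected). The following sketch uses that assumption together with a hypothetical independent prior ${p}$ on each individual:

```python
import itertools
import math

# Hedged sketch of a noiseless group-testing factor: pooled test a is
# positive iff someone in the pool is infected. The OR-factor and the
# prior p = 0.05 are illustrative assumptions, not from the lecture.
p = 0.05                      # hypothetical prior probability of infection
pool = (0, 1, 2)              # individuals feeding into test a
result = 1                    # observed test outcome: positive

def psi_a(x):                 # factor: 1 if outcome is consistent, else 0
    return 1.0 if int(any(x)) == result else 0.0

# Posterior marginal of individual 0 by brute-force enumeration.
num = den = 0.0
for x in itertools.product((0, 1), repeat=len(pool)):
    w = psi_a(x) * math.prod(p if xi else 1 - p for xi in x)
    den += w
    if x[0] == 1:
        num += w
posterior = num / den         # = p / (1 - (1-p)**3) ≈ 0.351 here
print(posterior)
```

A single positive pooled test raises each member's marginal from the prior ${0.05}$ to about ${0.35}$, which is the kind of per-individual inference the model is after.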
Two main conclusions follow from the binary variables. First, under the BP flow the model is anti-monotone, meaning that if for two prior message sets ${\mu}$ and ${\nu}$ we have ${\mu_{i \rightarrow a} (x_i) \le \nu_{i \rightarrow a}(x_i)}$ for every variable-to-factor edge ${i \rightarrow a}$, then after one iteration of BP the inequalities between the posterior probabilities are reversed. This has the following intuitive explanation: if some disease ${i}$ has a high influence on the symptom ${a}$, then by reverse inference the other diseases have lower influence on ${a}$, which is what the model concludes after one iteration of BP. Second, one can show that ${T_{BP}^2}$ has at least two fixed points,

$\displaystyle \nu_1 \le \nu_2. \ \ \ \ \ (2)$

Consider the least element ${\mu_0}$ in the space of all partial marginals, i.e., the one with all influence probabilities equal to ${0}$. Iterating twice we get ${T_{BP}^2 \mu_0 \ge \mu_0}$. Since ${T_{BP}}$ reverses the ordering, ${T_{BP}^2}$ preserves it, hence we get a monotone increasing sequence that converges to ${\nu_1}$. Starting instead from the greatest element ${\mu_\infty}$, with all influence probabilities equal to ${1}$, we get a decreasing sequence that converges to ${\nu_2}$. By monotonicity we also get (2).
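The mechanism can be seen already in one dimension. Below, ${T(x) = e^{-3x}}$ is a hypothetical antitone map on ${[0,1]}$ standing in for one BP sweep (not the actual BP operator): ${T}$ reverses order, so ${T^2}$ preserves it, and iterating ${T^2}$ from the least element ${0}$ and the greatest element ${1}$ yields monotone sequences with limits ${\nu_1 \le \nu_2}$, both fixed points of ${T^2}$:

```python
import math

# T is antitone on [0, 1], so T∘T is monotone; iterate T∘T from the two
# extreme elements. (T(x) = exp(-3x) is an illustrative stand-in.)
T = lambda x: math.exp(-3.0 * x)

def iterate_T2(x, steps=200):
    seq = [x]
    for _ in range(steps):
        x = T(T(x))
        seq.append(x)
    return seq

lo = iterate_T2(0.0)   # increasing: 0, 0.0498, 0.0755, ...
hi = iterate_T2(1.0)   # decreasing: 1, 0.8612, 0.7976, ...

assert all(a <= b for a, b in zip(lo, lo[1:]))   # monotone up
assert all(a >= b for a, b in zip(hi, hi[1:]))   # monotone down
nu1, nu2 = lo[-1], hi[-1]
assert nu1 <= nu2   # here nu1 < nu2: two distinct fixed points of T∘T
```

For this particular ${T}$ the two limits are genuinely different (${\nu_1 \approx 0.135}$, ${\nu_2 \approx 0.666}$), mirroring the two fixed points ${\nu_1 \le \nu_2}$ of ${T_{BP}^2}$ in (2).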