A geometric view of optimal transportation and generative model (Q669014)

From MaRDI portal
scientific article

    20 March 2019
The reviewed article discusses some interesting connections between optimal transportation, convex geometry, and the generative adversarial networks (GANs) of machine learning, giving a brief introduction to each of the three topics and the relations between them, followed by a series of numerical examples offered as proof of concept. According to their summary of the Wasserstein GAN approach of \textit{M. Arjovsky}, \textit{S. Chintala} and \textit{L. Bottou} [``Wasserstein generative adversarial networks'', in: International Conference on Machine Learning. 214--223 (2017)], the generator's problem is to choose, from a (low) $m$-dimensional family of distributions $\{\mu_\theta\}_{\theta \in \mathbb{R}^m}$ on a (high) $n$-dimensional $\mathbb{R}^n$, the one which best approximates a given distribution $\nu$ (typically concentrated on a submanifold $\Sigma^k \subset \mathbb{R}^n$ with $m\ll k \ll n$). Here \textit{best} is measured with respect to a transportation cost $c$ on $\mathbb{R}^n \times \mathbb{R}^n$, such as the (Kantorovich-Rubinstein-)Wasserstein $L^p$ metric $W_c$ corresponding to $c(x,y)=|x-y|^p$. The role of the discriminator (or adversary) is to compute $W_c(\mu_\theta,\nu)$ for each $\theta$ the generator suggests, or at least to approximate it from below. The generator has a prior $d\zeta(\theta)$ on $\mathbb{R}^m$, and the comparison can be done either in the high- or the low-dimensional space (which are partly identified using a decoding map $g_\theta:\mathbb{R}^m \longrightarrow \mathbb{R}^n$ and encoding map $g_\theta^{-1}$) -- a potentially confusing point clarified only at the conclusion of the article.

As \textit{L. A. Caffarelli} [Lect. Notes Pure Appl. Math. 177, 29--35 (1996; Zbl 0883.49030)], \textit{W. Gangbo} and the reviewer [C. R. Acad. Sci., Paris, Sér. I 321, No. 12, 1653--1658 (1995; Zbl 0858.49002)] showed, when $c(x,y) = h(x-y)$ is a strictly convex function of $x-y \in \mathbb{R}^n$, the minimum defining $W_c(\mu_\theta,\nu)$ is uniquely attained, and can be inferred from the corresponding Kantorovich (linear programming dual) potential $\xi(y)$ on $\mathbb{R}^n$. This appears as Theorem 3.7 of the present work, which the authors highlight in the introduction and interpret to mean that the generator's strategy is determined by that of the discriminator. While this is true for each fixed $\theta \in \mathbb{R}^m$, it does not seem to determine the optimal $\theta$ (unless the $\xi$ corresponding to the optimal $\theta$ is known). When $\nu$ has finite support, like an empirical measure, and $p=2$, they go on to relate this problem to Alexandrov's unbounded version of the Minkowski problem, which is to construct a piecewise affine convex graph by prescribing the area and slope of each of its facets.

Several additional references would help to clarify the relation of the present paper to previous work. For example, Theorem 4.3, here attributed to \textit{X. Guo} et al. [``Relaxed Wasserstein with applications to GANs'', Preprint, \url{arXiv:1705.07164}], is a rediscovery of a variational approach to the latter problem by \textit{F. Aurenhammer} et al. [Algorithmica 20, No. 1, 61--76 (1998; Zbl 0895.68135)]. This corresponds to the $p=2$ case of the reviewer's conjecture with Gangbo (Example 1.6 of [Gangbo and the reviewer, loc. cit.]), which is implied by the convex dependence of the dual maximization on $\xi$ observed by \textit{G. Carlier} and \textit{I. Ekeland} [Econ. Theory 42, No. 2, 397--418 (2010; Zbl 1183.91112)] for general $p>1$, and which was rigorously established under additional regularity assumptions of Ma, Trudinger and Wang type by Kitagawa, Mérigot and Thibert (Theorem 1.3 of \textit{J. Kitagawa} et al. [``Convergence of a Newton algorithm for semi-discrete optimal transport'', Preprint, \url{arXiv:1603.05579}]). In Section 4.3, the authors propose a Newton's method approach to solving the $p=2$ problem, which appears closely related to a variant of the damped Newton's method implemented by Mérigot, Meyron and Thibert (Theorem 14 of [\textit{Q. Mérigot} et al., SIAM J. Imaging Sci. 11, No. 2, 1363--1389 (2018; Zbl 1401.49068)]), based on earlier work of Mérigot and of Lévy cited therein.

The article contains a few obvious typos, such as the absence of Figure 2, the claim that Brenier's potential is real-valued in Theorem 3.5 (it may need to take extended real values), and the assertion that Monge's problem [\textit{G. Monge}, Mémoire sur la théorie des déblais et des remblais. Histoire de l'Académie Royale des Sciences de Paris, avec les Mémoires de Mathématique et de Physique pour la même année. 666--704 (1781)] was first studied by \textit{N. Bonnotte} [SIAM J. Math. Anal. 45, No. 1, 64--87 (2013; Zbl 1263.49059)], as well as misattributions such as ``the sink-horn method has been introduced by \textit{M. Cuturi} and \textit{G. Peyré} [SIAM Rev. 60, No. 4, 941--965 (2018; Zbl 1402.49037)]'', which might more appropriately read ``the Sinkhorn method detailed along with its history in Remark 4.4 of [loc. cit.]''. Still, it is a stimulating contribution to the literature which may well lead to further developments.
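For readers unfamiliar with the semi-discrete setting discussed above, the variational formulation of Aurenhammer et al. (and of Theorem 4.3) can be stated concretely; the following is the standard concave maximization for $c(x,y)=|x-y|^2$ and a finitely supported target $\nu = \sum_{i=1}^N \nu_i \delta_{y_i}$, summarized here by the editor rather than quoted from the paper under review:
\[
\max_{\psi \in \mathbb{R}^N} \Phi(\psi), \qquad
\Phi(\psi) = \sum_{i=1}^N \nu_i \psi_i
  + \int_{\mathbb{R}^n} \min_{1 \le i \le N}\bigl(|x-y_i|^2 - \psi_i\bigr)\, d\mu(x).
\]
Being a sum of a linear function and an integral of minima of affine functions of $\psi$, $\Phi$ is concave, with $\partial \Phi/\partial \psi_i = \nu_i - \mu(\mathrm{Lag}_i(\psi))$, where $\mathrm{Lag}_i(\psi) = \{x : |x-y_i|^2 - \psi_i \le |x-y_j|^2 - \psi_j \ \forall j\}$ is the $i$-th Laguerre (power) cell. Optimality thus forces $\mu(\mathrm{Lag}_i(\psi)) = \nu_i$ for every $i$, which is precisely the prescribed-facet-area condition of the Alexandrov/Minkowski problem, and the system of equations to which the damped Newton's method is applied.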
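The Sinkhorn method mentioned in the closing remarks admits a very short implementation, which may help readers place it relative to the exact methods above. The sketch below is a generic entropically regularized scaling iteration (not code from the paper under review); the point clouds, regularization strength `eps`, and iteration count are arbitrary illustrative choices.

```python
import math

def sinkhorn(mu, nu, C, eps=0.1, n_iter=5000):
    """Entropically regularized optimal transport via Sinkhorn scaling.

    mu, nu : source and target probability vectors (lists of floats)
    C      : cost matrix, C[i][j] = c(x_i, y_j)
    Returns the transport plan P and the transport cost <P, C>.
    """
    n, m = len(mu), len(nu)
    # Gibbs kernel K_ij = exp(-C_ij / eps)
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # alternately rescale rows and columns to match the marginals
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    P = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    cost = sum(P[i][j] * C[i][j] for i in range(n) for j in range(m))
    return P, cost

# Toy example: uniform measures on two point clouds in R, quadratic cost.
x = [i / 4 for i in range(5)]           # source support
y = [0.2 + i / 4 for i in range(5)]     # target support, shifted by 0.2
C = [[(xi - yj) ** 2 for yj in y] for xi in x]
mu = [0.2] * 5
nu = [0.2] * 5
P, cost = sinkhorn(mu, nu, C)
```

Note that `cost` approximates (with an entropic bias controlled by `eps`) the squared Wasserstein distance $W_2^2$ between the two empirical measures, rather than computing it exactly as the semi-discrete Newton schemes do.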
    optimal mass transportation
    Monge-Ampère
    GAN
    Wasserstein distance