- show how critic weight clipping can lead to pathological behavior
- propose WGAN with gradient penalty
- demonstrate stable training of many difficult GAN architectures with default settings, performance improvements over weight clipping
- loss $$ \min_G\max_{D\in \mathcal D}\mathbb E_{x\sim \mathbb P_r}[D(x)]-\mathbb E_{\hat x\sim \mathbb P_g}[D(\hat x)] $$
where
$\mathcal D$ is the set of 1-Lipschitz functions
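A minimal sketch of computing this objective, using a hypothetical linear critic and Gaussian stand-ins for $\mathbb P_r$ and $\mathbb P_g$ (all names here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)  # parameters of a toy linear critic

def critic(x):
    # D(x) = w . x — a stand-in for a neural-network critic
    return x @ w

real = rng.normal(loc=1.0, size=(64, 3))   # samples standing in for P_r
fake = rng.normal(loc=-1.0, size=(64, 3))  # samples standing in for P_g

# The critic maximizes E_{x~P_r}[D(x)] - E_{x_hat~P_g}[D(x_hat)];
# in practice one minimizes the negated quantity with gradient descent.
critic_objective = critic(real).mean() - critic(fake).mean()
critic_loss = -critic_objective
```

The generator, for its part, minimizes $-\mathbb E_{\hat x\sim \mathbb P_g}[D(\hat x)]$, i.e. it tries to raise the critic's score on its samples.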
- To enforce the Lipschitz constraint on the critic, WGAN proposes clipping the critic's weights to lie within a compact space $[-c, c]$. The set of functions satisfying this constraint is a subset of the $k$-Lipschitz functions for some $k$ that depends on $c$ and the critic architecture.
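Weight clipping amounts to projecting every parameter into the box $[-c, c]$ after each critic update. A minimal numpy sketch (the parameter shapes and the helper `clip_weights` are illustrative; the paper's experiments use $c = 0.01$ as the WGAN default):

```python
import numpy as np

rng = np.random.default_rng(1)
c = 0.01  # clipping threshold; 0.01 is the default used in the WGAN paper

# Hypothetical critic parameters: one weight matrix and one bias vector.
params = [rng.normal(scale=0.1, size=(4, 4)), rng.normal(scale=0.1, size=(4,))]

def clip_weights(params, c):
    """Project every parameter element-wise into the compact box [-c, c]."""
    return [np.clip(p, -c, c) for p in params]

# Applied after every critic gradient step.
params = clip_weights(params, c)
```

It is this crude projection, rather than the Wasserstein objective itself, that the paper identifies as the source of pathological critic behavior, motivating the gradient penalty as a softer alternative.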