My Favorite Deep Learning Papers of 2017

Wed 27 Dec 2017 Gregory J Stein

Even with so many deep learning papers coming out this year, there were a few publications I felt managed to rise above the rest. Here are the five papers that impacted my mental models the most over the last year. For each, I state the "goal" of the paper, briefly summarize the work, and explain why I found it so interesting.

Coolest Visuals: translating between unpaired sets of images with CycleGAN

Rather than describe exactly what the authors did here, I'll let some of the incredible results stand on their own:

These stunning images are from the CycleGAN paper, in which the authors learn a pair of translation networks capable of translating between unpaired sets of images.

The authors begin with two sets of images from different domains, e.g. of horses and zebras, and learn two translation networks: one that translates horse images to zebra images and another that translates zebra images to horse images. Each translator performs a sort of style transfer, but rather than targeting the style of a single image, the network discovers the aggregate style of a set of images.

The translation networks are trained as a pair of Generative Adversarial Networks, each trying to fool a discriminator into believing that their "translated" images are authentic. An additional "cycle consistency loss" is introduced that encourages an image to remain unchanged after being passed through both translation networks (i.e. forward and backward).
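To make the cycle consistency idea concrete, here is a minimal sketch in plain Python. It assumes images are flat lists of pixel values and uses an L1 distance; the toy "translators" (`to_zebra`, `to_horse`, `biased_to_horse`) are hypothetical stand-ins for the learned networks, not anything from the paper's code.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized flat images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(x, y, g, f):
    """Penalize images that change after a round trip through both translators.

    g maps domain X -> Y (e.g. horse -> zebra), f maps Y -> X.
    """
    forward = l1_distance(f(g(x)), x)   # x -> g -> f should give back x
    backward = l1_distance(g(f(y)), y)  # y -> f -> g should give back y
    return forward + backward


# Hypothetical toy translators, used only for illustration.
def to_zebra(img):
    return [2.0 * p for p in img]


def to_horse(img):
    return [0.5 * p for p in img]


def biased_to_horse(img):
    return [0.5 * p + 0.1 for p in img]


horse = [0.1, 0.4, 0.8]
zebra = [0.2, 0.6, 1.0]

consistent = cycle_consistency_loss(horse, zebra, to_zebra, to_horse)
inconsistent = cycle_consistency_loss(horse, zebra, to_zebra, biased_to_horse)
print(consistent, inconsistent)
```

The first pair of translators are exact inverses, so the round trip costs nothing; the biased pair is penalized. In the full method this term is added to the two adversarial losses rather than used on its own.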

We used the CycleGAN approach to generate realistic synthetic training data for a recent paper of ours and the results were impressive:

The visuals for this paper are stunning, and I highly recommend taking a look at the GitHub project page for some additional examples. In particular, I was interested in this paper because, unlike many previous approaches, it learns to translate between unpaired sets of images, opening the door to applications for which matching image pairs may not exist, or may be difficult to obtain. Furthermore, the code is extremely easy to use and experiment with, demonstrating the robustness of the approach and the quality of the implementation.

Most Elegant: better neural network training using the Wasserstein distance

  • Title: Wasserstein GAN
  • Authors: Martin Arjovsky, Soumith Chintala, Léon Bottou (from the Courant Institute of Mathematical Sciences and Facebook AI Research)
  • Goal: Use a better objective function for more stable training of GANs.

This paper proposes training Generative Adversarial Networks using a slightly different objective function. The newly proposed objective function is much more stable to train than that of a standard GAN, since it avoids vanishing gradients during training:

This figure, taken from the Wasserstein GAN paper, shows how the proposed WGAN objective avoids the vanishing gradients that appear in a standard GAN.

Using this modified objective, the authors also avoid a problem known as mode collapse, in which a standard GAN produces samples from only a subset of possible outputs. For example, if a GAN is being trained to produce handwritten digits 4 and 6, the GAN might produce only 4's and be unable to escape that local minimum during training. By eliminating the vanishing gradients in the training objective, the so-called Wasserstein GAN manages to avoid this issue.

In fact, the authors claim that "In no experiment did we see evidence of mode collapse for the WGAN algorithm."

The paper is wonderfully self-contained: the authors (1) motivate a simple idea, (2) show mathematically why it should improve upon the current state of the art and (3) have an impressive results section demonstrating its effectiveness. Furthermore, the changes the authors propose are easy to implement in nearly all popular deep learning frameworks, making it practical to adopt the proposed changes.
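To give a sense of how small the change is, here is a rough sketch of the pieces as I understand them from the paper: the discriminator (called a "critic") outputs unbounded scores instead of probabilities, the losses are plain means with no logarithm, and the critic's weights are clipped to a small range after each update. The function names are my own, and the clipping value of 0.01 is the default I recall from the paper, so treat both as assumptions.

```python
def mean(values):
    return sum(values) / len(values)


def critic_loss(real_scores, fake_scores):
    """The critic wants high scores on real samples and low scores on fakes.

    It maximizes mean(real) - mean(fake), so the loss to minimize is the negation.
    """
    return mean(fake_scores) - mean(real_scores)


def generator_loss(fake_scores):
    """The generator wants the critic to score its samples highly."""
    return -mean(fake_scores)


def clip_weights(weights, c=0.01):
    """Clamp every critic weight to [-c, c] after each critic update."""
    return [max(-c, min(c, w)) for w in weights]


print(critic_loss([1.0, 3.0], [0.0, 1.0]))
print(generator_loss([0.0, 1.0]))
print(clip_weights([0.5, -0.002, -3.0]))
```

In a real framework these would operate on tensors, and the clipping would be applied to the critic's parameters in place, but the structure is the same.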

Even as we're constantly making progress towards better and better neural networks, it's worth remembering that there is still opportunity for simple insights to make a big difference.

Most Useful: unsupervised simulated training data refinement using GANs

Collecting real-world data can be difficult and time-consuming. As such, many researchers will frequently use simulation tools, which are capable of generating nearly infinite amounts of labeled training data. However, most simulated data is not sufficiently realistic for training deep learning systems that operate on real-world data.

Tools like OpenAI Gym are particularly useful for training data-hungry deep reinforcement learning agents.

To overcome this limitation, the paper uses a Generative Adversarial Network (GAN) to refine the labeled simulated images using unlabeled real-world images. They train a "refinement network" to fool a discriminative classifier that is trained to differentiate between the refined simulated images and the real images. As the refinement network and the classifier are trained in tandem, the refined simulated images begin to look impressively realistic:

This figure, from Shrivastava et al., shows the basic idea behind their "refinement network," in which labeled simulated images are made more realistic using unlabeled real-world images via a GAN. In many cases, the refined synthetic images were indistinguishable from real-world images.
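The tug-of-war between the refinement network and the classifier can be sketched with two loss functions. This is only the generic adversarial part, written in plain Python with hypothetical names and made-up classifier outputs; the paper's full objective includes additional terms that this sketch leaves out.

```python
import math


def mean(values):
    return sum(values) / len(values)


def discriminator_loss(p_real_given_real, p_real_given_refined):
    """Cross-entropy for a classifier that should answer 'real' for real
    images and 'not real' for refined simulated images."""
    real_term = mean([-math.log(p) for p in p_real_given_real])
    refined_term = mean([-math.log(1.0 - p) for p in p_real_given_refined])
    return real_term + refined_term


def refiner_adversarial_loss(p_real_given_refined):
    """The refiner is rewarded when the classifier believes its outputs are real."""
    return mean([-math.log(p) for p in p_real_given_refined])


# Hypothetical classifier outputs: probability that each image is real.
on_real_images = [0.9, 0.9]
unconvincing_refinements = [0.1, 0.2]
convincing_refinements = [0.8, 0.9]

print(refiner_adversarial_loss(unconvincing_refinements))
print(refiner_adversarial_loss(convincing_refinements))
print(discriminator_loss(on_real_images, convincing_refinements))
```

As the refined images become more convincing, the refiner's loss falls while the classifier's loss rises, which is what pushes the two networks to improve in tandem. No labels on the real images are needed: the only supervision is "real" versus "refined."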

I was immediately interested in this paper when it came out, since it presented the first practical approach to bridging the gap between simulated and real-world data. The key takeaway here is that the algorithm is unsupervised, which means that a user does not need to hand-label the real-world data. For deep learning applications, data is king, yet most academic labs, like mine, don't have the resources to generate the volume of data necessary to rapidly tackle new research domains: if a public dataset does not exist for the problem you're trying to solve, you're stuck collecting and labeling that data yourself. The takeaway message of this paper is that, so long as you have a simulator for the problem you're trying to solve, you should be able to generate the training data you need.

Robotics, in particular, presents an interesting challenge: collecting and labeling data for domain-specific applications requires resources that may not be available to the academic community, where much of the research is being done.

Most Impressive: Google's Go-playing AI learns from scratch

  • Title: Mastering the game of Go without human knowledge
  • Authors: David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel & Demis Hassabis (from DeepMind)
  • Goal: Learning to play the game of Go without any human examples

No best-of-2017 list would be complete without acknowledging the impressive accomplishments of Google's DeepMind over the past year, particularly those related to AlphaGo, their Go-playing AI. I won't spend much time motivating this work here, since those of you who've made it this far are likely familiar with the result from 2016 and the phenomenal paper describing how they architected the system. However, that system was trained using expert-level human play as a starting point.

The recent AlphaGo Zero paper avoids incorporating any human knowledge or gameplay: it trains exclusively through "self-play." This is made possible by an improved reinforcement learning training procedure, in which the policy is updated as forward simulations of the game occur. The neural network used to guide search improves during play, making training much faster. AlphaGo Zero even surpasses the performance of AlphaGo Lee, which bested Lee Sedol in 2016, after only about 40 hours of play time.

This graph, taken from the AlphaGo Zero paper, shows the performance of AlphaGo Zero as a function of training time on a cluster of Google's Tensor Processing Units. After a few weeks of training, AlphaGo Zero outperforms all other Go-playing agents.

While my interest in this paper is mostly at an engineering level, I'm also encouraged by the hybrid classical and deep learning approach taken by AlphaGo, in which the addition of a Monte Carlo Tree Search allows the system to outperform a monolithic neural network. As someone who studies robotics, I'm encouraged by such combined approaches: use a classical algorithm as a backbone for decision making and use machine learning to improve performance or overcome computational limitations. This paper and the 2016 AlphaGo paper are also both excellent to read; both are well written and filled with interesting technical details and insights. If for no other reason, these papers are worth a detailed read.

Most Thought Provoking: Deep Image Prior

  • Title: Deep Image Prior
  • Authors: Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky (from Skolkovo Institute of Science and Technology and University of Oxford)
  • Goal: Understand the prior our neural network models impart on our experiments.

To finish my list for 2017, I have an intriguing paper that had me and my colleagues talking for days. Rather than training a deep neural network with a ton of data, as is pretty standard these days, the authors of this paper wanted to explore how using the neural network itself as a prior could help us tackle some popular image processing tasks. They begin with an untrained neural network and, in the words of the authors, "instead of searching for the answer in the image space we now search for it in the space of neural network's parameters," which avoids pretraining the neural network on a large dataset.

This image, adapted from the Deep Image Prior paper, shows the effect of applying their technique to remove JPEG compression artifacts. As the algorithm iterates, it will eventually overfit to the input, but first discovers a more natural-looking image without any artifacts; the structure of the network is such that it's easier to find a natural-looking image than a corrupted one. It's worth checking out the paper's project page for more examples.
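The "fit the natural image first, overfit later" behavior can be reproduced in a toy setting. To be clear, the sketch below is not the paper's method: it swaps the convolutional network for a fixed linear smoothing operator (a hypothetical stand-in for "network structure"), uses a 1D signal instead of an image, and uses a deterministic high-frequency pattern as the corruption. It only illustrates why searching in parameter space and stopping early can act as a prior.

```python
import math

N = 16


def smooth(v):
    """Circular 3-tap moving average: the fixed, hypothetical 'structure'."""
    n = len(v)
    return [(v[i - 1] + v[i] + v[(i + 1) % n]) / 3.0 for i in range(n)]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def fit(target, steps, lr=0.25):
    """Gradient descent on ||smooth(theta) - target||^2, starting from theta = 0."""
    theta = [0.0] * len(target)
    for _ in range(steps):
        residual = [o - t for o, t in zip(smooth(theta), target)]
        # smooth() is a symmetric linear map, so it is its own transpose.
        grad = [2.0 * g for g in smooth(residual)]
        theta = [p - lr * g for p, g in zip(theta, grad)]
    return smooth(theta)


# A smooth "natural" signal plus an alternating high-frequency corruption.
clean = [math.sin(2 * math.pi * i / N) for i in range(N)]
corrupted = [c + (0.5 if i % 2 == 0 else -0.5) for i, c in enumerate(clean)]

early = fit(corrupted, steps=10)
late = fit(corrupted, steps=200)

print("corrupted vs clean:", mse(corrupted, clean))
print("early stop vs clean:", mse(early, clean))
print("late stop vs clean:", mse(late, clean))
```

Only the corrupted signal is ever used as the fitting target, yet the early output ends up closer to the clean signal than the corrupted input is, because the smooth component is far easier for this parameterization to reproduce than the high-frequency pattern. Run long enough, the fit reproduces the corruption too, mirroring the overfitting described above.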

I was immediately fascinated by this result: What does the structure of our neural networks imply about our data? How can we better understand this prior? How can we take advantage of this approach to build better network models? Of course, we as a community implicitly understand some of the constraints that our network structure imposes upon our data: it's unlikely that the CycleGAN approach would have worked as effectively if the "zebra" images were all upside down. Yet it raises some profound questions about our neural network models, and provides some interesting direction for the coming year.

I'm also pretty interested to see how the performance will vary as a function of the neural network initialization. Could we use this approach to select more sensible initialization techniques?


Obviously, this list does not aim to be comprehensive, but I welcome your thoughts and favorite papers in the comments below and on HackerNews.
