No Free Lunch and Neural Network Architecture

Fri 24 May 2019 Gregory J Stein
Part 7 of AI Perspectives

Summary: Machine learning must always balance flexibility and prior assumptions about the data. In neural networks, the network architecture codifies these prior assumptions, yet the precise relationship between them is opaque. Deep learning solutions are therefore difficult to build without a lot of trial and error, and neural nets are far from an out-of-the-box solution for most applications.

Since I entered the machine learning community, I have frequently found myself engaging in conversation with researchers or startup-types from other communities about how they can get the most out of their data and, more often than not, we end up taking about neural networks. I get it: the allure is strong. The ability of a neural network to learn complex patterns from massive amounts of data has enabled computers to challenge (and even outperform) humans on general tasks like object detection and games like Go. But reading about the successes of these newer machine learning techniques rarely makes clear one important point:

Nothing is ever free.

When training a neural network — or any machine learning system — a tradeoff is always made between flexibility in the sorts of things the system can learn and the amount of data necessary to train these systems. Yet in practice, the precise nature of this tradeoff is opaque. That a neural network is capable of learning complex concepts — like what an object looks like from a bunch of images — means that training it effectively requires a large amount of data to convincingly rule-out other interpretations of the data and reject the impact of noise. On the face of it, this statement is perhaps obvious: of course it requires more work/data/effort to extract meaning out of more complex problems. Yet, perhaps counterintuitive to the thinking of many machine learning outsiders, the way in which these systems are designed and the relationship between the many complex hyperparameters that define them has a profound impact on how well the system performs.

Noise takes many forms. In the case of object detection, noise might include the color of the object: I should be able to identify that a car is a car regardless of its color.

On its own, data cannot provide all the answers. The goal of learning is to extract more general meaning from a limited amount of data, discovering properties about the data that allow new data to be understood. Generalizing to unseen data requires making a decision, whether explicitly or implicitly, about the nature of the data beyond what is provided in the dataset. Consider a very simple dataset consisting of 10 points evenly spaced along the x-axis:

Suppose your goal was to fit a curve to this data, what might you do? Well, from inspection, the "curve" looks pretty much like a line, so you fit a line to the points and hand off the resulting fit to another system. But what if you were to collect more data and you discover that, instead of falling close to a line (left), the new data has a rather dramatic periodic component (right):

You might (justifiably) claim that more data is necessary, and insist that a separate test set be provided so that you can choose between these different hypotheses. But at what point should you stop? Collecting data ad infinitum in order to rule out all possible hypothesis is impractical, so the learning algorithm must impose some assumptions about the structure of the data to be effective. Prior knowledge of the structure of the data is often essential to good machine learning performance: if you know that the data is linear with some noise added to it, you would know that fitting more complex curves is unnecessary. But if the underlying structure of the data is a mystery, as is often the case for very high-dimensional learning algorithms like a neural network, this problem is impossible to avoid.

Of course, having test data in addition to the training dataset is essential, and would mitigate the risk of a dramatic surprise.

Neural networks are vastly more complex and opaque than linear curve-fitting, exacerbating the challenges associated with constraining the space of possible solutions. The parameters that define a neural network — including the number of layers and the learning rate — do bias the system towards producing certain types of output, but often precisely how those parameters are related to the structure of the data is unclear. Unfortunately, detailed inspection of the performance of the network is largely impossible, owing to the complexity of the system being trained; most modern deep learning systems have millions of free parameters and no longer are a few coefficients enough to understand how the system performs.

There are methods for neural network inspection, but they remain an area of active research.

Relatedly, prior knowledge takes a markedly different form for more complex types of data, like images. For the dataset above, it was pretty clear from quick inspection of the data that a linear model was probably the right representation for the learning problem. If instead your dataset is billions of stock photos and your objective is to identify which of these contain household pets, it is not even clear how one should go about turning intuition into mathematical operations. In fact, this is ultimately the draw of such systems — the utility of neural network models is their ability to learn otherwise impossible-to-write-down features in the data.

Broadly, there is some well-validated intuition about the differences between vastly different neural network structures. Convolutional Neural Networks (CNNs), for example, are often used for image data, since the primary mode of their operation processes local patches of texture rather than the entire image all at once; this dramatically reduces the number of free parameters, and greatly restricts what the system can learn in a way that has proven effective for many image-based tasks. Yet whether I use six layers in my neural network instead of five, for example, is often decided by how well the system performs on the data, rather than an understanding of how that number of layers reflects the structure of the data.

I conclude with a word of caution: neural networks are far from an out-of-the-box solution for most learning problems and the process required to reach near-state-of-the art performance often comes as a surprise to non-experts. The design of machine learning systems remains an active area of research, and no size fits all for neural networks. Many off-the-shelf neural networks are designed or pretrained with a number of unwritten biases or priors about the nature of the dataset. Whether or not a particular learning strategy will work on one dataset is often a judgment call until validated with experiments, making iterating over the design an intensive process that often requires "expert" intuition or guidance.

As always, I welcome your thoughts in the comments below or on Hacker News.


Liked this post? Subscribe to our RSS feed or add your email to our newsletter:


all posts in series

AI Perspectives

Reflections on the progress, promise, and impact of AI.
  • 1 Bias in AI Happens When We Optimize the Wrong Thing
    Bias is a pervasive problem in AI. Only by discouraging machine learning systems from exploiting a certain bias can we expect such a system to avoid doing so.
  • 2 For AI, translation is about more than language
    Translation is about expressing the same underlying information in different ways, and modern machine learning is making incredibly rapid progress in this space.
  • 3 Practical Guidelines for Getting Started with Machine Learning
    The potential advantages of AI are many, and using machine learning to accelerate your business may outweigh potential pitfalls. If you are looking to use machine learning tools, here are a few guidelines you should keep in mind.
  • 4 DeepMind's AlphaZero and The Real World
    Using DeepMind's AlphaZero AI to solve real problems will require a change in the way computers represent and think about the world. In this post, we discuss how abstract models of the world can be used for better AI decision making and discuss recent work of ours that proposes such a model for the task of navigation.
  • 5 Massive Datasets and Generalization in ML
    Big, publically available datasets are great. Yet many practitioners who seek to use models pretrained on this data need to ask themselves how informative the data is likely to be for their purposes. Dataset bias and task specificity are important factors to keep in mind.
  • 6 Proxy metrics are everywhere in Machine Learning
    Many machine learning systems are optimized using metrics that don't perfectly match the stated goals of the system. These so-called "proxy metrics" are incredibly useful, but must be used with caution.
  • 7 No Free Lunch and Neural Network Architecture
    Machine learning must always balance flexibility and prior assumptions about the data. In neural networks, the network architecture codifies these prior assumptions, yet the precise relationship between them is opaque. Deep learning solutions are therefore difficult to build without a lot of trial and error, and neural nets are far from an out-of-the-box solution for most applications.


+ Show Comments From Disqus