Finding examples of "problematic" AI is relatively easy these days. Microsoft inadvertently gave rise to an unhinged, neo-Nazi Twitter bot, while an AI beauty-contest judge seems to strongly favor white women. Despite the sensational nature of these examples, they reflect a pervasive problem plaguing many modern AI systems.
Machine learning is designed to discover and exploit patterns in data so as to optimize some notion of performance. Most measures of good performance involve maximizing accuracy, yet this metric is often sufficient only in situations where near-perfect accuracy is achievable. (The notion of "perfect accuracy" is itself simplistic: if an AI system is used to screen candidates to hire, deciding how to define accuracy is already a value judgment.) When a task is difficult enough that the system is prone to errors, AI agents may fail in ways that we, as humans, consider unfair, or may take advantage of undesirable patterns in the data. Here, I discuss the issue of bias in AI and argue that great care must be taken to train a machine learning system to avoid systematic bias.
In short, if you are a business professional looking to use some form of machine learning, you need to be aware of how bias can manifest itself in practice.
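To make the point about accuracy concrete, here is a minimal sketch, using entirely synthetic toy numbers, of how a respectable overall accuracy can coexist with systematically unfair errors against one group:

```python
# Minimal sketch: overall accuracy can hide systematic, group-level errors.
# The data below is synthetic and purely illustrative.
import numpy as np

# Hypothetical screening decisions: 1 = "hire", 0 = "reject"
group = np.array(["A"] * 5 + ["B"] * 5)              # protected attribute
truth = np.array([1, 1, 0, 0, 1,  1, 1, 0, 0, 1])    # "correct" decision
pred  = np.array([1, 1, 0, 0, 1,  0, 0, 0, 0, 1])    # model's decision

print("Overall accuracy:", (pred == truth).mean())   # 0.8, looks respectable

for g in ["A", "B"]:
    mask = group == g
    # False-negative rate: qualified candidates the model rejected
    fn = ((pred == 0) & (truth == 1) & mask).sum() / ((truth == 1) & mask).sum()
    print(f"Group {g} false-negative rate: {fn:.2f}")
# Group A: 0.00, Group B: 0.67. The same headline accuracy number can
# coexist with a model that systematically screens out one group.
```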
Just over two weeks ago, NVIDIA showcased vid2vid, their new technique for video-to-video translation. Their paper shows off a number of different applications, including one particularly striking example in which the researchers automatically convert sketch-like outlines of vlog-style videos from YouTube into compellingly realistic videos of people talking to the camera. The results are incredible and really need to be seen to be believed:
When most people hear the term "translation," they think of translating natural language: e.g. translating text or speech from Mandarin to English. (Machine learning is, of course, an incredibly powerful tool for language translation; researchers from Microsoft recently achieved human-level performance translating news articles from Mandarin to English.) Today I want to reinforce the idea that translation can be applied to types of data beyond language. The vid2vid paper I mentioned above is just the latest and most visually striking example of the transformative power of AI, and modern machine learning is making incredibly rapid progress in this space.
In the remainder of this article, I will cover:
- A brief definition of "translation" in the context of AI;
- An overview of how modern machine learning systems tackle translation;
- A list of application domains and some influential research for each.
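As a small preview of the second item above, here is a minimal sketch of the encoder-decoder pattern that most modern neural translation systems build on. The model, its names, and its dimensions are toy choices of mine, not any particular paper's architecture:

```python
# A minimal sketch of the encoder-decoder idea behind neural translation
# (toy dimensions, no attention; real systems are far more elaborate).
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, hidden=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_tokens, tgt_tokens):
        # Encode the source sequence into a single summary state...
        _, state = self.encoder(self.src_emb(src_tokens))
        # ...then unroll the decoder over the target sequence from that state.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_tokens), state)
        return self.out(dec_out)          # scores over the target vocabulary

# The same encode-then-decode template applies whether the "tokens" are
# words, video frames, or audio features, which is why "translation"
# generalizes well beyond natural language.
model = TinySeq2Seq(src_vocab=1000, tgt_vocab=1000)
scores = model(torch.randint(0, 1000, (2, 7)), torch.randint(0, 1000, (2, 5)))
print(scores.shape)  # torch.Size([2, 5, 1000])
```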
As the use of machine learning systems grows beyond the academic sphere, one of the more worrying trends I have witnessed is a lack of understanding of how machine learning systems should be trained and applied. The lessons the AI community has learned over the last few decades of research are hard-earned, and it should go without saying that those who do not understand the inner workings of a machine learning tool risk having that system fail in often surprising ways. (This advice is not limited to AI: using any stochastic system without an understanding of when or how it is likely to fail comes with inherent risk.)
However, the potential advantages of AI are many, and using machine learning to accelerate your business, whether empowering employees or improving your product, may outweigh potential pitfalls. If you are looking to use machine learning tools, here are a few guidelines you should keep in mind:
- Establish clear metrics for success.
- Start with the simplest approach.
- Ask yourself if machine learning is even necessary.
- Use both a test and a validation dataset (see the sketch after this list).
- Understand and mitigate data overfitting.
- Be wary of bias in your data.
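Two of these guidelines, the separate validation/test split and the overfitting check, lend themselves to a concrete illustration. Below is a minimal sketch using scikit-learn; the dataset, model, and split ratios are placeholder choices, not recommendations:

```python
# A minimal sketch of two of the guidelines above: keep a validation set
# separate from the final test set, and watch the train/validation gap as a
# first check for overfitting. Dataset and model are placeholders.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)

# First carve off a test set that is touched only once, at the very end.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
# Then split the remainder into training and validation data.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
val_acc = model.score(X_val, y_val)
print(f"train accuracy: {train_acc:.3f}, validation accuracy: {val_acc:.3f}")
# A large gap between the two is a warning sign that the model has memorized
# its training data. Only after all model selection is finished should the
# held-out test set be used, exactly once, to report performance.
print(f"final test accuracy: {model.score(X_test, y_test):.3f}")
```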
AlphaZero is incredible. If you have yet to read DeepMind's blog post about their recent paper in Science detailing the ins and outs of their legendary game-playing AI, I recommend you do so. In it, DeepMind's scientists describe an intelligent system capable of playing the games of Go, Chess, and Shogi at superhuman levels. Even chess Grandmaster Garry Kasparov says the moves selected by the system demonstrate a "superior understanding" of the games. More remarkable still is that AlphaZero, a successor to the well-known AlphaGo and AlphaGo Zero, is trained entirely via self-play — it was able to learn good strategies without any meaningful human input.
So do these results imply that Artificial General Intelligence is soon to be a solved problem? Hardly. There is a massive difference between an artificially intelligent agent capable of playing chess and a robot that can solve practical real-world tasks, like exploring a building it has never seen before to find someone's office. AlphaZero's intelligence derives from its ability to make predictions about how a game is likely to unfold: it learns to predict which moves are better than others and uses this information to think a few moves ahead. As it learns to make increasingly accurate predictions, AlphaZero gets better at rejecting "bad moves" and is able to simulate deeper into the future. But the real world is almost immeasurably complex, and, to act in the real world, a system like AlphaZero must decide between a nearly infinite set of possible actions at every instant in time. Overcoming this limitation is not merely a matter of throwing more computational power at the problem:
Using AlphaZero to solve real problems will require a change in the way computers represent and think about the world.
Yet despite the complexity inherent in the real world, humans are still capable of making predictions about how the world behaves and using this information to make decisions. To understand how, we consider how humans learn to play games.
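To make the prediction-guided lookahead above concrete, here is a minimal sketch of a depth-limited search that uses a learned value function to prune unpromising moves. Every name here (`legal_moves`, `apply_move`, `value`) is a hypothetical stand-in for a real game interface and a real trained network; AlphaZero itself uses Monte Carlo tree search with a policy/value network, which is considerably more sophisticated:

```python
# Toy prediction-guided lookahead (negamax with learned-value pruning).
# `value(state)` is assumed to score a position for the player to move.
def lookahead_value(state, depth, legal_moves, apply_move, value, top_k=3):
    """Estimate the value of `state` (for the player to move) by searching ahead."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return value(state)                       # fall back on the learned prediction
    # Child positions are scored from the *opponent's* perspective, so a
    # promising move is one whose resulting position scores low. Keeping only
    # the top few is how a learned value lets the search reject "bad moves"
    # and spend its budget simulating deeper along a handful of lines.
    promising = sorted(moves, key=lambda m: value(apply_move(state, m)))[:top_k]
    return max(-lookahead_value(apply_move(state, m), depth - 1,
                                legal_moves, apply_move, value, top_k)
               for m in promising)
```

The better the learned `value` becomes, the safer it is to prune aggressively and search deeper, which is exactly the feedback loop the paragraph above describes. The catch is that this sketch assumes a small, enumerable set of legal moves, which real-world tasks rarely provide.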
Summary: Big, publicly available datasets are great. Yet many practitioners who seek to use models pretrained on outside data need to ask themselves how informative that data is likely to be for their purposes. "Dataset bias" and "task specificity" are important factors to keep in mind.
As I read deep learning papers these days, I am occasionally struck by the staggering amount of data some researchers are using for their experiments. While I typically work to develop representations that allow for good performance with less data, some scientists are racing full steam ahead in the opposite direction.
It was only a few years ago that we thought the ImageNet 2012 dataset, with 1.2 million labeled images, was quite large. Only six years later, researchers from Facebook AI Research (FAIR) have dwarfed ImageNet 2012 with a 3-billion-image dataset composed of hashtag-labeled images from Instagram. Google's YouTube-8M dataset, geared towards large-scale video understanding, consists of audio/visual features extracted from 350,000 hours of video. Simulation tools have also been growing to incredible sizes: InteriorNet is a simulation framework built around 22 million 3D interior scenes, hand-designed by over a thousand interior designers. And let's not forget about OpenAI, whose multiplayer-game-playing AI is trained using a massive cluster of computers so that it can play 180 years of games against itself every day.
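For most practitioners, the usual way to benefit from data at this scale is to reuse a model pretrained on it. Here is a minimal sketch of that workflow in PyTorch, with placeholder numbers of my own; whether the pretrained features actually suit your data is exactly the dataset-bias and task-specificity question raised above:

```python
# A minimal sketch of reusing a model pretrained on a large outside dataset.
import torch
import torch.nn as nn
from torchvision import models

# Load a network pretrained on ImageNet (a large, generic outside dataset).
backbone = models.resnet50(weights="DEFAULT")

# Freeze the pretrained features and train only a new head for our own task;
# the 10 classes below are a purely illustrative placeholder.
for param in backbone.parameters():
    param.requires_grad = False
backbone.fc = nn.Linear(backbone.fc.in_features, 10)

# Only the new head's parameters are handed to the optimizer.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
# How informative the frozen ImageNet features are for *your* images is an
# empirical question: this is where dataset bias and task specificity bite.
```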
Summary: Many machine learning systems are optimized using metrics that don't perfectly match the stated goals of the system. These so-called "proxy metrics" are incredibly useful, but must be used with caution.
The use of so-called proxy metrics to solve real-world machine learning problems happens with perhaps surprising regularity. (I have written before about the repercussions of optimizing a metric that doesn't perfectly align with the stated goal of the system; here, I touch upon why the use of such metrics is actually quite common.) The choice to solve an alternative metric, in which the optimization target differs from the actual metric of interest, is often a conscious one. Such metrics have proven incredibly useful for the machine learning community: when used wisely, proxy metrics can accomplish tasks that are otherwise extremely difficult. Here, I discuss a number of common scenarios in which I see machine learning practitioners using proxy metrics, and how this approach can sometimes result in surprising behaviors and problems.
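A minimal sketch of the most common case: the metric we report (accuracy) is not differentiable, so training optimizes a smooth stand-in (cross-entropy). The toy numbers below are mine, chosen only to show that the proxy and the true metric can disagree:

```python
# Proxy metrics in miniature: we report 0-1 accuracy, but we optimize
# cross-entropy because it is smooth. The two usually move together, but
# not always, which is where surprises come from.
import numpy as np

labels  = np.array([1, 1, 0, 0])                 # what we want to get right
probs_a = np.array([0.55, 0.55, 0.45, 0.45])     # model A: barely right on every example
probs_b = np.array([0.99, 0.99, 0.01, 0.60])     # model B: very confident, one mistake

def accuracy(p, y):
    return ((p > 0.5).astype(int) == y).mean()                     # the metric we report

def cross_entropy(p, y):
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()       # the metric we optimize

for name, p in [("A", probs_a), ("B", probs_b)]:
    print(name, "accuracy:", accuracy(p, labels),
          "cross-entropy:", round(cross_entropy(p, labels), 3))
# Model A has perfect accuracy but the worse (higher) proxy loss, so
# optimizing the proxy would prefer model B even though B makes a mistake.
```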
Summary: Machine learning must always balance flexibility and prior assumptions about the data. In neural networks, the network architecture codifies these prior assumptions, yet the precise relationship between them is opaque. Deep learning solutions are therefore difficult to build without a lot of trial and error, and neural nets are far from an out-of-the-box solution for most applications.
Since I entered the machine learning community, I have frequently found myself engaging in conversation with researchers or startup-types from other communities about how they can get the most out of their data and, more often than not, we end up talking about neural networks. I get it: the allure is strong. The ability of a neural network to learn complex patterns from massive amounts of data has enabled computers to challenge (and even outperform) humans on general tasks like object detection and games like Go. But reading about the successes of these newer machine learning techniques rarely makes clear one important point:
Nothing is ever free.
When training a neural network — or any machine learning system — a tradeoff is always made between flexibility in the sorts of things the system can learn and the amount of data necessary to train it. Yet in practice, the precise nature of this tradeoff is opaque. That a neural network is capable of learning complex concepts — like what an object looks like, from a bunch of images — means that training it effectively requires a large amount of data to convincingly rule out other interpretations of the data and reject the impact of noise. (Noise takes many forms: in the case of object detection, noise might include the color of the object; I should be able to identify that a car is a car regardless of its color.) On the face of it, this statement is perhaps obvious: of course it requires more work, data, and effort to extract meaning from more complex problems. Yet, perhaps counterintuitively for many machine learning outsiders, the way in which these systems are designed, and the relationship between the many complex hyperparameters that define them, has a profound impact on how well the system performs.
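One concrete way to see how architecture codifies prior assumptions is to compare a fully connected layer with a convolutional one on the same input. The layer sizes below are arbitrary toy choices:

```python
# Architecture as prior assumptions, in miniature. On a 32x32 RGB image, a
# fully connected layer assumes "anything could relate to anything" and pays
# for that flexibility with a huge parameter count; a convolutional layer
# assumes the same local pattern matters everywhere (translation invariance)
# and needs far less data to pin its parameters down.
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

dense = nn.Linear(32 * 32 * 3, 64)
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3)

print("fully connected parameters:", n_params(dense))  # 196,672
print("convolutional parameters:  ", n_params(conv))   # 1,792
```

Neither choice is free: the convolutional layer will struggle on data where its locality assumption is wrong, which is the flexibility-versus-assumptions tradeoff in a nutshell.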
Summary: Recent ire from the media has focused on the high power consumption of artificial neural nets (ANNs), yet popular discussion frequently conflates training and testing. Here, I aim to clarify the ways in which conversations about the relative efficiency of ANNs and the human brain often miss the mark.
I recently saw an article in the MIT Tech Review about the "Carbon Footprint" of training deep neural networks that ended with a peculiar line from one of the researchers quoted in the article:
'Human brains can do amazing things with little power consumption,' he says. 'The bigger question is how can we build such machines.'
Now, I want to avoid putting this particular researcher on the spot, since his meta-point is a good one: there are absolutely things the human brain is readily capable of for which the field of Artificial Intelligence has only just begun to scratch the surface. There are certain classes of problems, e.g. navigation under uncertainty, that require massive computational resources to solve in general, yet that humans solve very well with little effort. Our ability to solve complex problems from limited examples, also known as combinatorial generalization, is unmatched in general by machine intelligence. Relatedly, humans have incredibly high sample efficiency, requiring only a few training instances to reach good performance on tasks like video game playing and skill learning.
Yet commenting on the relative inefficiency of neural net training, particularly for supervised learning problems, misses the point slightly. Deep learning has been shown to match and even (arguably) surpass human performance on many supervised tasks, including object detection and semantic segmentation. For such problems, the conversation about relative energy expenditure — as compared to the human brain — becomes more nuanced.
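One way to see why the training/testing distinction matters is to amortize the one-time training cost over the predictions a deployed model serves. Every number below is a hypothetical placeholder of mine, not a measurement; only the shape of the calculation is the point:

```python
# Back-of-the-envelope sketch: training is a one-time cost, inference recurs.
# All figures are illustrative placeholders, not measurements.
def energy_per_prediction(training_kwh, inference_kwh, num_predictions):
    """Average energy per prediction once the training cost is amortized."""
    return (training_kwh + inference_kwh * num_predictions) / num_predictions

training_kwh = 50_000.0   # hypothetical one-time training run
inference_kwh = 0.0005    # hypothetical cost of a single prediction

for n in [10**4, 10**7, 10**10]:
    print(f"{n:>12,} predictions -> "
          f"{energy_per_prediction(training_kwh, inference_kwh, n):.6f} kWh each")
# At small scale the training cost dominates; at deployment scale the
# per-prediction figure approaches the (tiny) inference cost, which is the
# number a fair comparison with a human performing the same task would need.
```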
At the end of every year, I like to take a look back at the different trends or papers that inspired me the most. As a researcher in the field, I find it can be quite productive to take a deeper look at where I think the research community has made surprising progress or to identify areas where, perhaps unexpectedly, we did not advance.
Here, I hope to give my perspective on the state of the field. This post will no doubt be a biased sample of what I think constitutes progress. Not only is covering everything effectively impossible (as Jeff Dean points out, roughly 100 machine learning papers are posted to the Machine Learning ArXiv every day!), but my views on what counts as progress may differ from yours. Hopefully everyone reading will glean something from this post, or discover a paper they hadn't heard about. Better yet, feel free to disagree: I'd love to discuss my thoughts further and hear alternate perspectives in the comments below or on Hacker News.