Welcome to Caches to Caches

This blog is devoted to the broad interests of Gregory J Stein, which include topics such as Numerical Modeling, Web Design, Robotics, and a number of my personal hobby projects. If there's any article you would like to see, or something you've been wondering about, be sure to let me know on Twitter.


As an academic, I see a lot of talks. In general, good presentations tend to be built on a good slide deck; even very capable speakers have a tough time reaching their audience when their slides are a mess. One pitfall I often see: researchers take figures or diagrams directly from their papers, upon which the talk is usually based, and paste them into their slides. It's often clear to the audience when this happens, since figures in papers tend to be so dense with information that they become distracting in a talk. My advice:

Avoid using unedited paper figures in talks.

At the end of every year, I like to take a look back at the different trends or papers that inspired me the most. As a researcher in the field, I find it can be quite productive to take a deeper look at where I think the research community has made surprising progress or to identify areas where, perhaps unexpectedly, we did not advance.

Here, I hope to give my perspective on the state of the field. This post will no doubt be a biased sample of what I consider progress: not only is covering everything effectively impossible, but my views on what constitutes progress may differ from yours. Hopefully everyone reading will glean something from this post, or discover a paper they hadn't heard about. Better yet, feel free to disagree: I'd love to discuss my thoughts further and hear alternate perspectives in the comments below or on Hacker News.

As Jeff Dean points out, roughly 100 papers are posted to the machine learning arXiv every day!


There's a story I retell from time to time about an incredibly talented researcher friend of mine. Though the exact details elude me now, since it was a number of years ago, the story goes something like this:

My friend and I were on our way to lunch when we ran into someone he knew in the hallway, whom we'll call Stumped Researcher. He was having some odd issue with a measurement apparatus he'd built; we were all physicists, and every lab has its own custom setup of sensors, signal analyzers, etc. to probe physical phenomena. After a lengthy description, Stumped Researcher was clearly distraught, unable to collect any data that made sense, and convinced that something was wrong with his setup. Without ever having seen the measurement setup, and without any understanding of the experimental goals, my friend asked a question that astonished me in its specificity: he wanted to know the brand of lock-in amplifier being used. Stumped Researcher (a bit lost, having never mentioned that a lock-in amplifier was even involved) didn't remember. My friend responded, "Yeah, the older model lock-in amplifiers produced by $COMPANY_NAME ship with cables that are known to fail sometimes. I'll bet that's the problem." Sure enough, when we ran into No-Longer-Stumped Researcher a couple of days later, that was indeed the problem; a quick change of cable had remedied the issue.

To this day, it remains one of the most incredible instances of remote problem-solving I've ever seen. The key enabler of this ability: experience. My friend suspected that might be the problem because he'd seen it before in the wild. Tinkering was his passion, and with the number of things he'd bought online, taken apart, and sold for parts, he'd no doubt seen it all. And yet, despite knowing how the trick was done, it certainly seemed like magic to me at the time. I find good doctors have this ability too: such a deep understanding of the body as an entire system that a problem in one region points them toward its underlying cause. Recently, it occurred to me that I occasionally do the same thing to the undergraduate researchers I work with, asking an obscure question about their code or data or algorithm and then remotely solving the problem that's vexed them for days.

The title is an allusion to the perhaps overused Arthur C. Clarke quote: "Any sufficiently advanced technology is indistinguishable from magic."

I have the privilege of being surrounded by brilliant scientists, philosophers, and thinkers of all kinds, so I witness this phenomenon with relative frequency. Yet every time someone surprises me in this way, I try to remember that these circumstances don't just happen: only through dedication to a craft can one gain the depth of understanding necessary to demonstrate this level of mastery. The pull of impostor syndrome is real, but I try to be inspired by these moments whenever I can. Perhaps someday I'll feature in someone else's anecdotes.

As always, I welcome your thoughts (and personal anecdotes) in the comments below or on Hacker News.


I can't tell you the number of articles I've read devoted to "debunking myths". They try to communicate the author's opinion by listing a set of negative examples, often with section headings labeled Myth #1, Myth #2, and so on. At best, this is an easy way of building up a straw-man argument; at worst, such an article confuses the reader, filling their screen with contentious or confusing statements. Try as I might, I rarely find these myth-list articles compelling. One particularly problematic article I recently came across boasted a headline of the form "10 Myths about […]", and its in-article headings were simply the myths themselves. At the start of every new section, I had to remind myself that the author believed the opposite of what was written on the page. As you might imagine, the article was far from persuasive.

Worse still are articles in which the author's goal is to persuade rather than inform, and in which whether the "myths" are actually myths is itself a contentious point.

The mental hoops I sometimes have to jump through to figure out what the author is trying to communicate rarely justify whatever benefit they gain by introducing an opposing viewpoint. By succinctly summarizing only the point of view that is not being argued for, the author introduces a cognitive dissonance in the reader that need not exist. Many such articles could benefit from a more clearly presented statement of the author's viewpoint. Even having both views side-by-side would be a massive improvement, and could be made clearer still by adding visual markers to indicate which statement the author agrees with. Particularly in the modern era, in which online attention spans are limited and skimming is the norm, it is to the author's benefit to make their article as skimmable as possible. Myth lists are in direct conflict with this goal, since the author's perspective is often only fleshed out in the body of the text.


Summary: Recent ire from the media has focused on the high power consumption of artificial neural nets (ANNs), yet popular discussion frequently conflates training and testing. Here, I aim to clarify the ways in which conversations about the relative efficiency of ANNs and the human brain often miss the mark.

I recently saw an article in the MIT Tech Review about the "Carbon Footprint" of training deep neural networks that ended with a peculiar line from one of the researchers quoted in the article:

'Human brains can do amazing things with little power consumption,' he says. 'The bigger question is how can we build such machines.'

Now, I want to avoid putting this particular researcher on the spot, since his meta-point is a good one: there are absolutely things the human brain does readily that the field of Artificial Intelligence has only just begun to scratch the surface of. There are certain classes of problems, e.g. navigation under uncertainty, that require massive computational resources to solve in general, yet which humans solve very well with little effort. Our ability to solve complex problems from limited examples, also known as combinatorial generalization, remains unmatched by machine intelligence. Relatedly, humans have incredibly high sample efficiency, requiring only a few training instances to generalize on tasks like video game playing and skill learning.

Yet commenting on the relative inefficiency of neural net training, particularly for supervised learning problems, misses the point slightly. Deep learning has been shown to match and even (arguably) surpass human performance on many supervised tasks, including object detection and semantic segmentation. For such problems, the conversation about relative energy expenditure (as compared to the human brain) becomes more nuanced.
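To make the training-versus-inference distinction concrete, here is a minimal back-of-the-envelope sketch in Python. Every number in it is a purely hypothetical placeholder, not a measurement; the point is only the shape of the arithmetic, not the values.

```python
# Back-of-the-envelope sketch: amortizing a one-time training cost over
# many inference queries. All numbers are hypothetical placeholders.

TRAIN_ENERGY_KWH = 1_000.0     # hypothetical one-time cost of training
INFER_ENERGY_KWH = 0.0005      # hypothetical energy per inference query
NUM_QUERIES = 100_000_000      # hypothetical number of queries served

total_kwh = TRAIN_ENERGY_KWH + INFER_ENERGY_KWH * NUM_QUERIES
amortized_wh_per_query = 1000 * total_kwh / NUM_QUERIES

print(f"Total energy: {total_kwh:,.0f} kWh")
print(f"Amortized energy per query: {amortized_wh_per_query:.3f} Wh")
# As NUM_QUERIES grows, the one-time training cost matters less and the
# comparison with the brain hinges on per-query inference efficiency.
```

Whatever the real numbers turn out to be, a one-time training cost and a recurring per-query cost answer different questions about efficiency, which is why conflating the two muddies the comparison.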


Summary: Machine learning must always balance flexibility and prior assumptions about the data. In neural networks, the network architecture codifies these prior assumptions, yet the precise relationship between them is opaque. Deep learning solutions are therefore difficult to build without a lot of trial and error, and neural nets are far from an out-of-the-box solution for most applications.

Since I entered the machine learning community, I have frequently found myself in conversation with researchers or startup types from other communities about how they can get the most out of their data and, more often than not, we end up talking about neural networks. I get it: the allure is strong. The ability of a neural network to learn complex patterns from massive amounts of data has enabled computers to challenge (and even outperform) humans on general tasks like object detection and games like Go. But reading about the successes of these newer machine learning techniques rarely makes one important point clear:

Nothing is ever free.

When training a neural network (or any machine learning system), a tradeoff is always made between the flexibility in what the system can learn and the amount of data necessary to train it. In practice, though, the precise nature of this tradeoff is opaque. That a neural network is capable of learning complex concepts, like what an object looks like from a bunch of images, means that training it effectively requires a large amount of data to convincingly rule out other interpretations of that data and reject the impact of noise. On the face of it, this statement is perhaps obvious: of course it requires more work/data/effort to extract meaning from more complex problems. Yet, perhaps counterintuitively for many machine learning outsiders, the way these systems are designed, and the relationships between the many hyperparameters that define them, have a profound impact on how well the system performs.

Noise takes many forms. In the case of object detection, noise might include the color of the object: I should be able to identify that a car is a car regardless of its color.
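To illustrate the flexibility-versus-data tradeoff, here is a minimal sketch in Python that uses polynomial degree as a stand-in for model capacity. The specific degrees, sample sizes, and noise level are arbitrary choices for illustration, not a recipe.

```python
# A minimal sketch of the flexibility-vs-data tradeoff: a more flexible
# model (higher-degree polynomial) needs more data before it stops
# chasing noise, while a rigid model is stable but biased.
import numpy as np

rng = np.random.default_rng(0)

def fit_and_test(degree, n_train):
    """Fit a polynomial of the given degree to noisy samples of sin(x)
    and report mean squared error against the noiseless function."""
    x_train = rng.uniform(-3, 3, n_train)
    y_train = np.sin(x_train) + rng.normal(0, 0.2, n_train)
    coeffs = np.polyfit(x_train, y_train, degree)

    x_test = np.linspace(-3, 3, 500)
    y_pred = np.polyval(coeffs, x_test)
    return np.mean((y_pred - np.sin(x_test)) ** 2)

for degree in (1, 3, 9):
    for n_train in (15, 500):
        err = fit_and_test(degree, n_train)
        print(f"degree={degree:2d}  n_train={n_train:4d}  test MSE={err:.3f}")
```

Running this, the rigid degree-1 fit barely improves with more data (it simply cannot represent the curve), while the flexible degree-9 fit is erratic with 15 samples and only behaves once it sees far more data: flexibility must be paid for with data.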


In my role as a Communication Advisor for the MIT Communication Lab, I see a lot of practice talks. Students, both graduate and undergraduate, sign up for a 30-minute or hour-long session during which they present some material they're working on and ask for guidance on both content and presentation: "How clear is what I was trying to accomplish?" or "Are my results figures clear?"

Rarely do students ask, "Did I use too much jargon?" It likely doesn't occur to them that, despite their relative inexperience, they might know more about the subject at hand than those to whom they are presenting.

One of the key components of good technical communication is the right amount of context. Provide too much background material and your audience will lose interest; too little, and they may not be able to follow the remainder of the talk. The first half of a talk should clearly communicate Why the audience should care about your work and How your work compares to other work in the field. Addressing these questions often requires an understanding of popular trends within a discipline, or of how common certain tools or tricks are.

It should come as no surprise that newer researchers, typically undergraduates or first- or second-year graduate students, may find it difficult to decide what information to include when preparing a talk. More often than not, I find that technical talks, particularly those from newcomers to the field, spend too much time discussing the nitty-gritty details of an experiment while leaving out important details about the motivation of the research. Talks from neophyte researchers often vacillate between covering unnecessary minutiae in overwhelming detail and unknowingly leaning on jargon to explain difficult concepts, likely in an effort to seem experienced. It is not uncommon for such presentations, in the space of two slides, to transition from an in-depth description of background material the audience might consider "common knowledge" to a hasty description of domain-specific information essential for understanding the remainder of the talk. To make matters more complicated, the composition of the audience must be taken into account when deciding what material to address: what one group considers "common knowledge" may be completely foreign to another.

Preparing a talk requires understanding one's audience and, without external support, only experience yields such knowledge. Technical communication is understandably hard for newcomers: not only do they have trouble fully appreciating what they themselves know and don't know, it's also extremely difficult for them to gauge what others around them know. Good mentorship is critical for shaping a younger student's perspective in this regard. Such students should seek out feedback from more established members of the community, and experienced communicators should make themselves available to provide support.

As always, I welcome your thoughts (and personal anecdotes) in the comments below or on Hacker News.