Welcome to Caches to Caches

This blog is devoted to the broad interests of Gregory J Stein, which include topics such as Numerical Modeling, Web Design, Robotics, and a number of my individual hobby projects. If there's any article you would like to see, or something you've been wondering about, be sure to let me know on Twitter.

As an academic, I read a lot of papers. Part of my job is retaining what I read, since deeply understanding the work of others and building upon it is one way I come up with new research ideas. When it comes time to sit down and write a paper, I need to contextualize my ideas in a Related Work section: I include a discussion of other research that has touched upon similar problems or inspired my own work. Over my years in academia, I have settled on an annotated bibliography to manage my own knowledge base of papers.

When I started collecting papers and other references, I added annotations to PDFs. However, this didn't scale well: my comments and the documents themselves were difficult to search and lacked easy access to important metadata. I used Zotero for a while, but that didn't mesh with my workflow either. It was nice enough, but it still required that I leave my Emacs environment. Besides, I don't really need the ability to mark up PDFs. For a paper I think I may want to find again, I only need a couple of things:

  • A paragraph-long summary of the paper.
  • A BibTeX entry for the paper.

An annotated bibliography is perfect for this. Everything is in one place. It's easy to edit and share. And I can manage the entire thing from within Emacs with ease.
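To make this concrete, here is a sketch of what a single entry in such a bibliography might look like as an Org-mode heading in Emacs (the paper title, author names, and BibTeX fields below are placeholders for illustration, not a real reference):

```org
* Example Paper Title (AuthorLastName et al., 2024)  :robotics:planning:
A paragraph-long summary in my own words: what problem the paper
tackles, the key idea behind the approach, and how it relates to
(or might inspire) my own work.

#+BEGIN_SRC bibtex
@inproceedings{authorlastname2024example,
  title     = {Example Paper Title},
  author    = {AuthorLastName, FirstName and Coauthor, Other},
  booktitle = {Proceedings of an Example Conference},
  year      = {2024}
}
#+END_SRC
```

Keeping the summary and the BibTeX entry under a single heading means one text search surfaces everything I need to cite or revisit a paper, all without leaving Emacs.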

Five years ago, my life exploded in complexity. I had just started a new position in a new field. I was planning my wedding. And my inability to say NO to anyone and everyone had culminated in my serving on the board of three graduate student organizations. Inevitably, cracks began to form, and my finite brain started to lose track of tasks. My calendar was sufficient to ensure that I wouldn't miss meetings, but I would often only prepare for those meetings at the eleventh hour. My productivity and the quality of my work both suffered. Something needed to change.

This guide is devoted to a discussion of the organizational system that I have honed in the time since. With it, I have found that my time is spent more wisely. Better organization means that I can consciously devote effort where it is needed early on, as opposed to scrambling to keep up, and deliver higher quality work without expending more energy.

Many of the ideas presented here derive from the Getting Things Done methodology, but adapted and expanded to meet my personal needs.

You too can streamline your process. This guide is meant to serve as an example of how you might reorganize your workflow and find order through the chaos of your busy life. Yet different lifestyles have different demands: what works for me may not work as well for you. As such, I do not expect that you will replicate this system in its entirety. Instead, I hope you will take inspiration from my system and use elements of it to build a workflow that works for you.

This document is broken into three main parts:

  • Goals: in which I dive into more detail about what it is I have tried to accomplish with my system.
  • Framework: in which I describe the core ideas and systems I employ to record information and keep track of my tasks.
  • Tooling: in which I discuss the tools—including hardware, software, whatever—that I use to implement the framework.

In addition, I conclude with two sections in which I describe what I see as limitations of my existing system and some other technical details.

Let's dive in.

As a researcher at the intersection of Robotics and Machine Learning, the most surprising shift over my five years in the field is how quickly people have warmed to the idea of having AI impact their lives. Learning thermostats are becoming increasingly popular (probably good), digital voice assistants pervade our technology (probably less good), and self-driving vehicles populate our roads (about which I have mixed feelings). Along with this rapid adoption, fueled largely by the hype associated with artificial intelligence and recent progress in machine learning, we as a society are opening ourselves up to risks associated with using this technology in circumstances it is not yet prepared to handle. Particularly for safety-critical applications or the automation of tasks that can directly impact quality of life, we must be careful to avoid what I call the valley of AI trust—the dip in overall safety caused by premature adoption of automation.

As an academic, I see a lot of talks. In general, good presentations tend to be based on a good slide deck; even very capable speakers have a tough time reaching their audience when their slides are a mess. One common pitfall I often see is that many researchers will take figures or diagrams directly from their papers, upon which the talk is usually based, and paste them into their slides. It's often clear to the audience when this happens, since figures in papers tend to be rich with information that can be distracting in a talk. My advice:

Avoid using unedited paper figures in talks.

At the end of every year, I like to take a look back at the different trends or papers that inspired me the most. As a researcher in the field, I find it can be quite productive to take a deeper look at where I think the research community has made surprising progress or to identify areas where, perhaps unexpectedly, we did not advance.

Here, I hope to give my perspective on the state of the field. This post will no doubt be a biased sample of what I think is progress in the field. Not only is covering everything effectively impossible, but my views on what may constitute progress may differ from yours. Hopefully all of you reading will glean something from this post, or see a paper you hadn't heard about. Better yet, feel free to disagree: I'd love to discuss my thoughts further and hear alternate perspectives in the comments below or on Hacker News.

As Jeff Dean points out, there are roughly 100 new papers posted to the Machine Learning ArXiv per day!

There's a story I retell from time to time about an incredibly talented researcher friend of mine. Though the exact details elude me now, since it was a number of years ago, the story goes something like this:

My friend and I were on our way to lunch when we ran into someone he knew in the hallway, whom we'll call Stumped Researcher. He was having some odd issue with a measurement apparatus he'd built; we were all physicists, and every lab has its own custom setup of sensors, signal analyzers, etc. to probe physical phenomena. After a lengthy description, Stumped Researcher was clearly distraught, unable to collect any data that made sense, indicating that something was wrong with his setup. Without ever having seen the measurement setup and without an understanding of the experimental goals, my friend asked a question that astonished me in its specificity: he wanted to know the brand of lock-in amplifier being used. Stumped Researcher (a bit lost, having never mentioned that a lock-in amplifier was even being used) didn't remember. My friend responded, "Yeah, the older model lock-in amplifiers produced by $COMPANY_NAME ship with cables that are known to fail sometimes. I'll bet that's the problem." Sure enough, a couple of days later, upon running into no-longer-Stumped Researcher, we learned that was indeed the problem; a quick change of cable remedied the issue.

To this day, it remains one of the most incredible instances of remote problem-solving I've ever seen. The key enabler of this ability: experience. My friend suspected that cable might be the problem only because he'd seen it fail before in the wild. Tinkering was his passion, and with the number of things he'd bought online, taken apart, and sold for parts, he'd no doubt seen it all. And yet, despite knowing how the trick was done, it certainly seemed like magic to me at the time. I find good doctors have this ability too: such a deep understanding of the body as a whole system that a problem in one region immediately suggests to them its underlying cause. Recently, it occurred to me that I occasionally do the same thing to the undergraduate researchers I work with, asking an obscure question about their code or data or algorithm and then remotely solving the problem that's vexed them for days.

The title is an allusion to the perhaps overused Arthur C. Clarke quote: Any sufficiently advanced technology is indistinguishable from magic.

I have the privilege of being surrounded by brilliant scientists, philosophers, and thinkers of all kinds, so I witness this phenomenon with relative frequency. Yet every time I see someone who surprises me in this way, I try to remember that these circumstances don't just happen: only through dedication to a craft can one gain the depth of understanding necessary to demonstrate this level of mastery. The pull of impostor syndrome is real, but I try to be inspired by these moments whenever I can. Perhaps someday I'll feature in someone else's anecdotes.

As always, I welcome your thoughts (and personal anecdotes) in the comments below or on Hacker News.

I can't tell you the number of articles I've read devoted to "debunking myths". They try to communicate the author's opinion by listing a set of negative examples, often with section headings labeled Myth #1, Myth #2, etc. At best, it's an easy way of building up a straw-man argument, yet at worst, such an article confuses the reader, filling their screen with potentially contentious or confusing statements. Try as I might, I rarely find these Myth List articles compelling. One particularly problematic article I recently came across boasted a headline of the form "10 Myths about […]" whose in-article headings were simply all the myths. At the start of every new section, I needed to remind myself that the author's belief was opposite to what was written on the page. As you might imagine, the article was far from compelling.

Worse still are articles in which the author's goal is to persuade rather than inform, and in which whether the myths are actually myths is itself a contentious point.

The mental hoops I sometimes have to jump through to figure out what the author is trying to communicate rarely seem worth whatever benefit the author gains from introducing an opposing viewpoint. By succinctly summarizing only a point of view that is not being argued for, the author introduces a cognitive dissonance in the reader that need not exist. Many such articles could benefit from a more clearly presented statement of the author's viewpoint. Even having both views side-by-side would be a massive improvement, and could be made clearer still by adding visual markers to indicate which statement agrees with the author's. Particularly in the modern era, in which online attention spans are limited and skimming is the norm, it is to the author's benefit to make their article as skimmable as possible. Myth lists are in direct conflict with this goal, since the author's perspective is often only fleshed out in the body of the text.