Welcome to Caches to Caches

This blog is devoted to the broad interests of Gregory J Stein, which include topics such as Numerical Modeling, Web Design, Robotics, and a number of my individual hobby projects. If there's any article you would like to see, or something you've been wondering about, be sure to let me know on Twitter.


Finding examples of "problematic" AI is relatively easy these days. Microsoft has inadvertently given rise to an unhinged, neo-nazi Twitter Bot while an AI beauty contest judge seems to strongly favor white women. Despite the sensational nature of these examples, they reflect a pervasive problem plaguing many modern AI systems.

Machine learning is designed to discover and exploit patterns in data so as to optimize some notion of performance. Most measures of good performance involve maximizing accuracy, yet this metric is often sufficient only in situations where perfect accuracy can be achieved. When a task is difficult enough that the system is prone to errors, AI agents may fail in ways that we, as humans, consider unfair, or they may take advantage of undesirable patterns in the data. Here, I discuss the issue of bias in AI and argue that great care must be taken when training a machine learning system to avoid systematic bias.

The notion of "perfect accuracy" is also simplistic in general. If an AI system is being used to screen candidates to hire, deciding how to define accuracy is already a value judgment.

In short, if you are a business professional looking to use some form of machine learning, you need to be aware of how bias can manifest itself in practice.
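As a minimal illustration of how a single accuracy number can hide systematic errors, the Python sketch that follows reports accuracy both overall and per group. The groups "A" and "B", the `accuracy_by_group` helper, and the toy records are all hypothetical, invented only for this example; they are not drawn from any real system or dataset.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return (overall accuracy, {group: accuracy}) for (group, label, prediction) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, label, prediction in records:
        total[group] += 1
        correct[group] += int(label == prediction)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {group: correct[group] / total[group] for group in total}
    return overall, per_group


# Hypothetical toy data: (group, true label, predicted label).
# Group A has 90 examples with 2 errors; group B has 10 examples with 4 errors.
records = (
    [("A", 1, 1)] * 44 + [("A", 0, 0)] * 44 + [("A", 1, 0)] * 2
    + [("B", 1, 1)] * 3 + [("B", 0, 0)] * 3 + [("B", 1, 0)] * 4
)

overall, per_group = accuracy_by_group(records)
print(f"overall: {overall:.2f}")
for group, acc in sorted(per_group.items()):
    print(f"group {group}: {acc:.2f}")
```

With these made-up numbers, the overall accuracy is 0.94, yet the smaller group B is classified correctly only 60% of the time. A training objective that rewards overall accuracy alone gives the model little incentive to close that gap, because group B contributes so few examples to the total.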

I've described at length how I use Emacs and Org as a project management tool. As part of my process, I frequently use Org as a lab notebook, in which I keep track of various bits of data and record both the code I run and various parameters I used in the process. My workflow requires (1) running code, (2) logging the results, and (3) including my own thoughts and analysis in between, a programming paradigm known more generally as literate programming.

A number of folks on Reddit and irreal.com have pointed out that I don't dive deep enough to really call the content in this post literate programming. Perhaps a more appropriate title would include Literate Scripting; regardless, the content I present here is still an integral part of my Emacs-based workflow.

Org makes it easy to asynchronously execute code for multiple programming languages (and even allows for remote code execution over ssh). For instance, on a recent project of mine I had a few shell scripts that I would occasionally run that would loop through some data files I was generating on a remote machine and return some statistics about them; Org makes it possible for me to do this without having to leave my notes. In this article, I'll go over a few use-cases that illustrate the utility of using Emacs with Org for coding projects and walk you through some of the functionality I couldn't live without.


After making quick progress during a summer I spent doing research at Sandia National Labs before my senior year of college, I was invited (at the very last minute) to present at a conference one of my mentors was helping to organize. Rather than pay dues for the conference, I was flagged as a student volunteer. However, while the other 30 students were all pre-assigned specific tasks from a grid, as a late addition, I was not. Instead, I was told "just make yourself useful", with the expectation that I wouldn't do very much.

Having students staff academic conferences is common practice: the students do not have to pay for registration and have an opportunity to interact directly with the high-profile researchers organizing the conference.

The reality was entirely different.


Even with so many deep learning papers coming out this year, there were a few publications I felt managed to rise above the rest. Here are the five papers that impacted my mental models the most over the last year. For each, I state the "goal" of the paper, briefly summarize the work, and explain why I found it so interesting.


It's no secret that I've spent longer on the aesthetic of this blog than I have on the content. I first created this website during a two-week-long effort to learn about web development over two years ago. Since then, I've toiled over the design of my posts and have become obsessed with visual design. I consider ways of maximizing the signal-to-noise ratio in everything I do.

Maximizing the signal-to-noise ratio is one of Jean-luc Doumont's universal principles of design that he outlines in his book, Trees, Maps, and Theorems.

As I've experimented with the layout of Caches to Caches, I've come to embrace one fundamental rule of effective communication:

Clarity is key.

This sounds simple enough, but you'd be surprised how difficult it is to get this right. For example, the last version of this blog was filled with unnecessary clutter—particularly the left sidebar. By drawing the eye away from the margin notes on the right, the sidebar distracts us from the content without contributing much. Instead, the new sidebar, revealed only in the presence of excess screen space, is isolated by color and avoids causing confusion:

Here you can see the dramatic change in the left sidebar in the new version of the site. Rather than have a table of contents which blends in with the text, it's now drawn with a dark background so as to visually isolate it from the center column. The sidebar quickly disappears as screen size shrinks, so as to prioritize the text.

Here is a collection of guidelines I've come up with to help you emphasize your content and avoid needless distractions.


Another day, another fire rages at Uber.

Since you've likely lost track of all the different scandals at Uber (at the time of this writing), allow me to refresh your memory:

I recognize that this list is far from complete.

At some point in the midst of all of this news, I'd finally had it. I couldn't handle all of the sexism and the twisted company culture. After getting viscerally upset at Uber for the Nth time, I decided to delete the app and install Lyft instead.


I get a lot of email. I'm also pretty sure you get a lot of email. However, email is still not a solved problem. Each potential email client is acceptable on its own, yet none of them offered all of the features I wanted:

This is evidenced by the fact that a quick Google search yields no fewer than ten viable email clients for my Mac.

  • The ability to access my email without an internet connection (I travel quite a lot, so this was very important to me).
  • The ability to easily move messages between folders, which is how I keep all of my emails organized by project.
  • Quick yet powerful search of all my mail messages.
  • An auto-updating status indicator that shows me how many unread messages I have.
  • Support for multiple accounts (Gmail for personal email and Microsoft Exchange for work email), with local changes synced so that my phone stays up-to-date.

If you follow this blog, you'll recognize that I've gotten a bit carried away with migrating the different aspects of my life into the Emacs environment. So it was only a matter of time until I decided to give email in Emacs a shot, and I converged upon a solution that happily satisfies all of the above constraints. Every email service is a bit different, so YMMV, but this setup works for me.