Andrew Hunter likes making code go fast. Before joining Jane Street, he spent seven years at Google working on multithreaded architecture, and was a tech lead for tcmalloc, Google’s world-class scalable malloc implementation. In this episode, Andrew and Ron discuss how, paradoxically, it’s in some ways easier to optimize systems at hyperscale, because even minuscule changes can have an outsized impact. Finding performance wins in trading systems, which operate at a smaller scale but have bursty, low-latency workloads, is often trickier. Andrew explains how he approaches the problem, covering his favorite profiling techniques and visualization tools, the unique challenges of optimizing OCaml versus C++, and when you should and shouldn’t care about nanoseconds. They also touch on the joys of musical theater, and how to pass an interview when you’re sleep-deprived.