Abstract: Learning-based control approaches like reinforcement learning (RL) have recently produced a slew of impressive results for tasks like quadrotor trajectory tracking and drone racing. Naturally, it is common to demonstrate the advantages of these new controllers against established methods like analytical controllers. We observe, however, that reliably comparing the performance of these very different classes of controllers is more complicated than it might appear at first sight. As a case study, we take up the problem of agile tracking of an end-effector for a quadrotor with a fixed arm. We develop a set of best practices for synthesizing the best RL and Geometric controllers for benchmarking. In the process, we fix widely prevalent RL-favoring biases in prior studies that provide asymmetric access to: (1) the task definition in the form of objective functions, (2) datasets for parameter optimization, and (3) "feed-forward" controller inputs revealing the desired future trajectory. The resulting contributions are threefold: first, our improved, more robust experimental protocol reveals that the gaps between the two controller classes are much smaller than expected from previously published findings. Geometric control performs on par with or better than RL in most practical settings, while RL fares better in transient performance at the expense of steady-state error. Second, our improvements to the experimental protocol for comparing learned and classical controller synthesis approaches are critical: each of the above asymmetries can yield misleading conclusions, and we present evidence suggesting that they indeed have in prior quadrotor studies. Finally, we open-source implementations of Geometric and RL controllers for these aerial vehicles that follow these best practices, to support future development.