Abstract: We study nonlinear output-feedback control from high-resolution RGB images and provide robust constraint-satisfaction guarantees despite partial observability, sensor noise, and nonlinear dynamics. To scale while retaining these guarantees, we propose (i) a learned low-dimensional observation map from pretrained visual features with state-dependent error bounds, and (ii) a causal affine time-varying output-feedback policy optimized via System Level Synthesis (SLS). We solve the resulting nonconvex program efficiently via sequential convex programming. On two simulated visuomotor tasks (a 4D car and a 10D quadrotor) with images of at least 512 × 512 pixels, and on a humanoid task with partial observability, our method produces safe, information-gathering behavior that reduces uncertainty, with zero observed constraint violations across trials. We also validate our method on hardware, safely controlling a ground vehicle from onboard images. Together, these results show that learned visual abstractions coupled with SLS make certified visuomotor output-feedback control practical at scale.
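To make the term "causal affine time-varying output-feedback policy" concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes a generic parameterization u_t = v_t + sum_{k <= t} K[t, k] y_k, where y_k stands for a low-dimensional observation (such as the output of a learned observation map). The names `K`, `v`, `ys`, and `causal_affine_policy` are hypothetical and do not come from the paper; the SLS optimization that would choose the gains is not shown.

```python
import numpy as np


def causal_affine_policy(K, v, ys):
    """Evaluate u_t = v_t + sum_{k<=t} K[t, k] @ y_k for t = 0..T-1.

    K  : array of shape (T, T, m, p), time-varying feedback gains (hypothetical).
         Only the blocks with k <= t are used, which enforces causality.
    v  : array of shape (T, m), feedforward (affine) terms.
    ys : array of shape (T, p), low-dimensional observations.
    """
    T = ys.shape[0]
    inputs = []
    for t in range(T):
        u = v[t].astype(float)  # astype returns a copy, so v is not modified
        for k in range(t + 1):  # use only observations available at time t
            u += K[t, k] @ ys[k]
        inputs.append(u)
    return np.stack(inputs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, m, p = 5, 2, 3  # horizon, input dimension, observation dimension
    K = rng.normal(size=(T, T, m, p))
    v = rng.normal(size=(T, m))
    ys = rng.normal(size=(T, p))
    print(causal_affine_policy(K, v, ys).shape)
```

Because the inner loop stops at k = t, perturbing a later observation cannot change an earlier input. In a stacked matrix form this corresponds to a block-lower-triangular gain, which is the structure a causal policy requires.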