Abstract: Simulation-based reinforcement learning (RL) is central to robotic control when expert demonstrations are unavailable. However, scaling RL to high-dimensional robots remains challenging. On-policy methods such as PPO are reliable but require large amounts of simulation because they discard past data. Off-policy methods reuse experience and are therefore more sample-efficient, but they often become unstable in high-dimensional control because critic errors are amplified during bootstrapped updates. We introduce FlashSAC, a fast and stable off-policy RL algorithm for high-dimensional robotic control. FlashSAC improves training stability in two ways: (1) it explicitly bounds weight, feature, and gradient norms to limit critic error amplification, and (2) it increases data coverage through large-scale parallel simulation, a high-capacity replay buffer, and strong exploration. These design choices preserve the sample efficiency of off-policy learning while improving training stability. Across 50+ state-based and vision-based tasks in 10 simulators, FlashSAC consistently surpasses PPO and strong off-policy baselines in both final performance and wall-clock efficiency, with larger gains on higher-dimensional tasks. In sim-to-real humanoid walking, FlashSAC reduces training time from hours to minutes while maintaining stable real-world deployment. Our results show that stabilizing off-policy learning enables scalable sim-to-real RL for high-dimensional robotic systems.
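
To make the idea of bounding weight, feature, and gradient norms concrete, the following is a minimal illustrative sketch in plain Python. It is not FlashSAC's implementation: the abstract does not specify how the bounds are enforced, so the function names (`project_to_norm_ball`, `normalize_features`, `bounded_sgd_step`), the choice of L2 norms, the projection-style clipping, and all thresholds are assumptions made purely for illustration.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def project_to_norm_ball(vec, max_norm):
    """Rescale vec so its L2 norm is at most max_norm (no-op if already inside)."""
    norm = l2_norm(vec)
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    scale = max_norm / norm
    return [x * scale for x in vec]


def normalize_features(vec, eps=1e-8):
    """Rescale a feature vector to (approximately) unit L2 norm."""
    norm = l2_norm(vec)
    return [x / (norm + eps) for x in vec]


def bounded_sgd_step(weights, grads, lr=0.1, max_grad_norm=1.0, max_weight_norm=2.0):
    """One hypothetical SGD step with a clipped gradient and a bounded weight norm.

    All hyperparameter values here are placeholders, not values from the paper.
    """
    # Bound the gradient norm before applying the update.
    clipped = project_to_norm_ball(grads, max_grad_norm)
    updated = [w - lr * g for w, g in zip(weights, clipped)]
    # Bound the weight norm after the update.
    return project_to_norm_ball(updated, max_weight_norm)
```

In this sketch, each bound caps how far a single update can move the parameters or inflate intermediate activations, which is one plausible way to keep an error in the critic from growing across bootstrapped updates.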