Abstract: Recent vision-language-action models (VLAs), which build upon pretrained vision-language models and leverage diverse robot datasets, have demonstrated strong task execution, language-following ability, and out-of-distribution generalization. Despite their success, VLAs struggle with novel robot setups and must be fine-tuned to achieve optimal performance. However, existing fine-tuning methods yield suboptimal inference speed and task performance, and alternative adaptation strategies remain largely underexplored, with few systematic investigations or controlled evaluations of their effects. In this work, we conduct a comprehensive study of adaptation design choices for the recently released OpenVLA model, examining different action decoding schemes, action representations, and learning objectives for fine-tuning. Based on our findings, we propose OpenVLA-OFT, an instantiation of our Optimized Fine-Tuning recipe that integrates parallel decoding, action chunking, continuous action representations, and a simple L1 regression-based learning objective to jointly improve inference efficiency, policy performance, and model input/output flexibility. OpenVLA-OFT sets a new state of the art on the LIBERO simulation benchmark, significantly boosting OpenVLA’s average success rate across four task suites from 76% to 97% while increasing action generation throughput by 26×. In real-world evaluations, OpenVLA-OFT successfully performs dexterous, high-frequency control tasks on a dual-arm ALOHA robot and matches or outperforms strong imitation learning methods trained from scratch as well as other fine-tuned VLAs. We will release code for the optimized fine-tuning recipe, pretrained model checkpoints, and datasets upon publication.
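To make the recipe's core ingredients concrete, the following is a minimal, hypothetical PyTorch sketch of the learning setup the abstract names: a head that decodes a continuous action chunk in a single parallel forward pass from backbone features and is trained with a plain L1 regression loss. The `ParallelActionHead` module, and all names and dimensions, are illustrative assumptions, not the released OpenVLA-OFT implementation.

```python
# Minimal sketch (assumptions, not the official OpenVLA-OFT code):
# parallel decoding of a continuous action chunk trained with L1 regression.
import torch
import torch.nn as nn

CHUNK_LEN, ACTION_DIM, HIDDEN_DIM = 8, 7, 4096  # assumed sizes

class ParallelActionHead(nn.Module):
    """Maps one backbone hidden state per chunk step to a continuous action."""
    def __init__(self, hidden_dim: int, action_dim: int):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, action_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, CHUNK_LEN, hidden_dim). One forward pass
        # yields the whole chunk at once (parallel decoding), rather than
        # emitting discrete action tokens autoregressively.
        return self.proj(hidden_states)

head = ParallelActionHead(HIDDEN_DIM, ACTION_DIM)
hidden = torch.randn(2, CHUNK_LEN, HIDDEN_DIM)  # stand-in for VLA features
target = torch.rand(2, CHUNK_LEN, ACTION_DIM)   # normalized expert actions

pred = head(hidden)
loss = nn.functional.l1_loss(pred, target)  # simple L1 regression objective
loss.backward()
```

Under these assumptions, generating a chunk of continuous actions costs one forward pass instead of one pass per discrete action token, which is the source of the throughput gains the abstract reports.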