Vision-Based Goal-Conditioned Policies for Underwater Navigation in the Presence of Obstacles

Travis Manderson (McGill University); Juan Camilo Gamboa Higuera (McGill University); Stefan Wapnick (McGill University); Jean-François Tremblay (McGill University); Florian Shkurti (University of Toronto); David Meger (McGill University); Gregory Dudek (McGill University)


We present Nav2Goal, a data-efficient and end-to-end learning method for goal-conditioned visual navigation. Our technique is used to train a navigation policy that enables a robot to navigate close to sparse geographic waypoints provided by a user without any prior map, all while avoiding obstacles and choosing paths that cover user-informed regions of interest. Our approach is based on recent advances in conditional imitation learning. General-purpose safe and informative actions are demonstrated by a human expert. The learned policy is subsequently extended to be goal-conditioned by training with hindsight relabelling, guided by the robot's relative localization system, which requires no additional manual annotation. We deployed our method on an underwater vehicle in the open ocean to collect scientifically relevant data of coral reefs, which allowed our robot to operate safely and autonomously, even at very close proximity to the coral. Our field deployments have demonstrated over a kilometer of autonomous visual navigation, where the robot reaches on the order of 40 waypoints, while collecting scientifically relevant data. This is done while travelling within 0.5 m altitude from sensitive corals and exhibiting significant learned agility to overcome turbulent ocean conditions and to actively avoid collisions.
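The hindsight relabelling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the planar pose representation `(x, y, yaw)`, the function names `goal_in_robot_frame` and `relabel_hindsight`, and the fixed look-ahead `horizon` are all assumptions made for illustration. The idea shown is that positions the robot actually reached later in a demonstrated trajectory are reused as goals for earlier time steps, so no manual goal annotation is needed.

```python
import math
from typing import Any, List, Sequence, Tuple

# Assumed pose format: x, y in metres and yaw in radians, all expressed
# in the frame of the robot's relative localization (odometry) system.
Pose = Tuple[float, float, float]


def goal_in_robot_frame(pose: Pose, target: Pose) -> Tuple[float, float]:
    """Express the position of `target` in the body frame of `pose`."""
    x, y, yaw = pose
    dx, dy = target[0] - x, target[1] - y
    c, s = math.cos(yaw), math.sin(yaw)
    # Rotate the odometry-frame displacement by -yaw.
    return (c * dx + s * dy, -s * dx + c * dy)


def relabel_hindsight(
    observations: Sequence[Any],
    poses: Sequence[Pose],
    actions: Sequence[Any],
    horizon: int,
) -> List[Tuple[Any, Tuple[float, float], Any]]:
    """Build (observation, goal, action) tuples from one trajectory.

    For each time step t, every pose reached within `horizon` steps
    after t is treated as a goal the demonstrated action was heading to.
    """
    samples = []
    n = len(poses)
    for t in range(n):
        for k in range(1, horizon + 1):
            if t + k >= n:
                break  # no later pose left to use as a goal
            goal = goal_in_robot_frame(poses[t], poses[t + k])
            samples.append((observations[t], goal, actions[t]))
    return samples
```

For example, a three-step trajectory with `horizon=2` yields three training tuples: two for the first step (goals one and two steps ahead) and one for the second step. Expressing goals relative to the current pose matches the abstract's statement that relabelling is guided by the robot's relative localization system; how the authors sample future goals is not specified here.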

Live Paper Discussion Information

Start Time: 07/15 15:00 UTC
End Time: 07/15 17:00 UTC

Paper Reviews

Review 1

The proposed approach is a novel integration of several existing methods. The approach used for balancing exploration and exploitation (i.e., “uncertainty guided exploration”) does not seem to be very sophisticated. There is also a lack of evaluation metrics for the experimental data. A comparison with benchmarks, such as a method based on traditional mapping, planning, and control approaches, would be beneficial.

The manuscript also has several typos and grammatical errors. Here are a few examples:

Page 5: “As well, in contains an Inertial Measurement Unit (IMU)…”
Page 7: “With these trajectories, we created a dataset of X images, goal, action tuples to train the goal-conditioned policies3. We split this dataset into a training set of X samples and a validation set of Y samples.”
Page 8: “, as it the robot’s motion is now influenced by…”

Review 2

In addition to the comments above, the paper is rather well presented, well organised and easy to follow. The experiments are well designed and convincing, and the experiments in simulation provide some information about the performance of the system. The following list contains questions or details that may need to be addressed in future iterations of the paper.

III.A: patch -> pitch; backpropegation -> back-propagation.
End of IV: "dataset it was sued to generate it". I do not understand this sentence. How can a dataset be sued?
V.A: in contains -> it contains.
Fig. 8: the black crosses are hard to read on the dark background of coral mounds.
VI.B: and collect new data -> and collected new data; as it the robot's motion -> as the robot's motion.

Still in VI.B: the system is said to be trained from a new set of collected data. It might have been interesting to evaluate the feasibility of fine-tuning the policy from the model trained in simulation. Is this something that was tried, considered, or rejected? A few lines on this consideration would be very interesting.

Additionally, in VI.B, a number of failure cases are implicitly reported by stating the success rate of the approach. It would be very interesting to describe and understand these failure cases in more detail, in particular to understand the limits of the trained approach. It would also be interesting to describe or observe the performance of the method at the edge of its observation space. Does it break dramatically as soon as the environment changes?

This last point is particularly relevant because the conclusion claims that the system is robust for practical deployment in the field. Robustness is a property that is hard to quantify for such a system. The reported performance hints at a possibly robust system, but no experiment was conducted to explicitly test the robustness.

Review 3

Overall, the paper is reasonably well written and the figures serve to illustrate the proposed system and results. There were a few typographical and grammatical errors that should be addressed in revising the paper. There were also a few design decisions that need to be explored in more detail to allow a reader to grasp the rationale for these choices.

It was a bit unclear how the ground-truth data for the low-level controller was generated. Was this based on human-labelled examples or previous robot experience? How can the performance of the robot be guaranteed in terms of obstacle avoidance if the labels are being supplied by an expert?

The simplification of the environment to a simple 2D goal location is a bit questionable, particularly in a coral reef environment where the interesting regions tend to be significantly more rugose than surrounding areas of sand. It is not clear why a 3D goal was not used, nor what the implication of taking depth/altitude into account would be on computational efficiency and goal convergence.

The choice of specifying goals relative to the current position may make it difficult to direct the system towards specific geographic locations. What would the implication be of specifying desired waypoints in global coordinates? Or is the problem that the robot doesn't have a notion of its global location and so must rely on navigating within its current local frame of reference?

Fig. 10 is designed to illustrate how achievable trajectories are stitched together to form more complex trajectories of waypoints. The training trajectories look highly asymmetric and are likely to be impacted by effects such as currents and wave motion. It is not clear why a parametric model couldn't be used in this case, nor why the robot wasn't capable of turning on the spot (given that it has this capacity). It appears that the desired behaviour was to maintain a constant forward speed, but this appears to come at the expense of the ability to follow more complex trajectories.