HMPO: Human Motion Prediction in Occluded Environments for Safe Motion Planning

Jaesung Park (University of North Carolina at Chapel Hill); Dinesh Manocha (University of Maryland at College Park)


We present a novel approach to generate collision-free trajectories for a robot operating in close proximity to a human obstacle in an occluded environment. The self-occlusions of the robot can significantly reduce the accuracy of human motion prediction, and we present a novel deep learning-based prediction algorithm. Our formulation uses CNNs and LSTMs, and we augment human-action datasets with synthetically generated occlusion information for training. We also present an occlusion-aware planner that uses our motion prediction algorithm to compute collision-free trajectories. We highlight the performance of the overall approach (HMPO) in complex scenarios and observe up to 68% improvement in motion prediction accuracy, and 38% improvement in terms of error distance between the ground-truth and the predicted human joint positions.
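The abstract reports an improvement in the "error distance between the ground-truth and the predicted human joint positions" without defining the metric on this page. The following is a minimal sketch of one common reading, assuming the metric is the mean Euclidean distance over matching 3D joints and that improvement is measured relative to a baseline error. The function names `mean_joint_error` and `relative_improvement` and the sample coordinates are hypothetical, chosen for illustration only; they are not taken from the paper.

```python
import math


def mean_joint_error(predicted, ground_truth):
    """Mean Euclidean distance between matching joints.

    Assumption: both arguments are equal-length sequences of (x, y, z)
    tuples in the same units. The paper may define its metric differently.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("joint lists must be non-empty and equal in length")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


def relative_improvement(baseline_error, new_error):
    """Fractional reduction in error relative to a baseline (assumed definition)."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Made-up joint positions in meters, NOT data from the paper.
truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred = [(0.0, 0.3, 0.4), (1.0, 0.0, 0.0)]
error = mean_joint_error(pred, truth)
```

Under this reading, a 38% improvement would mean that `relative_improvement(baseline_error, new_error)` evaluates to about 0.38 for the compared methods.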

Live Paper Discussion Information

Start Time: 07/15 15:00 UTC
End Time: 07/15 17:00 UTC

Paper Reviews

Review 1

Comments:

* This paper seeks to address the important problem of generating collision-free paths for a robot by predicting human motion in occluded scenes. The idea of introducing occlusion-based constraints in the objective function for motion planning is well motivated.

* Please consider revising the description in the paper to clearly state the inputs and outputs of the different components. For instance, until Section IV, it is unclear what the inputs and outputs are of the human motion prediction component, although this component is referenced multiple times in the initial few sections.

* Could you please highlight the specific novel contributions instead of claiming that the entire approach is novel? The use of deep networks for human motion prediction is not really new, especially when you seem to be using pre-trained features. The novelty here seems to be in the inclusion of the "occlusion masks" to augment the input data vectors. In a similar manner, the optimization-based algorithm for robot trajectory planning is not new; in fact, even the inclusion of additional occlusion-based constraints is not really new. The novelty here seems to be in the particular formulation (and the associated heuristics) introduced in the paper.

* Is the insertion of occlusion (based on forward kinematics) in the human motion tracking datasets accurate? How is the corresponding ground truth determined for experimental evaluation? If this projection is accurate, is it potentially possible to build on such an approach to determine regions of occlusion in images without having to use the deep networks? This would be a more classical approach for predicting human motion, and it may be more computationally efficient.

* The stated improvement in performance, especially in action classification, in the text of the paper does not seem to match the numbers in the table. Does HMPO really improve classification accuracy by 63% or 86% and if so how/why?

Review 2

The authors present an approach for motion planning when working alongside a human. I think this is a good problem to solve. The idea is that the authors train a neural net to solve the problem. The model architecture is based on a CNN component, which uses pretrained ResNet features. They train an LSTM to predict (a) human action, (b) joint positions, and (c) degree of occlusion. They made predictions out to 3 seconds into the future. I would've liked some extra details about the neural net and training parameters.

The paper focuses on occlusions caused specifically by the robot arm. This means that instead of collecting a new dataset, they can use simulated images and generate their own augmented datasets. They report prediction results on three different datasets with occlusion. I would have liked to see plots of accuracy over time, instead of just the single accuracy measure reported in Table 1. Error still seems extremely high to me -- at best being 31.8 cm -- but the authors did a good number of comparisons against different baselines.

The motion optimization algorithm isn't too novel, but seems thorough and well-explained. I think the biggest problem I have with this is that I'm not sure how this would be used in the real world. The neural net is given both the images with and without the robot occluding the scene, which is a problem. The authors describe some real robot experiments, but they don't show them in their video and it's not clear to me exactly how this would work.

In the end, I thought it was a good paper, but not an amazing one. More thorough results would help a lot.

Minor notes:
- There are some weird artifacts and spacing. On pg. 8, for example, there's a really big gap between two paragraphs. I think the authors could always expand the paper, add more images of their data or experiments, and generally better use space.
- "Small caps" captions for tables are pretty annoying, kind of hard to read.
- pg. 6: "prevents the robot to occlude" --> "prevents the robot from occluding"