*Dicong Qiu, Yibiao Zhao, Chris Baker*

Autonomous agents are limited in their ability to observe the world state. Partially observable Markov decision processes (POMDPs) model planning under world state uncertainty, but POMDPs with multimodal beliefs, continuous actions, and nonlinear dynamics suitable for robotics applications are challenging to solve. We present a dynamic programming algorithm for planning in the belief space over discrete latent states in POMDPs with continuous states, actions, observations, and nonlinear dynamics. Unlike prior belief space motion planning approaches which assume unimodal Gaussian uncertainty, our approach constructs a novel tree-structured representation of possible observations and multimodal belief space trajectories, and optimizes a contingency plan over this structure. We apply our method to problems with uncertainty over the reward or cost function (e.g., the configuration of goals or obstacles), uncertainty over the dynamics, and uncertainty about interactions, where other agents' behavior is conditioned on latent intentions. Three experiments show that our algorithm outperforms strong baselines for planning under uncertainty, and results from an autonomous lane changing task demonstrate that our algorithm can synthesize robust interactive trajectories.

Start Time | End Time
---|---
07/15 15:00 UTC | 07/15 17:00 UTC

This paper presents a variant of differential dynamic programming that allows for optimization with latent discrete dynamics parameters. A POMDP is defined over observable continuous state and action spaces with nonlinear dynamics. The dynamics are further parameterized by latent discrete variables. An algorithm is derived that performs DDP over a horizon with all possible assignments of the latent parameters. The method is evaluated on three toy tasks and compared to other variants of DDP.

The method is described well and the algorithm is stated clearly. The experiments provide a nice illustrative example of the algorithm's execution, which helps in understanding the method. Further, the addressed POMDP class is sufficient to model an interesting range of tasks.

I see three main points of potential improvement that the paper could benefit from. First, the latent discrete variables as defined in Fig. 1 have the limitation that they are not controllable, i.e., they are not functions of the actions. If this restriction were removed, the method could be applied to an even broader range of dynamical systems (for example, manipulation or locomotion with discrete contact variables). Second, one drawback of the algorithm is its exponential complexity with respect to the horizon length and the dimension of the discrete parameter Z. I like the suggestion in Section IV-C of a naive way of addressing this issue; it would be interesting to see whether there are other ways to exploit structure in typical use cases of Z that could help make the method computationally tractable. Third, the experiments focus on cases where Z is a binary variable. It would be interesting to showcase PODDP on more complex problems. Further, I am missing a comparison of computational efficiency with the baseline algorithms.
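To make the complexity concern above concrete, here is a minimal sketch (hypothetical code, not the paper's implementation) of why planning over all assignments of the latent parameter is exponential: each step of the horizon branches on every value of Z, giving |Z|^T contingency branches at horizon T.

```python
# Minimal sketch: enumerating all latent-assignment sequences over a horizon.
# Each branch would correspond to one DDP rollout in a PODDP-style tree,
# so the total work grows as |Z|**T.
from itertools import product

def contingency_branches(z_values, horizon):
    """Enumerate every sequence of latent values over the horizon."""
    return list(product(z_values, repeat=horizon))

Z = [0, 1]   # binary latent variable, as in the paper's experiments
T = 4        # planning horizon
branches = contingency_branches(Z, T)
print(len(branches))  # |Z|**T = 2**4 = 16
```

This also makes the reviewer's point about structure explicit: any pruning of the tree (e.g., dropping low-probability branches) directly cuts the exponent's base or the effective horizon.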

The topic of the paper is relevant, and the authors do a good job motivating and framing it within the state of the art. The approach is reasonably well explained and the paper is easy to read. However, the approach makes a strong assumption and is a direct adaptation of known algorithms; it does not compare itself against interesting baselines; and it is not shown to scale to any kind of realistic system with multimodal beliefs. More specifically, I'd like to see the following issues addressed in the rebuttal phase:

1. The algorithm depends on the fact that one can build a discrete tree by assuming a discrete latent variable and MLO observations. The latter assumption seems rather strong, but there is no discussion of it in the paper, and the empirical evaluation doesn't study its impact. Furthermore, it seems to me that once one makes this strong assumption, the resulting algorithm doesn't really bring significant novelty to the state of the art.

2. Given the assumptions made that allow for a discrete tree representation of the trajectories, couldn't one use something like POMCP directly, even if with a coarse discretization of actions? This would be a nicer baseline to compare against.

3. The experiments only deal with binary latent variables, which somewhat defeats the purpose of developing an approach for multimodal beliefs. Furthermore, no results on scalability are provided. The approach is only evaluated on very small examples, making me doubt its feasibility for realistic problems.

The paper is technically sound and all necessary details are given. The approach is well evaluated in different experiments. Nevertheless, the paper can be improved in some areas.

Although the evaluation is well done, the presentation of the results can be improved. For instance, the authors should show an overview of the considered scenario in part A of the evaluation. The graphs in Fig. 3 are hard to read and interpret. Considering Fig. 3b, the authors should briefly explain, for completeness, why two trajectories do not reach the goal. The readability of Figs. 4 and 5 can also be improved; in particular, Fig. 5 is missing labels to indicate important details. For reproducibility, the authors should summarize the necessary parameters and details of each scenario. The evaluation can further be improved by adding computation times and by showing scenarios in which the other two approaches (MLDDP and PWDDP) may perform better.

Some other minor issues are:

* Usually, autonomous vehicles have no uncertainty in "the mode of the vehicle and its components" (see introduction). The authors should provide references here.
* For completeness, the authors should show the additional steps between Eq. 4 and Eq. 5.
* Since POMDP solvers differ significantly in performance, the authors should provide more details on the performance of their proposed solution, e.g., computation time, complexity, depth of the tree, and branching factor.
* On p. 5, the authors should give more explanation (and references) for the claim "because perturbations can push the belief off of the |Z|-1-dimensional simplex", and provide the necessary computation steps afterwards.
* Which variable corresponds to the "uncertainty level" (see results section)? The authors should refer to this variable.
* The trajectory partition needs to be explained in more detail.
* Can the authors comment on improving the performance for T >= 5?
* For easier reference, can the authors state the meaning of each variable in Alg. 1 again?

The paper is well written and the authors guide the reader through each section. Yet, minor issues are:

* The BibTeX file needs revision, e.g., capitalization of "Markov" in [1] and "CHOMP" in [2], the conference name missing in [5], the full author list missing in [12], and pages missing in [15], [22], [25], [26], and [27]. Please check each reference.
* The colon before Eq. 3 is misplaced.
* What do the authors want to say when using \doteq in Eq. 1?
* Why are the authors using \cdot in Eq. 1?
* Please use big brackets where possible to improve readability, e.g., in Eq. 5.
* A bracket is misplaced in "a(n" on p. 4.
* The authors should use proper set notation for "1:|Z|" in the caption of Fig. 2.
* The authors should use \eqref when referencing equations (instead of writing "Equation 1" or "Eq. 7").
* Please reference algorithms using \ref in the text.
* In Eq. 8, the variable \tilde{Q} is dangling and should instead be placed on the preceding page.
* The authors should balance the columns on the last page.

The proposed approach is interesting and addresses an important problem. The results are promising and well presented. The authors are encouraged to revise the paper for improved readability.
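One of the review points above concerns beliefs leaving the (|Z|-1)-dimensional simplex under perturbation: an additive perturbation can produce negative entries or a sum different from one. As a concrete illustration only (a minimal sketch under assumed conventions, not the authors' actual computation; the function name is hypothetical), one common fix is to clip and renormalize:

```python
# Sketch: restoring a perturbed belief vector to the probability simplex
# by clipping negatives and renormalizing (note: this is not the exact
# Euclidean projection, just a simple repair step).
import numpy as np

def clip_and_renormalize(b, eps=1e-12):
    """Return a valid belief: nonnegative entries summing to one."""
    b = np.clip(np.asarray(b, dtype=float), 0.0, None)
    s = b.sum()
    if s < eps:  # degenerate case: fall back to the uniform belief
        return np.full(b.shape, 1.0 / b.size)
    return b / s

belief = clip_and_renormalize([0.7, 0.4, -0.1])  # perturbed off the simplex
print(belief)  # nonnegative entries that sum to 1
```

A brief note like this in the paper, alongside the requested computation steps, would make the simplex claim self-contained.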