Florian Wirnshofer (Siemens AG); Philipp Sebastian Schmitt (Siemens AG); Georg von Wichert (Siemens AG); Wolfram Burgard (University of Freiburg)
In this paper, we present an integrated, model-based system for state estimation and control in dynamic manipulation tasks with partial observability. We track a belief over the system state using a particle filter from which we extract a Gaussian Mixture Model (GMM). This compressed representation of the belief is used to automatically create a discrete set of goal-directed motion controllers. A reinforcement learning agent then switches between these motion controllers in real-time to accomplish the manipulation task. The proposed system closes the loop from joint sensor feedback to high-frequency, acceleration-limited position commands, thus eliminating the need for pre- and post-processing. We evaluate our approach on five distinct manipulation tasks from the domains of active localization, grasping under uncertainty, assembly, and non-prehensile object manipulation. Extensive simulations demonstrate that the hierarchical policy actively exploits the uncertainty information encoded in the compressed belief. Finally, we validate the proposed method on a real-world robot.
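The abstract compresses the whole pipeline into a few sentences, so a concrete illustration of the belief-compression step may help. Below is a minimal sketch, assuming scikit-learn's `GaussianMixture` and a particle set over a planar object pose; the function names, the component count `K`, and the feature layout are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Illustrative sketch of the belief-compression step: a particle belief
# over an object pose is collapsed into a GMM whose parameters form a
# fixed-size observation for the high-level policy. K, the function
# names, and the feature layout are assumptions, not from the paper.

K = 3  # number of mixture components (a design choice; see the reviews below)

def compress_belief(particles: np.ndarray) -> GaussianMixture:
    """Fit a K-component GMM to a set of particles (N x state_dim)."""
    gmm = GaussianMixture(n_components=K, covariance_type="full")
    gmm.fit(particles)
    return gmm

def gmm_features(gmm: GaussianMixture) -> np.ndarray:
    """Flatten the GMM parameters into one observation vector."""
    return np.concatenate(
        [gmm.weights_.ravel(), gmm.means_.ravel(), gmm.covariances_.ravel()]
    )

# Example: 1000 particles over a planar object pose (x, y, yaw).
particles = np.random.randn(1000, 3)
belief = compress_belief(particles)
obs = gmm_features(belief)  # input to the controller-switching agent
```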
Start Time | End Time
---|---
07/14 15:00 UTC | 07/14 17:00 UTC
In this paper, the authors propose an integrated system for state estimation and control in dynamic manipulation tasks with partial observability. The system is provided only with an initial belief over the state, and from then on receives only joint states as feedback. The pipeline they designed consists of the following steps: 1) they track the belief over the system state using a particle filter; 2) they compress the belief into a GMM; 3) they use the GMM to automatically create a discrete set of goal-directed motion controllers; 4) they use RL to learn a high-level switcher that selects among the low-level motion controllers (see the sketch after this review). Even though the results are interesting and promising, the paper is not clear and lacks explanation in several points. For this reason, I am not convinced of the usefulness and relevance of the proposed method: as presented, it seems overcomplicated and unclear. I think a better-structured paper would make the contributions easier to evaluate. Below, I give more detailed feedback on each section.

Abstract: In my opinion, stating immediately why the environment is partially observable would add considerable value. In general, the authors never say explicitly why the system is partially observable, which makes the paper hard to follow at first glance.

Introduction: Again, the partial observability is not properly explained here either. The authors seem to attribute it to the high-dimensional state and action spaces, nonlinear dynamics, multimodal distributions, and real-time constraints, whereas it actually stems from the fact that the only feedback is the joint states. Even though this is written, it is not explained properly. Also, what does this sentence mean: "Our system considers the full state space of objects"?

Related work: This section cites several works, but in my opinion they are presented in a confusing way. A clearer guiding thread would improve this section a lot.

Problem formulation and background: I would not emphasize the contact-richness of the considered manipulation tasks. All manipulation tasks require contact with the object, and the ones addressed in this paper in particular do not involve that much contact, since they mostly consist of pushing tasks.

Controlling manipulation under uncertainty: In Section C, the authors do not explain how the low-level Cartesian controllers are generated, even though this appears to be the central contribution. In my opinion, equations (8) and (9) make the reader lose focus. I would rather explain in more depth why acceleration-resolved controllers are advantageous, since this is not clear enough from the authors' explanation. The reinforcement learning setting in Section D is also not well explained.

Implementation and results: The authors should better justify why they use that particular RL training scheme instead of relying on more standard ways of training policies. The benchmark problems chosen for the evaluation are very nice and challenging. The results are good and fairly clear, although Fig. 8 should either be explained better or replaced with another kind of plot, since it is not very intuitive. Regarding the real-world evaluation, more details are needed to assess the quality of the results. Also, is the evaluation executed as a zero-shot transfer? Stating that explicitly would help.

Discussion: It is good to read that the authors are aware of some of the limitations of their work.
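To make the hierarchy in steps 1)-4) above easier to picture, here is a rough sketch of the switching loop. Every interface here (`Controller`, `policy`, `env`, `belief_tracker`) is an assumption made for illustration, not the paper's actual API.

```python
import numpy as np

# Rough sketch of the hierarchical loop summarized in steps 1)-4):
# a learned high-level policy picks one of a discrete set of
# goal-directed low-level controllers at each decision step.

class Controller:
    """Goal-directed motion controller parameterized by a target t_i."""
    def __init__(self, target: np.ndarray):
        self.target = target

    def command(self, joint_state: np.ndarray) -> np.ndarray:
        # Placeholder for an acceleration-limited position command
        # driving the robot toward the controller's target.
        return np.clip(self.target - joint_state, -0.1, 0.1)

def run_episode(policy, controllers, env, belief_tracker, horizon=200):
    """High-level agent picks a discrete controller index each step."""
    joint_state = env.reset()
    for _ in range(horizon):
        obs = belief_tracker.update(joint_state)  # compressed GMM belief
        idx = policy.select(obs)                  # discrete switching action
        joint_state = env.step(controllers[idx].command(joint_state))
```

The point is only structural: the RL action space is the discrete controller index, while each controller internally runs a continuous, high-frequency control law.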
The paper proposes an interesting approach to the challenging problem of hierarchical learning in manipulation. The approach is well demonstrated in the experimental section, and the belief-space formulation is interesting and sets the contribution apart from other techniques. I also enjoyed the introduction and thought that the related-work section was comprehensive. I have three comments that I believe may improve the paper.

First, in Section III.B the authors state that z_t, denoting the noisy measurements of the robot's state, is the only feedback observation and does not include the object. However, in the following subsections the particle filtering and belief propagation algorithms operate on the uncertainty of the object. There is some disconnect here, and the wording/explanation could be clarified. Following on this point, the approach depends on an a priori specified number of Gaussian mixture components to collapse the particles used to propagate the belief, but there is no explicit mention or ablation of the effect of varying their count. There is also an implicit assumption that objects can be detected, counted, and tracked, which could be made explicit early in the paper. This is a sharp distinction w.r.t. other approaches that operate on raw sensory data.

Second, particle depletion seems to be a significant challenge for the approach. Particle depletion is commonly encountered in state estimation, and the paper takes one approach, injecting particles into the physics representations, to attempt to fix this issue (a generic sketch of this technique follows this review). This is a solid idea. In the experimental section, the authors attribute some loss in performance to this issue, but it appears that this was not the case in the simulation studies. If so, why does particle depletion play a bigger role in the real experiments? Any guesses would be extremely helpful to others deploying this or similar approaches.

Third, there is no clear explanation of how the targets t_i for each component policy are chosen w.r.t. the GMM beliefs of the objects in the scene. There seems to be some hand-engineering here, such as making sure contact-free motion is executed before the robot pushes the object. If this is the case, there should be a clarification in the text alongside a more explicit explanation of how t_i is determined. It would be interesting to see an ablation of these choices on the performance of the approach for a fixed task.
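Since the reviewer's second point centers on particle depletion, here is a minimal sketch of the generic remedy, systematic resampling with a small fraction of freshly injected particles. Everything here, including the hypothetical `prior_sampler` hook, is an assumed illustration of the standard technique, not the authors' physics-based injection scheme.

```python
import numpy as np

# Generic resampling-with-injection step, a common remedy for particle
# depletion. Minimal sketch of the standard technique, not the paper's
# physics-based injection; prior_sampler is a hypothetical hook.

def resample_with_injection(particles, weights, inject_frac=0.05,
                            prior_sampler=None, rng=None):
    """Systematically resample (particles: N x dim array, weights: N array),
    then replace a small fraction of the set with fresh particles."""
    rng = rng or np.random.default_rng()
    n = len(particles)
    n_inject = int(inject_frac * n)
    m = n - n_inject
    # Systematic resampling of the surviving particles.
    positions = (rng.random() + np.arange(m)) / m
    cum_weights = np.cumsum(weights / weights.sum())
    survivors = particles[np.searchsorted(cum_weights, positions)]
    # Inject fresh particles, e.g. from the prior or a widened belief.
    if prior_sampler is not None:
        injected = prior_sampler(n_inject)
    else:
        injected = survivors[:n_inject] + \
            rng.normal(scale=0.05, size=survivors[:n_inject].shape)
    return np.concatenate([survivors, injected])
```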
While the proposed framework is interesting, the evaluation (especially the experimental one) is insufficient. Although the authors performed some elementary experiments, it seems that it would be difficult to apply the proposed method in real experiments. The contact conditions will differ greatly between simulation and experiment. Moreover, the effect of sensor noise is present in the real experiments (although the simulations are nicely done), and this is especially crucial for nonprehensile manipulation. The justification of the compressed representation of the particle filter belief is insufficient.