Learning to Manipulate Deformable Objects without Demonstrations


Yilin Wu, Wilson Yan, Thanard Kurutach, Lerrel Pinto, Pieter Abbeel

Abstract

In this paper we tackle the problem of deformable object manipulation through model-free visual reinforcement learning (RL). In order to circumvent the sample inefficiency of RL, we propose two key ideas that accelerate learning. First, we propose an iterative pick-place action space that encodes the conditional relationship between picking and placing on deformable objects. The explicit structural encoding enables faster learning under complex object dynamics. Second, instead of jointly learning both the pick and the place locations, we only explicitly learn the placing policy conditioned on random pick points. Then, by selecting the pick point that has Maximal Value under Placing (MVP), we obtain our picking policy. Using this learning framework, we obtain an order of magnitude faster learning compared to independent action-spaces on our suite of deformable object manipulation tasks. Finally, using domain randomization, we transfer our policies to a real PR2 robot for challenging cloth and rope manipulation.
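
To make the MVP idea concrete, here is a minimal sketch, assuming a learned pick-conditioned placing policy and its value estimate are available as callables. It samples candidate pick points, queries the placing policy for each, scores the resulting pick-place pair with the value estimate, and grasps at the argmax. The names (`mvp_pick`, `critic`, `policy`) and the normalized 2D workspace parameterization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mvp_pick(critic, policy, obs, num_candidates=64, rng=None):
    """Maximal Value under Placing (MVP): illustrative sketch only.

    Assumed interfaces (not the paper's actual API):
      policy(obs, pick) -> place point proposed for grasping at `pick`
      critic(obs, pick, place) -> estimated return of that pick-place pair
    Both are presumed learned elsewhere, e.g. by an off-policy RL
    algorithm that trains the placing policy on random pick points.
    """
    rng = rng or np.random.default_rng()
    # Sample candidate pick points uniformly over a normalized 2D
    # workspace; in practice one might restrict samples to the object.
    picks = rng.uniform(-1.0, 1.0, size=(num_candidates, 2))
    # Score each candidate by the value of its conditioned placement.
    values = [critic(obs, pick, policy(obs, pick)) for pick in picks]
    best = int(np.argmax(values))
    return picks[best], policy(obs, picks[best])
```

At deployment the returned pick-place pair would be executed as one step of the iterative pick-place action space; uniform sampling is just one simple way to enumerate candidate picks.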

Live Paper Discussion Information

Start Time: 07/15 15:00 UTC
End Time: 07/15 17:00 UTC

Paper Reviews

Review 1

The paper develops a learning system for creating controllers that can manipulate cloth or rope through a series of pick-and-place actions. The key challenge is that jointly learning the pick and place locations is difficult, due to the larger search space and the dependency between the two actions. The paper proposes to learn a conditional policy for the placing action alone and to expose it to different picking actions during training. At deployment, the picking action is then selected as the one that achieves the highest value under the learned value function. This approach appears effective in the presented results and outperforms several baseline methods. The paper is in general well written, and the results look interesting. There are, however, a few issues that merit further clarification and potentially additional experiments, as listed below.

First, the experiments presented in this work largely involve flattening/extending a deformable object. For these tasks, the exact picking location may not be as important, as suggested by the small gap between uniform picking and the proposed method that optimizes the pick location. It would be interesting to see results for the inverse process, i.e., folding a piece of cloth or manipulating the rope into certain configurations, where finding the right picking location is more essential and the planning is more difficult.

Second, previous work ([67] Wang et al.) demonstrated manipulation of rope using a self-supervised learning approach. In the rope manipulation domain, the tasks there seem more complex than those in this paper, and the approach is more flexible: after one training session it can in theory work for many tasks. I think it would be helpful to illustrate scenarios where the proposed method works better than [67].

Finally, in the video the robot seems to take many unnecessary manipulation steps before it achieves the task. It is not clear whether this is due to the difference between simulation and the real world, or whether it is the behavior in simulation as well, which would be a bit concerning.

Review 2

The paper presents a model-free RL technique to solve the deformable body manipulation problem. Deformable body manipulation is challenging in robotics, and I think the proposed strategy of learning placing only and optimizing picking based on the placing value approximator can inspire further work. The paper is well written and easy to understand. My main concerns are whether the approach can generalize to more complex tasks, and the lack of comparison with existing techniques (see my detailed comments below).

From the experimental results, the learned picking strategies (both independent and conditional) perform much worse than uniform picking when the picking location is constrained to the four corners, yet this behavior does not appear in the full cloth environment. Why is that? I would expect the opposite, as learning to pick should become harder when the action space is larger, as in the full cloth environment.

The conclusion that conditional learning speeds up learning in Section V.D is somewhat inconsistent with the result in Fig. 4, as the conditional policy actually suffers from mode collapse.

The study with the independent and conditional strategy baselines is great, but there is a lack of comparison with other existing deformable object manipulation techniques, such as those using demonstrations.

The tasks the paper demonstrates on are limited. Spreading is a task for which the picking location may not be very important. It would be interesting to see how well this technique can be applied to other, more complex tasks such as folding or knotting.

It is good to see that sim-to-real transfer works, and with simple domain randomization. However, from Fig. 7, it looks like physics randomization has no or even a negative impact. Can you explain that?