Shared Autonomy with Learned Latent Actions


Hong Jun Jeon, Dylan Losey, Dorsa Sadigh

Abstract

Assistive robots enable people with disabilities to perform everyday tasks on their own. However, these tasks can be complex, containing both coarse reaching motions and fine-grained manipulation. For example, when eating, the person must not only move to the correct food item, but also precisely manipulate the food in different ways (e.g., cutting, stabbing, scooping). Shared autonomy methods make robot teleoperation safer and more precise by arbitrating between user inputs and robot controls. However, these works have focused mainly on the high-level task of reaching a goal from a discrete set, while largely ignoring manipulation of objects at that goal. Meanwhile, dimensionality-reduction techniques for teleoperation map useful high-dimensional robot actions into an intuitive low-dimensional controller, but it is unclear whether these methods can achieve the requisite precision for tasks like eating. Our insight is that, by combining intuitive embeddings from learned latent actions with robotic assistance from shared autonomy, we can enable precise assistive manipulation. In this work, we adopt learned latent actions for shared autonomy by proposing a new model structure that changes the meaning of the human's input based on the robot's confidence in the goal. We show convergence bounds on the robot's distance to the most likely goal, and develop a training procedure to learn a controller that can move between goals even in the presence of shared autonomy. We evaluate our method in simulations and an eating user study. See videos of our experiments here: https://youtu.be/7BouKojzVyk.
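As a concrete illustration of the arbitration the abstract describes, the following Python sketch blends a decoded latent action with autonomous assistance according to goal confidence. This is an editor's sketch, not the authors' implementation: the hand-coded decode_latent stand-in (the paper learns this decoder from demonstrations), the Boltzmann-style belief update, and the specific thresholds and blending weights are all illustrative assumptions.

```python
import numpy as np

GOALS = np.array([[0.5, 0.2, 0.1],    # candidate goal positions,
                  [0.3, -0.4, 0.2]])  # e.g., food items on a plate

def update_belief(belief, state, action, goals, beta=1.0):
    """Bayesian goal inference: goals toward which the commanded motion
    makes progress become more likely (a standard shared-autonomy model)."""
    progress = np.array([np.linalg.norm(state - g)
                         - np.linalg.norm(state + action - g) for g in goals])
    belief = belief * np.exp(beta * progress)
    return belief / belief.sum()

def decode_latent(z, state, belief, goals):
    """Map a low-DoF joystick input z to a high-DoF robot action. This
    hand-coded stand-in changes the input's meaning with confidence:
    low confidence -> coarse reaching, high confidence -> fine motion.
    The paper instead *learns* this decoder from demonstrations."""
    g_star = goals[np.argmax(belief)]
    if np.max(belief) < 0.8:                      # coarse reaching mode
        return 0.1 * z[0] * (g_star - state)
    return 0.01 * np.array([z[0], z[1], 0.0])     # fine manipulation mode

def assistance(state, belief, goals):
    """Autonomous action that moves toward the most likely goal."""
    return 0.05 * (goals[np.argmax(belief)] - state)

# One control step: confidence arbitrates human and robot actions.
state = np.zeros(3)
belief = np.full(len(GOALS), 1.0 / len(GOALS))
z = np.array([1.0, 0.0])                          # 2-DoF joystick input

a_human = decode_latent(z, state, belief, GOALS)
a_robot = assistance(state, belief, GOALS)
alpha = np.max(belief)                            # arbitration weight
belief = update_belief(belief, state, a_human, GOALS)
state = state + (1 - alpha) * a_human + alpha * a_robot
```

Note how the same 2-DoF input z changes meaning with confidence (coarse reaching while the belief is spread out, fine manipulation once one goal dominates), which mirrors the model structure the abstract describes.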

Paper Reviews

Review 1

Overall, this paper is well-written and makes a significant contribution with solid evaluation. The main weakness of this paper as written is that it doesn't provide a good sense of its own limitations (perhaps because the authors decided to eliminate a discussion section in favor of including more results). In combination with the confusing use of the terms "goal" and "preference", this means that the authors are effectively substantially overstating the generality of the work in almost every section. More detail on a few key examples of this:

1. As mentioned above, the goals vs. preferences language is quite confusing, and it doesn't really make the distinction the authors want it to. The example is also confusing, since a task like "cut off a piece of tofu and pick it up with the fork" could either be discrete options (cut vs. stab vs. lift) or it could be the kind of continuous motion that I think the authors are trying to talk about, and a task like "reach the tofu" could be discrete options (as the authors intend) but could also involve continuous preference (for example, moving along an arc to avoid knocking over a glass of water). Perhaps the authors mean something closer to the target of the motion and the shape of the motion? Overall, I would say the way that the authors describe their work in terms of "goals" and "preferences" feels like an over-reach.

2. As far as I can tell, this paper does not include any input from or testing with users with disabilities. Not every technical paper needs to take a fully participatory design approach, but it's bad form to not even mention this as a limitation (if nothing else, it means that the subjective results need to be taken with a grain of salt). The authors need to at the very least include a discussion of how things might change if evaluated with target users. As a start, consider how things might change with:
   - participants who use a wheelchair-mounted arm full-time and are therefore extremely expert;
   - participants who have limited ability to provide input (for example, who find it easier to move a joystick in one direction than another);
   - disabled participants who are particularly sensitive to having their autonomy curtailed;
   - participants who are familiar with one method of controlling the arm (e.g., mode-switching) and are given this new method;
   - participants with multiple disabilities (e.g., low vision or cognitive impairments).
   Alternatively, if I have misunderstood, then the authors should provide significantly more detail on the profile of the participants (what type of disability, their level of familiarity with assistive arms, etc.).

3. I found the description of the remapping function to be a bit glib; it makes sense that you can change reference frame for many physical manipulation actions, but the authors should provide more description of the limitations of this approach. For example, how would you know how to remap from opening a door (side hinge) to opening an oven (bottom hinge)? Remapping from picking up an espresso cup to picking up a large coffee mug? For a remapping more complicated than the location of an object, this is not a trivial problem (arguably, this type of affordance remapping/transfer learning is still an open problem in robotics). (A translation-only example of the easy case is sketched after this review.)

Minor comments and questions:
- What were the demonstrations for the simulated reaching task? Was there a set of reaching demonstrations for one object that were then remapped to the different goals? Or were there demonstrations provided for each target?
- University name is included in the study description.
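For readers unfamiliar with the remapping the reviewer refers to, here is a hypothetical, translation-only remap of a demonstrated trajectory to a new goal position, the easy case the reviewer contrasts with affordance-level remapping (hinge axes, object scale). The function name and data are illustrative assumptions, not the paper's actual remapping function.

```python
import numpy as np

def remap_demo(demo, old_goal, new_goal):
    """Re-express a demonstrated trajectory relative to a new goal by
    translating its goal-centered coordinates. This only works when
    goals differ in position; changes in geometry or affordance
    (e.g., side hinge vs. bottom hinge) are not handled."""
    return demo - old_goal + new_goal

demo = np.array([[0.0, 0.0, 0.5],     # states from a reach toward old_goal
                 [0.2, 0.1, 0.3],
                 [0.4, 0.2, 0.1]])
old_goal = np.array([0.4, 0.2, 0.1])
new_goal = np.array([-0.3, 0.5, 0.1])
print(remap_demo(demo, old_goal, new_goal))
```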

Review 2

This is an interesting paper presenting solid work, and it contains many well-thought-through aspects of assistive teleoperation for reaching and grasping tasks. I liked the breadth of the presentation, which included a good motivation, new computational methods, and two kinds of analysis. The paper is also well written and the figures are clear. The authors make a convincing case that their method of switching control modes based on the confidence of the coarse shared autonomy is useful and beneficial for successful assisted teleoperation.

I do not have major comments on this paper. In terms of clarity, I would recommend better distinguishing the so-called "Goals" from "Preferences". In the second half of the paper, it is not always clear whether the goals of the controller include the "preferences" or just the "goals". Moreover, do preferences have a temporal aspect to them, or are they static orientations, as it sometimes seems in the evaluation part of the paper? Revising this for more clarity would help readers.

The user study has quite a small sample. This is understandable, since the main contribution of this work is the method and algorithm. Still, a remark on the statistical limitation of such a small sample is necessary. Finally, the authors could have done a better job anonymizing: a central citation is to an unpublished arXiv paper that is quite similar to the submission.

Review 3

Summary: This work proposes an approach that enables robots to reach high-level goals as well as adapt to human preferences. The authors combine shared autonomy and latent actions so that humans can provide inputs with different meanings (e.g., moving toward the left or right vs. adjusting the fork orientation). They include a theoretical analysis of the robot's convergence to the human's goal and the robot's adaptation to changing and new goals. Experiments on both a simulated robot and a real robot show that latent actions and shared autonomy together lead to higher efficiency on the task. The authors also conduct a user study to determine how the method works with real human users. They found that the time taken to complete the task was lowest for their method (LA+SA) and that users were most comfortable when the robot used their approach.

Originality: I think the method is quite novel, as it allows a human to provide input with different meanings, even with the same joystick control. The work also provides multiple perspectives on the problem: a formulation of the problem, theoretical analysis, simulation experiments, real robot experiments, and a user study. This gives the work an original and holistic perspective on the problem, considering the mathematical side as well as the human-robot interaction side.

Clarity: The paper was very well written. The figures were nicely done and refined. The contributions were laid out clearly in the beginning. The hypotheses for the user study were clearly written. Overall, well done!

Quality: The quality of the paper is quite strong. The authors provided a nice theoretical analysis. They also varied several knobs in the simulation experiments, including human rationality, when the human changes goals in the task, fast vs. slow learners, etc. I particularly appreciated the user study, as many works stop at simulated experiments. It was encouraging to see that the time taken was reduced and that participants reported more positively for their condition.

Significance: The work is significant and would be of great use to the community in thinking about how human input can be used in different ways to guide a robot toward high-level goals as well as low-level preferences.

Other comments:
- How do the demonstrations have the belief included? It seemed like the demonstrations would be provided before the robot starts interacting with the human.
- Why are the beta values for the real-robot experiments different from those for the simulation experiments? If this was a purposeful decision, it would be good to know why.
- In Section V.D, there's a small typo ("a equivalent" → "an equivalent").
- The right panel of Figure 8 is missing the "LA+SA" label.
- The fit in Figure 5 doesn't look linear. Could you please clarify?
- It was a little hard to understand the left side of Figure 6. There are two "scoop in icing" images in the first row. Also, why do "stab morsel" and "dip in rice" have lower preference alignment than "scoop in icing"?

Overall, it's a really interesting and well-polished paper!