diff --git a/index.html b/index.html
index 37139f8..0998552 100644
--- a/index.html
+++ b/index.html
@@ -66,7 +66,7 @@

Eureka: Human-Level Reward Design with

-->
- Jason Ma1 2,
+ Yecheng Jason Ma1 2,
  William Liang1,
  Guanzhi Wang3,
@@ -88,19 +88,19 @@

Eureka: Human-Level Reward Design with
4UT Austin

-
+ -
+
@@ -281,20 +281,13 @@

Eureka: Human-Level Reward Design with

Abstract

- Prompt-based learning has emerged as a successful paradigm in natural language processing, where
- a single general-purpose language model can be instructed to perform any task specified by input
- prompts. Yet task specification in robotics comes in various forms, such as imitating one-shot
- demonstrations, following language instructions, and reaching visual goals. They are often
- considered different tasks and tackled by specialized models. We show that a wide spectrum of
- robot manipulation tasks can be expressed with multimodal prompts, interleaving textual
- and visual tokens. Accordingly, we develop a new simulation benchmark that consists of thousands
- of procedurally-generated tabletop tasks with multimodal prompts, 600K+ expert trajectories for
- imitation learning, and a four-level evaluation protocol for systematic generalization. We
- design a transformer-based robot agent, VIMA, that processes these prompts and outputs motor
- actions autoregressively. VIMA features a recipe that achieves strong model scalability and data
- efficiency. It outperforms alternative designs in the hardest zero-shot generalization setting
- by up to 2.9x task success rate given the same training data. With 10x less training data, VIMA
- still performs 2.7x better than the best competing variant.
+ Large Language Models (LLMs) have excelled as high-level semantic planners for sequential decision-making tasks, but how to use them to learn complex low-level manipulation tasks, such as dexterous pen spinning, remains an open problem.
+ We bridge this fundamental gap and present Eureka, a human-level reward design algorithm powered by large language models for reinforcement learning.
+ Eureka exploits the remarkable zero-shot generation, code-writing, and in-context improvement capabilities of state-of-the-art LLMs, such as GPT-4, to perform in-context evolutionary optimization over reward code;
+ the resulting rewards can then be used to acquire complex skills via reinforcement learning. Without any task-specific prompting or pre-defined reward templates, Eureka generates reward functions that outperform expert human-engineered rewards on
+ 83% of the tasks in a diverse suite of 29 open-sourced RL environments that include 10 distinct robot morphologies, leading to an average normalized improvement of 54%.
+ The generality of Eureka enables a new type of reinforcement learning from human feedback (RLHF), readily incorporating human oversight to in-context improve the quality and the safety of the generated rewards.
+ Finally, using Eureka rewards in a curriculum learning setting, we demonstrate for the first time a simulated 5-finger shadow hand capable of performing pen spinning tricks, adeptly rotating a pen in circles at human speed.
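The added abstract describes Eureka's core mechanism: GPT-4 writes candidate reward functions as code, each candidate is scored by actually training an RL policy with it, and a textual summary of the training results is fed back into the next prompt so the model can improve the reward in-context. The sketch below illustrates only that outer loop, assuming an OpenAI-compatible client; the prompt wording, hyperparameters, and the train_and_score / summarize_stats callables are hypothetical placeholders, not the released Eureka implementation.

# Minimal sketch of in-context evolutionary reward search as described in the
# abstract. Helper callables and prompt text are illustrative assumptions.
from typing import Callable
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def eureka_reward_search(
    env_source: str,                           # environment source code shown to the LLM
    task_description: str,                     # e.g. "spin the pen in the hand"
    train_and_score: Callable[[str], float],   # trains an RL policy with a reward, returns task score
    summarize_stats: Callable[[str], str],     # textual "reward reflection" from the training run
    iterations: int = 5,
    samples_per_iter: int = 16,
) -> str:
    best_code, best_score = "", float("-inf")
    feedback = ""
    for _ in range(iterations):
        # Sample a population of candidate reward functions from the LLM,
        # conditioned on the environment code and any feedback so far.
        candidates = []
        for _ in range(samples_per_iter):
            resp = client.chat.completions.create(
                model="gpt-4",
                messages=[
                    {"role": "system",
                     "content": "Write a Python reward function for this reinforcement learning task."},
                    {"role": "user",
                     "content": f"{env_source}\n\nTask: {task_description}\n{feedback}"},
                ],
            )
            candidates.append(resp.choices[0].message.content or "")
        # Evaluate each candidate by training a policy against it; keep the best.
        scored = [(train_and_score(code), code) for code in candidates]
        score, code = max(scored, key=lambda s: s[0])
        if score > best_score:
            best_score, best_code = score, code
        # Feed training statistics of the best candidate back into the next
        # round's prompt so the LLM can refine the reward in-context.
        feedback = summarize_stats(code)
    return best_code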

@@ -302,7 +295,7 @@

Abstract

-
+ -
+
-
+
- User instruction: Turn on the faucet.[sep]import numpy as np\\reset_reward() # This is a new task so reset reward; otherwise we don't need it\\set_l2_distance_reward('palm', 'faucet_handle')\\set_joint_fraction_reward('faucet', 1) # Open the faucet\\execute_plan(4)
-
+

-
@@ -671,7 +697,7 @@

-
+ -
+