Project Domains | Mentors | Project Difficulty |
---|---|---|
Reinforcement Learning, Robotics Simulation, Imitation Learning | Ansh, Prajwal | Hard |
Project Description
Kurma is a budget quadruped with a turtle-like frame.
Your mission: teach it to walk—first in simulation, then in the real world.
- Policy Learning (PPO)
  - Build a Gym-compatible MuJoCo environment for Kurma.
  - Craft rewards for speed, stability, and energy use.
  - Train a continuous-action neural policy using Proximal Policy Optimization.
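
As a rough illustration of the reward and training steps listed above, here is a minimal sketch of one possible shaped reward, together with the clipped surrogate objective that PPO maximizes. The function names, the input signals (`forward_velocity`, `roll`, `pitch`, joint torques and velocities), and all weights are assumptions for illustration, not values from the Kurma project. In practice they would be read from the simulator state and tuned.

```python
import math


def walking_reward(forward_velocity, roll, pitch, joint_torques, joint_velocities,
                   w_speed=1.0, w_stability=0.5, w_energy=0.005):
    """Hypothetical shaped reward: speed bonus minus tilt and energy penalties."""
    # Reward forward progress (m/s along the body's heading).
    speed_term = w_speed * forward_velocity
    # Penalize body tilt (radians); squared so small wobbles cost little.
    stability_penalty = w_stability * (roll ** 2 + pitch ** 2)
    # Mechanical power proxy: sum of |torque * angular velocity| over joints.
    energy_penalty = w_energy * sum(
        abs(t * v) for t, v in zip(joint_torques, joint_velocities)
    )
    return speed_term - stability_penalty - energy_penalty


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate over a batch (a quantity to maximize)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

The clipping keeps each policy update close to the policy that collected the data, which tends to matter for legged robots where one bad update can collapse a gait. A full training loop would normally come from an existing PPO implementation rather than being written by hand.
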
- Imitation / Inverse RL (stretch)
  - Script or joystick-teleop a stable gait.
  - Use Inverse RL to learn a reward that reproduces the demonstration, then refine with PPO.
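
The two stretch steps above could be sketched as follows: a scripted trot that produces demonstration joint targets, and a feature-matching weight update in the spirit of maximum-entropy inverse RL with a linear reward. The leg names, the single hip joint per leg, the gait parameters, and the feature vectors are all assumptions made for illustration. Kurma's actual joint layout may differ.

```python
import math

# Assumed leg naming; in a trot, diagonal legs move in phase.
LEG_PHASES = {
    "front_left": 0.0,
    "rear_right": 0.0,
    "front_right": 0.5,
    "rear_left": 0.5,
}


def trot_hip_targets(t, freq_hz=1.0, amplitude_rad=0.3):
    """Scripted hip angle targets (radians) at time t seconds."""
    return {
        leg: amplitude_rad * math.sin(2.0 * math.pi * (freq_hz * t + phase))
        for leg, phase in LEG_PHASES.items()
    }


def feature_expectations(trajectories, gamma=0.99):
    """Average discounted feature sum; each trajectory is a list of feature vectors."""
    dim = len(trajectories[0][0])
    mu = [0.0] * dim
    for traj in trajectories:
        for step, feats in enumerate(traj):
            for i, f in enumerate(feats):
                mu[i] += (gamma ** step) * f
    return [m / len(trajectories) for m in mu]


def irl_weight_step(weights, mu_expert, mu_policy, lr=0.1):
    """One gradient step for a linear reward r(s) = w . phi(s).

    Moves the weights toward features the demonstration visits more
    often than the current policy does.
    """
    return [w + lr * (e - p) for w, e, p in zip(weights, mu_expert, mu_policy)]
```

In a full loop, the policy would be retrained with PPO under the updated reward after each weight step, and its feature expectations re-estimated from fresh rollouts.
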
Deploy the final network to a Raspberry Pi controller, drive affordable servos, and film an untethered demo of Kurma on the move.
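
On the deployment side, one small piece that usually needs care is turning policy outputs into safe servo commands. The sketch below assumes actions normalized to [-1, 1] and hobby servos driven by pulse widths of roughly 1000 to 2000 microseconds. Both are assumptions, and the real limits should come from the servo datasheet. The function names are hypothetical, and no particular Raspberry Pi servo library is implied.

```python
def action_to_pulse_us(action, center_us=1500.0, half_range_us=500.0):
    """Map a normalized action in [-1, 1] to a servo pulse width in microseconds."""
    # Clip first so an out-of-range network output cannot overdrive the servo.
    clipped = max(-1.0, min(1.0, action))
    return center_us + half_range_us * clipped


def rate_limit(previous_us, target_us, max_step_us=50.0):
    """Limit how far a pulse width may change in one control tick."""
    delta = target_us - previous_us
    delta = max(-max_step_us, min(max_step_us, delta))
    return previous_us + delta


def next_servo_commands(previous_pulses, actions, max_step_us=50.0):
    """Compute the next pulse width per servo from the policy's action vector."""
    return [
        rate_limit(prev, action_to_pulse_us(a), max_step_us)
        for prev, a in zip(previous_pulses, actions)
    ]
```

Rate limiting trades some responsiveness for gentler motion, which can help inexpensive servos survive a policy that behaves differently on hardware than it did in simulation.
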
Resources
Kurma
Intro to Reinforcement Learning
Inverse RL
PPO – Explained