ViReL

Project Domains	Mentors
Reinforcement Learning, Large Language Models (LLMs), Vision-Language Models (VLMs), Machine Learning	Mehul Totala, Rishiraj Rajgor

Project Description

This project teaches students, starting from scratch, to fine-tune Vision-Language Models (VLMs) using Reinforcement Learning (RL). VLMs are AI models that take images as input and respond in natural language. They are normally trained with supervised learning (showing them correct answers), but recent work has shown that training them with RL (letting them try, rewarding correct answers, and updating the model accordingly) can produce substantially better reasoning abilities.
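To make "get rewards for correct answers" concrete, here is a minimal sketch of a rule-based reward function in Python. It is illustrative only: the `<answer>...</answer>` output format, the function names `extract_answer` and `accuracy_reward`, and the case-insensitive exact-match rule are all assumptions, not the project's actual reward design.

```python
import re
from typing import Optional

# Assumed output format: the model wraps its final answer in <answer> tags.
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL | re.IGNORECASE)


def extract_answer(completion: str) -> Optional[str]:
    """Return the normalized text inside the first <answer> tag, or None."""
    match = ANSWER_PATTERN.search(completion)
    if match is None:
        return None
    return match.group(1).strip().lower()


def accuracy_reward(completion: str, reference: str) -> float:
    """Hypothetical reward: 1.0 for an exact (case-insensitive) match, else 0.0."""
    predicted = extract_answer(completion)
    if predicted is None:
        # No parsable answer means no reward.
        return 0.0
    return 1.0 if predicted == reference.strip().lower() else 0.0


if __name__ == "__main__":
    sample = "The image shows three cats. <answer> 3 </answer>"
    print(accuracy_reward(sample, "3"))
```

A binary reward like this is easy to verify but gives no partial credit. Real reward designs often combine it with other terms, such as a check that the output follows the expected format.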

Mentees will go from zero ML knowledge to running their own GRPO (Group Relative Policy Optimization) training experiments, and will come to understand the full pipeline: VLM architecture, reward function design, RL training loops, and evaluation on standard benchmarks. The project balances theory (reading key papers) with hands-on implementation (writing real training code).
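One core idea in GRPO is to sample several responses to the same prompt and score each one relative to the rest of its group, rather than training a separate value model. The sketch below shows only that advantage computation in plain Python. The function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions for illustration. A full training loop would also need the policy-gradient update, ratio clipping, and a KL penalty.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against its own group: (r - mean) / (std + eps).

    `rewards` holds the scores of several sampled responses to ONE prompt.
    `eps` (an assumed small constant) avoids division by zero when every
    response in the group receives the same reward.
    """
    if not rewards:
        return []
    group_mean = mean(rewards)
    group_std = pstdev(rewards)  # population std; an assumption in this sketch
    return [(r - group_mean) / (group_std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to one question: two correct (1.0), two wrong (0.0).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
    # If all answers score the same, every advantage is 0.0 and the group
    # contributes no learning signal.
    print(group_relative_advantages([1.0, 1.0, 1.0, 1.0]))
```

Responses that score above the group average get positive advantages and are reinforced, while below-average ones are discouraged. This is why a group where every response gets the same reward provides no gradient.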

Resources

DeepSeek-R1 Technical Report
Attention Is All You Need (Paper)
Intro to RL
Transformers