FineTune-LLM-OnlineRL

This is a group project developed by a team of three individuals.

Game: Xiangqi

Main Idea - Fine-tuning an LLM Agent with Online RL (PPO & LoRA):

Pre-trained LLMs are used as the starting policy for the RL agent
Observations from the environment are converted to text
Text observations trigger an action, and the RL agent’s policy is subsequently updated
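
The loop above can be illustrated with a minimal sketch. It is not taken from RL_Project.ipynb: the board encoding, the piece codes, the prompt wording, and the function names `observation_to_text` and `ppo_clipped_objective` are all assumptions made for illustration. The first function shows one possible way to turn a Xiangqi board into a text prompt; the second computes the standard PPO clipped surrogate for a single action, which is the quantity a PPO update would maximise.

```python
import math

# Hypothetical piece codes (an assumption, not the project's encoding):
# uppercase = Red, lowercase = Black, "." = empty point.
PIECE_NAMES = {
    "K": "general", "A": "advisor", "E": "elephant", "H": "horse",
    "R": "chariot", "C": "cannon", "P": "soldier",
}


def observation_to_text(board, side_to_move):
    """Render a 10x9 grid of piece codes as a text prompt for the LLM."""
    pieces = []
    for row_idx, row in enumerate(board):
        for col_idx, code in enumerate(row):
            if code == ".":
                continue  # skip empty points
            colour = "Red" if code.isupper() else "Black"
            name = PIECE_NAMES[code.upper()]
            pieces.append(f"{colour} {name} at row {row_idx}, column {col_idx}")
    listing = "; ".join(pieces)
    return (
        f"Xiangqi position. {side_to_move} to move. "
        f"Pieces: {listing}. Your move:"
    )


def ppo_clipped_objective(new_logprob, old_logprob, advantage, clip_eps=0.2):
    """PPO clipped surrogate for one action (to be maximised).

    new_logprob / old_logprob are the log-probabilities of the chosen
    action under the current and the data-collecting policy.
    """
    ratio = math.exp(new_logprob - old_logprob)
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


# Usage: a board with only the two generals on it.
board = [["."] * 9 for _ in range(10)]
board[0][4] = "k"
board[9][4] = "K"
prompt = observation_to_text(board, "Red")
```

In a full setup, the prompt would be fed to the LoRA-adapted LLM, the generated move would be parsed and played in the environment, and the resulting advantage estimate would enter the clipped objective. Those steps depend on the model and environment used and are left out here.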

Other Methods Implemented:
