Fine-tuning LLM agents with online RL for Xiangqi (Chinese Chess)

chowfi/FineTune-LLM-OnlineRL

This is a group project developed by a team of three individuals.

FineTune-LLM-OnlineRL

Game: Xiangqi

Main Idea - Fine-tuning an LLM Agent with Online RL (PPO & LoRA):

  1. A pre-trained LLM is used as the starting policy for the RL agent.
  2. Observations from the environment are converted to text.
  3. Each text observation triggers an action, which is subsequently used to update the RL agent’s policy.
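
The three steps above can be sketched roughly as follows. This is a minimal illustration and not the repository's code: the board encoding, the move notation (`h2e2`), and the helpers `observation_to_text`, `parse_action`, and `stub_policy` are all hypothetical, and a random stub stands in for the LLM. The last function is the standard per-sample PPO clipped surrogate, included only to show the quantity a PPO update would maximise.

```python
import math
import random

# Hypothetical text encoding of the opening position (uppercase = Red,
# lowercase = Black, '.' = empty). Not taken from the repository.
START_ROWS = [
    "rnbakabnr",
    ".........",
    ".c.....c.",
    "p.p.p.p.p",
    ".........",
    ".........",
    "P.P.P.P.P",
    ".C.....C.",
    ".........",
    "RNBAKABNR",
]


def observation_to_text(rows, side, legal_moves):
    """Step 2: render a board observation as a text prompt."""
    board = "\n".join(f"{9 - i} {row}" for i, row in enumerate(rows))
    moves = ", ".join(legal_moves)
    return (
        f"You are playing Xiangqi as {side}.\n"
        f"Board (rank label, then files a-i):\n{board}\n"
        f"Legal moves: {moves}\n"
        "Reply with exactly one legal move."
    )


def parse_action(reply, legal_moves):
    """Step 3: return the first legal move mentioned in the reply, else None."""
    for token in reply.replace(",", " ").split():
        if token in legal_moves:
            return token
    return None


def stub_policy(prompt, legal_moves, rng):
    """Stand-in for the LLM (step 1): answers with a random legal move."""
    return f"I will play {rng.choice(legal_moves)} now"


def ppo_clipped_objective(new_logp, old_logp, advantage, eps=0.2):
    """Per-sample PPO surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    ratio = math.exp(new_logp - old_logp)
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


legal = ["h2e2", "b2e2", "e3e4"]  # hypothetical move strings
prompt = observation_to_text(START_ROWS, "Red", legal)
reply = stub_policy(prompt, legal, random.Random(0))
action = parse_action(reply, legal)
```

In a real setup the stub would be replaced by sampling from the LoRA-adapted model, and the log-probabilities of the chosen move tokens would feed the PPO objective.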

Other Methods Implemented:

  1. Random
  2. Greedy
  3. DQN
  4. DDQN
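
For orientation, the baselines listed above might look roughly like this. It is a sketch under assumptions, not the repository's implementation: "Greedy" is assumed here to mean picking the move with the highest immediate reward, the function names are hypothetical, and the Q-values are plain lists instead of network outputs. The two target functions show the usual difference between DQN and Double DQN (DDQN): DQN takes the maximum of the target network's values, while DDQN lets the online network choose the action and the target network score it.

```python
import random


def random_agent(legal_moves, rng):
    """Random baseline: pick a legal move uniformly at random."""
    return rng.choice(legal_moves)


def greedy_agent(legal_moves, immediate_reward):
    """Greedy baseline (assumed meaning): maximise the immediate reward."""
    return max(legal_moves, key=immediate_reward)


def dqn_target(reward, next_q_target, done, gamma=0.99):
    """DQN target: bootstrap from the max of the target network's Q-values."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def ddqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """DDQN target: online network selects, target network evaluates."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best]


# Toy numbers chosen only to show that the two targets can differ.
q_online = [1.0, 3.0, 2.0]
q_target = [5.0, 0.5, 4.0]
y_dqn = dqn_target(1.0, q_target, done=False, gamma=0.5)
y_ddqn = ddqn_target(1.0, q_online, q_target, done=False, gamma=0.5)
```

With these toy values the DQN target bootstraps from the largest target-network value, whereas the DDQN target uses the target-network value of the action the online network prefers, which is the mechanism DDQN relies on to reduce overestimation.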
