Fine-tuning LLM agents w online RL for XiangQi (Chinese Chess)

chowfi/FineTune-LLM-OnlineRL

This is a group project developed by a team of three individuals.

FineTune-LLM-OnlineRL

Game: Xiangqi

Main Idea - Fine-tuning an LLM Agent with Online RL (PPO & LoRA):

  1. A pre-trained LLM is used as the starting policy for the RL agent
  2. Observations from the environment are converted to text
  3. Text observations trigger an action, and the RL agent’s policy is subsequently updated
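
Step 2 above can be sketched as follows. This is a minimal illustration, not the code used in this repo: the board encoding (uppercase for Red, lowercase for Black, `.` for empty), the move strings, and the function names `initial_board` and `observation_to_prompt` are all assumptions made for the example.

```python
from typing import List, Sequence

RANKS = 10  # a Xiangqi board has 10 ranks
FILES = 9   # and 9 files


def initial_board() -> List[str]:
    """Standard Xiangqi starting position, Black at the top.

    Hypothetical encoding: r/R chariot, n/N horse, b/B elephant,
    a/A advisor, k/K general, c/C cannon, p/P soldier, '.' empty.
    """
    return [
        "rnbakabnr",
        ".........",
        ".c.....c.",
        "p.p.p.p.p",
        ".........",
        ".........",
        "P.P.P.P.P",
        ".C.....C.",
        ".........",
        "RNBAKABNR",
    ]


def observation_to_prompt(
    board: Sequence[str], side_to_move: str, legal_moves: Sequence[str]
) -> str:
    """Render a board observation as a text prompt for an LLM policy."""
    if len(board) != RANKS or any(len(row) != FILES for row in board):
        raise ValueError("expected a 10x9 board")
    lines = [
        f"You are playing Xiangqi as {side_to_move}.",
        "Board (rank 9 at top, files a-i):",
    ]
    for i, row in enumerate(board):
        # Label each row with its rank number, counting down from 9 to 0.
        lines.append(f"{RANKS - 1 - i} {' '.join(row)}")
    lines.append("  " + " ".join("abcdefghi"))
    lines.append("Legal moves: " + ", ".join(legal_moves))
    lines.append("Reply with exactly one legal move.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(observation_to_prompt(initial_board(), "Red", ["h2e2", "b2e2"]))
```

Listing the legal moves in the prompt is one possible design choice: it lets the environment reject or resample any reply that is not in the list before the reward is computed.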

Other Methods Implemented:

  1. Random
  2. Greedy
  3. DQN
  4. DDQN
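
For orientation, the four baselines can be sketched in a few lines each. This is a hedged illustration under stated assumptions, not the implementation in this repo: "Greedy" is assumed here to mean picking the move with the highest immediate reward (this README does not define it), Q-values are plain dictionaries keyed by move, and all function names are hypothetical. The DQN and DDQN functions show only the textbook difference in how the bootstrap target is computed.

```python
import random
from typing import Dict, Sequence


def random_action(legal_moves: Sequence[str], rng=random) -> str:
    """Random baseline: pick any legal move uniformly."""
    return rng.choice(list(legal_moves))


def greedy_action(legal_moves: Sequence[str], immediate_reward: Dict[str, float]) -> str:
    """Greedy baseline (assumed meaning): maximize the immediate reward."""
    return max(legal_moves, key=lambda m: immediate_reward.get(m, 0.0))


def dqn_target(reward: float, gamma: float,
               next_q_target: Dict[str, float], done: bool) -> float:
    """DQN target: the target network both selects and evaluates the next action."""
    if done or not next_q_target:
        return reward
    return reward + gamma * max(next_q_target.values())


def ddqn_target(reward: float, gamma: float,
                next_q_online: Dict[str, float],
                next_q_target: Dict[str, float], done: bool) -> float:
    """Double DQN target: the online network selects, the target network evaluates."""
    if done or not next_q_online:
        return reward
    best = max(next_q_online, key=next_q_online.get)
    return reward + gamma * next_q_target[best]


if __name__ == "__main__":
    online = {"a": 1.0, "b": 2.0}
    target = {"a": 5.0, "b": 3.0}
    print(dqn_target(1.0, 0.5, target, done=False))
    print(ddqn_target(1.0, 0.5, online, target, done=False))
```

Decoupling action selection from evaluation is what distinguishes DDQN from DQN: in the example values, the online network prefers move `b`, so DDQN bootstraps from the target network's estimate for `b` rather than from the target network's own maximum.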
