"topic:mcts" — Search

SE-Agent is a self-evolution framework for LLM code agents. It enables trajectory-level evolution, exchanging information across reasoning paths via Revision, Recombination, and Refinement to expand the search space and escape local optima. On SWE-bench Verified, it achieves state-of-the-art (SOTA) performance.

Python · 238 · 29 · Updated 5 days ago

claude-code, code-agent, code-fix, mcts, self-evolve, swe-agent, swe-bench, test-time-scaling

sungyubkim/Deep_RL_with_pytorch

A PyTorch tutorial for DRL (Deep Reinforcement Learning)

Jupyter Notebook · 225 · 47 · Updated 3 weeks ago

a2c, c51, counterfactual-regret-minimization, deep-reinforcement-learning, dqn, gail, hedge, iqn, mcts, ppo, pytorch, qr-dqn, random-network-distillation, self-imitation-learning, soft-actor-critic, uct

initial-h/AlphaZero_Gomoku_MPI

An asynchronous/parallel implementation of the AlphaGo Zero algorithm for Gomoku

Python · 219 · 45 · Updated 1 week ago

algorithm, alphago, alphazero, alphazero-gomoku, deep-reinforcement-learning, dirichlet-distribution, gomoku, mcts, mpi4py, parallel, tensorflow, tensorlayer, tree-search

thuxugang/doudizhu

AI for Dou Dizhu (Fight the Landlord)

Python · 186 · 67 · Updated 4 weeks ago

ai, card-game, doudizhu, dqn, mcts, reinforcement-learning

zjeffer/chess-deep-rl

Research project: create a chess engine using Deep Reinforcement Learning

Jupyter Notebook · 171 · 13 · Updated 3 weeks ago

ai, alphazero, artificial-intelligence, chess, chess-engine, deep-learning, deep-reinforcement-learning, machine-learning, mcts, neural-network, neural-networks, reinforcement-learning

kaesve/muzero

A clean implementation of MuZero and AlphaZero following the AlphaZero General framework. Train and pit both algorithms against each other, and investigate the reliability of learned MuZero MDP models.

Jupyter Notebook · 168 · 27 · Updated 2 months ago

alphazero, deep-learning, deep-reinforcement-learning, mcts, muzero, reinforcement-learning, tensorflow, tensorflow2, tf2

PuYuuu/vehicle-interaction-decision-making

Decision-making for multiple vehicles at an intersection, based on the level-k game and MCTS

C++ · 150 · 53 · Updated 3 days ago

game-theory, level-k, mcts

akolishchak/doom-net-pytorch

Reinforcement learning models in the ViZDoom environment

Python · 130 · 19 · Updated 7 months ago

agent, behavior-tree, doom, doomnet-track1, learning, mcts, ppo, pytorch, reinforcement, reinforcement-learning, vizdoom

rlglab/minizero

[IEEE ToG] MiniZero: An AlphaZero and MuZero Training Framework

C++ · 124 · 36 · Updated 2 days ago

alphazero, atari, board-games, deep-reinforcement-learning, go, gomoku, gumbel-alphazero, gumbel-muzero, hex, killall-go, mcts, monte-carlo-tree-search, muzero, nogo, othello, outer-open-gomoku, reinforcement-learning, tictactoe

CGLemon/Sayuri

AlphaZero-based engine for the game of Go (圍棋/围棋).

C++ · 120 · 13 · Updated 2 days ago

alphago, alphazero, baduk, deeplearning, gumbel-alphazero, mcts, sayuri, weiqi

manyoso/allie

Allie: A UCI-compliant chess engine

C++ · 110 · 21 · Updated 2 weeks ago

alphabeta, alphazero, chess, chess-engine, deepmind, mcts, neural-network

YoujiaZhang/AlphaGo-Zero-Gobang

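AlphaGo-Zero-Gobang is a reinforcement-learning-based Gobang (five-in-a-row) model, intended mainly as a demo for understanding how AlphaGo Zero works: how the neural network guides MCTS in making decisions, and how it learns through self-play. Source code plus tutorial.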

Python · 110 · 10 · Updated 2 weeks ago

ai, alphago, alphazero, deep-learning, gobang, gomuku, gui, mcts, residual-networks, tensorflow

lowrollr/turbozero

Fast, parallel AlphaZero in JAX

Python · 110 · 11 · Updated 2 days ago

alphazero, gpu-acceleration, jax, mcts, monte-carlo-tree-search, reinforcement-learning, vectorization

Urinx/ReinforcementLearning

Reinforcing Your Learning of Reinforcement Learning

Python · 96 · 22 · Updated 1 month ago

advantage-actor-critic, alphago, alphago-zero, atari-2600, cartpole, ddpg, doom, dqn, frozenlake, gomoku, mcts, policy-gradient, ppo, q-learning, reinforcement-learning, space-invaders, tic-tac-toe

Wangmerlyn/MCTS-GSM8k-Demo

A repo showcasing the use of MCTS with LLMs to solve GSM8K problems

Python · 95 · 8 · Updated 1 month ago

llm-inference, llms, mcts

blanyal/alpha-zero

AlphaZero implementation for Othello, Connect Four, and Tic-Tac-Toe, based on "Mastering the game of Go without human knowledge" and "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" by DeepMind.

Python · 94 · 28 · Updated 2 weeks ago

alpha-zero, alphago-zero, alphazero, connect-four, connect4, deep-learning, deepmind, game, machine-learning, mcts, othello, reinforcement-learning, resnet, reversi, self-play, tensorflow, tic-tac-toe, tictactoe
