90 results for “topic:self-play”

A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more

Jupyter Notebook · 4.4k · 1.1k · Updated 13 hours ago

alpha-zero, alphago, alphago-zero, alphazero, deep-learning, gobang, gomoku, keras, mcts, monte-carlo-tree-search, neural-network, othello, pytorch, reinforcement-learning, self-play, tensorflow, tf

opendilab/DI-engine

OpenDILab Decision AI Engine. The Most Comprehensive Reinforcement Learning Framework B.P.

Python · 3.6k · 431 · Updated 7 hours ago

atari, distributed-reinforcement-learning, distributed-system, drl, exploration-exploitation, imitation-learning, impala, inverse-reinforcement-learning, minigrid, model-based-reinforcement-learning, mujoco, multiagent-reinforcement-learning, offline-rl, python, pytorch-rl, r2d2, reinforcement-learning, reinforcement-learning-algorithms, self-play, smac

opendilab/LightZero

[NeurIPS 2023 Spotlight] LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios (awesome MCTS)

Python · 1.5k · 188 · Updated 1 day ago

alpha-beta-pruning, alphazero, atari, board-game, board-games, continuous-control, efficientzero, gomoku, gumbel-muzero, gym, mcts, mcts-algorithm, monte-carlo-tree-search, muzero, pytorch, reinforcement-learning, sampled-muzero, self-play, stochastic-muzero, tictactoe

opendilab/DI-star

An artificial intelligence platform for StarCraft II with large-scale distributed training and grandmaster agents.

Python · 1.3k · 124 · Updated 1 day ago

artificial-intelligence, deep-learning, deep-reinforcement-learning, league, reinforcment-learning, self-play, starcraft2

uclaml/SPIN

The official implementation of Self-Play Fine-Tuning (SPIN)

Python · 1.2k · 104 · Updated 2 hours ago

deep-learning, fine-tuning, large-language-models, self-play

uclaml/SPPO

The official implementation of Self-Play Preference Optimization (SPPO)

Python · 583 · 47 · Updated 1 week ago

deep-learning, fine-tuning, large-language-models, rlhf, self-play

inspirai/TimeChamber

A Massively Parallel Large Scale Self-Play Framework

Python · 361 · 38 · Updated 2 months ago

deep-reinforcement-learning, isaac-gym, multi-agent, reinforcement-learning, self-play

spiral-rl/spiral

SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Python · 177 · 20 · Updated 1 week ago

large-language-models, multi-agent-reinforcement-learning, reinforcement-learning, self-play

Naton1/osrs-pvp-reinforcement-learning

Train a neural network to PvP in Old School RuneScape using reinforcement learning.

Java · 159 · 55 · Updated 1 week ago

artificial-intelligence, deep-learning, gym, java, machine-learning, oldschool-runescape, osrs, ppo, python, pytorch, reinforcement-learning, rsps, runescape, self-play

ChuaCheowHuan/gym-continuousDoubleAuction

A custom MARL (multi-agent reinforcement learning) environment where multiple agents trade against one another (self-play) in a zero-sum continuous double auction. Ray [RLlib] is used for training.

Jupyter Notebook · 153 · 31 · Updated 1 month ago

double-auction, financial-engineering, gym-environment, high-frequency-trading, limit-order-book, lstm, market-microstructure, marl, multi-agent-reinforcement-learning, n-player, ppo, quantitative-finance, quantitative-trading, ray, rllib, self-play, zero-sum, zero-sum-games

blanyal/alpha-zero

AlphaZero implementation for Othello, Connect-Four and Tic-Tac-Toe based on "Mastering the game of Go without human knowledge" and "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" by DeepMind.

Python · 94 · 28 · Updated 2 weeks ago

alpha-zero, alphago-zero, alphazero, connect-four, connect4, deep-learning, deepmind, game, machine-learning, mcts, othello, reinforcement-learning, resnet, reversi, self-play, tensorflow, tic-tac-toe, tictactoe

Alibaba-Quark/SSP

Search Self-Play: Pushing the Frontier of Agent Capability without Supervision

Python · 94 · 8 · Updated 1 day ago

agent, alibaba, deep-research, llm, reinforcement-learning, self-play

cestpasphoto/alpha-zero-general

A very fast implementation of AlphaZero, applied to games like Splendor, Santorini, The Little Prince, … Browser version available

Python · 62 · 19 · Updated 1 week ago

alphago, alphago-zero, alphazero, machikoro, minivilles, numba, python, pytorch, reinforcement-learning, santorini, santorini-game, self-play, splendor, the-little-prince

seungeunrho/football-paris

The exact code used by the team "liveinparis" in the Kaggle football competition, where it ranked 6th of 1141

Python · 57 · 12 · Updated 1 week ago

gfootball, kaggle, liveinparis, ppo, pytorch, reinforcement-learning, self-play

dellalibera/gym-backgammon

Backgammon OpenAI Gym

Python · 54 · 15 · Updated 3 weeks ago

artificial-intelligence, backgammon, backgammon-game, game, gym, gym-backgammon, gym-env, openai-gym, openai-gym-environment, reinforcement-learning, self-play, td-gammon, td-learning, temporal-differencing-learning

dellalibera/td-gammon

TD-Gammon implementation

Python · 51 · 12 · Updated 1 month ago

artificial-intelligence, backgammon, convolutional-neural-networks, game, neural-network, pytorch, reinforcement-learning, self-play, temporal-differencing-learning, value-function

tobiasemrich/SchafkopfRL

AI agents for the Bavarian card game Schafkopf, trained with reinforcement learning

Python · 40 · 6 · Updated 1 week ago

card-game, imperfect-information-game, ppo, pytorch, reinforcement-learning, schafkopf, self-play

thu-nics/MARSHAL

MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs

Python · 39 · 1 · Updated 5 days ago

agent, llm, multi-agent-systems, reinforcement-learning, self-play

Sebastian-Schuchmann/Self-Play-TicTacToe-AI-ML-Agents-

A self-play reinforcement learning agent that learns to play TicTacToe using the ML-Agents framework in Unity.

C# · 39 · 9 · Updated 2 weeks ago

artificial-intelligence, machine-learning, ml-agents, neural-network, reinforcement-learning, self-play, tensorflow, unity, unity3d

ShibiHe/Model-Free-Episodic-Control

An implementation of the paper "Model-Free Episodic Control"

Python · 36 · 10 · Updated 10 months ago

deep, dqn-ep, fictitious, game-theory, knn, numpy, openai-gym, self-play

arianahejazyan/Athena

A UCI-compatible four-player chess engine

C++ · 30 · 4 · Updated 2 weeks ago

ai, artificial-intelligence, bitboard, bitboard256, chess, chess-ai, chess-engine, chess-engines, chess-variants, cpp, deep-learning, deep-rl, four-player-chess, gamedev, neural-networks, nnue, reinforcement-learning, reinforcement-learning-agent, self-play, uci

sirmammingtonham/alphastone

Using self-play, MCTS, and a deep neural network to create a Hearthstone AI player

Python · 29 · 7 · Updated 1 year ago

ai, alpha-zero, deep-learning, deep-reinforcement-learning, hearthstone, ismcts, monte-carlo-tree-search, pytorch, self-play

cmubig/sorts

Code base for Social Robot Tree Search (SoRTS).

Python · 26 · 5 · Updated 10 months ago

intent-prediction, mcts, self-play, social-navigation

mbaske/ml-selfplay-fighter

Self-Play Boxing Match made with Unity Machine Learning Agents

C# · 23 · 9 · Updated 6 months ago

ml-agents, self-play, unity

af1tang/convogym

A gym environment to train chatbots.

Python · 21 · 3 · Updated 1 year ago

active-learning, chatbot-platform, convogym, dialog-systems, machine-learning, natural-language-generation, natural-language-processing, nlp, pytorch, reinforcement-learning, self-play

jianzhnie/RLZero

A clean and easy implementation of MuZero, AlphaZero and Self-Play reinforcement learning algorithms for any game.

Python · 17 · 0 · Updated 2 months ago

alpha-zero, mcts, multi-agent, muzero, reinforcement-learning, self-play

backpropper/s2p

Code repository for On the interaction between supervision and self-play in emergent communication (ICLR 2020)

Python · 15 · 2 · Updated 9 months ago

emergent-communication, self-play

neoyung/connect-4

A reinforcement learning agent trained without prior human knowledge

Jupyter Notebook · 15 · 7 · Updated 4 months ago

alphago-zero, deep-q-network, experience-replay, reinforcement-learning, self-play

sebastianbrzustowicz/Robot-Sumo-RL

Python + PyTorch. Advanced reinforcement learning (SAC/PPO/A2C) for autonomous Robot Sumo combat, featuring competitive self-play in continuous action spaces.

Python · 14 · 3 · Updated 2 weeks ago

a2c, actor-critic, artificial-intelligence, autonomous-robots, machine-learning, minisumo, mobile-robots, physics-simulation, ppo, pytorch, reinforcement-learning, reward-shaping, rl, robotics, sac, self-play, sota, state-of-the-art

Jackory/RPBT

(AAAI24 oral) Implementation of RPPO (Risk-sensitive PPO) and RPBT (Population-based self-play with RPPO)

Python · 12 · 1 · Updated 11 months ago

competition, multi-agent-reinforcement-learning, population-based-training, ppo, reinforcment-learning, risk-sensitive-preferences, self-play