A multi-agent reinforcement learning environment for the Azul board game
AzulMARL
A PettingZoo environment for the multiplayer board game Azul, intended for training AI agents.
Most important libraries used
Usage
Initiating the env via PettingZoo
from azul_marl_env import azul_v1_2players, azul_v1_3players, azul_v1_4players
env_2players = azul_v1_2players()
env_3players = azul_v1_3players()
env_4players = azul_v1_4players()
env_2players_custom_max_moves = azul_v1_2players(max_moves=100)

Initiating the env directly
from azul_marl_env import AzulEnv
env = AzulEnv(player_count=2)
env = AzulEnv(player_count=3)
env = AzulEnv(player_count=4)
env = AzulEnv(player_count=2, max_moves=100)

Making moves
from azul_marl_env import azul_v1_2players
import random

# Create and reset the environment
env = azul_v1_2players()
observation, info = env.reset()

# Iterate through agents
for agent in env.agent_iter():
    # Get current agent's observation and info
    observation, reward, termination, truncation, info = env.last()
    if termination or truncation:
        break
    # Get valid moves for current agent
    valid_moves = info["valid_moves"]
    # Select a random valid move
    action = random.choice(valid_moves)
    # Execute the move
    env.step(action)
    # Render the environment (optional)
    env.render()

# Close the environment
env.close()

Example of a complete game using random valid moves
from azul_marl_env import azul_v1_2players
import random

def play_random_game():
    env = azul_v1_2players()
    observation, info = env.reset()
    for agent in env.agent_iter():
        observation, reward, termination, truncation, info = env.last()
        if termination or truncation:
            print(f"Game finished! Final scores: {[player['score'] for player in observation['players']]}")
            break
        # Get valid moves and make a random move
        valid_moves = info["valid_moves"]
        if valid_moves:
            action = random.choice(valid_moves)
            env.step(action)
    env.close()

play_random_game()

Environment Details
Factory count (num_factories):
- 2-player game -> 5
- 3-player game -> 7
- 4-player game -> 9
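The three factory counts listed above match the pattern 2 * player_count + 1. A minimal sketch of that mapping follows; `factory_count` is a hypothetical helper name used for illustration, not a function exported by the package.

```python
def factory_count(player_count: int) -> int:
    """Return the number of factories for a 2-4 player game.

    Hypothetical helper: the name is not part of azul_marl_env.
    """
    if player_count not in (2, 3, 4):
        raise ValueError("player_count must be 2, 3 or 4")
    # 2 players -> 5, 3 players -> 7, 4 players -> 9
    return 2 * player_count + 1
```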
Action Space: MultiDiscrete([num_factories + 1, 5, 20, 5])
- First value: Factory index. Index 0 selects the center, so a factory with 0-based index i is selected with i + 1.
- Second value: Tile color (0-4 representing different colors)
- Third value: Number of tiles to place on floor (0-19)
- Fourth value: Pattern line index (0-4)
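To make the index shift in the first component concrete, here is a small sketch that assembles an action in the order described above. The helper `make_action` and the constant `CENTER` are hypothetical names chosen for this example, and the default of five factories assumes a 2-player game.

```python
from typing import List, Optional

CENTER = 0  # first action component: 0 selects the center


def make_action(factory: Optional[int], color: int, floor_count: int,
                pattern_line: int, num_factories: int = 5) -> List[int]:
    """Build [source, color, floor_count, pattern_line].

    `factory` is a 0-based factory index, or None to take from the center.
    Hypothetical helper, not part of azul_marl_env.
    """
    if factory is None:
        source = CENTER
    elif 0 <= factory < num_factories:
        source = factory + 1  # shift by one because 0 is the center
    else:
        raise ValueError("factory index out of range")
    if not 0 <= color <= 4:
        raise ValueError("color must be in 0-4")
    if not 0 <= floor_count <= 19:
        raise ValueError("floor_count must be in 0-19")
    if not 0 <= pattern_line <= 4:
        raise ValueError("pattern_line must be in 0-4")
    return [source, color, floor_count, pattern_line]
```

For instance, `make_action(None, 2, 0, 3)` describes taking color 2 from the center, and `make_action(0, 2, 0, 3)` describes taking it from the first factory.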
Observation Space: Dictionary containing:
- factories: Box(0, 4, (num_factories, 5), int32) - Tile counts in each factory
- center: Box(0, 3 * num_factories, (5,), int32) - Tile counts in center
- players: Tuple of player states, each containing:
  - pattern_lines: Box(0, 5, (5, 5), int32) - Current pattern lines
  - wall: Box(0, 5, (5, 5), int32) - Wall state
  - floor: Box(0, 5, (7,), int32) - Floor tiles
  - is_starting: Discrete(2) - First player marker
  - score: Discrete(241) - Player's score
- bag: Box(0, 100, (5,), int32) - Remaining tiles in bag
- lid: Box(0, 100, (5,), int32) - Discarded tiles
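As an illustration of reading the per-player part of the observation, the sketch below pulls out each player's score and first-player marker. It assumes the observation is a dictionary keyed by the field names listed above; `summarize_players` is a hypothetical helper, and the sample dictionary is hand-written rather than real environment output.

```python
def summarize_players(observation: dict) -> list:
    """Return (player_index, score, holds_first_player_marker) per player."""
    summary = []
    for index, player in enumerate(observation["players"]):
        summary.append(
            (index, int(player["score"]), bool(player["is_starting"]))
        )
    return summary


# Hand-written stand-in containing only the fields used here
sample_observation = {
    "players": [
        {"score": 12, "is_starting": 0},
        {"score": 7, "is_starting": 1},
    ]
}
player_summary = summarize_players(sample_observation)
```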
Reward:
- -1 for each step until game end
- -2 for invalid moves
- Final Azul score is added to cumulative reward at game end
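The arithmetic behind an agent's cumulative reward can be sketched as follows. This assumes that an invalid move earns -2 instead of the usual -1 for that step (the description does not say whether the two penalties stack), and `cumulative_reward` is a hypothetical name used only for this example.

```python
STEP_REWARD = -1          # per valid step until the game ends
INVALID_MOVE_REWARD = -2  # assumption: replaces the step reward for that move


def cumulative_reward(valid_steps: int, invalid_moves: int,
                      final_score: int) -> int:
    """Sum the step penalties and add the final Azul score."""
    return (valid_steps * STEP_REWARD
            + invalid_moves * INVALID_MOVE_REWARD
            + final_score)
```

Under these assumptions, an agent that makes 30 valid moves and 2 invalid ones and finishes with an Azul score of 45 ends with -30 - 4 + 45 = 11.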
Done:
- True when the game is completed (at least one player has filled at least one horizontal wall row)
- False otherwise
Truncated:
- True when the maximum number of moves is reached (player_count * 150 by default)
- False otherwise
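Both end conditions can be illustrated with a short sketch. It assumes that a wall cell value of 0 means "empty" (the observation space only states that wall values range from 0 to 5), and both function names are hypothetical.

```python
def has_completed_row(wall) -> bool:
    """True if any horizontal wall row has no empty (0) cells.

    Assumption: 0 marks an empty wall cell.
    """
    return any(all(cell != 0 for cell in row) for row in wall)


def default_max_moves(player_count: int) -> int:
    """Default truncation limit: player_count * 150."""
    return player_count * 150


# A 5x5 wall whose second row is full
example_wall = [
    [0, 0, 0, 0, 0],
    [1, 2, 3, 4, 5],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
```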
Info: Contains the valid_moves list for the current player
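The valid_moves list can also drive something slightly less naive than a uniformly random choice. The sketch below prefers moves that send the fewest tiles to the floor. It assumes each entry of valid_moves is a sequence in the same order as the action space (factory, color, floor count, pattern line); `pick_move` is a hypothetical helper.

```python
import random


def pick_move(valid_moves, rng=random):
    """Pick a random move among those with the fewest floor tiles.

    Assumption: move[2] is the number of tiles placed on the floor.
    Returns None when there are no valid moves.
    """
    if len(valid_moves) == 0:
        return None
    fewest = min(move[2] for move in valid_moves)
    candidates = [move for move in valid_moves if move[2] == fewest]
    return rng.choice(candidates)
```

In the game loop, this could stand in for `random.choice(valid_moves)` when selecting the action to pass to `env.step`.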
GNU Affero General Public License v3.0
Created June 11, 2025
Updated December 19, 2025
