
A multi-agent reinforcement learning environment for the Azul board game

AzulMARL


A PettingZoo environment for the multiplayer board game Azul, built for training AI agents.



Usage

Initializing the environment via PettingZoo

from azul_marl_env import azul_v1_2players, azul_v1_3players, azul_v1_4players

env_2players = azul_v1_2players()
env_3players = azul_v1_3players()
env_4players = azul_v1_4players()

env_2players_custom_max_moves = azul_v1_2players(max_moves=100)

Initializing the environment directly

from azul_marl_env import AzulEnv

env = AzulEnv(player_count=2)
env = AzulEnv(player_count=3)
env = AzulEnv(player_count=4) 

env = AzulEnv(player_count=2, max_moves=100)

Making moves

from azul_marl_env import azul_v1_2players
import random

# Create and reset the environment
env = azul_v1_2players()
observation, info = env.reset()

# Iterate through agents
for agent in env.agent_iter():
    # Get current agent's observation and info
    observation, reward, termination, truncation, info = env.last()
    
    if termination or truncation:
        break
        
    # Get valid moves for current agent
    valid_moves = info["valid_moves"]
    # Select a random valid move
    action = random.choice(valid_moves)
    # Execute the move
    env.step(action)
    
    # Render the environment (optional)
    env.render()

# Close the environment
env.close()
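
Because invalid moves are penalized (see Reward under Environment Details), it may be worth checking a policy's output against info["valid_moves"] before stepping. The following is a minimal sketch, not part of azul_marl_env: the helper name choose_or_fallback is hypothetical, and it assumes each entry of valid_moves is a sequence of four integers in action-space order.

```python
import random


def choose_or_fallback(action, valid_moves, rng=random):
    """Return the matching entry of `valid_moves`, or a random valid move.

    Assumption: every move is a sequence of four integers
    (source index, color, floor tile count, pattern line index).
    """
    moves = list(valid_moves)
    if len(moves) == 0:
        raise ValueError("no valid moves available")

    # Compare as tuples of ints so lists, tuples and arrays match each other
    candidate = tuple(int(v) for v in action)
    for move in moves:
        if tuple(int(v) for v in move) == candidate:
            return move  # hand back the env's own object unchanged

    # The proposed action is not valid: fall back to a random valid move
    return rng.choice(moves)
```

In the loop above, the result of this helper would be passed to env.step() in place of the raw action.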

Example of a complete game using random valid moves

from azul_marl_env import azul_v1_2players
import random

def play_random_game():
    env = azul_v1_2players()
    observation, info = env.reset()
    
    for agent in env.agent_iter():
        observation, reward, termination, truncation, info = env.last()
        
        if termination or truncation:
            print(f"Game finished! Final scores: {[player['score'] for player in observation['players']]}")
            break
            
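        # Get valid moves and make a random move
        valid_moves = info["valid_moves"]
        if not valid_moves:
            # Nothing to play: stop here, since skipping env.step() would
            # leave agent_iter() revisiting the same agent
            break
        action = random.choice(valid_moves)
        env.step(action)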
    
    env.close()

play_random_game()
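
A random policy is a weak baseline. A slightly stronger one can be sketched from the action layout described under Environment Details, where the third value is the number of tiles sent to the floor. The function name fewest_floor_tiles is hypothetical, and the sketch assumes valid_moves entries follow the action-space order.

```python
import random


def fewest_floor_tiles(valid_moves, rng=random):
    """Pick a valid move that sends the fewest tiles to the floor.

    Assumption: move[2] is the floor tile count, as in the action space.
    """
    moves = list(valid_moves)
    if len(moves) == 0:
        raise ValueError("no valid moves available")

    # Smallest floor count among all valid moves
    best = min(int(move[2]) for move in moves)
    # Break ties at random among the moves that reach that minimum
    candidates = [move for move in moves if int(move[2]) == best]
    return rng.choice(candidates)
```

Replacing random.choice(valid_moves) with fewest_floor_tiles(valid_moves) in the game loop above is enough to try it.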

Environment Details

Factory count (num_factories):
    2 player game -> 5
    3 player game -> 7
    4 player game -> 9
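
The factory counts above follow the pattern 2 * player_count + 1. As an illustration only (these helper names are hypothetical and not part of the package), the count and the resulting MultiDiscrete dimensions listed in this section can be computed as:

```python
def num_factories(player_count: int) -> int:
    """Factory count for a 2, 3 or 4 player game: 5, 7 or 9."""
    if player_count not in (2, 3, 4):
        raise ValueError("Azul supports 2 to 4 players")
    return 2 * player_count + 1


def action_space_dims(player_count: int) -> list:
    """Sizes of the four action components for a given player count."""
    # +1 because index 0 of the first component is reserved for the center
    return [num_factories(player_count) + 1, 5, 20, 5]
```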
  • Action Space: MultiDiscrete([num_factories + 1, 5, 20, 5])

    • First value: Source index. 0 selects the center, and factory i (0-based) is selected with i + 1.
    • Second value: Tile color (0-4 representing different colors)
    • Third value: Number of tiles to place on floor (0-19)
    • Fourth value: Pattern line index (0-4)
  • Observation Space: Dictionary containing:

    • factories: Box(0, 4, (num_factories, 5), int32) - Tile counts in each factory
    • center: Box(0, 3 * num_factories, (5,), int32) - Tile counts in center
    • players: Tuple of player states, each containing:
      • pattern_lines: Box(0, 5, (5, 5), int32) - Current pattern lines
      • wall: Box(0, 5, (5, 5), int32) - Wall state
      • floor: Box(0, 5, (7,), int32) - Floor tiles
      • is_starting: Discrete(2) - First player marker
      • score: Discrete(241) - Player's score
    • bag: Box(0, 100, (5,), int32) - Remaining tiles in bag
    • lid: Box(0, 100, (5,), int32) - Discarded tiles
  • Reward:

    • -1 for each step until game end
    • -2 for invalid moves
    • Final Azul score is added to cumulative reward at game end
  • Done: True when:

    • Game is completed (at least one player has filled at least one horizontal row of their wall)

    • False otherwise

  • Truncated: True when:

    • Maximum moves reached (player_count * 150 by default)

    • False otherwise

  • Info: Contains valid_moves list for the current player
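
The offset between the center (index 0) and the factories (index + 1) is easy to get wrong when building actions by hand. The sketch below shows one way to convert between a readable move and the four-value action. Move, encode, decode and default_max_moves are hypothetical names for illustration, not part of azul_marl_env.

```python
from typing import NamedTuple, Optional

CENTER = 0  # first action value that selects the center


class Move(NamedTuple):
    factory: Optional[int]  # 0-based factory index, or None for the center
    color: int              # 0-4
    floor_tiles: int        # 0-19
    pattern_line: int       # 0-4


def encode(move: Move) -> list:
    """Turn a Move into the four-value action described above."""
    source = CENTER if move.factory is None else move.factory + 1
    return [source, move.color, move.floor_tiles, move.pattern_line]


def decode(action) -> Move:
    """Turn a four-value action back into a Move."""
    source, color, floor_tiles, pattern_line = (int(v) for v in action)
    factory = None if source == CENTER else source - 1
    return Move(factory, color, floor_tiles, pattern_line)


def default_max_moves(player_count: int) -> int:
    """Default truncation limit stated above: player_count * 150."""
    return player_count * 150
```

For example, taking color 1 from the fifth factory (0-based index 4) encodes to a first value of 5, while taking from the center encodes to 0.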

GNU Affero General Public License v3.0
Created June 11, 2025
Updated December 19, 2025