Deep Reinforcement Learning for Adaptive Traffic Light Signal Control

This repository provides a framework for defining an OpenAI Gym-compatible environment
for an arbitrary road network described by a SUMO simulation
configuration.

How to train an agent [LINUX ONLY]

Pre-trained agents are not provided. However, you can train one yourself in 4-8 hours,
depending on your hardware. Just run the following script:

python tls/agents/agent_dqn.py \
  --net-file path/to/*.net.xml \
  --config-file path/to/*.sumocfg \
  --additional-file path/to/*.det.xml \
  --num-iters 1000 \
  --checkpoint-freq 100 \
  --mode train

To evaluate the trained agent, set the path to the trained agent in tls/agents/agent_dqn.py
and the paths to all files needed for environment initialization in tls/rollout.py.
Then run the following script:

python tls/agents/agent_dqn.py \
  --mode eval

The repository also provides several other scripts to train APEX DQN and PPO agents in the directory tls/agents/.

How it works

The simulation environment is based on Simulation of Urban MObility (SUMO), an open-source
microscopic traffic simulation package. The implementations of all RL
algorithms are taken from RLlib: Scalable Reinforcement Learning.

The diagram below shows the interactions between the components of the framework. The Environment
component is the key one because it abstracts the interaction with the simulation and provides an
interface to initialize, step through, and reset the simulation.

During initialization of the Environment component the following things happen:

- The SUMO road network definition is pre-processed, and for each traffic light an
  internal representation of the corresponding intersection is extracted. The
  reinforcement learning agent can use this representation to observe the situation
  at the intersection in a convenient way.
- Inside the Controller component, information about the induction-loop detectors
  installed in the road network is extracted from the additional file that defines
  the detectors in the simulation.
- The Controller component is created. It is responsible for the interaction with the
  environment and is initialized with the extracted traffic light skeletons and the
  information about the detectors. Additionally:
  - a Trafficlight controller is created for each traffic light;
  - an Observer state observer is created for each traffic light.
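
The detector-extraction step can be illustrated with a minimal sketch. It is not the repository's code: the function name parse_detectors, the inline XML, and all detector IDs, lanes, and positions are hypothetical. It also assumes that detectors appear in the additional file as inductionLoop elements (with e1Detector handled as an alternative spelling) carrying id, lane, and pos attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical additional-file content; IDs, lanes and positions are made up.
ADDITIONAL_XML = """
<additional>
    <inductionLoop id="det_0" lane="edge_in_0" pos="12.5" freq="60" file="out.xml"/>
    <inductionLoop id="det_1" lane="edge_in_1" pos="12.5" freq="60" file="out.xml"/>
    <e1Detector id="det_2" lane="edge_out_0" pos="3.0" freq="60" file="out.xml"/>
</additional>
"""


def parse_detectors(xml_text):
    """Return {detector_id: (lane_id, position)} for every induction loop found."""
    root = ET.fromstring(xml_text)
    detectors = {}
    # Assumption: both element names may occur, depending on how the file was written.
    for tag in ("inductionLoop", "e1Detector"):
        for elem in root.iter(tag):
            detectors[elem.get("id")] = (elem.get("lane"), float(elem.get("pos")))
    return detectors


detectors = parse_detectors(ADDITIONAL_XML)
for det_id, (lane, pos) in sorted(detectors.items()):
    print(det_id, lane, pos)
```

A mapping keyed by detector ID makes it straightforward to group detectors by lane later, for example when attaching them to the intersection they belong to.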

After the environment has been created and the SUMO process has been started with the appropriate
configuration files, the reinforcement learning agent can interact with the environment by repeatedly
calling the step function, passing in a joint action. The actions are applied to the system, the simulation
advances one step, and the result of that step is returned to the agent.
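
The interaction loop described above can be sketched with a stand-in environment that follows the Gym-style reset/step convention. Everything here is an assumption made for illustration: the class StubTrafficEnv, the traffic light IDs, the random queue-length observation, and the reward are placeholders, and no SUMO process is started.

```python
import random


class StubTrafficEnv:
    """Stand-in for the real environment; it only mimics the reset/step interface."""

    def __init__(self, tls_ids, horizon=5, seed=0):
        self.tls_ids = list(tls_ids)
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def _observe(self):
        # Placeholder observation: one fake queue length per traffic light.
        return {tls: self.rng.randint(0, 10) for tls in self.tls_ids}

    def reset(self):
        self.t = 0
        return self._observe()

    def step(self, joint_action):
        # A joint action carries exactly one entry per traffic light.
        assert set(joint_action) == set(self.tls_ids)
        self.t += 1
        obs = self._observe()
        reward = -sum(obs.values())  # placeholder reward: shorter queues are better
        done = self.t >= self.horizon
        return obs, reward, done, {}


env = StubTrafficEnv(["tls_a", "tls_b"])
obs = env.reset()
done = False
steps = 0
while not done:
    joint_action = {tls: 0 for tls in obs}  # trivial policy: always choose phase 0
    obs, reward, done, info = env.step(joint_action)
    steps += 1
print("episode finished after", steps, "steps")
```

In the real framework the policy would come from the trained agent, and the observation would be the intersection representation described in the following sections.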

Intersection observation

The intersection defined in the SUMO network configuration is represented as a binary matrix whose entries are 0 or 1.
An example of how an agent sees the environment is shown below.
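
As a rough illustration of how such a matrix could be built, the sketch below discretizes each lane into fixed-size cells and marks a cell with 1 when a vehicle is inside it. The meaning of 1 as "cell occupied by a vehicle", the function name occupancy_matrix, and the lane length and cell size are all assumptions; the repository's actual encoding may differ.

```python
def occupancy_matrix(lanes, lane_length, cell_size):
    """Return one row per lane; a cell is 1 if at least one vehicle is inside it."""
    n_cells = int(lane_length // cell_size)
    matrix = []
    for positions in lanes:
        row = [0] * n_cells
        for pos in positions:
            cell = int(pos // cell_size)
            if 0 <= cell < n_cells:  # ignore vehicles outside the observed stretch
                row[cell] = 1
        matrix.append(row)
    return matrix


# Hypothetical vehicle positions (metres from the start of the lane) on three lanes.
lanes = [[2.0, 11.0], [], [27.5]]
grid = occupancy_matrix(lanes, lane_length=30.0, cell_size=5.0)
for row in grid:
    print(row)
```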

Internal network representation

The internal representation of a road network definition is a JSON object whose schema is presented below.
The keys of the JSON object store information about the relative position of the lanes that make up the
observed part of the road network. The intersection is split into several segments, one for each side
(bottom, right, top, left). Each segment contains a list of lanes that are adjacent to the intersection.
Because lanes in the SUMO network definition are separated by connections, each entry in that list is
itself a list: a sequence of SUMO lanes that together correspond to a single physical lane, with the
start position and direction of each lane additionally specified. A segment can sometimes connect two
intersections, provided that exactly one of them is uncontrolled; in that case the segment may additionally
contain the internal representation of a nested intersection. The nested intersection has the same structure,
but with an offset specified relative to the main intersection and without the segment that creates the connection.

{
  "id": %JUNCTION_ID%,
  "offset": [%X_OFFSET%, %Y_OFFSET%],
  "segments": {
    "bottom": {
      "junction": {
        %NESTED_JUNCTION%
      },
      "lanes": [
        [
          [%START_POSITION%, %DIRECTION%, %LANE_ID%],
          ...
        ],
        ...
      ]
    },
    "right": {
      %ADJACENT_SEGMENT%
    },
    "top": {
      %ADJACENT_SEGMENT%
    },
    "left": {
      %ADJACENT_SEGMENT%
    }
  }
}
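
To make the schema more concrete, here is a minimal sketch that fills it in with made-up values and walks it recursively, including the nested junction. The junction and lane IDs, the offsets, and the "in"/"out" direction values are hypothetical, and collect_lane_ids is an illustrative helper rather than a function from the repository.

```python
import json

# Hypothetical instance of the schema: J1 has a nested junction J2 on its bottom side.
EXAMPLE = json.loads("""
{
  "id": "J1",
  "offset": [0, 0],
  "segments": {
    "bottom": {
      "junction": {
        "id": "J2",
        "offset": [0, -50],
        "segments": {
          "bottom": {"lanes": [[[0, "in", "e3_0"]]]}
        }
      },
      "lanes": [[[0, "in", "e1_0"], [20, "in", "e2_0"]]]
    },
    "right": {"lanes": []},
    "top": {"lanes": [[[0, "out", "e4_0"]]]},
    "left": {"lanes": []}
  }
}
""")


def collect_lane_ids(junction):
    """Return all SUMO lane IDs of a junction, descending into nested junctions."""
    ids = []
    for segment in junction.get("segments", {}).values():
        # Each physical lane is a list of [start_position, direction, lane_id] entries.
        for physical_lane in segment.get("lanes", []):
            for _start, _direction, lane_id in physical_lane:
                ids.append(lane_id)
        nested = segment.get("junction")
        if nested:
            ids.extend(collect_lane_ids(nested))
    return ids


lane_ids = collect_lane_ids(EXAMPLE)
print(lane_ids)
```

In this example the nested junction J2 omits its top segment, mirroring the rule that the segment creating the connection to the main intersection is left unspecified.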

goshaQ/adaptive-tls
