This is a deep reinforcement learning-based vehicle scheduling algorithm for signalized intersections. In our scheduling algorithm, each CAV is regarded as an agent. Each agent has three goals:
- Reducing the energy the CAV consumes while passing the intersection.
- Minimizing the time the CAV takes to pass the intersection.
- Ensuring driving safety.
state
The CAV obtains an observation from the following sources:
- V2X: leading vehicle's speed $v_{lead}$, current traffic phase $t_{flag}$, remaining time of the current traffic phase $t_{dur}$, and the distance to the stop line of the intersection $d_{inter}$.
- On-Board Diagnosis (OBD) Unit: ego vehicle's speed $v_{ego}$ and acceleration $a_{ego}$.
- Vehicle-mounted sensors: $f_{lead}$ indicates whether there is a vehicle ahead of the ego vehicle, and $d_{lead}$ denotes the distance between the ego vehicle and the leading vehicle.
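As an illustration of how the eight quantities above could be packed into one state vector, here is a minimal sketch. It is not taken from the repository: the `Observation` class, the `build_observation` helper, the field order, the numeric encoding of $t_{flag}$, and the placeholder values used when no leading vehicle exists are all assumptions made for the example.

```python
from dataclasses import dataclass, astuple
from typing import List, Optional, Tuple

# Hypothetical placeholder gap (in meters) reported when no vehicle is ahead.
NO_LEADER_DISTANCE = 1000.0


@dataclass
class Observation:
    v_lead: float   # leading vehicle's speed (V2X)
    t_flag: float   # current traffic phase (V2X); numeric encoding is assumed
    t_dur: float    # remaining time of the current phase (V2X)
    d_inter: float  # distance to the stop line (V2X)
    v_ego: float    # ego speed (OBD)
    a_ego: float    # ego acceleration (OBD)
    f_lead: float   # 1.0 if a vehicle is ahead, else 0.0 (sensors)
    d_lead: float   # gap to the leading vehicle (sensors)

    def to_vector(self) -> List[float]:
        """Flatten the fields, in declaration order, into a plain list."""
        return list(astuple(self))


def build_observation(
    v_ego: float,
    a_ego: float,
    t_flag: float,
    t_dur: float,
    d_inter: float,
    leader: Optional[Tuple[float, float]] = None,
) -> Observation:
    """Build an observation; `leader` is (speed, gap) or None if the road ahead is free."""
    if leader is None:
        # Assumed convention: reuse the ego speed and a large gap as placeholders.
        v_lead, d_lead, f_lead = v_ego, NO_LEADER_DISTANCE, 0.0
    else:
        (v_lead, d_lead), f_lead = leader, 1.0
    return Observation(v_lead, t_flag, t_dur, d_inter, v_ego, a_ego, f_lead, d_lead)
```

Keeping $f_{lead}$ as an explicit flag lets the policy distinguish a genuinely distant leader from the placeholder values used when no leader is present.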
action
Controlling the vehicle's acceleration with discrete actions harms driving stability. Therefore, the agent generates the vehicle's acceleration as a continuous action.
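A common way to obtain a bounded continuous acceleration is to let the policy output a value in $[-1, 1]$ and rescale it linearly. The sketch below shows that mapping only; the function name `scale_action` and the bounds `A_MIN` and `A_MAX` are assumptions for illustration, since the acceleration range used by the authors is not stated here.

```python
# Assumed acceleration bounds in m/s^2; not taken from the repository.
A_MIN = -3.0
A_MAX = 3.0


def scale_action(raw: float, a_min: float = A_MIN, a_max: float = A_MAX) -> float:
    """Map a raw policy output in [-1, 1] to an acceleration in [a_min, a_max]."""
    # Clip first so that out-of-range outputs cannot exceed the physical bounds.
    raw = max(-1.0, min(1.0, raw))
    return a_min + (raw + 1.0) * 0.5 * (a_max - a_min)
```

For example, with the assumed bounds, `scale_action(0.0)` corresponds to zero acceleration and `scale_action(1.0)` to the maximum acceleration.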
reward
Our reward function contains four sub-reward items:
1) reward for vehicle speed
2) reward for acceleration
3) reward for passing on the green light
4) reward for driving safety
The total reward that the CAV obtains at each time step combines the four sub-reward items above.
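The exact combination formula is not shown in this text. One plausible form is a weighted sum of the four items, sketched below. The function name `total_reward`, the argument names, and the default weights are hypothetical, and the sub-rewards are taken as already-computed numbers because their definitions are not given here.

```python
from typing import Sequence


def total_reward(
    r_speed: float,
    r_accel: float,
    r_green: float,
    r_safety: float,
    weights: Sequence[float] = (1.0, 1.0, 1.0, 1.0),  # assumed, not from the repository
) -> float:
    """Combine the four sub-rewards for one time step as a weighted sum."""
    terms = (r_speed, r_accel, r_green, r_safety)
    return sum(w * r for w, r in zip(weights, terms))
```

With equal weights this reduces to a plain sum. Unequal weights would let one objective, for instance safety, dominate the others.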
How to run our code
1) Environment preparation
Simulation platform: SUMO, python == 3.9.7, pytorch == 1.12.1, traci == 1.13.0
2) Training
Run train_main.py
3) Testing