MASTraf: a decentralized multi-agent system for network-wide traffic signal control with dynamic coordination
Keywords: Traffic signal control
Abstract

Continuous increases in traffic volume and the limited available capacity of the roadway system have created a need for improved traffic control. From traditional pre-timed isolated signals to actuated and coordinated corridors, traffic control for urban networks has evolved into increasingly complex adaptive signal control systems. However, unexpected traffic fluctuations, rapid changes in traffic demand, oversaturation, incidents, and adverse weather conditions, among other factors, affect network operation in ways that current control systems cannot always cope with. On the other hand, control strategies based on developments from the field of machine learning can provide promising alternative solutions, particularly those that make use of reinforcement learning (RL), also referred to as approximate dynamic programming (ADP) in some research communities. For the traffic control problem, two convenient RL algorithms are off-policy Q-learning and ADP with a post-decision state variable: both address sequential decision-making processes, do not require computing transition probabilities, and are well suited to high-dimensional spaces. A series of benefits are expected from these algorithms in the traffic control domain: 1) no need for prediction models to transition traffic over time and estimate the best actions; 2) availability of cost-to-go estimates at any time (appropriate for real-time applications); 3) self-evolving policies; and 4) flexibility to use new sources of information from emerging Intelligent Transportation Systems (ITS), such as mobile vehicle detectors (Bluetooth and GPS vehicle locators). Given these potential benefits, this research proposes MASTraf: a decentralized Multi-Agent System for network-wide Traffic signal control with dynamic coordination.
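The model-free, tabular flavor of Q-learning described above can be sketched briefly. This is a minimal illustration, not the dissertation's actual agent: the state encoding, action set (e.g., keep vs. switch the green phase), and hyperparameter values are placeholder assumptions.

```python
import random
from collections import defaultdict

class QLearningSignalAgent:
    """Hypothetical sketch of a tabular Q-learning agent for one intersection."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)   # Q(s, a) lookup table, default 0
        self.actions = actions        # e.g., 0 = keep current phase, 1 = switch
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        # epsilon-greedy action selection over current Q estimates
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # off-policy Q-learning update: no transition probabilities needed,
        # only the observed (state, action, reward, next_state) sample
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

Because the update uses only observed transitions, the agent needs no traffic prediction model, matching benefit 1) above; the stored Q-values provide cost-to-go estimates at any time, matching benefit 2).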
MASTraf is designed to capture the behavior of the environment and to make decisions based on situations directly observed by RL agents. Agents can also communicate with each other, exploring the effects of temporary coalitions or subgroups of intersections as a coordination mechanism. Separate MASTraf implementations with similar state and reward functions, one using Q-learning and one using ADP, were tested in a microscopic traffic simulator (VISSIM), with real-time manipulation of the traffic signals through the software's COM interface. Testing was conducted to determine the performance of the agents in scenarios of increasing complexity, from a single intersection to arterials and networks, in both undersaturated and oversaturated conditions. Results show that the multi-agent system improves its performance as the agents accumulate experience, and that it was able to efficiently manage the traffic signals of both simple and complex scenarios. Examination of the policies generated by MASTraf showed that the agents behaved as expected, assigning green time to greater vehicle demands and accounting for the effects of blockages and lost time. The performance of MASTraf was on par with current state-of-the-practice tools for finding signal control settings, but MASTraf can also adapt to changes in demand and driver behavior by adjusting signal timings in real time, thus improving coordination and preventing queue spillbacks and green starvation. A signal coordination strategy was also tested in one of the MASTraf implementations and, as expected, increased throughput and reduced the number of stops. The coordination employed a version of the max-plus algorithm embedded in the reward structure, acting as a bias toward improved coordination. The response of the system to imprecise detector data, in the form of coarse aggregation, showed that it was able to handle oversaturation under such conditions.
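The idea of embedding max-plus coordination in the reward can be illustrated with a short sketch. This is a loose, assumed rendering, not MASTraf's actual formulation: `pairwise_payoff`, the message shape, and the `weight` parameter are hypothetical names introduced here for illustration.

```python
def max_plus_message(pairwise_payoff, incoming, actions_i, actions_j):
    """One max-plus message mu_ij(a_j) from agent i to neighbor j:
    mu_ij(a_j) = max over a_i of [ f_ij(a_i, a_j) + messages already into i ]."""
    return {
        a_j: max(pairwise_payoff[(a_i, a_j)] + incoming.get(a_i, 0.0)
                 for a_i in actions_i)
        for a_j in actions_j
    }

def biased_reward(local_reward, my_action, messages_in, weight=0.5):
    # embed coordination as a reward bias: actions that neighbors rate
    # highly (compatible phases along an arterial) earn extra reward
    bias = sum(m.get(my_action, 0.0) for m in messages_in)
    return local_reward + weight * bias
```

Because the coordination term enters only as an additive bias on the local reward, each agent remains decentralized: it still learns from its own observations, merely nudged toward joint actions its neighbors prefer.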
Even when the data had only 25% of the resolution of the original implementation, system throughput was reduced by only 5% and the number of stops per vehicle increased by only 8%. The state and reward formulations allowed a simple function approximation method to reduce the memory required to store the state space and to provide a form of generalization to states that have not been visited, or have not been experienced often enough. Given the discontinuities in the reward function created by penalties for blockages and lost time, value approximation was conducted through a series of functions, one for each action and for each of the conditions before and after a discontinuity. The policies generated by MASTraf with function approximation were analyzed for different intersections in the network, showing agent behavior that reflected the principles of the original lookup-table formulation, including right-of-way assignment based on expected rewards with consideration of penalties such as lost time. In terms of system performance, MASTraf with function approximation resulted in an average 1% reduction in total system throughput and a 3.6% increase in the number of stops per vehicle compared with the lookup-table implementation on a congested network of 20 intersections.
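A per-action, per-regime approximator of the kind described can be sketched as follows. This is an assumed piecewise-linear illustration; the dissertation's actual features, regime definitions, and fitting procedure are not specified here, and all names are hypothetical.

```python
class PiecewiseLinearQ:
    """Hypothetical value approximator: one linear model per (action, regime),
    where the regime flag separates the conditions before and after a reward
    discontinuity (e.g., a blockage or lost-time penalty becoming active)."""

    def __init__(self, n_features, actions, lr=0.01):
        # weight vectors with a bias term; regime 0 = no penalty, 1 = penalty
        self.w = {(a, r): [0.0] * (n_features + 1)
                  for a in actions for r in (0, 1)}
        self.lr = lr

    def value(self, features, action, regime):
        w = self.w[(action, regime)]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], features))

    def update(self, features, action, regime, target):
        # one stochastic-gradient step on squared error toward the RL target
        err = target - self.value(features, action, regime)
        w = self.w[(action, regime)]
        w[0] += self.lr * err
        for i, xi in enumerate(features):
            w[i + 1] += self.lr * err * xi
```

Splitting the model by regime lets each linear piece stay smooth on its own side of the discontinuity, while the shared parametric form generalizes to states never stored in a lookup table.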