On the Convergence Rate of Fast Reinforcement Learning Algorithms with Application to Energy-Efficient Delay-Sensitive Wireless Communications
In this thesis, we consider the problem of energy-efficient point-to-point scheduling of delay-sensitive traffic over a block fading channel. To optimally trade off energy and delay, we combine adaptive rate transmission at the physical layer with system-level dynamic power management, which allows the transmitter to be put into a low-power sleep state when it has no data to transmit. We formulate the scheduling problem as a Markov decision process (MDP) and solve it online using reinforcement learning (RL), so that the transmitter can achieve the minimum possible energy consumption under its delay constraint in the presence of stochastic and a priori unknown traffic and channel dynamics. In particular, we use a state-of-the-art RL algorithm recently proposed in the literature. We prove that this algorithm converges to the optimal solution. Additionally, we provide analysis substantiating the original authors' claim that the algorithm converges 2–3 orders of magnitude faster than the well-known Q-learning algorithm.
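As a point of reference for the baseline the thesis compares against, the following is a minimal, self-contained sketch of tabular Q-learning on a toy two-state transmitter MDP. The states, actions, cost structure, and dynamics here are illustrative assumptions only; they are not the MDP formulated in the thesis, and the algorithm shown is standard Q-learning, not the faster RL algorithm the thesis analyzes.

```python
import random

# Toy transmitter MDP (assumed for illustration, not from the thesis):
# the transmitter is either asleep or active, and each step it may
# stay in its current power state or switch to the other one.
STATES = ["sleep", "active"]
ACTIONS = ["stay", "switch"]

def step(state, action):
    """Illustrative dynamics: switching toggles the power state.
    The per-step cost combines an assumed energy term (high when
    active) and an assumed delay penalty (incurred while asleep)."""
    if action == "switch":
        next_state = "active" if state == "sleep" else "sleep"
    else:
        next_state = state
    energy_cost = 1.0 if next_state == "active" else 0.1
    delay_cost = 0.5 if next_state == "sleep" else 0.0
    return next_state, -(energy_cost + delay_cost)  # reward = -cost

def q_learning(steps=5000, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Standard tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = "sleep"
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if rng.random() < eps:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward = step(state, action)
        # Q-learning temporal-difference update.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next
                                       - Q[(state, action)])
        state = next_state
    return Q

Q = q_learning()
# Under these toy costs, remaining asleep is cheapest, so the learned
# greedy policy in the sleep state should prefer "stay".
print(max(ACTIONS, key=lambda a: Q[("sleep", a)]))
```

The slow per-visit contraction of the temporal-difference update (governed by the step size `alpha` and discount `gamma`) is exactly why plain Q-learning can need many samples to converge, which is the behavior the faster algorithm in the thesis improves upon.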