Stochastic approximation methods for risk-sensitive control of discrete-event systems
This research investigates simulation-based stochastic approximation methods for solving risk-sensitive Markov decision problems (MDPs) and semi-Markov decision problems (SMDPs). The risk-sensitive formulation of the MDP and SMDP follows the well-known Markowitz paradigm, in which risk is measured by the variance of revenues. An advantage of a simulation-based approach is that it does not require computing the transition probabilities of the associated Markov chains, which are difficult to estimate for many complex problems. The two stochastic approximation methods considered are learning automata (LA) and simultaneous perturbation (SP). A new risk-sensitive LA algorithm is developed, and the SP algorithm is employed for the first time to solve risk-sensitive MDPs and SMDPs. A hierarchical version of the LA algorithm that converges faster than flat LA is also proposed. Numerical tests are conducted on small MDPs and large SMDPs from the domain of preventive maintenance, and the empirical results obtained with these algorithms are very encouraging. Convergence conditions for the LA algorithm are also studied numerically.
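To make the simulation-based idea concrete, the following is a minimal sketch of a generic simultaneous perturbation update applied to a variance-penalized objective. It is not the algorithm developed in this research. The toy simulator, the score "mean minus risk weight times variance", the gain constants, and all function names (simulate_revenues, risk_adjusted_score, spsa_maximize) are assumptions introduced purely for illustration. The sketch shows only that a policy parameter can be tuned from simulated revenues without ever writing down transition probabilities.

```python
import random
import statistics


def simulate_revenues(theta, n, rng):
    """Hypothetical simulator (an assumption, not taken from the thesis).

    Each period starts in one of len(theta) states chosen uniformly at random;
    theta[s] is the probability of taking the risky action in state s.
    """
    revenues = []
    for _ in range(n):
        state = rng.randrange(len(theta))
        if rng.random() < theta[state]:
            revenues.append(rng.gauss(1.2, 2.0))  # risky: higher mean, high spread
        else:
            revenues.append(rng.gauss(1.0, 0.5))  # safe: lower mean, low spread
    return revenues


def risk_adjusted_score(revenues, risk_weight):
    """Assumed variance-penalized score: mean minus risk_weight times variance."""
    return statistics.fmean(revenues) - risk_weight * statistics.pvariance(revenues)


def clip01(x):
    """Keep an action probability inside [0, 1]."""
    return min(1.0, max(0.0, x))


def spsa_maximize(objective, theta, iterations, rng, a=0.2, c=0.1):
    """Generic simultaneous perturbation ascent on a noisy objective.

    All coordinates are perturbed at once, so each iteration needs only two
    simulation runs regardless of the number of parameters.
    """
    theta = list(theta)
    for k in range(1, iterations + 1):
        a_k = a / k ** 0.602  # decaying step size
        c_k = c / k ** 0.101  # decaying perturbation size
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [clip01(t + c_k * d) for t, d in zip(theta, delta)]
        minus = [clip01(t - c_k * d) for t, d in zip(theta, delta)]
        diff = objective(plus) - objective(minus)
        # Gradient estimate for coordinate i is diff / (2 * c_k * delta[i]).
        theta = [clip01(t + a_k * diff / (2.0 * c_k * d))
                 for t, d in zip(theta, delta)]
    return theta


if __name__ == "__main__":
    rng = random.Random(0)
    objective = lambda th: risk_adjusted_score(simulate_revenues(th, 1000, rng), 1.0)
    print(spsa_maximize(objective, [0.5, 0.5], 100, rng))
```

In this toy setting the risky action has the higher mean revenue but a much larger variance, so with a risk weight of 1.0 the update should drive both probabilities toward the safe action. Setting the risk weight to 0 recovers the risk-neutral objective, under which the risky action would be preferred.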