Combining supervisory control of discrete event systems and reinforcement learning to control MRS
Goncalo Neto and Pedro U. Lima
In this paper we combine the Theory of Supervisory Control of Discrete Event Systems (DES) and Reinforcement Learning to restrict the free
behavior of a (set of) robot(s), so that it meets qualitative and quantitative performance specifications. We assign the qualitative
(logic) specifications to the DES supervisor, and introduce a reinforcement learning component, as a way to optimize the behavior of the
agent within the bounds imposed by Supervisory Control. Although at the logic level the system does not need to consider time, we
introduce continuous time since it is needed to obtain a discounted utility function, and is coherent with the fact that the agent reacts
to the event firings, instead of making decisions on a fixed timestep. In order to keep the Markov Property, we simplify the stochastic
clock structure of the system by making some assumptions, and show that it reduces to a Semi-Markov Decision Process (SMDP). We use the
optimality equations for SMDPs and apply them to this case, deriving conditions for the existence of a solution in the event-based case. Then,
we apply a modified Q-learning update rule and obtain the conditions on the event firing times that enable the update rule to converge.
Finally, we present a simple application example that illustrates how this method can be used in a robotic decision making setup and
explain how this approach can be extended to a multi-robot scenario.
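
As a rough illustration of the kind of update involved, the sketch below shows a standard continuous-time SMDP Q-learning step with exponential discounting, restricted to the events a supervisor leaves enabled. It is not the modified rule derived in this paper. The function name `smdp_q_update`, the constant reward rate `reward_rate`, the discount rate `beta`, the step size `alpha`, and the list `enabled_next` are all assumptions made for the example.

```python
import math
from collections import defaultdict


def smdp_q_update(q, state, action, reward_rate, tau, next_state,
                  enabled_next, alpha=0.1, beta=0.5):
    """One Q-learning step for a discounted continuous-time SMDP.

    Assumptions (hypothetical, for illustration only):
      - reward accrues at a constant rate `reward_rate` during the
        sojourn time `tau` between two event firings;
      - `enabled_next` lists the actions the supervisor allows in
        `next_state`; an empty list is treated as value 0.
    """
    # Discount factor accumulated over the random sojourn time tau.
    discount = math.exp(-beta * tau)
    # Integral of reward_rate * exp(-beta * t) for t in [0, tau].
    accrued = reward_rate * (1.0 - discount) / beta
    # Greedy value over supervisor-enabled actions only.
    best_next = max((q[(next_state, a)] for a in enabled_next), default=0.0)
    target = accrued + discount * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Usage: Q-table defaulting to 0 for unseen (state, action) pairs.
q_table = defaultdict(float)
updated = smdp_q_update(q_table, "s0", "go", reward_rate=1.0, tau=2.0,
                        next_state="s1", enabled_next=["a", "b"],
                        alpha=0.5, beta=0.5)
```

Because the discount depends on the elapsed time `tau` rather than on a fixed step, the update is applied whenever an event fires, which matches the event-driven decision making described above.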
