Advances in the field of reinforcement learning (RL)-based drive control allow the formulation of holistic optimization goals for the data-driven training phase. The resulting controllers feature efficient drive operation without requiring an a priori known plant model but, so far, the corresponding training phase has only rarely been conducted on real-world drive systems due to safety concerns. This contribution targets the challenging problem of self-learning torque control for a permanent magnet synchronous motor assuming a finite control set (FCS), i.e., the direct selection of switching actions instead of a modulator-based setup. In order to allow safe and effective online training on real-world drive systems, the RL controller is monitored by a safeguarding algorithm that prevents the application of unsafe switching actions, e.g., those that would result in overcurrent. The large amount of accruing measurement data is handled by an edge computing pipeline that outsources the RL training from the embedded control hardware. The inference of the utilized artificial neural network in hard real time is realized with a reconfigurable FPGA architecture. The resulting RL-based algorithm is able to learn a torque control policy in just ten minutes, as validated in comprehensive real-world experiments.
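The safeguarding idea described above can be illustrated with a minimal sketch: before the RL agent's chosen switching action is applied, every candidate voltage vector of the two-level inverter's finite control set is checked against a one-step current prediction, and actions predicted to exceed the current limit are masked out. All motor parameters, the forward-Euler dq prediction model, and the function names below are illustrative assumptions, not the method from the paper.

```python
# Hypothetical FCS action-safeguarding sketch (assumed parameters and model,
# not the paper's actual implementation).
import math

# Illustrative PMSM and inverter parameters (assumptions)
R_S = 0.05      # stator resistance [Ohm]
L_S = 1e-3      # stator inductance [H], isotropic machine for simplicity
PSI_PM = 0.1    # permanent magnet flux linkage [Vs]
V_DC = 400.0    # DC-link voltage [V]
TS = 50e-6      # sampling period [s]
I_MAX = 100.0   # admissible current magnitude [A]

# The eight switching states of a two-level inverter (the finite control set)
SWITCH_STATES = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]

def dq_voltage(state, theta):
    """Map a switching state to the dq voltage vector (Clarke/Park transforms)."""
    sa, sb, sc = state
    # Phase-to-neutral voltages of a two-level inverter
    va = V_DC * (2 * sa - sb - sc) / 3.0
    vb = V_DC * (2 * sb - sa - sc) / 3.0
    # Amplitude-invariant Clarke transform (va + vb + vc = 0 holds here)
    v_alpha = va
    v_beta = (va + 2.0 * vb) / math.sqrt(3.0)
    # Park rotation into the rotor-fixed dq frame at electrical angle theta
    vd = math.cos(theta) * v_alpha + math.sin(theta) * v_beta
    vq = -math.sin(theta) * v_alpha + math.cos(theta) * v_beta
    return vd, vq

def predict_current(id_k, iq_k, vd, vq, omega_el):
    """One-step forward-Euler prediction of the dq currents."""
    did = (vd - R_S * id_k + omega_el * L_S * iq_k) / L_S
    diq = (vq - R_S * iq_k - omega_el * (L_S * id_k + PSI_PM)) / L_S
    return id_k + TS * did, iq_k + TS * diq

def safe_action_mask(id_k, iq_k, theta, omega_el):
    """Boolean mask over the finite control set: True = action deemed safe."""
    mask = []
    for state in SWITCH_STATES:
        vd, vq = dq_voltage(state, theta)
        id_n, iq_n = predict_current(id_k, iq_k, vd, vq, omega_el)
        mask.append(math.hypot(id_n, iq_n) <= I_MAX)
    return mask
```

During training, such a mask would restrict the agent's action selection to the safe subset at every sampling step, which is one common way to keep exploratory switching actions from driving the machine into overcurrent.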