SLEM - self-learning and self-explanatory machine - is a prototype of an assistance system that uses artificial intelligence (AI) to learn from experienced users and pass this knowledge on to inexperienced users. The assistance system can be divided into three main components. First, the machine must recognize which actions the user is performing on it. Fraunhofer IPA implemented this component: anonymized skeletal poses are extracted from video camera data, and the pose data is aggregated over time to classify activities using machine learning (ML) methods. The activity classes are specific to the use case and include, for example, "operating the display interface" and "inserting component." However, human activity alone does not yield a comprehensive understanding of the process. Therefore, Knowtion developed the second component of the SLEM system, which evaluates the activities and the state of the machine based on sensor data. To cope with the abundance of machine data, the data is clustered and thus reduced to its essential features.
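The windowed aggregation of pose data described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the joint count, window length, feature choice (per-joint mean and standard deviation), and the toy nearest-centroid classifier standing in for the ML model are all assumptions made for the example.

```python
import numpy as np

def aggregate_window(poses: np.ndarray) -> np.ndarray:
    """Aggregate a time window of skeletal poses (shape: frames x joints x 2)
    into one feature vector: per-joint mean and standard deviation over time."""
    mean = poses.mean(axis=0).ravel()
    std = poses.std(axis=0).ravel()
    return np.concatenate([mean, std])

class NearestCentroidActivityClassifier:
    """Toy stand-in for the ML activity classifier: one centroid per class."""

    def fit(self, windows, labels):
        feats = np.array([aggregate_window(w) for w in windows])
        labels = np.array(labels)
        self.classes_ = np.unique(labels)
        # One centroid per activity class, averaged over its training windows.
        self.centroids_ = np.array(
            [feats[labels == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, window):
        f = aggregate_window(window)
        dists = np.linalg.norm(self.centroids_ - f, axis=1)
        return self.classes_[int(np.argmin(dists))]
```

In practice the aggregated feature vectors would feed a trained ML model rather than a centroid lookup, but the pipeline shape - extract poses, aggregate over a time window, classify - is the same.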
The information from these two components - i.e. the machine states and the human activities - is fused in the third component, which was developed by SABO Mobile IT. With the help of a recurrent neural network, relevant process steps are extracted and the data streams are merged. This enables SLEM to learn new process sequences, for example by observing an expert operating a machine. For SLEM to handle common human and machine activities, a one-time training on annotated data of the machine scene is necessary. Once SLEM has learned a new process, text-based instructions for the process, broken down into individual steps and activities, can be generated using a Transformer model. Furthermore, during machine operation, an AI model can check whether the correct process is actually being executed by comparing the actual activities with the target activities.
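The actual-versus-target check at the end of this paragraph can be illustrated with sequence alignment. The sketch below uses Python's standard-library `SequenceMatcher` as a simple stand-in for the AI comparison; the step names and the function interface are invented for the example and are not part of SLEM.

```python
from difflib import SequenceMatcher

def check_process(target: list[str], actual: list[str]) -> list[str]:
    """Compare the observed activity sequence against the learned target
    process and report deviations (missing, extra, or substituted steps)."""
    issues = []
    for op, t1, t2, a1, a2 in SequenceMatcher(None, target, actual).get_opcodes():
        if op == "delete":
            issues.append(f"missing step(s): {target[t1:t2]}")
        elif op == "insert":
            issues.append(f"unexpected step(s): {actual[a1:a2]}")
        elif op == "replace":
            issues.append(f"expected {target[t1:t2]}, observed {actual[a1:a2]}")
    return issues

# Hypothetical target process and an observed run with one skipped step.
target = ["open guard", "inserting component", "close guard",
          "operating the display interface"]
actual = ["open guard", "close guard", "operating the display interface"]
print(check_process(target, actual))
```

A conforming run yields an empty issue list; any deviation is reported with the affected steps, which is the kind of signal user feedback could be built on.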
Additional use cases from industry can now be integrated into SLEM. At the same time, the complexity of the data fusion is to be increased so that more features can be evaluated. The evaluation of human activity recognition is to be extended with more context information, and additional features from the machine data are to be evaluated. Lastly, the feedback given to the user about faulty process steps is to be expanded.