Code and data for "ActivityGen: Extracting Enabled Activities from Screenshots"

From MaRDI portal
Dataset:6708269



DOI: 10.5281/zenodo.13375065 · Zenodo: 13375065 · MaRDI QID: Q6708269 · FDO: Q6708269

Dataset published at Zenodo repository.

Harry H. Beyel, Sovin Manuel, W. M. P. van der Aalst

Publication date: 26 August 2024

Copyright license: Creative Commons Attribution 4.0 International



The code and data for the paper "ActivityGen: Extracting Enabled Activities from Screenshots" are provided here.

Abstract

Many tasks in organizations are performed in a desktop environment. Users' interactions in a desktop environment can be recorded by taking a screenshot whenever an action happens; the result is an interaction log. By considering the images associated with a record, it is possible to detect which activity was performed and which activities were enabled. Extracting this information yields a translucent event log. Such a translucent event log is valuable and can serve as input for dedicated process-mining techniques; the results can be used to analyze human-computer interactions or to create bots for robotic process automation. However, current techniques for extracting information on enabled activities rely on template matching, which is rigid and sensitive to variations. To solve this issue, we present our modular framework, ActivityGen. ActivityGen detects and labels graphical user interface elements, also taking additional information into account. It uses more advanced techniques to overcome the limitations of previous approaches, can extract information without a user's input, and can be adjusted to a user's needs. It detects graphical user interface elements more accurately than state-of-the-art techniques and labels them faster, more robustly, and in a more domain-oriented manner.

Data

ReDraw_CLS and ReDraw_ViSM are specified in the work. The basis for ReDraw_CLS is the ReDraw dataset. We focus on the following components: Button, CheckBox, EditText, Image, ImageButton (which we refer to as icon), RadioButton, and Switch. We noticed that the examples of ImageView and ImageButton are similar, primarily consisting of icon images. Therefore, we removed the ImageView class and introduced an Image class instead.
The Image class contains images from the COCO 2017 validation set and the YouTube Thumbnails dataset, enabling the detection of general website images. ReDraw_ViSM adds 6,000 synthetically created buttons to the former dataset, distributing them in the same ratio across the train, test, and validation sets. lm_basic and lm_extended contain the training text for the language models.

Code

The code allows for the execution of ActivityGen. Moreover, we provide our evaluation scripts. The models do not have to be trained; their weights are provided in the model folder.
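The description above does not spell out how the 6,000 synthetic buttons are distributed into the existing splits. As a minimal sketch (the exact split ratio and file names are assumptions, not taken from the dataset), a proportional distribution could look like this:

```python
import random

def split_proportionally(items, ratios, seed=0):
    """Shuffle items and distribute them into named splits by the given ratios."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    splits, start = {}, 0
    total = sum(ratios.values())
    names = list(ratios)
    for i, name in enumerate(names):
        # last split takes the remainder so every item is assigned exactly once
        end = len(items) if i == len(names) - 1 else start + round(len(items) * ratios[name] / total)
        splits[name] = items[start:end]
        start = end
    return splits

# Hypothetical example: 6,000 synthetic buttons with an assumed 80/10/10 ratio
buttons = [f"button_{i}.png" for i in range(6000)]
splits = split_proportionally(buttons, {"train": 0.8, "test": 0.1, "val": 0.1})
```

With these assumed ratios, the splits contain 4,800, 600, and 600 images, respectively, and every synthetic button lands in exactly one split.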
