MediaText: a media industry-based dataset for scene text detection (Q6699604)

From MaRDI portal

Dataset published at Zenodo repository.

      Statements

      Media-Text

      The Media-Text dataset comprises images of banners, posters, covers and other images characteristic of the media industry. The full paper is available here: Media-Text: a Media Industry-Based Dataset for Scene Text Detection

      DATASET DESCRIPTION

      - 400 images
      - 7 744 annotated text instances
      - 973 annotations have been marked as illegible for the task of text recognition
      - 659 texts have been marked as do not care (###) for scene text detection
      - Images are represented by 193 unique resolutions

      Annotation Format - Each image has a corresponding gt_*.txt file, which contains annotations in bounding box format (defined by 4 corners), a transcription, and a boolean flag indicating whether the text is illegible for OCR. The proposed format is similar to ICDAR15 annotations.

      x1, y1, ..., x4, y4, transcription, OCR Flag

      Example: 37,68,198,49,214,181,52,200,LADIES,False

      ACKNOWLEDGMENT

      This work was supported by the Silesian University of Technology (SUT) through the subsidy for maintaining and developing research potential grant in 2024 for young researchers, No. 2/070/BKM24/0058, and by the Ministry of Science and Higher Education "Implementation Doctorate" No. DWD/5/0511/2021. Thanks to the graphic department of the media-press group for preparing and sharing graphics thematically related to the dataset.

      LICENSE

      Annotations created by the authors are licensed under the CC-BY-4.0 license. Images from the Open-Image-V7 dataset are licensed according to their source information. Source information is defined in the metadata.csv file, which holds all the metadata of each file (the file name corresponds to the ImageID column). Images whose names match the media_press pattern are provided for academic use.

      CITING THE RELATED WORKS

      Please cite the related works in your publications if it helps your research:

      ```
      @inproceedings{inproceedings,
        author = {Kalisz, Seweryn and Marczyk, Michał and Polanska, Joanna},
        booktitle = {Modelling and simulation 2024. The 2024 European Simulation and Modelling Conference},
        editor = {Manuel Graa and J. David Nuez-Gonzalez},
        year = {2024},
        month = {10},
        pages = {138-144},
        publisher = {EUROSIS-ETI},
        title = {Media-Text: a Media Industry-Based Dataset for Scene Text Detection}
      }
      ```
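
The annotation format described in the dataset description can be read with a few lines of Python. The following is a minimal sketch, not the authors' tooling. The names `TextInstance`, `parse_annotation_line` and `parse_annotation_file` are hypothetical. It assumes that coordinates are integers ordered as x, y pairs, that the OCR flag is the literal text True or False, that a transcription may itself contain commas, and that files are UTF-8 encoded (possibly with a byte order mark, as is common for ICDAR15-style files).

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Tuple


@dataclass
class TextInstance:
    corners: List[Tuple[int, int]]  # four (x, y) corner points
    transcription: str
    illegible: bool                 # OCR flag: True if unreadable for recognition
    dont_care: bool                 # True if the transcription is "###"


def parse_annotation_line(line: str) -> TextInstance:
    """Parse one line: x1,y1,...,x4,y4,transcription,OCR flag."""
    fields = line.strip().split(",")
    if len(fields) < 10:
        raise ValueError(f"expected at least 10 comma-separated fields: {line!r}")
    coords = [int(value) for value in fields[:8]]
    # Assumption: everything between the 8 coordinates and the final flag
    # belongs to the transcription, so embedded commas are re-joined.
    transcription = ",".join(fields[8:-1])
    illegible = fields[-1].strip().lower() == "true"
    corners = list(zip(coords[0::2], coords[1::2]))
    return TextInstance(corners, transcription, illegible, transcription == "###")


def parse_annotation_file(path: str) -> List[TextInstance]:
    """Parse every non-empty line of a gt_*.txt file."""
    # Assumption: UTF-8, optionally with a BOM ("utf-8-sig" strips it).
    text = Path(path).read_text(encoding="utf-8-sig")
    return [parse_annotation_line(row) for row in text.splitlines() if row.strip()]


example = parse_annotation_line("37,68,198,49,214,181,52,200,LADIES,False")
print(example.corners, example.transcription, example.illegible)
```

For scene text detection, instances with `dont_care` set would typically be excluded from evaluation, while `illegible` matters only for the recognition task.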
      22 July 2024
