BEELINE (Q4834871)

From MaRDI portal

Dataset published in the Zenodo repository.

      Statements

      27 February 2023
      This collection consists of over 400 single-cell gene expression datasets across four curated and six synthetic gene regulatory networks. It was created to benchmark algorithms for gene regulatory network inference in Pratapa et al. (2020).

      Task: The collection can be used to study causal inference algorithms.

      Summary:
      Size of collection: 400 datasets of varying size, each with 6 to 19 features
      Task: Causal Inference Problem
      Data Type: Mixed Data
      Dataset Scope: Collection of Datasets
      Ground Truth: Known Graph
      Temporal Structure: Static Data
      License: CC BY-NC 4.0 (see 10.5281/zenodo.3701939)
      Missing Values: No Missing Values
      Missingness Statement: There are no missing values.

      Collection: For a detailed description see Peng et al. (2024); for simulation details see Pratapa et al. (2020).

      Curated: There are experiments on four curated gene regulatory networks:
      - mCAD (Mammalian Cortical Area Development): 14 edges, 5 nodes
      - VSC (Ventral Spinal Cord Development): 15 edges, 8 nodes
      - HSC (Hematopoietic Stem Cell Differentiation): 30 edges, 11 nodes
      - GSD (Gonadal Sex Determination): 79 edges, 18 nodes

      Synthetic: There are experiments on six synthetic gene regulatory networks:
      - dyn-BF (Bifurcating): 12 edges, 5 nodes
      - dyn-BFC (Bifurcating Converging): 18 edges, 9 nodes
      - dyn-CY (Cycle): 6 edges, 5 nodes
      - dyn-LI (Linear): 8 edges, 7 nodes
      - dyn-LL (Linear Long): 19 edges, 18 nodes
      - dyn-TF (Trifurcating): 20 edges, 7 nodes

      Files per experiment:
      - GroundTruth.csv: The true regulatory interactions between genes, typically derived from known databases, the literature, or synthetic models. An edge weight of +1 represents activation and -1 represents inhibition.
      - refNetwork.csv: A processed version of the ground truth network that keeps only the sign (+ or -) of each interaction.
      - ExpressionData.csv: The RNA-seq data, with genes as rows and cell IDs as columns.
      - PseudoTime.csv: The pseudotime of each cell. Pseudotime is a computationally inferred measure that orders single cells along a trajectory to represent their progression through a biological process, such as differentiation or development.
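      The per-experiment files can be read with the Python standard library alone. The following is a minimal sketch, not part of the dataset or of the BEELINE tooling, and the function names are hypothetical. It assumes that refNetwork.csv has a header row with columns Gene1, Gene2 and Type (Type being "+" or "-"), that ExpressionData.csv has gene names in its first column and cell IDs in its header row, and that PseudoTime.csv has cell IDs in its first column and a column named PseudoTime. These column names are assumptions, so check the headers of your copy. Because the expression file stores genes as rows, the sketch transposes it into one row per cell. The precision/recall helper is only one simple way to score an inferred network against the known graph; it is not the evaluation metric used by Pratapa et al. (2020).

```python
import csv
import io
import math
from typing import Dict, List, Set, TextIO, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def read_signed_edges(handle: TextIO) -> Dict[Edge, int]:
    """Map (regulator, target) to +1 (activation) or -1 (inhibition).

    Assumed layout: header row with columns Gene1, Gene2, Type; Type is "+" or "-".
    """
    edges: Dict[Edge, int] = {}
    for row in csv.DictReader(handle):
        sign = row["Type"].strip()
        if sign not in ("+", "-"):
            raise ValueError(f"unexpected interaction sign: {sign!r}")
        edges[(row["Gene1"].strip(), row["Gene2"].strip())] = 1 if sign == "+" else -1
    return edges


def read_expression(handle: TextIO) -> Tuple[List[str], List[str], List[List[float]]]:
    """Return (cell_ids, genes, matrix) with one matrix row per cell.

    The file stores genes as rows and cells as columns, so the values are
    transposed into a cells x genes layout.
    """
    reader = csv.reader(handle)
    cell_ids = next(reader)[1:]  # skip the header cell above the gene names
    genes: List[str] = []
    per_gene: List[List[float]] = []
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        genes.append(row[0])
        per_gene.append([float(value) for value in row[1:]])
    matrix = [list(cell) for cell in zip(*per_gene)]
    return cell_ids, genes, matrix


def order_cells_by_pseudotime(handle: TextIO, column: str = "PseudoTime") -> List[Tuple[str, float]]:
    """Return (cell_id, pseudotime) pairs sorted from earliest to latest.

    Assumed layout: first column holds cell IDs; `column` names the pseudotime
    column. Cells with an empty, NA or NaN value in that column are skipped.
    """
    reader = csv.DictReader(handle)
    id_field = reader.fieldnames[0]  # assumed: first column holds cell IDs
    ordered: List[Tuple[str, float]] = []
    for row in reader:
        raw = (row.get(column) or "").strip()
        if raw == "" or raw.upper() == "NA":
            continue
        value = float(raw)
        if not math.isnan(value):
            ordered.append((row[id_field], value))
    ordered.sort(key=lambda pair: pair[1])
    return ordered


def edge_precision_recall(predicted: Set[Edge], truth: Set[Edge]) -> Tuple[float, float]:
    """Precision and recall over directed edges, ignoring interaction signs."""
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the real files (hypothetical values).
    ref = read_signed_edges(io.StringIO("Gene1,Gene2,Type\ng1,g2,+\ng2,g3,-\n"))
    cells, genes, matrix = read_expression(io.StringIO(",c1,c2\ng1,0.5,1.0\ng2,0.0,2.0\n"))
    order = order_cells_by_pseudotime(io.StringIO(",PseudoTime\nc1,0.7\nc2,0.1\nc3,NA\n"))
    print(ref)     # {('g1', 'g2'): 1, ('g2', 'g3'): -1}
    print(matrix)  # [[0.5, 0.0], [1.0, 2.0]]
    print(order)   # [('c2', 0.1), ('c1', 0.7)]
    print(edge_precision_recall({("g1", "g2"), ("g1", "g3")}, set(ref)))  # (0.5, 0.5)
```

      If a file holds several pseudotime columns (for example, one per branch of a branching trajectory), pass the relevant column name to order_cells_by_pseudotime. The real files can be passed as handles opened with open(path, newline="").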
