Dataset related to the article "Binary classification of copy number alteration profiles in liquid biopsy with potential clinical impact in advanced NSCLC"
DOI: 10.5281/zenodo.11366940
Zenodo: 11366940
MaRDI QID: Q6697198
FDO: Q6697198
Dataset published in the Zenodo repository.
Publication date: 28 May 2024
Copyright license: Creative Commons Attribution 4.0 International
This record contains the original data used in the article "Binary classification of copy number alteration profiles in liquid biopsy with potential clinical impact in advanced NSCLC" to develop a linear support vector machine (SVM) classifier that predicts chromosomal instability.

We retrospectively evaluated the results of plasma NGS analyses performed at our institution with the AVENIO ctDNA Expanded Kit, a 77-gene panel that detects the major classes of genetic alterations. Binary classification into stable (SCP) or unstable (UCP) chromosomal profiles was initially performed by visual inspection of individual CNV profiles by two independent professionals of our group. We then implemented an SVM classifier to assign CNV profiles to SCP or UCP automatically, independently of operator experience.

We took the segmented log2 ratio (.cns) files produced by the CNVkit software and computed three features from them. An alteration (an occurrence of instability) in the CNV profile was recorded whenever a DNA segment of any size had a log2 copy ratio whose absolute value exceeded a fixed cut-off. Two cut-off values on the log2 copy ratio were examined: 0.1 and 0.2. Once the cut-off was fixed, three features were used as covariates in the SVM classifier: 1) the number of altered segments (Segments), 2) the total length of altered regions (Size), and 3) the number of affected chromosomes (Chromosomes).

The dataset_0.1.txt and dataset_0.2.txt files are the original data matrices obtained with a cut-off of 0.1 and 0.2, respectively, on the absolute value of the log2 copy ratio. Rows represent the samples available in our study (n=177).
Columns contain the following variables: anonymized sample IDs (Sample), the class, stable or unstable, as assigned by two independent professionals of our group (Class), the corresponding binary label (Label: 0 for stable, 1 for unstable), the three features used as covariates in the SVM classifier and computed as described above (Segments, Size, Chromosomes). For the detailed results of our work, please refer to the full article.
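The feature definitions above can be illustrated with a minimal Python sketch. It is not the code used in the article. It assumes a tab-separated segment table with columns named chromosome, start, end and log2 (the layout of CNVkit .cns files), and it assumes that Size is the sum of end minus start over altered segments, in base pairs; the record does not state the unit. The function name cnv_features and the example segments are hypothetical.

```python
import csv
import io


def cnv_features(cns_text, cutoff=0.2):
    """Return (Segments, Size, Chromosomes) for one segmented profile.

    A segment counts as altered when |log2| is strictly greater than
    the cut-off. Size is assumed to be the summed length (end - start)
    of the altered segments.
    """
    reader = csv.DictReader(io.StringIO(cns_text), delimiter="\t")
    segments = 0
    size = 0
    chromosomes = set()
    for row in reader:
        if abs(float(row["log2"])) > cutoff:
            segments += 1
            size += int(row["end"]) - int(row["start"])
            chromosomes.add(row["chromosome"])
    return segments, size, len(chromosomes)


# Made-up segments for illustration only, not taken from the dataset.
example = "\n".join([
    "chromosome\tstart\tend\tlog2",
    "chr1\t1000\t5000\t0.05",
    "chr1\t5000\t9000\t0.35",
    "chr2\t2000\t3000\t-0.15",
    "chr3\t100\t600\t-0.40",
])

print(cnv_features(example, cutoff=0.1))  # (3, 5500, 3)
print(cnv_features(example, cutoff=0.2))  # (2, 4500, 2)
```

Raising the cut-off from 0.1 to 0.2 drops the chr2 segment (|log2| = 0.15), which lowers all three features for the same profile. This is why the two data matrices differ.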