
Resources for single-cell transcriptomics datasets

Cancer	Literature	PMID	Sequencing platform	Cell number	Data processing	Resource
Breast cancer	A single-cell and spatially resolved atlas of human breast cancers	34493872	Illumina NextSeq 500	130,246	The EmptyDrops method from the DropletUtils package was applied for cell filtering with additional cutoffs for cells with a gene and unique molecular identifier (UMIs) count greater than 200 and 250, respectively, and a mitochondrial percentage less than 20%.	GSE176078
Non-small cell lung cancer	Global characterization of T cells in non-small-cell lung cancer by single-cell sequencing	29942094	Illumina Hiseq 2500 or Illumina Hiseq 4000	12,346	Low-quality cells were discarded if the cell library size or the number of expressed genes (counts larger than 0) was smaller than pre-defined thresholds, which were the medians of all cells minus 3 × median absolute deviation. Cells were also removed if their proportions of mitochondrial gene expression were larger than 10%. Only cells with the average TPM of CD3D, CD3E and CD3G larger than 10 were kept for subsequent analysis.	GSE99254
Lung cancer	Integrated single-cell RNA sequencing analysis reveals distinct cellular and transcriptional modules associated with survival in lung cancer	35027529	Illumina NovaSeq 6000	220,716	Samples with less than 500 cells were removed. Cells were required to have more than 1000 UMIs and only genes with more than 1000 UMIs across all cells were kept for further analyses.	http://lungcancer.chenlulab.com/#/download
Lung adenocarcinoma	Single-cell RNA sequencing reveals distinct tumor microenvironmental patterns in lung adenocarcinoma	34663877	Illumina HiSeq 4000	114,489	Transcriptomes were filtered for cells with 500–10,000 genes detected, 1000–100,000 UMIs counted, fraction of mitochondrial reads <30%, and fraction of hemoglobin reads <5%.	Code Ocean capsule from 10.24433/CO.0121060.v1.
Lung cancer	Therapy-Induced Evolution of Human Lung Cancer Revealed by Single-Cell RNA Sequencing	32822576	Illumina NextSeq or NovaSeq 6000	23,261	Standard filtering was performed using Seurat v3 in R; cells with fewer than 500 genes and 50,000 reads were excluded. DoubletFinder was used to identify potentially sorted doublet cells.	NCBI BioProject #PRJNA591860
Gastric cancer	Single-cell RNA sequencing reveals a pro-invasive cancer-associated fibroblast subgroup associated with poor clinical outcomes in patients with gastric cancer	34976204	Illumina HiSeq 4000	36,897	Cells with fewer than 400 expressed genes, as well as genes expressed in less than four cells, were removed.	wxy@ibms.pumc.edu.cn
Gastric cancer	Single-Cell Genomic Characterization Reveals the Cellular Reprogramming of the Gastric Tumor Microenvironment	32060101	Illumina sequencer	56,167	Cells that expressed fewer than 200 genes, had greater than 20% mitochondrial genes or had a UMI count in an outlier range indicative of potential doublets were removed. The authors also excluded genes detected in fewer than three cells.	genomics_ji@stanford.edu
Hepatocellular carcinoma	Single-cell landscape of the ecosystem in early-relapse hepatocellular carcinoma	33357445	BGISEQ500	16,498	The authors defined genes with TPM > 1 as detected genes. To filter out low-quality cells they set the following criteria: 1) mapped reads ≥ 1 M; 2) mapping rate ≥ 30%; 3) 1,500 ≤ number of detected genes ≤ 10,000.	fan.jia@zs-hospital.sh.cn
Liver cancer	A single cell atlas of the human liver tumor microenvironment	33332768	NextSeq 550	7,947	Cells with UMI counts below 200 or higher than 3,000 or mitochondrial content above 35% were removed.	GSE146409
Pancreatic ductal adenocarcinoma	Single-cell RNA-seq highlights intra-tumoral heterogeneity and malignant progression in pancreatic ductal adenocarcinoma	31273297	Illumina HiSeq X Ten	57,530	Low quality cells (<200 genes/cell, <3 cells/gene and >10% mitochondrial genes) were excluded.	GSA: CRA001160
Prostate cancer	Single-cell analysis of human primary prostate cancer reveals the heterogeneity of tumor-associated epithelial cell states	35013146	Seq-Well	21,743	Cells with less than 300 genes, 500 transcripts, or a mitochondrial level of 20% or greater, were filtered out. Then, an upper threshold for the number of genes per cell in each individual sample was set in order to filter potential doublets.	GSE176031
Renal cell carcinoma	Identification of a novel cancer stem cell subpopulation that promotes progression of human fatal renal cell carcinoma by single-cell RNA-seq analysis	33162821	Illumina Hiseq X	15,208	To guarantee sequencing quality, cells with <200 or >5000 genes were removed from the original data.	cuixingang@smmu.edu.cn
Renal cell carcinoma	Single-cell transcriptomics reveals a low CD8+ T cell infiltrating state mediated by fibroblasts in recurrent renal cell carcinoma	35121646	Illumina NovaSeq 6000	32,073	Low-quality cells were removed following 3 measurements: 1) cells had either fewer than 200 or over 6000 unique molecular identifiers (UMIs), over 20,000 or less than 200 expressed genes or over 15% UMIs derived from the mitochondrial genome, or over 2.5% UMIs derived from the erythrocytic genome; 2) cells had an average expression level of less than 2 for a curated list of housekeeping genes; 3) cells had a co-expression of EPCAM and PTPRC. 4) Doublets were detected by DoubletFinder R package for single sample and manually detected the doublets in re-clustering the cell types.	zhangzhl@sysucc.org.cn
Colorectal cancer	Multiregion single-cell sequencing reveals the transcriptional landscape of the immune microenvironment of colorectal cancer	33463049	BGISEQ500	15,115	Cells with less than 500 genes (TPM > 1) or over 20% TPM derived from the mitochondrial genome were removed.	CNGB Nucleotide Sequence Archive; CNP0000916
Head and neck squamous cell carcinoma	Investigating immune and non-immune cell interactions in head and neck tumors by single-cell RNA sequencing	34921143	Illumina NextSeq 500/550	134,606	Based on the QC metrics suggested in the Scanpy tutorial, cells with less than 200 genes expressed were filtered out. Cells expressing more than 5000 genes, and more than ten percent mitochondrial genes were also removed. Genes expressed in less than 3 cells were also filtered out of the analysis.	NCBI Sequence Read Archive: accession ID SRP301444.
Head and neck squamous cell carcinoma	Immune Landscape of Viral- and Carcinogen-Driven Head and Neck Cancer	31924475	Illumina NextSeq 500	131,224	After creation of the gene/barcode matrix, a cell-level filtering step was performed to remove cells with either few genes per cell (<200) or many molecules per cell (>20,000). Next, genes that were lowly expressed (fewer reads than 3 counts in 1% of cells, or genes expressed in fewer than 1% of cells) across all samples were removed.	GSE139324
Nasopharyngeal carcinoma	Tumour heterogeneity and intercellular networks of nasopharyngeal carcinoma at single cell resolution	33531485	Illumina HiSeq X Ten	176,447	The R package “DoubletFinder” was applied to predict doublets in the data. The authors removed doublets in each sample individually, with an expected doublet rate of 0.05 and default parameters used otherwise. Next, cells with fewer than 101 UMIs, fewer than 501 expressed genes, or over 15% of UMIs linked to mitochondrial genes were removed.	GSE162025
Neuroblastoma	Single-cell transcriptomic analyses provide insights into the developmental origins of neuroblastoma	33767450	Illumina NextSeq 500	100,337	The R package Seurat was used to calculate the quality control metrics. Cells were removed from the analysis if fewer than 500 distinct genes, 1,000 counts or more than 2.5% of reads mapping to mitochondrial genes were detected, for data generated with the Chromium Next GEM Single Cell 3' Kit v.3.1 (10x Genomics). For the Chromium Single Cell 3' Kit v.2 (10x Genomics) data, cells with fewer than 300 distinct genes, 1,000 counts or more than 2.5% of reads mapping to mitochondrial genes were filtered. Doublets were detected and filtered using the R package DoubletFinder with default settings. Genes that were expressed in fewer than three cells were excluded.	GSE163431
Esophageal squamous cell carcinoma	Dissecting esophageal squamous-cell carcinoma ecosystem by single-cell transcriptomic analysis	34489433	Illumina HiSeq X Ten	208,659	For quality filtering, the authors removed genes whose expressions were detected in <0.1% of all cells and filtered out cells that had gene counts <500 or mitochondrial RNA content >20%. The Seurat package (version 2.3.4) was used for quality filtering.	GSE160269
Esophageal squamous cell carcinoma	Integrated single-cell transcriptome analysis reveals heterogeneity of esophageal squamous cell carcinoma microenvironment	34921160	Illumina Hiseq X (PE150)	62,161	Potential doublets were detected and filtered using DoubletFinder based on the expression proximity of each cell to artificial doublets. Further, cells with high mitochondrial content (>= 20%) were removed.	Sequence Read Archive (SRA) under accession number PRJNA777911.
Cervical cancer	Single-Cell RNA Sequencing Reveals Multiple Pathways and the Tumor Microenvironment Could Lead to Chemotherapy Resistance in Cervical Cancer	34900703	Illumina NovaSeq 6000	24,371	The number of unique molecular identifiers (UMIs), the number of genes, and the percentage of mitochondrial genes were examined for quality control. Cells expressing <500 or >4,000 genes (potential cell doublets) and genes detected in fewer than three cells were trimmed from the library.	shenchao@whu.edu.cn
Multiple myeloma	Single-cell RNA sequencing infers the role of malignant cells in drug-resistant multiple myeloma	34918874	Illumina HiSeq X Ten	52,793	To obtain high-quality cells, cells with a mitochondrial ratio lower than 0.2 and more than 2000 genes were retained.	wangliangtrhos@126.com
Endometrial carcinoma	Phenotyping of immune and endometrial epithelial cells in endometrial carcinomas revealed by single-cell RNA sequencing	33429363	Illumina HiSeq X Ten	30,780	Genes detected in < 3 cells and cells where < 100 genes had nonzero counts were excluded. Low-quality cells that had > 5% mitochondrial genes were discarded.	The SRA accession number is PRJNA650549.
Osteosarcoma	Single-cell RNA landscape of intratumoral heterogeneity and immunosuppressive microenvironment in advanced osteosarcoma	33303760	Illumina HiSeq X	100,987	The cells with no. of expressed genes <300 genes or the percent of mitochondrial genes over 10% of total expressed genes were filtered out. Further, the DoubletFinder package of the R was used to remove the potential doublets (and to an even lesser extent of higher-order multiplets) that occurred in the encapsulation step and/or as occasional pairs of cells that were not dissociated in sample preparation.	GSE152048
Ovarian cancer	Identification of grade and origin specific cell populations in serous epithelial ovarian cancer by single cell RNA-seq	30383866	Illumina NextSeq 500	2,911	The R software package Seurat was used for further analysis. Genes were initially filtered on expression in at least three cells and each cell needed to have at least 200 genes expressed.	GSE118828
Uveal melanoma	Single-cell analysis reveals new evolutionary complexity in uveal melanoma	31980621	Illumina NextSeq 500	59,915	Filtering was conducted by retaining cells that had unique molecular identifiers (UMIs) greater than 400, expressed between 100 and 8000 genes inclusive, and had mitochondrial content less than 10 percent.	GSE139829
T-cell lymphoma	Single-cell RNA sequencing reveals markers of disease progression in primary cutaneous T-cell lymphoma	34583709	Illumina NovaSeq 6000	47,172	The command “doubletCells” simulates thousands of doublets by adding together two randomly chosen single cell profiles. For each cell the number of simulated doublets in the neighborhood was recorded and used as input to calculate a doublet score. Threshold to filter putative doublets was set to three times the median absolute deviation of the doublet score and all cells with a higher score were discarded.	GSE173205
Thyroid cancer	Characterizing dedifferentiation of thyroid cancer by integrated analysis	34321197	Illumina NovaSeq	46,205	Several criteria were set to filter low-quality cells and genes: minimal expression of 200 genes per cell, mitochondrial content less than 15%, and genes that are expressed in more than 3 cells.	Access number: HRA000686, https://bigd.big.ac.cn/gsa-human/browse/.
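Most of the data processing entries in the table combine a few per-cell cutoffs (genes detected, UMI count, mitochondrial percentage), and one entry derives its lower bound from the median minus 3 × the median absolute deviation. The following is a minimal sketch of both ideas in plain Python. It is not the code used by any of the listed studies: the field names (`n_genes`, `n_umis`, `mito_pct`), the function names and the example cells are illustrative assumptions, and the default cutoffs simply mirror the breast cancer row (more than 200 genes, more than 250 UMIs, less than 20% mitochondrial).

```python
from statistics import median


def mad_lower_bound(values, n_mads=3.0):
    """Return median - n_mads * MAD, using the unscaled median absolute deviation.

    Assumption: individual studies may scale the MAD differently.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return med - n_mads * mad


def passes_qc(cell, min_genes=200, min_umis=250, max_mito_pct=20.0):
    """Keep a cell only if it exceeds both count cutoffs and stays under the mito cutoff."""
    return (
        cell["n_genes"] > min_genes
        and cell["n_umis"] > min_umis
        and cell["mito_pct"] < max_mito_pct
    )


# Made-up cells for illustration only (not taken from any dataset in the table).
cells = [
    {"id": "c1", "n_genes": 1200, "n_umis": 4000, "mito_pct": 5.0},
    {"id": "c2", "n_genes": 150, "n_umis": 300, "mito_pct": 3.0},    # too few genes
    {"id": "c3", "n_genes": 900, "n_umis": 2500, "mito_pct": 35.0},  # high mitochondrial share
    {"id": "c4", "n_genes": 800, "n_umis": 2000, "mito_pct": 12.0},
]

kept = [c["id"] for c in cells if passes_qc(c)]
print(kept)  # ['c1', 'c4']
```

Other rows in the table can be approximated by passing different cutoffs, for example `passes_qc(cell, min_genes=300, min_umis=500, max_mito_pct=20.0)`; upper bounds and doublet detection would need additional checks that this sketch does not cover.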

Browse

On the Browse page, users can browse SCancerRNA by clicking on diagrams related to the categories (RNA type, biological function, clinical application and tissue) listed above. The result page is shown in the figure below.

Biomarker Result

1. The results for non-coding biomarkers are displayed.

2. Each entry includes the name of the non-coding RNA biomarker, the RNA type of the biomarker, the type of the cancer and the testing methods of the non-coding RNA biomarker.

3. Users can see whether a biomarker is related to a biological function or clinical application from the T or F flags. The specific biological functions (cell proliferation, growth, apoptosis, autophagy and epithelial mesenchymal transformation) and clinical applications (migration, metastasis, circulation, survival and recurrence) of a biomarker can be checked by clicking the ‘Detail’ button.

T: This biomarker is associated with the listed biological function or clinical application.

F: This biomarker is not associated with the listed biological function or clinical application.

4. Users can find more detailed information in the original literature corresponding to the biomarker by clicking the 'PMID' link.

5. Users can click the ‘more details’ button to check detailed information for the ncRNA biomarker.

6. By clicking on the network logo, the interaction network of different types of ncRNA biomarkers will be shown.

7. Biomarker results can be downloaded in Excel or CSV format.

8. Enter a non-coding RNA biomarker of interest to search.
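Once biomarker results have been downloaded as CSV, the T/F flags described above can be filtered programmatically. The sketch below assumes a hypothetical column layout (`name`, `rna_type`, `cancer`, `biological_function`, `clinical_application`) and uses invented placeholder rows; the actual column names in a SCancerRNA export may differ and should be checked against the downloaded file.

```python
import csv
import io

# Placeholder rows standing in for a downloaded file; names and values are invented.
SAMPLE_CSV = """name,rna_type,cancer,biological_function,clinical_application
ncRNA_A,lncRNA,cancer_X,T,F
ncRNA_B,miRNA,cancer_Y,F,F
ncRNA_C,circRNA,cancer_X,T,T
"""


def select_flagged(rows, column):
    """Return the names of biomarkers whose flag in `column` is T."""
    return [row["name"] for row in rows if row[column].strip().upper() == "T"]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(select_flagged(rows, "biological_function"))   # ['ncRNA_A', 'ncRNA_C']
print(select_flagged(rows, "clinical_application"))  # ['ncRNA_C']
```

For a real export, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file handle, for example `open(path, newline="")`, where `path` points to the downloaded file.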

Single cell Result

1. The results of single-cell sequencing analysis for the corresponding genes of the biomarkers are displayed.

2. Each entry includes the name of the gene, the corresponding biomarker, the RNA type of the biomarker and the cancer implicated in single-cell sequencing analysis.

3. Users can explore the average log2 fold change value, adjusted p-value and description of the gene in the differential expression analysis at the single-cell level.

4. Users can obtain sequencing platform information and quality control steps in single-cell sequencing analysis.

5. Users can find more detailed information in the original literature by clicking the 'PMID' link.

6. Single cell results can be downloaded in Excel or CSV format.

7. Enter a gene or RNA biomarker of interest to search.
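The average log2 fold change and adjusted p-value shown for each gene can be read with the following sketch in mind. How SCancerRNA computes these values is not stated here, so the code rests on two labelled assumptions: the fold change is taken as the difference of log2 mean expression with a pseudocount of 1, and p-values are adjusted with the Benjamini-Hochberg procedure. The function names are illustrative.

```python
import math


def avg_log2_fc(group_a, group_b, pseudocount=1.0):
    """log2(mean(a) + pseudocount) - log2(mean(b) + pseudocount).

    Assumption: one common definition; the database may use a different one.
    """
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2(mean_a + pseudocount) - math.log2(mean_b + pseudocount)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Toy expression values for one gene in two cell groups (invented numbers).
print(avg_log2_fc([3.0, 3.0], [1.0, 1.0]))  # 1.0
print([round(p, 3) for p in bh_adjust([0.01, 0.04, 0.03])])
```

A positive value from `avg_log2_fc` means higher average expression in the first group; an adjusted p-value is never smaller than the raw p-value it was derived from.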

1. Users can search for non-coding RNA biomarkers by RNA name or gene name.

2. Enter a cancer type of interest to search for non-coding biomarkers.

3. Select biological functions of interest for advanced search.

4. Select clinical applications of interest for advanced search.

Single cell

SCancerRNA provides two modules on the ‘single cell’ page, which allow users to easily access biomarkers associated with genes of interest and to discover single-cell expression data associated with specific cancers.

1. By searching for a gene in the search bar on the right side of the module, users can obtain the differential expression data for the gene in different cancers and cell types, along with the SCancerRNA link of the corresponding biomarker.

2. In the ‘Biomarker in single cell’ module, results can be downloaded in Excel or CSV format.

3. Users can select a cancer type in the cancer drop-down bar on the right to obtain differential expression data for genes associated with the selected cancer at the single-cell level.

4. Enter a gene or RNA biomarker of interest to search.

5. In the ‘Cancer’ module, results can be downloaded in Excel or CSV format.

Statistics

Detailed statistics of SCancerRNA are visualized on the "Statistics" page.

Users can explore the data through visualizations according to their needs.

Download

All the data from the SCancerRNA database can be accessed on the ‘Download’ page.

Users can click the arrow symbol next to a file to download the data of interest.

Submit

Users enter their data in the corresponding fields and submit it. Users can also select the biological functions and clinical applications of each biomarker to provide more detailed and comprehensive information for SCancerRNA.

We will curate the submitted information to determine whether to add the new entries to the database.