Single-cell RNA-seq (scRNA-seq) is widely used to investigate the composition of complex tissues. However, it is often challenging to directly compare the cells identified in two different experiments. scmap allows you to project cells from an scRNA-seq experiment (the Projection) on to the cell-types identified in a different experiment (the Reference).

The scmap manuscript is available on bioRxiv.
The scmap source code is available on GitHub.
The scmap R package is also available on Bioconductor.
More information about the existing Reference can be found on our dataset website.
Please send your feedback/comments/suggestions to Vladimir Kiselev.

scmap is based on the SingleCellExperiment format. Please familiarize yourself with it before running scmap.

The rowData slots of both the Reference and the Projection datasets must have a feature_symbol column containing feature (gene/transcript) names from the same organism.
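As a minimal sketch of what such an object can look like, the following builds a toy SingleCellExperiment and fills in the feature_symbol column from the row names. The expression values, gene names and cell labels are made up for illustration, and using the row names as feature symbols is an assumption that only holds when they already are gene symbols.

```r
library(SingleCellExperiment)

set.seed(1)
# Toy normalized expression matrix: 5 genes x 4 cells (made-up values)
expr <- matrix(
    rpois(20, lambda = 5), nrow = 5,
    dimnames = list(paste0("GENE", 1:5), paste0("cell", 1:4))
)
# Toy cell annotation with the cell_type1 column used throughout this tutorial
ann <- data.frame(
    cell_type1 = c("alpha", "alpha", "beta", "beta"),
    row.names = colnames(expr)
)

sce <- SingleCellExperiment(
    assays = list(normcounts = expr),
    colData = ann
)
# Add a log-transformed assay
logcounts(sce) <- log2(normcounts(sce) + 1)
# Use the row names (assumed to be gene symbols) as feature symbols
rowData(sce)$feature_symbol <- rownames(sce)
# Keep only the first occurrence of any duplicated feature
sce <- sce[!duplicated(rownames(sce)), ]
```

Printing sce afterwards should list feature_symbol under rowData names, as in the dataset overviews in this tutorial.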

1 Datasets

In this tutorial we will run scmap on four human pancreas datasets (xin, segerstolpe, muraro and baron), which are used as positive controls. From the segerstolpe dataset we will remove cells labeled as not applicable, since it is unclear how to interpret this label and how it should be matched to the other datasets. From the xin dataset we will also remove cells labeled as alpha.contaminated, beta.contaminated, gamma.contaminated and delta.contaminated, since they likely correspond to cells of lower quality. All datasets can be downloaded from our website in Bioconductor SingleCellExperiment class format. The datasets can also be found in the ~/data folder. Let’s load the data:

library(SingleCellExperiment)

# xin: drop cells flagged as contaminated
xin <- readRDS("~/data/xin.rds")
contaminated <- paste0(c("alpha", "beta", "delta", "gamma"), ".contaminated")
xin <- xin[, !(colData(xin)$cell_type1 %in% contaminated)]

# segerstolpe
segerstolpe <- readRDS("~/data/segerstolpe.rds")
segerstolpe <- segerstolpe[,colData(segerstolpe)$cell_type1 != "not applicable"]

# muraro
muraro <- readRDS("~/data/muraro.rds")

# baron
baron <- readRDS("~/data/baron-human.rds")

Overview of the datasets:

xin
## class: SingleCellExperiment 
## dim: 39851 1492 
## metadata(0):
## assays(2): normcounts logcounts
## rownames(39851): A1BG A2M ... LOC102724004 LOC102724238
## rowData names(1): feature_symbol
## colnames(1492): Sample_1 Sample_2 ... Sample_1598 Sample_1600
## colData names(6): donor.id condition ... gender cell_type1
## reducedDimNames(0):
## spikeNames(1): ERCC
segerstolpe
## class: SingleCellExperiment 
## dim: 25525 2209 
## metadata(0):
## assays(2): counts logcounts
## rownames(25525): SGIP1 AZIN2 ...
##   ERCC_0.05722046:mix1_0.11444092:mix2
##   ERCC_0.01430512:mix1_0.02861023:mix2
## rowData names(10): feature_symbol is_feature_control ...
##   total_counts log10_total_counts
## colnames(2209): AZ_A10 AZ_A11 ... HP1526901T2D_P7 HP1526901T2D_P9
## colData names(33): cell_quality cell_type1 ... pct_counts_ERCC
##   is_cell_control
## reducedDimNames(0):
## spikeNames(1): ERCC
muraro
## class: SingleCellExperiment 
## dim: 19127 2126 
## metadata(0):
## assays(2): normcounts logcounts
## rownames(19127): A1BG-AS1__chr19 A1BG__chr19 ... ZZEF1__chr17
##   ZZZ3__chr1
## rowData names(1): feature_symbol
## colnames(2126): D28.1_1 D28.1_13 ... D31.8_93 D31.8_94
## colData names(3): cell_type1 donor batch
## reducedDimNames(0):
## spikeNames(1): ERCC

By default we put the cell labels provided in the original publication into the cell_type1 column of each dataset:

as.character(unique(xin$cell_type1))
## [1] "beta"  "alpha" "delta" "gamma"
as.character(unique(segerstolpe$cell_type1))
##  [1] "delta"                  "alpha"                 
##  [3] "gamma"                  "ductal"                
##  [5] "acinar"                 "beta"                  
##  [7] "unclassified endocrine" "co-expression"         
##  [9] "MHC class II"           "PSC"                   
## [11] "endothelial"            "epsilon"               
## [13] "mast"                   "unclassified"
as.character(unique(muraro$cell_type1))
##  [1] "alpha"       "ductal"      "endothelial" "delta"       "acinar"     
##  [6] "beta"        "unclear"     "gamma"       "mesenchymal" "epsilon"

In the following sections we will project the baron dataset onto the others using both the scmap-cluster and scmap-cell methods (Fig. 1a).

2 Feature selection

Now we will load scmap and, for each of the reference datasets, select the most informative features (genes) using the dropout-based feature selection method (Fig. S1a):

library(scmap)
xin <- selectFeatures(xin, suppress_plot = FALSE)
## Warning in linearModel(object, n_features): Your object does not contain
## counts() slot. Dropouts were calculated using logcounts() slot...

segerstolpe <- selectFeatures(segerstolpe, suppress_plot = FALSE)

muraro <- selectFeatures(muraro, suppress_plot = FALSE)
## Warning in linearModel(object, n_features): Your object does not contain
## counts() slot. Dropouts were calculated using logcounts() slot...

The selected features are flagged in the scmap_features column of the rowData slot of each dataset. By default scmap selects 500 features (this can be changed with the n_features parameter):

table(rowData(xin)$scmap_features)
## 
## FALSE  TRUE 
## 39351   500
table(rowData(segerstolpe)$scmap_features)
## 
## FALSE  TRUE 
## 25025   500
table(rowData(muraro)$scmap_features)
## 
## FALSE  TRUE 
## 18627   500
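To give an intuition for dropout-based selection, here is a simplified base R sketch. It is not scmap's own code: the helper select_dropout_features and the simulated data are hypothetical. It assumes that the method regresses the log dropout rate of each gene on its mean log expression and keeps the genes with the largest positive residuals, i.e. genes that drop out more often than their expression level would suggest.

```r
# Hypothetical helper (not part of scmap): flag the n_features genes whose
# dropout rate is highest relative to a linear fit against mean expression.
select_dropout_features <- function(log_expr, n_features = 500) {
    n_cells <- ncol(log_expr)
    # Percentage of cells in which each gene is not detected
    dropout_pct <- rowSums(log_expr == 0) / n_cells * 100
    # Genes detected in all cells or in no cell carry no dropout signal
    keep <- which(dropout_pct > 0 & dropout_pct < 100)
    log_dropout <- log2(dropout_pct[keep])
    mean_expr <- rowMeans(log_expr[keep, , drop = FALSE])
    fit <- lm(log_dropout ~ mean_expr)
    # Rank genes by residual, largest (more dropouts than expected) first
    ranked <- order(residuals(fit), decreasing = TRUE)
    selected <- keep[head(ranked, n_features)]
    # Logical vector over all genes, analogous to an scmap_features column
    seq_len(nrow(log_expr)) %in% selected
}

# Simulated counts: 200 genes x 50 cells (made-up data)
set.seed(42)
counts <- matrix(rnbinom(200 * 50, mu = 2, size = 0.5), nrow = 200)
log_expr <- log2(counts + 1)

is_feature <- select_dropout_features(log_expr, n_features = 20)
table(is_feature)
```

The returned logical vector plays the same role as the scmap_features column tabulated above: TRUE for selected genes, FALSE for the rest.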

3 scmap-cluster

3.1 Index

The scmap-cluster index of a reference dataset is created by computing the median expression of each gene within each cluster. By default scmap uses the cell_type1 column of the colData slot in the reference to identify clusters. Other columns can be selected manually by setting the cluster_col parameter:

xin <- indexCluster(xin)
segerstolpe <- indexCluster(segerstolpe)
muraro <- indexCluster(muraro)
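The idea behind the index can be illustrated with a small base R sketch. The helper median_index and the toy matrix are hypothetical and only show the per-cluster median; indexCluster itself works on the SingleCellExperiment object and may differ in details, such as which features and which assay it uses.

```r
# Hypothetical helper (not part of scmap): median expression of every gene
# within every cluster, returned as a genes x clusters matrix.
median_index <- function(expr, clusters) {
    # Column indices of the cells belonging to each cluster
    groups <- split(seq_len(ncol(expr)), clusters)
    index <- vapply(groups, function(cells) {
        apply(expr[, cells, drop = FALSE], 1, median)
    }, numeric(nrow(expr)))
    rownames(index) <- rownames(expr)
    index
}

# Toy expression matrix: 2 genes x 6 cells (made-up values)
expr <- matrix(
    c(1, 3, 5, 0, 0, 2,
      0, 0, 0, 4, 6, 8),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("GENE1", "GENE2"), paste0("cell", 1:6))
)
clusters <- c("alpha", "alpha", "alpha", "beta", "beta", "beta")

idx <- median_index(expr, clusters)
idx
##       alpha beta
## GENE1     3    0
## GENE2     0    6
```

Each column of the result is the expression profile of one cluster.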

The indexCluster function automatically writes the index to the scmap_cluster_index item of the metadata slot of the reference dataset. The index can be visualized as a heatmap:

library(pheatmap)
pheatmap(metadata(xin)$scmap_cluster_index, show_rownames = FALSE)