Single-cell RNA-seq (scRNA-seq) is widely used to investigate the composition of complex tissues. However, it is often challenging to directly compare the cells identified in two different experiments. scmap allows you to project cells from one scRNA-seq experiment (the Projection) onto the cell types identified in a different experiment (the Reference).
scmap manuscript is available on bioRxiv.
scmap source code is available on GitHub.
scmap R package is also available on Bioconductor.
More information about the existing Reference can be found on our dataset website.
Please send your feedback/comments/suggestions to Vladimir Kiselev.
scmap is based on the SingleCellExperiment format. Please familiarize yourself with it before running scmap.
The rowData slots of both the Reference and the Projection dataset must contain a feature_symbol column holding feature (gene/transcript) names from the same organism.
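The feature_symbol requirement can be illustrated with a minimal sketch. It assumes a toy count matrix whose row names are already gene symbols; the gene names, cell names and the object name `sce` are made up for illustration and do not come from the datasets used below.

```r
library(SingleCellExperiment)

# Toy count matrix (hypothetical genes and cells, for illustration only)
toy_counts <- matrix(
  rpois(20, lambda = 5),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:5))
)

# Build a SingleCellExperiment and copy the row names into the
# feature_symbol column of rowData, which is the column scmap expects
sce <- SingleCellExperiment(assays = list(counts = toy_counts))
rowData(sce)$feature_symbol <- rownames(sce)

rowData(sce)
```

If the row names of a dataset are not gene symbols, the column would have to be filled from whatever annotation holds the symbols instead.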
scmap tutorial
2017-11-28
1 Datasets
In this tutorial we will run scmap on four human pancreas datasets, xin, segerstolpe, muraro and baron, which are used as positive controls. In the segerstolpe dataset we remove cells labeled as not applicable, since it is unclear how to interpret this label and how it should be matched to the other datasets. In the xin dataset we also remove cells labeled as alpha.contaminated, beta.contaminated, gamma.contaminated and delta.contaminated, since they likely correspond to cells of lower quality. All datasets can be downloaded from our website in the Bioconductor SingleCellExperiment class format. They can also be found in the ~/data folder. Let’s load the data:
library(SingleCellExperiment)
# xin
xin <- readRDS("~/data/xin.rds")
# drop cells flagged as contaminated in the original annotation
contaminated <- c("alpha.contaminated", "beta.contaminated", "delta.contaminated", "gamma.contaminated")
xin <- xin[, !(colData(xin)$cell_type1 %in% contaminated)]
# segerstolpe
segerstolpe <- readRDS("~/data/segerstolpe.rds")
segerstolpe <- segerstolpe[,colData(segerstolpe)$cell_type1 != "not applicable"]
# muraro
muraro <- readRDS("~/data/muraro.rds")
# baron
baron <- readRDS("~/data/baron-human.rds")
Overview of the datasets:
xin
## class: SingleCellExperiment
## dim: 39851 1492
## metadata(0):
## assays(2): normcounts logcounts
## rownames(39851): A1BG A2M ... LOC102724004 LOC102724238
## rowData names(1): feature_symbol
## colnames(1492): Sample_1 Sample_2 ... Sample_1598 Sample_1600
## colData names(6): donor.id condition ... gender cell_type1
## reducedDimNames(0):
## spikeNames(1): ERCC
segerstolpe
## class: SingleCellExperiment
## dim: 25525 2209
## metadata(0):
## assays(2): counts logcounts
## rownames(25525): SGIP1 AZIN2 ...
## ERCC_0.05722046:mix1_0.11444092:mix2
## ERCC_0.01430512:mix1_0.02861023:mix2
## rowData names(10): feature_symbol is_feature_control ...
## total_counts log10_total_counts
## colnames(2209): AZ_A10 AZ_A11 ... HP1526901T2D_P7 HP1526901T2D_P9
## colData names(33): cell_quality cell_type1 ... pct_counts_ERCC
## is_cell_control
## reducedDimNames(0):
## spikeNames(1): ERCC
muraro
## class: SingleCellExperiment
## dim: 19127 2126
## metadata(0):
## assays(2): normcounts logcounts
## rownames(19127): A1BG-AS1__chr19 A1BG__chr19 ... ZZEF1__chr17
## ZZZ3__chr1
## rowData names(1): feature_symbol
## colnames(2126): D28.1_1 D28.1_13 ... D31.8_93 D31.8_94
## colData names(3): cell_type1 donor batch
## reducedDimNames(0):
## spikeNames(1): ERCC
By default we put the cell labels provided in the original publication into the cell_type1 column of each dataset:
as.character(unique(xin$cell_type1))
## [1] "beta" "alpha" "delta" "gamma"
as.character(unique(segerstolpe$cell_type1))
## [1] "delta" "alpha"
## [3] "gamma" "ductal"
## [5] "acinar" "beta"
## [7] "unclassified endocrine" "co-expression"
## [9] "MHC class II" "PSC"
## [11] "endothelial" "epsilon"
## [13] "mast" "unclassified"
as.character(unique(muraro$cell_type1))
## [1] "alpha" "ductal" "endothelial" "delta" "acinar"
## [6] "beta" "unclear" "gamma" "mesenchymal" "epsilon"
In the following chapters we will project the baron dataset onto the others using both the scmap-cluster and scmap-cell methods (Fig. 1a).
2 Feature selection
Now we will load scmap and, for each of the reference datasets, select the most informative features (genes) using the dropout-based feature selection method (Fig. S1a):
library(scmap)
xin <- selectFeatures(xin, suppress_plot = FALSE)
## Warning in linearModel(object, n_features): Your object does not contain
## counts() slot. Dropouts were calculated using logcounts() slot...
segerstolpe <- selectFeatures(segerstolpe, suppress_plot = FALSE)
muraro <- selectFeatures(muraro, suppress_plot = FALSE)
## Warning in linearModel(object, n_features): Your object does not contain
## counts() slot. Dropouts were calculated using logcounts() slot...
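To give a feel for what a dropout-based selection can look like, here is a minimal base R sketch on simulated data. It assumes one plausible formulation: regress the log2 dropout rate of each gene on its mean log-expression and keep the genes with the largest positive residuals, i.e. genes with more dropouts than their expression level would suggest. The function name `select_by_dropout` and the simulated matrix are hypothetical; this illustrates the idea and is not the code that selectFeatures runs.

```r
set.seed(1)
n_genes <- 200
n_cells <- 50

# Simulated counts: each gene has its own Poisson mean, which gives a
# spread of dropout rates across genes (illustrative data only)
gene_means <- rexp(n_genes, rate = 0.5)
sim_counts <- matrix(
  rpois(n_genes * n_cells, lambda = rep(gene_means, n_cells)),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), NULL)
)
sim_logcounts <- log2(sim_counts + 1)

# Hypothetical helper: rank genes by the residual of a linear fit of
# log2(dropout %) against mean log-expression
select_by_dropout <- function(counts, logcounts, n_features) {
  dropout_pct <- rowMeans(counts == 0) * 100
  # genes with 0% or 100% dropouts carry no information for the fit
  keep <- which(dropout_pct > 0 & dropout_pct < 100)
  log_dropout <- log2(dropout_pct[keep])
  mean_expr <- rowMeans(logcounts[keep, , drop = FALSE])
  fit <- lm(log_dropout ~ mean_expr)
  res <- residuals(fit)
  n_top <- min(n_features, length(res))
  top <- order(res, decreasing = TRUE)[seq_len(n_top)]
  rownames(counts)[keep[top]]
}

selected <- select_by_dropout(sim_counts, sim_logcounts, n_features = 20)
head(selected)
```

Ranking by residual rather than by raw dropout rate keeps the selection from simply favouring lowly expressed genes, which have many zeros for trivial reasons.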
Features are stored in the scmap_features column of the rowData slot of each dataset. By default scmap selects 500 features (this can be changed by setting the n_features parameter):
table(rowData(xin)$scmap_features)
##
## FALSE TRUE
## 39351 500
table(rowData(segerstolpe)$scmap_features)
##
## FALSE TRUE
## 25025 500
table(rowData(muraro)$scmap_features)
##
## FALSE TRUE
## 18627 500
3 scmap-cluster
3.1 Index
The scmap-cluster index of a reference dataset is created by finding the median gene expression for each cluster. By default scmap uses the cell_type1 column of the colData slot in the reference to identify clusters. Other columns can be selected manually by setting the cluster_col parameter:
xin <- indexCluster(xin)
segerstolpe <- indexCluster(segerstolpe)
muraro <- indexCluster(muraro)
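The idea of a per-cluster median index can be sketched in base R on a toy matrix. The matrix, the cluster labels and the helper name `cluster_medians` are assumptions made for illustration; the sketch only mirrors the description above (one median expression value per gene and cluster) and is not the package's implementation.

```r
set.seed(1)

# Toy expression matrix: 6 genes x 8 cells (illustrative values only)
expr <- matrix(
  rexp(6 * 8),
  nrow = 6,
  dimnames = list(paste0("gene", 1:6), paste0("cell", 1:8))
)

# Hypothetical cluster labels, one per cell
clusters <- rep(c("alpha", "beta"), each = 4)

# Hypothetical helper: median expression of every gene within each cluster
cluster_medians <- function(expr, clusters) {
  cells_by_cluster <- split(seq_len(ncol(expr)), clusters)
  sapply(cells_by_cluster, function(idx) {
    apply(expr[, idx, drop = FALSE], 1, median)
  })
}

index <- cluster_medians(expr, clusters)
index
```

The result is a genes-by-clusters matrix, so each column acts as a compact signature of one cluster.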
The function indexCluster automatically writes the scmap_cluster_index item of the metadata slot of the reference dataset. The index can be visualized as a heatmap:
library(pheatmap)
pheatmap(metadata(xin)$scmap_cluster_index, show_rownames = FALSE)