Single-cell RNA-seq (scRNA-seq) is widely used to investigate the composition of complex tissues. However, it is often challenging to directly compare the cells identified in two different experiments. scmap allows you to project cells from an scRNA-seq experiment (the Projection) on to the cell-types identified in a different experiment (the Reference).
scmap
manuscript is available on
bioRxiv.
scmap source code
is available on
GitHub.
scmap R package is also available on
Bioconductor.
More information about the existing Reference
can be found on our
dataset website.
Please send your feedback/comments/suggestions to
Vladimir Kiselev.
scmap is based on SingleCellExperiment format. Please make yourself familiar with it before running scmap.
rowData slots of both the Reference and Projection dataset must have the feature_symbol column which contains Feature (gene/transcript) names from the same organism.
scmap
tutorial
2017-11-28
Contents
1 Datasets
In this tutorial we will run scmap
on the four human pancreas datasets, xin
, segerstolpe
, muraro
and baron
, which are used as positive controls. In the segerstolpe
dataset we will remove cells labeled as not applicable since it is unclear how to interpret this label and how it should be matched to the other datasets. In the xin
dataset cells labeled as alpha.contaminated, beta.contaminated, gamma.contaminated and delta.contaminated were also removed since they likely correspond to cells of lower quality. All datasets in Bioconductor SingleCellExperiment
class format can be downloaded from our website. The datasets can also be found in the ~/data
folder. Let’s load the data:
library(SingleCellExperiment)
# xin
xin <- readRDS("~/data/xin.rds")
xin <- xin[,colData(xin)$cell_type1 != "alpha.contaminated"]
xin <- xin[,colData(xin)$cell_type1 != "beta.contaminated"]
xin <- xin[,colData(xin)$cell_type1 != "delta.contaminated"]
xin <- xin[,colData(xin)$cell_type1 != "gamma.contaminated"]
# segerstolpe
segerstolpe <- readRDS("~/data/segerstolpe.rds")
segerstolpe <- segerstolpe[,colData(segerstolpe)$cell_type1 != "not applicable"]
# muraro
muraro <- readRDS("~/data/muraro.rds")
# baron
baron <- readRDS("~/data/baron-human.rds")
Overview of the datasets:
xin
## class: SingleCellExperiment
## dim: 39851 1492
## metadata(0):
## assays(2): normcounts logcounts
## rownames(39851): A1BG A2M ... LOC102724004 LOC102724238
## rowData names(1): feature_symbol
## colnames(1492): Sample_1 Sample_2 ... Sample_1598 Sample_1600
## colData names(6): donor.id condition ... gender cell_type1
## reducedDimNames(0):
## spikeNames(1): ERCC
segerstolpe
## class: SingleCellExperiment
## dim: 25525 2209
## metadata(0):
## assays(2): counts logcounts
## rownames(25525): SGIP1 AZIN2 ...
## ERCC_0.05722046:mix1_0.11444092:mix2
## ERCC_0.01430512:mix1_0.02861023:mix2
## rowData names(10): feature_symbol is_feature_control ...
## total_counts log10_total_counts
## colnames(2209): AZ_A10 AZ_A11 ... HP1526901T2D_P7 HP1526901T2D_P9
## colData names(33): cell_quality cell_type1 ... pct_counts_ERCC
## is_cell_control
## reducedDimNames(0):
## spikeNames(1): ERCC
muraro
## class: SingleCellExperiment
## dim: 19127 2126
## metadata(0):
## assays(2): normcounts logcounts
## rownames(19127): A1BG-AS1__chr19 A1BG__chr19 ... ZZEF1__chr17
## ZZZ3__chr1
## rowData names(1): feature_symbol
## colnames(2126): D28.1_1 D28.1_13 ... D31.8_93 D31.8_94
## colData names(3): cell_type1 donor batch
## reducedDimNames(0):
## spikeNames(1): ERCC
By default we put the cell labels provided in the original publication into the cell_type1
column of each dataset:
as.character(unique(xin$cell_type1))
## [1] "beta" "alpha" "delta" "gamma"
as.character(unique(segerstolpe$cell_type1))
## [1] "delta" "alpha"
## [3] "gamma" "ductal"
## [5] "acinar" "beta"
## [7] "unclassified endocrine" "co-expression"
## [9] "MHC class II" "PSC"
## [11] "endothelial" "epsilon"
## [13] "mast" "unclassified"
as.character(unique(muraro$cell_type1))
## [1] "alpha" "ductal" "endothelial" "delta" "acinar"
## [6] "beta" "unclear" "gamma" "mesenchymal" "epsilon"
In the following chapters we will be projecting baron
dataset to the others using both scmap-cluster
and scmap-cell
methods (Fig. 1a).
2 Feature selection
Now we will load scmap
and for all of the reference datasets select the most informative features (genes) using the dropout feature selection method (Fig. S1a):
library(scmap)
xin <- selectFeatures(xin, suppress_plot = FALSE)
## Warning in linearModel(object, n_features): Your object does not contain
## counts() slot. Dropouts were calculated using logcounts() slot...
segerstolpe <- selectFeatures(segerstolpe, suppress_plot = FALSE)
muraro <- selectFeatures(muraro, suppress_plot = FALSE)
## Warning in linearModel(object, n_features): Your object does not contain
## counts() slot. Dropouts were calculated using logcounts() slot...
Features are stored in the scmap_features
column of the rowData
slot of each dataset. By default scmap
selects 500 features (it can also be controlled by setting n_features
parameter):
table(rowData(xin)$scmap_features)
##
## FALSE TRUE
## 39351 500
table(rowData(segerstolpe)$scmap_features)
##
## FALSE TRUE
## 25025 500
table(rowData(muraro)$scmap_features)
##
## FALSE TRUE
## 18627 500
3 scmap-cluster
3.1 Index
The scmap-cluster
index of a reference dataset is created by finding the median gene expression for each cluster. By default scmap
uses the cell_type1
column of the colData
slot in the reference to identify clusters. Other columns can be manually selected by adjusting cluster_col
parameter:
xin <- indexCluster(xin)
segerstolpe <- indexCluster(segerstolpe)
muraro <- indexCluster(muraro)
The function indexCluster
automatically writes the scmap_cluster_index
item of the metadata
slot of the reference dataset. The index can be visualized as a heatmap:
library(pheatmap)
pheatmap(metadata(xin)$scmap_cluster_index, show_rownames = FALSE)
pheatmap(metadata(segerstolpe)$scmap_cluster_index, show_rownames = FALSE)
pheatmap(metadata(muraro)$scmap_cluster_index, show_rownames = FALSE)
Visual inspection suggests that the alpha, beta, gamma and delta cells are relatively similar compared to the other cell-types in the human pancreas.
3.2 Projection
Once the scmap-cluster
indexes have been generated we can use them to project the baron
dataset. This can be done with one index at a time, but scmap
allows for simultaneous projection to multiple indexes if they are provided as a list:
scmapCluster_results <- scmapCluster(
projection = baron,
index_list = list(
muraro = metadata(muraro)$scmap_cluster_index,
xin = metadata(xin)$scmap_cluster_index,
segerstolpe = metadata(segerstolpe)$scmap_cluster_index
)
)
3.3 Results
scmap-cluster
projects the query dataset to all projections defined in the index_list
. The results of cell label assignements are merged into one matrix:
head(scmapCluster_results$scmap_cluster_labs)
## muraro segerstolpe xin
## [1,] "acinar" "unassigned" "unassigned"
## [2,] "acinar" "acinar" "unassigned"
## [3,] "unassigned" "unassigned" "unassigned"
## [4,] "acinar" "unassigned" "unassigned"
## [5,] "acinar" "unassigned" "unassigned"
## [6,] "acinar" "unassigned" "unassigned"
Corresponding similarities are stored in the scmap_cluster_siml
item:
head(scmapCluster_results$scmap_cluster_siml)
## muraro segerstolpe xin
## [1,] 0.7478097 0.6624229 0.6200678
## [2,] 0.7972836 0.7113680 0.6280858
## [3,] 0.6791786 0.5719351 NA
## [4,] 0.7426530 0.6470397 0.5562021
## [5,] 0.7423038 0.6621919 0.5407762
## [6,] 0.7112467 0.6275218 0.5432593
scmap
also provides combined results of all reference dataset (choose labels corresponding to the largest similarity across reference datasets), as shown in Fig. 2d:
head(scmapCluster_results$combined_labs)
## [1] "acinar" "acinar" "unassigned" "acinar" "acinar"
## [6] "acinar"
3.4 Visualisation
The results of scmap-cluster
can be visualized as a Sankey diagram to show how cell-types are matched (getSankey()
function). Note that the Sankey diagram will only be informative if both the query and the reference datasets have been clustered, but it is not necessary to have meaningful labels assigned to the query (cluster1
, cluster2
etc. is sufficient). scmap
allows to visualise both reference-specific and combined results (Fig. 2cd):
plot(
getSankey(
colData(baron)$cell_type1,
scmapCluster_results$scmap_cluster_labs[,"xin"],
plot_height = 400
)
)
plot(
getSankey(
colData(baron)$cell_type1,
scmapCluster_results$combined_labs,
plot_height = 400
)
)
Clearly the combination strategy (projection to large number of references) improves scmap
results.
3.5 Quantification
If the cluster names match, as is the case for the pancreas datasets, then we can calculate the quality of the projection using Cohen’s kappa. We can use the kappa2
function from the irr
package to do this:
library(irr)
## Loading required package: lpSolve
kappa2(
data.frame(
colData(baron)$cell_type1,
scmapCluster_results$scmap_cluster_labs[,"xin"]
)[scmapCluster_results$scmap_cluster_labs[,"xin"] != "unassigned", ]
)$value
## [1] 0.8640227
Since kappa does not take the unassigned cells into consideration, this fraction also needs to be calculated to evaluate the projection:
length(which(scmapCluster_results$scmap_cluster_labs[,"xin"] != "unassigned"))/
ncol(baron)*100
## [1] 1.120317
We repeat these calculations for muraro
and segerstolpe
:
kappa2(
data.frame(
colData(baron)$cell_type1,
scmapCluster_results$scmap_cluster_labs[,"segerstolpe"]
)[scmapCluster_results$scmap_cluster_labs[,"segerstolpe"] != "unassigned", ]
)$value
## [1] 0.7961309
length(which(scmapCluster_results$scmap_cluster_labs[,"segerstolpe"] != "unassigned"))/
ncol(baron)*100
## [1] 12.32349
kappa2(
data.frame(
colData(baron)$cell_type1,
scmapCluster_results$scmap_cluster_labs[,"muraro"]
)[scmapCluster_results$scmap_cluster_labs[,"muraro"] != "unassigned", ]
)$value
## [1] 0.8834265
length(which(scmapCluster_results$scmap_cluster_labs[,"muraro"] != "unassigned"))/
ncol(baron)*100
## [1] 36.98214
We note that all three projections are good with kappa values of \(.8\) or greater. However, the fraction of unassigned cells is high, in particular for xin
. However, as shown in a Sankey diagram above, using the combination strategy the results can be improved:
kappa2(
data.frame(
colData(baron)$cell_type1,
scmapCluster_results$combined_labs
)[scmapCluster_results$combined_labs != "unassigned", ]
)$value
## [1] 0.8775028
length(which(scmapCluster_results$combined_labs != "unassigned"))/
ncol(baron)*100
## [1] 37.36725
3.6 Properties
Next, we ask what properties of the datasets determine the quality of the projection. We start by comparing the number of expressed features (per cell):
n_features_xin <- colSums(logcounts(xin) != 0)
n_features_muraro <- colSums(logcounts(muraro) != 0)
n_features_segerstolpe <- colSums(logcounts(segerstolpe) != 0)
n_features_baron <- colSums(logcounts(baron) != 0)
boxplot(
list(
xin = n_features_xin,
segerstolpe = n_features_segerstolpe,
muraro = n_features_muraro,
baron = n_features_baron
)
)
Clearly, baron
has far fewer genes detected than the other three datasets. In particular, we note that the difference in number of features matches the kappa value - the smallest difference is for muraro
, which also has the smallest number of unassigned cells and the highest kappa. We hypothesize that the discrepancy in the number of features between the projection dataset (baron
) and reference datasets (xin
, segerstolpe
and muraro
) is making it challenging for scmap-cluster
to find the correct projections. Indeed, it’s also in agreement with experimental protocols used to obtain these datasets: droplet-based (inDrop
) protocol for baron
dataset versus full-length transcript methods used for xin
, segerstolpe
and muraro
(SMARTer
, Smart-Seq2
and CEL-Seq2
correspondingly). Interestingly, the reversed projections of full-length transcript datasets to baron
were very good and did not show any technology-based biases (Table S2). An important implication of this result is that it will most likely work well to build a reference with shallowly sequenced cells and then later carry out in-depth studies with deeper sequencing.
4 scmap-cell
4.1 Stochasticity
scmap-cell
contains k-means step which makes it stochastic, i.e. running it multiple times will provide slightly different results. Therefore, we will fix a random seed, so that a user will be able to exactly reproduce our results:
set.seed(1)
4.2 Index
In the scmap-cell
index is created by a product quantiser algorithm in a way that every cell in the reference is identified with a set of sub-centroids found via k-means clustering based on a subset of the features.
xin <- indexCell(xin)
segerstolpe <- indexCell(segerstolpe)
muraro <- indexCell(muraro)
Unlike scmap-cluster
index scmap-cell
index contains information about each cell and therefore can not be easily visualised. scmap-cell
index consists of two items:
names(metadata(muraro)$scmap_cell_index)
## [1] "subcentroids" "subclusters"
4.2.1 Sub-centroids
subcentroids
contains coordinates of subcentroids of low dimensional subspaces defined by selected features, k
and M
parameters of the product quantiser algorithm (see ?indexCell
).
For the muraro
dataset:
muraro
dataset contains \(N = 2126\) cells- We selected \(f = 500\) features (
scmap
default) M
was calculated as \(f / 10 = 50\) (scmap
default for \(f \le 1000\)).M
is the number of low dimensional subspaces- Number of features in any low dimensional subspace equals to \(f / M = 10\)
k
was calculated as \(k = \sqrt{N} \approx 46\) (scmap
default).
length(metadata(muraro)$scmap_cell_index$subcentroids)
## [1] 50
dim(metadata(muraro)$scmap_cell_index$subcentroids[[1]])
## [1] 10 46
metadata(muraro)$scmap_cell_index$subcentroids[[1]][,1:5]
## 1 2 3 4 5
## A1CF 0.178557833 0.043573607 0.020780502 0.38818725 0.46565223
## ABCC3 0.007349785 0.402101099 0.019804404 0.00697640 0.01173490
## ABCC8 0.810363545 0.048265403 0.030323987 0.54042966 0.43811744
## ABLIM1 0.057267860 0.263258431 0.134775189 0.52402905 0.54438961
## ACTN1 0.300223851 0.469366223 0.551164399 0.36975894 0.40989502
## ADAM9 0.033574091 0.547897504 0.294440230 0.06375817 0.33677837
## ADCYAP1 0.028559960 0.008874296 0.009282826 0.01638350 0.01351845
## AGT 0.032614509 0.080030710 0.043300057 0.06337012 0.08933909
## AHNAK 0.064549246 0.486013238 0.766480182 0.02510823 0.03763972
## AKAP12 0.459115274 0.039907830 0.012723226 0.36998639 0.06089972
4.2.2 Sub-clusters
For each of the M
partitions of the feature space, subclusters
contains information about which cluster the cells belong to:
dim(metadata(muraro)$scmap_cell_index$subclusters)
## [1] 50 2126
metadata(muraro)$scmap_cell_index$subclusters[1:5,1:5]
## D28.1_1 D28.1_13 D28.1_15 D28.1_17 D28.1_2
## [1,] 4 20 5 42 34
## [2,] 5 8 46 17 22
## [3,] 46 13 25 38 33
## [4,] 10 31 45 10 21
## [5,] 22 4 7 10 1
4.3 Projection
Once the scmap-cell
indexes have been generated we can use them to project the baron
dataset. This can be done with one index at a time, but as before scmap
allows for simultaneous projection to multiple indexes if they are provided as a list:
scmapCell_results <- scmapCell(
baron,
list(
xin = metadata(xin)$scmap_cell_index,
segerstolpe = metadata(segerstolpe)$scmap_cell_index,
muraro = metadata(muraro)$scmap_cell_index
)
)
4.4 Results
scmapCell_results
contains results of projection for each reference dataset in a list:
names(scmapCell_results)
## [1] "xin" "segerstolpe" "muraro"
For each dataset there are two matricies. The cells
matrix contains the top 10 (scmap
default) cell IDs of the cells of the reference dataset that a given cell of the projection dataset is closest to:
scmapCell_results$xin$cells[,1:3]
## human1_lib1.final_cell_0001 human1_lib1.final_cell_0002
## [1,] 41 401
## [2,] 42 675
## [3,] 671 321
## [4,] 45 45
## [5,] 48 1004
## [6,] 791 1196
## [7,] 757 973
## [8,] 995 893
## [9,] 9 1046
## [10,] 893 995
## human1_lib1.final_cell_0003
## [1,] 352
## [2,] 45
## [3,] 1046
## [4,] 1260
## [5,] 1004
## [6,] 321
## [7,] 1117
## [8,] 616
## [9,] 9
## [10,] 609
The similarities
matrix contains corresponding cosine similarities:
scmapCell_results$xin$similarities[,1:3]
## human1_lib1.final_cell_0001 human1_lib1.final_cell_0002
## [1,] 0.5831939 0.5883823
## [2,] 0.5816648 0.5894626
## [3,] 0.5754689 0.5925527
## [4,] 0.5895254 0.5907931
## [5,] 0.5832395 0.5883772
## [6,] 0.5815033 0.5887095
## [7,] 0.5777185 0.5919815
## [8,] 0.5775480 0.5941742
## [9,] 0.5761091 0.5950193
## [10,] 0.5763782 0.6007502
## human1_lib1.final_cell_0003
## [1,] 0.4550436
## [2,] 0.4635613
## [3,] 0.4580886
## [4,] 0.4615652
## [5,] 0.4582631
## [6,] 0.4573644
## [7,] 0.4625933
## [8,] 0.4612085
## [9,] 0.4552231
## [10,] 0.4599124
4.5 Cluster annotation
If cell cluster annotation is available for the reference datasets, in addition to finding top 10 nearest neighbours scmap-cell
can also annotate the cells from the projection dataset using the labels of the reference. It does so by looking at the top 3 nearest neighbours (scmap
default) and if they all belong to the same cluster in the reference and their maximum similarity is higher than a threshold (\(0.5\) is the scmap
default), then a projection cell is assigned to the corresponding reference cluster:
scmapCell_clusters <- scmapCell2Cluster(
baron,
scmapCell_results,
list(
colData(xin)$cell_type1,
colData(segerstolpe)$cell_type1,
colData(muraro)$cell_type1
)
)
scmap-cell
results are in the same format as the ones provided by scmap-cluster
(see above):
head(scmapCell_clusters$scmap_cluster_labs)
## muraro segerstolpe xin
## [1,] "unassigned" "acinar" "beta"
## [2,] "acinar" "acinar" "unassigned"
## [3,] "unassigned" "acinar" "unassigned"
## [4,] "acinar" "acinar" "alpha"
## [5,] "acinar" "acinar" "alpha"
## [6,] "acinar" "acinar" "unassigned"
The corresponding similarities are stored in the scmap_cluster_siml
item:
head(scmapCell_clusters$scmap_cluster_siml)
## muraro segerstolpe xin
## [1,] NA 0.6320202 0.5895254
## [2,] 0.7097839 0.6710663 NA
## [3,] NA 0.5507419 NA
## [4,] 0.6710335 0.6104701 0.5114259
## [5,] 0.6696379 0.6362097 0.5103914
## [6,] 0.6512017 0.5999376 NA
head(scmapCell_clusters$combined_labs)
## [1] "acinar" "acinar" "acinar" "acinar" "acinar" "acinar"
4.6 Visualisation
If the query cells are annotated with cluster names, then we can visualize the projection using Sankey diagrams as before:
plot(
getSankey(
colData(baron)$cell_type1,
scmapCell_clusters$scmap_cluster_labs[,"xin"],
plot_height = 400
)
)
plot(
getSankey(
colData(baron)$cell_type1,
scmapCell_clusters$combined_labs,
plot_height = 400
)
)
4.7 Quantification
Similarly, we can quantify the accuracy of the projection by calculating Cohen’s kappa and fraction of unassigned cells:
kappa2(
data.frame(
colData(baron)$cell_type1,
scmapCell_clusters$scmap_cluster_labs[,"xin"]
)[scmapCell_clusters$scmap_cluster_labs[,"xin"] != "unassigned", ]
)$value
## [1] 0.6985075
length(which(scmapCell_clusters$scmap_cluster_labs[,"xin"] != "unassigned"))/
ncol(baron)*100
## [1] 63.78807
kappa2(
data.frame(
colData(baron)$cell_type1,
scmapCell_clusters$scmap_cluster_labs[,"segerstolpe"]
)[scmapCell_clusters$scmap_cluster_labs[,"segerstolpe"] != "unassigned", ]
)$value
## [1] 0.8886673
length(which(scmapCell_clusters$scmap_cluster_labs[,"segerstolpe"] != "unassigned"))/
ncol(baron)*100
## [1] 79.37916
kappa2(
data.frame(
colData(baron)$cell_type1,
scmapCell_clusters$scmap_cluster_labs[,"muraro"]
)[scmapCell_clusters$scmap_cluster_labs[,"muraro"] != "unassigned", ]
)$value
## [1] 0.8983198
length(which(scmapCell_clusters$scmap_cluster_labs[,"muraro"] != "unassigned"))/
ncol(baron)*100
## [1] 85.82098
kappa2(
data.frame(
colData(baron)$cell_type1,
scmapCell_clusters$combined_labs
)[scmapCell_clusters$combined_labs != "unassigned", ]
)$value
## [1] 0.8838892
length(which(scmapCell_clusters$combined_labs != "unassigned"))/
ncol(baron)*100
## [1] 94.50344
4.8 Properties
Comparing the results with the ones for scmap-cluster
, we conclude that the kappa values are similar to before, while the fraction of unassigned cells is much lower. Thus, for this example, we conclude that scmap-cell
performs better.
5 sessionInfo()
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] irr_0.84 lpSolve_5.6.13
## [3] pheatmap_1.0.8 bindrcpp_0.2
## [5] scmap_1.1.2 SingleCellExperiment_1.0.0
## [7] SummarizedExperiment_1.8.0 DelayedArray_0.4.1
## [9] matrixStats_0.52.2 Biobase_2.38.0
## [11] GenomicRanges_1.30.0 GenomeInfoDb_1.14.0
## [13] IRanges_2.12.0 S4Vectors_0.16.0
## [15] BiocGenerics_0.24.0 googleVis_0.6.2
## [17] BiocStyle_2.5.43
##
## loaded via a namespace (and not attached):
## [1] reshape2_1.4.2 lattice_0.20-35 colorspace_1.3-2
## [4] htmltools_0.3.6 yaml_2.1.14 rlang_0.1.2
## [7] e1071_1.6-8 glue_1.1.1 RColorBrewer_1.1-2
## [10] GenomeInfoDbData_0.99.1 bindr_0.1 plyr_1.8.4
## [13] stringr_1.2.0 zlibbioc_1.24.0 munsell_0.4.3
## [16] gtable_0.2.0 codetools_0.2-15 evaluate_0.10.1
## [19] labeling_0.3 knitr_1.17 class_7.3-14
## [22] Rcpp_0.12.13 backports_1.1.1 scales_0.5.0
## [25] jsonlite_1.5 XVector_0.18.0 ggplot2_2.2.1
## [28] digest_0.6.12 stringi_1.1.5 bookdown_0.5
## [31] dplyr_0.7.4 grid_3.4.2 rprojroot_1.2
## [34] tools_3.4.2 bitops_1.0-6 magrittr_1.5
## [37] RCurl_1.95-4.8 lazyeval_0.2.0 proxy_0.4-19
## [40] tibble_1.3.4 randomForest_4.6-12 pkgconfig_2.0.1
## [43] Matrix_1.2-11 assertthat_0.2.0 rmarkdown_1.8.3
## [46] R6_2.2.2 compiler_3.4.2
Reference
What would like to do?
Projection
Select an .rds file containing data in SingleCellExperiment format
(example-segerstolpe - a human pancreatic dataset with 3514 cells)
Your data is not in SingleCellExperiment format! Please upload a data in the correct format.
rowData slot of your SingleCellExperiment object does not contain a feature_symbol column. Please add it to your object and re-upload it.
Would you like to run scmap-cell? (may take some time)
Feature Selection
scmap selects 500 most informative features (genes/transcripts) of the Reference dataset by fitting a linear model to the log(expression) vs log(dropout) distribution of features. The most informative features are shown in red on the plot below. Conceptually, these features have a higher dropout rate for a given mean expression. More information about the selected features is provided in the table below the plot. scmap scores are defined as the residuals of the linear model.
Selected Features
Cell Projection
scmap projects all cells of the Projection dataset to the either cell clusters (scmap-cluster) or each individual cell (scmap-cell) of the Reference dataset.
scmap did not work! Most probably Reference and Projection come from different organisms, please check your inputs!