Single-cell RNA-seq (scRNA-seq) is widely used to investigate the composition of complex tissues. However, it is often challenging to directly compare the cells identified in two different experiments. scmap allows you to project cells from an scRNA-seq experiment (the Projection) on to the cell-types identified in a different experiment (the Reference).

scmap manuscript is available on bioRxiv.
scmap source code is available on GitHub.
scmap R package is also available on Bioconductor.
More information about the existing Reference can be found on our dataset website.
Please send your feedback/comments/suggestions to Vladimir Kiselev.

scmap is based on SingleCellExperiment format. Please make yourself familiar with it before running scmap.

rowData slots of both the Reference and Projection dataset must have the feature_symbol column which contains Feature (gene/transcript) names from the same organism.

1 Datasets

In this tutorial we will run scmap on the four human pancreas datasets, xin, segerstolpe, muraro and baron, which are used as positive controls. In the segerstolpe dataset we will remove cells labeled as not applicable since it is unclear how to interpret this label and how it should be matched to the other datasets. In the xin dataset cells labeled as alpha.contaminated, beta.contaminated, gamma.contaminated and delta.contaminated were also removed since they likely correspond to cells of lower quality. All datasets in Bioconductor SingleCellExperiment class format can be downloaded from our website. The datasets can also be found in the ~/data folder. Let’s load the data:

library(SingleCellExperiment)

# xin
xin <- readRDS("~/data/xin.rds")
xin <- xin[,colData(xin)$cell_type1 != "alpha.contaminated"]
xin <- xin[,colData(xin)$cell_type1 != "beta.contaminated"]
xin <- xin[,colData(xin)$cell_type1 != "delta.contaminated"]
xin <- xin[,colData(xin)$cell_type1 != "gamma.contaminated"]

# segerstolpe
segerstolpe <- readRDS("~/data/segerstolpe.rds")
segerstolpe <- segerstolpe[,colData(segerstolpe)$cell_type1 != "not applicable"]

# muraro
muraro <- readRDS("~/data/muraro.rds")

# baron
baron <- readRDS("~/data/baron-human.rds")

Overview of the datasets:

xin

## class: SingleCellExperiment 
## dim: 39851 1492 
## metadata(0):
## assays(2): normcounts logcounts
## rownames(39851): A1BG A2M ... LOC102724004 LOC102724238
## rowData names(1): feature_symbol
## colnames(1492): Sample_1 Sample_2 ... Sample_1598 Sample_1600
## colData names(6): donor.id condition ... gender cell_type1
## reducedDimNames(0):
## spikeNames(1): ERCC

segerstolpe

## class: SingleCellExperiment 
## dim: 25525 2209 
## metadata(0):
## assays(2): counts logcounts
## rownames(25525): SGIP1 AZIN2 ...
##   ERCC_0.05722046:mix1_0.11444092:mix2
##   ERCC_0.01430512:mix1_0.02861023:mix2
## rowData names(10): feature_symbol is_feature_control ...
##   total_counts log10_total_counts
## colnames(2209): AZ_A10 AZ_A11 ... HP1526901T2D_P7 HP1526901T2D_P9
## colData names(33): cell_quality cell_type1 ... pct_counts_ERCC
##   is_cell_control
## reducedDimNames(0):
## spikeNames(1): ERCC

muraro

## class: SingleCellExperiment 
## dim: 19127 2126 
## metadata(0):
## assays(2): normcounts logcounts
## rownames(19127): A1BG-AS1__chr19 A1BG__chr19 ... ZZEF1__chr17
##   ZZZ3__chr1
## rowData names(1): feature_symbol
## colnames(2126): D28.1_1 D28.1_13 ... D31.8_93 D31.8_94
## colData names(3): cell_type1 donor batch
## reducedDimNames(0):
## spikeNames(1): ERCC

By default we put the cell labels provided in the original publication into the cell_type1 column of each dataset:

as.character(unique(xin$cell_type1))

## [1] "beta"  "alpha" "delta" "gamma"

as.character(unique(segerstolpe$cell_type1))

##  [1] "delta"                  "alpha"                 
##  [3] "gamma"                  "ductal"                
##  [5] "acinar"                 "beta"                  
##  [7] "unclassified endocrine" "co-expression"         
##  [9] "MHC class II"           "PSC"                   
## [11] "endothelial"            "epsilon"               
## [13] "mast"                   "unclassified"

as.character(unique(muraro$cell_type1))

##  [1] "alpha"       "ductal"      "endothelial" "delta"       "acinar"     
##  [6] "beta"        "unclear"     "gamma"       "mesenchymal" "epsilon"

In the following chapters we will be projecting baron dataset to the others using both scmap-cluster and scmap-cell methods (Fig. 1a).

2 Feature selection

Now we will load scmap and for all of the reference datasets select the most informative features (genes) using the dropout feature selection method (Fig. S1a):

library(scmap)
xin <- selectFeatures(xin, suppress_plot = FALSE)

## Warning in linearModel(object, n_features): Your object does not contain
## counts() slot. Dropouts were calculated using logcounts() slot...

segerstolpe <- selectFeatures(segerstolpe, suppress_plot = FALSE)

muraro <- selectFeatures(muraro, suppress_plot = FALSE)

## Warning in linearModel(object, n_features): Your object does not contain
## counts() slot. Dropouts were calculated using logcounts() slot...

Features are stored in the scmap_features column of the rowData slot of each dataset. By default scmap selects 500 features (it can also be controlled by setting n_features parameter):

table(rowData(xin)$scmap_features)

## 
## FALSE  TRUE 
## 39351   500

table(rowData(segerstolpe)$scmap_features)

## 
## FALSE  TRUE 
## 25025   500

table(rowData(muraro)$scmap_features)

## 
## FALSE  TRUE 
## 18627   500

3 scmap-cluster

3.1 Index

The scmap-cluster index of a reference dataset is created by finding the median gene expression for each cluster. By default scmap uses the cell_type1 column of the colData slot in the reference to identify clusters. Other columns can be manually selected by adjusting cluster_col parameter:

xin <- indexCluster(xin)
segerstolpe <- indexCluster(segerstolpe)
muraro <- indexCluster(muraro)

The function indexCluster automatically writes the scmap_cluster_index item of the metadata slot of the reference dataset. The index can be visualized as a heatmap:

library(pheatmap)
pheatmap(metadata(xin)$scmap_cluster_index, show_rownames = FALSE)

pheatmap(metadata(segerstolpe)$scmap_cluster_index, show_rownames = FALSE)

pheatmap(metadata(muraro)$scmap_cluster_index, show_rownames = FALSE)

Visual inspection suggests that the alpha, beta, gamma and delta cells are relatively similar compared to the other cell-types in the human pancreas.

3.2 Projection

Once the scmap-cluster indexes have been generated we can use them to project the baron dataset. This can be done with one index at a time, but scmap allows for simultaneous projection to multiple indexes if they are provided as a list:

scmapCluster_results <- scmapCluster(
  projection = baron, 
  index_list = list(
    muraro = metadata(muraro)$scmap_cluster_index,
    xin = metadata(xin)$scmap_cluster_index,
    segerstolpe = metadata(segerstolpe)$scmap_cluster_index
  )
)

3.3 Results

scmap-cluster projects the query dataset to all projections defined in the index_list. The results of cell label assignements are merged into one matrix:

head(scmapCluster_results$scmap_cluster_labs)

##      muraro       segerstolpe  xin         
## [1,] "acinar"     "unassigned" "unassigned"
## [2,] "acinar"     "acinar"     "unassigned"
## [3,] "unassigned" "unassigned" "unassigned"
## [4,] "acinar"     "unassigned" "unassigned"
## [5,] "acinar"     "unassigned" "unassigned"
## [6,] "acinar"     "unassigned" "unassigned"

Corresponding similarities are stored in the scmap_cluster_siml item:

head(scmapCluster_results$scmap_cluster_siml)

##         muraro segerstolpe       xin
## [1,] 0.7478097   0.6624229 0.6200678
## [2,] 0.7972836   0.7113680 0.6280858
## [3,] 0.6791786   0.5719351        NA
## [4,] 0.7426530   0.6470397 0.5562021
## [5,] 0.7423038   0.6621919 0.5407762
## [6,] 0.7112467   0.6275218 0.5432593

scmap also provides combined results of all reference dataset (choose labels corresponding to the largest similarity across reference datasets), as shown in Fig. 2d:

head(scmapCluster_results$combined_labs)

## [1] "acinar"     "acinar"     "unassigned" "acinar"     "acinar"    
## [6] "acinar"

3.4 Visualisation

The results of scmap-cluster can be visualized as a Sankey diagram to show how cell-types are matched (getSankey() function). Note that the Sankey diagram will only be informative if both the query and the reference datasets have been clustered, but it is not necessary to have meaningful labels assigned to the query (cluster1, cluster2 etc. is sufficient). scmap allows to visualise both reference-specific and combined results (Fig. 2cd):

plot(
  getSankey(
    colData(baron)$cell_type1, 
    scmapCluster_results$scmap_cluster_labs[,"xin"],
    plot_height = 400
  )
)

plot(
  getSankey(
    colData(baron)$cell_type1, 
    scmapCluster_results$combined_labs,
    plot_height = 400
  )
)

Clearly the combination strategy (projection to large number of references) improves scmap results.

3.5 Quantification

If the cluster names match, as is the case for the pancreas datasets, then we can calculate the quality of the projection using Cohen’s kappa. We can use the kappa2 function from the irr package to do this:

library(irr)

## Loading required package: lpSolve

kappa2(
  data.frame(
    colData(baron)$cell_type1,
    scmapCluster_results$scmap_cluster_labs[,"xin"]
  )[scmapCluster_results$scmap_cluster_labs[,"xin"] != "unassigned", ]
)$value

## [1] 0.8640227

Since kappa does not take the unassigned cells into consideration, this fraction also needs to be calculated to evaluate the projection:

length(which(scmapCluster_results$scmap_cluster_labs[,"xin"] != "unassigned"))/
  ncol(baron)*100

## [1] 1.120317

We repeat these calculations for muraro and segerstolpe:

kappa2(
  data.frame(
    colData(baron)$cell_type1,
    scmapCluster_results$scmap_cluster_labs[,"segerstolpe"]
  )[scmapCluster_results$scmap_cluster_labs[,"segerstolpe"] != "unassigned", ]
)$value

## [1] 0.7961309

length(which(scmapCluster_results$scmap_cluster_labs[,"segerstolpe"] != "unassigned"))/
  ncol(baron)*100

## [1] 12.32349

kappa2(
  data.frame(
    colData(baron)$cell_type1,
    scmapCluster_results$scmap_cluster_labs[,"muraro"]
  )[scmapCluster_results$scmap_cluster_labs[,"muraro"] != "unassigned", ]
)$value

## [1] 0.8834265

length(which(scmapCluster_results$scmap_cluster_labs[,"muraro"] != "unassigned"))/
  ncol(baron)*100

## [1] 36.98214

We note that all three projections are good with kappa values of \(.8\) or greater. However, the fraction of unassigned cells is high, in particular for xin. However, as shown in a Sankey diagram above, using the combination strategy the results can be improved:

kappa2(
  data.frame(
    colData(baron)$cell_type1,
    scmapCluster_results$combined_labs
  )[scmapCluster_results$combined_labs != "unassigned", ]
)$value

## [1] 0.8775028

length(which(scmapCluster_results$combined_labs != "unassigned"))/
  ncol(baron)*100

## [1] 37.36725

3.6 Properties

Next, we ask what properties of the datasets determine the quality of the projection. We start by comparing the number of expressed features (per cell):

n_features_xin <- colSums(logcounts(xin) != 0)
n_features_muraro <- colSums(logcounts(muraro) != 0)
n_features_segerstolpe <- colSums(logcounts(segerstolpe) != 0)
n_features_baron <- colSums(logcounts(baron) != 0)
boxplot(
  list(
    xin = n_features_xin, 
    segerstolpe = n_features_segerstolpe, 
    muraro = n_features_muraro, 
    baron = n_features_baron
  )
)

Clearly, baron has far fewer genes detected than the other three datasets. In particular, we note that the difference in number of features matches the kappa value - the smallest difference is for muraro, which also has the smallest number of unassigned cells and the highest kappa. We hypothesize that the discrepancy in the number of features between the projection dataset (baron) and reference datasets (xin, segerstolpe and muraro) is making it challenging for scmap-cluster to find the correct projections. Indeed, it’s also in agreement with experimental protocols used to obtain these datasets: droplet-based (inDrop) protocol for baron dataset versus full-length transcript methods used for xin, segerstolpe and muraro (SMARTer, Smart-Seq2 and CEL-Seq2 correspondingly). Interestingly, the reversed projections of full-length transcript datasets to baron were very good and did not show any technology-based biases (Table S2). An important implication of this result is that it will most likely work well to build a reference with shallowly sequenced cells and then later carry out in-depth studies with deeper sequencing.

4 scmap-cell

4.1 Stochasticity

scmap-cell contains k-means step which makes it stochastic, i.e. running it multiple times will provide slightly different results. Therefore, we will fix a random seed, so that a user will be able to exactly reproduce our results:

set.seed(1)

4.2 Index

In the scmap-cell index is created by a product quantiser algorithm in a way that every cell in the reference is identified with a set of sub-centroids found via k-means clustering based on a subset of the features.

xin <- indexCell(xin)
segerstolpe <- indexCell(segerstolpe)
muraro <- indexCell(muraro)

Unlike scmap-cluster index scmap-cell index contains information about each cell and therefore can not be easily visualised. scmap-cell index consists of two items:

names(metadata(muraro)$scmap_cell_index)

## [1] "subcentroids" "subclusters"

4.2.1 Sub-centroids

subcentroids contains coordinates of subcentroids of low dimensional subspaces defined by selected features, k and M parameters of the product quantiser algorithm (see ?indexCell).

For the muraro dataset:

muraro dataset contains \(N = 2126\) cells
We selected \(f = 500\) features (scmap default)
M was calculated as \(f / 10 = 50\) (scmap default for \(f \le 1000\)). M is the number of low dimensional subspaces
Number of features in any low dimensional subspace equals to \(f / M = 10\)
k was calculated as \(k = \sqrt{N} \approx 46\) (scmap default).

length(metadata(muraro)$scmap_cell_index$subcentroids)

## [1] 50

dim(metadata(muraro)$scmap_cell_index$subcentroids[[1]])

## [1] 10 46

metadata(muraro)$scmap_cell_index$subcentroids[[1]][,1:5]

##                   1           2           3          4          5
## A1CF    0.178557833 0.043573607 0.020780502 0.38818725 0.46565223
## ABCC3   0.007349785 0.402101099 0.019804404 0.00697640 0.01173490
## ABCC8   0.810363545 0.048265403 0.030323987 0.54042966 0.43811744
## ABLIM1  0.057267860 0.263258431 0.134775189 0.52402905 0.54438961
## ACTN1   0.300223851 0.469366223 0.551164399 0.36975894 0.40989502
## ADAM9   0.033574091 0.547897504 0.294440230 0.06375817 0.33677837
## ADCYAP1 0.028559960 0.008874296 0.009282826 0.01638350 0.01351845
## AGT     0.032614509 0.080030710 0.043300057 0.06337012 0.08933909
## AHNAK   0.064549246 0.486013238 0.766480182 0.02510823 0.03763972
## AKAP12  0.459115274 0.039907830 0.012723226 0.36998639 0.06089972

4.2.2 Sub-clusters

For each of the M partitions of the feature space, subclusters contains information about which cluster the cells belong to:

dim(metadata(muraro)$scmap_cell_index$subclusters)

## [1]   50 2126

metadata(muraro)$scmap_cell_index$subclusters[1:5,1:5]

##      D28.1_1 D28.1_13 D28.1_15 D28.1_17 D28.1_2
## [1,]       4       20        5       42      34
## [2,]       5        8       46       17      22
## [3,]      46       13       25       38      33
## [4,]      10       31       45       10      21
## [5,]      22        4        7       10       1

4.3 Projection

Once the scmap-cell indexes have been generated we can use them to project the baron dataset. This can be done with one index at a time, but as before scmap allows for simultaneous projection to multiple indexes if they are provided as a list:

scmapCell_results <- scmapCell(
  baron, 
  list(
    xin = metadata(xin)$scmap_cell_index,
    segerstolpe = metadata(segerstolpe)$scmap_cell_index,
    muraro = metadata(muraro)$scmap_cell_index
  )
)

4.4 Results

scmapCell_results contains results of projection for each reference dataset in a list:

names(scmapCell_results)

## [1] "xin"         "segerstolpe" "muraro"

For each dataset there are two matricies. The cells matrix contains the top 10 (scmap default) cell IDs of the cells of the reference dataset that a given cell of the projection dataset is closest to:

scmapCell_results$xin$cells[,1:3]

##       human1_lib1.final_cell_0001 human1_lib1.final_cell_0002
##  [1,]                          41                         401
##  [2,]                          42                         675
##  [3,]                         671                         321
##  [4,]                          45                          45
##  [5,]                          48                        1004
##  [6,]                         791                        1196
##  [7,]                         757                         973
##  [8,]                         995                         893
##  [9,]                           9                        1046
## [10,]                         893                         995
##       human1_lib1.final_cell_0003
##  [1,]                         352
##  [2,]                          45
##  [3,]                        1046
##  [4,]                        1260
##  [5,]                        1004
##  [6,]                         321
##  [7,]                        1117
##  [8,]                         616
##  [9,]                           9
## [10,]                         609

The similarities matrix contains corresponding cosine similarities:

scmapCell_results$xin$similarities[,1:3]

##       human1_lib1.final_cell_0001 human1_lib1.final_cell_0002
##  [1,]                   0.5831939                   0.5883823
##  [2,]                   0.5816648                   0.5894626
##  [3,]                   0.5754689                   0.5925527
##  [4,]                   0.5895254                   0.5907931
##  [5,]                   0.5832395                   0.5883772
##  [6,]                   0.5815033                   0.5887095
##  [7,]                   0.5777185                   0.5919815
##  [8,]                   0.5775480                   0.5941742
##  [9,]                   0.5761091                   0.5950193
## [10,]                   0.5763782                   0.6007502
##       human1_lib1.final_cell_0003
##  [1,]                   0.4550436
##  [2,]                   0.4635613
##  [3,]                   0.4580886
##  [4,]                   0.4615652
##  [5,]                   0.4582631
##  [6,]                   0.4573644
##  [7,]                   0.4625933
##  [8,]                   0.4612085
##  [9,]                   0.4552231
## [10,]                   0.4599124

4.5 Cluster annotation

If cell cluster annotation is available for the reference datasets, in addition to finding top 10 nearest neighbours scmap-cell can also annotate the cells from the projection dataset using the labels of the reference. It does so by looking at the top 3 nearest neighbours (scmap default) and if they all belong to the same cluster in the reference and their maximum similarity is higher than a threshold (\(0.5\) is the scmap default), then a projection cell is assigned to the corresponding reference cluster:

scmapCell_clusters <- scmapCell2Cluster(
  baron, 
  scmapCell_results, 
  list(
    colData(xin)$cell_type1,
    colData(segerstolpe)$cell_type1,
    colData(muraro)$cell_type1
  )
)

scmap-cell results are in the same format as the ones provided by scmap-cluster (see above):

head(scmapCell_clusters$scmap_cluster_labs)

##      muraro       segerstolpe xin         
## [1,] "unassigned" "acinar"    "beta"      
## [2,] "acinar"     "acinar"    "unassigned"
## [3,] "unassigned" "acinar"    "unassigned"
## [4,] "acinar"     "acinar"    "alpha"     
## [5,] "acinar"     "acinar"    "alpha"     
## [6,] "acinar"     "acinar"    "unassigned"

The corresponding similarities are stored in the scmap_cluster_siml item:

head(scmapCell_clusters$scmap_cluster_siml)

##         muraro segerstolpe       xin
## [1,]        NA   0.6320202 0.5895254
## [2,] 0.7097839   0.6710663        NA
## [3,]        NA   0.5507419        NA
## [4,] 0.6710335   0.6104701 0.5114259
## [5,] 0.6696379   0.6362097 0.5103914
## [6,] 0.6512017   0.5999376        NA

head(scmapCell_clusters$combined_labs)

## [1] "acinar" "acinar" "acinar" "acinar" "acinar" "acinar"

4.6 Visualisation

If the query cells are annotated with cluster names, then we can visualize the projection using Sankey diagrams as before:

plot(
  getSankey(
    colData(baron)$cell_type1, 
    scmapCell_clusters$scmap_cluster_labs[,"xin"],
    plot_height = 400
  )
)

plot(
  getSankey(
    colData(baron)$cell_type1, 
    scmapCell_clusters$combined_labs,
    plot_height = 400
  )
)

4.7 Quantification

Similarly, we can quantify the accuracy of the projection by calculating Cohen’s kappa and fraction of unassigned cells:

kappa2(
  data.frame(
    colData(baron)$cell_type1,
    scmapCell_clusters$scmap_cluster_labs[,"xin"]
  )[scmapCell_clusters$scmap_cluster_labs[,"xin"] != "unassigned", ]
)$value

## [1] 0.6985075

length(which(scmapCell_clusters$scmap_cluster_labs[,"xin"] != "unassigned"))/
  ncol(baron)*100

## [1] 63.78807

kappa2(
  data.frame(
    colData(baron)$cell_type1,
    scmapCell_clusters$scmap_cluster_labs[,"segerstolpe"]
  )[scmapCell_clusters$scmap_cluster_labs[,"segerstolpe"] != "unassigned", ]
)$value

## [1] 0.8886673

length(which(scmapCell_clusters$scmap_cluster_labs[,"segerstolpe"] != "unassigned"))/
  ncol(baron)*100

## [1] 79.37916

kappa2(
  data.frame(
    colData(baron)$cell_type1,
    scmapCell_clusters$scmap_cluster_labs[,"muraro"]
  )[scmapCell_clusters$scmap_cluster_labs[,"muraro"] != "unassigned", ]
)$value

## [1] 0.8983198

length(which(scmapCell_clusters$scmap_cluster_labs[,"muraro"] != "unassigned"))/
  ncol(baron)*100

## [1] 85.82098

kappa2(
  data.frame(
    colData(baron)$cell_type1,
    scmapCell_clusters$combined_labs
  )[scmapCell_clusters$combined_labs != "unassigned", ]
)$value

## [1] 0.8838892

length(which(scmapCell_clusters$combined_labs != "unassigned"))/
  ncol(baron)*100

## [1] 94.50344

4.8 Properties

Comparing the results with the ones for scmap-cluster, we conclude that the kappa values are similar to before, while the fraction of unassigned cells is much lower. Thus, for this example, we conclude that scmap-cell performs better.

5 sessionInfo()

## R version 3.4.2 (2017-09-28)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] irr_0.84                   lpSolve_5.6.13            
##  [3] pheatmap_1.0.8             bindrcpp_0.2              
##  [5] scmap_1.1.2                SingleCellExperiment_1.0.0
##  [7] SummarizedExperiment_1.8.0 DelayedArray_0.4.1        
##  [9] matrixStats_0.52.2         Biobase_2.38.0            
## [11] GenomicRanges_1.30.0       GenomeInfoDb_1.14.0       
## [13] IRanges_2.12.0             S4Vectors_0.16.0          
## [15] BiocGenerics_0.24.0        googleVis_0.6.2           
## [17] BiocStyle_2.5.43          
## 
## loaded via a namespace (and not attached):
##  [1] reshape2_1.4.2          lattice_0.20-35         colorspace_1.3-2       
##  [4] htmltools_0.3.6         yaml_2.1.14             rlang_0.1.2            
##  [7] e1071_1.6-8             glue_1.1.1              RColorBrewer_1.1-2     
## [10] GenomeInfoDbData_0.99.1 bindr_0.1               plyr_1.8.4             
## [13] stringr_1.2.0           zlibbioc_1.24.0         munsell_0.4.3          
## [16] gtable_0.2.0            codetools_0.2-15        evaluate_0.10.1        
## [19] labeling_0.3            knitr_1.17              class_7.3-14           
## [22] Rcpp_0.12.13            backports_1.1.1         scales_0.5.0           
## [25] jsonlite_1.5            XVector_0.18.0          ggplot2_2.2.1          
## [28] digest_0.6.12           stringi_1.1.5           bookdown_0.5           
## [31] dplyr_0.7.4             grid_3.4.2              rprojroot_1.2          
## [34] tools_3.4.2             bitops_1.0-6            magrittr_1.5           
## [37] RCurl_1.95-4.8          lazyeval_0.2.0          proxy_0.4-19           
## [40] tibble_1.3.4            randomForest_4.6-12     pkgconfig_2.0.1        
## [43] Matrix_1.2-11           assertthat_0.2.0        rmarkdown_1.8.3        
## [46] R6_2.2.2                compiler_3.4.2

Reference

What would like to do?

Upload your own dataset

Use existing Reference

Projection

Select an .rds file containing data in SingleCellExperiment format

(example-segerstolpe - a human pancreatic dataset with 3514 cells)

Browse...

Your data is not in SingleCellExperiment format! Please upload a data in the correct format.

rowData slot of your SingleCellExperiment object does not contain a feature_symbol column. Please add it to your object and re-upload it.

Would you like to run scmap-cell? (may take some time)

Yes

Feature Selection

scmap selects 500 most informative features (genes/transcripts) of the Reference dataset by fitting a linear model to the log(expression) vs log(dropout) distribution of features. The most informative features are shown in red on the plot below. Conceptually, these features have a higher dropout rate for a given mean expression. More information about the selected features is provided in the table below the plot. scmap scores are defined as the residuals of the linear model.

Selected Features

Selecting features...

Cell Projection

scmap projects all cells of the Projection dataset to the either cell clusters (scmap-cluster) or each individual cell (scmap-cell) of the Reference dataset.

Projecting cells...

scmap did not work! Most probably Reference and Projection come from different organisms, please check your inputs!

`scmap` tutorial

2017-11-28

Contents

1 Datasets

2 Feature selection

3 scmap-cluster

3.1 Index

3.2 Projection

3.3 Results

3.4 Visualisation

3.5 Quantification

3.6 Properties

4 scmap-cell

4.1 Stochasticity

4.2 Index

4.2.1 Sub-centroids

4.2.2 Sub-clusters

4.3 Projection

4.4 Results

4.5 Cluster annotation

4.6 Visualisation

4.7 Quantification

4.8 Properties

5 sessionInfo()

Reference

Projection

Feature Selection

Selected Features

Cell Projection

scmap tutorial

2017-11-28

Contents

1 Datasets

2 Feature selection

3 scmap-cluster

3.1 Index

3.2 Projection

3.3 Results

3.4 Visualisation

3.5 Quantification

3.6 Properties

4 scmap-cell

4.1 Stochasticity

4.2 Index

4.2.1 Sub-centroids

4.2.2 Sub-clusters

4.3 Projection

4.4 Results

4.5 Cluster annotation

4.6 Visualisation

4.7 Quantification

4.8 Properties

5 sessionInfo()

Reference

Projection

Feature Selection

Selected Features

Cell Projection

`scmap` tutorial