翻车实录之Nature Medicine新冠单细胞文献|附全代码
前言
NGS系列文章包括NGS基礎、轉錄組分析?(Nature重磅綜述|關于RNA-seq你想知道的全在這)、ChIP-seq分析?(ChIP-seq基本分析流程)、單細胞測序分析?(重磅綜述:三萬字長文讀懂單細胞RNA測序分析的最佳實踐教程 (原理、代碼和評述))、DNA甲基化分析、重測序分析、GEO數據挖掘(典型醫學設計實驗GEO數據分析 (step-by-step) - Limma差異分析、火山圖、功能富集)等內容。
大家好!我們又見面了,嘻嘻,今天我們來復現一篇于2020年5月12日深圳第三人民醫院發表于nature medicine的新冠單細胞文獻,這篇文章不久前我其實解讀過,那時還在預印本medrxiv上,題為The landscape of lung bronchoalveolar immune cells in COVID-19 revealed by single-cell RNA sequencing,而發表在NM上的題目略有區別,Single-cell landscape of bronchoalveolar immune cells in patients with COVID-19。詳細的解讀分析請看第一篇新冠單細胞文獻!|解讀 ,廢話不多說,開始吧!
下載數據
GEO datasets:GSE145926(http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE145926),點擊下圖標紅處進行下載:
此時我們在作者原文中驚鴻一瞥,發現作者使用了13個sample,而在下載數據中只有12個,于是我們在metadata中發現作者使用了一個公共數據集:GSM3660650(https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3660650),同樣對下面3個文件進行下載:
加載R包
library(Seurat) library(cowplot) library(Matrix) library(dplyr) library(ggplot2) library(reshape2)加載數據
setwd("/DATA01/home/zhanghu/20200513covNATURE MEDICINE/GSE145926_RAW") library(Seurat)#在下載文件中作者提供的均為.h5文件 C141<-Read10X_h5("GSM4339769_C141_filtered_feature_bc_matrix.h5") C142<-Read10X_h5("GSM4339770_C142_filtered_feature_bc_matrix.h5") C143<-Read10X_h5("GSM4339771_C143_filtered_feature_bc_matrix.h5") C144<-Read10X_h5("GSM4339772_C144_filtered_feature_bc_matrix.h5") C145<-Read10X_h5("GSM4339773_C145_filtered_feature_bc_matrix.h5") C146<-Read10X_h5("GSM4339774_C146_filtered_feature_bc_matrix.h5") C148<-Read10X_h5("GSM4475051_C148_filtered_feature_bc_matrix.h5") C149<-Read10X_h5("GSM4475052_C149_filtered_feature_bc_matrix.h5") C152<-Read10X_h5("GSM4475053_C152_filtered_feature_bc_matrix.h5") C51<- Read10X_h5("GSM4475048_C51_filtered_feature_bc_matrix.h5") C52<- Read10X_h5("GSM4475049_C52_filtered_feature_bc_matrix.h5") C100<- Read10X_h5("GSM4475050_C100_filtered_feature_bc_matrix.h5") GSM3660650<- Read10X(data.dir = '/DATA01/home/zhanghu/20200513covNATURE MEDICINE/GSE145926_RAW/GSM3660650')構建seurat對象
C141<-CreateSeuratObject(counts = C141, project = "C141",min.cells = 3, min.features = 200) C142<-CreateSeuratObject(counts = C142, project = "C142",min.cells = 3, min.features = 200) C143<-CreateSeuratObject(counts = C143, project = "C143",min.cells = 3, min.features = 200) C144<-CreateSeuratObject(counts = C144, project = "C144",min.cells = 3, min.features = 200) C145<-CreateSeuratObject(counts = C145, project = "C145",min.cells = 3, min.features = 200) C146<-CreateSeuratObject(counts = C146, project = "C146",min.cells = 3, min.features = 200) C148<-CreateSeuratObject(counts = C148, project = "C148",min.cells = 3, min.features = 200) C149<-CreateSeuratObject(counts = C149, project = "C149",min.cells = 3, min.features = 200) C152<-CreateSeuratObject(counts = C152, project = "C152",min.cells = 3, min.features = 200) C51<-CreateSeuratObject(counts = C51, project = "C51",min.cells = 3, min.features = 200) C52<-CreateSeuratObject(counts = C52, project = "C52",min.cells = 3, min.features = 200) C100<-CreateSeuratObject(counts = C100, project = "C100",min.cells = 3, min.features = 200) GSM3660650<-CreateSeuratObject(counts = GSM3660650, project = "GSM3660650",min.cells = 3, min.features = 200)分組
根據作者提供的meta.txt(https://github.com/zhangzlab/covid_balf/blob/master/meta.txt)對數據進行分組,一共4個健康(HC),3個輕癥(M),6個重癥(S)樣本。
C141$group<-"mild" C142$group<-"mild" C143$group<-"severe" C144$group<-"mild" C145$group<-"severe" C146$group<-"severe" C148$group<-"severe" C149$group<-"severe" C152$group<-"severe" C51$group <- "healthy" C52$group <- "healthy" C100$group <- "healthy" GSM3660650$group <- "healthy"查看細胞分布
#我們首先進行線粒體基因比例的提取,然后使用小提琴圖對基因,轉錄本及線粒體比例進行展示 C141[["percent.mt"]]<-PercentageFeatureSet(C141,pattern = "^MT") p141<-VlnPlot(C141, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) C142[["percent.mt"]]<-PercentageFeatureSet(C142,pattern = "^MT") p142<-VlnPlot(C142, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) C143[["percent.mt"]]<-PercentageFeatureSet(C143,pattern = "^MT") p143<-VlnPlot(C143, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) C144[["percent.mt"]]<-PercentageFeatureSet(C144,pattern = "^MT") p144<-VlnPlot(C144, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) C145[["percent.mt"]]<-PercentageFeatureSet(C145,pattern = "^MT") p145<-VlnPlot(C145, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) C146[["percent.mt"]]<-PercentageFeatureSet(C146,pattern = "^MT") p146<-VlnPlot(C146, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) C148[["percent.mt"]]<-PercentageFeatureSet(C148,pattern = "^MT") p148<-VlnPlot(C148, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) C149[["percent.mt"]]<-PercentageFeatureSet(C149,pattern = "^MT") p149<-VlnPlot(C149, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) C152[["percent.mt"]]<-PercentageFeatureSet(C152,pattern = "^MT") p152<-VlnPlot(C152, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) C51[["percent.mt"]]<-PercentageFeatureSet(C51,pattern = "^MT") p51<-VlnPlot(C51, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) C52[["percent.mt"]]<-PercentageFeatureSet(C52,pattern = "^MT") p52<-VlnPlot(C52, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) C100[["percent.mt"]]<-PercentageFeatureSet(C100,pattern = "^MT") p100<-VlnPlot(C100, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) GSM3660650[["percent.mt"]]<-PercentageFeatureSet(GSM3660650,pattern = "^MT") pGSM3660650<-VlnPlot(GSM3660650, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)此時我們會出現無數張這樣的圖(可視化之為什么要使用箱線圖?),我不去解釋里面的意義了,我只談一下第一張圖中發現有兩個細胞聚集“肚子”,我也其實出現過這樣的分布圖,于是有人說這可能是污染,上面細胞是下面細胞基因數的兩倍,我其實大概率認為不太會出現這樣的情況,原因(1)細胞數太多了,這建庫質量得差到一定程度呢;(2)轉錄本并沒有像基因數那樣呈現雙峰;(3)可以試一試去除doublet的工具(單細胞預測Doublets軟件包匯總-過渡態細胞是真的嗎?)啊,僅僅通過這個圖下結論不太合適;(4)該樣本可能是刺激樣本,部分細胞受到刺激后的反應。
QC
我們按照文章中所給出的條件對細胞和基因進行篩選,原文:The following criteria were then applied to each cell of all nine patients and four healthy controls: gene number between 200 and 6,000, UMI count > 1,000 and mitochondrial gene percentage < 0.1.
C141 <- subset(C141, subset = nFeature_RNA > 200 & nFeature_RNA < 6000 & percent.mt < 10 & nCount_RNA >1000) C142 <- subset(C142, subset = nFeature_RNA > 200 & nFeature_RNA < 6000 & percent.mt < 10 & nCount_RNA >1000) C143 <- subset(C143, subset = nFeature_RNA > 200 & nFeature_RNA < 6000 & percent.mt < 10 & nCount_RNA >1000) C144 <- subset(C144, subset = nFeature_RNA > 200 & nFeature_RNA < 6000 & percent.mt < 10 & nCount_RNA >1000) C145 <- subset(C145, subset = nFeature_RNA > 200 & nFeature_RNA < 6000 & percent.mt < 10 & nCount_RNA >1000) C146 <- subset(C146, subset = nFeature_RNA > 200 & nFeature_RNA < 6000 & percent.mt < 10 & nCount_RNA >1000) C148 <- subset(C148, subset = nFeature_RNA > 200 & nFeature_RNA < 6000 & percent.mt < 10 & nCount_RNA >1000) C149 <- subset(C149, subset = nFeature_RNA > 200 & nFeature_RNA < 6000 & percent.mt < 10 & nCount_RNA >1000) C152 <- subset(C152, subset = nFeature_RNA > 200 & nFeature_RNA < 6000 & percent.mt < 10 & nCount_RNA >1000) C51 <- subset(C51, subset = nFeature_RNA > 200 & nFeature_RNA < 6000 & percent.mt < 10 & nCount_RNA >1000) C52 <- subset(C52, subset = nFeature_RNA > 200 & nFeature_RNA < 6000 & percent.mt < 10 & nCount_RNA >1000) C100 <- subset(C100, subset = nFeature_RNA > 200 & nFeature_RNA < 6000 & percent.mt < 10 & nCount_RNA >1000) GSM3660650 <- subset(GSM3660650, subset = nFeature_RNA > 200 & nFeature_RNA < 6000 & percent.mt < 10 & nCount_RNA >1000)歸一化
C141 <- NormalizeData(C141) C142 <- NormalizeData(C142) C143 <- NormalizeData(C143) C144 <- NormalizeData(C144) C145 <- NormalizeData(C145) C146 <- NormalizeData(C146) C148 <- NormalizeData(C148) C149 <- NormalizeData(C149) C152 <- NormalizeData(C152) C51 <- NormalizeData(C51) C52 <- NormalizeData(C52) C100 <- NormalizeData(C100) GSM3660650<- NormalizeData(GSM3660650)計算高變基因
C141 <- FindVariableFeatures(C141, selection.method = "vst", nfeatures = 2000) C142 <- FindVariableFeatures(C142, selection.method = "vst", nfeatures = 2000) C143 <- FindVariableFeatures(C143, selection.method = "vst", nfeatures = 2000) C144 <- FindVariableFeatures(C144, selection.method = "vst", nfeatures = 2000) C145 <- FindVariableFeatures(C145, selection.method = "vst", nfeatures = 2000) C146 <- FindVariableFeatures(C146, selection.method = "vst", nfeatures = 2000) C148 <- FindVariableFeatures(C148, selection.method = "vst", nfeatures = 2000) C149 <- FindVariableFeatures(C149, selection.method = "vst", nfeatures = 2000) C152 <- FindVariableFeatures(C152, selection.method = "vst", nfeatures = 2000) C51 <- FindVariableFeatures(C51, selection.method = "vst", nfeatures = 2000) C52 <- FindVariableFeatures(C52, selection.method = "vst", nfeatures = 2000) C100 <- FindVariableFeatures(C100, selection.method = "vst", nfeatures = 2000) GSM3660650 <- FindVariableFeatures(GSM3660650, selection.method = "vst", nfeatures = 2000)數據整合
nCoV <- FindIntegrationAnchors(object.list = nCoV.list, dims = 1:50) nCoV.integrated <- IntegrateData(anchorset = nCoV, dims = 1:50,features.to.integrate = rownames(nCoV))由于數據量較大,該步驟會需要大量時間,建議去買根冰棍,溜達一圈或者打局王者榮耀。。。我的安琪拉!
大約2個小時后。。。。。。
加載樣本信息
其中作者將樣本信息記錄在https://github.com/zhangzlab/covid_balf/blob/master/meta.txt上:
samples = read.delim2("meta.txt",header = TRUE, stringsAsFactors = FALSE,check.names = FALSE, sep = "\t") nCoV.integrated=sample.combined sample_info = as.data.frame(colnames(nCoV.integrated)) colnames(sample_info) = c('ID') rownames(sample_info) = sample_info$ID sample_info$sample = nCoV.integrated@meta.data$orig.ident sample_info = dplyr::left_join(sample_info,samples) rownames(sample_info) = sample_info$ID nCoV.integrated = AddMetaData(object = nCoV.integrated, metadata = sample_info)對整合數據進行歸一化和標準化
原文中在數據整合后的處理:The filtered gene-barcode matrix was first normalized using ‘LogNormalize’ methods in Seurat v.3 with default parameters. The top 2,000 variable genes were then identified using the ‘vst’ method in Seurat FindVariableFeatures function. Variables ‘nCount_RNA’ and ‘percent.mito’ were regressed out in the scaling step and PCA was performed using the top 2,000 variable genes. 按照以上說明進行操作。
DefaultAssay(nCoV.integrated) <- "RNA" nCoV.integrated[['percent.mito']] <- PercentageFeatureSet(nCoV.integrated, pattern = "^MT-") nCoV.integrated <- NormalizeData(object = nCoV.integrated, normalization.method = "LogNormalize", scale.factor = 1e4) nCoV.integrated <- FindVariableFeatures(object = nCoV.integrated, selection.method = "vst", nfeatures = 2000,verbose = FALSE) nCoV.integrated <- ScaleData(nCoV.integrated, verbose = FALSE, vars.to.regress = c("nCount_RNA", "percent.mito")) VlnPlot(object = nCoV.integrated, features = c("nFeature_RNA", "nCount_RNA"), ncol = 2)我們其實也可以看出不同樣本之間的差異性還是挺大的,比如C100,C51,C52以及GSMXXX都是正常人,但我們也同時發現他們的基因和轉錄本表達的集中程度也是不相同的。當然C144細胞數較少顯而易見。
FeatureScatter(object = nCoV.integrated, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")相關性很好哦。。。。
PCA
nCoV.integrated <- ScaleData(nCoV.integrated, verbose = FALSE, vars.to.regress = c("nCount_RNA", "percent.mito")) nCoV.integrated <- RunPCA(nCoV.integrated, verbose = FALSE,npcs = 100) nCoV.integrated <- ProjectDim(object = nCoV.integrated) ElbowPlot(object = nCoV.integrated,ndims = 100)聚類
###cluster nCoV.integrated <- FindNeighbors(object = nCoV.integrated, dims = 1:50) nCoV.integrated <- FindClusters(object = nCoV.integrated, resolution = 1.2)可視化
nCoV.integrated <- RunTSNE(object = nCoV.integrated, dims = 1:50) DimPlot(object = nCoV.integrated, reduction = 'tsne',label = TRUE)我們可以看到一共分為了32個cluster,命名為cluster0-31。使用umap進行展示:
All seems like rather fine , but,此時我們來看原圖:
原圖fig1
還是有一點點區別的,可能是隨機種子什么的問題,anyway,分群確實也是蠻像的。
查找高變標記基因
DefaultAssay(nCoV.integrated) <- "RNA" # find markers for every cluster compared to all remaining cells, report only the positive ones nCoV.integrated@misc$markers <- FindAllMarkers(object = nCoV.integrated, assay = 'RNA',only.pos = TRUE, test.use = 'MAST') write.table(nCoV.integrated@misc$markers,file='marker_MAST.txt',row.names = FALSE,quote = FALSE,sep = '\t') VlnPlot(object = nCoV.integrated, features = c("nFeature_RNA", "nCount_RNA"))我們使用小提琴圖看每個cluster中基因和轉錄本的分布,一般情況我們其實啥也看不出來的,但上圖我們可以明顯看出cluster 7在基因水平和轉錄本水平均較高,他是什么,他為什么這樣呢,他是doublet嗎?
不同組之間的對比
由于作者先期已經使用每個cluster中的高變基因鑒定出了每種細胞類型,我們這里直接使用marker進行分析。
markers = c('AGER','SFTPC','SCGB3A2','TPPP3','KRT5','CD68','FCN1','CD1C','TPSB2','CD14','MARCO','CXCR2','CLEC9A','IL3RA','CD3D','CD8A','KLRF1','CD79A','IGHG4','MS4A1','VWF','DCN','FCGR3A','TREM2','KRT18') hc.markers=nCoV.integrated@misc$markers hc.markers %>% group_by(cluster) %>% top_n(n = 30, wt = avg_logFC) -> top30 var.genes = c(nCoV.integrated@assays$RNA@var.features,top30$gene,markers) nCoV.integrated <- ScaleData(nCoV.integrated, verbose = FALSE, vars.to.regress = c("nCount_RNA", "percent.mito"),features = var.genes)其實作者使用的以上操作也是我第一次見,我們看作者干了什么?首先作者通過高變基因把cluster中的marker挑了出來,然后top30高變基因,然后把var.features和他們結合在一起作為var.genes,共同進行標準化,我們來使用?ScaleData查看一下features的涵義:
也就是說對features進行標準化唄。
DimPlot(object = nCoV.integrated, reduction = 'umap',label = TRUE)DimPlot(object = nCoV.integrated, reduction = 'umap',label = FALSE, group.by = 'sample_new')從上圖我們其實可以看出,各個樣本之間的重合度還是蠻高的,也可以看出某些cluster在某些樣本中的比例較高。
DimPlot(object = nCoV.integrated, reduction = 'umap',label = TRUE, split.by = 'sample_new', ncol = 4)DimPlot(object = nCoV.integrated, reduction = 'umap',label = FALSE, group.by = 'group')DimPlot(object = nCoV.integrated, reduction = 'umap',label = TRUE, split.by = 'group', ncol = 3)DimPlot(object = nCoV.integrated, reduction = 'umap',label = FALSE, group.by = 'disease')DimPlot(object = nCoV.integrated, reduction = 'umap',label = TRUE, split.by = 'disease', ncol = 2)細胞分型
markers = c('TPPP3','KRT18','CD68','FCGR3B','CD1C','CLEC9A','LILRA4','TPSB2','CD3D','KLRD1','MS4A1','IGHG4')很熟悉吧!我們首先來看一下這些marker所對應的細胞類型,根據作者在Extended Data Fig.1中的描述,發現TPP3是Ciliated,KRT18是Secretory,CD68是Macrophages,FCGR3B是Neutrophil,CD1C是mDC,LILRA4是pDC,TPSB2是Mast cell,CD3D是T cell,KLRD1是NK cell,MS4A1是B cell,IGHG4是Plasma cell,在doublets中表達CD68和CD3D。
pp_temp = FeaturePlot(object = nCoV.integrated, features = markers,cols = c("lightgrey","#ff0000"),combine = FALSE)plots <- lapply(X = pp_temp, FUN = function(p) p + theme(axis.title = element_text(size = 18),axis.text = element_text(size = 18),plot.title = element_text(family = 'sans',face='italic',size=20),legend.text = element_text(size = 20),legend.key.height = unit(1.8,"line"),legend.key.width = unit(1.2,"line") ))pp = CombinePlots(plots = plots,ncol = 4,legend = 'right')哈哈哈哈,以上可以看出這些marker的特異性還是蠻好的。
我們來看一下原圖吧:
Extended Data Fig.1
繪制點圖:
pp = DotPlot(nCoV.integrated, features = rev(markers),cols = c('white','#F8766D'),dot.scale =5) + RotatedAxis()pp = pp + theme(axis.text.x = element_text(size = 12),axis.text.y = element_text(size = 12)) + labs(x='',y='') + guides(color = guide_colorbar(title = 'Scale expression'),size = guide_legend(title = 'Percent expressed')) + theme(axis.line = element_line(size = 0.6))其實我復現的和作者的分群是略有區別的,但也不影響分析,anyway,我們仿照作者進行分群,在Epithelial中TPPP3-Ciliated(13,25),KRT18(15),Myeloid中CD68-Macrophages(0、1、2、3、4、5、7、8、9、11、12、16、17、20、21、23、28),FCGR3B-Neutrophil(19),CD1C-mDC(22),LILRA4-pDC(29),TPSB2-Mast cell(31),NK&T中CD3D-T cell(6、10、30),KLRD1-NK cell(18),B中MS4A1-B cell(26),IGHG4-Plasma cell(24,27),others中CD68&CD3D-Doublets(14)。OK,我們來看一下原圖吧:
Extended Data Fig.1
繪制比例圖
library(ggplot2) nCoV.integrated[["cluster"]] <- Idents(object = nCoV.integrated) big.cluster = nCoV.integrated@meta.data organ.summary = table(big.cluster$sample_new,big.cluster$cluster) write.table(organ.summary,file = '1-nCoV-percentage-sample.txt',quote = FALSE,sep = '\t') library(ggplot2) library(dplyr) library(reshape2) organ.summary = read.delim2("1-nCoV-percentage-sample.txt",header = TRUE, stringsAsFactors = FALSE,check.names = FALSE, sep = "\t") organ.summary$group = rownames(organ.summary) organ.summary.dataframe = melt(organ.summary) colnames(organ.summary.dataframe) = c('group','cluster','cell') samples_name_new = c('HC1','HC2','HC3','HC4','M1','M2','M3','S1','S2','S3','S4','S5','S6') organ.summary.dataframe$group = factor(organ.summary.dataframe$group,labels = samples_name_new,levels = samples_name_new) organ.summary.dataframe$cell = as.numeric(organ.summary.dataframe$cell)作者上面的這頓操作真是猛如虎。。。
準備完數據后,準備畫圖:
new_order = c('0','1','2','3','4','5','7','8','9','11','12','16','17','20','21','23','28','19','24','27','13','25','15','6','10','30','18','22','26','29','31','14')organ.summary.dataframe$cluster = factor(organ.summary.dataframe$cluster,levels = new_order,labels = new_order)cols = c('#32b8ec','#60c3f0','#8ccdf1','#cae5f7','#92519c','#b878b0','#d7b1d2','#e7262a','#e94746','#eb666d','#ee838f','#f4abac','#fad9d9')pp = ggplot(data=organ.summary.dataframe, aes(x=cluster, y=cell, fill=group)) + geom_bar(stat="identity",width = 0.6,position=position_fill(reverse = TRUE),size = 0.3,colour = '#222222') + labs(x='',y='Fraction of sample per cluster (%)') +theme(axis.title.x = element_blank(),axis.text = element_text(size = 16),axis.title.y =element_text(size = 16), legend.text = element_text(size = 15),legend.key.height = unit(5,'mm'),legend.title = element_blank(),panel.grid.minor = element_blank()) + cowplot::theme_cowplot() + theme(axis.text.x = element_text(size = 10)) +scale_fill_manual(values = cols)細胞命名
nCoV.integrated1 <- RenameIdents(object = nCoV.integrated,'13' = 'Epithelial','15' = 'Epithelial','25' = 'Epithelial', '0'='Macrophages','1'='Macrophages','2'='Macrophages','3'='Macrophages','4'='Macrophages','5'='Macrophages','7'='Macrophages','8'='Macrophages','9'='Macrophages','11'='Macrophages','12'='Macrophages','16'='Macrophages','17'='Macrophages','20'='Macrophages','21'='Macrophages','23'='Macrophages','28'='Macrophages','31'='Mast','6'='T','10'='T','30'='T','18'='NK','24'='Plasma','27'='Plasma','26'='B','19'='Neutrophil','22'='mDC','29'='pDC','14'='Doublets')#去除doubletsnCoV.integrated1$celltype = Idents(nCoV.integrated1)nCoV.integrated1 = subset(nCoV.integrated1,subset = celltype != 'Doublets')nCoV_groups = c('Epithelial','Macrophages','Neutrophil','mDC','pDC','Mast','T','NK','B','Plasma')nCoV.integrated1$celltype = factor(nCoV.integrated1$celltype,levels = nCoV_groups,labels = nCoV_groups)Idents(nCoV.integrated1) = nCoV.integrated1$celltype細胞命名后的dimplot:
pp_temp = DimPlot(object = nCoV.integrated1, reduction = 'umap',label = FALSE, label.size = 6,split.by = 'group', ncol = 3,repel = TRUE,combine = TRUE)pp_temp = pp_temp + theme(axis.title = element_text(size = 17),axis.text = element_text(size = 17),strip.text = element_text(family = 'arial',face='plain',size=17), legend.text = element_text(size = 17),axis.line = element_line(size = 1),axis.ticks = element_line(size = 0.8),legend.key.height = unit(1.4,"line"))Extended Data Fig.1(原圖)
按照sample進行分類:
pp_temp = DimPlot(object = nCoV.integrated1, reduction = 'umap',label = FALSE, split.by = 'sample_new', label.size = 7,ncol = 5,repel = TRUE, combine = TRUE,pt.size = 1.5)pp_temp = pp_temp + theme(axis.title = element_text(size = 22),axis.text = element_text(size = 22),strip.text = element_text(family = 'sans',face='plain',size=22),legend.text = element_text(size = 22),legend.key.height = unit(1.8,"line"),legend.key.width = unit(1.2,"line"),axis.line = element_line(size = 1.2),axis.ticks = element_line(size = 1.2))Extended Data Fig.1(原圖)
等等!!!!!!
此時我發現了一個我的重大失誤!就是我的細胞分群錯了。。。。。。哭出聲來!!!
通過featureplot并結合作者原文其實可以看出cluster9是doublets,cluster14是T細胞。。。。
于是應該是這個樣子的:
剩下的我就不重來一遍的,我太難了。。。。
up to now,我基本已經把作者的細胞大群復現完了,你也可以使用作者的github上的上傳代碼進行復現,不過前期我確實沒有看懂,所以就是按照自己結合原文的意思進行復現的。。。
不行了我要去玩耍了,這篇文獻沒有幾頁,但我估計已經看了N遍了。。。。
參考文獻
Liao M, Liu Y, Yuan J, Wen Y, Xu G, Zhao J, Cheng L, Li J, Wang X, Wang F, Liu L, Amit I, Zhang S, Zhang Z. Single-cell landscape of bronchoalveolar immune cells in patients with COVID-19. Nat Med. 2020 May 12. doi:10.1038Code
https://github.com/zhangzlab/covid_balf你可能還想看
讓你的單細胞數據動起來!|iCellR(一)
讓你的單細胞數據動起來!|iCellR(二)
復現nature communication PCA原圖|代碼分析(一)
這篇Nature子刊文章的蛋白組學數據PCA分析竟花費了我兩天時間來重現|附全過程代碼
單細胞分析Seurat使用相關的10個問題答疑精選!
往期精品(點擊圖片直達文字對應教程)
后臺回復“生信寶典福利第一波”或點擊閱讀原文獲取教程合集
總結
以上是生活随笔為你收集整理的翻车实录之Nature Medicine新冠单细胞文献|附全代码的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 复现原文(二):Single-cell
- 下一篇: 关注 | 新冠病毒这次的突变毒株太可怕,