复现原文(一):Single-cell RNA sequencing of human kidney(step by step)
https://www.nature.com/articles/s41597-019-0351-8
NGS系列文章包括NGS基礎、轉錄組分析?(Nature重磅綜述|關于RNA-seq你想知道的全在這)、ChIP-seq分析?(ChIP-seq基本分析流程)、單細胞測序分析?(重磅綜述:三萬字長文讀懂單細胞RNA測序分析的最佳實踐教程 (原理、代碼和評述))、DNA甲基化分析、重測序分析、GEO數據挖掘(典型醫學設計實驗GEO數據分析 (step-by-step) - Limma差異分析、火山圖、功能富集)等內容。
背景介紹
腎臟是具有許多不同功能的高度復雜的器官,由幾個功能和解剖上不連續的部分組成。腎小球和腎小管是腎單位的重要組成部分。足細胞與腎小球內皮細胞一起合成了腎小球基底膜,它是最終的過濾屏障,防止蛋白質損失到尿液中。頂葉上皮細胞(Parietal epithelial cells,PECs)是另一種常見的腎小球細胞類型,可能導致腎小球硬化、新月和假新月形成。近端小管(proximal tubule,PT)通過控制Na+ - H+和HCO3-的轉運在調節全身酸堿平衡中起著重要作用,而遠曲小管則更多地參與電解質的轉運。在先前的研究中,研究人員對腎臟不同組成部分進行了bulk RNA測序(RNA-seq最強綜述名詞解釋&思維導圖|關于RNA-seq,你想知道的都在這(續)),為理解不同片段的轉錄組提供參考。然而,bulk RNA測序不能反映單細胞水平的轉錄組,只能反映總體平均RNA表達(自從用了這個神器,大規模RNA-seq數據挖掘我也可以)。
正常人腎臟的全面細胞解剖結構對于解決腎臟疾病和腎癌的細胞起源至關重要。一些腎臟疾病可能是細胞類型特異性的,尤其是腎小管細胞。為了研究人腎臟的分類和轉錄組信息,作者迅速獲得了腎臟的單細胞懸液并進行了單細胞RNA測序(scRNA-seq)(重磅綜述:三萬字長文讀懂單細胞RNA測序分析的最佳實踐教程 (原理、代碼和評述))。作者介紹了來自三個人類供體腎臟的23,366個高質量細胞的scRNA-seq數據,并將正常人腎細胞劃分為10個clusters。其中,近端腎小管(PT)細胞被分為三個亞型,而集合導管細胞被分為兩個亞型。總體而言,該數據為腎細胞生物學和腎臟疾病的研究提供了可靠的參考。
下面我們按照作者的分析思路復現該文章的部分內容:
首先,從GSE131685中下載數據:
里面的文件名要分別改為“barcodes.tsv”、“genes.tsv”和“matrix.mtx”,在Read10X(Hemberg-lab單細胞轉錄組數據分析(七)- 導入10X和SmartSeq2數據Tabula Muris)時才不會報錯。。。
Kidney data loading
library(devtools) install_github("immunogenomics/harmony") library(Seurat) library(magrittr) library(harmony) library(dplyr)#Kidney data loading 并構建seurat objectK1.data <- Read10X(data.dir = "/Users/zhanghu1992/Documents/GSE131685_RAW/kidney1/") K1 <- CreateSeuratObject(counts = K1.data, project = "kidney1", min.cells = 8, min.features = 200) K2.data <- Read10X(data.dir = "/Users/zhanghu1992/Documents/GSE131685_RAW/kidney2/") K2 <- CreateSeuratObject(counts = K2.data, project = "kidney2", min.cells = 6, min.features = 200) K3.data <- Read10X(data.dir = "/Users/zhanghu1992/Documents/GSE131685_RAW/kidney3/") K3 <- CreateSeuratObject(counts = K3.data, project = "kidney3", min.cells = 10, min.features = 200) kid <- merge(x = K1, y = list(K2, K3)) #讀取文件并用merge函數進行合并插一句嘴,我們來看一下華盛頓大學PhD jared.andrews對merge函數的解釋:
注意老鐵說的“Seurat’s integration method is quite heavy handed in my experience,so if you decide to go the integration route,I’d recommend?using the SeuratWrapper around the fastMNN”(單細胞分析Seurat使用相關的10個問題答疑精選!)
QC
# quality control kid[["percent.mt"]] <- PercentageFeatureSet(kid, pattern = "^MT-") #提取有關線粒體的基因 VlnPlot(kid, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) #由圖可以看出分布還可以 plot1 <- FeatureScatter(kid, feature1 = "nCount_RNA", feature2 = "percent.mt") plot2 <- FeatureScatter(kid, feature1 = "nCount_RNA", feature2 = "nFeature_RNA") CombinePlots(plots = list(plot1, plot2)) kid <- subset(kid, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 30) #篩選條件 kid <- NormalizeData(kid, normalization.method = "LogNormalize", scale.factor = 10000) kid <- NormalizeData(kid) #標準化 kid <- FindVariableFeatures(kid, selection.method = "vst", nfeatures = 2000) #查找高變基因 top10 <- head(VariableFeatures(kid), 10) plot1 <- VariableFeaturePlot(kid) plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE) CombinePlots(plots = list(plot1, plot2)) # 計算細胞周期 s.genes <-cc.genes$s.genes g2m.genes<-cc.genes$g2m.genes kid <- CellCycleScoring(kid, s.features = s.genes, g2m.features = g2m.genes, set.ident = TRUE) all.genes <- rownames(kid) kid <- ScaleData(kid, vars.to.regress = c("S.Score", "G2M.Score"), features = all.genes)這里我想叨叨幾句,據我看到的文獻,多數是在進行降維后將細胞周期方面對分群的影響作為一個單獨模塊去敘述,作者在先期不管細胞周期對聚類是否有影響的情況下就對細胞周期相關基因進行去除也是比較明智的,因為作者并不想讓該因素混雜其中影響分群(如何火眼金睛鑒定那些單細胞轉錄組中的混雜因素)。
#當然我們還是要看是否細胞周期真的有影響,感興趣的小伙伴可以看一下,確實是有一定影響的!#kid <- ScaleData(kid, features = rownames(kid)) #kid <- RunPCA(kid , features = c(s.genes, g2m.genes)) #DimPlot(kid)利用harmony算法去除批次效應并細胞分類
#Eliminate batch effects with harmony and cell classification kid <- RunPCA(kid, pc.genes = kid@var.genes, npcs = 20, verbose = FALSE) options(repr.plot.height = 2.5, repr.plot.width = 6) kid <- kid %>%RunHarmony("orig.ident", plot_convergence = TRUE) #等候時間較長,請溜達溜達吧 harmony_embeddings <- Embeddings(kid, 'harmony') harmony_embeddings[1:5, 1:5] kid <- kid %>%RunUMAP(reduction = "harmony", dims = 1:20) %>%FindNeighbors(reduction = "harmony", dims = 1:20) %>%FindClusters(resolution = 0.25) %>%identity() new.cluster.ids <- c(0,1, 2, 3, 4, 5, 6, 7,8,9,10) names(new.cluster.ids) <- levels(kid) kid <- RenameIdents(kid, new.cluster.ids)“harmony”整合不同平臺的單細胞數據之旅
鑒定Marker基因
#Calculating differentially expressed genes (DEGs) and Save rds file kid.markers <- FindAllMarkers(kid, only.pos = TRUE, min.pct = 0.25, thresh.use = 0.25)#尋找高變基因 write.table(kid.markers,sep="\t",file="/home/yuzhenyuan/Seurat/0.2_20.xls") saveRDS(kid,file="/home/yuzhenyuan/kid/har/0.25_20.rds")- 單細胞分群后,怎么找到Marker基因定義每一類群?
可視化
#Some visual figure generation DimPlot(kid, reduction = "umap", group.by = "orig.ident", pt.size = .1, split.by = 'orig.ident') DimPlot(kid, reduction = "umap", group.by = "Phase", pt.size = .1) #按照細胞周期進行劃分 DimPlot(kid, reduction = "umap", label = TRUE, pt.size = .1) #注意作者在用同樣參數設置后分為10個clusters,其實無關緊要,都需要通過marker重新貼現。根據作者提供的marker對細胞亞群進行貼現,如下圖所示:
其實部分marker并不是特異性marker,所以在進行區分的時候一定要好好甄別。
與以下原文圖基本相同,個人感覺tSNE是不是也有什么隨機種子的東東,感覺總會略有不同:
DoHeatmap(kid, features = c("SLC13A3","SLC34A1","GPX3","DCXR","SLC17A3","SLC22A8","SLC22A7","GNLY","NKG7","CD3D","CD3E","LYZ","CD14","KRT8","KRT18","CD24","VCAM1","UMOD","DEFB1","CLDN8","AQP2","CD79A","CD79B","ATP6V1G3","ATP6V0D2","TMEM213")) # 繪制部分基因熱圖R語言 - 熱圖美化
VlnPlot(kid, pt.size =0, idents= c(1,2,3), features = c("GPX3", "DCXR","SLC13A3","SLC34A1","SLC22A8","SLC22A7")) VlnPlot(kid, idents= c(8,10), features = c("AQP2", "ATP6V1B1","ATP6V0D2","ATP6V1G3"))-
R語言 - 箱線圖(小提琴圖、抖動圖、區域散點圖)
-
可視化之為什么要使用箱線圖?
tSNE plot
##tSNE Plot kid <-RunTSNE(kid, reduction = "harmony", dims = 1:20) TSNEPlot(kid, do.label = T, label = TRUE, do.return = T, pt.size = 1) TSNEPlot(kid, do.return = T, pt.size = 1, group.by = "orig.ident", split.by = 'orig.ident') TSNEPlot(kid, do.return = T, pt.size = 1, group.by = "Phase")- Hemberg-lab單細胞轉錄組數據分析(十二)- Scater單細胞表達譜tSNE可視化
與前面的圖是相同的。
提取PT cells進行后續分析
#Select a subset of PT cells(近端小管) PT <- SubsetData(kid, ident.use = c(0,1,2), subset.raw = T)總結
以上是生活随笔為你收集整理的复现原文(一):Single-cell RNA sequencing of human kidney(step by step)的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: AnimalTFDB 3.0 | 动物转
- 下一篇: 感到压力时,你秃的是头,而TA秃的是屁股