Seurat的单细胞免疫组库分析来了!
使用Seurat進(jìn)行單細(xì)胞VDJ免疫分析
NGS系列文章包括NGS基礎(chǔ)、轉(zhuǎn)錄組分析?(Nature重磅綜述|關(guān)于RNA-seq你想知道的全在這)、ChIP-seq分析?(ChIP-seq基本分析流程)、單細(xì)胞測(cè)序分析?(重磅綜述:三萬字長(zhǎng)文讀懂單細(xì)胞RNA測(cè)序分析的最佳實(shí)踐教程 (原理、代碼和評(píng)述))、DNA甲基化分析、重測(cè)序分析、GEO數(shù)據(jù)挖掘(典型醫(yī)學(xué)設(shè)計(jì)實(shí)驗(yàn)GEO數(shù)據(jù)分析 (step-by-step) - Limma差異分析、火山圖、功能富集)等內(nèi)容。
小劇場(chǎng):
我:老板,你聽說沒有樓上做的單細(xì)胞實(shí)驗(yàn)加了VDJ分析?
老板:VCD分析?哦,,我早就聽過了!
我:老板,是VDJ分析?
老板:VDJ分析,哦對(duì)對(duì)就是VDJ分析,,,我早就說咱們也應(yīng)該做了,你看看,又遲了一步,都怪你不提醒我!
我瞟了一眼她桌子上的淘寶頁(yè)面(廉價(jià)燈芯褲),默默走開了。。。
其實(shí)我在介紹clonotypr (令我驚奇的是,當(dāng)我把推送推出去的當(dāng)天,我親愛的作者就把該包從github撤了下來啊!)時(shí)說明過VDJ免疫分析對(duì)免疫及抗體產(chǎn)生的重要意義,這也是為什么現(xiàn)在許多做新冠單細(xì)胞分析的都會(huì)使用5’端測(cè)序聯(lián)用VDJ測(cè)序分析。
1. 為什么要進(jìn)行單細(xì)胞免疫組庫(kù)的分析?
應(yīng)用方向一:探索腫瘤免疫微環(huán)境,輔助免疫治療。
每個(gè)人都擁有一個(gè)自己的適應(yīng)性免疫組庫(kù),TCR和BCR通過基因重組和體細(xì)胞突變?nèi)〉枚鄻有?#xff0c;使得我們身體可以識(shí)別和抵御各種內(nèi)部和外部的入侵者。而腫瘤的發(fā)生往往躲避了人體T淋巴細(xì)胞而產(chǎn)生、增殖和轉(zhuǎn)移。
使用10X Genomics ChromiumTM Single Cell Immune Profiling Solution可以捕捉腫瘤發(fā)生時(shí)的免疫微環(huán)境變化,尋找免疫治療的靶點(diǎn),從而輔助免疫治療更好地抗擊腫瘤。
應(yīng)用方向二:探索自身免疫性疾病和炎癥性疾病發(fā)生機(jī)制,輔助疫苗的研究
自身免疫性疾病發(fā)生起始和發(fā)展的中心環(huán)節(jié)被認(rèn)為是抗原特異性T細(xì)胞激活導(dǎo)致的,使用10X Genomics ChromiumTM Single Cell Immune Profiling Solution,可以解析自身免疫性疾病的發(fā)病機(jī)制,從而為疾病的診療提供依據(jù)。
應(yīng)用方向三:移植和免疫重建
器官或者骨髓移植時(shí),經(jīng)常會(huì)誘發(fā)宿主的排斥反應(yīng),從而發(fā)生慢性移植抗宿主病。同種異體反應(yīng)隨機(jī)分布在整個(gè)T細(xì)胞組庫(kù)的交叉反應(yīng),因此延遲T細(xì)胞恢復(fù)和限制的T細(xì)胞受體多樣性與異體移植后感染和疾病復(fù)發(fā)風(fēng)險(xiǎn)增加相關(guān)。
而我比較注意的是在疫苗接種前后BCR/TCR CDR3免疫組庫(kù)的分析,最近medRxiv上發(fā)表的有關(guān)新冠的文獻(xiàn)Immune Cell Profiling of COVID-19 Patients in the recovery stage by Single-cell sequencing中對(duì)不同BCR/TCR的VDJ重排進(jìn)行分析,揭示了針對(duì)新冠特異的克隆擴(kuò)增。
2. 免疫組庫(kù)主要包括哪幾個(gè)方面?
T淋巴細(xì)胞(T cell)和B淋巴細(xì)胞(B cell)主要負(fù)責(zé)適應(yīng)性免疫應(yīng)答,其抗原識(shí)別主要依賴于T細(xì)胞受體(T cell recptor, TCR)和B細(xì)胞受體(B cell recptor, BCR),這兩類細(xì)胞表面分子的共同特點(diǎn)是其多樣性,可以識(shí)別多種多樣的抗原分子。BCR的輕鏈和TCRβ鏈由V、D、J、C四個(gè)基因片段組成,BCR的重鏈和TCRα鏈由V、J、C三個(gè)基因片段組成,這些基因片段在遺傳過程中發(fā)生重組、重排,組合成不同的形式,保證了受體多樣性。其中變化最大的就是CDR3區(qū)。
3. 10× Genomics VDJ測(cè)序進(jìn)行cellranger后的輸出形式是什么樣的?
Outputs: - Run summary HTML: /home/jdoe/runs/sample345/outs/web_summary.html - Run summary CSV: /home/jdoe/runs/sample345/outs/metrics_summary.csv - All-contig FASTA: /home/jdoe/runs/sample345/outs/all_contig.fasta - All-contig FASTA index: /home/jdoe/runs/sample345/outs/all_contig.fasta.fai - All-contig FASTQ: /home/jdoe/runs/sample345/outs/all_contig.fastq - Read-contig alignments: /home/jdoe/runs/sample345/outs/all_contig.bam - Read-contig alignment index: /home/jdoe/runs/sample345/outs/all_contig.bam.bai - All contig annotations (JSON): /home/jdoe/runs/sample345/outs/all_contig_annotations.json - All contig annotations (BED): /home/jdoe/runs/sample345/outs/all_contig_annotations.bed - All contig annotations (CSV): /home/jdoe/runs/sample345/outs/all_contig_annotations.csv - Filtered contig sequences FASTA: /home/jdoe/runs/sample345/outs/filtered_contig.fasta - Filtered contig sequences FASTQ: /home/jdoe/runs/sample345/outs/filtered_contig.fastq - Filtered contigs (CSV): /home/jdoe/runs/sample345/outs/filtered_contig_annotations.csv - Clonotype consensus FASTA: /home/jdoe/runs/sample345/outs/consensus.fasta - Clonotype consensus FASTA index: /home/jdoe/runs/sample345/outs/consensus.fasta.fai - Clonotype consensus FASTQ: /home/jdoe/runs/sample345/outs/consensus.fastq - Concatenated reference sequences: /home/jdoe/runs/sample345/outs/concat_ref.fasta - Concatenated reference index: /home/jdoe/runs/sample345/outs/concat_ref.fasta.fai - Contig-consensus alignments: /home/jdoe/runs/sample345/outs/consensus.bam - Contig-consensus alignment index: /home/jdoe/runs/sample345/outs/consensus.bam.bai - Contig-reference alignments: /home/jdoe/runs/sample345/outs/concat_ref.bam - Contig-reference alignment index: /home/jdoe/runs/sample345/outs/concat_ref.bam.bai - Clonotype consensus annotations (JSON): /home/jdoe/runs/sample345/outs/consensus_annotations.json - Clonotype consensus annotations (CSV): /home/jdoe/runs/sample345/outs/consensus_annotations.csv - Clonotype info: /home/jdoe/runs/sample345/outs/clonotypes.csv - Barcodes that are declared to be targeted cells: /home/jdoe/runs/sample345/out/cell_barcodes.json - Loupe V(D)J Browser file: /home/jdoe/runs/sample345/outs/vloupe.vloupe首先我們來看看web.html對(duì)整個(gè)測(cè)序質(zhì)量的評(píng)估:
我們看到在Enrichment中reads映射到VDJ基因的比例為80.7%,其中TRA/TRB代表TCR α/β鏈 ,map到TRA的比例為24.4%,map到TRB的比例為56%。當(dāng)然,后面也會(huì)有蠻多指標(biāo)的,比如VDJ注釋,VDJ質(zhì)量及表達(dá)等。。。
當(dāng)然我們也會(huì)有很多表格,其中最重要的表格為contig_annotation和clonotype:
contig_annotation(BCR示例)
上面表格中的IGH和IGK/IGL代表BCR H和BCR L鏈 。看到這個(gè)表格,我第一反應(yīng)其實(shí)是為什么D區(qū)基因(d_gene)多數(shù)均為None,主要原因還是D區(qū)通常較短又突變較多,因技術(shù)限制而常常捕捉不到。數(shù)據(jù)中也提供了CDR3的蛋白序列和核苷酸序列。
clonotype(TCR示例)
從以上數(shù)據(jù)可以看出,有部分克隆是由單鏈決定的。
那么如何將VDJ的克隆表型和scRNA-seq結(jié)合起來呢?其實(shí)大佬已經(jīng)回答了這個(gè)問題:
add_clonotype <- function(tcr_location, seurat_obj){tcr <- read.csv(paste(tcr_folder,"filtered_contig_annotations.csv", sep=""))# Remove the -1 at the end of each barcode.# Subsets so only the first line of each barcode is kept,# as each entry for given barcode will have same clonotype.tcr$barcode <- gsub("-1", "", tcr$barcode)tcr <- tcr[!duplicated(tcr$barcode), ]# Only keep the barcode and clonotype columns.# We'll get additional clonotype info from the clonotype table.tcr <- tcr[,c("barcode", "raw_clonotype_id")]names(tcr)[names(tcr) == "raw_clonotype_id"] <- "clonotype_id"# Clonotype-centric info.clono <- read.csv(paste(tcr_folder,"clonotypes.csv", sep=""))# Slap the AA sequences onto our original table by clonotype_id.tcr <- merge(tcr, clono[, c("clonotype_id", "cdr3s_aa")])# Reorder so barcodes are first column and set them as rownames.tcr <- tcr[, c(2,1,3)]rownames(tcr) <- tcr[,1]tcr[,1] <- NULL# Add to the Seurat object's metadata.clono_seurat <- AddMetaData(object=seurat_obj, metadata=tcr)return(clono_seurat)}怎么用呢?舉個(gè)栗子吧:
數(shù)據(jù)下載
download.file("https://bioshare.bioinformatics.ucdavis.edu/bioshare/download/iimg5mz77whzzqc/vdj_v1_mm_balbc_pbmc.zip", "vdj_v1_mm_balbc_pbmc.zip")#這是小鼠的PBMC數(shù)據(jù)加載R包
library(Seurat) library(cowplot)加載數(shù)據(jù)
## Cellranger balbc_pbmc <- Read10X_h5("vdj_v1_mm_balbc_pbmc/vdj_v1_mm_balbc_pbmc_5gex_filtered_feature_bc_matrix.h5")s_balbc_pbmc <- CreateSeuratObject(counts = balbc_pbmc, min.cells = 3, min.features = 200, project = "cellranger")提取線粒體基因
s_balbc_pbmc$percent.mito <- PercentageFeatureSet(s_balbc_pbmc, pattern = "^mt-")增加T和B細(xì)胞的克隆信息
add_clonotype <- function(tcr_prefix, seurat_obj, type="t"){tcr <- read.csv(paste(tcr_prefix,"filtered_contig_annotations.csv", sep=""))# Remove the -1 at the end of each barcode.(注意,此步驟如果標(biāo)記使用不同的barcode,比如多了個(gè)-1,可以使用 tcr$barcode <- gsub("-1", "", tcr$barcode)進(jìn)行提取)# Subsets so only the first line of each barcode is kept,# as each entry for given barcode will have same clonotype.tcr <- tcr[!duplicated(tcr$barcode), ]# Only keep the barcode and clonotype columns.# We'll get additional clonotype info from the clonotype table.tcr <- tcr[,c("barcode", "raw_clonotype_id")]names(tcr)[names(tcr) == "raw_clonotype_id"] <- "clonotype_id"# Clonotype-centric info.clono <- read.csv(paste(tcr_prefix,"clonotypes.csv", sep=""))# Slap the AA sequences onto our original table by clonotype_id.tcr <- merge(tcr, clono[, c("clonotype_id", "cdr3s_aa")])names(tcr)[names(tcr) == "cdr3s_aa"] <- "cdr3s_aa"# Reorder so barcodes are first column and set them as rownames.tcr <- tcr[, c(2,1,3)]rownames(tcr) <- tcr[,1]tcr[,1] <- NULLcolnames(tcr) <- paste(type, colnames(tcr), sep="_")# Add to the Seurat object's metadata.clono_seurat <- AddMetaData(object=seurat_obj, metadata=tcr)return(clono_seurat) }s_balbc_pbmc <- add_clonotype("vdj_v1_mm_balbc_pbmc/vdj_v1_mm_balbc_pbmc_t_", s_balbc_pbmc, "t") s_balbc_pbmc <- add_clonotype("vdj_v1_mm_balbc_pbmc/vdj_v1_mm_balbc_pbmc_b_", s_balbc_pbmc, "b") head(s_balbc_pbmc[[]])我給解釋一下以上function中每一步都在干什么:
首先讀入contig_annotations.csv,并賦給tcr;
去除tcr中重復(fù)的barcode,即如果具有相同的barcode,將以第一次出現(xiàn)的barcode為主來去重;
將tcr中的barcode,raw_clonotype_id賦值于tcr;
讀入clonotypes.csv,并賦給clono;
將tcr和colono進(jìn)行merge(單細(xì)胞分析Seurat使用相關(guān)的10個(gè)問題答疑精選!),并賦給tcr;
將最后得到的,帶有barcode,raw_clonotype_id和colono的tcr對(duì)象以metadata的形式加入seurat object中。
發(fā)現(xiàn)很多的NA,非常正常啊,不是每個(gè)細(xì)胞都是T/B cell,然后還列出來了T/B CDR3的蛋白序列。
table(!is.na(s_balbc_pbmc$t_clonotype_id),!is.na(s_balbc_pbmc$b_clonotype_id))s_balbc_pbmc <- subset(s_balbc_pbmc, cells = colnames(s_balbc_pbmc)[!(!is.na(s_balbc_pbmc$t_clonotype_id) & !is.na(s_balbc_pbmc$b_clonotype_id))]) #刪除同時(shí)表達(dá)T、B克隆表型的細(xì)胞 s_balbc_pbmc進(jìn)行常規(guī)workflow
s_balbc_pbmc <- subset(s_balbc_pbmc, percent.mito <= 10)s_balbc_pbmc <- subset(s_balbc_pbmc, nCount_RNA >= 500 & nCount_RNA <= 40000) s_balbc_pbmc <- NormalizeData(s_balbc_pbmc, normalization.method = "LogNormalize", scale.factor = 10000) s_balbc_pbmc <- FindVariableFeatures(s_balbc_pbmc, selection.method = "vst", nfeatures = 2000)all.genes <- rownames(s_balbc_pbmc) s_balbc_pbmc <- ScaleData(s_balbc_pbmc, features = all.genes) s_balbc_pbmc <- RunPCA(s_balbc_pbmc, features = VariableFeatures(object = s_balbc_pbmc))use.pcs = 1:30 s_balbc_pbmc <- FindNeighbors(s_balbc_pbmc, dims = use.pcs) s_balbc_pbmc <- FindClusters(s_balbc_pbmc, resolution = c(0.5)) s_balbc_pbmc <- RunUMAP(s_balbc_pbmc, dims = use.pcs) DimPlot(s_balbc_pbmc, reduction = "umap", label = TRUE)讓我們看看T細(xì)胞的marker的表達(dá)情況:
t_cell_markers <- c("Cd3d","Cd3e") FeaturePlot(s_balbc_pbmc, features = t_cell_markers)比如我知道某個(gè)b_cdr3s_aa我感覺特有意思,想在UMAP 圖上進(jìn)行表示,于是我先把他的蛋白序列找了出來,IGH:CARWGGYGYDGGYFDYW;IGK:CGQSYSYPYTF,然后:
DimPlot(s_balbc_pbmc, cells.highlight = Cells(subset(s_balbc_pbmc, subset = b_cdr3s_aa + == "IGH:CARWGGYGYDGGYFDYW;IGK:CGQSYSYPYTF")))萬“綠”叢中一點(diǎn)紅!
當(dāng)然你也可以在metadata中納入更多的TCR/BCR中的有關(guān)信息,比如我們把annotation.csv中的chains也納入進(jìn)來的話,就是這個(gè)樣子滴:
p2 <-DimPlot(s_balbc_pbmc,group.by = "t_chain") p2參考來源
https://github.com/ucdavis-bioinformatics-training/2020-Advanced_Single_Cell_RNA_Seq/blob/master/data_analysis/VDJ_Analysis_fixed.md
你可能還想看
NBT:單細(xì)胞轉(zhuǎn)錄組新降維可視化方法PHATE
復(fù)現(xiàn)nature communication PCA原圖|代碼分析(一)
這篇Nature子刊文章的蛋白組學(xué)數(shù)據(jù)PCA分析竟花費(fèi)了我兩天時(shí)間來重現(xiàn)|附全過程代碼
一個(gè)R包玩轉(zhuǎn)單細(xì)胞免疫組庫(kù)分析,還能與Seurat無縫對(duì)接
往期精品(點(diǎn)擊圖片直達(dá)文字對(duì)應(yīng)教程)
后臺(tái)回復(fù)“生信寶典福利第一波”或點(diǎn)擊閱讀原文獲取教程合集
總結(jié)
以上是生活随笔為你收集整理的Seurat的单细胞免疫组库分析来了!的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 那个一年发四篇Cell的研究生,后来怎么
- 下一篇: R变量索引 - 什么时候使用 @或$