當(dāng)前位置：首頁(yè) > 编程资源 > 编程问答 >内容正文

编程问答

Seurat的单细胞免疫组库分析来了！

發(fā)布時(shí)間：2025/3/15 编程问答 12 豆豆

生活随笔收集整理的這篇文章主要介紹了 Seurat的单细胞免疫组库分析来了！小編覺得挺不錯(cuò)的,現(xiàn)在分享給大家,幫大家做個(gè)參考.

使用Seurat進(jìn)行單細(xì)胞VDJ免疫分析

NGS系列文章包括NGS基礎(chǔ)、轉(zhuǎn)錄組分析?（Nature重磅綜述|關(guān)于RNA-seq你想知道的全在這）、ChIP-seq分析?（ChIP-seq基本分析流程）、單細(xì)胞測(cè)序分析?(重磅綜述：三萬字長(zhǎng)文讀懂單細(xì)胞RNA測(cè)序分析的最佳實(shí)踐教程（原理、代碼和評(píng)述）)、DNA甲基化分析、重測(cè)序分析、GEO數(shù)據(jù)挖掘（典型醫(yī)學(xué)設(shè)計(jì)實(shí)驗(yàn)GEO數(shù)據(jù)分析 (step-by-step) - Limma差異分析、火山圖、功能富集）等內(nèi)容。

小劇場(chǎng)：

我：老板，你聽說沒有樓上做的單細(xì)胞實(shí)驗(yàn)加了VDJ分析？
老板：VCD分析？哦，，我早就聽過了！
我：老板，是VDJ分析？
老板：VDJ分析，哦對(duì)對(duì)就是VDJ分析，，，我早就說咱們也應(yīng)該做了，你看看，又遲了一步，都怪你不提醒我！
我瞟了一眼她桌子上的淘寶頁(yè)面（廉價(jià)燈芯褲），默默走開了。。。

其實(shí)我在介紹clonotypr (令我驚奇的是，當(dāng)我把推送推出去的當(dāng)天，我親愛的作者就把該包從github撤了下來啊！)時(shí)說明過VDJ免疫分析對(duì)免疫及抗體產(chǎn)生的重要意義，這也是為什么現(xiàn)在許多做新冠單細(xì)胞分析的都會(huì)使用5’端測(cè)序聯(lián)用VDJ測(cè)序分析。

1. 為什么要進(jìn)行單細(xì)胞免疫組庫(kù)的分析？

應(yīng)用方向一：探索腫瘤免疫微環(huán)境，輔助免疫治療。

每個(gè)人都擁有一個(gè)自己的適應(yīng)性免疫組庫(kù)，TCR和BCR通過基因重組和體細(xì)胞突變?nèi)〉枚鄻有?#xff0c;使得我們身體可以識(shí)別和抵御各種內(nèi)部和外部的入侵者。而腫瘤的發(fā)生往往躲避了人體T淋巴細(xì)胞而產(chǎn)生、增殖和轉(zhuǎn)移。

使用10X Genomics ChromiumTM Single Cell Immune Profiling Solution可以捕捉腫瘤發(fā)生時(shí)的免疫微環(huán)境變化，尋找免疫治療的靶點(diǎn)，從而輔助免疫治療更好地抗擊腫瘤。

應(yīng)用方向二：探索自身免疫性疾病和炎癥性疾病發(fā)生機(jī)制，輔助疫苗的研究

自身免疫性疾病發(fā)生起始和發(fā)展的中心環(huán)節(jié)被認(rèn)為是抗原特異性T細(xì)胞激活導(dǎo)致的，使用10X Genomics ChromiumTM Single Cell Immune Profiling Solution，可以解析自身免疫性疾病的發(fā)病機(jī)制，從而為疾病的診療提供依據(jù)。

應(yīng)用方向三：移植和免疫重建

器官或者骨髓移植時(shí)，經(jīng)常會(huì)誘發(fā)宿主的排斥反應(yīng)，從而發(fā)生慢性移植抗宿主病。同種異體反應(yīng)隨機(jī)分布在整個(gè)T細(xì)胞組庫(kù)的交叉反應(yīng)，因此延遲T細(xì)胞恢復(fù)和限制的T細(xì)胞受體多樣性與異體移植后感染和疾病復(fù)發(fā)風(fēng)險(xiǎn)增加相關(guān)。

而我比較注意的是在疫苗接種前后BCR/TCR CDR3免疫組庫(kù)的分析，最近medRxiv上發(fā)表的有關(guān)新冠的文獻(xiàn)Immune Cell Profiling of COVID-19 Patients in the recovery stage by Single-cell sequencing中對(duì)不同BCR/TCR的VDJ重排進(jìn)行分析，揭示了針對(duì)新冠特異的克隆擴(kuò)增。

2. 免疫組庫(kù)主要包括哪幾個(gè)方面？

T淋巴細(xì)胞（T cell）和B淋巴細(xì)胞（B cell）主要負(fù)責(zé)適應(yīng)性免疫應(yīng)答，其抗原識(shí)別主要依賴于T細(xì)胞受體（T cell recptor, TCR）和B細(xì)胞受體（B cell recptor, BCR），這兩類細(xì)胞表面分子的共同特點(diǎn)是其多樣性，可以識(shí)別多種多樣的抗原分子。BCR的輕鏈和TCRβ鏈由V、D、J、C四個(gè)基因片段組成，BCR的重鏈和TCRα鏈由V、J、C三個(gè)基因片段組成，這些基因片段在遺傳過程中發(fā)生重組、重排，組合成不同的形式，保證了受體多樣性。其中變化最大的就是CDR3區(qū)。

3. 10× Genomics VDJ測(cè)序進(jìn)行cellranger后的輸出形式是什么樣的？

Outputs: - Run summary HTML: /home/jdoe/runs/sample345/outs/web_summary.html - Run summary CSV: /home/jdoe/runs/sample345/outs/metrics_summary.csv - All-contig FASTA: /home/jdoe/runs/sample345/outs/all_contig.fasta - All-contig FASTA index: /home/jdoe/runs/sample345/outs/all_contig.fasta.fai - All-contig FASTQ: /home/jdoe/runs/sample345/outs/all_contig.fastq - Read-contig alignments: /home/jdoe/runs/sample345/outs/all_contig.bam - Read-contig alignment index: /home/jdoe/runs/sample345/outs/all_contig.bam.bai - All contig annotations (JSON): /home/jdoe/runs/sample345/outs/all_contig_annotations.json - All contig annotations (BED): /home/jdoe/runs/sample345/outs/all_contig_annotations.bed - All contig annotations (CSV): /home/jdoe/runs/sample345/outs/all_contig_annotations.csv - Filtered contig sequences FASTA: /home/jdoe/runs/sample345/outs/filtered_contig.fasta - Filtered contig sequences FASTQ: /home/jdoe/runs/sample345/outs/filtered_contig.fastq - Filtered contigs (CSV): /home/jdoe/runs/sample345/outs/filtered_contig_annotations.csv - Clonotype consensus FASTA: /home/jdoe/runs/sample345/outs/consensus.fasta - Clonotype consensus FASTA index: /home/jdoe/runs/sample345/outs/consensus.fasta.fai - Clonotype consensus FASTQ: /home/jdoe/runs/sample345/outs/consensus.fastq - Concatenated reference sequences: /home/jdoe/runs/sample345/outs/concat_ref.fasta - Concatenated reference index: /home/jdoe/runs/sample345/outs/concat_ref.fasta.fai - Contig-consensus alignments: /home/jdoe/runs/sample345/outs/consensus.bam - Contig-consensus alignment index: /home/jdoe/runs/sample345/outs/consensus.bam.bai - Contig-reference alignments: /home/jdoe/runs/sample345/outs/concat_ref.bam - Contig-reference alignment index: /home/jdoe/runs/sample345/outs/concat_ref.bam.bai - Clonotype consensus annotations (JSON): /home/jdoe/runs/sample345/outs/consensus_annotations.json - Clonotype consensus annotations (CSV): /home/jdoe/runs/sample345/outs/consensus_annotations.csv - Clonotype info: /home/jdoe/runs/sample345/outs/clonotypes.csv - Barcodes that are declared to be targeted cells: /home/jdoe/runs/sample345/out/cell_barcodes.json - Loupe V(D)J Browser file: /home/jdoe/runs/sample345/outs/vloupe.vloupe

首先我們來看看web.html對(duì)整個(gè)測(cè)序質(zhì)量的評(píng)估：

我們看到在Enrichment中reads映射到VDJ基因的比例為80.7%，其中TRA/TRB代表TCR α/β鏈，map到TRA的比例為24.4%，map到TRB的比例為56%。當(dāng)然，后面也會(huì)有蠻多指標(biāo)的，比如VDJ注釋，VDJ質(zhì)量及表達(dá)等。。。

當(dāng)然我們也會(huì)有很多表格，其中最重要的表格為contig_annotation和clonotype：

contig_annotation(BCR示例)

上面表格中的IGH和IGK/IGL代表BCR H和BCR L鏈。看到這個(gè)表格，我第一反應(yīng)其實(shí)是為什么D區(qū)基因（d_gene）多數(shù)均為None，主要原因還是D區(qū)通常較短又突變較多，因技術(shù)限制而常常捕捉不到。數(shù)據(jù)中也提供了CDR3的蛋白序列和核苷酸序列。

clonotype(TCR示例)

從以上數(shù)據(jù)可以看出，有部分克隆是由單鏈決定的。

那么如何將VDJ的克隆表型和scRNA-seq結(jié)合起來呢？其實(shí)大佬已經(jīng)回答了這個(gè)問題：

add_clonotype <- function(tcr_location, seurat_obj){tcr <- read.csv(paste(tcr_folder,"filtered_contig_annotations.csv", sep=""))# Remove the -1 at the end of each barcode.# Subsets so only the first line of each barcode is kept,# as each entry for given barcode will have same clonotype.tcr$barcode <- gsub("-1", "", tcr$barcode)tcr <- tcr[!duplicated(tcr$barcode), ]# Only keep the barcode and clonotype columns.# We'll get additional clonotype info from the clonotype table.tcr <- tcr[,c("barcode", "raw_clonotype_id")]names(tcr)[names(tcr) == "raw_clonotype_id"] <- "clonotype_id"# Clonotype-centric info.clono <- read.csv(paste(tcr_folder,"clonotypes.csv", sep=""))# Slap the AA sequences onto our original table by clonotype_id.tcr <- merge(tcr, clono[, c("clonotype_id", "cdr3s_aa")])# Reorder so barcodes are first column and set them as rownames.tcr <- tcr[, c(2,1,3)]rownames(tcr) <- tcr[,1]tcr[,1] <- NULL# Add to the Seurat object's metadata.clono_seurat <- AddMetaData(object=seurat_obj, metadata=tcr)return(clono_seurat)}

怎么用呢？舉個(gè)栗子吧：

數(shù)據(jù)下載

download.file("https://bioshare.bioinformatics.ucdavis.edu/bioshare/download/iimg5mz77whzzqc/vdj_v1_mm_balbc_pbmc.zip", "vdj_v1_mm_balbc_pbmc.zip")#這是小鼠的PBMC數(shù)據(jù)

加載R包

library(Seurat) library(cowplot)

加載數(shù)據(jù)

## Cellranger balbc_pbmc <- Read10X_h5("vdj_v1_mm_balbc_pbmc/vdj_v1_mm_balbc_pbmc_5gex_filtered_feature_bc_matrix.h5")s_balbc_pbmc <- CreateSeuratObject(counts = balbc_pbmc, min.cells = 3, min.features = 200, project = "cellranger")

提取線粒體基因

s_balbc_pbmc$percent.mito <- PercentageFeatureSet(s_balbc_pbmc, pattern = "^mt-")

增加T和B細(xì)胞的克隆信息

add_clonotype <- function(tcr_prefix, seurat_obj, type="t"){tcr <- read.csv(paste(tcr_prefix,"filtered_contig_annotations.csv", sep=""))# Remove the -1 at the end of each barcode.（注意，此步驟如果標(biāo)記使用不同的barcode，比如多了個(gè)-1,可以使用 tcr$barcode <- gsub("-1", "", tcr$barcode)進(jìn)行提取）# Subsets so only the first line of each barcode is kept,# as each entry for given barcode will have same clonotype.tcr <- tcr[!duplicated(tcr$barcode), ]# Only keep the barcode and clonotype columns.# We'll get additional clonotype info from the clonotype table.tcr <- tcr[,c("barcode", "raw_clonotype_id")]names(tcr)[names(tcr) == "raw_clonotype_id"] <- "clonotype_id"# Clonotype-centric info.clono <- read.csv(paste(tcr_prefix,"clonotypes.csv", sep=""))# Slap the AA sequences onto our original table by clonotype_id.tcr <- merge(tcr, clono[, c("clonotype_id", "cdr3s_aa")])names(tcr)[names(tcr) == "cdr3s_aa"] <- "cdr3s_aa"# Reorder so barcodes are first column and set them as rownames.tcr <- tcr[, c(2,1,3)]rownames(tcr) <- tcr[,1]tcr[,1] <- NULLcolnames(tcr) <- paste(type, colnames(tcr), sep="_")# Add to the Seurat object's metadata.clono_seurat <- AddMetaData(object=seurat_obj, metadata=tcr)return(clono_seurat) }s_balbc_pbmc <- add_clonotype("vdj_v1_mm_balbc_pbmc/vdj_v1_mm_balbc_pbmc_t_", s_balbc_pbmc, "t") s_balbc_pbmc <- add_clonotype("vdj_v1_mm_balbc_pbmc/vdj_v1_mm_balbc_pbmc_b_", s_balbc_pbmc, "b") head(s_balbc_pbmc[[]])

我給解釋一下以上function中每一步都在干什么：

首先讀入contig_annotations.csv，并賦給tcr；
去除tcr中重復(fù)的barcode，即如果具有相同的barcode，將以第一次出現(xiàn)的barcode為主來去重；
將tcr中的barcode，raw_clonotype_id賦值于tcr；
讀入clonotypes.csv，并賦給clono；
將tcr和colono進(jìn)行merge（單細(xì)胞分析Seurat使用相關(guān)的10個(gè)問題答疑精選！），并賦給tcr；
將最后得到的，帶有barcode，raw_clonotype_id和colono的tcr對(duì)象以metadata的形式加入seurat object中。

發(fā)現(xiàn)很多的NA，非常正常啊，不是每個(gè)細(xì)胞都是T/B cell，然后還列出來了T/B CDR3的蛋白序列。

table(!is.na(s_balbc_pbmc$t_clonotype_id),!is.na(s_balbc_pbmc$b_clonotype_id))

s_balbc_pbmc <- subset(s_balbc_pbmc, cells = colnames(s_balbc_pbmc)[!(!is.na(s_balbc_pbmc$t_clonotype_id) & !is.na(s_balbc_pbmc$b_clonotype_id))]) #刪除同時(shí)表達(dá)T、B克隆表型的細(xì)胞 s_balbc_pbmc

進(jìn)行常規(guī)workflow

s_balbc_pbmc <- subset(s_balbc_pbmc, percent.mito <= 10)s_balbc_pbmc <- subset(s_balbc_pbmc, nCount_RNA >= 500 & nCount_RNA <= 40000) s_balbc_pbmc <- NormalizeData(s_balbc_pbmc, normalization.method = "LogNormalize", scale.factor = 10000) s_balbc_pbmc <- FindVariableFeatures(s_balbc_pbmc, selection.method = "vst", nfeatures = 2000)all.genes <- rownames(s_balbc_pbmc) s_balbc_pbmc <- ScaleData(s_balbc_pbmc, features = all.genes) s_balbc_pbmc <- RunPCA(s_balbc_pbmc, features = VariableFeatures(object = s_balbc_pbmc))

use.pcs = 1:30 s_balbc_pbmc <- FindNeighbors(s_balbc_pbmc, dims = use.pcs) s_balbc_pbmc <- FindClusters(s_balbc_pbmc, resolution = c(0.5)) s_balbc_pbmc <- RunUMAP(s_balbc_pbmc, dims = use.pcs) DimPlot(s_balbc_pbmc, reduction = "umap", label = TRUE)

讓我們看看T細(xì)胞的marker的表達(dá)情況：

t_cell_markers <- c("Cd3d","Cd3e") FeaturePlot(s_balbc_pbmc, features = t_cell_markers)

比如我知道某個(gè)b_cdr3s_aa我感覺特有意思，想在UMAP 圖上進(jìn)行表示，于是我先把他的蛋白序列找了出來，IGH:CARWGGYGYDGGYFDYW;IGK:CGQSYSYPYTF，然后：

DimPlot(s_balbc_pbmc, cells.highlight = Cells(subset(s_balbc_pbmc, subset = b_cdr3s_aa + == "IGH:CARWGGYGYDGGYFDYW;IGK:CGQSYSYPYTF")))

萬“綠”叢中一點(diǎn)紅！

當(dāng)然你也可以在metadata中納入更多的TCR/BCR中的有關(guān)信息，比如我們把annotation.csv中的chains也納入進(jìn)來的話，就是這個(gè)樣子滴：

p2 <-DimPlot(s_balbc_pbmc,group.by = "t_chain") p2

參考來源

https://github.com/ucdavis-bioinformatics-training/2020-Advanced_Single_Cell_RNA_Seq/blob/master/data_analysis/VDJ_Analysis_fixed.md

你可能還想看

NBT：單細(xì)胞轉(zhuǎn)錄組新降維可視化方法PHATE
復(fù)現(xiàn)nature communication PCA原圖|代碼分析（一）
這篇Nature子刊文章的蛋白組學(xué)數(shù)據(jù)PCA分析竟花費(fèi)了我兩天時(shí)間來重現(xiàn)|附全過程代碼
一個(gè)R包玩轉(zhuǎn)單細(xì)胞免疫組庫(kù)分析，還能與Seurat無縫對(duì)接

往期精品(點(diǎn)擊圖片直達(dá)文字對(duì)應(yīng)教程)

后臺(tái)回復(fù)“生信寶典福利第一波”或點(diǎn)擊閱讀原文獲取教程合集

總結(jié)

以上是生活随笔為你收集整理的Seurat的单细胞免疫组库分析来了！的全部?jī)?nèi)容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網(wǎng)站內(nèi)容還不錯(cuò)，歡迎將生活随笔推薦給好友。

上一篇：那个一年发四篇Cell的研究生，后来怎么
下一篇： R变量索引 - 什么时候使用 @或$