Cibersort免疫浸润的在线分析及R语言代码实现
上期展示了ESITMATE(基于轉(zhuǎn)錄組數(shù)據(jù))計(jì)算免疫得分和腫瘤純度的一個例子,詳見ggplot2實(shí)現(xiàn)分半小提琴圖繪制基因表達(dá)譜和免疫得分。實(shí)際上計(jì)算腫瘤純度的方法還有InfiniumPurify(基于甲基化數(shù)據(jù))、ABSOLUTE(基于體細(xì)胞拷貝數(shù)變異)、PurityEst(基于突變數(shù)據(jù))等等,而計(jì)算免疫浸潤的有Cibersort、ssGSEA、TIMER等算法。
本期我們介紹一下簡單易上手的Cibersort算法。CIBERSORTx(https://cibersortx.stanford.edu/)是Newman等人開發(fā)的一種分析工具,可基于基因表達(dá)數(shù)據(jù)估算免疫細(xì)胞的豐度:
本文主要從以下兩個模塊進(jìn)行介紹:
一. Cibersort算法的純代碼實(shí)現(xiàn)及結(jié)果繪圖展示;
二. Cibersort算法的網(wǎng)頁版實(shí)現(xiàn)。
在模塊一繪圖部分,筆者還根據(jù)一篇文獻(xiàn),將得到的免疫細(xì)胞分為4大類:Total lymphocytes,Total dendritic cell,Total macrophage及Total mast cell。
沿用前幾期的示例數(shù)據(jù),本期使用的還是TCGA頭頸癌的43對癌和癌旁FPKM數(shù)據(jù)。參考文獻(xiàn)《Profiling tumor infiltrating immune cells with CIBERSORT》,可知Cibersort對于輸入數(shù)據(jù)的要求如下:
值得注意的是,雖然文獻(xiàn)建議輸入數(shù)據(jù)要求是non-log linear space,但是檢查Cibersort可看到有一段函數(shù)if(max(Y) < 50) {Y <- 2^Y}(見模塊一的源代碼),專門用于檢測輸入數(shù)據(jù)是否為log化后的輸入數(shù)據(jù),如果是(max < 50),則會自動還原為log前的數(shù)據(jù)。因此,用戶不必糾結(jié)于log與否。
總結(jié)來說,輸入數(shù)據(jù)是芯片數(shù)據(jù)還是RNA-seq數(shù)據(jù)都可以,RNA-seq數(shù)據(jù)不建議直接使用Count,應(yīng)使用FPKM和TPM或DESEq2標(biāo)準(zhǔn)化后的矩陣為宜。
但輸入數(shù)據(jù)的基因名字需要為Gene symbol
注意不能有重復(fù)的基因名和缺失/負(fù)值數(shù)據(jù),其他具體的一些細(xì)節(jié),還是需要用戶去瀏覽上面那篇文獻(xiàn)。
如何處理重復(fù)的基因名,如何得到TCGA的FPKM數(shù)據(jù),可以參考前幾期的推文,利用R代碼從UCSC XENA下載mRNA, lncRNA, miRNA表達(dá)數(shù)據(jù)并匹配臨床信息。
模塊一. Cibersort算法的純代碼實(shí)現(xiàn)及結(jié)果繪圖展示
remove(list = ls()) #一鍵清空 #加載包 library(ggplot2) library(reshape2) library(ggpubr) library(dplyr)1. Cibersort計(jì)算免疫細(xì)胞
加載cibersort的官方提供的源碼,指定基準(zhǔn)數(shù)據(jù)庫文件 (LM22.txt,這是22種免疫細(xì)胞的marker基因,下載自Cibersort官網(wǎng))。
source('./assist/Cibersort.R')# 設(shè)置分析依賴的基礎(chǔ)表達(dá)文件 # 每類免疫細(xì)胞的標(biāo)志性基因及其表達(dá) # 基因名字為Gene symbol LM22.file <- "./database/LM22.txt"加載自己的數(shù)據(jù)用于分析計(jì)算免疫細(xì)胞
# 1. CibersortTCGA_exp.file <- "./Rawdata/TCGA_HNSC_mRNA_fpkm_paired_43vs43.txt"TCGA_TME.results <- CIBERSORT(LM22.file ,TCGA_exp.file, perm = 50, QN = F) # perm置換次數(shù)=1000 # QN如果是芯片設(shè)置為T,如果是測序就設(shè)置為Fwrite.csv(TCGA_TME.results, "./Output/TCGA_CIBERSORT_Results_fromRcode.csv")Permutations for significance analysis是用來計(jì)算單個樣本估算免疫浸潤的p值,大多數(shù)文章會采用1000次。數(shù)值越大,運(yùn)行時間越久,這里筆者為了加速運(yùn)行的速度,選擇了50次 (筆記本運(yùn)行耗時5分鐘)。
2. 分組信息
## 2. 分組信息# TCGA的數(shù)據(jù)還可以從名字獲取 # group_list <- ifelse(as.numeric(substring(rownames(TCGA_TME.results),14,15)) < 10, # "Tumor","Normal") %>% # factor(.,levels = c("Normal","Tumor"))phenotype = read.csv("./Rawdata/TCGA_HNSC_paired_metadata.csv",header = T,row.names = 1)group_list <- phenotype$group %>% factor(.,levels = c("Nontumor","Tumor"))table(group_list) # Normal 43 Tumor 43 ## group_list ## Nontumor Tumor ## 43 433. 繪圖
3.1 數(shù)據(jù)轉(zhuǎn)換預(yù)處理,取前22列,忽略掉后面計(jì)算出的P-value,Correlation, RMSE單列信息。
## 3. 繪圖# 3.1 數(shù)據(jù)粗處理 TME_data <- as.data.frame(TCGA_TME.results[,1:22])TME_data$group <- group_list TME_data$sample <- row.names(TME_data)# 2.2 融合數(shù)據(jù) TME_New = melt(TME_data)## Using group, sample as id variablescolnames(TME_New)=c("Group","Sample","Celltype","Composition") #設(shè)置行名 head(TME_New)## Group Sample Celltype Composition ## 1 Tumor TCGA.CV.6943.01 B cells naive 0.007651678 ## 2 Tumor TCGA.CV.6959.01 B cells naive 0.019549031 ## 3 Nontumor TCGA.CV.7438.11 B cells naive 0.025349204 ## 4 Nontumor TCGA.CV.7242.11 B cells naive 0.032583659 ## 5 Tumor TCGA.CV.7432.01 B cells naive 0.000000000 ## 6 Nontumor TCGA.CV.6939.11 B cells naive 0.0742822933.2 按免疫細(xì)胞占比中位數(shù)排序繪圖(可選)
# 3.3 按免疫細(xì)胞占比中位數(shù)排序繪圖(可選) plot_order = TME_New[TME_New$Group=="Tumor",] %>% group_by(Celltype) %>% summarise(m = median(Composition)) %>% arrange(desc(m)) %>% pull(Celltype)## `summarise()` ungrouping output (override with `.groups` argument)TME_New$Celltype = factor(TME_New$Celltype,levels = plot_order)3.3 繪制箱線圖
# 3.3 出圖 if(T){mytheme <- theme(plot.title = element_text(size = 12,color="black",hjust = 0.5),axis.title = element_text(size = 12,color ="black"), axis.text = element_text(size= 12,color = "black"),panel.grid.minor.y = element_blank(),panel.grid.minor.x = element_blank(),axis.text.x = element_text(angle = 45, hjust = 1 ),panel.grid=element_blank(),legend.position = "top",legend.text = element_text(size= 12),legend.title= element_text(size= 12)) }box_TME <- ggplot(TME_New, aes(x = Celltype, y = Composition))+ labs(y="Cell composition",x= NULL,title = "TME Cell composition")+ geom_boxplot(aes(fill = Group),position=position_dodge(0.5),width=0.5,outlier.alpha = 0)+ scale_fill_manual(values = c("#1CB4B8", "#EB7369"))+theme_classic() + mytheme + stat_compare_means(aes(group = Group),label = "p.signif",method = "wilcox.test",hide.ns = T)box_TME;ggsave("./Output/TCGA_HNSCC_TME.pdf",box_TME,height=15,width=25,unit="cm")除了常規(guī)的結(jié)果展示,筆者還看到有一篇文獻(xiàn)《The Immune Subtypes and Landscape of Squamous Cell Carcinoma》,將Cibersort計(jì)算得到的20類免疫細(xì)胞大致分為4大類:Total lymphocytes,Total dendritic cell,Total macrophage及Total mast cell。此外,還計(jì)算了M1/M2巨噬細(xì)胞的比例。方法部分的描述如下:
4.1. 提取文獻(xiàn)用到的前20種免疫細(xì)胞
### 4.1 取20種免疫細(xì)胞 TCGA_TME_four = as.data.frame(TCGA_TME.results[,1:20]) head(TCGA_TME_four,3)## B cells naive B cells memory Plasma cells T cells CD8 T cells CD4 naive ## TCGA.CV.6943.01 0.007651678 0.00000000 0.04249975 0.33831313 0 ## TCGA.CV.6959.01 0.019549031 0.00000000 0.05950866 0.03842026 0 ## TCGA.CV.7438.11 0.025349204 0.01107915 0.06419163 0.14099744 0 ## T cells CD4 memory resting T cells CD4 memory activated T cells follicular helper ## TCGA.CV.6943.01 0.00000000 0.06796614 0.03913214 ## TCGA.CV.6959.01 0.11798455 0.07211551 0.00000000 ## TCGA.CV.7438.11 0.06486108 0.03390592 0.07309813 ## T cells regulatory (Tregs) T cells gamma delta NK cells resting NK cells activated ## TCGA.CV.6943.01 0.05035414 0.00000000 0.007625709 0.000000000 ## TCGA.CV.6959.01 0.00000000 0.00000000 0.037617016 0.001607423 ## TCGA.CV.7438.11 0.04039972 0.02312377 0.000000000 0.030996791 ## Monocytes Macrophages M0 Macrophages M1 Macrophages M2 Dendritic cells resting ## TCGA.CV.6943.01 0 0.04310127 0.2272938 0.08760478 0.010476052 ## TCGA.CV.6959.01 0 0.29903927 0.1140220 0.08606996 0.008655281 ## TCGA.CV.7438.11 0 0.12215035 0.1099711 0.05154016 0.104453345 ## Dendritic cells activated Mast cells resting Mast cells activated ## TCGA.CV.6943.01 0.00000000 0.07769942 0.0000000 ## TCGA.CV.6959.01 0.03261665 0.00000000 0.0919163 ## TCGA.CV.7438.11 0.03317884 0.06982843 0.00000004.2 根據(jù)文獻(xiàn)整理得到的免疫細(xì)胞分類
# 4.2 根據(jù)文獻(xiàn)整理得到的免疫細(xì)胞分類 immCell_four_type <- read.table("./database/Cibersort_four_types.txt", header = T, row.names = NULL, sep = "\t") colnames(TCGA_TME_four) == immCell_four_type$Immune.cells #T## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUEhead(immCell_four_type)## Immune.cells Types ## 1 B cells naive Lymphocytes ## 2 B cells memory Lymphocytes ## 3 Plasma cells Lymphocytes ## 4 T cells CD8 Lymphocytes ## 5 T cells CD4 naive Lymphocytes ## 6 T cells CD4 memory resting Lymphocytes4.3 計(jì)算每一個大類的免疫得分
# 4.3 數(shù)據(jù)預(yù)處理 TCGA_TME_four$group = group_list TCGA_TME_four$sample <- row.names(TCGA_TME_four) TME_four_new = melt(TCGA_TME_four)## Using group, sample as id variablescolnames(TME_four_new) = c("Group","Sample","Immune.cells","Composition")TCGA_TME_four_new2 = left_join(TME_four_new, immCell_four_type, by = "Immune.cells") %>% group_by(Sample,Group,Types) %>%summarize(Sum = sum(Composition))## `summarise()` regrouping output by 'Sample', 'Group' (override with `.groups` argument)# 出圖 box_four_immtypes <- ggplot(TCGA_TME_four_new2, aes(x = Group, y = Sum))+ labs(y="Cell composition",x= NULL,title = "TCGA")+ geom_boxplot(aes(fill = Group),position=position_dodge(0.5),width=0.5,size=0.4,outlier.alpha = 1, outlier.size = 0.5)+ theme_bw() + mytheme + scale_fill_manual(values = c("#1CB4B8","#EB7369"))+ scale_y_continuous(labels = scales::percent)+facet_wrap(~ Types,scales = "free",ncol = 4) + stat_compare_means(aes(group = Group),label = "p.format",method = "wilcox.test",size = 3.5,hide.ns = T) box_four_immtypes;ggsave("./Output/TCHA_HNSC_Cibersort_four_immune_cell_types.pdf",box_four_immtypes ,height= 10,width=25,unit="cm")模塊二. Cibersort網(wǎng)頁版實(shí)現(xiàn)
網(wǎng)頁版和代碼版輸出的結(jié)果是等價的。區(qū)別在于用R代碼運(yùn)行Cibersort非常耗時,但勝在比較自由方便;而網(wǎng)頁版的好處在于在線運(yùn)行數(shù)據(jù),上傳和運(yùn)行之后,即使關(guān)閉網(wǎng)頁也能拿到數(shù)據(jù),缺點(diǎn)在于網(wǎng)頁版不太穩(wěn)定,網(wǎng)絡(luò)不好的時候很難登錄和使用。
筆者再介紹一下網(wǎng)頁版的使用步驟:
首先是注冊和登錄,登錄網(wǎng)址是https://cibersortx.stanford.edu/,注冊需要edu的郵箱。
第二步是上傳數(shù)據(jù),如下圖所示,點(diǎn)擊Menu—Upload files—Add files上傳txt數(shù)據(jù),數(shù)據(jù)格式詳見示例數(shù)據(jù)。
第三步,配置參數(shù),準(zhǔn)備運(yùn)行,點(diǎn)擊Menu—Run CIBERSORTx—2.Impute Cell Fractions,具體的配置如下:
和代碼版的一樣,為了加速運(yùn)行的速度,Permutations for significance analysis這里選擇了50次:
第四步,運(yùn)行一段時間之后,可以看到結(jié)果,Menu—Job Results,點(diǎn)擊CSV或者XLSX可得到預(yù)測的結(jié)果,即為模塊一的輸出數(shù)據(jù)。
sessionInfo()## R version 4.0.4 (2021-02-15) ## Platform: x86_64-w64-mingw32/x64 (64-bit) ## Running under: Windows 7 x64 (build 7601) Service Pack 1 ## ## Matrix products: default ## ## locale: ## [1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 ## [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936 ## [3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936 ## [4] LC_NUMERIC=C ## [5] LC_TIME=Chinese (Simplified)_People's Republic of China.936 ## ## attached base packages: ## [1] parallel stats graphics grDevices utils datasets methods base ## ## other attached packages: ## [1] preprocessCore_1.50.0 e1071_1.7-3 dplyr_1.0.0 ggpubr_0.4.0 ## [5] reshape2_1.4.4 ggplot2_3.3.2 ## ## loaded via a namespace (and not attached): ## [1] tinytex_0.24 tidyselect_1.1.0 xfun_0.15 purrr_0.3.4 haven_2.3.1 ## [6] carData_3.0-4 colorspace_1.4-1 vctrs_0.3.1 generics_0.1.0 htmltools_0.5.0 ## [11] yaml_2.2.1 rlang_0.4.6 pillar_1.4.6 foreign_0.8-80 glue_1.4.1 ## [16] withr_2.2.0 readxl_1.3.1 lifecycle_0.2.0 plyr_1.8.6 stringr_1.4.0 ## [21] munsell_0.5.0 ggsignif_0.6.0 gtable_0.3.0 cellranger_1.1.0 zip_2.1.1 ## [26] evaluate_0.14 labeling_0.3 knitr_1.29 rio_0.5.16 forcats_0.5.0 ## [31] class_7.3-17 curl_4.3 fansi_0.4.1 broom_0.7.0 Rcpp_1.0.5 ## [36] scales_1.1.1 backports_1.1.7 abind_1.4-5 farver_2.0.3 hms_0.5.3 ## [41] digest_0.6.25 stringi_1.4.6 openxlsx_4.1.5 rstatix_0.6.0 grid_4.0.2 ## [46] cli_2.0.2 tools_4.0.2 magrittr_1.5 tibble_3.0.2 crayon_1.3.4 ## [51] tidyr_1.1.0 car_3.0-8 pkgconfig_2.0.3 ellipsis_0.3.1 data.table_1.12.8 ## [56] assertthat_0.2.1 rmarkdown_2.7 rstudioapi_0.11 R6_2.4.1 igraph_1.2.5 ## [61] compiler_4.0.2參考資料:
Newman AM, Steen CB, Liu CL, Gentles AJ, Chaudhuri AA, Scherer F, Khodadoust MS, Esfahani MS, Luca BA, Steiner D, Diehn M, Alizadeh AA. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol. 2019 Jul;37(7):773-782. doi: 10.1038/s41587-019-0114-2. Epub 2019 May 6. PMID: 31061481; PMCID: PMC6610714.
Chen B, Khodadoust MS, Liu CL, Newman AM, Alizadeh AA. Profiling Tumor Infiltrating Immune Cells with CIBERSORT. Methods Mol Biol. 2018;1711:243-259. doi: 10.1007/978-1-4939-7493-1_12. PMID: 29344893; PMCID: PMC5895181.
Li B, Cui Y, Nambiar DK, Sunwoo JB, Li R. The Immune Subtypes and Landscape of Squamous Cell Carcinoma. Clin Cancer Res. 2019 Jun 15;25(12):3528-3537. doi: 10.1158/1078-0432.CCR-18-4085. Epub 2019 Mar 4. PMID: 30833271; PMCID: PMC6571041.
數(shù)據(jù)和代碼下載:
https://gitee.com/ct5869/shengxin-baodian/tree/master/TCGA
作者:趙法明
編輯:生信寶典
往期精品(點(diǎn)擊圖片直達(dá)文字對應(yīng)教程)
機(jī)器學(xué)習(xí)
后臺回復(fù)“生信寶典福利第一波”或點(diǎn)擊閱讀原文獲取教程合集
總結(jié)
以上是生活随笔為你收集整理的Cibersort免疫浸润的在线分析及R语言代码实现的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 综述之我的十年本硕博生活
- 下一篇: 2020年度国家自然科学基金医学领域结果