Survival analysis
生活随笔
收集整理的這篇文章主要介紹了
Survival analysis
小編覺得挺不錯的,現在分享給大家,幫大家做個參考.
歡迎關注博客:http://blog.genesino.com/2017/08/Survival-analysis-plots/
準備工作
#安裝R包 #install.packages(c("survival", "survminer")) #加載R包 library(survival) library(survminer) #survival包里包含的數據集 data(package="survival") #以肺癌數據為例,顯示數據前六行 head(lung) ## inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss ## 1 3 306 2 74 1 1 90 100 1175 NA ## 2 3 455 2 68 1 0 90 90 1225 15 ## 3 3 1010 1 56 1 0 90 90 NA 15 ## 4 5 210 2 57 1 1 90 60 1150 11 ## 5 1 883 2 60 1 0 100 90 NA 0 ## 6 12 1022 1 74 1 1 50 80 513 0 #查看肺癌數據詳細說明 ?lung #inst: Institution code #time: Survival time in days #status: censoring status 1=censored, 2=dead #age: Age in years #sex: Male=1 Female=2 #ph.ecog: ECOG performance score (0=good 5=dead) #ph.karno: Karnofsky performance score (bad=0-good=100) rated by physician #pat.karno: Karnofsky performance score as rated by patient #meal.cal: Calories consumed at meals #wt.loss: Weight loss in last six months建立生存分析
#創建生存數據對象 surv_obj <- with(lung, Surv(time, status))#用survfit()估計KM生存曲線, 用數據集中所有患者作為分析對象 fit1 <- survfit(surv_obj ~ 1)#顯示壽命表 summary1 <- surv_summary(fit1) head(summary1) ## time n.risk n.event n.censor surv std.err upper lower ## 1 5 228 1 0 0.9956140 0.004395615 1.0000000 0.9870734 ## 2 11 227 3 0 0.9824561 0.008849904 0.9996460 0.9655619 ## 3 12 224 1 0 0.9780702 0.009916654 0.9972662 0.9592437 ## 4 13 223 2 0 0.9692982 0.011786516 0.9919508 0.9471630 ## 5 15 221 1 0 0.9649123 0.012628921 0.9890941 0.9413217 ## 6 26 220 1 0 0.9605263 0.013425540 0.9861367 0.9355811 #中位生存期及95%置信區間 #中位生存期:當累積生存率為0.5時所對應的生存時間,表示有且只有50%的個體可以活過這個時間. fit1 ## Call: survfit(formula = surv_obj ~ 1) ## ## n events median 0.95LCL 0.95UCL ## 228 165 310 285 363 ##25%, 50%, 75%生存期 quantile(fit1, probs=c(0.25, 0.5, 0.75), conf.int=FALSE) ## 25 50 75 ## 170 310 550 #50天和100天生存狀況 summary(fit1, times=c(200, 400)) ## Call: survfit(formula = surv_obj ~ 1) ## ## time n.risk n.event survival std.err lower 95% CI upper 95% CI ## 200 144 72 0.680 0.0311 0.622 0.744 ## 400 57 54 0.377 0.0358 0.313 0.454不同因子生存曲線的比較
例如我們想比較男性和女性肺癌患者生存曲線的差別
#用survfit()估計KM生存曲線, 方程右邊以性別為區分,分別估計男性和女性的生存曲線 fit2 <- survfit(surv_obj ~ sex, data = lung)#分別顯示男性和女性的壽命表 summary2 <- surv_summary(fit2) head(summary2) ## time n.risk n.event n.censor surv std.err upper lower ## 1 11 138 3 0 0.9782609 0.01268978 1.0000000 0.9542301 ## 2 12 135 1 0 0.9710145 0.01470747 0.9994124 0.9434235 ## 3 13 134 2 0 0.9565217 0.01814885 0.9911586 0.9230952 ## 4 15 132 1 0 0.9492754 0.01967768 0.9866017 0.9133612 ## 5 26 131 1 0 0.9420290 0.02111708 0.9818365 0.9038355 ## 6 30 130 1 0 0.9347826 0.02248469 0.9768989 0.8944820 ## strata sex ## 1 sex=1 1 ## 2 sex=1 1 ## 3 sex=1 1 ## 4 sex=1 1 ## 5 sex=1 1 ## 6 sex=1 1 #分別顯示男性和女性的中位生存期及95%置信區間 fit2 ## Call: survfit(formula = surv_obj ~ sex, data = lung) ## ## n events median 0.95LCL 0.95UCL ## sex=1 138 112 270 212 310 ## sex=2 90 53 426 348 550 #用lag-rank test檢驗不同組生存曲線的差異 #survdiff()函數最后一個參數為rho,用于指定檢驗的類型 #rho=0(默認),進行log-rank檢驗或Mantel?Haenszelχ2檢驗,比較各組期望頻數和實際觀察數。如果兩組間的差異水平太大,χ2會較大而P值較小,表示生存曲線有統計學差異. #當rho=1時,進行Gehan-Wilcoxon的Peto校正檢驗,該檢驗賦予早期終點事件較大的權重 surv_diff <- survdiff(surv_obj ~ sex, data = lung) surv_diff ## Call: ## survdiff(formula = surv_obj ~ sex, data = lung) ## ## N Observed Expected (O-E)^2/E (O-E)^2/V ## sex=1 138 112 91.6 4.55 10.3 ## sex=2 90 53 73.4 5.68 10.3 ## ## Chisq= 10.3 on 1 degrees of freedom, p= 0.00131生存曲線可視化
繪制兩條生存曲線
ggsurv <- ggsurvplot(fit2, #survfit object with calculated statistics.data = lung, #data used to fit survival curves.main = "Survival curve", #add title xlab = "Time (Days)", #change the x axis label.font.main = c(16, "bold", "darkblue"), #change title font size, style and colorfont.x = c(14, "bold.italic", "red"), #change x axis label font size, style and colorfont.y = c(14, "bold.italic", "darkred"), #change y axis font size, style and colorfont.tickslab = c(12, "plain", "darkgreen"), #change ticks label font size, style and colorlegend = "bottom", #change legend position#legend = c(0.2, 0.2), #Specify legend position by its coordinateslegend.title = "Sex", #change legend titlelegend.labs = c("Male", "Female"), #change legend labelssize = 1, #change line sizelinetype = "strata", #change line type by groups (i.e. "strata")break.time.by = 250, #break X axis in time intervals by 250#xlim = c(0, 600)), #shorten the survival curvespalette = c("#E7B800", "#2E9FDF"), #custom color palette#palette = "Dark2", #Use brewer color palette "Dark2"#palette = "grey", #Use grey paletteconf.int = TRUE, #Add confidence intervalconf.int.style = "step", #customize style of confidence intervalspval = TRUE, #Add p-value of the Log-Rank test comparing the groupssurv.median.line = "hv", #add horizontal/vertical line at median survival.risk.table = TRUE, #Add risk table#risk.table = "absolute/percentage/abs_pct", #to show absolute number, percentage of subjects, and both at risk by time, respectively.risk.table.col = "strata", #Change risk table color by groupsrisk.table.y.text.col = TRUE, #colour risk table text annotations by strata#risk.table.y.text = FALSE, #show bars instead of names in text annotations in legend of risk table risk.table.height = 0.25, #the height of the risk table. Useful when you have multiple groupsncensor.plot = TRUE, # plot the number of censored subjects at time tncensor.plot.height = 0.25,ggtheme = theme_bw(), #customize plot and risk table with a theme.xlim = c(0, 600) #Change x axis limits (xlim)) ggsurv繪制多條生存曲線
fit3 <- survfit(Surv(time, status) ~ sex + rx + adhere, data = colon )ggsurvplot(fit3, pval = TRUE, #Add p-valuebreak.time.by = 800, #break time axis by 800risk.table = TRUE, #add risk tablerisk.table.col = "strata", #Change risk table color by groupsrisk.table.height = 0.5, #the height of the risk table. Useful when you have multiple groupsggtheme = theme_bw() #change plot theme#legend.labs = c("A", "B", "C", "D", "E", "F") #Change legend labels)Cox回歸
單因素COX回歸
covariates <- c("age", "sex", "ph.karno", "ph.ecog", "meal.cal", "wt.loss") univ_formulas <- sapply(covariates,function(x) as.formula(paste('Surv(time, status)~', x)))univ_models <- lapply( univ_formulas, function(x){coxph(x, data = lung)})# Extract results univ_results <- lapply(univ_models,function(x){ x <- summary(x)p.value<-signif(x$wald["pvalue"], digits=2)wald.test<-signif(x$wald["test"], digits=2)beta<-signif(x$coef[1], digits=2);#coeficient betaHR <-signif(x$coef[2], digits=2);#exp(beta)HR.confint.lower <- signif(x$conf.int[,"lower .95"], 2)HR.confint.upper <- signif(x$conf.int[,"upper .95"],2)HR <- paste0(HR, " (", HR.confint.lower, "-", HR.confint.upper, ")")res<-c(beta, HR, wald.test, p.value)names(res)<-c("beta", "HR (95% CI for HR)", "wald.test", "p.value")return(res)#return(exp(cbind(coef(x),confint(x))))}) results <- t(as.data.frame(univ_results, check.names = FALSE)) as.data.frame(results) ## beta HR (95% CI for HR) wald.test p.value ## age 0.019 1 (1-1) 4.1 0.042 ## sex -0.53 0.59 (0.42-0.82) 10 0.0015 ## ph.karno -0.016 0.98 (0.97-1) 7.9 0.005 ## ph.ecog 0.48 1.6 (1.3-2) 18 2.7e-05 ## meal.cal -0.00012 1 (1-1) 0.29 0.59 ## wt.loss 0.0013 1 (0.99-1) 0.05 0.83多因素COX回歸
多個自變量是如何共同影響因變量
cox_model <- coxph(Surv(time, status) ~ age + sex + ph.ecog + ph.karno, data = lung) summary(cox_model) ## Call: ## coxph(formula = Surv(time, status) ~ age + sex + ph.ecog + ph.karno, ## data = lung) ## ## n= 226, number of events= 163 ## (2 observations deleted due to missingness) ## ## coef exp(coef) se(coef) z Pr(>|z|) ## age 0.012868 1.012951 0.009404 1.368 0.171226 ## sex -0.572802 0.563943 0.169222 -3.385 0.000712 *** ## ph.ecog 0.633077 1.883397 0.176034 3.596 0.000323 *** ## ph.karno 0.012558 1.012637 0.009514 1.320 0.186842 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## exp(coef) exp(-coef) lower .95 upper .95 ## age 1.0130 0.9872 0.9945 1.0318 ## sex 0.5639 1.7732 0.4048 0.7857 ## ph.ecog 1.8834 0.5310 1.3338 2.6594 ## ph.karno 1.0126 0.9875 0.9939 1.0317 ## ## Concordance= 0.632 (se = 0.026 ) ## Rsquare= 0.129 (max possible= 0.999 ) ## Likelihood ratio test= 31.27 on 4 df, p=2.695e-06 ## Wald test = 30.61 on 4 df, p=3.676e-06 ## Score (logrank) test = 31.06 on 4 df, p=2.974e-06COX回歸結果解釋
1. z結果是wald檢驗的結果,z = coef/se(coef),用于檢驗回歸系數是否顯著區別于0 2. 回歸系數(coef) 3. 風險???(hazard ratio, HR): HR = exp(coef), 衡量協變量影響的程度 4. 風險比的95%置信區間 5. 對模型的整體分析,即評價模型是否有意義的三種檢驗(Likelihood ratio test, Wald test, Score (logrank) test)COX模型等比性檢驗
方法一:通過圖形展示,對縱軸和橫軸分別取對數,繪制不同自變量水平下的生存曲線。如果兩條曲線平行,則符合比例風險假定. 方法二:利用cox.zph函數進行檢驗 cox.zph(cox_model) ## rho chisq p ## age 0.0168 0.0502 0.8227 ## sex 0.0881 1.2539 0.2628 ## ph.ecog 0.0600 0.5122 0.4742 ## ph.karno 0.1843 4.4660 0.0346 ## GLOBAL NA 8.0414 0.0901 #檢驗結果中ph.karno具有顯著性(p<0.05),即這一自變量不符合“等比性”要求,這一回歸模型不準確根據檢驗結果調整模型,將不可比的自變量進行分層, 也就是說排除混雜因素的影響,分析研究因素的影響
cox_model1 <- coxph(Surv(time, status) ~ age + sex + ph.ecog + strata(ph.karno), data = lung) summary(cox_model1) ## Call: ## coxph(formula = Surv(time, status) ~ age + sex + ph.ecog + strata(ph.karno), ## data = lung) ## ## n= 226, number of events= 163 ## (2 observations deleted due to missingness) ## ## coef exp(coef) se(coef) z Pr(>|z|) ## age 0.014807 1.014918 0.009651 1.534 0.124972 ## sex -0.591683 0.553395 0.173049 -3.419 0.000628 *** ## ph.ecog 0.483879 1.622356 0.196558 2.462 0.013825 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## exp(coef) exp(-coef) lower .95 upper .95 ## age 1.0149 0.9853 0.9959 1.0343 ## sex 0.5534 1.8070 0.3942 0.7768 ## ph.ecog 1.6224 0.6164 1.1037 2.3848 ## ## Concordance= 0.601 (se = 0.054 ) ## Rsquare= 0.085 (max possible= 0.986 ) ## Likelihood ratio test= 19.99 on 3 df, p=0.0001708 ## Wald test = 19.14 on 3 df, p=0.0002557 ## Score (logrank) test = 19.57 on 3 df, p=0.0002083再次檢驗模型
cox.zph(cox_model1) ## rho chisq p ## age 0.0454 0.363 0.547 ## sex 0.0919 1.289 0.256 ## ph.ecog 0.0358 0.184 0.668 ## GLOBAL NA 1.958 0.581 #哈哈哈,bingo~總結
以上是生活随笔為你收集整理的Survival analysis的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: mac使用的正确操作与注意事项(人体工程
- 下一篇: 轻松使用终端开启macOS系统的隐藏功能