aMCMC for Horseshoe: algorithms
- Exact Algorithm
- Step 1: sampling $\eta$
- Step 2: sampling $\xi$
- Step 3: sampling $\sigma^2$
- Step 4: sampling $\beta$
- Approximate Algorithm
- Correlation and Traceplot
A limitation of Bayesian shrinkage: lack of computational efficiency in very high dimensions.
Framework
$$L(z|W\beta,\sigma^2)=(2\pi \sigma^2)^{-\frac{N}{2}}e^{-\frac{1}{2\sigma^2}(z-W\beta)'(z-W\beta)}$$
$$\beta_j|\sigma^2,\eta_j,\xi \sim_{iid} N\left(0,\frac{\sigma^2}{\xi \eta_j}\right),\quad j=1,\cdots,p$$
$$\eta_j^{-1/2} \sim_{iid} C^+(0,1),\quad \xi^{-1/2} \sim C^+(0,1),\quad \sigma^2 \sim IG\left(\frac{w}{2},\frac{w}{2}\right)$$
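For concreteness, here is a small synthetic instance of this model; the sizes, seed, and sparsity level are illustrative assumptions of mine, and the code sketches in later steps assume arrays of these shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, w = 100, 500, 1.0                    # sample size, dimension, IG hyperparameter
W = rng.normal(size=(N, p))                # design matrix
beta_true = np.zeros(p)
beta_true[:5] = rng.normal(scale=2.0, size=5)   # sparse truth: 5 active coefficients
z = W @ beta_true + rng.normal(size=N)     # responses with sigma2 = 1
```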
Exact Algorithm
A blocked Metropolis-within-Gibbs algorithm: each sweep samples $\eta\,|\,\beta,\sigma^2,\xi$, then $\xi\,|\,\eta$, then $\sigma^2\,|\,\eta,\xi$, and finally $\beta\,|\,\eta,\xi,\sigma^2$ (Steps 1-4 below), where
$$D = \operatorname{diag}(\eta_j^{-1}),\qquad M_{\xi}=I_N+\xi^{-1}WDW'$$
$$p(\xi|\eta)=|M_{\xi}|^{-1/2}\left(\frac{w+z'M_{\xi}^{-1}z}{2}\right)^{-\frac{w+N}{2}}\frac{1}{\sqrt{\xi}(1+\xi)}$$
Step 1: sampling $\eta$
$$p(\eta_j|\xi,\beta,\sigma^2)\propto p(\beta_j|\eta_j,\xi,\sigma^2)\,p(\eta_j) \propto \sqrt{\eta_j}\,e^{-\frac{\beta_j^2\eta_j\xi}{2\sigma^2}}\,p(\eta_j)$$
Since $\eta_j^{-1/2} \sim C^+(0,1) =_d X$, change variables from $X$ to $\eta_j$:
$$p(\eta_j) = \frac{2}{\pi}\frac{1}{1+(\eta_j^{-1/2})^2}\left|-\frac{1}{2}\eta_j^{-3/2}\right| = \frac{1}{\pi}\frac{1}{\eta_j^{1/2}+\eta_j^{3/2}}$$
$$p(\eta_j|\xi,\beta,\sigma^2) \propto \sqrt{\eta_j}\,e^{-\frac{\beta_j^2\eta_j\xi}{2\sigma^2}}\,\frac{1}{\eta_j^{1/2}+\eta_j^{3/2}} = \frac{1}{1+\eta_j}\,e^{-\frac{\beta_j^2\xi}{2\sigma^2}\eta_j}$$
This is exactly $h_{\epsilon}$ below with $t=\eta_j$ and $\epsilon=\beta_j^2\xi/(2\sigma^2)$.
Rejection sampler (Appendix S1)
Given $\epsilon \in (0,1)$, this sampler is used to sample from
$$h_{\epsilon}(t)=C_{\epsilon}\,\frac{e^{-\epsilon t}}{1+t},\quad t>0$$
where
$$\int_0^{\infty}h_{\epsilon}(t)\,dt=\int_0^{\infty}C_{\epsilon}\frac{e^{-\epsilon t}}{1+t}\,dt=1 \;\Rightarrow\; C_{\epsilon}=\frac{e^{-\epsilon}}{E_1(\epsilon)},\qquad E_1(\epsilon)=\int_{\epsilon}^{+\infty}\frac{e^{-t}}{t}\,dt$$
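As a quick numerical sanity check (not from the source), $E_1$ is available as scipy.special.exp1, so the claimed normalizing constant can be verified directly:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import exp1

eps = 0.3
C_eps = np.exp(-eps) / exp1(eps)           # claimed C_eps = e^{-eps} / E_1(eps)
total, _ = quad(lambda t: C_eps * np.exp(-eps * t) / (1 + t), 0, np.inf)
print(total)                               # ~1.0, so h_eps integrates to one
```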
Algorithm
Write $h_{\epsilon}(t) \propto e^{-f(t)}$ with $f(t)=\epsilon t+\log(1+t)$, and let $f_L$ be a lower bound of $f$, so that $h_L \propto e^{-f_L}$ dominates $h_{\epsilon}$.
Step 1: Draw $z \sim h_L$ and $u \sim U(0,1)$ independently. In Appendix S1, $f_L$ is piecewise linear, so $h_L$ is piecewise exponential and $z$ can be drawn by inversion.
Step 2: Accept $z$ as a sample from $h_{\epsilon}$ if $u \le e^{-(f-f_L)(z)}$; otherwise return to Step 1. Here the acceptance probability is $e^{-(f-f_L)(z)}$.
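A minimal runnable sketch, assuming the crude lower bound $f_L(t)=\epsilon t$ rather than the sharper piecewise-linear bound of Appendix S1; then $h_L$ is the Exponential($\epsilon$) density and $e^{-(f-f_L)(z)}=1/(1+z)$:

```python
import numpy as np

def sample_h_eps(eps, rng):
    """Rejection sampler for h_eps(t) ∝ exp(-eps * t) / (1 + t), t > 0.

    Here f(t) = eps*t + log(1+t); with f_L(t) = eps*t the envelope h_L
    is Exponential(eps) and the acceptance probability is 1 / (1 + z).
    """
    while True:
        z = rng.exponential(scale=1.0 / eps)   # Step 1: z ~ h_L
        if rng.uniform() <= 1.0 / (1.0 + z):   # Step 2: accept w.p. e^{-(f-f_L)(z)}
            return z

rng = np.random.default_rng(0)
# In Step 1 of the Gibbs sweep: eta_j = sample_h_eps(beta_j**2 * xi / (2 * sigma2), rng)
print(sample_h_eps(0.5, rng))
```

For small $\epsilon$ this envelope accepts rarely, which is why a tighter piecewise-linear $f_L$ pays off in practice.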
Step 2: sampling $\xi$
Transition (random walk on the log scale): $\log(\xi^*) \sim N(\log(\xi),s)$
The acceptance probability is
$$\alpha = \min\left(\frac{g(\xi|\xi^*)\,p(\xi^*|\eta,z)}{g(\xi^*|\xi)\,p(\xi|\eta,z)},\,1\right)$$
Calculate $p(\xi|\eta,z)$ (needed only up to its normalizing constant):
$$p(\xi|\eta,z)\propto \iint p(z|\beta,\sigma^2)\,p(\beta|\sigma^2,\eta,\xi)\,p(\sigma^2)\,p(\xi)\,d\beta\,d\sigma^2 \propto |M_{\xi}|^{-1/2}\left(\frac{w+z'M_{\xi}^{-1}z}{2}\right)^{-\frac{N+w}{2}}\frac{1}{\xi^{1/2}+\xi^{3/2}}=p(\xi|\eta)$$
Calculate the ratio (the log-normal proposal contributes a Jacobian factor $\xi^*/\xi$):
$$\frac{g(\xi|\xi^*)\,p(\xi^*|\eta,z)}{g(\xi^*|\xi)\,p(\xi|\eta,z)}=\frac{\frac{1}{\sqrt{2\pi s}\,\xi}e^{-\frac{(\log \xi - \log \xi^*)^2}{2s}}\,p(\xi^*|\eta)}{\frac{1}{\sqrt{2\pi s}\,\xi^*}e^{-\frac{(\log \xi^* - \log \xi)^2}{2s}}\,p(\xi|\eta)} = \frac{p(\xi^*|\eta)\,\xi^*}{p(\xi|\eta)\,\xi}$$
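A minimal sketch of this MH update (function and variable names are mine; the dense Cholesky here is the $O(N^3)$ cost quoted for the exact algorithm below):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def log_p_xi_given_eta(xi, W, eta, z, w):
    """log p(xi | eta) up to a constant, using M_xi = I_N + xi^{-1} W D W'."""
    N = W.shape[0]
    M = np.eye(N) + (W / eta) @ W.T / xi        # D = diag(1/eta_j)
    c, low = cho_factor(M)
    log_det = 2.0 * np.sum(np.log(np.diag(c)))  # log |M_xi|
    quad = z @ cho_solve((c, low), z)           # z' M_xi^{-1} z
    return (-0.5 * log_det
            - 0.5 * (N + w) * np.log((w + quad) / 2.0)
            - 0.5 * np.log(xi) - np.log1p(xi))  # prior factor 1/(sqrt(xi)(1+xi))

def mh_step_xi(xi, W, eta, z, w, s, rng):
    """One random-walk Metropolis update of xi on the log scale (variance s)."""
    xi_star = np.exp(rng.normal(np.log(xi), np.sqrt(s)))
    log_alpha = (log_p_xi_given_eta(xi_star, W, eta, z, w) + np.log(xi_star)
                 - log_p_xi_given_eta(xi, W, eta, z, w) - np.log(xi))
    return xi_star if np.log(rng.uniform()) < log_alpha else xi
```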
Step 3: sampling $\sigma^2$
$$p(\sigma^2|\eta,\xi,z) \propto \int p(z|\beta,\sigma^2)\,p(\beta|\sigma^2,\eta,\xi)\,p(\sigma^2)\,d\beta = IG\left(\frac{w+N}{2},\,\frac{w+z'M_{\xi}^{-1}z}{2}\right)$$
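For completeness, a draw from this inverse-gamma conditional, reusing the quadratic form $q=z'M_{\xi}^{-1}z$ already computed in Step 2 (names and numbers illustrative):

```python
import numpy as np

def sample_sigma2(w, N, q, rng):
    """Draw sigma2 ~ IG((w+N)/2, (w+q)/2) as the reciprocal of a gamma draw."""
    return 1.0 / rng.gamma(shape=(w + N) / 2.0, scale=2.0 / (w + q))

rng = np.random.default_rng(1)
print(sample_sigma2(w=1.0, N=100, q=250.0, rng=rng))  # q = z' M_xi^{-1} z
```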
Step 4: sampling $\beta$
$$p(\beta|\eta,\xi,\sigma^2,z)\propto p(z|\beta,\sigma^2)\,p(\beta|\sigma^2,\eta,\xi) = N(z|W\beta,\sigma^2 I_N)\,N(\beta|0,\sigma^2\xi^{-1}D)$$
$$= N\!\left(\beta\,\middle|\,(W'W+(\xi^{-1}D)^{-1})^{-1}W'z,\;\sigma^2\,(W'W+(\xi^{-1}D)^{-1})^{-1}\right)$$
Bhattacharya et al. (2016)'s algorithm
Main computational bottleneck: forming $M_{\xi}=I_N+\xi^{-1}WDW'$, which costs $O(N^2p)$ (if $N>p$, see Hahn et al. (2018); otherwise use aMCMC).
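A sketch of that sampler (the four-line recipe of Bhattacharya et al. (2016), transcribed with my own variable names); it only ever solves an $N\times N$ system rather than the $p\times p$ one in the conditional above:

```python
import numpy as np

def sample_beta(W, eta, xi, sigma2, z, rng):
    """Exact draw from N((W'W + (xi^{-1}D)^{-1})^{-1} W'z,
    sigma2 * (W'W + (xi^{-1}D)^{-1})^{-1})."""
    N, p = W.shape
    d = 1.0 / (xi * eta)                        # diagonal of D_* = xi^{-1} D
    sigma = np.sqrt(sigma2)
    u = np.sqrt(d) * rng.normal(size=p)         # u ~ N(0, D_*)
    v = W @ u + rng.normal(size=N)              # v = W u + delta, delta ~ N(0, I_N)
    M = np.eye(N) + (W * d) @ W.T               # M_xi = I_N + W D_* W'
    w_vec = np.linalg.solve(M, z / sigma - v)   # N x N solve instead of p x p
    return sigma * (u + d * (W.T @ w_vec))      # beta = sigma * (u + D_* W' w)
```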
Approximate Algorithm
Key idea: $\beta$ is sparse, so not every column of $W$ is worth computing; skipping the negligible columns reduces the computational cost.
Observation: conditionally $\beta_j \sim N(0,\sigma^2\xi^{-1}\eta_j^{-1})$ independently across $j$; if $\beta_j$ is shrunk to 0, then $\xi\eta_j$ must be large, i.e. $\xi^{-1}\eta_j^{-1}\approx 0$. Moreover,
$$\xi^{-1}WDW'=\sum_{j=1}^p \xi^{-1}\eta_j^{-1}w_jw_j'$$
where $w_j$ is the $j$-th column of $W$; the near-zero terms contribute essentially nothing to this sum. To avoid computing with those columns, introduce the hard threshold
$$D_{\delta}=\operatorname{diag}\left(\eta_j^{-1}\,1_{\{\xi_{\max}^{-1}\eta_j^{-1}>\delta\}}\right),\qquad \xi_{\max}=\max(\xi,\xi^*)$$
Approximate MCMC: run the same four steps with $D$ replaced by $D_{\delta}$ in $M_{\xi}$, so only the active columns enter the matrix computations; see the sketch after the cost comparison below.
Computational cost per iteration:
- Exact: $O(N^3)+O(N^2p)$
- Approximate: $O(s_{\delta}^2N)+O(Np)$, where $s_{\delta}=|S|$ is the number of active columns
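A sketch of how the threshold delivers the $O(s_{\delta}^2N)$ cost in the $\xi$ step (my own reconstruction: the Woodbury and matrix-determinant identities reduce everything to an $s_{\delta}\times s_{\delta}$ core):

```python
import numpy as np

def approx_log_p_xi(xi, W, eta, z, w, delta, xi_max):
    """log p(xi | eta) with D replaced by D_delta, in O(s_delta^2 N) time.

    With U = xi^{-1/2} W_S D_S^{1/2}:  M = I_N + U U', hence
    |M| = |I_s + U'U| and z'M^{-1}z = z'z - (U'z)'(I_s + U'U)^{-1}(U'z).
    """
    keep = 1.0 / (xi_max * eta) > delta          # active set S
    W_S = W[:, keep]                             # s_delta columns only
    sd = 1.0 / np.sqrt(eta[keep])                # D_S^{1/2}
    N, s = W.shape[0], W_S.shape[1]
    A = np.eye(s) + sd[:, None] * (W_S.T @ W_S) * sd[None, :] / xi   # I_s + U'U
    _, log_det = np.linalg.slogdet(A)            # = log |M_xi|
    b = sd * (W_S.T @ z)                         # sqrt(xi) * U'z
    quad = z @ z - b @ np.linalg.solve(A, b) / xi
    return (-0.5 * log_det
            - 0.5 * (N + w) * np.log((w + quad) / 2.0)
            - 0.5 * np.log(xi) - np.log1p(xi))
```

The dominant operation is $W_S'W_S$ at $O(s_{\delta}^2N)$, while the thresholding pass over $\eta$ and the products with $z$ account for the $O(Np)$ term.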
Correlation and Traceplot