Paper Review: Bayesian Regularization and Prediction
One-group Answers to Two-group Questions
Two-group questions: I think this means two alternatives, $\beta_i = 0$ or $\beta_i \ne 0$.
Two-group answers: discrete mixture priors on the $\beta_i$.
Multiple Testing
$$y \mid \beta \sim N(\beta, \sigma^2 I), \qquad \beta = (\beta_1, \cdots, \beta_p)'$$
Discrete mixture prior on each $\beta_i$:
$$\beta_i \sim w\, g(\beta_i) + (1 - w)\, \delta_0$$
Marginal densities of $y$ under $\beta = 0$ and $\beta \ne 0$:
$$f_0(y) = N(y \mid 0, \sigma^2), \qquad f_1(y) = \int N(y \mid \beta, \sigma^2)\, g(\beta)\, d\beta$$
Interpret $w(y)$ as the posterior probability $P(\beta \ne 0 \mid y)$ that $y$ is a signal:
$$w(y) = P(\beta \ne 0 \mid y) = \frac{P(y, \beta \ne 0)}{P(y)} = \frac{w f_1(y)}{w f_1(y) + (1 - w) f_0(y)}$$
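As a concrete sketch, the following Python snippet computes $w(y)$, assuming purely for illustration that $g$ is a $N(0, \tau^2)$ density, which gives the closed form $f_1(y) = N(y \mid 0, \sigma^2 + \tau^2)$; the function name and default values are mine, not the paper's.

```python
import numpy as np
from scipy.stats import norm

def posterior_signal_prob(y, w=0.5, sigma=1.0, tau=2.0):
    """w(y) = P(beta != 0 | y) in the two-groups model, taking
    g(beta) = N(0, tau^2) so that f1(y) = N(y | 0, sigma^2 + tau^2)."""
    f0 = norm.pdf(y, loc=0.0, scale=sigma)                       # null marginal
    f1 = norm.pdf(y, loc=0.0, scale=np.sqrt(sigma**2 + tau**2))  # signal marginal
    return w * f1 / (w * f1 + (1 - w) * f0)

# Small |y| is judged noise, large |y| a likely signal.
print(posterior_signal_prob(np.array([0.1, 1.0, 4.0])))
```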
Sparse Regression
The loss function of penalized regression is
$$l(\beta) = \| y - X\beta \|_2^2 + \nu \sum_{i=1}^p \psi(\beta_i^2)$$
This is equivalent to $Y \sim N(X\beta, \sigma^2 I)$ with prior $\pi(\beta_i \mid \nu) \propto \exp\left( -\nu\, \psi(\beta_i^2) \right)$ (note the minus sign: a heavier penalty corresponds to lower prior density). The posterior of $\beta$ is
$$\pi(\beta \mid y, \sigma^2, \nu) \propto \frac{1}{\sigma} \exp\left( -\frac{\| y - X\beta \|_2^2}{2\sigma^2} - \nu \sum_{i=1}^p \psi(\beta_i^2) \right)$$
So minimization of the loss function is equivalent, up to a rescaling of $\nu$ by $2\sigma^2$, to MAP estimation under this posterior.
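A quick numerical check of this equivalence, taking $\psi(z) = z$ (the ridge penalty, whose implied prior is normal and whose MAP has a closed form) and absorbing the rescaling by writing the negative log posterior as $l(\beta)/(2\sigma^2)$; the data and all names here are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(size=50)
nu, sigma2 = 5.0, 1.0

def loss(beta):  # l(beta) with psi(z) = z, i.e. the ridge penalty
    return np.sum((y - X @ beta) ** 2) + nu * np.sum(beta ** 2)

def neg_log_post(beta):  # -log posterior; a positive multiple of the loss
    return loss(beta) / (2 * sigma2)

b_loss = minimize(loss, np.zeros(3)).x
b_map = minimize(neg_log_post, np.zeros(3)).x
b_ridge = np.linalg.solve(X.T @ X + nu * np.eye(3), X.T @ y)  # closed form
print(np.allclose(b_loss, b_map, atol=1e-4), np.allclose(b_map, b_ridge, atol=1e-4))
```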
Global-local Shrinkage
Framework
Consider $Y \sim N(X\beta, \sigma^2 I)$ with priors
$$\beta_i \mid \tau^2, \lambda_i^2 \sim N(0, \tau^2 \lambda_i^2), \qquad \lambda_i^2 \sim \pi(\lambda_i^2), \qquad (\tau^2, \sigma^2) \sim \pi(\tau^2, \sigma^2)$$
Joint prior:
$$\pi(\beta, \Lambda, \tau^2, \sigma^2) = \left[ \prod_{i=1}^p N(\beta_i \mid 0, \tau^2 \lambda_i^2)\, \pi(\lambda_i^2) \right] \pi(\tau^2, \sigma^2)$$
Question: why (3)?
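To get a feel for what this hierarchy produces, here is a small simulation from the prior, with $\pi(\lambda_i^2)$ chosen (purely as one example) so that $\lambda_i$ is standard half-Cauchy, i.e. the horseshoe prior, and $\sigma^2$ fixed at 1.

```python
import numpy as np

rng = np.random.default_rng(1)
p, tau = 100_000, 0.1

# Local scales: lambda_i ~ half-Cauchy(0, 1) makes this the horseshoe prior
# (one concrete choice of pi(lambda_i^2); the framework leaves it free).
lam = np.abs(rng.standard_cauchy(p))
beta = rng.normal(0.0, tau * lam)  # beta_i | tau^2, lambda_i^2 ~ N(0, tau^2 lambda_i^2)

# Global-local behavior: the bulk of the draws sit near zero while the
# heavy Cauchy tail still produces a few very large coefficients.
print(np.quantile(np.abs(beta), [0.5, 0.9, 0.999]))
```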
Transformation to an orthogonal scheme under $U$: let $Z = XU$ with $Z'Z = D$ and $\alpha = U'\beta$, and set $\alpha \mid \Lambda, \tau^2, \sigma^2 \sim N(0, \sigma^2 \tau^2 n D^{-1} \Lambda)$, so that $\beta \mid \Lambda, \tau^2, \sigma^2 \sim N(0, \sigma^2 \tau^2 n U D^{-1} \Lambda U')$.
Question: how to understand this?
Question: how to get this?
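Regarding the questions above: if $U$ is taken from the eigendecomposition $X'X = U D U'$ (my assumption about which $U$ is meant), then $Z = XU$ automatically has diagonal Gram matrix $Z'Z = D$, so each $\alpha_j = (U'\beta)_j$ can be treated independently. A minimal numerical check:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))

# Take U from the eigendecomposition X'X = U D U' (an assumption about
# which U the paper intends); then Z'Z = U'X'XU = D is diagonal.
d, U = np.linalg.eigh(X.T @ X)
Z = X @ U
print(np.allclose(Z.T @ Z, np.diag(d)))  # True: the design is orthogonalized
```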
The point of the hierarchy is to squelch the noise globally (a small $\tau^2$ shrinks all coefficients) while the heavy-tailed local $\lambda_i^2$ let genuine signals escape the shrinkage.
Properties: why good performance?
Robust tail
Question: how to understand $\eta$?
Efficiency
Global Variance Component
Never choose a $\pi(\tau^2, \sigma^2)$ that forces $\tau^2$ away from zero: the global variance must be able to collapse toward zero to squelch the noise.
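A tiny illustration of this point, comparing (as my own example, not code from the paper) a half-Cauchy prior on $\tau$, which keeps real mass near zero, with an inverse-gamma prior on $\tau^2$, which effectively forbids small values:

```python
from scipy.stats import halfcauchy, invgamma

# Prior mass on tiny global variances: half-Cauchy on tau retains
# non-negligible mass near zero, inverse-gamma on tau^2 does not.
eps = 0.01
print(halfcauchy.cdf(eps))                     # P(tau < 0.01) ~ 6.4e-3
print(invgamma(a=1.0, scale=1.0).cdf(eps**2))  # P(tau^2 < 1e-4) = exp(-1e4) ~ 0
```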
Numerical Examples
Regularized Regression
Wavelet Denoising