Statistical Learning – Formula Sheet (final)

Auxiliary: $\Sigma_{X,Y} := E\!\left[(X - E[X])(Y - E[Y])^\top\right]$, $f_{X_1,\dots,X_p}(x) = \dfrac{\exp\!\left(-\tfrac{1}{2}(x-\mu)^\top \Sigma^{-1}(x-\mu)\right)}{\sqrt{(2\pi)^p\,|\Sigma|}}$, $P(Y=k) = \dfrac{e^{-\lambda}\lambda^k}{k!}$

Intro: $E\!\left[(y_0 - \hat f(x_0))^2\right] = \operatorname{Var}(\hat f(x_0)) + \operatorname{Bias}(\hat f(x_0))^2 + \operatorname{Var}(\varepsilon)$

OLS: $\hat\beta = (X^\top X)^{-1}X^\top y$

Model selection: $\mathrm{AIC} = 2p - 2\log(L)$, $\mathrm{AICc} = \mathrm{AIC} + \dfrac{2p(p+1)}{n-p-1}$, $\mathrm{BIC} = \log(n)\,p - 2\log(L)$

Ridge: $\hat\beta_0 = \bar y - \sum_{j=1}^{p} \hat\beta_j \bar x^{(j)}$, $\hat\beta^* = [\hat\beta_1,\dots,\hat\beta_p]^\top = \left(X^{*\top}X^* + \lambda I\right)^{-1} X^{*\top} y^*$, where $x^*_{i,j} = x_{i,j} - \bar x^{(j)}$, $\bar x^{(j)} = \frac{1}{n}\sum_{i=1}^{n} x_{i,j}$, $y^*_i = y_i - \bar y$, $\bar y = \frac{1}{n}\sum_{i=1}^{n} y_i$

Bayes: $f(\beta \mid X, y) \propto f(y \mid \beta, X)\,p(\beta)$, $\hat\beta = \arg\max_{\beta} f(\beta \mid X, y)$

Step: $C_j(X) = \mathbf{1}\{c_j \le X < c_{j+1}\}$

GLM: $\eta_i = x_i^\top \beta$, $f(y_i; \theta_i, \phi) = \exp\!\left(\dfrac{y_i\theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi)\right)$, $\theta_i = g(E[Y_i])$ when $g$ is the canonical link; $F_Y(Y) \sim \mathrm{Uniform}[0,1]$

GAMs: $g\left(E[Y_i \mid X_i]\right) = \sum_{j=1}^{p} f_j(X_{i,j})$

Classification: $\hat Y = \arg\max_{c\in\mathcal{C}} P(Y=c \mid X=x)$, Bayes error rate $1 - \max_{c\in\mathcal{C}} P(Y=c \mid X=x)$

LDA: $P(Y=c \mid X=x_0) \propto \phi_c(x_0)\,\pi_c$, $\pi_c = P(Y=c)$, $\phi_c(x_i) := f_{X\mid Y}(x_i \mid c)$,
$\hat y_0 = \arg\max_{c\in\mathcal{C}} c_c + b_c^\top x_0$, where $c_c := \log(\hat\pi_c) - \tfrac{1}{2}\hat\mu_c^\top \hat\Sigma^{-1}\hat\mu_c$, $b_c^\top = \hat\mu_c^\top \hat\Sigma^{-1}$,
$\hat\Sigma_{j,k} = \frac{1}{n}\sum_{i=1}^{n} \tilde x_{i,j}\tilde x_{i,k}$, $\tilde x_i := x_i - \hat\mu_{y_i}$, $\hat\pi_c = \dfrac{\operatorname{card}(S_c)}{n}$, $\hat\mu_c = \dfrac{1}{\operatorname{card}(S_c)}\sum_{i\in S_c} x_i$

QDA: $\hat y_0 = \arg\max_{c\in\mathcal{C}} \tilde c_c + \tilde b_c^\top x_0 + x_0^\top \tilde A_c x_0$, where $\tilde A_c := -\tfrac{1}{2}\hat\Sigma_c^{-1}$, $\tilde b_c^\top := \hat\mu_c^\top \hat\Sigma_c^{-1}$, $\tilde c_c := \log(\hat\pi_c) - \tfrac{1}{2}\log|\hat\Sigma_c| - \tfrac{1}{2}\hat\mu_c^\top \hat\Sigma_c^{-1}\hat\mu_c$, $\hat\Sigma_c = \dfrac{1}{\operatorname{card}(S_c)-1}\sum_{i\in S_c} (x_i - \hat\mu_c)(x_i - \hat\mu_c)^\top$

Logistic: $\zeta(z) = \dfrac{e^z}{1+e^z}$, $P(Y_i = 1 \mid X_i = x_i) = E[Y_i \mid X_i = x_i] = \zeta\!\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_{i,j}\right)$

Loss: $\hat y_0 = \arg\min_{c\in\mathcal{C}} E\left[L(Y, c) \mid X = x_0\right]$
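The model-selection criteria on the sheet translate directly to code. A minimal sketch (the function name `information_criteria` is mine; `loglik` is the maximized log-likelihood $\log(L)$, `p` the parameter count, `n` the sample size):

```python
import numpy as np

def information_criteria(loglik, p, n):
    """AIC, AICc, BIC as defined on the sheet."""
    aic = 2 * p - 2 * loglik                      # AIC = 2p - 2 log(L)
    aicc = aic + 2 * p * (p + 1) / (n - p - 1)    # small-sample correction
    bic = np.log(n) * p - 2 * loglik              # BIC = log(n) p - 2 log(L)
    return aic, aicc, bic
```

Note that AICc requires $n > p + 1$; as $n \to \infty$ the correction term vanishes and AICc approaches AIC.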
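The ridge formulas on the sheet (center the data, solve the penalized normal equations, recover the intercept) can be sketched in NumPy as follows; `ridge_fit` is a hypothetical helper name, not from the sheet:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge on centered data:
    beta* = (X*^T X* + lam I)^{-1} X*^T y*,  beta0 = ybar - sum_j beta_j xbar^(j)."""
    xbar = X.mean(axis=0)        # column means xbar^(j)
    ybar = y.mean()              # ybar
    Xc = X - xbar                # x*_{i,j} = x_{i,j} - xbar^(j)
    yc = y - ybar                # y*_i = y_i - ybar
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    beta0 = ybar - xbar @ beta   # intercept from the sheet's formula
    return beta0, beta
```

With `lam = 0` this reduces to OLS on centered data; increasing `lam` shrinks the slope estimates toward zero while the intercept formula keeps predictions centered at $(\bar x, \bar y)$.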
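The LDA estimates above (priors $\hat\pi_c$, class means $\hat\mu_c$, pooled covariance $\hat\Sigma$, and the linear discriminant $c_c + b_c^\top x_0$) fit in a few lines of NumPy. A sketch with hypothetical helper names `lda_fit`/`lda_predict`, assuming `y` holds integer class labels:

```python
import numpy as np

def lda_fit(X, y):
    """Estimate pi_c, mu_c, the pooled Sigma (1/n normalization, as on the
    sheet), and the discriminant parameters c_c and b_c."""
    classes = np.unique(y)
    n, p = X.shape
    pis, mus = {}, {}
    Sigma = np.zeros((p, p))
    for c in classes:
        Xc = X[y == c]
        pis[c] = len(Xc) / n          # pi_c = card(S_c) / n
        mus[c] = Xc.mean(axis=0)      # mu_c = mean of class-c rows
        R = Xc - mus[c]               # x~_i = x_i - mu_{y_i}
        Sigma += R.T @ R              # within-class scatter, summed over classes
    Sigma /= n
    Sinv = np.linalg.inv(Sigma)
    cc = {c: np.log(pis[c]) - 0.5 * mus[c] @ Sinv @ mus[c] for c in classes}
    bc = {c: Sinv @ mus[c] for c in classes}
    return classes, cc, bc

def lda_predict(x0, classes, cc, bc):
    """y_hat_0 = argmax_c  c_c + b_c^T x0."""
    scores = [cc[c] + bc[c] @ x0 for c in classes]
    return classes[int(np.argmax(scores))]
```

QDA differs only in estimating a separate $\hat\Sigma_c$ per class (with the $1/(\operatorname{card}(S_c)-1)$ normalization) and adding the quadratic term $x_0^\top \tilde A_c x_0$.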
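The sheet gives the logistic model but no fitting procedure; one simple option (an assumption on my part, not the sheet's method) is gradient ascent on the log-likelihood. A minimal sketch:

```python
import numpy as np

def zeta(z):
    """zeta(z) = e^z / (1 + e^z), written in the equivalent 1/(1+e^-z) form."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_fit(X, y, lr=0.1, steps=2000):
    """Maximum-likelihood logistic regression via gradient ascent."""
    Xb = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        mu = zeta(Xb @ beta)                     # E[Y_i | X_i = x_i]
        beta += lr * Xb.T @ (y - mu) / len(y)    # averaged log-likelihood gradient
    return beta
```

The gradient $X^\top(y - \mu)$ follows from the GLM form of the Bernoulli likelihood with the canonical (logit) link.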