From 766f6593c96263d46ef8ed84791835ed9c55aebb Mon Sep 17 00:00:00 2001 From: rnburn Date: Sun, 19 May 2024 11:55:11 -0700 Subject: [PATCH] add doc --- doc/binomial-priors.png | Bin 0 -> 16524 bytes doc/objective-bayesian-inference.md | 815 ++++++++++++++++++++++++++++ 2 files changed, 815 insertions(+) create mode 100644 doc/binomial-priors.png create mode 100644 doc/objective-bayesian-inference.md diff --git a/doc/binomial-priors.png b/doc/binomial-priors.png new file mode 100644 index 0000000000000000000000000000000000000000..4aeae704cfa3bae5140d901a76c04d24f724450e GIT binary patch literal 16524 zcmeHvXH=707iJ(7rRt?gw@?J>MQVbI2oVvaOBK8j0cnO3AV4AtR}jH)1?i%ZUInB> zxQJY&DP3v+A<`o?^f|%rn^|jS&7b)_KU^el&N=VyXYc3i<1>@n`s^$MED#8U9bs_Y z6as<5ArQKv6HMUAP5fSmUuq`Z{sT}6a>QQMEggVtCQmff#~cYu3t4r zr!9{Mz5df8^yt?%qpP*+y~t3r2Mn7c+b!fdGwQD(^zZ+m| z>>s-DP_JO%8eR;6Pk9(q%Nh?p5Z=2|UV^{HTkqGO_byQCFas#u5Omw4(uQx)@->Zs8$X|slwZBQ zSj^}!>0Wv0I9_Q0=veTEDfSnc-Ql+7)$^4prB_F#prYSi47eEm_NSLWjfLHBmi~NS zExB8-65f=v`#^pC3I=vJRdT#J3mGlx7HiF>_CG1v(Jw)@70iF}`43$H)Og;OX}L>` zz~(%W4l7>?=GB$or0z4RpFt~7q)1QRUgRjOeA`+EuNob&c?w>k5^@O@qFZDk^EPAt zms_Qc;A`tK(h?ZP=)hi2Xw2u;)1nsURzgbuAZ?qXLDq$0r<31*&+Xl5tW<=-8?@VU zq?UdGeVXdHaiT^~jF}?O#k7;kH~#amCRDG=q}8(MmCX5wo?ee_>KP26uAft&>aE}z zOl>pin3CRVo^c|nk9~+75M@(9vl7=F-_)eX%{hXDvjKbmdDNEWm68+?7uQT9?|>*G z+70m)bGx@`c>G||ca1v@66 znF)OSk3h4$Y~lkbcTabey8IScUwNB8>xb-_MT@?4AOW#~PSU#_R*QO`YDLlF;xcZU z6-q2;c`=E0ji!u&CrrPeh=`J@4<-hG1hi?!94nJ#q9!}+Ut@2NJWUfjo6tfwqiw?I zfa>5`Fr(`ecyp<0jel+`cuO$wm`C?gF%RP6=DM=HXfk%u^22z7ubxI;RzubiV311s zmO_=FE|1!vP5U1TRKK2-G0lD>Ft#DL>{2bRIVV+UK?2yg!{G3-Wf^%qs;m7h0A>>A z3%iSe9zIA^arh?yyOkf2sv7FIY5ha-c$xolp}9;#c>+f~9bvq!ek|Swz`b6XMAhhL zrT^LwHyhlBdl74S80;VGPctzyMe2hoU_pvy*FG-@7}DZgwln41+q((2RspoPm4E@? z3fq?#1UzVxIsVbt<=Fb9BjR|w{>ujpcprahC|rg5pn0T@v|{JMLzA3vrOs<_#j>+j z?|}0xE{v_1{`YxQP0>k4QT!%w6>``Jh*jo7bF%`KZD2yeIK@V}-`IS0<}MDe5QQyITDbApgC0H^*z4-@OTe=`!riK4wDzY2!0-D2k&?m zfOp_1EH2{!4^9HlH9cWb&u^W|8CX)z3m!dtM!Fvt$7>%_Jt|UZc@8`=V)F<75VI0J zHgZfw?f02?;`eErm+7)I|HpgI;fN4S#n1m3Ug?EWm&4tKB?DNqwBga9@f^@%3krOTo>>PAUXeGkzu)T;BASWnJABA?u18L#)>&D z>dOXi))Gl(qQ}ZwP*GS90pD_fg#h}1@w=I@cZ$G)S`v=cn{QUvmh+2M5Zjz{d%2Z; z>o1q#{NJSiH6ltJ1+w!OFMYY;P5K>aKi17{#Z_8<6KBdyc_Qug=XrarF`7e<4|2f6 z^DCp}6si3kLA02a52C*P%$MWHP-`9Oa!bbR9_1U}L*56d8OWj5P5uqUOS}u_tsP<9 z@OvMku^`88F=p-|ZtqOVvkGC14RtS~B_?vQE52KG~o0~U1oMSAh;!!)#&IjMJ1rRY7bfJ8hrC)iI;>1Js9wc0Z znZfjHkWcQkcaA!Fpe9t7@)B&fRndZ1JaCf=R}#d;5exU3ME%FnUhHsGV98jLYjbJ-dJ>_h!InTpX{ItWHiIJvj2+aJ zv8A{5NE&@u#q4)aWCTV^H(p4bIzdt);Z0hztRpYrh9FURIAai4R9{nM#({7S6-U3t z;KdZio^}EQH~%&G0+k6(yJAC}F3yUsGFlH6J(4+`N`Ov>hQT$ybLGa zCVtC`mh5r$jj)c)p_gY`;aq~)F{W`~Kngb|)h&8H{12Xd>rYBTEcGJjGo^wQ@2u#O zGF?PqgsAICV+LSO1R2Cm~x3a(PE^i2RlIgf-QjbBVgbxeM+#DWwxKLk zfuc-0Yr1+80%XErF$SgHYg{PEiVpF!yWLYII^vtzJvzGWF)B=jP^9kln6wHbvgQP9 zK1(M2)73?=m=55yutNyk+_vt0a(DsD>90L$L=6AY(v&(E;>a$P&e`eQ}For3= z4dj@cb$_cAdU%+w5*-(0a~9#(SYRdW7W+PJn}WE^RwZpZDi5XFI+FQ`bVdV%>O&|- z!~ECm@}+A$-b`|_F2jFNA2jZM526p>OS8qT&I%lHqOZ!dXkFux@5#Y%p!pSm7CdsN zEfO-e`FCoP{CbWM}eVvGDyUa9?H-jxh#Ia6al9PYJX!fm5{n z$t>nx%3|Ev2KDEGkr}tMqWPf6mv961w==uz1~91(EYiq#2xMS05=H8PpLTXbT>NSd*4bbrMm1*%K7j-Vf4b9t(2oq28Dm=z6B~Gomb>sRI z=mU#?p$O;vdYHXdzkOj97rL2}^XMPK-aaSAnE0)XO)`PkSREFX@KiH2(xoBXK;TQ@ zs&(XSRuE!B0se2Zn_pUB=uGZrp{$O z9lE6%UY6kql8oV38)5HTv7VY=$O7&vqzF+p_u9PK@ zB6l^qK0m%3-=#YZ(z|~uqG}F+`6hyJt`vd^ZdrY#^#BI9`mwd5!M5=>AeLEWF_491-hOCCSpBJnn)7^^*G1qu5^}0WRtoC$>7)vEOjgHy& zdQKvp_ZOqq*9Og^U98h3*>1ldUtEtPa;*=UaahE-Si^NeC`E;zmBqoWbc1$&w)fO< zeVw?nA-IknOHR9KvlgsDsg=qIVI_6#P%89nK7lR4s7M$SX6wPDszBYqnzRbyw%n~E 
z1{c?FnsR<_z=^ggX=lThWD}iYtRwe6Db=TQbFB6*Pp>_iDG}>3)}RIasjJIau8vX} z(eU^R^9{e|pOc8JSvo>SEO5>oTg1xbef}45E@+HQ3&-*$z(3;)N!cwQ<-oB$qxa$rNYjh20k~szx3lRt3wu@;UFy3 zCTyKYY*BMBM|}7&DlAb=W=F(HG@NS^rNBs^ts@H%-YY$lMLr+fe(VJp=1!~O_~IXT zyfbJpUqr>K&CZ02$-2Q@Vr%$je!M7fwRwF{kW@oBh#3S>2w#T}RLUBbpu$Xc$%?kR zb#$t4m)os;fD0$+%s+?^`^@?8WQSr_^icPVuNpamK(T_XiibK5;V)BQk=(%!DN&3! z@dWXBo$AUq{YsGF`E2v|EI2Mt*{T^eo-_ZYMKBhf&`p{2RW40Y(OzI9EfaBB(M_3{ zNw1R@+8ouAl+&bZy*}j@1V)zI7ucr+MmoRTmn3iTW)EMlXgNV)GNtT5=aE$whr4JJ zT}{SLH6vP#B1gjS4_iSU89Cb5|KhdJc(-BKJC~b7?_q$dlY=(02?^v=zt4*Tb;~}s zH&`vRxrO-&R{)?>BIh?g!o_G|0~kXAfJkBkpG3 zXZp09y=dqVuHM`?WWAepq42$28CwDF{`wTzT}BV&XT@9l88|^2DB#e`0pIQ{uv5Jz z)6z{my5L{n#c)_K_}D!@VloP$Ti4BLv46D-AZCPSl~TjIEuR|s^Cw#g?dzlX%fU;F zM(@`$)N+)|q{N92a;HVJYYb;TDS-?D9UviVUHYuz#A8rj2rvD*a{xBJ`{zT|oT^8a zdIb;tiN;uDlyu|!34E;=W2u^(csUA}6Db|mGy$^hUdSbsRy+(Y=k-BVPv5u5Yw*E3gdY5r$c8kd01J=X4B6WIQ7pfq9 zLw1IqAh?hpk4@wZN`qhextF(J@kL5|6;|Q}_%XpfK{c@9SL3vIomfd7UH2$@>k4S& z!r9^mi15wC^FZKC54Ndx3Nv@vk(7C0{(`Sx`vv&KzNaBJ{7b)@Nqq&~9m~zf;0OWE z*HH>}^MwLX4)(KtY$7ll6n1z*m-@x3a1M+5?7^Gv0An&D&u)>9AqCST)N5PXlV!)w z1808Ap}T*9@w+3a_eh4{p((#SAw zReH!erX_m#w%aBcl%9dMNtB(tgzGh z!}*nX5%4B^ou1_eP#Kq0{z0o`-)%Afgz9v%v%z7P%zr7FCjE(Ppw)9JT6~v(g5jRX z*{!R;E3ihqKHICEuOgyg{S{=5Dh-}+ljArSE3|8tJI&+I*9VH_hUG01SBFWz%`+NR z&sxSL6sa2@QP@~w2GQQV7ll9Rk6*>k)+|~g)D+w~_`+!|J^D@S?3{KHINLfJC zf@c|<7L<9eC3EnuA;5z+n2(Lwbta2u%vM7-h&b>tN_2MXTp3%3N6~dKr=ioys}HCs z$`NV>9i`{Yt)XEQo5fqvUL2@YfcP-4X=etM#I?{2&rcG>1g!4*l$-xxZwwp!pyjx3 zPH8_)OeE@MMTZyGBxc;F%!@9Q))mM*G}*GmwJ#qH+UUgrhlx9a7Smfoq6X|@z+%C1 z%}5YU;HUGePpYem!asJ~fT#Vo&ipI%e-kB0QJa>1so)*6bv2=O$2dUVkhW$O2muaO z9^$Ysq&~+H7(D8e(vsydq_iNeKR5WB8BnQS!mh7Py;M$<1}04hfJa8I^^itKq%6SN z`oe`)Xb|4*ZRP-GFJi1=jtga82ar6RwgRG-%5cWtqpOk96QbZgUfYK7dqmZA z;hS@gG-%k0TdDTbXHH(oA={frz_yx7ps85qHl-!2Kt*73HjjGRm!E;9U=hZd0Q417 zmlcKA9c#a$0?33xB$d+`cu&}Uid!VtqrG}-rXwTprj*5G!ZrmNDSh~H#d-IcH$HVr z2e*hjz_xfC`=ntGtaEu1ghft%jKEoQUl>=6kapAWbyb}CIhahLk-3R(C2$ zs-jr)-Ov2pPtmxH- zRD^NF6KRK*Pv@EC3y!fs*%#K4x>)9TURCwR3HRNk=A0n-%h4tPqZA!WyyA?*t5I(- z2cV{*z;YF`qLI)hNFr1|zeWfcj_ehg77?QI+LS)1xF-2ADf2Tmp)N8lmw&(fDVl#C zx~zPSC)^=W@c_)f1=yr!$wa0nFfUP91@Kv~BvP3JB*}pweT8 zCv!RiR|5;Al;6i4;0ka(gLZ}&fNYlU6E-I}K`x0NZ>2?t^);2JYUCEqr_vMFNSKim z#jO;SPq>-2sk((iTAVH4GJnG`s8FW=j`v?uALi)R!luUg#Hl;}d^E|=104`;*QVO8 z@eB>zHBLB#5s)+g6-Ok4`Rn1p$Hg|Y-(mOY)ku-=0V=J0=8(UdU!4tY*Hw8On}Wov z!n>nHfen7_4o>+@QifdPIikl%i;gUU<@-}J^ixJrvJ`p1zBW^4vm!u;L?HYA$Fo5_ z*AhU~TAvC5FCV;X%*Bskx}H1zipab+)dZ>q6nzc4X*g`q*Yfa|m~y4xfvZe=yW&lh#=O!M;D4GU>QLG^IvisMGtIG?3FC1SLj_ z_Z2#fqWLdL)ujFhV9eh~W}z=PGMAKty3jyI{zL0nX8h$#p}Gz4RWh5AJ_i?kgne{K zuT+PC1*@2>J!V?%iCP7)SYdZC6RsI~Wp=h`G*hNtD{-p$*acMX!C98*_gSduu1YqB zh@N|U^c1sujFff&D$fN2P6gx>rw~MCoDp&IskB#a4FPwL!hbI)DK5-uP^zd*QxLVO?U_ zWfi4>>xUXIhU{a+@_0oVwwS`0>l@~5n4+8Y;}4-3j7t#AIr*YoxibR%S`!NA5WV;Q za#2sIKpL~6opPslQw#+zHa7DFy(=U`)ac@v9PW7QnZ7Gx4ukGDu3Q5!F({h&C_5M` z&ZM<(u@HHxCOigm!eZZ3rbW=8AR2QvajGpVnmlNq)UPu0u0@CWOYLI3n7^@z5sJQ) z%ZGz97<@_K7N~;yB(dJ~9k>#Z06C&)wqZnxBi=W7=aDfl?X?NAuqsxIhX4f0g5L|iz zJz`H&0E^{Ft)S#dmBe>g7%mm(2$iJI6crV2^nWqgre(loWDGgvK&!f0EkoDtA916N zS1OY+lNYdkSi{fyQBgo4FBlx4LN`ZprYeCHM3XTSD*oE|iZY55A$-&Wr;Lk>jpN9? 
z+!Z=q|_d=bd!ezK7rvG6$2MCm09X{%mMhrE@q;<0YZqHe8BocTkA1&)6 zQ~I)^ao=#E$iVCbL9P4fO0ZeO1)odv>??PO+fg?@PgEqg4f9hO=VNz7h@J!c2s|7Eh}Cg~he5@!a6eoUR&{$>erWQPCUlxd5g`_WG12TEz{^do1zEDMJ; zvTD;=-x$epvINUdK=&c)0>kXpidFh_I>6{cwp~T0od{0cJu1GeApDN}#mzQ)sYw&5is6)n@!vRJ4e_yM<(z28pw`eB*+lk#PMQ_CQ069!(& znKGp~ag&ADn8|W(Pn6E?)`I8GF7SyD8vPr{d)(>_!(kXdG>;Tjh-Ke9tW%yn_PJ3d zVkHWToUNUtTT+B;)(3+Dj6R$s7?CGEkVc@A)Y#aqUHfrZIOiyRoVcur&-JjS9rVq{ zdfs6tFjzHjh#W%DacBdJE$eiJ1~S7T7?xoa_(|XN?B3_O48I3|E?Lvyib?GF+CiX; zv(;+VkM-b)T z5{z?P&HAEps=Y4w1aL@^=i0P|szZ5Ffo`Cn2Tcb1g{l zD6`-9u!y(~RF)Cn6vxLF`@G>k#*e!89V04iN#{Mk-v}%_92nA;)lP^MP7R+Zmi|zX z)B(&}ezkhFi}+p{=EU)Ic_J+V40QbNd*p$&qDhEeJA(URlO|kKdqDK-C7k5*IF8A` zS4aQh`5~6aA$p$Os7VBElRF)&I;2bi$Af>bN&X>!wrl%4k2*k*8e1a!?I$~zb&=|1 z{q(FH*{J-0o==YQHxze_j`I5YH-vgiKR6n3z?9V=tHim~D{tHi?v6o(n0dbH$t>6; z>Sv?JQj>q`1&^q}tmEP8uL~j&4L9S1YqHn6$5uGU!c$O(MZzE^Z~oSOLYGk{f4=(S4vUQdu)9%wuwTZtl!y-(0uPVshuAoO zwRDYb-Rf=Hv2&M3$gf2u%#9U_Jxxh=5BBkDf<@(M79LoGdCHqV3coDFQRtfyHDfJM7Orqisy>AE1X=m8tD-4|Z zMS4E{`vm0m2_7=~!ff!AJGs;1Wo*XKMqn5l`Xu&EOVweZ$}O>moksn(0X4kk)zRyK zT-H+O$f#tur zf!t$)AbaK`bJ&v@{tcmsI9GA=Ur($f8?fO_aG?&LySdXt3&kE^flH=I>wkMP*e?fgh4og|2)J3@QXva9abrpPg`w2wu;tF%IE`EYC}7kuBx-17~w z8i}i1b?qJvIZ4w41k{%abLA_n`@HD<)W5ex_|(4IJ*Uq+8#*28EYtGNqZoN=`I?P{ zK<|RHSx~2p2>En*wP$Tx=8{}Z!@c~k349QAzsj~JbCqQ0TFpIYjtX?M)#uOLY1K{3_jzY9 zDjqd;TLZ-KmvdGa<<%R7D>WI!t^1a!!C(4G>z$AcA<(ft@pG z_Tj|O66m1^_+b^h3S6TKP$_ad<*ypmQ;2nc*O@&6<&ih(K5#N`VNqcRWw75vDSz}* z{`J77OHTTNSoRNKy(iu>cW(@^V~^_ri#R4jyT-9ijB;ReKlE$tDFV|=#11$_PNr#s zP$xkm@9bIMu=%w#PY4YNpz}(d-R}L!i{1tc1e3}Efog!j@z@GC_p_d9nwo8GX?Kbb zWQkS=hA!dU=pkURs$fN}m)CmA*XF$)etZ>2q`b3^^u!)9h_H`Aw9lC<4%)tQ1-)c* zGUmT{NqNL0yyg5ZXw88x7Ui!B-$*G}?0et9+^vJ3K#MY@X@T?(I?uQrj+mNeEJrs6 zlJ}m|sew(w-WW(i$48a20mOJW)<{HDxELEpZn1(40SfI|?O69&rOt!~M@FI9-5Aq~ zuz_?|z^v?(n%ZKpRS5wNE0}I?d}9wg!O0IvN7F;GgsfyJ?>xC8dPkPmhs8~Y5C`!< z!J3#w^li#k7=us)da58%vLH3=P%MbAc920lTriW>E;7i(|L7={&C4M_BQ$A+^W;-w zf(v@*fN{zLIPj&!_oMiV;P!10#4J7TK%IeNZkd{BKCO7NsYJR-QYIKFAj_jVRf$;y zwJodoNI3^4jEFj#raz6r4R_O$t{{SVVbvGMCZ;p4ks?WeUt6%_#GN~Uhh7>gf6Ptn zrk}WFqw2MG@Zy5(a{x{MvoBPzwjCfq(Y*HnY|`+8v`PcmjS}(AFj>ii!ZvIa$OFUO zMB+3q0fl;>3k^oI%p*&cs&XK4-BpHocf6XP&Qp~&7np)jKH8?i!tF8up^t7Df{sY2 z+r=;WmIal?E5S^l``~78YI6%B6DfTSMK|#M9&6W_Bol53jv27Pa)xaFcI^U9GX}ZS>;#^gM(9IU3^fL=Ua8Bp1SElO!$iBwR=CZAUc8gB@CKmno@v)*Y9D`*^>#XbTM zT}Xff0t>qrFH`k|AO=Usq8B3Ur z84Zc8=b#2>Fo2c&t(^HYSoWV()c#~7d_~;>>o(erT`@erx``5UQ!R9iPYkNBpU4MA z&Sm+;th1gH2N{@wJR(3}J=`zdWOb#r3*}BH85pj8OfrIllBs#ql7=#s*X#F$E1V97 z->_-$H1gDQy8||0aZu4g+`JS2sD4#(awbQYXK$6y7%!%3BhxaDElJ>2tY$|C>lMuHOc9u?nC8Ht5WM$fJJJTF<6=ISibW z4cI30+V(Jr-edHEQ=)5GzfFTu<*9VoDKwi8kB=iM3Fx7n>jYp697;Nclk|xm>lQ~W zLZ|boIlKA2rt|F@aPVI2JKLhc_P((RE))jX2@COk3i+o!663Y4TfDoBta|7ftU~l` z zy9g4)5P6eZP6f5BKT)G}YW?=tn}SB2w*RayamZoqTaitw2y5xHX{+$NW$Ua* zi!1v3`70v=_d>@RI}=fueq|7IPGIP&tr{1^NCUSO{uV>5129zFZK`k^&)Yk4W#sk7 z0XQdEC^Q;JQtlW7cN0LHH9pka<(`5Oqrd@7UioF<1Ug_H28@upisR;eQfpSUMW{vI zZ5S8ml(s%86wlcsE_(?r;NuCpmS+ekb~rmYpF2f4Nt)WppQJBgvDnYt%?F0rAeZQ5 zObR80Lg@zSydiMD{OY`q|CrY~u)y`>A9w#LSz%K|HqsmdILz81yoUjPV(DcGoROMC zs|aZw)(aXz{r?u=s+1I&_ZUb3(@q+&d>wqFhas~@fy$O>9VrYVyF=$khRgJsj1&C% zXub4G5FCO*$xSB&trr~DaXdiv8obP;O?%O-4?2;juH55BKWbcHB0&i|Ix9J}lce?w zx+2B`g8`G6%~|3JpSLX|D#>6lXsS6)AogVxqQxm;q`s`^&~n0CfN_D=xfTN)nSvS6 zBspT^cAavw?^EtLe#uyhLn!} z$6Q-%xTNf3Qnm{~$4adud59>3f|?0s$53TAy&+D3uI3ew^iQ(E-3j2l3YP;ZVHO$l z?W;1lk84_Wmj{yoX4!zVg2>YabXt`_;f$E!Z-O=#BjN;P<5=O(xv_^V7uSsVUOE^U z1IVa2dkw^t2Ro}97Yv-pCAgwBqi_hEMTjpS9cnb2=fnmVI8(;9Ef$L3u)DX1*)$?1 
zVU0lpjIEU?ICR8=V|GM%oMY!FpUYTLI#H%Zp8V>v7bE?4Orq>*w63yAdUFpKm=g%h6i`I?Q=#S_y66U|sQoi&-6sQmG&GM0==;bdoK;%2r zJR!HwJ$++#@CjJS4rrLeR&0JMe{CJaL$V_&6O)9xv%NqoB>-8A0eD4*ZZB8Q%M*;V z8rjnL@`uBZoPbGw03>4X1O1;9RvU+4UU8$lw^s9O+esIU{^;;gkb)e@mZI8>_~;zz z8o~MQqQK1ru{`8#){hnfaa~N}hUr84M*R_+X>K(>E6`Qh3y|uM5)D|Lhzzs_{6Md@8!wnADH|0kLs6H2x6)Ui zQcqTq|CzMH364KMXg^GfaM zQmq2U>N(VTsG_7gKvF=%6$F0nT%wZ(#RoKXDU}`(%;XpjW4HstT!!!H0ExzkB{~V3 za(*>ibJ<4Z0N_wg>coLMXct+ryol)S*9`W5(V2AN_3)BluO+7rLnGt!r_v5%vwys2 zRHhmjo=tROCE*ATZ`$ZM>7!qm4JI67F{0}1>XS8B@dU3gwi?=z&q`5Ah#Wh{=P&>1 z-X8PNKj;nV&t?<|{zMSHiWg{6tVQ4*8ee zLYM+jeE5%QKDP&(PJ}n8DyiECs+4z_ovYwb(*BZZK8QaHib3M(x`=Z>vxvG_3$wIc z57QHxTvWzW%RJzu9YD!^Qa`lSt2iN+w=_MueB`*Uk|n{ZUtsy|yreT2f=CkIP4)PV zlMZ{2wxw=2JlkMiS$fd{4($Z%%=5+EPW~w3TG1B}ro6!<*Po&zZW1{E>81~3OFNza zb5BePh&}XTqLZ4b@mF0$LECtyRu30_HJy4C_v7w!v9rZsr&u=4!xQd)L0#FFc$(JT z(P);|_^iU{pB{s&WF%oIP-h9B?L>JgaCVzzJYA(fD^%l&Ru2z-^(&QIEA*uOF$a>m zH)0{fIug$b*t0g91Ttxv(D$Y&z)Iv7E_*}3Ja@Icn`(`4 zc1;J*k5kuKZoE|Se$a*IyM$nW#aCLRB}E$3s@ncCh4wt*e z2kH6n!IM1~o~Aj11B;6O&6^Sbk}f<0b*Vu(VHEfC%GvnODaJ@)DTV_ksy!K$Y@b5Wd>(-=wXfJ>|_)u^$()lgd)3 zW;zfY!Ft*t~7dlcaub?u&SO>A~oX#3dp#S=# zqewn_^lcav@BT7rX_9C7g5y@_vcs6-@DjWus{ZA(E6O`cV90^A$wb8_K?GIR>P)=sM`yWbd)`F8Ep9Le(gl`?kEraN%9WOO$G&xgP|~yuf9z zFT&QBmAi`O{aEvcsJ&wWVdGvUl|wfKv;niB;w4wA__(e<$nS6DBt+YpG~HSe*{tGaioIF_iYf}*Aces4$Ueo6B^RD1`LDbe&wDnfSscpO%X-KI>d?n3VPVpNP>~zKSK*pEpCu8Q ziHE{h?r?K5IzWW9+k(aNN`7dtSdlv(|H#N`cX|+F+H-~Mk}6f4x}daE?{cd?&+&8r z)q>Ey=%-;i>?pxJNiSV5DWvBV(8KqbAoDlXAG|C#0qHeg06gbuMC^q!4|2&d(L^``}Kn0#vx}$ zV`*&(y=k4UQ5&O`Hbv}2dE5obaX&Gi&*`gl1dEX^>FBqp zZ+in{O_|epfsAZr2U9`&&m6dG2t2m(Vstt}5WjMAp9Sq)Fif&Ir*eFY3%?8QK7n6MXO1uJiK%A%|F8dF8vna7 zcv=L~9TZ=TrXOxx(FEp45U>MRg>C%#6a|?wpN@)H+Xm8ZGMci3d%p_+bBKnYd;se> z?iFJ%ZJht5Nb6WbxNGyN(Qf(upI4nT83D-SqIbv6j=Qx>@6P^*i_uX-_6cID?RE~x zn78V0FGde?Cw)j&H8eHj`Cq8e&gN!rtw)M%L=Qc%b~b<^HV_Z6qR>eYSTi*j@q}z^ z;^Gnv2h);kvj0a(+%3Kk(#`B}C-59@mzJT0@20W~^@u)S=7Go#8y=X)H28R5o^+;w9!OZ1djGiSP;LL!_OZ!-FUjmH) zcS(y&HGm0lC3-vNE&2zZ`yyiD5>w{RC?m*{oqIQ}AR&HDLJ0S?o%(LD6Q4l~P|Qhc zL@-z#lNbHB*k)AGg4`QAi8vma?CRTimGJVgbk9ngrgKlzKPtz(t-eIjBo~A5)P_aJW_HoIs)ebg0<*ko_hcrob{`FWupyuYd3M! zVDI{W0uo{33+Ur?{iRtaDRrV!mKoq9>|@tiuC##VN5>0n@B+(p1W)J){O?gNEZM&y zMZ-t*_f2@R(C@$hPdDO^uYCSrZ_^)%8oW7Wu>0tC0{EW+APBwN*Nd;+d;C8DI&9oO literal 0 HcmV?d00001 diff --git a/doc/objective-bayesian-inference.md b/doc/objective-bayesian-inference.md new file mode 100644 index 0000000..36ad573 --- /dev/null +++ b/doc/objective-bayesian-inference.md @@ -0,0 +1,815 @@ +# Introduction to Objective Bayesian Inference +There's a common misconception that Bayesian inference is primarily a subjective approach to statistics and +frequentist inference provides objectivity. While Bayesian statistics can certainly be used to incorporate +subjective knowledge, it also provides powerful methods to do objective analysis. Moreover, objective Bayesian +inference probably has a stronger argument for objectivity than many frequentist methods such as P-values +([[13](https://si.biostat.washington.edu/sites/default/files/modules/BergerBerry.pdf)]). + +This blog post provides a brief introduction to objective Bayesian inference. It discusses the +history of objective Bayesian inference from inverse probability up to modern reference priors, how +metrics such as frequentist matching coverage provide a way to quantify what it means for a prior to be objective, +and how to build objective priors. 
Finally, it revisits a few classical problems studied by Bayes and Laplace to
+show how they might be solved with a more modern approach to objective Bayesian inference.
+
+1. [History](#history)
+2. [Priors and Frequentist Matching](#priors-and-frequentist-matching)
+   * [Example 1: A Normal Distribution with Unknown Mean](#example-1-a-normal-distribution-with-unknown-mean)
+   * [Example 2: A Normal Distribution with Unknown Variance](#example-2-a-normal-distribution-with-unknown-variance)
+3. [The Binomial Distribution Prior](#the-binomial-distribution-prior)
+4. [Applications from Bayes and Laplace](#applications-from-bayes-and-laplace)
+   * [Example 3: Observing only 1s](#example-3-observing-only-1s)
+   * [Example 4: A Lottery](#example-4-a-lottery)
+   * [Example 5: Birth Rates](#example-5-birth-rates)
+5. [Discussion](#discussion)
+6. [Conclusion](#conclusion)
+# History
+In 1654 Pascal and Fermat worked together to solve the [problem of the points](https://en.wikipedia.org/wiki/Problem_of_points)
+and in so doing developed an early theory for deductive reasoning with direct probabilities. Thirty years later, Jacob
+Bernoulli worked to extend probability theory to solve inductive problems.
+He recognized that unlike in games of chance, it was futile to
+*a priori* enumerate possible cases and find out "how much more easily can some occur than the others":
+> But, who from among the mortals will be able to determine, for example, the
+> number of diseases, that is, the same number of cases which at each age
+> invade the innumerable parts of the human body and can bring about our
+> death; and how much easier one disease (for example, the plague) can
+> kill a man than another one (for example, rabies; or, the rabies than
+> fever), so that we would be able to conjecture about the future state
+> of life or death? And who will count the innumerable cases of changes
+> to which the air is subjected each day so as to form a conjecture about
+> its state in a month, to say nothing about a year? Again, who knows the
+> nature of the human mind or the admirable fabric of our body shrewdly
+> enough for daring to determine the cases in which one or another
+> participant can gain victory or be ruined in games completely or partly
+> depending on acumen or agility of body?
+[[1](http://www.sheynin.de/download/bernoulli.pdf), p. 18]
+
+The way forward, he reasoned, was to determine probabilities *a posteriori*
+> Here, however, another way for attaining the desired is really opening
+> for us. And, what we are not given to derive a priori, we at least can
+> obtain a posteriori, that is, can extract it from a repeated
+> observation of the results of similar examples.
+[[1](http://www.sheynin.de/download/bernoulli.pdf), p. 18]
+
+To establish the validity of the approach, Bernoulli proved a version of the law of large numbers for the binomial
+distribution. Let $X_1, \ldots, X_N$ represent independent samples from a Bernoulli distribution with parameter $r/t$ ($r$ and $t$ integers). Then
+if $c$ represents some positive integer, Bernoulli showed that for $N$ large enough
+```math
+  P\left(\left|\frac{X_1 + \cdots + X_N}{N} - \frac{r}{t}\right| < \frac{1}{t}\right)
+  >
+  c \cdot P\left(\left|\frac{X_1 + \cdots + X_N}{N} - \frac{r}{t}\right| > \frac{1}{t}\right).
+```
+In other words, *the sampled ratio from a binomial distribution is at least c times more likely to fall within the bounds (r−1)/t and (r+1)/t than to fall outside them*.
Thus, by taking enough samples, “we determine the [parameter] a posteriori almost as though it was known to us a priori”.
+
+Bernoulli, additionally, derived lower bounds, given $r$ and $t$, for how many samples would be needed to achieve
+a desired level of accuracy. For example, if $r=30$ and $t=50$, he showed
+> having made 25550 experiments, it will be more than a thousand times more
+> likely that the ratio of the number of obtained fertile observations to
+> their total number is contained within the limits 31/50 and 29/50
+> rather than beyond them [[1](http://www.sheynin.de/download/bernoulli.pdf), p. 30]
+
+This suggested an approach to inference, but it came up short in two respects. 1) The bounds derived were
+conditional on knowledge of the true parameter; they didn't provide a way to quantify uncertainty
+when the parameter was unknown. And 2) the number of experiments required to reach a high level of confidence in an
+estimate, *moral certainty* in Bernoulli's words, was quite large, limiting the approach's practicality.
+Abraham de Moivre would later improve on
+Bernoulli's work in his highly popular textbook *The Doctrine of Chances*.
+He derived considerably tighter bounds, but again failed to provide a way to quantify uncertainty when
+the binomial distribution's parameter was unknown, offering only this qualitative guidance:
+> if after taking a great number of Experiments, it should be perceived that the
+> happenings and failings have been nearly in a certain proportion, such
+> as of 2 to 1, it may safely be concluded that the Probabilities of
+> happening or failing at any one time assigned will be very near that
+> proportion, and that the greater the number of Experiments has been, so
+> much nearer the Truth will the conjectures be that are derived from
+> them.
+[[2](https://www.ime.usp.br/~walterfm/cursos/mac5796/DoctrineOfChances.pdf), p. 242]
+
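+
+Bernoulli's 25550-experiment figure is easy to check against the binomial distribution directly. The
+short sketch below (a minimal illustration assuming `scipy`) compares the probability that the sampled
+ratio falls within the limits 29/50 and 31/50 with the probability that it falls outside them; the
+ratio comes out far larger than a thousand, showing how conservative Bernoulli's bound was.
+```python
+from scipy.stats import binom
+
+# Bernoulli's example: r/t = 30/50 and N = 25550 trials.
+r, t, N = 30, 50, 25550
+p = r / t
+
+# Success counts corresponding to the ratio limits (r - 1)/t and (r + 1)/t.
+lo = (r - 1) * N // t   # 14819
+hi = (r + 1) * N // t   # 15841
+
+# Probability the sampled ratio lies within the limits versus outside them.
+p_within = binom.cdf(hi, N, p) - binom.cdf(lo - 1, N, p)
+p_outside = 1.0 - p_within
+
+print(p_within / p_outside)  # far larger than Bernoulli's thousand-to-one target
+```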
+
+ +INSPIRED BY DE MOIVRE'S book, Thomas Bayes took up the problem of inference with the +binomial distribution. He reframed the goal to +> Given the number of times in which an unknown event has happened and failed: Required the chance +> that the probability of its happening in a single trial lies somewhere between any two degrees of +> probability that can be named. +[[3](https://web.archive.org/web/20110410085940/http://www.stat.ucla.edu/history/essay.pdf), p. 4] + +Recognizing that a solution would depend on prior probability, Bayes sought to +give an answer for +> the case of an event concerning the probability of which we absolutely know nothing antecedently +> to any trials made concerning it +[[3](https://web.archive.org/web/20110410085940/http://www.stat.ucla.edu/history/essay.pdf), p. 11] + +He reasoned that knowing nothing was equivalent to a uniform prior distribution [4, p. 184-188]. +Using the +uniform prior and a geometric analogy with balls, Bayes succeeded in approximating integrals of posterior +distributions of the form +```math + \frac{\Gamma(n+2)}{\Gamma(y+1) \Gamma(n - y + 1)} \int_a^b \theta^y (1 - \theta)^{n - y} d\theta +``` +and was able to answer questions like "if I observe $y$ success and $n - y$ failures from a binomial +distribution with unknown parameter $\theta$, what is the probability that $\theta$ is between $a$ and $b$?". + +Despite Bayes' success answering inferential questions, his method was not widely adopted +and his work, published posthumously in 1763, remained obscure up until De Morgan renewed attention +to it over fifty years later. A major obstacle was Bayes' geometric treatment of integration; as +mathematical historian Stephen Stigler writes, +> Bayes essay 'Towards solving a problem in the doctrine of chances' is extremely +> difficult to read today--even when we know what to look for. [4, p. 179] +
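+
+In modern terms, the posterior integral above is just the CDF of a Beta(y + 1, n − y + 1) distribution,
+so the question Bayes answered with his geometric construction takes only a couple of lines today.
+Here is a minimal sketch assuming `scipy`; the values of y, n, a, and b are arbitrary illustrations
+rather than anything taken from the essay.
+```python
+from scipy.stats import beta
+
+# Bayes' question: having observed y successes and n - y failures under a
+# uniform prior, what is the probability that a < theta < b?
+y, n = 3, 10     # illustrative data
+a, b = 0.2, 0.6  # illustrative interval
+
+posterior = beta(y + 1, n - y + 1)          # posterior under the uniform prior
+print(posterior.cdf(b) - posterior.cdf(a))  # ≈ 0.81
+```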
+
+
+A DECADE AFTER Bayes' death and likely unaware of his discoveries, Laplace pursued similar problems
+and independently arrived at the same approach. Laplace revisited the famous problem of the points, but this time considered
+the case of a skilled game where the probability of a player winning a round was modeled by a Bernoulli
+distribution with unknown parameter $p$. Like Bayes, Laplace assumed
+a uniform prior, noting only
+> because the probability that A will win a point is unknown, we may
+> suppose it to be any unspecified number whatever between $0$ and $1$. [[5](https://www.york.ac.uk/depts/maths/histstat/memoir1774.pdf)]
+
+Unlike Bayes, though, Laplace did not use a geometric approach. He approached the problems with a much more developed
+analytical toolbox and was able to derive more usable formulas with integrals and clearer notation.
+
+Following Laplace and up until the early 20th century,
+using a uniform prior together with Bayes' theorem became a popular approach to statistical inference.
+In 1837, De Morgan introduced the term *inverse probability*
+to refer to such methods and acknowledged Bayes' earlier work:
+> De Moivre, nevertheless, did not discover the inverse method. This was
+> first used by the Rev. T. Bayes, in Phil. Trans. liii. 370.; and the author,
+> though now almost forgotten, deserves the most honourable remembrance from
+> all who read the history of this science. [[6](https://archive.org/details/134257988/page/n7/mode/2up)]
+
+
+ +IN THE EARLY 20TH CENTURY, inverse probability came under serious attack for its use of a uniform prior. +Ronald Fisher, one of the fiercest critics, wrote +> I know only one case in mathematics of a doctrine which has +> been accepted and developed by the most eminent men of their +> time, and is now perhaps accepted by men now living, which at the +> same time has appeared to a succession of sound writers to be +> fundamentally false and devoid of foundation. Yet that is quite +> exactly the position in respect of inverse probability +[[7](https://errorstatistics.com/wp-content/uploads/2016/02/fisher-1930-inverse-probability.pdf)] + +Fisher criticized inverse probability as "extremely arbitrary". Reviewing +Bayes' essay, he pointed out how naive use of a uniform prior leads to solutions that depend on +the scale used to measure probability. He gave a concrete example [[8](http://www.medicine.mcgill.ca/epidemiology/hanley/bios601/Likelihood/Fisher1922.pdf)]: Let $p$ denote the unknown parameter for a +binomial distribution. Suppose that instead of $p$ we parameterize by +```math + \theta = \arcsin \left(2p-1\right), \quad -\frac{\pi}{2} \le \theta \le \frac{\pi}{2}, +``` +and apply the uniform prior. Then the probability that $\theta$ is between $a$ and $b$ after observing +$S$ successes and $F$ failures is +```math + \frac{1}{\pi} \int_a^b \left(\frac{\sin \theta + 1}{2}\right)^S \left(\frac{1 - \sin \theta}{2}\right)^F d\theta. +``` +A change of variables back to $p$ shows us this is equivalent to +```math + \frac{1}{\pi} \int_{(\sin a + 1)/2}^{(\sin b + 1)/2} \left(p\right)^{S-1/2} \left(1 - p\right)^{F-1/2} dp. +``` +Hence, the uniform prior in $\theta$ is equivalent to the prior $\frac{1}{\pi}p^{-1/2} (1-p)^{-1/2}$ in $p$. As +an alternative to inverse probability, Fisher promoted maximum likelihood methods, p-values, and a frequentist +definition for probability. +
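+
+Fisher's point is easy to verify numerically: the posterior probability computed with a uniform prior
+in θ equals the posterior probability computed in p with the prior p^(-1/2)(1 - p)^(-1/2) over the
+transformed interval. A rough sketch, assuming `scipy` and using arbitrary illustrative values for S,
+F, and the interval:
+```python
+import numpy as np
+from scipy.integrate import quad
+
+S, F = 7, 3       # illustrative successes and failures
+a, b = -0.3, 0.4  # illustrative interval for theta
+
+# Unnormalized posterior in theta under a uniform prior on theta = arcsin(2p - 1).
+def post_theta(theta):
+    p = (np.sin(theta) + 1) / 2
+    return p**S * (1 - p)**F
+
+# Unnormalized posterior in p under the prior p^(-1/2) (1 - p)^(-1/2).
+def post_p(p):
+    return p**(S - 0.5) * (1 - p)**(F - 0.5)
+
+num_theta, _ = quad(post_theta, a, b)
+den_theta, _ = quad(post_theta, -np.pi / 2, np.pi / 2)
+
+pa, pb = (np.sin(a) + 1) / 2, (np.sin(b) + 1) / 2
+num_p, _ = quad(post_p, pa, pb)
+den_p, _ = quad(post_p, 0, 1)
+
+print(num_theta / den_theta, num_p / den_p)  # the two probabilities agree
+```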
+
+ +WHILE FISHER and others advocated for abandoning inverse probability in favor of frequentist methods, +Harold Jeffreys worked to put inverse probability on a firmer foundation. He acknowledged previous +approaches to inverse probability had lacked consistency, but he agreed with their goal of delivering +statistical results in terms of degree of belief and thought frequentist definitions of probability +to be hopelessly flawed: +> frequentist definitions themselves lead to no results of the kind that we need until the notion of +> reasonable degree of belief is reintroduced, and that since the whole purpose of these definitions +> is to avoid this notion they necessarily fail in their object. [[9](https://archive.org/details/in.ernet.dli.2015.2608), p. 34] + +Jeffreys pointed out that inverse probability needn't be tied to the uniform prior: +> There is no more need for [the idea that the uniform distribution of the prior probability was a necessary +> part of the principle of inverse probability] than there is to say that an oven that has once cooked +> roast beef can never cook anything but roast beef. [[9](https://archive.org/details/in.ernet.dli.2015.2608), p. 103] + +Seeking to achieve results that would be consistent under reparameterizations, Jeffreys proposed priors +based on the Fisher information matrix, +```math + \begin{align*} + \pi(\boldsymbol{\theta}) &\propto \left[\det{\mathcal{I}(\boldsymbol{\theta})}\right]^{1/2} \\ + \mathcal{I}(\boldsymbol{\theta})_{st} &= \mathbb{E}_{\boldsymbol{y}}\left\{ + \left(\frac{\partial}{\partial \theta_s} \log P(\boldsymbol{y}\mid\boldsymbol{\theta})\right) + \left(\frac{\partial}{\partial \theta_t} \log P(\boldsymbol{y}\mid\boldsymbol{\theta})\right) + \mid\boldsymbol{\theta} + \right\}, + \end{align*} +``` +writing +> If we took the prior probability density for the parameters to be proportional to +> [(det I(θ))^(1/2)$] +> ... any arbitrariness in the choice of the parameters could make no difference to the results, +> and it is proved that for this wide class of laws a consistent theory of probability can be +> constructed. [[9](https://archive.org/details/in.ernet.dli.2015.2608), p. 159] + +Twenty years later, Welch and Peers investigated priors from a different perspective ([[10](https://academic.oup.com/jrsssb/article-abstract/25/2/318/7035245?redirectedFrom=PDF)]). +They analyzed one-tailed credible sets from posterior distributions and asked how closely probability +mass coverage matched frequentist coverage. They found that for the case of a single parameter +the prior Jeffreys proposed was asymptotically optimal, providing further justification for the +prior that aligned with how intuition suggests we might quantify Bayes criterion of +"knowing absolutely nothing". +> Note: Deriving good priors in the multi-parameter case is considerably more involved. Jeffreys himself +> was dissatisfied with the prior his rule produced for multi-parameter models and proposed an +> alternative known as *Jeffreys independent prior* but never developed a rigorous approach. +> José-Miguel Bernardo and James Berger would later develop *reference priors* as a refinement of +> Jeffreys prior. Reference priors provide a general mechanism to produce good priors that works for +> multi-parameter models and cases +> where the Fisher information matrix doesn't exist. 
See [[11](https://projecteuclid.org/journals/annals-of-statistics/volume-37/issue-2/The-formal-definition-of-reference-priors/10.1214/07-AOS587.full)] and +> [[12](https://www.amazon.com/Objective-Bayesian-Inference-James-Berger/dp/9811284903), part 3]. +
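+
+For a one-parameter model, Jeffreys' rule is also straightforward to apply numerically: estimate the
+Fisher information by averaging the squared score over simulated data and take its square root. The
+Monte Carlo sketch below (a crude illustration assuming `numpy`, not a recommended estimator) recovers
+the θ^(-1/2)(1 - θ)^(-1/2) shape that is derived analytically for the binomial model later in this post.
+```python
+import numpy as np
+
+rng = np.random.default_rng(0)
+
+def fisher_information(theta, n_sim=200_000):
+    # Score of a single Bernoulli observation: d/dtheta log P(y | theta).
+    y = rng.random(n_sim) < theta
+    score = np.where(y, 1 / theta, -1 / (1 - theta))
+    return np.mean(score**2)
+
+for theta in [0.1, 0.3, 0.5, 0.7, 0.9]:
+    jeffreys = np.sqrt(fisher_information(theta))  # Monte Carlo estimate
+    exact = 1 / np.sqrt(theta * (1 - theta))       # theta^(-1/2) (1 - theta)^(-1/2)
+    print(theta, jeffreys, exact)
+```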
+
+ +IN AN UNFORTUNATE turn of events, mainstream statistics mostly ignored Jeffreys approach +to inverse probability to chase a mirage of objectivity that frequentist methods seemed to provide. +> Note: Development of inverse probability in the direction Jeffreys outlined +> continued under the name objective Bayesian analysis; however, it hardly +> occupies the center stage of statistics, and many people mistakenly think of +> Bayesian analysis as more of a subjective theory. +> +> See [[13](https://si.biostat.washington.edu/sites/default/files/modules/BergerBerry.pdf)] for background on +> why the objectivity that many perceive frequentist methods to have is largely +> false. + + +But much as Jeffreys had anticipated with his criticism that frequentist +definitions of probability couldn’t provide “results of the kind that we need”, +a majority of practitioners filled in the blank by misinterpreting frequentist +results as providing belief probabilities. Goodman coined the term *P value +fallacy* to refer to this common error and described just how prevalent it is + +> In my experience teaching many academic physicians, when physicians are +> presented with a single-sentence summary of a study that produced a surprising +> result with P = 0.05, the overwhelming majority will confidently state that +> there is a 95% or greater chance that the null hypothesis is incorrect. [[14](https://pubmed.ncbi.nlm.nih.gov/10383371/)] + +James Berger and Thomas Sellke established theoretical and simulation results +that show how *spectacularly wrong* this notion is + +> it is shown that actual evidence against a null (as measured, say, by posterior +> probability or comparative likelihood) can differ by an order of magnitude from +> the P value. For instance, data that yield a P value of .05, when testing a +> normal mean, result in a posterior probability of the null of at least .30 for +> any objective prior distribution. [[15](https://www2.stat.duke.edu/courses/Spring07/sta215/lec/BvP/BergSell1987.pdf)] + +They concluded + +> for testing “precise” hypotheses, p values should not be used directly, because +> they are too easily misinterpreted. The standard approach in teaching–of +> stressing the formal definition of a p value while warning against its +> misinterpretation–has simply been an abysmal failure. [[16](https://hannig.cloudapps.unc.edu/STOR654/handouts/SellkeBayarriBerger2001.pdf)] + +In this post, we’ll look closer at how priors for objective Bayesian analysis +can be justified by matching coverage; and we’ll reexamine the problems Bayes +and Laplace studied to see how they might be approached with a more modern +methodology. + +# Priors and Frequentist Matching +The idea of matching priors intuitively aligns with how we might think about +probability in the absence of prior knowledge. We can think of the frequentist +coverage matching metric as a way to provide an answer to the question “How +accurate are the Bayesian credible sets produced with a given prior?”. + +> Note: For more background on frequentist coverage matching and its relation to +> objective Bayesian analysis, see [[17](https://www.uv.es/~bernardo/OBayes.pdf)] and [[12](https://www.amazon.com/Objective-Bayesian-Inference-James-Berger/dp/9811284903), ch. 5]. +
+
+ +CONSIDER A PROBABILITY model with a single parameter $\theta$. If we're given a prior, +$\pi(\theta)$, how do we test if the prior reasonably expresses +Bayes' requirement of knowing nothing? Let's pick a size $n$, a value $\theta_\textrm{true}$, and +randomly sample observations $\boldsymbol{y}=(y_1$, $\ldots$, $y_n)^\top$ from the distribution +$P(\cdot\vert\theta_\textrm{true})$. Then let's compute the two-tailed credible set $[\theta_a, \theta_b]$ +that contains 95% of the probability mass of the posterior, +```math + \pi(\theta\mid\boldsymbol{y}) \propto P(\boldsymbol{y}\mid \theta) \times \pi(\theta), +``` +and record whether or not the credible set contains $\theta_\textrm{true}$. Now +suppose we repeat the experiment many times and vary $n$ and $\theta_\textrm{true}$. If $\pi(θ)$ is +a good prior, then the fraction of trials where $\theta_\textrm{true}$ is contained within the +credible set will consistently be close to 95%. + +Here’s how we might express this experiment as an algorithm: +``` +function coverage-test(n, θ_true, α): + cnt ← 0 + N ← a large number + for i ← 1 to N do + y ← sample from P(·|θ_true) + t ← integrate_{-∞}^θ_true π(θ | y)dθ + if (1 - α)/2 < t < 1 - (1 - α)/2: + cnt ← cnt + 1 + end if + end for + return cnt / N +``` +## Example 1: a normal distribution with unknown mean +Suppose we observe n normally distributed values, y, with variance 1 and unknown mean, μ. Let’s consider the prior +```math +\pi(\mu) \propto 1 +``` +> Note: In this case Jeffreys prior and the constant prior in μ are the same. + +Then +```math +\begin{align*} + P(\boldsymbol{y}\mid\mu) + &\propto \exp\left\{-\frac{1}{2}\left(\boldsymbol{y} - \mu \boldsymbol{1}\right)'\left(\boldsymbol{y} - \mu \boldsymbol{1}\right)\right\} \\ + &\propto \exp\left\{-\frac{1}{2}\left(n \mu^2 - 2 \mu n \bar{y}\right)\right\} \\ + &\propto \exp\left\{-\frac{n}{2}\left(\mu - \bar{y}\right)^2\right\}. +\end{align*} +``` +Thus, +```math + \int_{-\infty}^t \pi(\mu\mid\boldsymbol{y})d\mu = + \frac{1}{2} \left[1 + \textrm{erf}\left(\frac{t - \bar{y}}{\sqrt{2/n}}\right)\right]. +``` +I ran a 95% coverage test with 10000 trials and various values of μ and n. As +the table below shows, the results are all close to 95%, indicating the +constant prior is a good choice in this case. [Source code for example](https://github.com/rnburn/bbai/blob/master/example/09-coverage-simulations.ipynb). + +| $\mu_\textrm{true}$ | n=5 | n=10 | n=20 | +| ---- | -------- | -------- | ------- | +| 0.1 | 0.9502 | 0.9486 | 0.9485 | +| 0.5 | 0.9519 | 0.9478 | 0.9487 | +| 1.0 | 0.9516 | 0.9495 | 0.9519 | +| 2.0 | 0.9514 | 0.9521 | 0.9512 | +| 5.0 | 0.9489 | 0.9455 | 0.9497 | + +## Example 2: a normal distribution with unknown variance +Now suppose we observe n normally distributed values, y, with unknown variance +and zero mean, μ. Let’s test the constant prior and Jeffreys prior, +```math + \pi_C(\sigma^2) \propto 1 \quad\textrm{and}\quad\pi_J(\sigma^2)\propto\frac{1}{\sigma^2}. +``` +We have +```math + P(\boldsymbol{y}\mid\sigma^2) \propto \left(\frac{1}{\sigma^2}\right)^{n/2} \exp\left\{-\frac{n s^2}{2 \sigma^2}\right\} +``` +where $s^2 = \frac{\boldsymbol{y}' \boldsymbol{y}}{n}$. Put $u = \frac{n s^2}{2 \sigma^2}$. Then +```math + \begin{align*} + \int_0^t + \left(\frac{1}{\sigma^2}\right)^{n/2} \exp\left\{-\frac{n s^2}{2 \sigma^2}\right\} d\sigma^2 + &\propto \int_{\frac{n s^2}{2 t}}^\infty u^{n/2-2} \exp\left\{-u\right\} du \\ + &= \Gamma(\frac{n - 2}{2}, \frac{n s^2}{2 t}). 
+ \end{align*} +``` +Thus, +```math + \int_0^t \pi_C(\sigma^2\mid\boldsymbol{y}) d\sigma^2 = + \frac{1}{\Gamma(\frac{n - 2}{2})} + \Gamma(\frac{n - 2}{2}, \frac{n s^2}{2 t}). +``` +Similarly, +```math + \int_0^t \pi_J(\sigma^2\mid\boldsymbol{y}) d\sigma^2 = + \frac{1}{\Gamma(\frac{n}{2})} + \Gamma(\frac{n}{2}, \frac{n s^2}{2 t}). +``` +The table below shows the results for a 95% coverage test with the constant +prior. We can see that coverage is notably less than 95% for smaller values of +n. +| sigma_true^2 | n = 5 | n = 10 | n = 20 | +| ------------- | -------- | -------- | ------- | +| 0.1 | 0.9014 | 0.9288 | 0.9445 | +| 0.5 | 0.9035 | 0.9309 | 0.9429 | +| 1.0 | 0.9048 | 0.9303 | 0.9417 | +| 2.0 | 0.9079 | 0.9331 | 0.9418 | +| 5.0 | 0.9023 | 0.9295 | 0.9433 | + +In comparison, coverage is consistently close to 95% for all values of n if we +use Jeffreys prior. [Source code for example](https://github.com/rnburn/bbai/blob/master/example/09-coverage-simulations.ipynb). +| sigma_true^2 | n = 5 | n = 10 | n = 20 | +| ------------- | -------- | -------- | ------- | +| 0.1 | 0.9516 | 0.9503 | 0.9533 | +| 0.5 | 0.9501 | 0.9490 | 0.9537 | +| 1.0 | 0.9505 | 0.9511 | 0.9519 | +| 2.0 | 0.9480 | 0.9514 | 0.9498 | +| 5.0 | 0.9506 | 0.9497 | 0.9507 | + +# The Binomial Distribution Prior +Let’s apply Jeffreys approach to inverse probability to the binomial distribution. + +Suppose we observe n values from the binomial distribution. Let y denote the +number of successes and θ denote the probability of success. The likelihood +function is given by +```math + L(\theta; y) \propto \theta^y (1 - \theta)^{n - y}. +``` +Taking the log and differentiating, we have +```math +\begin{align*} + \frac{\partial}{\partial \theta} \log L(\theta;y) &= \frac{y}{\theta} - \frac{n - y}{1 - \theta} \\ + &= \frac{y - n \theta}{\theta(1 - \theta)}. +\end{align*} +``` +Thus, the Fisher information matrix for the binomial distribution is +```math +\begin{align*} + \mathcal{I}(\theta) &= \mathbb{E}_{y} \left\{ + \left(\frac{\partial}{\partial\theta} \log L(\theta;y) \right)^2\mid\theta\right\} \\ + &= \mathbb{E}_{y}\left\{\left(\frac{y - n \theta}{\theta(1 - \theta)}\right)^2\mid\theta\right\} \\ + &= \frac{n \theta (1 - \theta)}{\theta^2 (1 - \theta)^2} \\ + &= \frac{n}{\theta (1 - \theta)}, +\end{align*} +``` +and Jeffreys prior is +```math +\begin{align*} + \pi(\theta) &\propto \mathcal{I}(\theta)^{1/2} \\ + &\propto \theta^{-1/2} (1 - \theta)^{-1/2}. +\end{align*} +``` + +![Jeffreys prior and Laplace prior for the binomial model](./binomial-priors.png) + +The posterior is then +```math + \pi(\theta\mid y) \propto \theta^{y-1/2} (1 - \theta)^{n - y - 1/2}, +``` +which we can recognize as the [beta distribution](https://en.wikipedia.org/wiki/Beta_distribution) with parameters y+1/2 and n-y+1/2. + +To test frequentist coverages, we can use an exact algorithm. 
+``` +function binomial-coverage-test(n, θ_true, α): + cov ← 0 + for y ← 0 to n do + t ← integrate_0^θ_true π(θ | y)dθ + if (1 - α)/2 < t < 1 - (1 - α)/2: + cov ← cov + binomial_coefficient(n, y) * θ_true^y * (1 - θ_true)^(n-y) + end if + end for + return cov +``` +Here are the coverage results for α=0.95 and various values of p and n using the Bayes-Laplace uniform prior: +| θ_true | n = 5 | n = 10 | n = 20 | n = 100 | +| ----------- | -------- | -------- | -------- | -------- | +| 0.0001 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | +| 0.0010 | 0.0000 | 0.0000 | 0.0000 | 0.9048 | +| 0.0100 | 0.9510 | 0.9044 | 0.8179 | 0.9206 | +| 0.1000 | 0.9185 | 0.9298 | 0.9568 | 0.9364 | +| 0.2500 | 0.9844 | 0.9803 | 0.9348 | 0.9513 | +| 0.5000 | 0.9375 | 0.9785 | 0.9586 | 0.9431 | + +and here are the coverage results using Jeffreys prior: + +| θ_true | n = 5 | n = 10 | n = 20 | n = 100 | +| ----------- | -------- | -------- | -------- | -------- | +| 0.0001 | 0.9995 | 0.9990 | 0.9980 | 0.9900 | +| 0.0010 | 0.9950 | 0.9900 | 0.9802 | 0.9048 | +| 0.0100 | 0.9510 | 0.9044 | 0.9831 | 0.9816 | +| 0.1000 | 0.9914 | 0.9872 | 0.9568 | 0.9557 | +| 0.2500 | 0.9844 | 0.9240 | 0.9348 | 0.9513 | +| 0.5000 | 0.9375 | 0.9785 | 0.9586 | 0.9431 | + +We can see coverage is identical for many table entries. For smaller values of +n and p_true, though, the uniform prior gives no coverage while Jeffreys prior +provides decent results. [source code for experiment](https://github.com/rnburn/bbai/blob/master/example/15-binomial-coverage.ipynb) + +# Applications from Bayes and Laplace +Let’s now revisit some applications Bayes and Laplace studied. Given that the +goal in all of these problems is to assign a belief probability to an interval +of the parameter space, I think that we can make a strong argument that +Jeffreys prior is a better choice than the uniform prior since it has +asymptotically optimal frequentist coverage performance. This also addresses +Fisher’s criticism of arbitrariness. + +> Note: See [12, p. 105–106] for a more through discussion of the uniform prior vs Jeffreys prior for the binomial distribution + +In each of these problems, I’ll show both the answer given by Jeffreys prior +and the original uniform prior that Bayes and Laplace used. One theme we’ll see +is that many of the results are not that different. A lot of fuss is often made +over minor differences in how objective priors can be derived. The differences +can be important, but often the data dominates and different reasonable choices +will lead to nearly the same result. + +## Example 3: Observing Only 1s +In an appendix Richard Price added to Bayes’ essay, he considers the following problem: +> Let us then first suppose, of such an event as that called M in the essay, or +> an event about the probability of which, antecedently to trials, we know +> nothing, that it has happened once, and that it is enquired what conclusion we +> may draw from hence with respct to the probability of it’s happening on a +> second trial. [[3](https://web.archive.org/web/20110410085940/http://www.stat.ucla.edu/history/essay.pdf), p. 16] + +Specifically, Price asks, “what’s the probability that θ is greater than 1/2?” +Using the uniform prior in Bayes’ essay, we derive the posterior distribution +```math + \pi_\textrm{B}(\theta\mid y) = 2 x. +``` + +Integrating gives us the answer +```math + \int_{\frac{1}{2}}^1 2x dx = \left. x^2 \right\rvert_{\frac{1}{2}}^1 = \frac{3}{4}. 
+``` +Using Jeffreys prior, we derive a beta distribution for the posterior +```math + \pi_J(\theta\mid y) = \frac{\Gamma(2)}{\Gamma(3/2) \Gamma(1/2)} \theta^{1/2} (1 - \theta)^{-1/2}, +``` + +and the answer +```math + \int_{\frac{1}{2}}^1 \pi_J(\theta\mid y) dx = \frac{1}{2} + \frac{1}{\pi} \approx 0.818 +``` + +Price then continues with the same problem but supposes we see two 1s, three +1s, etc. The table below shows the result we’d get up to ten 1s. [source code](https://github.com/rnburn/bbai/blob/master/example/16-bayes-examples.ipynb) + +| 1s observed | p > 0.5 (Bayes) | p > 0.5 (Jeffreys) | +| ------------ | ---------------- | ------------------- | +| 1 | 0.7500 | 0.8183 | +| 2 | 0.8750 | 0.9244 | +| 3 | 0.9375 | 0.9669 | +| 4 | 0.9688 | 0.9850 | +| 5 | 0.9844 | 0.9931 | +| 6 | 0.9922 | 0.9968 | +| 7 | 0.9961 | 0.9985 | +| 8 | 0.9980 | 0.9993 | +| 9 | 0.9990 | 0.9997 | +| 10 | 0.9995 | 0.9998 | + +## Example 4: A Lottery +Price also considers a lottery with an unknown chance of winning: + +> Let us then imagine a person present at the drawing of a lottery, who knows +> nothing of its scheme or of the proportion of Blanks to Prizes in it. Let it +> further be supposed, that he is obliged to infer this from the number of blanks +> he hears drawn compared with the number of prizes; and that it is enquired what +> conclusions in these circumstances he may reasonably make. [3, p. 19–20] + +He asks this specific question: + +> Let him first hear ten blanks drawn and one prize, and let it be enquired what +> chance he will have for being right if he gussses that the proportion of blanks +> to prizes in the lottery lies somewhere between the proportions of 9 to 1 and +> 11 to 1. [[3](https://web.archive.org/web/20110410085940/http://www.stat.ucla.edu/history/essay.pdf), p. 20] + +With Bayes prior and θ representing the probability of drawing a blank, we derive the posterior distribution +```math + \pi_B(\theta\mid y) = \frac{\Gamma(13)}{\Gamma(11) \Gamma(2)} \theta^{10} (1 - \theta)^1, +``` +and the answer +```math + \int_{\frac{9}{10}}^{\frac{11}{12}} \pi_B(\theta\mid y) d\theta \approx 0.0770. +``` +Using Jeffreys prior, we get the posterior +```math + \pi_J(\theta\mid y) = \frac{\Gamma(12)}{\Gamma(21/2) \Gamma(3/2)} \theta^{19/2} (1 - \theta)^{1/2} +``` +and the answer +```math + \int_{\frac{9}{10}}^{\frac{11}{12}} \pi_J(\theta\mid y) d\theta \approx 0.0804. +``` + +Price then considers the same question (what’s the probability that θ lies +between 9/10 and 11/12) for different cases where an observer of the lottery +sees w prizes and 10w blanks. Below I show posterior probabilities using both +Bayes’ uniform prior and Jeffreys prior for various values of w. [source code](https://github.com/rnburn/bbai/blob/master/example/16-bayes-examples.ipynb) +| Blanks | Prizes | 9/10 < p < 11/12 (Bayes) | 9/10 < p < 11/12 (Jeffreys) | +| ------- | -------- | -------------------------- | ----------------------------- | +| 10 | 1 | 0.0770 | 0.0804 | +| 20 | 2 | 0.1084 | 0.1107 | +| 40 | 4 | 0.1527 | 0.1541 | +| 100 | 10 | 0.2390 | 0.2395 | +| 1000 | 100 | 0.6628 | 0.6618 | + +## Example 5: Birth Rates +Let’s now turn to a problem that fascinated Laplace and his contemporaries: The +relative birth rate of boys-to-girls. 
Laplace introduces the problem, + +> The consideration of the [influence of past events on the probability of future +> events] leads me to speak of births: as this matter is one of the most +> interesting in which we are able to apply the Calculus of probabilities, I +> manage so to treat with all care owing to its importance, by determining what +> is, in this case, the influence of the observed events on those which must take +> place, and how, by its multiplying, they uncover for us the true ratio of the +> possibilities of the births of a boy and of a girl. [[18](http://www.probabilityandfinance.com/pulskamp/Laplace/memoir_probabilities.pdf), p. 1] + +Like Bayes, Laplace approaches the problem using a uniform prior, writing + +> When we have nothing given a priori on the possibility of an event, it is +> necessary to assume all the possibilities, from zero to unity, equally +> probable; thus, observation can alone instruct us on the ratio of the births of +> boys and of girls, we must, considering the thing only in itself and setting +> aside the events, to assume the law of possibility of the births of a boy or of +> a girl constant from zero to unity, and to start from this hypothesis into the +> different problems that we can propose on this object. [[18](http://www.probabilityandfinance.com/pulskamp/Laplace/memoir_probabilities.pdf), p. 26] + +Using data collection from Paris between 1745 and 1770, where 251527 boys and +241945 girls had been born, Laplace asks, what is “the probability that the +possibility of the birth of a boy is equal or less than 1/2“? + +With a uniform prior, B = 251527, G = 241945, and θ representing the probability that a boy is born, we obtain the posterior +```math + \pi_L(\theta\mid y) = \frac{\Gamma(B + G + 2)}{\Gamma(B + 1) \Gamma(G + 1)} + \theta^B (1 - \theta)^G +``` +and the answer +```math + \int_0^{1/2} \pi_L(\theta\mid y) d\theta \approx 1.1460\times 10^{-42}. +``` +With Jeffreys prior, we similarly derive the posterior +```math + \pi_J(\theta\mid y) = \frac{\Gamma(B + G + 1)}{\Gamma(B + 1/2) \Gamma(G + 1/2)} + \theta^{B -1/2} (1 - \theta)^{G-1/2} +``` +and the answer +```math + \int_0^{1/2} \pi_J(\theta\mid y) d\theta \approx 1.1458\times 10^{-42}. +``` +Here’s some simulated data using p_true = B / (B + G) that shows how the +answers might evolve as more births are observed. +| Boys | Girls | p < 0.5 (Laplace) | p < 0.5 (Jeffreys) | +| ------ | ------- | ------------------ | ------------------- | +| 0 | 0 | 0.5000 | 0.5000 | +| 749 | 751 | 0.5206 | 0.5206 | +| 1511 | 1489 | 0.3440 | 0.3440 | +| 2263 | 2237 | 0.3492 | 0.3492 | +| 3081 | 2919 | 0.0182 | 0.0182 | +| 3810 | 3690 | 0.0829 | 0.0829 | +| 4514 | 4486 | 0.3839 | 0.3839 | +| 5341 | 5159 | 0.0379 | 0.0379 | +| 6139 | 5861 | 0.0056 | 0.0056 | +| 6792 | 6708 | 0.2349 | 0.2349 | +| 7608 | 7392 | 0.0389 | 0.0389 | +| 8308 | 8192 | 0.1833 | 0.1832 | +| 9145 | 8855 | 0.0153 | 0.0153 | +| 9957 | 9543 | 0.0015 | 0.0015 | +| 10618 | 10382 | 0.0517 | 0.0517 | + +# Discussion +## Q1: Where does objective Bayesian analysis belong in statistics? +I think Jeffreys was right and standard statistical procedures should deliver +“results of the kind we need”. While Bayes and Laplace might not have been +fully justified in their choice of a uniform prior, they were correct in their +objective of quantifying results in terms of degree of belief. 
The approach
+Jeffreys outlined (and which later evolved into reference priors) gives us a
+pathway to provide “results of the kind we need” while addressing the
+arbitrariness of a uniform prior. Jeffreys approach isn’t the only way to get
+to results as degrees of belief, and a more subjective approach can also be
+valid if the situation allows, but his approach gives us good answers for the
+common case “of an event concerning the probability of which we absolutely know
+nothing” and can be used as a drop-in replacement for frequentist methods.
+
+To answer more concretely, I think when you open up a standard
+introduction-to-statistics textbook and look up a basic procedure such as a
+hypothesis test of whether the mean of normally distributed data with unknown
+variance is non-zero, you should see a method built on objective priors and
+Bayes factors like [[19](https://www.jstor.org/stable/2670175)] rather than a method based on P values.
+
+## Q2: But aren’t there multiple ways of deriving good priors in the absence of prior knowledge?
+I highlighted frequentist coverage matching as a benchmark to gauge whether a
+prior is a good candidate for objective analysis, but coverage matching isn’t
+the only valid metric we could use and it may be possible to derive multiple
+priors with good coverage. Different priors with good frequentist properties,
+though, will likely be similar, and any results will be determined more by
+observations than the prior. If we are in a situation where multiple good
+priors lead to significantly differing results, then that’s an indicator we
+need to provide subjective input to get a useful answer. Here’s how Berger
+addresses this issue:
+
+> Inventing a new criterion for finding “the optimal objective prior” has proven
+> to be a popular research pastime, and the result is that many competing priors
+> are now available for many situations. This multiplicity can be bewildering to
+> the casual user. I have found the reference prior approach to be the most
+> successful approach, sometimes complemented by invariance considerations as
+> well as study of frequentist properties of resulting procedures. Through such
+> considerations, a particular prior usually emerges as the clear winner in many
+> scenarios, and can be put forth as the recommended objective prior for the
+> situation. [[20](https://www2.stat.duke.edu/~berger/papers/obayes-debate.pdf)]
+
+## Q3. Doesn’t that make inverse probability subjective, whereas frequentist methods provide an objective approach to statistics?
+
+It’s a common misconception that frequentist methods are objective. Berger and
+Berry provide this example to demonstrate [[21](https://si.biostat.washington.edu/sites/default/files/modules/BergerBerry.pdf)]:
+Suppose we watch a researcher
+study a coin for bias. We see the researcher flip the coin 17 times. Heads
+comes up 13 times and tails comes up 4 times. Suppose θ represents the
+probability of heads and the researcher is doing a standard P-value test with
+the null hypothesis that the coin is not biased, θ=0.5. What P-value would they
+get? We can’t answer the question because the researcher would get remarkably
+different results depending on their experimental intentions.
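+
+Before walking through the two cases by hand, here is a small numerical sketch (assuming `scipy`) that
+computes the P-value under each stopping rule; the calculations that follow derive the same two
+numbers.
+```python
+from scipy.stats import binom
+from scipy.special import comb
+
+# Binomial stopping rule: 17 flips were planned and 13 heads were observed.
+# The P-value is the probability of a result at least as extreme as 13 heads.
+p_fixed_flips = 1 - sum(binom.pmf(k, 17, 0.5) for k in range(5, 13))
+
+# Negative-binomial stopping rule: flip until at least 4 heads and 4 tails have
+# appeared, which here took 17 flips. P(N = n) = 2 C(n-1, 3) 0.5^n for n >= 8.
+p_until_4_and_4 = 1 - sum(2 * comb(n - 1, 3) * 0.5**n for n in range(8, 17))
+
+print(p_fixed_flips, p_until_4_and_4)  # ≈ 0.049 and ≈ 0.021
+```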
+ +If the researcher intended to flip the coin 17 times, then the probability of +seeing a value *less extreme* than 13 heads under the null hypothesis is given by +summing binomial distribution terms representing the probabilities of getting 5 +to 12 heads, +```math +\sum_{k=5}^{12} \binom{17}{k} 0.5^{17} \approx 0.951 +``` +which gives us a P-value of 1–0.951=0.049. + +If, however, the researcher intended to continue flipping until they got at +least 4 heads and 4 tails, then the probability of seeing a value *less extreme* +than 17 total flips under the null hypothesis is given by summing negative +binomial distribution terms representing the probabilities of getting 8 to 16 +total flips, +```math +\sum_{k=7}^{15} 2 \binom{k}{3} 0.5^{k+1} \approx 0.979 +``` +which gives us a P-value of 1–0.979=0.021 + +The result is dependent on not just the data but also on the hidden intentions +of the researcher. As Berger and Berry argue “objectivity is not generally +possible in statistics and … standard statistical methods can produce +misleading inferences.” [[21](https://si.biostat.washington.edu/sites/default/files/modules/BergerBerry.pdf)] [source code for example](https://github.com/rnburn/bbai/blob/master/example/14-p-value-objectivity.ipynb) + +## Q4. If subjectivity is unavoidable, why not just use subjective priors? + +When subjective input is possible, we should incorporate it. But we should also +acknowledge that Bayes’ “event concerning the probability of which we +absolutely know nothing” is an important fundamental problem of inference that +needs good solutions. As Edwin Jaynes writes + +> To reject the question, [how do we find the prior representing “complete +> ignorance”?], as some have done, on the grounds that the state of complete +> ignorance does not “exist” would be just as absurd as to reject Euclidean +> geometry on the grounds that a physical point does not exist. In the study of +> inductive inference, the notion of complete ignorance intrudes itself into the +> theory just as naturally and inevitably as the concept of zero in arithmetic. +> If one rejects the consideration of complete ignorance on the grounds that the +> notion is vague and ill-defined, the reply is that the notion cannot be evaded +> in any full theory of inference. So if it is still ill-defined, then a major +> and immediate objective must be to find a precise definition which will agree +> with intuitive requirements and be of constructive use in a mathematical +> theory. [[22](https://bayes.wustl.edu/etj/articles/prior.pdf)] + +Moreover, systematic approaches such as reference priors can certainly do much +better than pseudo-Bayesian techniques such as choosing a uniform prior over a +truncated parameter space or a vague proper prior such as a Gaussian over a +region of the parameter space that looks interesting. Even when subjective +information is available, using reference priors as building blocks is often +the best way to incorporate it. For instance, if we know that a parameter is +restricted to a certain range but don’t know anything more, we can simply adapt +a reference prior by restricting and renormalizing it [[12](https://www.worldscientific.com/worldscibooks/10.1142/13640#t=aboutBook), p. 256]. + +> Note: The term pseudo-Bayesian comes from [[20](https://www2.stat.duke.edu/~berger/papers/obayes-debate.pdf)]. See that paper for a more through discussion and comparison with objective Bayesian analysis. 
+ +# Conclusion +The common and repeated misinterpretation of statistical results such as P +values or confidence intervals as belief probabilities shows us that there is a +strong natural tendency to want to think about inference in terms of inverse +probability. It’s no wonder that the method dominated for 150 years. + +Fisher and others were certainly correct to criticize naive use of a uniform +prior as arbitrary, but this is largely addressed by reference priors and +adopting metrics like frequentist matching coverage that quantify what it means +for a prior to represent ignorance. As Berger puts it, + +> We would argue that noninformative prior Bayesian analysis is the single most +> powerful method of statistical analysis, in the sense of being the ad hoc +> method most likely to yield a sensible answer for a given investment of effort. +> And the answers so obtained have the added feature of being, in some sense, the +> most “objective” statistical answers obtainable [[23](https://www.amazon.com/Statistical-Decision-Bayesian-Analysis-Statistics/dp/0387960988), p. 90] + + + +## References +[1] Bernoulli, J. (1713). [On the Law of Large Numbers, Part Four of Ars Conjectandi.](http://www.sheynin.de/download/bernoulli.pdf) Translated by Oscar Sheynin. + +[2] De Moivre, A. (1756). [The Doctrine of Chances](https://www.ime.usp.br/~walterfm/cursos/mac5796/DoctrineOfChances.pdf). + +[3] Bayes, T. (1763). [An essay towards solving a problem in the doctrine of chances. by the late rev. mr. bayes, f. r. s. communicated by mr. price, in a letter to john canton, a. m. f. r. s](https://web.archive.org/web/20110410085940/http://www.stat.ucla.edu/history/essay.pdf) . Philosophical Transactions of the Royal Society of London 53, 370–418. + +[4] Stigler, S. (1990). The History of Statistics: The Measurement of Uncertainty before 1900. Belknap Press. + +[5] Laplace, P. (1774). [Memoir on the probability of the causes of events](https://www.york.ac.uk/depts/maths/histstat/memoir1774.pdf) +. Translated by S. M. Stigler. + +[6] De Morgan, A. (1838). [An Essay On Probabilities: And On Their Application To Life Contingencies And Insurance Offices](https://archive.org/details/134257988/page/n7/mode/2up). + +[7] Fisher, R. (1930). [Inverse probability](https://errorstatistics.com/wp-content/uploads/2016/02/fisher-1930-inverse-probability.pdf). Mathematical Proceedings of the Cambridge Philosophical Society 26(4), 528–535. + +[8] Fisher, R. (1922). [On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London](http://www.medicine.mcgill.ca/epidemiology/hanley/bios601/Likelihood/Fisher1922.pdf). Series A, Containing Papers of a Mathematical or Physical Character 222, 309–368. + +[9] Jeffreys, H. (1961). [Theory of Probability (3 ed.)](https://archive.org/details/in.ernet.dli.2015.2608). Oxford Classic Texts in the Physical Sciences. + +[10]: Welch, B. L. and H. W. Peers (1963). [On formulae for confidence points based on integrals of weighted likelihoods](https://academic.oup.com/jrsssb/article-abstract/25/2/318/7035245?redirectedFrom=PDF). Journal of the Royal Statistical Society Series B-methodological 25, 318–329. + +[11]: Berger, J. O., J. M. Bernardo, and D. Sun (2009). [The formal definition of reference priors](https://projecteuclid.org/journals/annals-of-statistics/volume-37/issue-2/The-formal-definition-of-reference-priors/10.1214/07-AOS587.full). The Annals of Statistics 37 (2), 905–938. + +[12]: Berger, J., J. Bernardo, and D. Sun (2024). 
[Objective Bayesian Inference](https://www.amazon.com/Objective-Bayesian-Inference-James-Berger/dp/9811284903). World Scientific.
+
+[13]: Berger, J. O. and D. A. Berry (1988). [Statistical analysis and the illusion of objectivity](https://si.biostat.washington.edu/sites/default/files/modules/BergerBerry.pdf). American Scientist 76(2), 159–165.
+
+[14]: Goodman, S. (1999, June). Toward evidence-based medical statistics. 1: [The p value fallacy](https://pubmed.ncbi.nlm.nih.gov/10383371/). Annals of Internal Medicine 130(12), 995–1004.
+
+[15]: Berger, J. and T. Sellke (1987). [Testing a point null hypothesis: The irreconcilability of p values and evidence](https://www2.stat.duke.edu/courses/Spring07/sta215/lec/BvP/BergSell1987.pdf). Journal of the American Statistical Association 82(397), 112–122.
+
+[16]: Sellke, T., M. J. Bayarri, and J. Berger (2001). [Calibration of p values for testing precise null hypotheses](https://hannig.cloudapps.unc.edu/STOR654/handouts/SellkeBayarriBerger2001.pdf). The American Statistician 55(1), 62–71.
+
+[17]: Berger, J., J. Bernardo, and D. Sun (2022). [Objective Bayesian inference and its relationship to frequentism](https://www.uv.es/~bernardo/OBayes.pdf).
+
+[18]: Laplace, P. (1778). [Mémoire sur les probabilités](http://www.probabilityandfinance.com/pulskamp/Laplace/memoir_probabilities.pdf). Translated by Richard J. Pulskamp.
+
+[19]: Berger, J. and J. Mortera (1999). [Default Bayes factors for nonnested hypothesis testing](https://www.jstor.org/stable/2670175). Journal of the American Statistical Association 94(446), 542–554.
+
+[20]: Berger, J. (2006). [The case for objective Bayesian analysis](https://www2.stat.duke.edu/~berger/papers/obayes-debate.pdf). Bayesian Analysis 1(3), 385–402.
+
+[21]: Berger, J. O. and D. A. Berry (1988). [Statistical analysis and the illusion of objectivity](https://si.biostat.washington.edu/sites/default/files/modules/BergerBerry.pdf). American Scientist 76(2), 159–165.
+
+[22]: Jaynes, E. T. (1968). [Prior probabilities](https://bayes.wustl.edu/etj/articles/prior.pdf). IEEE Transactions on Systems Science and Cybernetics 4(3), 227–241.
+
+[23]: Berger, J. (1985). [Statistical Decision Theory and Bayesian Analysis](https://www.amazon.com/Statistical-Decision-Bayesian-Analysis-Statistics/dp/0387960988). Springer.