bbai


Deterministic, exact algorithms for objective Bayesian inference and hyperparameter optimization.

Installation

bbai supports both Linux and macOS on x86-64.

pip install bbai

Usage

Fully Bayesian Single-variable Logistic Regression with Reference Prior

https://www.objectivebayesian.com/p/election-2024

Build a fully Bayesian logistic regression model with a single unknown weight using the Jeffreys prior (equivalently, the reference prior; the two coincide for a single-parameter model).

from bbai.glm import BayesianLogisticRegression1

x = [-5, 2, 8, 1]
y = [0, 1, 0, 1]
model = BayesianLogisticRegression1()

# Fit a posterior distribution for w with the logistic
# regression reference prior
model.fit(x, y)

# Print the posterior probability that w < 0.123
print(model.cdf(0.123))
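
For intuition, the Jeffreys prior for a single logistic-regression weight is proportional to the square root of the Fisher information. The snippet below is a minimal plain-numpy sketch (an illustration only, not part of bbai's API) of that unnormalized prior density for the data above.

import numpy as np

def jeffreys_prior_unnormalized(w, x):
    x = np.asarray(x, dtype=float)
    # logistic probabilities p_i = 1 / (1 + exp(-w * x_i))
    p = 1.0 / (1.0 + np.exp(-w * x))
    # Fisher information for a single weight: I(w) = sum_i x_i^2 * p_i * (1 - p_i);
    # the Jeffreys prior is proportional to sqrt(I(w))
    return np.sqrt(np.sum(x**2 * p * (1 - p)))

print(jeffreys_prior_unnormalized(0.0, [-5, 2, 8, 1]))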

Hypothesis testing using Expected Encompassing Intrinsic Bayes Factors (EEIBF)

https://www.objectivebayesian.com/p/hypothesis-testing

The EEIBF method is described in the paper Default Bayes Factors for Nonnested Hypothesis Testing by James Berger and Julia Mortera.

The Python code below shows how to test these three hypotheses about the mean of normally distributed data with unknown variance:

H_equal: mean = 0
H_left: mean < 0
H_right: mean > 0

from bbai.stat import NormalMeanHypothesis
import numpy as np

np.random.seed(0)
data = np.random.normal(0.123, 1.5, size=9)
probs = NormalMeanHypothesis().test(data)
print(probs.equal) # posterior probability for H_equal 0.235
print(probs.left) # posterior probability for H_left 0.0512
print(probs.right) # posterior probability for H_right 0.713
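
Because the three hypotheses partition the possibilities for the mean, their posterior probabilities should sum to one (up to rounding), which gives a quick sanity check on the snippet above:

print(probs.equal + probs.left + probs.right) # approximately 1.0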

See example/19-hypothesis-first-t.ipynb for an example and example/18-hypothesis-eeibf-validation.ipynb for a step-by-step validation of the method against the paper.

Objective Bayesian inference for comparing binomial proportions

https://www.objectivebayesian.com/p/binomial-comparison

Fit a posterior distribution with a reference prior to compare binomial proportions:

from bbai.model import DeltaBinomialModel

# Some example data
a1, b1, a2, b2 = 5, 3, 2, 7

# Fit a posterior distribution with likelihood function
#     L(theta, x) = (theta + x)^a1 * (1 - theta - x)^b1 * x^a2 * (1 - x)^b2
# where theta represents the difference of the two binomial distribution probabilities
model = DeltaBinomialModel(prior='reference')
model.fit(a1, b1, a2, b2)

# Print the probability that theta < 0.123
print(model.cdf(0.123))
    # Prints 0.10907436812863071
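
For reference, the likelihood in the comment above can be written out directly. The snippet below (plain Python, not part of bbai, continuing the example's variables) makes the parameterization explicit: p1 = theta + x and p2 = x are the two binomial success probabilities, so theta is their difference.

def likelihood(theta, x, a1, b1, a2, b2):
    # p1 = theta + x and p2 = x are the success probabilities of the two binomials
    p1, p2 = theta + x, x
    return p1**a1 * (1 - p1)**b1 * p2**a2 * (1 - p2)**b2

# evaluate at an example point (theta, x) = (0.1, 0.4)
print(likelihood(0.1, 0.4, a1, b1, a2, b2))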

Adaptive Sparse Grid Interpolation

Efficiently approximate multivariable functions using adaptive sparse grids at Chebyshev nodes.

from bbai.numeric import SparseGridInterpolator
import numpy as np

# A test function
def f(x, y, z):
    t1 = 0.68 * np.abs(x - 0.3)
    t2 = 1.25 * np.abs(y - 0.15)
    t3 = 1.86 * np.abs(z - 0.09)
    return np.exp(-t1 - t2 - t3)

# Fit a sparse grid to approximate f
ranges = [(-2, 5), (1, 3), (-2, 2)]
interp = SparseGridInterpolator(tolerance=1.0e-4, ranges=ranges)
interp.fit(f)
print('num_pts =', interp.points.shape[1])
    # prints 10851

# Test the accuracy at a random point of the domain
print(interp.evaluate(1.84, 2.43, 0.41), f(1.84, 2.43, 0.41))
#    prints 0.011190847391188667 0.011193746554063376

# Integrate the approximation over the range
print(interp.integral)
#    prints 0.6847335267327939
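
As background, a common choice of Chebyshev nodes is the set of Chebyshev-Lobatto points, the extrema of a Chebyshev polynomial mapped onto each axis's range. The sketch below (plain numpy, an illustration rather than bbai's internals) computes such a one-dimensional node set for the first range used above.

import numpy as np

def chebyshev_lobatto_nodes(n, lo, hi):
    # the n + 1 extrema of the degree-n Chebyshev polynomial on [-1, 1] ...
    t = np.cos(np.pi * np.arange(n + 1) / n)
    # ... mapped affinely onto [lo, hi]
    return lo + 0.5 * (hi - lo) * (t + 1)

print(chebyshev_lobatto_nodes(4, -2, 5))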

Objective Bayesian Inference for Gaussian Process Models

Construct prediction distributions for Gaussian process models using full integration over the parameter space with a noninformative reference prior.

import numpy as np
from bbai.gp import BayesianGaussianProcessRegression, RbfCovarianceFunction

# Make an example data set
def make_location_matrix(N):
    res = np.zeros((N, 1))
    step = 1.0 / (N - 1)
    for i in range(N):
        res[i, 0] = i * step
    return res
def make_covariance_matrix(S, sigma2, theta, eta):
    N = len(S)
    res = np.zeros((N, N))
    for i in range(N):
        si = S[i]
        for j in range(N):
            sj = S[j]
            d = np.linalg.norm(si - sj)
            res[i, j] = np.exp(-0.5*(d/theta)**2)
        res[i, i] += eta
    return sigma2 * res
def make_target_vector(K):
    return np.random.multivariate_normal(np.zeros(K.shape[0]), K)
np.random.seed(0)
N = 20
sigma2 = 25
theta = 0.01
eta = 0.1
params = (sigma2, theta, eta)
S = make_location_matrix(N)
K = make_covariance_matrix(S, sigma2, theta, eta)
y = make_target_vector(K)

# Fit a Gaussian process model to the data
model = BayesianGaussianProcessRegression(kernel=RbfCovarianceFunction())
model.fit(S, y)

# Construct the prediction distribution for x=0.1
preds, pred_pdfs = model.predict([[0.1]], with_pdf=True)
high, low = pred_pdfs.ppf(0.75), pred_pdfs.ppf(0.25)

# Print the mean and the 25%-75% credible interval of the prediction distribution
print(preds[0], '(%f to %f)' % (low, high))
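
The same prediction pdf object can be queried at other quantiles. For example, continuing the snippet above, a central 95% credible interval:

low95, high95 = pred_pdfs.ppf(0.025), pred_pdfs.ppf(0.975)
print('95%% credible interval: (%f to %f)' % (low95, high95))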

Ridge Regression

Fit a ridge regression model with the regularization parameter set exactly to minimize the mean squared error of a leave-one-out cross-validation on the training data set.

# load example data set
from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
X, y = load_boston(return_X_y=True)
X = StandardScaler().fit_transform(X)

# fit model
from bbai.glm import RidgeRegression
model = RidgeRegression()
model.fit(X, y)
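
One way to see why an exact optimum is computable: for ridge regression, the leave-one-out residuals have a closed form, so the leave-one-out mean squared error can be evaluated without refitting the model. The sketch below (plain numpy, ignoring the intercept for simplicity, and not bbai's implementation) computes that objective for a given regularization strength lam.

import numpy as np

def ridge_loocv_mse(X, y, lam):
    # hat matrix H = X (X'X + lam I)^{-1} X'
    p = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    # closed-form leave-one-out residuals: e_i = (y_i - yhat_i) / (1 - h_ii)
    residuals = (y - H @ y) / (1 - np.diag(H))
    return np.mean(residuals**2)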

Logistic Regression

Fit a logistic regression model with the regularization parameter set exactly to maximize the likelihood of an approximate leave-one-out cross-validation on the training data set.

# load example data set
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# fit model
from bbai.glm import LogisticRegression
model = LogisticRegression()
model.fit(X, y)
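
A standard construction of approximate leave-one-out cross-validation for a penalized GLM takes a single Newton step from the full-data solution toward each leave-one-out solution, which reduces to a leverage correction of the fitted linear predictors. The sketch below (plain numpy, no intercept, and not necessarily bbai's exact formulation) evaluates that approximate leave-one-out log-likelihood for fitted coefficients beta and an L2 penalty lam.

import numpy as np

def alo_log_likelihood(X, y, beta, lam):
    u = X @ beta                      # fitted linear predictors
    p = 1.0 / (1.0 + np.exp(-u))      # fitted probabilities
    d = p * (1 - p)                   # per-observation curvature
    # leverages h_i = x_i' (X' D X + lam I)^{-1} x_i
    J = X.T @ (d[:, None] * X) + lam * np.eye(X.shape[1])
    h = np.einsum('ij,ij->i', X @ np.linalg.inv(J), X)
    # one-Newton-step approximation of each leave-one-out linear predictor
    u_loo = u + (p - y) * h / (1 - d * h)
    p_loo = 1.0 / (1.0 + np.exp(-u_loo))
    return np.sum(y * np.log(p_loo) + (1 - y) * np.log(1 - p_loo))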

Bayesian Ridge Regression

Fit a Bayesian ridge regression model where the hyperparameter controlling the regularization strength is integrated over.

# load example data set
from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
X, y = load_boston(return_X_y=True)
X = StandardScaler().fit_transform(X)

# fit model
from bbai.glm import BayesianRidgeRegression
model = BayesianRidgeRegression()
model.fit(X, y)
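
To make "integrated over" concrete: for a Gaussian prior w ~ N(0, I/alpha) and noise precision beta, the marginal likelihood (evidence) of the hyperparameters has a closed form, and a fully Bayesian treatment weights candidate hyperparameter values by it rather than committing to a single optimum. The sketch below (plain numpy, the standard Bayesian linear regression evidence formula, not bbai's exact algorithm) computes that log evidence.

import numpy as np

def log_evidence(X, y, alpha, beta):
    n, p = X.shape
    # posterior precision and mean of the weights
    A = alpha * np.eye(p) + beta * X.T @ X
    m = beta * np.linalg.solve(A, X.T @ y)
    # regularized sum of squared errors at the posterior mean
    e = 0.5 * beta * np.sum((y - X @ m)**2) + 0.5 * alpha * m @ m
    return (0.5 * p * np.log(alpha) + 0.5 * n * np.log(beta) - e
            - 0.5 * np.linalg.slogdet(A)[1] - 0.5 * n * np.log(2 * np.pi))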

Logistic Regression MAP with Jeffreys Prior

Fit a logistic regression MAP model with Jeffreys prior.

# load example data set
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# fit model
from bbai.glm import LogisticRegressionMAP
model = LogisticRegressionMAP()
model.fit(X, y)
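
For reference, the MAP objective with the Jeffreys prior adds half the log-determinant of the Fisher information to the usual log-likelihood (the same penalty used in Firth's bias-reduced logistic regression). The sketch below (plain numpy, no intercept, not bbai's solver) evaluates that objective at a candidate weight vector w.

import numpy as np

def jeffreys_map_objective(w, X, y):
    p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
    log_likelihood = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    # Jeffreys prior: proportional to sqrt(det(X' W X)) with W = diag(p_i * (1 - p_i))
    W = p * (1 - p)
    log_prior = 0.5 * np.linalg.slogdet(X.T @ (W[:, None] * X))[1]
    return log_likelihood + log_prior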

How it works

Examples

  • 01-digits: Fit a multinomial logistic regression model to predict digits.
  • 02-iris: Fit a multinomial logistic regression model to the Iris data set.
  • 03-bayesian: Fit a Bayesian ridge regression model with hyperparameter integration.
  • 04-curve-fitting: Fit a Bayesian ridge regression model with hyperparameter integration.
  • 05-jeffreys1: Fit a logistic regression MAP model with Jeffreys prior and a single regressor.
  • 06-jeffreys2: Fit a logistic regression MAP model with Jeffreys prior and two regressors.
  • 07-jeffreys-breast-cancer: Fit a logistic regression MAP model with Jeffreys prior to the breast cancer data set.
  • 08-soil-cn: Fit a Bayesian Gaussian process with a non-informative prior to a data set of soil carbon-to-nitrogen samples.
  • 11-meuse-zinc: Fit a Bayesian Gaussian process with a non-informative prior to a data set of zinc concentrations taken in a flood plain of the Meuse river.
  • 13-sparse-grid: Build adaptive sparse grids for interpolation and integration.

Documentation