Skip to content

Bayesian Models with SQL

Because conjugate-models works with general numerical inputs, we can use Bayesian models in SQL with the SQL builder, PyPika.

For the example, we will estimate use normal model to estimate the total sales amount by group.

The example table is called events and we will assume a normal model for the column sales for each value of the column group.

We can create the sufficient statistics needed for normal_normal_inverse_gamma directly with the SQL builder.

from pypika import Query, Table, functions as fn

event_table = Table("events")

sales = event_table.sales
sales_squared = sales**2

# Sufficient statistics
x_total = fn.Sum(sales)
x2_total = fn.Sum(sales_squared)
n = fn.Count("*")

# Start a query for a groupby
query = (
    Query.from_(event_table)
    .groupby(event_table.group)
    .select(
        event_table.group,
    )
)

Perform the Bayesian inference as usual, but using the variables reflecting the columns.

from conjugate.distributions import NormalInverseGamma
from conjugate.models import (
    normal,
    normal_predictive,
)

# Bayesian Inference
prior = NormalInverseGamma(mu=0, nu=1 / 10, alpha=1 / 10, beta=1)
posterior = normal(
    x_total=x_total,
    x2_total=x2_total,
    n=n,
    prior=prior,
)
posterior_predictive = normal_predictive(distribution=posterior)

Then add the columns we want from the inference

# Add the posterior predictive estimate
query = query.select(
    posterior_predictive.mu.as_("mu"),
    posterior_predictive.sigma.as_("sigma"),
    posterior_predictive.nu.as_("nu"),
)

Which results in this query:

SELECT "group",
       (0.0+COUNT(*)*SUM("sales")/COUNT(*))/(0.1+COUNT(*)) "mu",
       POW((1+0.5*(0.0+SUM(POW("sales", 2))-POW((0.0+COUNT(*)*SUM("sales")/COUNT(*))/(0.1+COUNT(*)), 2)*(0.1+COUNT(*))))*(0.1+COUNT(*)+1)/((0.1+COUNT(*))*(0.1+COUNT(*)/2)), 0.5) "sigma",
       2*(0.1+COUNT(*)/2) "nu"
FROM "events"
GROUP BY "group"