Bayesian Models with SQL
Because conjugate-models
works with general numerical inputs, we can use Bayesian models in SQL
with the SQL builder, PyPika
.
For the example, we will estimate use normal model to estimate the total sales amount by group.
The example table is called events
and we will assume a normal model for the
column sales
for each value of the column group
.
We can create the sufficient statistics needed for normal_normal_inverse_gamma
directly with the SQL builder.
from pypika import Query, Table, functions as fn
event_table = Table("events")
sales = event_table.sales
sales_squared = sales**2
# Sufficient statistics
x_total = fn.Sum(sales)
x2_total = fn.Sum(sales_squared)
n = fn.Count("*")
# Start a query for a groupby
query = (
Query.from_(event_table)
.groupby(event_table.group)
.select(
event_table.group,
)
)
Perform the Bayesian inference as usual, but using the variables reflecting the columns.
from conjugate.distributions import NormalInverseGamma
from conjugate.models import (
normal,
normal_predictive,
)
# Bayesian Inference
prior = NormalInverseGamma(mu=0, nu=1 / 10, alpha=1 / 10, beta=1)
posterior = normal(
x_total=x_total,
x2_total=x2_total,
n=n,
prior=prior,
)
posterior_predictive = normal_predictive(distribution=posterior)
Then add the columns we want from the inference
# Add the posterior predictive estimate
query = query.select(
posterior_predictive.mu.as_("mu"),
posterior_predictive.sigma.as_("sigma"),
posterior_predictive.nu.as_("nu"),
)
Which results in this query:
SELECT "group",
(0.0+COUNT(*)*SUM("sales")/COUNT(*))/(0.1+COUNT(*)) "mu",
POW((1+0.5*(0.0+SUM(POW("sales", 2))-POW((0.0+COUNT(*)*SUM("sales")/COUNT(*))/(0.1+COUNT(*)), 2)*(0.1+COUNT(*))))*(0.1+COUNT(*)+1)/((0.1+COUNT(*))*(0.1+COUNT(*)/2)), 0.5) "sigma",
2*(0.1+COUNT(*)/2) "nu"
FROM "events"
GROUP BY "group"