Determining Limit Size for API Requests
Over 10 hours, I collected the number of new issues that came in. The count for each hour is as follows:
For the next time period, I want to estimate the limit size that I should use to get the number of new issues that will come in.
Note
I am not interested in the mean of the data, but rather in a quantile of the data in order to determine the upper bound for what might come in.
I will use a Poisson-Gamma model to determine the issue distribution.
The 90th percentile of the predictive distribution will be used to determine the limit.
Modeling the Data
The Gamma distribution is used as the prior, and the poisson_gamma function is used to calculate the posterior distribution (just another Gamma distribution).
from conjugate.distributions import Gamma
from conjugate.models import poisson_gamma, poisson_gamma_predictive
prior = Gamma.from_occurrences_in_intervals(occurrences=3, intervals=1)
posterior: Gamma = poisson_gamma(n=len(data), x_total=sum(data), prior=prior)
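As a rough illustration of what this update does, the sketch below applies the standard Gamma-Poisson conjugate rule by hand, without the library. The hourly_counts values are hypothetical stand-ins, since the collected data is not reproduced here, and the prior parameters alpha=3, beta=1 assume that from_occurrences_in_intervals maps occurrences to the shape and intervals to the rate.

```python
# Hypothetical hourly counts; NOT the data collected for this example.
hourly_counts = [2, 4, 3, 1, 5, 2, 3, 4, 2, 3]

# Assumed prior: Gamma(alpha=3, beta=1), i.e. 3 occurrences in 1 interval.
prior_alpha, prior_beta = 3.0, 1.0


def update_gamma(alpha: float, beta: float, counts: list[int]) -> tuple[float, float]:
    """Conjugate update for a Poisson likelihood with a Gamma prior on the rate.

    The shape grows by the total count and the rate grows by the
    number of observed intervals.
    """
    return alpha + sum(counts), beta + len(counts)


posterior_alpha, posterior_beta = update_gamma(prior_alpha, prior_beta, hourly_counts)

# Posterior mean of the hourly issue rate.
posterior_mean_rate = posterior_alpha / posterior_beta
print(posterior_alpha, posterior_beta, round(posterior_mean_rate, 3))
```

With more observed hours, the prior's contribution (3 occurrences in 1 interval) is increasingly outweighed by the data.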
Determining the Limit Size from the Predictive Distribution
I have a scenario where the limit needs to be determined for the next 35 minutes. Although the intervals were measured in hours, the predictive distribution can still be used to determine the number of issues that will come in over the next 35 minutes by scaling n down to the corresponding fraction of an hour.
minutes = 35
window_size = minutes / 60
posterior_predictive = poisson_gamma_predictive(
    distribution=posterior,
    n=window_size,
)
With this predictive distribution, any statistic can be calculated. In our case, we are interested in the 90th percentile, so we can use the dist attribute to access the underlying SciPy distribution, which has the ppf method for quantiles.
The limit size for the next 35 minutes is 3 issues.
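To make the quantile step concrete, here is a minimal library-free sketch. It assumes the Poisson-Gamma predictive is a negative binomial with shape alpha and success probability beta / (beta + n), which is the standard result for a Gamma(alpha, beta) posterior on the rate. The function names and the posterior parameters are hypothetical, chosen for illustration rather than taken from the fitted model above, so the printed value need not match the limit reported here.

```python
import math


def predictive_pmf(k: int, alpha: float, beta: float, n: float) -> float:
    """P(k events in n intervals) when the rate has a Gamma(alpha, beta) posterior."""
    p = beta / (beta + n)  # negative binomial "success" probability
    log_pmf = (
        math.lgamma(k + alpha) - math.lgamma(k + 1) - math.lgamma(alpha)
        + alpha * math.log(p)
        + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def predictive_quantile(q: float, alpha: float, beta: float, n: float) -> int:
    """Smallest count k whose cumulative predictive probability reaches q."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    k, cumulative = 0, predictive_pmf(0, alpha, beta, n)
    while cumulative < q:
        k += 1
        cumulative += predictive_pmf(k, alpha, beta, n)
    return k


# Hypothetical posterior parameters, not the fitted values from this example.
alpha, beta = 32.0, 11.0
window_size = 35 / 60  # 35 minutes as a fraction of the 1-hour interval

limit = predictive_quantile(0.9, alpha, beta, window_size)
print(limit)
```

Because n enters only through beta / (beta + n), a fractional window such as 35/60 needs no special handling.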
Visualizing the Predictive Distributions
We will compare the prior and posterior predictive distributions to see where the limit size falls.
Tip
Use the poisson_gamma_predictive function with the prior distribution for the prior predictive distribution.
We can plot the prior and posterior predictive distributions using either the plot_pmf or plot_cdf methods, and mark the picked limit on the same axes.
import matplotlib.pyplot as plt


def plot_picked_limit(limit: float, percentile: float, ax: plt.Axes) -> plt.Axes:
    """Mark the picked limit on a CDF plot with an "x" and dashed guide lines."""
    alpha = 0.25
    kwargs = {
        "color": "black",
    }
    x_kwargs = {
        "marker": "x",
        "markersize": 10,
        "label": "picked limit",
    }
    line_kwargs = {
        "alpha": alpha,
        "linestyle": "--",
    }
    # Marker at (limit, percentile)
    ax.plot([limit], [percentile], **x_kwargs, **kwargs)
    # Horizontal guide from the y-axis to the marker
    ax.plot([0, limit], [percentile, percentile], **line_kwargs, **kwargs)
    # Vertical guide from the x-axis to the marker
    ax.plot([limit, limit], [0, percentile], **line_kwargs, **kwargs)
    return ax
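
# 90th percentile of the posterior predictive, as described above
percentile = 0.9
limit = posterior_predictive.dist.ppf(percentile)

# Prior predictive over the same window (see the tip)
prior_predictive = poisson_gamma_predictive(distribution=prior, n=window_size)

max_value = limit * 2.0
ax = plt.gca()
prior_predictive.set_max_value(max_value).plot_cdf(ax=ax, label="prior")
posterior_predictive.set_max_value(max_value).plot_cdf(ax=ax, label="posterior")
plot_picked_limit(limit, percentile, ax)
padding = 0.025
ax.set(
    ylim=(0 - padding, 1 + padding),
    xlabel="Number of Issues",
    title=f"The predictive {percentile:.0%} in next {minutes} minutes",
)
ax.legend(title="Predictive Distribution")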