Explore this snippet here
Once it is possible to generate a uniform distribution of numbers, it is possible to generate any other distribution. This snippet shows how to generate a sample of numbers that are binomially distributed using rejection sampling.
Note - rejection sampling is simple to implement, but potentially very inefficient
The steps here are:
- Generate a random list of
k= ‘number of successes’ from n trials, with each trial having an expected
p= ‘probability of success’.
- For each
k, calculate the binomial probability of
ksuccesses using the binomial formula, and another random number
ubetween zero and one.
uis smaller than the expected probability, accept the value of
k, otherwise reject it
with params as ( select 0.3 as p, -- Probability of success per trial 20 as n, -- Number of trials 100000 as num_samples -- Number of samples to test (the actual samples will be fewer) ), samples as ( select -- Generate a random integer k = 'number of successes' in [0, n] mod(abs(farm_fingerprint(cast(indices as string))), n + 1) as k, -- Generate a random number u uniformly distributed in [0, 1] mod(abs(farm_fingerprint(cast(indices as string))), 1000000) / 1000000 as u from params, unnest(generate_array(1, num_samples)) as indices ), probs as ( select -- Hack in a factorial function for n! / (k! * (n - k)!) (select exp(sum(ln(i))) from unnest(if(n = 0, , generate_array(1, n))) as i) / (select exp(sum(ln(i))) from unnest(if(k = 0, , generate_array(1, k))) as i) / (select exp(sum(ln(i))) from unnest(if(n = k, , generate_array(1, n - k))) as i) * pow(p, k) * pow(1 - p, n - k) as threshold, k, u from params, samples ), after_rejection as ( -- If the random number for this value of k is too high, reject it -- Note - the binomial PDF is normalised to a max value of 1 to improve the sampling efficiency select k from probs where u <= threshold / (select max(threshold) from probs) ) select * from after_rejection