Histogram Bins

Histogram Bins

View an interactive version of this snippet here.

Description

Creating histograms using SQL can be tricky. This template will let you quickly create the bins needed to create a histogram:

SELECT 
   COUNT(1) / (SELECT COUNT(1) FROM <schema.table> ) AS percentage_of_results
   FLOOR(<column>/ <bin-size>) * <bin-size> AS bin
FROM
   <schema.table>
GROUP BY 
   bin

where:

  • <column> is your column you want to turn into a histogram
  • <bin-size> is the width of each bin. You can adjust this to get the desired level of detail for the histogram.

Example

Adjust the bin-size until you can see the ‘shape’ of the data. The following example finds the percentage of fruits with 5-point review groupings. The reviews are on a 0-100 scale:

WITH data AS (
    SELECT CASE
        WHEN RANDOM() <= 0.25 THEN 'tennessee'
        WHEN RANDOM() <= 0.5 THEN 'colonel'
        WHEN RANDOM() <= 0.75 THEN 'pikesville'
      ELSE 'michters'
    END AS whiskey,
    CASE
        WHEN RANDOM() <= 0.20 THEN 1
        WHEN RANDOM() <= 0.40 THEN 2
        WHEN RANDOM() <= 0.60 THEN 3
        WHEN RANDOM() <= 0.80 THEN 4
      ELSE 5
    END AS star_rating,
    review_date
    FROM generate_series('2021-01-01'::TIMESTAMP,'2021-01-31'::TIMESTAMP,  '3 DAY') AS review_date
)

SELECT 
  COUNT(*) / (SELECT COUNT(1)::FLOAT FROM data) AS percentage_of_results,
  FLOOR(data.star_rating / 1) * 1 bin
FROM data
GROUP BY bin