Explore this snippet with some demo data here.
Cumulative distribution functions (CDF) are a method for analysing the distribution of a quantity, similar to histograms. They show, for each value of a quantity, what fraction of rows are smaller or greater.
One method for calculating a CDF is as follows:
select -- Use a row_number window function to get the position of this row (row_number() over (order by <quantity> asc)) / (select count(*) from <table>) cdf, <quantity>, from <table>
quantity- the column containing the metric of interest
table- the table name
Using total Spotify streams as an example data source, let’s identify:
table- this is called
quantity- this is the column
then the query becomes:
select (row_number() over (order by streams asc)) / (select count(*) from raw) frac, streams, from raw