Explore this snippet with some demo data here.
Taking a random sample of a table is often useful when the table is very large, and you are still working in an exploratory or investigative mode. A query against a small fraction of a table runs much more quickly, and consumes fewer resources.
Here are some example queries to select a random or pseudo-random subset of rows.
The RANDOM function returns a value in [0, 1) (i.e. including zero but not 1). You can use it to sample tables, e.g.:
select * from PUBLIC.SPOTIFY_DAILY_TRACKS QUALIFY PERCENT_RANK() OVER (ORDER BY RANDOM()) <= 0.01
select * from PUBLIC.SPOTIFY_DAILY_TRACKS order by random() limit 10
A hash is a deterministic mapping from one value to another, and so is not random, but can appear random ‘enough’ to produce a convincing sample. Use a supported hash function in Snowflake to produce a pseudo-random ordering of your table:
select * from PUBLIC.SPOTIFY_DAILY_TRACKS order by md5(SPOTIFY_DAILY_TRACKS.TRACK_ID) limit 10
select * from PUBLIC.SPOTIFY_DAILY_TRACKS order by sha1(SPOTIFY_DAILY_TRACKS.TRACK_ID) limit 10
The TABLESAMPLE has a major benefit - it doesn’t require a full table scan, and therefore can be much cheaper and quicker than the methods above. The downside is that the percentage value must be a literal, so this query is less flexible than the ones above.
select * from PUBLIC.SPOTIFY_DAILY_TRACKS tablesample (1) -- 1% of the data