MySQL — Group Ranges Into Buckets

We’re working on a running project and want to calculate statistics. For example, how many 10k runs happened during the competition? Or how many half-marathons and marathons did people run?

Runners in the event use Strava to track their activities. We’re using Strava webhooks to process and store the activities. We’re also calculating statistics and tables of results. When creating statistics for the number of 10k, half-marathons, or marathons, we feel it would be unfair to look for the exact distance value. Allowing a leeway of 500 meters for each category is more fair. That means we’re grouping a range of permitted distances into a bucket.

This tutorial shows you how to group data ranges into buckets in MySQL.

MySQL Series Overview

Group Data Ranges Into Buckets Using the MySQL CASE Operator

MySQL has built-in control flow operators, like CASE, IF, IFNULL, and others. MySQL’s CASEoperator allows you to combine other SQL operators to detect whether a given value is a match or not.

We want to detect if an activity’s distance matches one of our selected buckets. For example, we want to count an activity into the 10k bucket if the distance is between 10.000 and 10.500 meters. Our database stores the distance in meters for granularity reasons.

Here’s a sample query that puts activities into buckets and counts the number of items in that bucket:

SELECT  
    COUNT(CASE WHEN distance BETWEEN 10000 AND 10500 THEN 1 END) `10k`,
    COUNT(CASE WHEN distance BETWEEN 21097.5 AND 21597.5 THEN 1 END) `halfmarathon`,
    COUNT(CASE WHEN distance BETWEEN 42195 AND 42695 THEN 1 END) `marathon`
FROM activities  

The query above puts items into a bucket. You can also merge column values into a single bucket. For example, Strava allows you to track a Run and VirtualRun. A virtual run can be a run on a treadmill in your gym. Let’s say we want to group both types, run and virtual run, into the Run bucket.

Here’s a sample SQL select statement adding a new column activity_type_bucket containing the value for the bucket we’re putting each activity in:

SELECT  
    *,
    CASE when activity_type IN ("Run", "VirtualRun")
        THEN "Run"
        ELSE activity.type END as `activity_type_bucket`
FROM activities  

That’s it!


Mentioned Resources

Explore the Library

Find interesting tutorials and solutions for your problems.