We’re working on a running project and want to calculate statistics. For example, how many 10k runs happened during the competition? Or how many half-marathons and marathons did people run?
Runners in the event use Strava to track their activities. We’re using Strava webhooks to process and store the activities. We’re also calculating statistics and tables of results. When creating statistics for the number of 10k, half-marathons, or marathons, we feel it would be unfair to look for the exact distance value. Allowing a leeway of 500 meters for each category is more fair. That means we’re grouping a range of permitted distances into a bucket.
This tutorial shows you how to group data ranges into buckets in MySQL.
MySQL Series Overview
- Truncate a Foreign Key Constrained Table
- Add Days to a Datetime
- Find Value in Column with Comma-Separated Values
- Ideas to Debug High-Load Situations
- Rename a Column (Coming soon)
- Order By Columns Whichever is Not Null (Coming soon)
- Add New Column After an Existing Column (Coming soon)
- Add New Column as the First Column in a Table (Coming soon)
- Group by Date or Time
- Group Ranges Into Buckets
Group Data Ranges Into Buckets Using the MySQL CASE
Operator
MySQL has built-in control flow operators, like CASE
, IF
, IFNULL
, and others. MySQL’s CASE
operator allows you to combine other SQL operators to detect whether a given value is a match or not.
We want to detect if an activity’s distance matches one of our selected buckets. For example, we want to count an activity into the 10k
bucket if the distance is between 10.000 and 10.500 meters. Our database stores the distance in meters for granularity reasons.
Here’s a sample query that puts activities into buckets and counts the number of items in that bucket:
SELECT
COUNT(CASE WHEN distance BETWEEN 10000 AND 10500 THEN 1 END) `10k`,
COUNT(CASE WHEN distance BETWEEN 21097.5 AND 21597.5 THEN 1 END) `halfmarathon`,
COUNT(CASE WHEN distance BETWEEN 42195 AND 42695 THEN 1 END) `marathon`
FROM activities
The query above puts items into a bucket. You can also merge column values into a single bucket. For example, Strava allows you to track a Run
and VirtualRun
. A virtual run can be a run on a treadmill in your gym. Let’s say we want to group both types, run and virtual run, into the Run
bucket.
Here’s a sample SQL select statement adding a new column activity_type_bucket
containing the value for the bucket we’re putting each activity in:
SELECT
*,
CASE when activity_type IN ("Run", "VirtualRun")
THEN "Run"
ELSE activity.type END as `activity_type_bucket`
FROM activities
That’s it!