Generating graphs from MySQL table data

Graphs and charts are a useful visual way to view historical data – they make it easier to detect trends and get a big-picture view of data. All we need is timestamped data – table rows that are stamped with a specific date/time format, that can be used to group rows into time periods.

Time stamped data

In order to aggregate table data by time periods / ranges, we need a date/time column in the table we want to analyze. Appropriate types for such a column include TIMESTAMP, DATETIME and DATE, but we can also use string / numeric types for grouping data together if they contain some sort of date/time information – though those will be much less flexible than native date/time types.

I usually opt to go with the TIMESTAMP format, for a couple of reasons –

  • TIMESTAMP data is stored internally as UTC timezone – this makes such data portable between servers without needing to adjust for timezones.
  • TIMESTAMP type have auto-initialization and updating features – using the CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP values. As of MySQL ver. 5.6.5 you can use those properties with DATETIME as well, but since most servers and linux distributions use lower (considered more stable) versions, it’s usually not an option.

The TIMESTAMP format is limited by its date range (1970-01-01 to 2038-01-19), so you might need to use DATETIME if you’re using dates far in the future or past.

Grouping time-stamped rows into time periods

Assuming we have time-stamped data in our table, we can now start aggregating it into specific time periods. The most commonly used time periods for grouping are:

  • Day
  • Week
  • Month
  • Year

Lets take an example table schema and see how we group the data for each period.

- (int) `id`
- (varchar) `name`
- (smallint) `role`
- (timestamp) `created`

A very simple users table. Our time-stamped column is obviously `created`, and we’ll be using it to group our data.

Daily data

Our first task is to extract daily new users data from the table. We’ll do this by extracting the date portion of the timestamp using the DATE() function.

    DATE(`users`.`created`) AS `date`,
    COUNT(`users`.`id`) AS `count`
FROM `users`
WHERE `users`.`created` BETWEEN '2012-01-01 00:00:00' AND '2012-01-31 23:59:59'
GROUP BY `date`
ORDER BY `date`

This returns data that looks like the following:

date count
2012-01-01 12
2012-01-02 12
2012-01-03 36
2012-01-04 34
2012-01-05 36
2012-01-06 29

Note that I limited the query to a time range of 1 month (January 1st to January 31st). I used this particular format for a couple of reasons:

  • It allows MySQL to use an index for filtering the result. Had I used the DATE() function in the WHERE clause ( DATE(`users`.`created`) BETWEEN ... ), an index could not have been used resulting in degraded performance depending on the number of rows in the table.
  • In order to include all the data in the edge dates, I specified the time as well – from the beginning of the first day to the end of the last day.

Weekly data

This time we want to group data by week, while still preserving the year in case our range spans more than one year in dates (for example December – January). There are a couple of ways to do this –

  • DATE_FORMAT(`users`.`created`,'%X-%V') – returns data in the format 2012-23 (the second part is the week number)
  • YEARWEEK(`users`.`created`) – returns data in the format 201223 (last two digits are the week number).

I recommend using YEARWEEK() as it is faster than DATE_FORMAT() which makes a difference if you have a large table. Both methods can accommodate weeks  starting on Sunday or Monday – check the MySQL documentation for the details.

Monthly data

This time around we do not have any native functions that extract both the month and year (though we could extract those separately and concatenate). We use formatting functions instead:

  • DATE_FORMAT(`users`.`created`,'%Y-%m') – returns data in the format 2012-02 (year-month)
  • EXTRACT(YEAR_MONTH FROM `users`.`created`) – returns data in the format 201202 (last 2 digits are the month)

Again, DATE_FORMAT() is the slower of the two, so I’d recommend using EXTRACT() for better performance.

Yearly data

For yearly data we do have a native function – YEAR() – as well as other formatting methods we used earlier. Our options are:

  • YEAR(`users`.`created`)
  • EXTRACT(YEAR FORM `users`.`created`)
  • DATE_FORMAT(`users`.`created`)

All three return the same format – a 4 digit year. YEAR() and EXTRACT() have roughly the same performance, while DATE_FORMAT() is the slowest by a significant margin.

Aggregate functions and advanced usage

So far we’ve dealt with the very simple case of counting rows from a single table. Real world use-cases often involve multiple tables and extracting other types of aggregated data. MySQL gives us many functions to extract aggregate data – including COUNT(), SUM(), AVG() and others.

There are a few common requirements and pitfalls that are not completely obvious in implementation that bear mentioning.

Counting distinct values

When we use the COUNT() function, we typically want to get the number of different rows for each period. This works well for single table queries, but when we start joining tables in our queries we might get surprising results. The reason for that is that one-to-many relationships cause rows to appear multiple times in the result, and thus be counted multiple times.

In that case, we want to use the DISTINCT operator before the column we are counting – COUNT(DISTINCT `users`.`created`).

This is also true if we want to get distinct values when counting a non-unique column. If we wanted to get the number of different roles in our example schema from the previous section, we would use COUNT(DISTINCT `users`.`role`). Using a COUNT() on `role` without the DISTINCT operator would have given the number of rows instead.

Counting occurrences of specific values

The COUNT() function gives us a simple solution to count either distinct values or the number of rows. Sometimes though, we would like to count the occurrences of specific values. Suppose the `role` column in the example users table from the previous section can contain the following values:

  • 1 – Regular user
  • 2 – Author
  • 3 – Moderator
  • 4 – Administrator

We want to know how many users have specific roles. MySQL does not provide a cookie cutter function for this scenario, so we need to combine some logic statements with the functions we do have. We’ll use the SUM() function on a CASE statement for each specific role:

    DATE(`users`.`created`) AS `date`,
    SUM(CASE WHEN `users`.`role` = 1 THEN 1 ELSE 0 END) AS `Regular users`,
    SUM(CASE WHEN `users`.`role` = 2 THEN 1 ELSE 0 END) AS `Authors`,
    SUM(CASE WHEN `users`.`role` = 3 THEN 1 ELSE 0 END) AS `Moderators`,
    SUM(CASE WHEN `users`.`role` = 4 THEN 1 ELSE 0 END) AS `Administrator`
FROM `users`
WHERE `users`.`created` BETWEEN '2012-01-01 00:00:00' AND '2012-01-31 23:59:59'
GROUP BY `date`
ORDER BY `date`

This will generate results that look the following:

date Regular users Authors Moderators Administrator
2012-01-01 102 12 2 1
2012-01-02 87 8 1 0
2012-01-03 104 4 3 1
2012-01-04 150 19 5 2

Joining with accurate data aggregation

I’ve mentioned the DISTINCT operator as a method to COUNT() distinct values when joining multiple tables. This does not work for other aggregate functions, such as SUM() and AVG(), since the same value may occur more than once – however, we still would like to avoid the same row appearing more than once and ruining the accuracy of our aggregate data.

For this purpose we can use derived subqueries, which are subqueries appearing in the FROM clause of the query. Those are not the same as  correlated subqueries, which appear in the WHERE clause, and are often a source for low query performance. I’ve previously shown this technique in detail in a previous article on fetching specific rows from a group, even though the purpose was a bit different.

Formatting and displaying data in graphs and charts

So far we’ve discussed fetching the data for usage in our graphs, but that’s only the beginning. Now we need to format and process this data using a server-side language (I use PHP), and prepare it for outputting using a graphing / charting solution in Javascript or Flash. My favorite charting solution is a neat Javascript library called flot, which is very capable and customizable.

I will go over those parts in detail in future articles, and in the meantime you can check a component I developed that combines all three aspects into a Google Analytics style charting package for those who rather skip the grunt work and just generate graphs right away.

If you have any questions about this article, I’d love to hear about it in the comments!

To know when the next article is published, please subscribe to new articles using your Email below or follow me on Twitter.

Subscribe to Blog via Email

Enter your email address to receive notification about new posts.

  • frustrated

    Not accurate. The posted sql code does not retrieve the data you claim it does. All it does is retrieve one date and one count as it stands. So were you assuming to loop through it?

  • Eran Galperin

    There is no mistake – the SQL does exactly what I wrote, and retrieves an entire month worth of daily data. Naturally, it is a rowset, and not one row, and if you’re using a server-side language to parse it you will need to iterate over the set of rows.

  • Christoph Tilley

    I have been trying to get my head around this for a time now and can’t find a solution: I am querying for time-entries with a result like this:

    2015-02-10: 13
    2015-02-11: 16
    2015-02-13: 11

    As you can see I am missing two days in the array because there are no entries for these days. My google-fu brought me some solutions for this problem but none seem to work for my specific code:

    DATE(time_entries.start) AS date,
    COUNT( AS entries,
    SUM(CASE WHEN user_id = 4 THEN TIMESTAMPDIFF(MINUTE, start, end) ELSE 0 END) AS me,
    SUM(CASE WHEN user_id = 3 THEN TIMESTAMPDIFF(MINUTE, start, end) ELSE 0 END) AS ph
    FROM time_entries
    LEFT JOIN calendar_table
    ON time_entries.start=calendar_table.dt
    GROUP BY date
    ORDER BY date

    I created the calendar_table with this help:

    Please help! Best, Chris

  • Eran Galperin

    Hi Chris,

    I would usually take care of empty dates in a programming language that parses the results. Doing it in SQL would be a bit convoluted.

    If you feel like you must, what you need to do is select from the calendar_table and then join it to the actual data table (time_entries in your case). The SQL you have right now will not find any rows for the empty date rows because it selects from the time_entries, where those dates don’t exist. You might need to make additional modifications to the query to make it work.

    Hope that makes sense!

  • Christoph Tilley


    thanks for your reply. Unfortunately I think I need to go the SQL-way because I want to graph the data with chartjs and thus I need to plot the dates parallel to the data.

    I’ll look deeper into SQL or ask for help somewhere else.


  • Ripulryu

    Thanks for this post. How to extend this to hours ?

  • Abdullah

    I wan’t to get the total records of every day for the whole month. Like you see below

    SELECT COUNT( AS total_sms, DATE(slsr.date_time) AS sms_date
    FROM send_msg slsr
    WHERE slsr.user_id = 1 AND slsr.`status` =1 AND DATE(slsr.date_time) BETWEEN ‘2016-08-01’ AND ‘2016-12-31’
    GROUP BY DATE(slsr.date_time);

    This above query is returning result like this

    | total_sms | sms_date |
    | 24 | 2016-11-23 |
    | 4 | 2016-12-07 |

    This is what i’m getting but i wan’t to get the results day by day.

    | total_sms | sms_date |
    | 0 | 2016-11-01 |
    | 0 | 2016-11-02 |
    | 0 | 2016-11-03 |
    | 0 | 2016-11-04 |
    | 0 | 2016-11-05 |
    | 0 | 2016-11-06 |

    Problem is that i’m only getting the values that exist on the date.. If value not found on that date then i’m not getting the result with the 0 ,,
    Actually I wan’t to show the values in a graph. Please help me and revise my query so that i can move forward.

  • Eran Galperin

    Hi Abdullah,

    Yes, the query does not return non-existing data. I typically use a scripting language as mentioned at the end of the article to format the data and add empty data points for charting libraries. If you want to do it in a query anyway, you can use a trick by creating a table of dates and join against it.

    Christoph Tilly posted a similar solution below, you should take a look at it.