Multiple row operations in MySQL / PHP

Multiple row operations are common in normalized application databases, as one database entity is often linked to multiple sub-entities (for example, a user and their tags). By row operations I’m referring to write queries, namely UPDATE and INSERT queries (DELETE is less interesting, so I’ll leave it out for now).

Too often I’ve seen such queries run in long loops, one at a time, which is very bad for performance (as I will show here) and sometimes equally bad for integrity (if the process is interrupted). So what are the alternatives?

Inserting multiple rows

Insertion of multiple rows comes about often in batch jobs, database migrations and handling table relationships. A naive approach, via PHP, would be to loop over the data to be inserted, inserting one row at a time. Suppose the data is already properly filtered and quoted:

// Assumes every value in $data is already filtered and quoted
foreach( $data as $row ) {
    $query = "INSERT INTO `test_table` (user_id,content)"
           . " VALUES ("
           . $row['user_id'] . ","
           . $row['content'] . ")";
    mysql_query($query);
}

Depending on the amount of rows to be inserted, this can be a costly process. A better approach would be to concatenate the values into one insert query and then execute it:

$values = array();
foreach( $data as $row ) {
    $values[] =  "(" . $row['user_id'] . "," . $row['content'] . ")";
}
if( !empty($values) ) {
    $query = "INSERT INTO `test_table` (user_id,content) VALUES " 
             . implode(',',$values);
    mysql_query($query);
}

What’s the difference? Let’s see some benchmarks:

[chart: execution time vs. number of rows inserted]

A single query completes much faster than looping through multiple queries. At 2560 rows inserted, the loop took ~36 seconds to complete, yet the single query took just 0.14 seconds.

Memory consumption, however, shows the reverse trend:

[chart: memory consumption vs. number of rows inserted]

Since the values are concatenated to create the single query, memory usage grows with the number of rows, as the script needs to hold an ever larger query string. At 2560 rows inserted, the single-query approach consumed ~800 KB of memory, while the loop consumed just ~60 KB.

Since memory is much cheaper than CPU cycles and database connections, I’d usually opt for the single query approach. It’s important though to be aware of the implications.
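When the dataset is large enough that query length becomes a concern (MySQL caps packet size via max_allowed_packet), the two approaches can be combined by inserting in fixed-size batches. A minimal sketch, reusing the test_table example above and assuming the values are already filtered and quoted; the batch size of 500 is an arbitrary illustration, to be tuned for your row width:

```php
// Sketch: trade a little speed for bounded memory / query size
// by inserting in fixed-size batches instead of all at once.
$batchSize = 500; // hypothetical; tune for row width and max_allowed_packet

foreach( array_chunk($data, $batchSize) as $batch ) {
    $values = array();
    foreach( $batch as $row ) {
        $values[] = "(" . $row['user_id'] . "," . $row['content'] . ")";
    }
    mysql_query(
        "INSERT INTO `test_table` (user_id,content) VALUES "
        . implode(',', $values)
    );
}
```

Each iteration issues one multi-row INSERT, so the number of queries drops from one per row to one per batch, while the query string stays at a bounded size.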

Inserting / Updating (multiple) values

Another use-case of multiple row operations is when we want to insert several rows that might exist already, and in case they exist we want to update existing values instead. Of course, we could check first which rows exist, update the ones that do and insert the ones that don’t – for a total of at least three separate queries.

Inserting and updating in the same operation is done using INSERT … ON DUPLICATE KEY UPDATE.

The KEY in this statement should be a unique key on the table you are inserting into (otherwise duplicates would be allowed and this operation would be moot). The syntax of this statement lets you pick which columns are updated when a matching row is found. For example, suppose I’m adding translations for content pages. I have the following table schema:

pages
 - id
 - content
 - ...

pages_translations
 - page_id
 - lang_id
 - content

The pages_translations table has a primary (unique) key on (page_id, lang_id). When adding / updating a translation, the query would look something like:

INSERT INTO `pages_translations` (page_id,lang_id,content) 
VALUES (5,2,'the brown fox ...') 
ON DUPLICATE KEY UPDATE content=VALUES(content)

This will create a translation if one does not exist and update the content field if it does exist. We can combine this statement with the previous approach for multiple insertion to insert / update multiple rows with one statement.
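For example, the combined form would look like the following (the extra translation values here are made up for illustration):

```sql
INSERT INTO `pages_translations` (page_id,lang_id,content)
VALUES (5,2,'the brown fox ...'),
       (5,3,'le renard brun ...'),
       (6,2,'another page ...')
ON DUPLICATE KEY UPDATE content=VALUES(content)
```

Each row is inserted if its (page_id, lang_id) pair is new, and has its content updated if the pair already exists — all in a single statement.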

Updating multiple rows can be done using this technique, or through a CASE .. WHEN condition for each row. This method is a bit more involved and usually results in bigger queries but it’s worth noting. Example:

UPDATE `pages` SET content=(CASE 
    WHEN id=1 THEN 'first page content ...'
    WHEN id=2 THEN 'second page content ...'
END)
WHERE id IN (1,2)

Note the WHERE clause: without it, every row whose id is not matched by a WHEN branch would have its content set to NULL. The WHEN branch has to be repeated for each row, which results in a somewhat verbose query.
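In practice such a query would be generated from an array of rows. A minimal sketch, assuming the ids are integers and the content values are already quoted (as in the earlier insert examples); the WHERE id IN (...) clause restricts the update to the listed rows, since unmatched rows would otherwise have their content set to NULL:

```php
// Sketch: build a CASE .. WHEN bulk update from an array of rows.
// Assumes $rows items have integer 'id' and pre-quoted 'content'.
$cases = array();
$ids   = array();
foreach( $rows as $row ) {
    $cases[] = "WHEN id=" . $row['id'] . " THEN " . $row['content'];
    $ids[]   = $row['id'];
}
if( !empty($cases) ) {
    $query = "UPDATE `pages` SET content=(CASE "
           . implode(' ', $cases)
           . " END) WHERE id IN (" . implode(',', $ids) . ")";
    mysql_query($query);
}
```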

Inserting / Updating with multiple tables

There are some cases where we would like to use data in one or several tables to update / insert into another table.

With insertion the syntax is relatively straightforward – replace the VALUES part of the statement with a SELECT statement.

INSERT INTO `my_table` (col1,col2,col3) 
SELECT col4,col5,col6 
FROM `another_table`

The ON DUPLICATE KEY condition can be used with this statement as well.
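Continuing the example above, the combined form would look like this (which columns to update on a duplicate is your choice; updating col2 and col3 here is just an illustration):

```sql
INSERT INTO `my_table` (col1,col2,col3)
SELECT col4,col5,col6
FROM `another_table`
ON DUPLICATE KEY UPDATE col2=VALUES(col2),
                        col3=VALUES(col3)
```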

Updating rows across tables is done with a join; MySQL accepts the implicit (comma-separated) join syntax here:

UPDATE `my_table`,`other_table` 
   SET `my_table`.col1 = `other_table`.col2 
WHERE `my_table`.parent_id = `other_table`.id
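MySQL also accepts the explicit JOIN syntax for multi-table updates, which keeps the join condition next to the joined table; this is the same update written that way:

```sql
UPDATE `my_table`
  JOIN `other_table` ON `my_table`.parent_id = `other_table`.id
   SET `my_table`.col1 = `other_table`.col2
```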

It’s important to try the SELECT part of such INSERT / UPDATE statements separately, to determine how many rows it will match and how fast it will complete. INSERT / UPDATE operations are relatively costly and can be a severe bottleneck for the database if not managed correctly.
