Multiple row operations in MySQL / PHP

MySQL PHP Web development May 14th, 2009

Multiple row operations are in common use in a normalized application databases as one database entity is often linked to multiple sub-entities (for example a user and his tags). By row operations I'm referring to write queries, namely UPDATE and INSERT queries (DELETE is less interesting so I'll leave it out for now).

Too often I've seen such queries ran in long loops one at a time, which is very bad for performance (as I will show here) and sometimes equally bad for integrity (if the process is interrupted). So what are the alternatives?

Inserting multiple rows

Insertion of multiple rows comes about often in batch jobs, database migrations and handling table relationships. A naive approach, via PHP, would be to loop over the data to be inserted, inserting one row at a time. Suppose the data is already properly filtered and quoted:

  1. foreach( $data as $row ) {
  2. $query = "INSERT INTO `test_table` (user_id,content)"
  3. . " VALUES ("
  4. . $row['user_id'] . ","
  5. . $row['content'] . ")";
  6. mysql_query($query);
  7. }

Depending on the amount of rows to be inserted, this can be a costly process. A better approach would be to concatenate the values into one insert query and then execute it:

  1. $values = array();
  2. foreach( $data as $row ) {
  3. $values[] = "(" . $row['user_id'] . "," . $row['content'] . ")";
  4. }
  5. if( !empty($values) ) {
  6. $query = "INSERT INTO `test_table` (user_id,content) VALUES "
  7. . implode(',',$values);
  8. mysql_query($query);
  9. }

What's the difference? lets see some benchmarks:
time
A single query completes much faster than looping through multiple queries. At 2560 rows inserted, it took the loop ~36 seconds to complete, yet the single query took just 0.14 seconds.

Memory consumption shows a reverse trend however:
memory
Since the values are concatenated to create the single query, it consumes more and more memory with more rows as it needs to hold a larger query string. At 2560 rows inserted, the single query approach consumed ~800kb in memory, while the loop consumed just ~60kb.

Since memory is much cheaper than CPU cycles and database connections, I'd usually opt for the single query approach. It's important though to be aware of the implications.

Inserting / Updating (multiple) values

Another use-case of multiple row operations is when we want to insert several rows that might exist already, and in case they exist we want to update existing values instead. Of course, we could check first which rows exist, update the ones that do and insert the ones that don't - for a total of at least three separate queries.

The update query is the major problem here - there is no way to update multiple rows with different values using one query (and it doesn't make much sense in most cases). However in this case it does make sense to do so, and fortunately MySQL provides a way to do just that, using INSERT ... ON DUPLICATE KEY UPDATE.

The KEY in this statement should be a unique key in the table you are inserting to (otherwise duplicates would be allowed and this operation would be moot). The syntax of this statement allows to pick which columns are updated in the case of matching rows. For example, suppose I'm adding translations for content pages. I have the following table schema:

 
pages
 - id
 - content
 - ...
 
pages_translations
 - page_id
 - lang_id
 - content
 

The pages_translations table has a primary (unique) key on ( page_id , lang_id ). When adding / updating a translation, the query would look something like:

INSERT INTO `pages_translations` (page_id,lang_id,content)
VALUES (5,2,'the brown fox ...')
ON DUPLICATE KEY UPDATE content=VALUES(content)
 

This will create a translation if one does not exist and update the content field if it does exist. We can combine this statement with the previous approach for multiple insertion to insert / update multiple rows with one statement.

Inserting / Updating with multiple tables

There are some cases where we would like to use data in one or several tables to update / insert into another table.

With insertion the syntax is relatively straightforward - replace the VALUES part of the statement with a SELECT statement.

INSERT INTO `my_table` (col1,col2,col3)
SELECT col4,col5,col6
FROM `another_table`

The ON DUPLICATE KEY condition can be used with this statement as well.

Updating rows cross-tables is done with a JOIN statement (of the less declarative type).

UPDATE `my_table`,`other_table`
   SET `my_table`.col1 = `other_table`.col2
WHERE `my_table`.parent_id = `other_table`.id

It's important to try the INSERT / UPDATE select statements separately to determine how many rows they will select and how fast will they complete. INSERT / UPDATE operations are relatively costly and can be a severe bottleneck for the database if not managed correctly.

Share this article:
  • Digg
  • DZone
  • del.icio.us
  • Technorati

Tags: , , , ,

13 Responses to “Multiple row operations in MySQL / PHP”

  1. #1

    Re. the “ON DUPLICATE…” -
    1. Do you know how it compares to REPLACE query?
    2. Is there a built in support for such queries in Zend Framework DB component?

  2. #2
    Andre says:

    Thanks for the article, I was just thinking about this the other day, very useful information.

    How did you benchmark this? I would love to find a tool that can create those graphs for my php apps.

  3. #3
    Zippy says:

    How does this equate to using prepared statements? I generally have been using PDO::prepare for the query and then just pop in the values.

  4. #4

    @arik:
    1. REPLACE is different – if the row exists already it deletes it before insertion. This is not useful for maintaining relationships between tables, and it can cause faster index fragmentation (as rows are constantly deleted and inserted).
    2. Unfortunately no… for INSERT and UPDATE it uses just the basic syntax.

    @andre:
    I benchmarked it using PHP, pasted the results into an excel sheet and made a graph of it :) just tinkered a little with the colors. Too bad excel doesn’t have anti-aliasing…

    @zippy:
    The problem with using prepared statements for those scenarios is that the size of the query itself is dynamic. If your statement is prepared for one row at a time, it will still execute many times instead of one.
    The overhead might be slightly smaller (as the statement is prepared and cached after one run), but it will still under perform badly against a single query.

  5. #5
    Zippy says:

    I’m still relative new to maximizing my db queries, so I really enjoyed this article. Thanks for the response. I can think of a few instances where switching to the methods suggested would benefit.

  6. #6
    Andre says:

    here’s a question. In a scenario where you have a seperate database server removed from the web application, would it be better to go for the use more memory approach and pass one large SQL statement?

    I think it depends on the server architecture to determine when to use the multiple sql versus one sql solution.

  7. #7

    Yes, definitely the server architecture should be taken into consideration. In this case I would think that sending one query would have even more performance benefits, as remote connections are more costly due to higher latency.

    You can always mix-and-match approaches, for example splitting the operations into 6 major query instead of one giant to preserve memory if needed. The performance boost over 2500~ queries would still be significant.

  8. #8
    George says:

    @Arik Fraimovich: REPLACE and ON DUPLICATE KEY UPDATE are actually quite different. REPLACE actually deletes the row if it is inserted and then re-inserts it. This means that REPLACE will change any AUTOINCREMENT fields you have in your row, thus wasting IDs and even braking table relationships.

  9. #9

    @Eran & @George – thank you for the information. Going to rewrite some queries now :)

    @Eran – “ON DUPLICATE…” in Zend Framework: so you just passing SQL in such cases or made your own wrapper?

  10. #10

    Plain SQL :) of course, I make sure to properly filter and escape user input

  11. #11
    thomas says:

    hi i have 3 tables like a ,b c
    where a contains employee detail , b contains details of item consumed by each employee and c contains different types of items, one user can have many items

    so table b will have more than one entries of same id

    i want to show in a pattern like below in webpage

    employee 1– item1 — item2 — item 3 –item4
    employee 2– item1 — ‘ ‘— item 3 –item4
    employee — item1 — item2 — item 3 –item4– item5

    i want to fetch the details in above pattern

    i get result in query as belowe

    employee1 – item1
    employee1-item2

    how could i fetch the details in the patern i want

  12. #12
    Shantanu Oak says:

    Thanks for the tip.
    I have one question.

    >> At 2560 rows inserted, it took the loop
    What if I need to send tens of thousands of records?
    Will the server have any issues?

  13. #13
    bonita says:

    hi! what is the best way of inserting multiple entries into one table?

    For example: 10 employees to be inserted in one click with their name, email, address.. etc…

    Please help..thanks.

Leave a Comment