Multiple row operations in MySQL / PHP

MySQL PHP Web development May 14th, 2009 by Eran Galperin

Multiple row operations are common in normalized application databases, since one database entity is often linked to multiple sub-entities (for example, a user and his tags). By row operations I'm referring to write queries, namely UPDATE and INSERT queries (DELETE is less interesting, so I'll leave it out for now).

Too often I've seen such queries run in long loops, one at a time, which is very bad for performance (as I will show here) and sometimes equally bad for integrity (if the process is interrupted). So what are the alternatives?

Inserting multiple rows

Insertion of multiple rows comes up often in batch jobs, database migrations and when handling table relationships. A naive approach in PHP would be to loop over the data to be inserted, inserting one row at a time. Suppose the data is already properly filtered and quoted:

foreach( $data as $row ) {
    $query = "INSERT INTO `test_table` (user_id,content)"
           . " VALUES ("
           . $row['user_id'] . ","
           . $row['content'] . ")";
    mysql_query($query);
}

Depending on the number of rows to be inserted, this can be a costly process. A better approach is to concatenate the values into a single insert query and then execute it:

$values = array();
foreach( $data as $row ) {
    $values[] = "(" . $row['user_id'] . "," . $row['content'] . ")";
}
if( !empty($values) ) {
    $query = "INSERT INTO `test_table` (user_id,content) VALUES "
           . implode(',', $values);
    mysql_query($query);
}
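
The same single-query technique also works with bound parameters, which saves the manual quoting. A minimal sketch using PDO, assuming a $pdo connection object and the same $data array (one (?,?) placeholder group is generated per row):

$placeholders = array();
$bind = array();
foreach( $data as $row ) {
    // one placeholder group per row, values collected in the same order
    $placeholders[] = "(?,?)";
    $bind[] = $row['user_id'];
    $bind[] = $row['content'];
}
if( !empty($placeholders) ) {
    $query = "INSERT INTO `test_table` (user_id,content) VALUES "
           . implode(',', $placeholders);
    $stmt = $pdo->prepare($query);
    $stmt->execute($bind);
}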

What's the difference? Let's see some benchmarks:

[Chart: query execution time vs. number of rows inserted, loop vs. single query]

A single query completes much faster than looping through multiple queries. At 2560 rows inserted, the loop took ~36 seconds to complete, while the single query took just 0.14 seconds.

Memory consumption, however, shows the reverse trend:

[Chart: memory consumption vs. number of rows inserted, loop vs. single query]

Since the values are concatenated to build the single query, memory usage grows with the number of rows, as a larger query string has to be held in memory. At 2560 rows inserted, the single query approach consumed ~800kb of memory, while the loop consumed just ~60kb.

Since memory is much cheaper than CPU cycles and database connections, I'd usually opt for the single query approach. It's important though to be aware of the implications.
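
If memory does become a concern with very large data sets, the two approaches can be mixed: split the data into fixed-size batches and send one multi-row insert per batch, keeping most of the speed benefit without building one enormous query string. A rough sketch along the lines of the code above (the batch size of 500 is just an illustrative number):

$batchSize = 500; // illustrative - tune to your data size and memory limit
foreach( array_chunk($data, $batchSize) as $batch ) {
    $values = array();
    foreach( $batch as $row ) {
        $values[] = "(" . $row['user_id'] . "," . $row['content'] . ")";
    }
    // one multi-row INSERT per batch instead of one query per row
    $query = "INSERT INTO `test_table` (user_id,content) VALUES "
           . implode(',', $values);
    mysql_query($query);
}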

Inserting / Updating (multiple) values

Another use-case for multiple row operations is when we want to insert several rows that might already exist, and if they do, update the existing values instead. Of course, we could first check which rows exist, update the ones that do and insert the ones that don't - for a total of at least three separate queries.

Inserting and updating in the same operation is done using INSERT ... ON DUPLICATE KEY UPDATE.

The KEY in this statement should be a unique key in the table you are inserting into (otherwise duplicates would be allowed and this operation would be moot). The syntax of this statement lets you pick which columns are updated when a row matches. For example, suppose I'm adding translations for content pages. I have the following table schema:

 
pages
 - id
 - content
 - ...
 
pages_translations
 - page_id
 - lang_id
 - content
 

The pages_translations table has a primary (unique) key on (page_id, lang_id). When adding / updating a translation, the query would look something like:

INSERT INTO `pages_translations` (page_id,lang_id,content) 
VALUES (5,2,'the brown fox ...') 
ON DUPLICATE KEY UPDATE content=VALUES(content)
 

This will create a translation if one does not exist and update the content field if it does exist. We can combine this statement with the previous approach for multiple insertion to insert / update multiple rows with one statement.
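
For instance, adding or updating several translations at once might look like this (the extra rows and values are hypothetical, just to show the shape of the combined statement):

INSERT INTO `pages_translations` (page_id,lang_id,content) 
VALUES (5,2,'the brown fox ...'),
       (5,3,'le renard brun ...'),
       (6,2,'another page ...') 
ON DUPLICATE KEY UPDATE content=VALUES(content)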

Updating multiple rows can also be done using this technique, or through a CASE ... WHEN condition for each row. This method is a bit more involved and usually results in bigger queries, but it's worth noting. Example:

UPDATE `pages` SET content=(CASE 
    WHEN id=1 THEN 'first page content ...'
    WHEN id=2 THEN 'second page content ...'
END)
WHERE id IN (1,2)

We need to repeat this format for each row, which results in a somewhat verbose query. Note the WHERE clause added above: without it (or an ELSE branch that returns the current content), rows that don't match any WHEN condition would have their content set to NULL.

Inserting / Updating with multiple tables

There are some cases where we would like to use data in one or several tables to update / insert into another table.

With insertion the syntax is relatively straightforward - replace the VALUES part of the statement with a SELECT statement.

INSERT INTO `my_table` (col1,col2,col3) 
SELECT col4,col5,col6 
FROM `another_table`

The ON DUPLICATE KEY condition can be used with this statement as well.
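
A sketch of that combination, reusing the column names from the example above and assuming a unique key on col1 (the UPDATE clause references the source table's column directly):

INSERT INTO `my_table` (col1,col2,col3) 
SELECT col4,col5,col6 
FROM `another_table` 
ON DUPLICATE KEY UPDATE col2 = `another_table`.col5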

Updating rows across tables is done with a multiple-table UPDATE (written here with the implicit, comma-separated join syntax).

UPDATE `my_table`,`other_table` 
   SET `my_table`.col1 = `other_table`.col2 
WHERE `my_table`.parent_id = `other_table`.id
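
The same update can also be written with an explicit JOIN, which reads a bit more clearly and behaves the same way:

UPDATE `my_table` 
  JOIN `other_table` ON `my_table`.parent_id = `other_table`.id 
   SET `my_table`.col1 = `other_table`.col2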

It's important to try the SELECT part of such INSERT / UPDATE statements separately, to determine how many rows it will select and how fast it will complete. INSERT / UPDATE operations are relatively costly and can be a severe bottleneck for the database if not managed correctly.

Comments

  • http://www.arikfr.com/blog Arik Fraimovich

    Re. the “ON DUPLICATE…” -
    1. Do you know how it compares to REPLACE query?
    2. Is there a built in support for such queries in Zend Framework DB component?

  • http://blog.midstride.com Andre

    Thanks for the article, I was just thinking about this the other day, very useful information.

    How did you benchmark this? I would love to find a tool that can create those graphs for my php apps.

  • Zippy

    How does this equate to using prepared statements? I generally have been using PDO::prepare for the query and then just pop in the values.

  • http://www.techfounder.net Eran Galperin

    @arik:
    1. REPLACE is different – if the row exists already it deletes it before insertion. This is not useful for maintaining relationships between tables, and it can cause faster index fragmentation (as rows are constantly deleted and inserted).
    2. Unfortunately no… for INSERT and UPDATE it uses just the basic syntax.

    @andre:
    I benchmarked it using PHP, pasted the results into an excel sheet and made a graph of it :) just tinkered a little with the colors. Too bad excel doesn’t have anti-aliasing…

    @zippy:
    The problem with using prepared statements for those scenarios is that the size of the query itself is dynamic. If your statement is prepared for one row at a time, it will still execute many times instead of one.
    The overhead might be slightly smaller (as the statement is prepared and cached after one run), but it will still underperform badly compared to a single query.

  • Zippy

    I’m still relatively new to optimizing my db queries, so I really enjoyed this article. Thanks for the response. I can think of a few instances where switching to the suggested methods would be beneficial.

  • http://blog.midstride.com Andre

    Here’s a question: in a scenario where you have a separate database server, removed from the web application, would it be better to go for the more memory-hungry approach and pass one large SQL statement?

    I think it depends on the server architecture whether to use the multiple-query or the single-query solution.

  • http://www.techfounder.net Eran Galperin

    Yes, definitely the server architecture should be taken into consideration. In this case I would think that sending one query would have even more performance benefits, as remote connections are more costly due to higher latency.

    You can always mix-and-match approaches, for example splitting the operations into 6 larger queries instead of one giant one, to preserve memory if needed. The performance boost over ~2,500 queries would still be significant.

  • http://maicovschi.mp George

    @Arik Fraimovich: REPLACE and ON DUPLICATE KEY UPDATE are actually quite different. REPLACE deletes the row if it already exists and then re-inserts it. This means that REPLACE will change any AUTO_INCREMENT fields you have in your row, thus wasting IDs and even breaking table relationships.

  • http://www.arikfr.com/blog Arik Fraimovich

    @Eran & @George – thank you for the information. Going to rewrite some queries now :)

    @Eran – “ON DUPLICATE…” in Zend Framework: so are you just passing raw SQL in such cases, or did you make your own wrapper?

  • http://www.techfounder.net Eran Galperin

    Plain SQL :) of course, I make sure to properly filter and escape user input

  • thomas

    Hi, I have 3 tables: a, b and c, where a contains employee details, b contains details of the items consumed by each employee, and c contains the different types of items. One employee can have many items,

    so table b will have more than one entry for the same id.

    I want to show a pattern like the one below on a web page:

    employee 1 – item1 – item2 – item3 – item4
    employee 2 – item1 – ‘ ‘ – item3 – item4
    employee – item1 – item2 – item3 – item4 – item5

    I want to fetch the details in the above pattern, but my query returns results like this:

    employee1 – item1
    employee1 – item2

    How could I fetch the details in the pattern I want?

  • http://oksoft.antville.org Shantanu Oak

    Thanks for the tip.
    I have one question.

    >> At 2560 rows inserted, it took the loop
    What if I need to send tens of thousands of records?
    Will the server have any issues?

  • bonita

    hi! what is the best way of inserting multiple entries into one table?

    For example: 10 employees to be inserted in one click with their name, email, address.. etc…

    Please help..thanks.

  • tony

    Very useful post. Thanx.

    Is there a way to be able to insert + on duplicate update multiple rows using a single query?

  • ChrisRaven

    Hi,

    There actually is a way to update multiple rows in one UPDATE query, e.g.:

    UPDATE table
    SET updated_field = CASE tested_field
    WHEN tested_field_value1 THEN updated_field_value1
    WHEN tested_field_value2 THEN updated_field_value2
    – etc.
    ELSE updated_field — important part!
    END

    It works at least in MySQL. Probably in other databases too.

    Best regards,
    ChrisRaven

  • http://oseeyou.com/ Chinh Dung

    Thanks, I chose the multiple-row insert. I can now insert many rows for many users on my page with a single insert.