Replace vs On Duplicate

Replace vs On Duplicate Started by Spuds · July 27, 2015, 10:30:38 am · Read 4119 times 0 Members and 1 Guest are viewing this topic. previous topic - next topic

Replace vs On Duplicate

July 27, 2015, 10:30:38 am

Working on one of my addons, I had to refresh my mind about our insert statement for MySQL

Code: [Select]

$db->insert('', <- this is replace by default
$db->insert('ignore', 
$db->insert('insert',

Thinking about is some, why do we use REPLACE INTO instead of ON DUPLICATE KEY UPDATE

Seems REPLACE INTO is legacy as far as I can tell and there may be a performance gain changing that over to ON DUPLICATE for those cases where there is a duplicate row (primary key).

In those cases REPLACE does a insert attempt, if errors, delete the duplicate row and inserts a new one. ON DUPLICATE does the same insert attempt, if errors, updates the existing row (maintains same primary key). So ON DUPLICATE avoids a row removal and maintains the index a bit better.

Any ideas?

Re: Replace vs On Duplicate

Reply #1 – July 27, 2015, 11:31:34 am

Older versions didn't have it. On MyISAM I think it is slower.

It makes sense to update when possible though.

Re: Replace vs On Duplicate

Reply #2 – July 27, 2015, 11:42:13 am

Next question, please? LOL

I feel in about 90/95% of the cases we use $db->insert('', just because we are going to insert something that is really new (like a new post), not because we really need the "REPLACE" thingy.
That said, the only potential discrepancy I see at the moment, is the return of insert_id and postgre.
I mean: at the moment the two inserts in mysql and postgre behaves exactly the same (if present delete and insert anew). In postgre, the insert_id is implemented as a "SELECT the last row id" query.
So, it may be tricky to make postgre update the row and return a consistent id^[1], as such it would be easier to keep the current behaviour in postgre. From that the little discrepancy in the ids comparing two forums run one on mysql and on on postgre, but not something to be worried about I guess.

ETA: I'm completely ignorant on the performance-side of things (or the support versions).

ETA2: reading this old post https://www.percona.com/blog/2007/01/18/insert-on-duplicate-key-update-and-replace-into/ I wonder: can ON DUPLICATE KEY UPDATE handle multiple row insertions? Because from the article it looks like REPLACE, but it doesn't say anything about ODKU...
And from time to time we do that.

¹We'd have to take care ourself of the storing of the id updated and then "forget about it" if necessary and/or return that id instead of the last one. Yeah, possible, but a little fragile I think.↵

Re: Replace vs On Duplicate

Reply #3 – July 27, 2015, 12:39:37 pm

QuoteI feel in about 90/95% of the cases we use $db->insert('', just because we are going to insert something that is really new (like a new post), not because we really need the "REPLACE" thingy

I'd bet that is right, and for all those cases the difference in performance is zero as well, they both will do an insert.

QuoteThat said, the only potential discrepancy I see at the moment, is the return of insert_id and postgre.

Yup, a valid observation. I don't think I would worry about delta's between postgre and mysql WRT the primary key being a different value for a row. Maybe I'm missing something about why that could be bad.

Reading around it appears this capability will be added with postgre 9.5 (ON CONFLICT UPDATE) but before that you would have to write a function that trys to update and if it fails try to insert instead, there was one in the user guide but it has some "short comings"

QuoteI wonder: can ON DUPLICATE KEY UPDATE handle multiple row insertions?

It seems to work the same from a couple of examples I saw.

We could also add

Code: [Select]

$db->insert('update', '

to the db repertoire, mysql would use ON DUPLICATE UPDATE and postgre could just use it as an alias for the current REPLACE. There are other cool things we could do with ON DUPLICATE, like increment a counter on the existing row but not worth the effort due to minor gains vs major pains to implement that on the two schemas

Re: Replace vs On Duplicate

Reply #4 – July 27, 2015, 01:00:17 pm

Cool then!

Or maybe, follow an old Joshua suggestion:

Code: [Select]

$db->update('query')

so we don't have to remember all the parameters. lol

Re: Replace vs On Duplicate

Reply #5 – July 27, 2015, 01:41:42 pm

I made a mistake in my post, meant to say

Code: [Select]

 $db->insert('update',

Anyway I like removing that first parameter, but not sure if its less confusing in this instance. Here the primary action is insert with an update if it exists ...

if we have the command as

Code: [Select]

 $db->update

would you expect it to insert if it could not update the record or return an error $:-\$

Re: Replace vs On Duplicate

Reply #6 – July 27, 2015, 02:21:28 pm

Yeah, you are right, it would be probably more logical to have an update that actually does the update and not an insert. LOL

Re: Replace vs On Duplicate

Reply #7 – July 28, 2015, 09:27:02 pm

$db->uspert()

Re: Replace vs On Duplicate

Reply #8 – July 29, 2015, 07:11:26 am

I've seen this referred to as "upsert" as well. We could that even though postgre will be a replace until 9.5, or we could try one of the upsert functions out there for postgre.

Re: Replace vs On Duplicate

Reply #9 – July 29, 2015, 07:28:44 am

I'm sure I'll confuse it with upset! LOL