Monday, 9 July 2012

Insert vs. Update Query

Assume that I maintain a huge denormalized table in the database to store the details of users' login sessions. Right now I am not interested in maintaining the users' history, though that could be a future enhancement. The system is expected to be accessed by 1000+ users every day, more than once each. Is it faster to have the user's record updated each time he/she logs into my system, or is it better to insert a new row for every session?
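For concreteness, here is a minimal sketch of what such a session table might look like. The table name user_sessions and all of its columns are my own assumptions; the post does not define a schema.

    -- A hypothetical denormalized session table; names are illustrative.
    CREATE TABLE user_sessions (
        session_id  BIGINT AUTO_INCREMENT PRIMARY KEY,
        user_id     INT          NOT NULL,
        ip_address  VARCHAR(45),              -- wide enough for IPv6
        user_agent  VARCHAR(255),
        login_time  TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
        INDEX idx_user (user_id)              -- lookups by user
    );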

I believe we need to look at this from two perspectives: the speed of the present operation, and the impact it might have on other operations in the future. Generally, an insert involves three or four stages at most: create the record, check referential integrity and primary key constraints (and any insert triggers), store the record in the database, and make an entry in each affected index. Moreover, since database size is not a constraint, we are not deleting the old records; otherwise we would need to think about the 'INSERT ... ON DUPLICATE KEY UPDATE' scenario too. There is a real possibility that in the near future our database will grow out of control and the index will no longer fit in our buffer pool. In that case we would need to resort to distributed data storage, otherwise called data sharding, which involves partitioning and replication. But smart sharding is hard to achieve, and the best solutions available so far depend on usage statistics to identify the hotspots.
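For reference, the MySQL upsert mentioned above would look roughly like this. It assumes a one-row-per-user variant of the table where user_id carries a UNIQUE key, which is not the schema sketched earlier; the values are illustrative.

    -- Upsert sketch: insert on first login, overwrite on re-login.
    -- Assumes a UNIQUE key on user_id.
    INSERT INTO user_sessions (user_id, ip_address, login_time)
    VALUES (42, '203.0.113.7', NOW())
    ON DUPLICATE KEY UPDATE
        ip_address = VALUES(ip_address),
        login_time = VALUES(login_time);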

An update, on the other hand, includes a search as part of its work. First you need to find the matching record, fetch it, modify it, check the update triggers and referential integrity constraints, write it back to the database, and finally update the indices if they are affected. There are fast index structures such as the B+ tree that make the search less time-consuming, but it is still an overhead.
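As a sketch, the per-login update described here might read as follows, again assuming one row per user keyed on user_id:

    -- Update path: search for the user's row, then rewrite it in place.
    UPDATE user_sessions
    SET ip_address = '203.0.113.7',
        login_time = NOW()
    WHERE user_id = 42;

The WHERE clause is the search step; with an index on user_id it is a cheap lookup, but it still has to run before anything is written.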

Thus, I feel we can conclude that inserting a new record is better than updating when logging one record at a time and database size is not a constraint. But when we examine it from the other perspective, i.e. when a select/fetch has to be done, fetching one matching record is faster than fetching many. This plays a huge role in web systems where we display user information. It can be handled by attaching a new field to the record that keeps the record's creation time-stamp, which can then be used as a select criterion to fetch the latest/active record.
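Fetching only the latest record by that time-stamp might look like this. The composite index is my own addition, assumed so the lookup stays cheap as rows accumulate:

    -- Assumed index so the newest session is found without a full scan.
    CREATE INDEX idx_user_time ON user_sessions (user_id, login_time);

    -- Fetch only the most recent session for user 42.
    SELECT *
    FROM user_sessions
    WHERE user_id = 42
    ORDER BY login_time DESC
    LIMIT 1;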

But does this hold good for bulk updates, i.e. an update without a condition? This time update scores over insert, as it is a mere matter of changing a selected set of fields across all the records in the table. It is like one multiple-row insert compared to many single-row inserts of related data.
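To illustrate the contrast, both statements below touch many rows in a single pass (the values are illustrative):

    -- Bulk update: no condition, one sweep changing a field on every row.
    UPDATE user_sessions
    SET user_agent = 'unknown';

    -- Multiple-row insert: one statement adding several related rows.
    INSERT INTO user_sessions (user_id, ip_address, login_time)
    VALUES (1, '203.0.113.1', NOW()),
           (2, '203.0.113.2', NOW()),
           (3, '203.0.113.3', NOW());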

There can be only one winner in a competition, and it has to be the all-rounder, the one that meets most of our needs. In today's world, storage has become cheap whereas speed is still precious, so insert emerges victorious!
