Assume I maintain a huge denormalized table in the database that holds the details of each user's login session. Right now I am not interested in keeping the user's history, though that could be added as a future enhancement. The system is expected to be accessed by 1000+ users every day, more than once each. Is it faster to update the user's record each time he/she logs into the system, or is it better to insert a new row for every session?
I believe we need to look at this from two perspectives: the speed of the operation itself, and the impact it might have on other operations later. Generally, an insert involves at most four stages: create the record; check referential integrity, primary key constraints and insert triggers; store the record in the database; and make an entry in the respective indices. Moreover, we are not deleting the old records since database size is not a constraint; otherwise we would also have to consider the 'INSERT ... ON DUPLICATE KEY UPDATE' approach. There is a real possibility that in the near future the database will grow out of control and the indices will no longer fit in the buffer pool. At that point we would have to resort to distributed data storage, otherwise known as sharding, which involves partitioning and replication. But smart sharding is hard to achieve, and the best solutions available today rely on usage statistics to identify the hotspots.
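As a minimal sketch of the insert-per-session approach (the table and column names here, such as user_session and login_time, are illustrative assumptions, not from the original question, and MySQL syntax is assumed):

    -- Hypothetical schema: one row appended per login session
    CREATE TABLE user_session (
        session_id  BIGINT AUTO_INCREMENT PRIMARY KEY,
        user_id     BIGINT NOT NULL,
        login_time  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        ip_address  VARCHAR(45),
        KEY idx_user (user_id)
    );

    -- Insert-per-session: every login simply appends a new row
    INSERT INTO user_session (user_id, ip_address) VALUES (42, '203.0.113.7');

    -- The upsert variant mentioned above would only apply if we kept a single
    -- row per user, i.e. if user_id carried a UNIQUE key:
    -- INSERT INTO user_session (user_id, ip_address) VALUES (42, '203.0.113.7')
    --   ON DUPLICATE KEY UPDATE login_time = CURRENT_TIMESTAMP;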
An update, on the other hand, includes a search as part of its work. First you need to find the individual record, fetch it, modify it, check the update triggers and referential integrity constraints, store it back in the database, and finally update the indices if they are affected. Fast index structures such as the B+ tree make the search less time consuming, but it is still an overhead.
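A sketch of the update-per-login alternative, assuming the same hypothetical table but with one row per user so that user_id is the key the search runs on:

    -- Update-per-login: the engine must first locate the user's existing row
    -- (an index lookup on user_id), then rewrite it and touch affected indices.
    UPDATE user_session
    SET    login_time = CURRENT_TIMESTAMP,
           ip_address = '203.0.113.7'
    WHERE  user_id = 42;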
Thus, I feel we can conclude that inserting a new record is better than updating when we log one record at a time and database size is not a constraint. But when we examine it from the other perspective, i.e. when a select/fetch has to be done, fetching one matching record is faster than fetching many. This matters a great deal in web systems where we display user information. It can be handled by adding a field that records the row's creation timestamp, which can then be used as a select criterion to fetch the latest/active record.
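For illustration, that select criterion might look like this, reusing the hypothetical login_time column as the creation timestamp:

    -- Fetch only the latest/active session for a user instead of all of them
    SELECT *
    FROM   user_session
    WHERE  user_id = 42
    ORDER  BY login_time DESC
    LIMIT  1;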
But does this hold good for bulk updates, i.e. an update without a condition? Here update scores over insert, since it is merely a matter of changing a selected set of fields across all the records in the table. It is comparable to a single multi-row insert versus many single-row inserts of related data.
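A rough sketch of the comparison, again against the hypothetical user_session table:

    -- Bulk update: one statement touches the selected fields of every row
    UPDATE user_session SET ip_address = NULL;

    -- Its insert analogue: one multi-row insert instead of many single-row inserts
    INSERT INTO user_session (user_id, ip_address)
    VALUES (42, '203.0.113.7'),
           (43, '198.51.100.9'),
           (44, '192.0.2.15');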
There can be only one winner in a competition, and it has to be the all-rounder, the one that meets most of our needs. In today's world storage has become cheap whereas speed has not, so insert emerges victorious!