fredag, maj 04, 2012

SQL Server, troubleshooting locks

Today was the end of a fairly exhausting work week of which most consisted of what seemed to be a deadlocked query but turned out to be something entirely different altogether. The symptom was that the SSIS package executed for what seemed like an eternity. After some investigation it turned out that the package itself consumed no CPU and no IO -it was waiting. The SPID in SQL Server consumed no IO but some CPU. There were some locks related to the transaction initiated by this SPID which by me seemed to be the problem but we could not verify this as we lacked access on the server. We made an effort in disabling the transaction in the SSIS package which rendered the exact same symptom and this was the point where we noticed that the transaction was not a part of the problem at all. After spending some more time tracing and analysing what was actually going on on the server the root cause was found; a badly written query consisting in part of a WHERE NOT EXIST (SELECT ... clause. It turned out that the amount of data in the stage area in combination with a growing amount of data in the detail area together made the query useless. It took 16h+ to execute, after which we had to ask our DBA to kill it. The solution was simple enough; rewrite the query to use LEFT OUTER JOIN ... a ... WHERE a.col IS NULL. By using this technique instead the query finished in less than 2s.

Lesson learned; if it doesn't look like a deadlock then don't treat it as such.

Inga kommentarer: