We had a report of crashes occurring for certain users when accessing a system. From the stack data in the production logs, a timeout was occurring when running a specific stored procedure. This procedure was written around 5 years ago and is in use in many customer databases without issue. Why would the same SQL suddenly start timing out in one particular database?
The stored procedure in question is called for users with certain permissions to highlight outstanding units of work that their access level permits them to do, and is a fairly popular (and useful) feature of the software.
After obtaining session information from the crash logs, it was time to run the procedure on a copy of the live database with session details. The procedure only reads information, but doing this on a copy helps ensure no ... accidents occur.
EXEC [Data].[GetX] @strSiteId = 'XXX', @strUserGroupId = 'XXX', @strUserName = 'XXX'
And it took... 27 seconds to return 13 rows. Not good, not good at all.
Viewing the query plan showed something interesting though - one of the nodes was flagged with a warning symbol, and when the mouse was hovered over it it stated
Type conversion in expression (CONVERT_IMPLICIT(nvarchar(50),[Pn].[SiteId],0)) may affect "CardinalityEstimate" in query plan choice
Time to check the procedure's SQL as there shouldn't actually be any conversions being done, let alone implicit ones.
I can't publish the full SQL in this blog, so I've chopped out all the table names and field names and used dummy aliases. The important bits for the purposes of this post are present though, although I apologize that it's less than readable now.
CREATE PROCEDURE [Data].[GetX] @strSiteId nvarchar (50) , @strUserGroupId varchar (20) , @strUserName nvarchar (50) AS BEGIN SELECT [Al1].[X] , [Al1].[X] , [Al1].[X] , [Al1].[X] INTO [#Access] FROM [X].[X] [Al1] WHERE [Al1].[X] = @strUserName AND [Al1].[X] = @strUserGroupId AND [Al1].[X] = 1 AND [Al1].[X] = 1 SELECT DISTINCT [Pn].[Id] [X] FROM [Data].[X] [Pn] INNER JOIN [Data].[X] [Al2] ON [Al2].[X] = [Pn].[Id] AND [Al2].[X] = 0 INNER JOIN [Data].[X] [Al3] ON [Al3].[X] = [Al2].[Id] AND [Al3].[X] = 0 INNER JOIN [Data].[X] [Al4] ON [Al4].[X] = [Al3].[Id] AND [Al4].[X] = 0 INNER JOIN [Data].[X] [Al5] ON [Al5].[X] = [Al4].[Id] AND [Al5].[X] = 0 AND [Al5].[X] = 1 AND [Al5].[X] = 0 INNER JOIN [#Access] ON [#Access].[X] = [Al5].[X] AND [#Access].[X] = [Al2].[X] AND [#Access].[X] = [Al3].[X] AND [#Access].[X] = [Al4].[X] WHERE EXISTS ( SELECT [X] FROM [X].[X] [Al6] WHERE [Al5].[X] = [Al6].[X] AND [Al5].[X] = [Al6].[X] AND [Al6].[X] = 1 ) AND [Pn].[SiteId] = @strSiteId; DROP TABLE [#Access] END;
The SQL is fairly straight forward - we join a bunch of
different data tables together based on permissions, data status
and where the
[SiteId] column matches the lookup value, return
return a unique list of core identifiers. With the exception of
[SiteId] all those joins on
[Id] columns are integers.
[SiteId]is the primary key in a table. Yes, I know it isn't a good idea using string keys. It was a design decision made over 8 years ago and I'm sure at some point these anomalies will be changed. But it's a side issue to what this post is about.
As the warning from the query plan is quite explicit about the
column it's complaining about, it is now time to check the
definition of the table containing the
[SiteId] column. Again,
I'm not at liberty to include anything other than the barest
information to show the problem.
CREATE TABLE [X].[X] ( [SiteId] varchar(50) NOT NULL CONSTRAINT [PK_X] PRIMARY KEY ... ); GO
Can you see the problem? The table defines
varchar(50) - that is, up to 50 ASCII characters. The stored
procedure on the other hand defines the
(that is used as a
WHERE clause for
nvarchar(50), i.e. up to 50 Unicode characters. And there we
go, implicit conversion from Unicode to ASCII that for some
(still unknown at this stage) reason destroyed the performance
of this particular database.
After changing the stored procedure (remember I'm on a copy of
the production database!) to remove that innocuous looking
I reran the procedure which completed instantly. And the warning
has disappeared from the plan.
The error probably originally occurred as a simple oversight -
almost all character fields in the database are
Those that are
varchar are ones that control definition data
that cannot be entered, changed or often even viewed by end
users. Anything that the end user can input is always
due to the global nature of the software in question.
Luckily, it's a simple fix, although potentially easy to miss, especially as you might immediately assume the SQL itself is to blame and try to optimize that.
The take away from this story is simple - ensure that the data types for variables you use in SQL match the data types of the fields to avoid implicit conversions that can cause some very unexpected and unwelcome performance issues - even years after you originally wrote the code.
- 2016-05-06 - First published
- 2020-11-21 - Updated formatting