Monday, November 16, 2009

Date and time types in SQL Server

Prior to SQL Server 2008, there are two types in SQL Server to handle date and time. Since I will be referring to these types a lot, I introduce an abbreviation for each type in below two tables (the "sn" column):
Name
sn
Minimum value
Maximum value
Accuracy
Storage
smalldatetime
sdt
1900-01-01 00:00:00
2079-06-06 23:59:00
minute
4 bytes
datetime
dt
1753-01-01 00:00:00.000
9999-12-31 23:59:59.997
3.33 ms
8 bytes

Note that there is no type to store only date or only time. Above types, have both a date and a time portion.
If you only specify a date portion, then SQL Server will store 00:00:00.000 in the time portion.
And if you only specify a time portion, SQL Server will store 1900-01-01 in the date portion.

Above is important. Read it again.

SELECT CAST ('20041223' AS datetime)
-----------------------
2004-12-23 00:00:00.000

SELECT CAST ('14:23:58' AS datetime)
-----------------------
1900-01-01 14:23:58.000


As of SQL Server 2008, we have several new types related to date and time:
Name
sn
Minimum value
Maximum value
Accuracy
Storage
datetime2
dt2
0001-01-01 00:00:00.0000000
9999-12-31 23:59:59.9999999
100ns
6-8 bytes
date
d
0001-01-01
9999-12-31
day
3 bytes
time
t
00:00:00.0000000
23:59:59.9999999
100ns
3-5 bytes
datetimeoffset
dto
0001-01-01 00:00:00.0000000
9999-12-31 23:59:59.9999999
100ns
8-10 bytes

  • As you can see, we have finally a date-only and a time-only type.
  • Datetime2 is a "better datetime" for various reasons, and it does not cost any more storage than datetime - potentially less!
  • For the new types, which include a time portion, you can specify a "fractional seconds precision" part, which specifies how many digits you have to the right of the comma after the second. So, for instance time (3) can store a value like 14:23:12.567, where if you input 14:23:12.5677 it will be rounded to 14:23:12.568.
  • The new datetimeoffset includes a time zone offset part.

Date and time formats

A common misconception is that SQL Server stores these datatypes in some particular readable format. That is not the case. SQL Server stores these values in an internal format (for instance two integers for datetime and smalldatetime). However, when you use T-SQL to specify a value (in an INSERT statement, for instance) you express it as a string. And there are rules for how different datetime string formats are interpreted by SQL Server. But note that SQL Server does not remember the format in any way.



Date and time formats for input

There are many formats available for us to specify a date/time/datetime value. Some are "better" than others are and as you read this article, I hope you will understand what I mean by "better". Interestingly enough, all formats are available for all types. So even a time-only format is allowed for a date-only type etc.
(I am ignoring the time zone offset part, which is only used for the datetimeoffset type - you can read more about this in Books Online.)

Name
sn
Format
SET DATEFORMAT dependent
SET LANGUAGE dependent
language neutral
Unseparated
u
'19980223 14:23:05'
none
none
all
Separated
s
'02/23/1998 14:23:05'
all
all
no
ANSI SQL
ansisql
'1998-12-23 14:23:05'
sdt, dt
sdt, dt
not for sdt and dt
Alphabetic
a
'23 February 1998 14:23:05'
none
all (month name)
no
ODBC datetime
odt
{ts '1998-02-23 14:23:05'}
none
none
all
ODBC date
od
{d '1998-02-23'}
none
none
all
ODBC time
ot
{t '14:23:05'}
none
none
all
ISO 8601
iso
'1998-02-23T14:23:05'
none
none
all
Time
t
'14:23:05'
'2:23:05 PM'
none
none
all

  • Note that ANSI SQL is really only a special case of the separated (aka "numeric") format, which allow for separators dash (-), slash (/) and dot (.). But since this is the only format defined in the ANSI SQL standard, I think it is worth mentioning this as a special case.
  • Most formats allow for dropping the date and/or the time portion, and in some cases, it can look a bit ... strange. It might seem stupid to specify for instance '2008-08-25' for a time type, but doing that result in same as specifying nothing in the datetime string, ''. Consider below:
    SELECT CAST ('' AS time)
    SELECT CAST ('2008-08-25' AS time)
    above two gives, us the same result (time 00:00:00).
  • The ODBC formats are different in the sense that they have a marker (literal_type, t, d or ts), you need to define correctly depending on whether you specify both date and time, date only or time only.
  • The ISO 8601 format requires both a date and a time portion.
  • SET DATEFORMAT inherits its setting from SET LANGUAGE (but an explicit SET DATEFORMAT will override later SET LANGUAGE). The language defaults to each login's language. Default language for a login is specified using sp_configure.
  • The rules regarding the format for the date part and the new types can be confusing. Microsoft wanted to make the new date related types (date, datetime2 and datetimeoffset) less dependent on these settings and more ANSI SQL compliant. End result is that these new types are language neutral for a separated datetime literal as long as the year comes first. SQL Server need to determine that this is the year part, so it requires 4 numbers year (yyyy, not yy). If that is the case, then the string will be interpreted as year first, then month and finally day - regardless of DATEFORMAT or language setting. But if the month part comes first, then DATEFORMAT and language setting will be "honored":
SET LANGUAGE British --uses dmy
GO
SELECT CAST('02-23-1998 14:23:05' AS date) --Error
GO
SELECT CAST('2/23/1998 14:23:05' AS date) --Error
GO

SELECT CAST('1998-02-23 14:23:05' AS date) --Ok
GO
SELECT CAST('1998.02.23 14:23:05' AS date) --Ok
GO
SELECT CAST('1998/02/23 14:23:05' AS date) --Ok
GO


The first two fails because the year doesn't come first (and there are not 23 months in 1998). The last three doesn't fail because the year comes first (and we are using one of the new style date related types).
Crystal clear, huh? :-)
The available formats are documented in Books Online, so I won't go into details about each format. Here's a link to the online version of Books Online, see each subsection for the different formats.

Note that the ODBC time only format will give the current date, not 1900-01-01 as the other time-only formats.

Some conclusions:
  • The Alphabetic format is especially bad since this is SET LANGUAGE dependent for all types.
  • The smalldatetime and datetime types are unfortunate since they are dependent on not only SET LANGUAGE but also SET DATEFORMAT (including the login's language setting).
  • You might think that it doesn't matter if a format/type combination is SET LANGUAGE dependent or not since "I never execute that SET LANGUAGE command anyhow". But note very carefully that the default for DATEFORMAT is drawn from the login specification (CREATE LOGIN etc.). Imagine for instance an app developed in the US with login having default language (us_english) using m/d/y format and date time. Somebody in, say, Germany installs it and bang! The app does not work anymore. Do I have your attention now? :-)
  • The types introduced in 2008 are friendlier since they are not SET DATEFORMAT dependent if you for a separated format specify the year first. For the separated formats (including ANSI SQL), the new types always interprets month before date.


Recommendations for input

As you probably realize, you don't want to use a combination of format and type, which isn't language neutral unless you make sure that you have the correct DATEFORMAT and LANGUAGE settings in your applications. Be aware that setting these options inside a stored procedure can cause a recompile of the procedure at execution time!
My recommendation is to use a language neutral format. I typically use the unseparated format, mostly out of old habit. The ISO 8601 format has the advantage of being defined in an international standard.

SQL Server 2008 has just been released at the time of writing this, but I will probably move towards date, time, datetime2 and possibly datetimeoffset over time. Over time, I might be using the ANSI SQL format (being ANSI SQL compliant) with the new types, but i'd rather that ANSI SQL supports the ISO 8601 format. I will play it safe and use unseparated or ISO 8601 for a number of years to come - even when working with the new types.



Warnings and common misconceptions


Let me say this again: you don't want to use a combination of format and type, which isn't language neutral unless you make sure that you have the correct DATEFORMAT and LANGUAGE settings in your applications.

The Numeric format can use dash (-), dot (.) or slash (/) as separator. The rules for how SQL Server parses the string doesn't change depending on the separator. A common misconception is that the ANSI SQL format (sometime a bit incorrectly referred to as the "ISO format"), 1998-02-23, is language neutral for smalldatetime and datetime. It isn't. It is a numeric format and hence it is dependent on the SET DATEFORMAT and SET LANGUAGE setting:

SET LANGUAGE us_english
SELECT CAST('2003-02-28' AS datetime)
-----------------------
2003-02-28 00:00:00.000

SET LANGUAGE british
SELECT CAST('2003-02-28' AS datetime)


Server: Msg 242, Level 16, State 3, Line 4
The conversion of a char data type to a datetime data type resulted in an out-of-range datetime value.




Please read the error message closely. It says exactly what the problem is. You specify the datetime value as a string, and because the string isn't formed according to the format you are using and the DATEFORMAT setting, SQL Server cannot convert the string to a datetime value.


Output of datetime values

When a datetime value leaves SQL Server is has no readable format. It is the client application that formats the value so it is human-readable. Some applications/developer environments will respect the regional settings of the client machine and format the value accordingly. But this is out of SQL Server's control. You can return the data from SQL Server using a specific format, but that requires you to convert it to a string, and the value will not be of the datetime type anymore. If you want to do this, use the CONVERT function, which has a useful 3:rd parameter that defines the format when you convert from datetime to a string. A few examples:



SELECT CONVERT(char(8), CURRENT_TIMESTAMP, 112)
--------
20040312

SELECT CONVERT(char(19), CURRENT_TIMESTAMP, 120)
-------------------
2004-03-12 18:08:43


SELECT CONVERT(char(10), CURRENT_TIMESTAMP, 103)
----------
12/03/2004

Searching for datetime values

The problem with datetime searches often comes from the fact that if you use smalldatetime, datetime, datetime2 or datetimeoffset then there is both a date and a time portion. Let's create table that we will use in a couple of examples:

CREATE TABLE #dts(c1 char(1), dt datetime)
INSERT INTO #dts (c1, dt) VALUES('a', '20040305 09:12:59')
INSERT INTO #dts (c1, dt) VALUES('b', '20040305 16:03:12')
INSERT INTO #dts (c1, dt) VALUES('c', '20040306 00:00:00')
INSERT INTO #dts (c1, dt) VALUES('d', '20040306 02:41:32')
INSERT INTO #dts (c1, dt) VALUES('e', '20040315 11:45:17')
INSERT INTO #dts (c1, dt) VALUES('f', '20040412 09:12:59')
INSERT INTO #dts (c1, dt) VALUES('g', '20040523 11:43:25')
Say that we want to find all rows from 2004-03-15. A common mistake is to search like this:

SELECT c1, dt FROM #dts WHERE dt = '20040305'

No rows returned. Why? What happened? We have two different types in the WHERE clause, a datetime type (the column side) and a string (the right side). SQL Server will convert one to the other according to the documented "Data Type Precedence" in Books Online. Datetime has higher precedence than a string so the string will first be converted to a datetime type. As we didn't specify a time portion, the string will be converted to the datetime value 2004-03-05 00:00:00. There are no rows with that datetime value.
OK, another alternative. We use the CONVERT function on the column side and convert it to a string so it matches the string format on the right side:


SELECT c1, dt FROM #dts WHERE CONVERT(char(8), dt, 112) = '20040305'

That returned the expected rows, but there's another problem. Since we do a calculation on the column side (using the CONVERT function), SQL Server cannot use an index to support this search criteria. This can be disastrous for performance! So let's try BETWEEN:

SELECT c1, dt FROM #dts WHERE dt BETWEEN '20040305' AND '20040306'

Because BETWEEN is inclusive, we also returned the row from 2004-03-06 00:00:00. Let's try to handle that:

SELECT c1, dt FROM #dts WHERE dt BETWEEN '20040305' AND '20040305 23:59:59.999'

What? We still got the row from 2004-03-06 00:00:00 back. Why? We need to look at the type definitions again. The resolution for the datetime type is 3.33 ms. That means that 2004-03-05 23:59:59.999 will be rounded up to 2004-03-06 00:00:00.000. Not what we expected. To get around that rounding problem, we try:

SELECT c1, dt FROM #dts WHERE dt BETWEEN '20040305' AND '20040305 23:59:59.997'

Above return the expected rows, but I would not call it intuitive and easy to read. Also, if the type in the table is smalldatetime instead of datetime, the time portion will again be rounded up, hence not producing the expected rows. So you find yourself having two ways of doing this dependent on whether you have datetime or smalldatetime. If the type is smalldatetime, you would use:

SELECT c1, dt FROM #dts WHERE dt BETWEEN '20040305' AND '20040305 23:59:00'

Having two different ways of doing this depending on the datatype is not something I recommend. And, who knows if Microsoft in the future will add types with higher precision for the time portion (which they did in SQL Server 2008)? Again you would have to adjust the time portion in the WHERE clause. Below you find my recommendation:

SELECT c1, dt FROM #dts WHERE dt >= '20040305' AND dt < '20040306'

We get the expected rows back and it is easy to understand what the criteria means. And SQL Server can use an index on the dt column. This is also easy to expand for searches on greater ranges. Say we want to return all rows for March 2004:

SELECT c1, dt FROM #dts WHERE dt >= '20040301' AND dt < '20040401'

Same logic. Here's another way to search for all rows from a specific month:

SELECT c1, dt FROM #dts WHERE DATEPART(year, dt) = 2004 AND DATENAME(month, dt) = 'March'

Above returns the expected rows, but there are two problems. Since there is a calculation on the datetime column, SQL Server cannot use an index on the dt column to support the query. And, the DATENAME function is dependent on SET LANGUAGE.

Getting rid of the time portion

It is very common that you only want to work with date, and you want to get rid of the time portion. As you should know by now it is impossible if you are a using smalldatetime, datetime, datetime2 or datetimeoffset type. If you are using date, then no problem - this is what date is for, after all! But if you insist for some reason to use smalldatetime, datetime, datetime2 or datetimeoffset then perhaps setting the time portion to 00:00:00 is fine for you? Here's my suggestion for an expression that returns a string that can be (implicitly) converted to a datetime value, regardless of SET LANGUAGE or DATEFORMAT:


CONVERT(char(8), CURRENT_TIMESTAMP, 112)

As you see, I use CONVERT with conversion code 112, which return the date in unseparated format, as a string. And you already know that the unseparated format is interpreted regardless of datetime or language settings when converted to datetime. Let us wrap above in a CAST to convert it to datetime:

SELECT CAST(CONVERT(char(8), CURRENT_TIMESTAMP, 112) AS datetime)


In case you truly only want to return the date and no time portion to the client application, then you have to return a string instead (or as of 2008, the date type). Again, read about the different conversion codes for the CONVERT function and you will probably find a format which suits you.


Hugo Kornelis had some feedback on this topic and suggests another technique for setting the time portion to 00:00:00. The idea is to decide on a base date (any date within SQL Server's datetime range will do), specify it so that time portion is 00:00:00. You first take the difference between today and this base date, in days. Then you add this number of days to the base date. Algorithm: [today]-[refdate]+[refdate], or expressed with TSQL (using 20040101 as base date):

SELECT DATEADD(day, DATEDIFF(day, '20040101', CURRENT_TIMESTAMP), '20040101')

An argument for above is that is is faster than doing string manipulations. My opinion has been that the difference is so small (a few microseconds according to tests I did about year 2000) so that this would probably not even be measurable. However SQL Server MVP Dan Guzman emailed me with some to me surprising information. Here's a quote from the email:

"I've found that the DATEADD technique significantly outperforms the string method when the expression is applied to a column in large queries. For example, I often use the DATEADD method to group by date when the datetime column includes a time component. I haven't tested the performance recently but I recall it was minutes in some cases."

Dan is someone I trust very much, so I guess I have some re-learning to do (start using the DATEADD method instead of the string method). Dan also followed up with a script file showing this. Here it is (with some minor formatting modifications made by me). My tests show consistent results on 2005 and 2008 where for 10,000,000 rows the DATEADD version took about 6 seconds and the string version took about 13 seconds. Dan reported even greater difference on 2000.


2009-04-03, I received an email from HÃ¥kan Borneland. He mentioned that is is common, for instance in the MSDN forums, to use first alternative below, instead of second alternative (which I had originally in my performance test script):
SELECT DATEADD(DAY, 0, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP))
SELECT DATEADD(DAY, 0, DATEDIFF(DAY, '', CURRENT_TIMESTAMP))

Note that both these differs from the DATEDIFF version discussed slightly higher above. So now we have 4 different ways to accomplish this. "My" original string manipulation method and three DATEDIFF alternatives:

SELECT CAST(CONVERT(CHAR(8), CURRENT_TIMESTAMP, 112) AS datetime) --1
SELECT DATEADD(DAY, 0, DATEDIFF(DAY, '', CURRENT_TIMESTAMP)) --2
SELECT DATEADD(DAY, DATEDIFF(DAY, '20040101', CURRENT_TIMESTAMP), '20040101') --3
SELECT DATEADD(DAY, 0, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP)) --4

If you feel that you do operations so that performance can be a major concern, then I suggest you test. Here is an updated performance test script, with all four versions. Alternative 1 is slowest and the three DATEDIFF alternatives (2-4) are pretty close with alternative 4 as the winner. I also suggest you consider readability, which can be a matter of oppinion.

The DATEADD alternative also has the advantage of flexibility. Instead of re-typing the suggestions from Hugo, I've just copied the original text from Hugo in the email he sent to me:

"The version that uses string conversion can only be used to strip the time from the date. The dateadd/datediff version can easily be varied to get the start of the current month, quarter or year. SELECT DATEADD(month, DATEDIFF(month, '20000101', CURRENT_TIMESTAMP), '20000101') Or to discard only minutes and seconds, or only seconds SELECT DATEADD(hour, DATEDIFF(hour, '20000101', CURRENT_TIMESTAMP), '20000101') (Note - when dealing with minutes, seconds or milliseconds, the DATEDIFF function CAN overflow the integer range - the date constant might have to be adapted to the expected set of datetime values). Further variations are also possible: How to get the last day of the previous month? And the last day of the current month?

SELECT DATEADD(month, DATEDIFF(month, '20000101', CURRENT_TIMESTAMP), '19991231')
SELECT DATEADD(month, DATEDIFF(month, '20000101', CURRENT_TIMESTAMP), '20000131')

How to get tomorrow's date (without time)?

SELECT DATEADD(day, DATEDIFF(day, '20000101', CURRENT_TIMESTAMP), '20000102')

How to round the datetime to the nearest hour, or to the nearest day?

SELECT DATEADD(hour, DATEDIFF(hour, '20000101', DATEADD(minute, 30, CURRENT_TIMESTAMP)), '20000101')
SELECT DATEADD(day, DATEDIFF(day, '20000101', DATEADD(hour, 12, CURRENT_TIMESTAMP)), '20000101')

How to get last Friday's date, without using a calendar table and regardless of the current DATEFIRST setting?

SELECT DATEADD(day, (DATEDIFF (day, '20000107', CURRENT_TIMESTAMP) / 7) * 7, '20000107')
or
SELECT DATEADD(day, (DATEDIFF (day, '20000108', CURRENT_TIMESTAMP) / 7) * 7, '20000107')

The first will return the current day if run on friday, the latter will return the previous friday.

Once you understand the general principle, it isn't hard to come up with new variations, that might be useful in specific circumstances.

"Tip: Always set the time to the same value

If you aren't interested in the time portion and you can't use the date type, you can set the time portion to a standard value for all rows. I recommend 00:00:00.000. Say you want to set the value to the current date when the INSERT is performed. For that you can use a DEFAULT constraint:

CREATE TABLE #dts(id INT IDENTITY, d_date DATETIME DEFAULT CONVERT(char(8), CURRENT_TIMESTAMP, 112))

INSERT #dts DEFAULT VALUES
SELECT * FROM #dts

I'm using the code 112 in the CONVERT() function above, which returns a string with the date only, in the unseparated format. However, we have not protected us from if anyone now explicitly sets the value and set the time portion to something other than 00:00:00:000. We can do that with a constraint:


ALTER TABLE #dts ADD CONSTRAINT dts_chk_d_date CHECK(CONVERT(char(12), d_date, 114) = '00:00:00:000')

What advantage does above give us? If we now want to search for all rows from a certain date, we can write the WHERE clause like:

WHERE dt = '20040305'

Above might be considered easier to read and understand. Also, say you want to join between two table based on the day only, you can now do this directly. If you have a time portion that varies, you would have to use some CONVERT in the JOIN operation which again can be disastrous for performance.
Above technique can also be applied if you are only interested in the time portion, in that case I recommend that you always set the date portion to 1900-01-01.
Now, isn't it great that we as of 2008 have separate data and time types?

Other tips, from SQL Server MVP Steve Kass

The super-careful can specify all dates in queries as smalldatetime, using cast or convert. But note that this may have unexpected behavior in SQL Server 7.0 against a (not small-) datetime column.

Without a very good reason and comments everywhere, store dates only as datetime or smalldatetime, and not as strings. If dates must be stored as strings (there are some good reasons to, but they are few), use a language neutral format and a 24-hour clock if time is also stored, so that the strings work everywhere and also sort the way the dates sort. For example, you use "dt < '20030101'" in the WHERE clause and the query doesn't return expected rows because dt is stored as a string in yyyy-mm-dd format.

The base date in Enterprise Manager (and some other Microsoft tools and applications) is 1899-12-30. EM should never be used for data entry, and rarely if ever for viewing data. If you use EM to specify time-only data, EM will store the date 1899-12-30. If you really want to use EM, specify the date 1900-01-01 explicitly. Note also that EM will only show the time portion if the date is 1899-12-30, not what you expect if you use 1900-01-01 as a standard value for date.


Why is 1753 the earliest date for datetime?

Good question. It is for historical reasons. In what we sometimes refer to as the western world, we have had two calendars in modern time: the Julian and the Gregorian calendars. These calendars were a number of days apart (depending on which century you look at), so when a culture that used the Julian calendar moved to the Gregorian calendar, they dropped from 10 to 13 days. Great Britain made this shift in 1752 (1752-09-02 were followed by 1752-09-14). An educated guess why Sybase selected 1753 as earliest date is that if you were to store an earlier date than 1753, you would also have to know which country and also handle this 10-13 day jump. So they decided to not allow dates earlier than 1753. Note, however that other countries did the shift later than 1752. Turkey, for instance, did it as late as 1927.

Being Swedish, I find it a bit amusing that Sweden had the weirdest implementation. They decided to skip the leap day over a period of 40 years (from 1700 to 1740), and Sweden would be in sync with the Gregorian calendar after 1740 (but meanwhile not in sync with anyone). However, in 1704 and 1708 the leap day was not skipped for some reason, so in 1712 which was a leap year, they inserted yet an extra day (imagine being born in Feb 30!) and then did the shift over a day like everyone else, in 1753.

No comments:

Post a Comment