# Is there any technical reason why, in programming, the default date format is YYYYMMDD and not something else?

lucaswxp 09/24/2018. 14 answers, 32.993 views

Is there any engineering reason why it is like that? I was wondering, in the case of an RDBMS, whether it had something to do with performance, since a "YEAR" is more specific than a "MONTH": you only have one year 2000, but every year has a "January", which would make it easier/faster to filter/sort something by year first, and that's why the year comes first.

But I don't know if that really makes sense... Is there any reason at all?

Arseni Mourzenko 10/20/2018.

This way, the dates can easily be sorted as strings using the default sorting rules (i.e. lexicographical sorting).

This is also why both month and day are specified using two digits (adding a leading zero if needed).

In fact it is one of the date formats defined by ISO 8601. That standard also defines a date-and-time format, 2015-03-27T15:26:40Z, which is also sortable as strings.

However, YYYYMMDD has an added benefit of making it possible to easily (no substrings or character replacements involved) parse the string as an integer, and still use default ordering on integers.
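Both properties can be seen in a short Python sketch (the sample dates are made up):

```python
dates = ["20180926", "20150327", "20240101"]

# Plain lexicographic string sorting yields chronological order.
assert sorted(dates) == ["20150327", "20180926", "20240101"]

# The same strings parse directly as integers, and the default
# integer ordering matches chronological order too.
nums = sorted(int(d) for d in dates)
assert nums == [20150327, 20180926, 20240101]
```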

MSalters 09/29/2018.

Not mentioned yet, but you quickly gloss over the order inside YYYY. That's already millennia, centuries, decades, years. That is to say, YYYY is already ordered from longest period to shortest period. The same goes for MM and DD, that's just how the number system works.

So to keep the order between fields consistent with the order within fields, the only option is YYYYMMDD.

As zahbaz and Arseni Mourzenko noted, the YYYYMMDD format sorts easily. That is not a lucky coincidence; it's a direct consequence of putting the fields for the longest duration first (and keeping the field lengths fixed; admittedly, this introduces a Y10K problem).

Pod 09/28/2018.

Is there any reason at all?

Yes. Those pieces of software will be using ISO 8601.

ISO 8601 has a number of advantages over other date formats:

• It's a standard with a spec document :)
• It's unambiguous. mm/dd/yyyy and dd/mm/yyyy can be confused with each other unless the day is greater than 12.
• It lexicographically sorts into ascending time order, so no special date-sorting logic is required. This is especially useful in filenames, where lexicographical number sorting is often confusing (e.g. 1_file, 10_file, 2_file).
• It mandates a 4-digit year and zero-padded month and day. This avoids the year 2000 problem and other ambiguities.
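The filename point is easy to demonstrate; a small Python sketch with made-up file names:

```python
# Bare numeric prefixes sort lexicographically, not numerically,
# so the order is surprising ("10_file" sorts before "1_file" here
# because "0" < "_" in ASCII).
names = ["1_file", "10_file", "2_file"]
assert sorted(names) == ["10_file", "1_file", "2_file"]

# ISO-dated prefixes need no special logic: lexicographic order
# is chronological order.
logs = ["2018-10-01_log", "2018-01-15_log", "2018-02-03_log"]
assert sorted(logs) == ["2018-01-15_log", "2018-02-03_log", "2018-10-01_log"]
```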

As for why ISO 8601 exists in the first place, it's because people were finding date-formats ambiguous and confusing when swapping data between countries/systems, and they needed something unambiguous.

For the rationale see the spec's introduction.

Although ISO Recommendations and Standards in this field have been available since 1971, different forms of numeric representation of dates and times have been in common use in different countries. Where such representations are interchanged across national boundaries misinterpretation of the significance of the numerals can occur, resulting in confusion and other consequential errors or losses. The purpose of this International Standard is to eliminate the risk of misinterpretation and to avoid the confusion and its consequences.

...

This International Standard retains the most commonly used expressions for date and time of the day and their representations from the earlier International Standards and provides unique representations for some new expressions used in practice. Its application in information interchange, especially between data processing systems and associated equipment will eliminate errors arising from misinterpretation and the costs these generate. The promotion of this International Standard will not only facilitate interchange across international boundaries, but will also improve the portability of software, and will ease problems of communication within an organization, as well as between organizations.

The standard defines “basic” variations as those minimizing the use of delimiters. So, YYYYMMDD is the basic alternative to the extended format YYYY-MM-DD.

Pieter B 10/04/2018.

It's because all the other ways to do it are ambiguous.

01/02/2003: what does that mean? January 2nd, 2003? Or, in Europe, February 1st, 2003? It gets even worse if you use two digits for the year, as in 01/02/03.

That is why you use YYYYMMDD: it's the convention that enables us to communicate clearly about dates. 20030201 as a date is always clear. (And it makes sorting easier.)

(Now don't go storing that as the integer 20 million 30 thousand 2 hundred and 1. please ok? pretty please?)

zahbaz 09/25/2018.

Let t1 and t2 be distinct integers that represent two times written in YYYYMMDD formatting. Then t1 < t2 implies that t2 occurred after t1.

You lose this ordering with DD and MM first formatting.
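A quick counterexample in Python (dates chosen for illustration):

```python
# Two dates written DDMMYYYY, as integers:
earlier = 31122019  # 31 Dec 2019
later = 1012020     # 01 Jan 2020

# Numeric order contradicts chronological order.
assert later < earlier

# The same two dates written YYYYMMDD compare correctly.
assert 20191231 < 20200101
```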

ISO is, IMO, the only sensible format.

Martin 09/25/2018.

One point not mentioned yet: in interactive inputs, this format makes it possible to validate the input as it is entered.

The system cannot know whether a month has 28, 29, 30 or 31 days without knowing the specific year and month. When the interactive input mandates that the year and month come first, it can check whether the day (entered last) is in the allowed range.

Granted, the question was largely about the date format, but it can be argued that the date format follows the formatting presented to the user.
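Such a check is straightforward once the year and month are known; a minimal Python sketch using the standard calendar module (the function name is illustrative):

```python
import calendar

def valid_day(year: int, month: int, day: int) -> bool:
    # monthrange returns (weekday of the 1st, number of days in the month)
    _, days_in_month = calendar.monthrange(year, month)
    return 1 <= day <= days_in_month

assert valid_day(2016, 2, 29)      # 2016 is a leap year
assert not valid_day(2015, 2, 29)  # 2015 is not
assert not valid_day(2018, 4, 31)  # April has 30 days
```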

Joel Coehoorn 09/25/2018.

YYYYMMDD orders dates the same way you order numbers: most significant portion first. MMDDYYYY would be like writing "one hundred twenty-three" as "twenty and one hundred three".

In our culture, we have a natural understanding of MMDDYYYY because, as humans, we have an awareness of time, and years progress slowly. We generally know what year it is. Seeing the year rarely matters, so we push it to the back. Months change just fast enough to retain their importance. Other cultures handle this differently; much of the world prefers DDMMYYYY.

mckenzm 09/25/2018.

Sorting has been mentioned, but by far the most useful reason for the format is that it lets you compare dates as plain "strings"; a 26-character timestamp is ordered the same way.

Such comparisons are of course essential for sorting, but the format is just as useful when you only need to compare two values.

I have worked on projects where this was not adopted, and yes, programmers tried (with mixed results) to compare the dates as strings.

Pretty formatting is for the client side or typesetting.

WolfgangGroiss 09/25/2018.

This format makes alphabetical order of the strings identical to chronological order of the dates. This is useful because many tools provide alphabetical ordering of e.g. files by name, but no way to parse arbitrarily-formatted dates from file names and sort by those.

mrvinent 09/25/2018.

It's about restrictiveness. Think of YEAR, MONTH and DAY as parameters: in the format YYYYMMDD, each parameter is more restrictive than the one before it.

So if you want to search for something that happened in 1970, you can do it by searching for strings starting with "1970"; if you also remember which month it was, you can append it, as in "197005". This way, every "parameter" of the date gives you more specific information.

It's the only way to go from less specific info ("1970*") to more specific info ("19700523").
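That prefix narrowing can be sketched in Python (the event list is made up):

```python
events = ["19691231", "19700510", "19700523", "19700601"]

# Each added prefix digit narrows the match further.
in_1970 = [e for e in events if e.startswith("1970")]
in_may_1970 = [e for e in events if e.startswith("197005")]

assert in_1970 == ["19700510", "19700523", "19700601"]
assert in_may_1970 == ["19700510", "19700523"]
```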

Rob 09/26/2018.

Why, in programming, the default date format is YYYYMMDD ...

It's a human-readable format for input and output; it's not necessarily stored that way.

Over a third of all programming languages were developed in a country with English as the primary language and most of the modern ones adhere to a Standard of some description - the international Standard for dates is ISO 8601.

As time moves forward, days increment first, then months, and lastly years. It might be easier to understand if we had decimal dates (and decimal time): as time passes, the number gets bigger. It's simply easier for humans to look at the number and compare it to another date at a glance.

The computer doesn't care what structure you want to use and in most (but not all) computers binary logic is used - base e actually has the lowest radix economy but isn't the most efficient nor easiest for a complete sequence.

The actual input and output format for dates varies by country and is set by localization. While YYYYMMDD may seem to make the most sense and be what you are used to, it isn't universal today, nor was it for the longest time in the past; even today, Roman numerals are sometimes used for dates.

Knowing the year upfront tells you the number of days in the year, the biggest variation in duration a year can undergo. It also tells you upfront the number of days in each month that follows (useful for error checking during entry); permitting input of the day first might force the system to back you up if the subsequently entered year and month did not agree with that day, possibly making accessible input more difficult. It also matters for the calendar format. See also the geek calendar, with its decimal stardates.

As far as the computer is concerned it's likely to use UNIX Epoch time, the number of seconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970, where every day is treated as if it contains exactly 86400 seconds. See also the Julian day. The YYYYMMDD format is simply preferred by egocentric humans, the IAU regards a year as a Julian year of 365.25 days (31.5576 million seconds) unless otherwise specified.

Ben 09/26/2018.

Another use I've seen for this representation is that you can store dates as integers (i.e. in a database), using only 4 bytes per date. Using YYYYMMDD then means that integer comparisons (often a single machine instruction) have the same result as comparisons on the date represented. And it prints moderately human-readably. And none of this requires any code or special support at all, in any mainstream programming environment.

If those things are most of what you need to do with dates, and you need to do a lot of it, then this format has a lot of appeal.

By comparison, dates in common formats like DD/MM/YYYY take 10 bytes as strings of ASCII characters. YYYYMMDD strings reduce that to 8 and gain the "comparing the representations has the same result as comparing the dates" advantage, but even then string-based comparison is character-by-character rather than a single integer comparison.
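A sketch of that packing in Python (the encode helper is illustrative, not a standard API):

```python
import struct

def encode(year: int, month: int, day: int) -> int:
    # 20180926-style integer: fits in a signed 32-bit value
    # for any 4-digit year.
    return year * 10000 + month * 100 + day

d1 = encode(2018, 9, 26)
d2 = encode(2018, 12, 25)

# Four bytes per date (struct.pack would raise if it overflowed).
assert len(struct.pack("<i", d1)) == 4

# A single integer comparison matches date comparison.
assert d1 < d2
assert d1 == 20180926
```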

Goyo 09/28/2018.

Same reason the Moon is made of green cheese: it is not. In most cases the default format is some kind of localized string. Sometimes the ISO format is used, but usually with dashes for better readability. YYYYMMDD (or %Y%m%d in strftime parlance) is seldom the default. To be fair, I am sure I have seen it, but I cannot think of an example right now.

# Unix date (GNU core utilities)

date

Output:

Wed Sep 26 22:20:57 CEST 2018

# Python

import time
print(time.ctime())

Output:

Wed Sep 26 22:27:20 2018

# C

#include <stdio.h>
#include <time.h>

int main(void) {
    time_t curtime;

    time(&curtime);
    /* Pass the string as an argument, not as the format string. */
    printf("%s", ctime(&curtime));
    return 0;
}

Output:

Wed Sep 26 22:40:01 2018

# C++

#include <ctime>
#include <iostream>

int main()
{
    std::time_t result = std::time(nullptr);
    std::cout << std::ctime(&result);
}

Output:

Wed Sep 26 22:51:22 2018

# Javascript

const current_date = new Date();
current_date;

Output:

Wed Sep 26 2018 23:15:22 GMT+0200 (CEST)

# SQLite

SELECT date('now');

Output:

2018-09-26

# Python + numpy

import numpy as np
np.datetime64('now')

Output:

numpy.datetime64('2018-09-26T21:31:55')

# Python + pandas

import pandas as pd
pd.Timestamp('now', unit='s')

Output:

Timestamp('2018-09-26 21:47:01.277114153')

# apport.log

ERROR: apport (pid 9742) Fri Sep 28 17:39:44 2018: called for pid 1534, signal 6, core limit 0, dump mode 2

# alternatives.log

update-alternatives 2018-05-08 15:14:24: run with --quiet --install /usr/bin/awk awk /usr/bin/mawk 5 --slave /usr/share/man/man1/awk.1.gz awk.1.gz /usr/share/man/man1/mawk.1.gz --slave /usr/bin/nawk nawk /usr/bin/mawk --slave /usr/share/man/man1/nawk.1.gz nawk.1.gz /usr/share/man/man1/mawk.1.gz

# cups/access.log

localhost - - [28/Sep/2018:16:41:58 +0200] "POST / HTTP/1.1" 200 360 Create-Printer-Subscriptions successful-ok

# syslog

Sep 28 16:41:46 pop-os rsyslogd:  [origin software="rsyslogd" swVersion="8.32.0" x-pid="946" x-info="http://www.rsyslog.com"] rsyslogd was HUPed

Caius Jard 10/01/2018.

An additional benefit not mentioned so far is that desirable quantization (mapping a precise value to the broader range of values it belongs to) is a relatively easy and fast single operation.

Suppose you're writing a report that summarises today's events, like the sum and number of sales. If the sale date and time is stored as YYYYMMDDHHMISS, you simply keep the leftmost 8 characters (if it's a string) or integer-divide (i.e. floor) by 1,000,000 to reduce your datetime to the day of the sale.

Similarly, if you want the month's sales, you keep only the leftmost 6 digits, or divide by 100,000,000.
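Those truncations in Python, assuming the YYYYMMDDHHMISS integer form described above:

```python
ts = 20181225123456  # 2018-12-25 12:34:56 as YYYYMMDDHHMISS

day = ts // 1_000_000      # drop HHMISS -> YYYYMMDD
month = ts // 100_000_000  # drop DDHHMISS -> YYYYMM

assert day == 20181225
assert month == 201812
```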

Sure, you could argue that any string manipulation is possible: a sales datetime of "12-25-2018 12:34pm" could be substringed and manipulated multiple times to get the month and the year, and in numeric form 122520181234 could be divided, modded, multiplied and divided some more to eventually produce a month and a year too. But the code would be really difficult to write, read, maintain and understand.

And even sophisticated database optimizers might not be able to use an index on a column in a WHERE clause if a date in MM/DD/YYYY form has to be cut up and pieced back together. In comparison, storing a YYYYMMDD representation and wanting December 2018 leads to WHERE clauses of the ilk dateasstring LIKE '201812%' or dateasint BETWEEN 20181200 AND 20181299, something an index can easily be used for.

Thus, if there isn't a dedicated datatype for dates and a string/numeric representation is the only choice, using and storing times in some longest-interval-on-the-left-to-shortest-interval-on-the-right representation has quite a few benefits for ease of understanding, manipulation, storage, retrieval and code maintenance.