Monday, August 30, 2021

Default sort order in SQL Server ?

 

No Seatbelt - Expecting Order without ORDER BY


Is there a default SELECT order? -  if there is no "order by", the database is free to return the rows in any order it thinks is most efficient.

One of the greatest lessons I've learned in building software is that every technical implementation has a human element to how it is used, perceived, and interpreted.  While I've seen my share of poor technical solutions to problems, sometimes the most interesting "failure" point is the human operator and how he or she uses/perceives/interprets a software feature.  Some of them are easy to understand  Others are harder for me to grok.

Occasionally I get asked about ordering guarantees in SQL Server.  In some cases, SQL Server had to break previously undocumented but somewhat consistent behavior (TOP 100 Percent... ORDER BY, a blog here).  In other cases, I can find it difficult to reason through the logic ;).  Today's question has to do with not specifying an ORDER BY but still getting rows back in sorted order.  In their zeal to keep me employed with interesting corner cases in the SQL language, some have concluded:

"I don't have to specify the ORDER BY because I *know* that it will return rows in sorted order.  It always does! every time! I promise!"

So, we're talking about the top-level ORDER BY clause in a query that defines the "presentation order" in ANSI SQL.  The common reason for this conclusion is that, during development, the database developer found that a query plan always came back with the same plan.  So, if I get out the "Jump to Conclusions" game and start playing, I'll decide that everyone will get the same plan from now until the end of the universe.  The primary reason for this is that the query probably scanned an index that happens to be in the order that you desired:

Query Plan:

Will that query plan change?  Well, if keep the same machine configuration (CPUs, Memory, etc) and don't ever insert data into the table, perhaps you can get lucky and keep the same plan... perhaps.  Of course, when you upgrade to the next release of SQL Server, perhaps the server has changed, new optimizations have been added, or someone adds a column to the clustered index that makes it slower than before.  All of these could change the plan, even if you didn't change a thing. 

Amazingly, if I happen to just add an "ORDER BY col1" into this query, I'll note that the query plan stays the same.  So, no runtime cost to pay to get that guarantee - perhaps just a few more characters to type.

Sorted Order:

So the "human" part of these kinds of problems arise when I get questions like "Where does it SAY that the system will not return the rows in order?"  It's difficult to undo the conclusion in your head when many experiments have shown, for a certain set of conditions, that the query will return rows in sorted order in "all" cases.  So, I'll try to help resolve a few of these discussions today by giving an example when SQL Server will indeed stop returning rows in sorted order.

I'm running this on a Developer Edition build of SQL Server 2008 on my personal machine, a dual-core machine with 8 GB of RAM, For this test, you'll need a machine that has more than one CPU available for SQL Server to use.

CREATE TABLE [dbo].[orderingtest1](
    [col1] [int] NULL,
    [col2] [int] NULL,
    [col3] [int] NULL,
    [col4] [int] NULL,
    [col5] [binary](4000) NULL
) ON [PRIMARY];
create clustered index i1 on orderingtest1(col1)

DECLARE @i INT=0;
set nocount on
begin transaction
WHILE @i<20000
BEGIN
INSERT INTO [orderingtest1](col1, col2, col3, col4) VALUES (rand()*1000, rand()*1000, rand()*1000, rand()*1000);
SET @i+=1;
END;
commit transaction

I've created a table with 20,000 rows in it.  Perhaps this is your inventory table for the database backing your web site.

So let's go run an query that yields a plan that happens to give me results in a specific order

select * from orderingtest1 where col2 = 5 

OutPut

Hey, that's ordered on col1!  Let's add some more rows just to be sure.

DECLARE @i INT=0;

set nocount on
begin transaction
WHILE @i<20000
BEGIN
INSERT INTO [orderingtest1](col1, col2, col3, col4) VALUES (rand()*1000, rand()*1000, rand()*1000, rand()*1000);
SET @i+=1;
END;
commit transaction

Another 20,000 rows, and sure enough my query is fine.. There's only the one index anyways, so what could happen?  Let's ship this guy.

Then we get a new set of inventory and add yet another 20,000 rows into our table and run our query again:

Parallel:

Whoa! we just got a fancy parallel query plan.  It uses multiple threads to scan the index and apply the filter condition, only sending up rows to the main thread if they qualify.  This can be faster on larger data sets.

(You may need to run DBCC FREEPROCCACHE if you aren't getting this plan - since the data is random, you may also need to try another value for col2.  Assuming you have enough CPUs and are running a SKU that can use them in SQL, you should get this plan eventually, just keep adding rows ;)

Un Ordered:

Oh my - it's not ordered!

The hard part here is that there is no reasonable way for any external user to know when a plan will change .  The space of all plans is huge and hurts your head to ponder.  SQL Server's optimiser will change plans, even for simple queries, if enough of the parameters change.  You may get lucky and not have a plan change, or you can just not think about this problem and add an ORDER BY.  Interestingly enough, once you have enough rows *and* add an ORDER BY, SQL Server still generates a plan that looks like this:

Parallel Ordered:

However, if you run the query, you'll notice that the rows DO come back in order... The Exchange operator can run in a mode where it preserves order, but it only does so if you ask it through an ORDER BY clause:

Properties:

So, adding an ORDER BY gives things back in order.  (I have visions of the two guys in the pub mentioning this and one guy saying "A way to order the rows that come back from a query? Brilliant!")

The example I've done is fairly simple - one table with a clustered index using a simple query with a predicate on only one column.  Even if there were only 2 possible plans considered for this query, one of them doesn't guarantee order unless you ask for it.

There are lots of situations where plans can change in the Optimizer - for more complex queries, there can be thousands of plan choices or more, and each of them has a case when it would likely be picked.  For each of those plans, the sort for that plan can be different if you don't specify it.  So, I hope this gives you something to go try - I find that examples help me learn, so perhaps it can help you as well.

So, my advice for the day:

SQL Server only guarantees that results are ordered per the columns you specify in an ORDER BY clause

There is no “default” ordering that a query will fall back on outside of an ORDER BY clause.

Results may come back in the order of the clustered index, or they may not Even if results come back in the order of the clustered index on one run of the query, they may not come back in the same order if you run it again If you need results to come back in a specific order, you must be explicit about it in the ORDER BY clause of the query.


Collation A collation specifies the bit patterns that represent each character in a data set. Collations also determine the rules that sort and compare data. SQL Server supports storing objects that have different collations in a single database. For non-Unicode columns, the collation setting specifies the code page for the data and which characters can be represented. Data that is moved between non-Unicode columns must be converted from the source code page to the destination code page.


SQL Server doesn’t guarantee the order of the result-set. It’s independent of the order unless we define it explicitly on how we want those records sorted. The only way to change the order in which the results appear is to use the SQL Order by clause.


it will be good practice to explicitly mentioning of "order by clause" with the select statements to the fetch the data in a particular order if your resultant logic is based on the order of the data fetched.


If you need order in your query results, put in an ORDER BY. It's that simple. Anything else is like riding in a car without a seatbelt.


Happy Querying!!

No comments:

Popular Posts