This content originally appeared on DEV Community and was authored by Franck Pachot
In IT, like in math, a negative test can prove that an assertion is wrong. But a positive test doesn't prove that the assertion is right. It is worse in IT because the algorithm may show an apparently correct result with a probablilty that is higher than just being lucky:
postgres=# create table demo (n int primary key);
CREATE TABLE
postgres=# insert into demo
select n from generate_series(1,1e6) n;
INSERT 0 1000000
postgres=# select n from demo limit 10;
n
----
1
2
3
4
5
6
7
8
9
10
(10 rows)
With this example, you may think that a select returns the rows ordered. This is wrong. The SQL language is declarative. Without an ORDER BY, the result is random. The apparently sorted result here is just a side effect of inserting into heap tables by appending at the end of the file. And reading the file from beginning to end with one thread. The rows are displayed as they are fetched.
When I insert the rows in another order:
postgres=# drop table if exists demo;
DROP TABLE
postgres=# create table demo (n int primary key);
CREATE TABLE
postgres=# insert into demo select 1e6-n from generate_series(1,1e6) n;
INSERT 0 1000000
postgres=# select n from demo limit 10;
n
--------
999999
999998
999997
999996
999995
999994
999993
999992
999991
999990
(10 rows)
they come as they were stored. This is typical of a serial SeqScan:
postgres=# explain select n from demo limit 10;
QUERY PLAN
-------------------------------------------------------------
Limit (cost=0.00..0.14 rows=10 width=4)
-> Seq Scan on demo (cost=0.00..14425.00 rows=1000000 width=4)
(2 rows)
You cannot rely on any of this behavior. With another execution plan, this order may change:
postgres=# set enable_seqscan=false;
SET
postgres=# select n from demo limit 10;
n
--------
0
1
2
3
4
5
6
7
8
9
(10 rows)
postgres=# explain select n from demo limit 10;
QUERY PLAN
-------------------------------------------------------------
Limit (cost=0.42..0.77 rows=10 width=4)
-> Index Only Scan using demo_pkey on demo (cost=0.42..34712.43 rows=1000000 width=4)
(2 rows)
Many things can change the behavior. Here an index scan instead of a full table scan. Parallel query will also change the way they are read by SeqScan. And updates can also move them physically:
postgres=# set enable_seqscan=true;
SET
postgres=# alter table demo add column x char;
ALTER TABLE
postgres=# update demo set x=1 where mod(n,3)=0;
UPDATE 333334
postgres=# select n from demo limit 10;
n
-------------
999998
999997
999995
999994
999992
999991
999989
999988
999986
999985
(10 rows)
The only way to get a reliable sorted result is with an ORDER BY:
postgres=# select n from demo order by n limit 10;
n
--------
0
1
2
3
4
5
6
7
8
9
(10 rows)
Don't think an ORDER BY will have to sort the rows, the query planner may opt for a physical structure that returns the them in order:
postgres=# explain select n from demo order by n limit 10;
QUERY PLAN
------------------------------------------------------------------------------------------------
Limit (cost=0.42..0.77 rows=10 width=4)
-> Index Only Scan using demo_pkey on demo (cost=0.42..34716.43 rows=1000000 width=4)
(2 rows)
In SQL you declare the order you want, and the query planner knows which access methods retrieves them in order. The optimizer knows but you don't know. Except of course if you have read, and understood, the source code for the exact version you run.
YugabyteDB
In a distributed SQL database, there are good chances that the default order does not match anything you expect:
yugabyte=# create table demo (n int primary key);
CREATE TABLE
yugabyte=# insert into demo select 1e6-n from generate_series(1,1e6) n;
INSERT 0 1000000
yugabyte=# select n from demo limit 10;
n
-------------
110359
192735
219128
237047
310517
593962
627995
651891
669921
790562
(10 rows)
This looks random. But of course, there's a logic. The default sharding method is a hash function on the first column of the primary key. We can query this hashing function ourselves with yb_hash_code()
yugabyte=# select min(h),max(h),avg(h) from (
select yb_hash_code( generate_series(1,1e6) ) h
) v;
min | max | avg
----------+-------+--------------------
0 | 65535 | 32774.509179000000
(1 row)
This function can return one of the 65536 values from 0 to 65535. This is what is used to distribute rows into tablets in the distributed storage (DocDB).
Without an ORDER BY, the rows are returned ordered on this hash code first:
yugabyte=# select yb_hash_code(n), n from demo limit 10;
yb_hash_code | n
-------------------+--------
0 | 110359
0 | 192735
0 | 219128
0 | 237047
0 | 310517
0 | 593962
0 | 627995
0 | 651891
0 | 669921
0 | 790562
(10 rows)
Look, If I query all rows with this speific hash code, they come back in order.
yugabyte=# select n from demo where yb_hash_code(n)=0;
n
-------------
110359
192735
219128
237047
310517
593962
627995
651891
669921
790562
792363
819768
891493
984191
(14 rows)
When understanding the way the rows are stored physically, and retrieved with a SeqScan, the order is random but not magic:
yugabyte=# select n, yb_hash_code(n) from demo limit 50;
n | yb_hash_code
-------------+--------------
110359 | 0
192735 | 0
219128 | 0
237047 | 0
310517 | 0
593962 | 0
627995 | 0
651891 | 0
669921 | 0
790562 | 0
792363 | 0
819768 | 0
891493 | 0
984191 | 0
17012 | 1
24685 | 1
153595 | 1
186378 | 1
219742 | 1
258869 | 1
271029 | 1
547922 | 1
565568 | 1
763430 | 1
766123 | 1
772002 | 1
781840 | 1
840822 | 1
844655 | 1
953917 | 1
162485 | 2
168413 | 2
271551 | 2
285516 | 2
407063 | 2
420509 | 2
440160 | 2
572540 | 2
585722 | 2
589471 | 2
628271 | 2
719191 | 2
837125 | 2
866379 | 2
951013 | 2
976519 | 2
994652 | 2
854 | 3
57757 | 3
70079 | 3
(50 rows)
Internally, the YugabyteDB table rows are stored ordered on the document key, which is the primary key prefixed by the hash code. This brings the fast access to a key point or range: from the hash code, it determines the right tablet. And then it finds the row in the SSTable (Sorted Sequence Table) structure.
If I decide to shard on a range, even a SeqScan would return in order:
yugabyte=# drop table demo;
DROP TABLE
yugabyte=# create table demo (n int, primary key(n asc))
split at values ( (333333),(666666) );
CREATE TABLE
yugabyte=# insert into demo select 1e6-n from generate_series(1,1e6) n;
INSERT 0 1000000
yugabyte=# select n from demo limit 10;
n
--------
0
1
2
3
4
5
6
7
8
9
(10 rows)
yugabyte=# explain analyze select n from demo limit 10;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1.00 rows=10 width=4) (actual time=0.788..0.796 rows=10 loops=1)
-> Seq Scan on demo (cost=0.00..100.00 rows=1000 width=4) (actual time=0.787..0.791 rows=10 loops=1)
Planning Time: 0.047 ms
Execution Time: 0.855 ms
(4 rows)
The Seq Scan on is then similar to the Index Scan on the primary key:
yugabyte=# explain analyze select n from demo order by n limit 10;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1.14 rows=10 width=4) (actual time=0.809..0.816 rows=10 loops=1)
-> Index Scan using demo_pkey on demo (cost=0.00..114.00 rows=1000 width=4) (actual time=0.808..0.813 rows=10 loops=1)
Planning Time: 0.066 ms
Execution Time: 0.843 ms
(4 rows)
This is because the table is actually stored in the primary key structure (Similar to Oracle Index Organized Table, or SQL Server Clustered Index, or MySQL InnoDB tables...)
I mentioned that the hash code has 65536:
yugabyte=# drop table demo;
DROP TABLE
yugabyte=# create table demo (n int primary key)
split into 16 tablets;
CREATE TABLE
yugabyte=# insert into demo
select n from generate_series(1,1e6) n;
INSERT 0 1000000
yugabyte=# select count(distinct yb_hash_code(n)) from demo;
count
------------
65536
(1 row)
But of course the number of tablets is lower. Just a few per nodes to allow adding nodes. When I want to know the number of tablets
If by curiosity you want to know how many tablets, I run an aggregate function that is pushed-down to each tablet so that the number of rows in the execution plan is the number of aggregations:
yugabyte=# explain analyze select count(*) from demo;
yugabyte=# explain analyze select count(*) from demo;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Aggregate (cost=102.50..102.51 rows=1 width=8) (actual time=984.031..984.031 rows=1 loops=1)
-> Seq Scan on demo (cost=0.00..100.00 rows=1000 width=0) (actual time=357.419..984.010 rows=16 loops=1)
Planning Time: 0.055 ms
Execution Time: 984.145 ms
(4 rows)
The rows=16
is from the 16 count(*)
results coming from each tablets. I have 16 tablets here, with 65536/16=4096 hash codes per tablet.
It is interesting to understand how the rows are stored and fetched, because you can think about performance. But when it comes to run application queries, don't forget that SQL is a declarative language. If you want a sorted result, declare it with ORDER BY. And the query planner will figure out what to do.
This content originally appeared on DEV Community and was authored by Franck Pachot
Franck Pachot | Sciencx (2021-11-15T18:24:55+00:00) ORDER BY is mandatory in SQL to get a sorted result. Retrieved from https://www.scien.cx/2021/11/15/order-by-is-mandatory-in-sql-to-get-a-sorted-result/
Please log in to upload a file.
There are no updates yet.
Click the Upload button above to add an update.