Last Update: 2024 - 05 - 21 |
DFirst/DLast and the Myth of the Sorted Result Setby Philipp Stiefel, originally published June 2nd, 2018 last updated July 7th, 2018 Photo by Alex Holyoake on Unsplash There are two closely related questions I encounter frequently in various database related forums. “Why is the result of my query not sorted the way I expected it to be?” …and… “Why does the (D)First (or (D)Last) function not show the value I expected it to show?” The first question is crucial to understand the core issues here, so let’s focus on that one first. The Order of Query ResultsWhy are your results not sorted in the order you expected? Let go of your concept of sequential records. Relational databases work with sets of data. They do not store or process records in any particular order. - Of course they do, but for your own safety, just assume they don’t! - The database engine will return the records in your result set in any order it sees fit to produce the results most efficiently. You must never expect any particular record order! If you want your result to be sorted in any particular way, you have to explicitly instruct the database engine to sort the records in that order. This can be done with the ORDER BY clause in your SQL statement. The syntax is pretty simple. At the end of your SQL you add ORDER BY followed by the comma separated fields you want your results to be sorted on. By default each field will be sorted ASCending. You can add the DESCending keyword to each field sort them that way.
SELECT ID,
LastName,
FirstName,
DateOfBirth
FROM tblPerson
ORDER BY LastName ASC,
DateOfBirth DESC
;
And this should be the end of the discussion. Period! Unfortunately, at this point quite often someone else chimes in with a witty remark. “The records are physically stored in the order of their primary key and query results will be automatically returned in that order. If you want your records sorted by a strictly increasing auto number primary key, you can rely on that and do not need to add the ORDER BY.” This is wrong on multiple levels! In many database systems, like Microsoft SQL-Server or Oracle, the records are stored in the order of the primary key only when using the default settings when creating the table. You can apply settings and options upon creation of the table that will cause the physical order of records to be different. With Microsoft Access, the records in a table are actually stored in the order of the primary key. - But only after the database has been reorganized by Compact&Repair. Before that, new records will just be added at the end of the storage area(s) of the table. - For an incrementing auto number PK that does not make a difference, but it might do for other types of primary keys. But even in Microsoft Access you cannot rely on the results of a query being returned on order of the primary key. So, my reply to the above remark in the past has been along the following lines. You should not and cannot rely on (undocumented) database internals for the results of your query. They may work differently than you think they are or they may be changed in a future release without any notice. If you depend on a particular order of records in the result, you have to explicitly specify this in the ORDER BY clause. While I just knew from experience that this was correct, I had no sample code to reproducible prove that remark wrong. - Now, thanks to Matthew McGiffen’s post The Importance of ORDER BY, I came up with a scenario to prove that you cannot rely on records being returned in the order of the primary key. Assume we got a table tblPerson with the columns Id (Primary Key), Lastname, Firstname and DateOfBirth. Here is the sample data from that table. Whenever we select records from this table without specifying an explicit ORDER BY, they will be returned in the order of ID, the primary key of the table. This is still true when filtering the data, e.g. to see only the records of persons born on or after 1/1/2000.
SELECT ID, LastName, FirstName, DateOfBirth
FROM tblPerson
WHERE DateOfBirth >=#1/1/2000#;
Note, there is no ORDER BY in the query. So far this rather seems to prove the assumption that records will be automatically returned in the order of the primary key. However, if you run this type of query frequently and there is a huge number of records in your table, it is a very sensible choice to put an index on the DateOfBirth column. This can be done quickly setting the Indexed property of the field to Yes (Duplicates OK). A very small, impeccable change, isn’t it? It should speed up executions, but it does not affect the results of our query. - Or does it? Let’s run the query again with the index and look at the results. What happened here? The results are not ordered by the primary key anymore! They are ordered by the DateOfBirth now. Why is that? The database engine recognized that using the new index on DateOfBirth will make executing the query much more efficient. It reads the DoB values from the sorted index to filter the data, then it fetches each record belonging to the filtered index entries. The results are now ordered by DateOfBirth because it would have meant additional effort to sort by the primary key again. In general, database engines will never waste resources on anything that is not strictly required to process the query at hand. So, even when you want records in a Microsoft Access query to be returned in order of the primary key, you should always include an explicit ORDER BY clause in your query. DFirst/DLast and what is wrong with themThe function names of the First and Last aggregate functions and their VBA counterparts, the DFirst and DLast domain functions, suggest you get the first/last record from a given data set. Up until Access 2003 the official documentation to this functions used to say: “The Microsoft Access DFirst function returns the first value from an Access table (or domain).” Most people (wrongly) interpreted this to be the first/last record entered chronologically. However, it should be obvious that this statement is lacking any information in which order the retrieved record is the first or last. The current documentation is much clearer now. The Access Developer Reference for the First and Last Functions remarks: “They simply return the value of a specified field in the first or last record, respectively, of the result set returned by a query. Because records are usually returned in no particular order (unless the query includes an ORDER BY clause), the records returned by these functions will be arbitrary.” Correct, but please keep in mind, these functions are aggregate functions. Any ORDER BY will usually be applied to the final result of the aggregation, but not to the lower level of the query from where the first/last value is retrieved. So, you should always assume the results to be arbitrary, unless you explicitly ordered the input data to First/Last in a subquery. As it is not even possible to apply an ORDER BY to domain functions, the documentation of DFirst is taking it one step further. “You can use the DFirst function to return a random record from a particular field in a table or query when you simply need any value from that field.” Now, you should not misunderstand this in the way that there is a true random generator behind determining the return value of DFirst/DLast. Quite to the contrary, it is the first (or last) value that is easiest to retrieve for the ACE-database-engine that will be returned. This is usually always the same value until the circumstances change, e.g. new records are added, the database is compacted, or a new index is created. You can never rely on DFirst/DLast returning a determined result, even if you get the expected result in many test runs. Better Alternatives to DFirst/DLast: Min/Max respecticvely DMin/DMaxSo far for the problems with DFirst/DLast. - But what is a better approach to retrieve the first or last value in a certain context? Die aggregate functions Min and Max retrieve reliably and always reproducibly the smallest or biggest value from a certain database column. If you want to just display the smallest or largest value from a field, it is trivial to get that value with DMin or DMax. It gets more difficult if do not just want the value from the column whose order is clearly defined with Min/Max, but a value from a different field from that record. An example; you want to retrieve the order value of the last order of a customer. The sortable column here would be the order date, but you do not care for the date itself, but for the total value of that order. In those cases, you would need to use a two-tiered approach. There are two options for that.
I will write about this in more detail in another article soon. Conclusions
I will never share your email with anyone. You can unsubscribe any time. © 1999 - 2024 by Philipp Stiefel - Privacy Policiy |