Last Update: 2018 - 02 - 27
Access SQL - Select Top X records - With no ties
by Philipp Stiefel, originally published November 30th, 2015
During the last couple of weeks I encountered one and the same question in several forums. “I want to select the top X records ordered by Y. But when there are several records among them with the same value for Y, I get more than the expected number of records. But I want exactly X records, not more.”
This article will show you this can be achieved in Microsoft Access using simple SQL queries.
I will discuss this based on two simple sample tables. It is a table of actors and another table containing rating(s) of the actors by a random authority (me ;-)).
As a prerequisite we need to join the actors to their ratings. This is a very simple query; here it is:
SELECT a.ActorId, a.ActorName, ar.Rating, ar.DateOfRating FROM tblActors a INNER JOIN tblActorRatings ar ON a.ActorId = ar.ActorId;
I save this query with the name qryActors_Ratings.
The basic requirement to select the top X best rated actors is very easy to solve. The Access Jet/Ace-Engine (its database engine) supports the TOP predicate in a select statement. So to just select the top X records you write a query like this:
SELECT TOP 5 * FROM qryActors_Ratings ORDER BY Rating DESC
This queries the top 5 best rated actors from that table. So far it is very simple.
But looking at the result a problem will become obvious. Within this sample data set, there are several actors with a rating of 4. All of them are included in our result. So, as you will probably have noticed right away, the query does not return 5 records but 6.
To solve this problem, we need sort over an additional field to get a unique ranking with no ties on any rank. The technical requirement is for this fields to be absolutely unique. That requirement is meet by the primary key of the table.
Just doing this would solve the problem, but it might trigger questions from your users. Like: “Why is actor A included and not actor B even though they have the same ranking?” So I would recommend to use a rule that is easily explained, like “newer rankings are weighted higher”.
To make sure there still are no ties, even if there two ratings on the same date, include the primary key as third sort field.
SELECT TOP 5 * FROM qryActors_Ratings ORDER BY Rating DESC, DateOfRating DESC, ActorId DESC
This query produces the desired result.
Multiple ratings per actor
So this could have been the end of this article. But if you really paid attention to my table design, you might have spotted the possibility that there is more than one review for any one actor. This might lead to the actor being shown twice in the results. – I added rating record, which causes such a problem.
If your data structures allow for the same issue to occur, you need to decide on a method to eliminate the duplicates from the final result. Depending on you exact requirements, the correct solution will differ. I see two possibly valid options in this context.
Option 1 – Based on average rating
You modify our query qryActors_Ratings from earlier on to calculate the average rating of an actor. This can easily be done by using the AVG Domain Aggregate Function.
SELECT a.ActorId, a.ActorName, AVG(ar.Rating) AS Rating, CDate(AVG(ar.DateOfRating)) AS DateOfRating FROM tblActors AS a INNER JOIN tblActorRatings AS ar ON a.ActorId = ar.ActorId GROUP BY a.ActorId, a.ActorName;
I used the AVG function on the DateOfRating as well. The resulting average however will be a number, so we need to convert it back to a date using the CDate-Function. (This works, because dates are stored as numbers internally.)
The query to retrieve the final results (the top 5) will remain unchanged then. – But it can’t include the RatingId in the final result anymore, because each line in the result may be based on multiple ratings.
Option 2 – Based on the last rating
Another valid option to query the top 5 actors, is based on the last (or analogously the first) rating that is stored for each actor. In this query we need to use the MAX Domain Aggregate Function to retrieve the highest rating id from the base table tblActorRatings. – Please be aware that this will not necessarily be the last rating by DateOfRating, but the last one stored in our table. (Because I used the AutoNumber data type for the RatingId.)
SELECT TOP 5 qar.* FROM qryActors_Ratings AS qar WHERE RatingId =(SELECT MAX(ar.RatingId) FROM tblActorRatings ar WHERE ar.ActorId = qar.ActorId) ORDER BY Rating DESC, DateOfRating DESC, ActorId DESC;
There might be other, more complex requirements in some cases, but I think these two options are at least a good starting point for the most common scenarios.
© 1999 - 2017 by Philipp Stiefel