Language: Deutsch English















Last Update: 2024 - 05 - 21








Access SQL - Select Top X records - With no ties

by Philipp Stiefel, originally published November 30th, 2015


bird silouettes, article header image

During the last couple of weeks I encountered one and the same question in several forums. “I want to select the top X records ordered by Y. But when there are several records among them with the same value for Y, I get more than the expected number of records. But I want exactly X records, not more.

This article will show you this can be achieved in Microsoft Access using simple SQL queries.

Prerequisites

I will discuss this based on two simple sample tables. It is a table of actors and another table containing rating(s) of the actors by a random authority (me ;-)).

Screenshot of tables and releationsships

As a prerequisite we need to join the actors to their ratings. This is a very simple query; here it is:

SELECT a.ActorId, a.ActorName, ar.Rating, ar.DateOfRating FROM tblActors a INNER JOIN tblActorRatings ar ON a.ActorId = ar.ActorId;

I save this query with the name qryActors_Ratings.

Basic solution

The basic requirement to select the top X best rated actors is very easy to solve. The Access Jet/Ace-Engine (its database engine) supports the TOP predicate in a select statement. So to just select the top X records you write a query like this:

SELECT TOP 5 * FROM qryActors_Ratings ORDER BY Rating DESC

This queries the top 5 best rated actors from that table. So far it is very simple.

Screenshot of basic select top query results

But looking at the result a problem will become obvious. Within this sample data set, there are several actors with a rating of 4. All of them are included in our result. So, as you will probably have noticed right away, the query does not return 5 records but 6.

No ties

To solve this problem, we need sort over an additional field to get a unique ranking with no ties on any rank. The technical requirement is for this fields to be absolutely unique. That requirement is meet by the primary key of the table.

Just doing this would solve the problem, but it might trigger questions from your users. Like: “Why is actor A included and not actor B even though they have the same ranking?” So I would recommend to use a rule that is easily explained, like “newer rankings are weighted higher”.

To make sure there still are no ties, even if there two ratings on the same date, include the primary key as third sort field.

SELECT TOP 5 * FROM qryActors_Ratings ORDER BY Rating DESC, DateOfRating DESC, ActorId DESC

This query produces the desired result.

Screenshot of select top query results - with no ties

Multiple ratings per actor

So this could have been the end of this article. But if you really paid attention to my table design, you might have spotted the possibility that there is more than one review for any one actor. This might lead to the actor being shown twice in the results. – I added rating record, which causes such a problem.

Screenshot of select top query results - with duplicates

If your data structures allow for the same issue to occur, you need to decide on a method to eliminate the duplicates from the final result. Depending on you exact requirements, the correct solution will differ. I see two possibly valid options in this context.

Option 1 – Based on average rating

You modify our query qryActors_Ratings from earlier on to calculate the average rating of an actor. This can easily be done by using the AVG Domain Aggregate Function.

SELECT a.ActorId, a.ActorName, AVG(ar.Rating) AS Rating, CDate(AVG(ar.DateOfRating)) AS DateOfRating FROM tblActors AS a INNER JOIN tblActorRatings AS ar ON a.ActorId = ar.ActorId GROUP BY a.ActorId, a.ActorName;

I used the AVG function on the DateOfRating as well. The resulting average however will be a number, so we need to convert it back to a date using the CDate-Function. (This works, because dates are stored as numbers internally.)

Screenshot of select top query results - by average (AVG)

The query to retrieve the final results (the top 5) will remain unchanged then. – But it can’t include the RatingId in the final result anymore, because each line in the result may be based on multiple ratings.

Option 2 – Based on the last rating

Another valid option to query the top 5 actors, is based on the last (or analogously the first) rating that is stored for each actor. In this query we need to use the MAX Domain Aggregate Function to retrieve the highest rating id from the base table tblActorRatings. – Please be aware that this will not necessarily be the last rating by DateOfRating, but the last one stored in our table. (Because I used the AutoNumber data type for the RatingId.)

SELECT TOP 5 qar.* FROM qryActors_Ratings AS qar WHERE RatingId =(SELECT MAX(ar.RatingId) FROM tblActorRatings ar WHERE ar.ActorId = qar.ActorId) ORDER BY Rating DESC, DateOfRating DESC, ActorId DESC;
Screenshot of select top query results - by last rating

There might be other, more complex requirements in some cases, but I think these two options are at least a good starting point for the most common scenarios.

Share this article: Share on Facebook Tweet Share on LinkedIn Share on XING

Subscribe to my newsletter

*

I will never share your email with anyone. You can unsubscribe any time.
This email list is hosted at Mailchimp in the United States. See our privacy policy for further details.

Benefits of the newsletter subscription



© 1999 - 2024 by Philipp Stiefel - Privacy Policiy