SQL Aggregation: Having vs Where Clause

  • Post last modified:June 16, 2023
  • Reading time:4 mins read

Discussing the difference between where and having a clause with aggregate function

Introduction

  • SQL provides different types of clauses such as where, having, like, etc. These clauses help us to filter data, perform aggregation and analyze the data.
  • In this article, we will discuss where & having clauses along with aggregation function. We will also cover the difference between them.

Table

  • I created a sample table called book_reviews for this tutorial which looks like below

Table description

  • It contains basic columns for book reviews such as user_id, isbn, and rating.
postgres=# \d book_reviews
                                   Table "public.book_reviews"
   Column    |  Type   | Collation | Nullable |                     Default
-------------+---------+-----------+----------+--------------------------------------------------
 reviews_id  | integer |           | not null | nextval('book_reviews_reviews_id_seq'::regclass)
 user_id     | text    |           |          |
 isbn        | text    |           |          |
 book_rating | text    |           |          |
Indexes:
    "book_reviews_pkey" PRIMARY KEY, btree (reviews_id)

Loading data

COPY book_reviews(user_id, isbn, book_rating) FROM '#{FILE_PATH}' delimiter ';' csv header encoding 'UTF-8';

Records Count

postgres=# select count(*) from book_reviews;
 count
--------
 493813
(1 row)

Having Clause

  • Having a clause in SQL is used to filter the results of the query based on aggregate function results.
  • For example, if we would like to filter user_id who have rated books more than 10 times, we can include the count in the HAVING clause like below.
postgres=# SELECT user_id FROM book_reviews GROUP BY user_id HAVING COUNT(book_rating) > 10;
 user_id
---------
 256407
 241666
 209563
 239467
 270838
 193768
 252785
 181223
 177590
....
  • We can further expand our query and instead of returning users, we can count those users by referring above query as a base/subquery query.
postgres=# select count(*) from ( select user_id from book_reviews GROUP BY user_id HAVING count(book_rating) > 10 )x ;
 count
-------
  5120
(1 row)

Where Clause

  • Where clause cannot be used with aggregation since its execution order is before group by and aggregation function.
  • It’s mainly used to filter out each row for select operation.
postgres=# SELECT user_id, COUNT(book_rating) FROM book_reviews GROUP BY user_id ORDER BY 2 DESC;
 user_id | count
---------+-------
 198711  |  7550
 212898  |  4785
 278418  |  4533
 235105  |  3067
 230522  |  2991
 234623  |  2674
 245963  |  2507
 204864  |  2504
 185233  |  2448
 171118  |  2421
 232131  |  2347
 227447  |  2340
...........
  • But we can use the where clause on top of the group by and aggregation query by treating it as a subquery.
postgres=# SELECT user_id FROM ( SELECT user_id, COUNT(book_rating) total_rating_count FROM book_reviews GROUP BYuser_id ORDER BY 2 DESC) x WHERE x.total_rating_count > 10;
 user_id
---------
 198711
 212898
 278418
 235105
 230522
 234623
 245963
 204864
 185233
 171118
........
  • We can confirm the total user count who rated more than 10 times with one we calculated using HAVING.
postgres=# SELECT COUNT(user_id) FROM ( SELECT user_id, COUNT(book_rating) total_rating_count FROM book_reviews GROUP BY user_id ORDER By 2 DESC ) x WHERE x.total_rating_count > 10;
 count
-------
  5120
(1 row)
  • Another example would be to use aggregation in the filter condition of the WHERE Clause. Filter condition is a subquery that uses average aggregation.
postgres=# SELECT user_id, book_rating FROM book_reviews WHERE book_rating::NUMERIC > ( SELECT AVG(book_rating::NUMERIC) FROM book_reviews );
 
user_id | book_rating
---------+-------------
 276726  | 5
 276729  | 3
 276729  | 6
 276736  | 8
 276737  | 6
 276744  | 7
 276745  | 10
 276747  | 9
 276747  | 9
 276747  | 8
 276747  | 7
 276747  | 7
 276748  | 6
 276751  | 8
 276754  | 8
 276755  | 5

What’s the difference?

  • The fundamental difference between Having and Where is Where selects input rows before group by and aggregates are computed, Whereas Having selects group rows after groups and aggregate are computed.
  • Since Where is executed before aggregates, it makes sense not to allow aggregates with the where clause. Where takes participation in selecting the rows that get added to groups and aggregates.
  • And Having’s execution happens after the group and aggregates so it makes sense to use it to filter upon the computed aggregated value.

Conclusion

  • In this article, we discuss the difference between a where clauses and having clauses along with their execution order.
  • Having a clause can make the query look shorter and apply the filter on aggregated columns, while the Where clause filters the rows for the selection, but we can always use a subquery approach to filter over aggregation

Leave a Reply