Understanding Correlated and Uncorrelated Sub-queries in SQL

Author Bio:
Ben Richardson is the owner of Acuity Training, an IT training company based in the UK. Acuity is one of the UK’s leading SQL training companies. For more details click here.

Sub-queries are queries within another query. The result of the inner sub-query is fed to the outer query, which uses that to produce its outcome. If that outer query is itself the inner query to a further query then the query will continue until the final outer query completes.

There are two types of sub-queries in SQL however, correlated sub-queries and uncorrelated sub-queries. Let’s take a look at these.

Uncorrelated Sub-query

A uncorrelated sub-query is a type of sub-query where inner query doesn’t depend upon the outer query for its execution. It can complete its execution as a standalone query. Let us explain uncorrelated sub-queries with the help of an example.

Suppose, you have database “schooldb” which has two tables: student and department. A department will have many students. This means that the student table has a column “dep_id” which contains the id of the department to which that student belongs. Now, suppose we want to retrieve records of all students from the “Computer” department.

The sub-query used in this case will be uncorrelated sub-query since the inner query will retrieve the id of the computer department from the department table; the result of this inner query will be directly fed into the outer query which retrieves records of students from the student table where “dep_id” column’s value is equal to value retrieved by inner query.

The inner query which retrieves the id of the department using name can be executed as standalone query as well.

Correlated Sub-query

A correlated sub-query is a type of query, where inner query depends upon the outcome of the outer query in order to perform its execution.

Suppose we have a student and department table in “schooldb” as discussed above. We want to retrieve the name, age and gender of all the students whose age is greater than the average age of students within their department.

In this case, the outer query will retrieve records of all the students iteratively and each record is passed to the inner query. For each record, the inner query will retrieve average age of the department for the student record passed by the outer query. If the age of the student is greater than average age, the record of the student will be included in the result, and if not not. Let’s see this in action.

Preparing the Data

Let’s create a database named “schooldb”. Run the following SQL in your query window:

CREATE DATABASE schooldb;

The above command will create a database named “schooldb” on your database server.

Next, we need to create a “department” table within the “schooldb” database. The department table shall have three columns: id, name and capacity. To create department table, execute following query:

CREATE TABLE department
( 
  id INT PRIMARY KEY,
  name VARCHAR(50) NOT NULL, 
  capacity INT NOT NULL, 
)

Next lets add some dummy data to the table so that we can execute our sub-queries. Execute the following to create 5 departments: English, Computer, Civil, Maths and History.

USE schooldb; 

INSERT INTO department 
  VALUES (1, 'English', 300), 
         (2, 'Computer', 450), 
         (3, 'Civil', 400),
         (4, 'Maths', 400),
         (5, 'History', 300)

Next we need to create a “student” table within our database. The student table will have five columns: id, name, age, gender, and dep_id.

The dep_id column will act as the foreign key column and will have values from the id column of the department table. This will create a one to many relationship between the department and student tables. Execute following query to create student table.

USE schooldb; 

CREATE TABLE student
(  
  id INT PRIMARY KEY,
  name VARCHAR(50) NOT NULL,
  gender VARCHAR(50) NOT NULL,
  age INT NOT NULL,
  dep_id INT NOT NULL     
)

And then the below to add 10 students to the table.

USE schooldb;

INSERT INTO student
  VALUES (1, 'Jolly', 'Female', 20, 4), 
         (2, 'Jon', 'Male', 22, 3),
         (3, 'Sara', 'Female', 25, 4),
         (4, 'Laura', 'Female', 18, 2),
         (5, 'Alan', 'Male', 20, 3),
         (6, 'Kate', 'Female', 22, 2),
         (7, 'Joseph', 'Male', 18, 2),
         (8, 'Mice', 'Male', 23, 1),
         (9, 'Wise', 'Male', 21, 5),
         (10, 'Elis', 'Female', 27, 2);

Notice that values in “dep_id” column of the student table exists in the id column of the department table.

Now, let us see examples of both correlated and uncorrelated sub-queries.

Uncorrelated Sub-query Example

Let us execute a uncorrelated sub-query which retrieves records of all the students who belong to “Computer” department.

USE schooldb;

SELECT * FROM
  student 
  WHERE dep_id =
  (
    SELECT id from department WHERE name = 'Computer'
  );

The output of the above SQL will be:

id	name	gender	age	dep_id
4	Laura	Female	18	2
6	Kate	Female	22	2
7	Joseph	Male	18	2
10	Elis	Female	27	2

You can see that there are two queries. The inner query retrieves id of the “Computer” department while the outer query retrieves student records with that id value in the dep_id column.

We know that in the case of uncorrelated sub-queries the inner query can be executed as standalone query and it will still work. Let’s check if this is true in this case. Execute the following query on the server.

SELECT id from department WHERE name = 'Computer';

The above query will execute successfully and will return 2 i.e. the of the “Computer” department. This is a uncorrelated sub-query.

Correlated Sub-query Example

We know that in case of correlated sub-queries, the inner query depends upon the outer query and cannot be executed as a standalone query.

Lets execute a correlated sub-query that retrieves results of all the students with age greater than average age within their department as discussed above.

USE schooldb;

SELECT   name, gender, age
  FROM     student Greater
  WHERE    age >
  (SELECT   AVG (age)
     FROM     student average
     WHERE      greater.dep_id = average.dep_id) ;

The output of the above query will be:

name	gender	age
Kate	Female	22
Elis	Female	27
Jon	Male	22
Sara	Female	25

We know that in the case of a correlated sub-query, the inner query cannot be executed as standalone query. You can verify this by executing the following inner query on it’s own:

SELECT   AVG (age)
  FROM     student average    
  WHERE      greater.dep_id = average.dep_id

The above query will throw an error.

Other small differences between correlated and uncorrelated sub-queries are:

The outer query executes before the inner query in the case of a correlated sub-query. On the other hand in case of a uncorrelated sub-query the inner query executes before the outer query.
Correlated sub-queries are slower. They take M x N steps to execute a query where M is the records retrieved by outer query and N is the number of iteration of inner query. Uncorrelated sub-queries complete execution in M + N steps.

Comments

Cory Long says

August 24, 2017 at 1:38 pm

No Bunny?! I guess it is a guest post, but I would have liked to see Ben get a bunny. How can I learn without bunnies?!

- "No Bugs" Hare says
  
  August 27, 2017 at 8:27 am
  
  🙂
  
Iain says

May 14, 2018 at 5:38 am

Uncorrelated subqueries are much better than correlated subqueries but when I read this article it widens my knowledge on both differences…

budyn1410 says

March 4, 2019 at 9:17 am

Thank you for this amazing article!

- Surekha says
  
  December 2, 2019 at 5:39 pm
  
  It’s very understandable and *easy* ..🤓