Delete Duplicate Emails is an easy and tricky question asked in many interviews. We are going to solve this question with very simple self-joins.
Let's start with having a table named Person containing emails,
SQL Schema for Person table:
CREATE TABLE Person (Id int, Email varchar(255))
INSERT INTO Person (id, email) values ('1', 'john@example.com')
INSERT INTO Person (id, email) values ('2', 'bob@example.com')
INSERT INTO Person (id, email) values ('3', 'john@example.com')
Id | |
---|---|
1 | john@example.com |
2 | bob@example.com |
3 | john@example.com |
Write an SQL query to delete all the duplicate emails, keeping only one unique email with the smallest id. Note that you are supposed to write a DELETE statement and not a SELECT one.
The query result format is following:
We need to have output:
Id | |
---|---|
1 | john@example.com |
2 | bob@example.com |
So the question is clear and the output is clear. Now we will start making a query to get the output:
First, let's join tables with self-join on the basis of Email:
SELECT p1.* FROM PERSON p1 JOIN Person p2 WHERE p1.Email=p2.Email;
The output will look like this,
Id | |
---|---|
3 | john@example.com |
1 | john@example.com |
2 | bob@example.com |
3 | john@example.com |
1 | john@example.com |
Now, we will need to find the bigger id having the same email address as other records. We will be going to another condition so the query will look like this:
SELECT p1.* FROM PERSON p1 JOIN Person p2 WHERE p1.Email=p2.Email AND p1.id>p2.id;
So, the output will be:
Id | |
---|---|
3 | john@example.com |
Now we got the record of duplicate emails we will be going to delete these records using the DELETE statement.
DELETE FROM Person p1 JOIN Person p2 WHERE p1.Email=p2.Email AND p1.id>p2.id;
This is a query that will serve you the required output. Practice this question and let me know in the comments if you find this useful.
Also, let us know if you have any other method to solve this question.
0 Comments