Group By Statement and Finding Duplicates
The SQL GROUP BY Statement is used to fetch identical data, if only one column is used then it will act as DISTINCT.
It is often used with aggregate functions (COUNT, MAX, MIN, SUM, AVG) to group the result by one or more columns.
When using GROUP BY with aggregate function you have to use HAVING instead of WHERE clause for aggregate column filter.
Important Points:
- GROUP BY clause is used with the SELECT statement.
- GROUP BY clause is placed after the WHERE clause.
- GROUP BY clause is placed before ORDER BY clause.
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);
Example
DROP TABLE IF EXISTS Departments;
CREATE TABLE IF NOT EXISTS Departments
(
DepartmentId Integer Primary Key AutoIncrement
,DepartmentName TEXT
);
INSERT INTO Departments(DepartmentName) VALUES('Administration');
INSERT INTO Departments(DepartmentName) VALUES('Administration');
INSERT INTO Departments(DepartmentName) VALUES('Administration');
INSERT INTO Departments(DepartmentName) VALUES('Sales & Marketing');
INSERT INTO Departments(DepartmentName) VALUES('HR');
INSERT INTO Departments(DepartmentName) VALUES('Accounts');
INSERT INTO Departments(DepartmentName) VALUES('Accounts');
INSERT INTO Departments(DepartmentName) VALUES('Accounts');
-- Show Distinct rows
SELECT DepartmentName FROM Departments
GROUP BY DepartmentName;
-- Show Record COUNT
SELECT DepartmentName, Count(*) As RecordCount
FROM Departments
GROUP By DepartmentName
-- Finding Duplicates
SELECT DepartmentName, Count(*) As RecordCount
FROM Departments
GROUP By DepartmentName
HAVING Count(*) > 1
-- Show DISTINCT entire duplicate reocords
SELECT DISTINCT a.DepartmentId , a.DepartmentName
FROM Departments a JOIN (
SELECT DepartmentName, Count(*) As RecordCount
FROM Departments
GROUP By DepartmentName
HAVING Count(*) > 1) b
ON a.DepartmentName = b.DepartmentName
ORDER BY DepartmentId
-- Two group by for Last Record for duplicate data
SELECT a.DepartmentId , a.DepartmentName
FROM Departments a JOIN (
SELECT DepartmentName, Count(*) As RecordCount
FROM Departments
GROUP By DepartmentName
HAVING Count(*) > 1) b
ON a.DepartmentName = b.DepartmentName
GROUP BY a.DepartmentName
ORDER BY DepartmentId
Group by not only is used for grouping items to find count, sum or in other aggregate functions. It is also helpful in finding duplicate rows.