|
|
In the following example we will identify duplicate
records in the Clients table and remove them. I'm taking for granted
that Customer record has it's native Id (CustomerID) and our task is purely
technical.
The source dataset looks like this:
| CustomerID | CustomerName | phone |
| 3 | Barbara Spears | 64-396583 |
| 4 | Pestiana Oliviera | 000-523147 |
| 4 | Pestiana Oliviera | 000-523147 |
| 7 | Eva Longoria | 507-509875 |
| 10 | Erika Nass | 354-257413 |
| 11 | Somerset Dogan | 64-359734 |
| 14 | Antonella Musentah | 507-501478 |
| 14 | Antonella Musentah | 507-501478 |
| ... | ... | ... |
CustomerID that appear more than once designates the duplicate record:
SELECT CustomerID, count(*) AS cn
FROM Clients
GROUP BY CustomerID HAVING count(*) > 1;
|
Now we know that records with CustomerID equal to 4 or 14 are duplicate. In the
next step we have to chose only one record representing CustomerID, that means
make it unique. To do this we will numerate entire table with aditional Autoincrement
Key. And after this will chose minimal numerator valuie per each CustomerID as
a candidate to remain in dataset. Others (not equal to Minimum) will be deleted.
ALTER TABLE Clients ADD idno Autoincrement;
CREATE UNIQUE INDEX idx1_clients_idno ON Clients (idno);
|
Following query creates list of minimal id numbers among duplicates and
saves it as a table tmp_Min_Idno
SELECT MIN(a.idno) AS min_id, a.CustomerID
  INTO tmp_Min_Idno
FROM
  (SELECT idno, CustomerID FROM Clients) AS a
    INNER JOIN
  (SELECT CustomerID, count(*) AS cn
   FROM Clients GROUP BY CustomerID
   HAVING count(*) > 1) AS b
ON a.CustomerID=b.CustomerID
GROUP BY a.CustomerID;
|
Delete command can look like this:
DELETE a.*
FROM Clients AS a INNER JOIN tmp_Min_Idno AS b ON a.CustomerID = b.CustomerID
WHERE a.idno <> b.min_id;
|
But cause not every version of Desktop DBMS supports Delete Joins, we can achive
the same goal by making some additional work:
ALTER TABLE Clients ADD remove_this INT;
UPDATE Clients SET remove_this = 0;
UPDATE Clients AS a INNER JOIN tmp_Min_Idno AS b
ON a.CustomerID=b.CustomerID
SET a.remove_this = 1
WHERE a.idno <> b.min_id;
DELETE * FROM Clients WHERE remove_this = 1;
|
Finally drop temporary fields:
ALTER TABLE Clients DROP COLUMN remove_this;
ALTER TABLE Clients DROP COLUMN idno;
|
sqlexamples.info
|