Thursday, 29 September 2011

Removing Duplicate Records


So you have duplicate records found within tables with unique ID’s

Here are three methods you may want to consider in selecting/removing those duplicates.


Method 1: The unMAX method – just the earliest duplicates

Max will display the largest value of a selected column, in this case your identifier. As a result non duplicated rows will be obtained…  If you negate that result (as shown) you will have only duplicated records:


   1:  DELETE
   2:  FROM MyTable
   3:  WHERE ID NOT IN (
   4:       SELECT MAX(ID)
   5:       FROM MyTable
   6:       GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3
   7:  )


*Note* since your negating everything but the largest ID, only the smaller ID’s will be removed

Method 2: The unMATCH method – find all duplicates

Here you compare a table with itself matching all desired duplicated information, and un-matching all equivalent Id’s.

   1:  SELECT M1.DuplicateColumn1, M1.DuplicateColumn2, M1.DuplicateColumn3
   2:  FROM MyTable M1
   3:  INNER JOIN MyTable M2
   4:  ON ISNULL(M1.DuplicateColumn1, '') = ISNULL(M2.DuplicateColumn1, '')
   5:  AND ISNULL(M1.DuplicateColumn2, '') = ISNULL(M2.DuplicateColumn2, '')
   6:  AND ISNULL(M1.DuplicateColumn3, '') = ISNULL(M2.DuplicateColumn3, '')
   7:  AND M1.ID <> M2.ID
   8:  ORDER BY M1.DuplicateColumn1, M1.DuplicateColumn2, M1.DuplicateColumn3

*Note* IsNull prevents comparing null values with blank values. A measure of safety 

Method 3: The ID comparison – just the latest duplicates

Literally a combination of method 1 and Method 2. More straight forward, and with more control. Few more lines in your select statement of course!

   1:  DELETE FROM MyTable
   2:  WHERE ID IN (
   3:       SELECT M1.ID
   4:       FROM MyTable M1
   5:       INNER JOIN MyTable M2
   6:       ON ISNULL(M1.DuplicateColumn1, '') = ISNULL(M2.DuplicateColumn1, '')
   7:       AND ISNULL(M1.DuplicateColumn2, '') = ISNULL(M2.DuplicateColumn2, '')
   8:       AND ISNULL(M1.DuplicateColumn3, '') = ISNULL(M2.DuplicateColumn3, '')
   9:       AND M1.ID > M2.ID
  10:  )


My sources for the above are Pinal Dave and Gordon Bell, thank you!

If you have anything to add please comment.


Technorati Tags: ,,



Page Hits