Wednesday, March 21, 2012

How to eliminate duplicate data

I have a table with 68 columns. If all the columns hold the same value except for one which is a datetime column I want to delete all but one of the duplicate rows. Preferably the latest one but that is not important. Can someone show me how to accomplish this?

You can use the Group By fucntion

SELECT col_a, col_b, col_c From Table GROUP BY col_a, col_b, col_c

If you want the newest date - you can also using HAVING Clause

SELECT col_a, col_b, col_c From Table GROUP BY col_a, col_b, col_c HAVING max(col_date)

WHERE col_a - col_c are the columns with the same data and col_date is your date column

AWAL

|||Is this a easier process if I manually delete them? I just want to view the duplicate data but my problem is that the DateTime column in my table is unique unlike all the other columns with the same values.|||

John -

I'm not quite certain what you mean but if your datetime column is not unique but you want dups of all the other columns just don't group by the datetime column.

AWAL

|||

Assuming that your data isn't too big, you can use a technique like this:

drop table removeDups
go
create table removeDups(
column1 int,
column2 int,
column3 int,
column4 int,
column5 int,
column6 int,
column7 int,
column8 int,
column9 int,
datevalue datetime)

insert removeDups
select 1,1,1,1,1,1,1,1,1,getdate()
waitfor delay '00:00:01'
insert removeDups
select 1,1,1,1,1,1,1,1,1,getdate()
waitfor delay '00:00:01'
insert removeDups
select 1,1,1,1,1,1,1,1,1,getdate()
waitfor delay '00:00:01'

insert removeDups
select 2,2,2,2,2,2,2,2,2,getdate()
waitfor delay '00:00:01'
insert removeDups
select 2,2,2,2,2,2,2,2,2,getdate()
waitfor delay '00:00:01'
insert removeDups
select 3,3,3,3,3,3,3,3,3,getdate()
waitfor delay '00:00:01'


delete from removeDups
where not exists
(select *
from ( select min(dateValue) as dateValue,column1,column2,column3,column4,column5,column6, column7, column8, column9
from removeDups
group by column1,column2,column3,column4,column5,column6, column7, column8, column9) as mins
where mins.dateValue = removeDups.dateValue
and mins.column1 = removeDups.column1
and mins.column2 = removeDups.column2
and mins.column3 = removeDups.column3
and mins.column4 = removeDups.column4
and mins.column5 = removeDups.column5
and mins.column6 = removeDups.column6
and mins.column7 = removeDups.column7
and mins.column8 = removeDups.column8
and mins.column9 = removeDups.column9)

select *
from removeDups

I figure once you finish this process you will probably want to hurt the person who gave you this design, even if it is yourself. Try to identify a key amongst the 68 columns and add a unique constraint so you can never get in this position again :)

No comments:

Post a Comment