Communiquez avec les autres et partagez vos connaissances professionnelles

Inscrivez-vous ou connectez-vous pour rejoindre votre communauté professionnelle.

Suivre

I've 250 million row citizen data with field, name, gender, birth date, birth place, could u share best sql query to get potencial data duplication?

user-image
Question ajoutée par helmy yoga pranata , Database administrator , Ministry of Home Affairs
Date de publication: 2017/08/02
Wameedh Fawzi
par Wameedh Fawzi , Postpay CVM Manager , Vodafone

 

 

select t1.*

from

(Select

a.name

,a.gender

,a.birth_date

,a.birth_place

,row_number() over (partition by a.name,a.gender, a.birth_date order by a.name) as rnk

from Citizen_data a

) t1

where t1.rnk >1

Tahir Rehman
par Tahir Rehman , EHS System Analyst, Data Analyst, developer and trainer , Maaden

select name,gender, birthdate,birthplace,count(*) from Citizen group by name,gender, birthdate,birthplace having count(*) > 1;

Kasiviswanathan Selvakumar
par Kasiviswanathan Selvakumar , Full-Stack Developer , Freelance

Let us consider the table name be 'citizen_table' with 'name', 'gender', 'birth_date' and 'birth_place' as column names

The following sql query gets the potential data duplicates

select count(*) as number_of_repeated_rows, name, gender, birth_date, birth_place

from citizen_table

group by name, gender, birth_date, birth_place

 

More Questions Like This

Avez-vous besoin d'aide pour créer un CV ayant les mots-clés recherchés par les employeurs?