I have an Excel sheet (File 1) with huge employee data running into tens of thousand of rows. Sometimes I get sent another spreadsheet (File 2) with a few hundred employee data and I am asked to find if these employees exist in my database based on their name, mobile and email. Now I cannot run a vlookup because it will only pull out the same name from File 1, but it won't necessarily be the same employee. I need a macro to compare the name, phone, email for an employee in File 2 and tell me if there is an employee with the same details in File 1. These criteria keep changing too, sometimes I am asked to compare based on Surname, Maiden Name and so on...
Macro to find duplicates based on more than one column
-
Hyperventilate -
December 19, 2020 at 10:31 AM -
Thread is marked as Resolved.
-
-
-
Hello,
With such a massive and important database ... don't you have a Unique Identifier ...
-
Hello,
With such a massive and important database ... don't you have a Unique Identifier ...
You can imagine the frustration . The only unique identifier I have is the serial number
-
You can imagine the frustration . The only unique identifier I have is the serial number
Question for you :
Is your serial number TRULY UNIQUE ... ???
-
Question for you :
Is your serial number TRULY UNIQUE ... ???
Haha...asking the right questions I see...
-
-
Well.... even if your serial number is not logical ... BUT UNIQUE ... your problem is already solved ...
-
Well.... even if your serial number is not logical ... BUT UNIQUE ... your problem is already solved ...
Ah, no. I was being sarcastic. Each file has serial numbers starting from 1 and running through to the end, so they're just useless.
-
Fair enough ...
What about building your own 'smart' unique identifier ... to compensate for the stupidity of your IT department ... ???
-
Can you post a sample of File 1 and File 2? You can de-sensitize the data if necessary and you can include a few dozen rows of data. Also, what do you want to do if there is an employee with the same details in File 1?
-
Should you make an attempt to build your own Unique Identifier ...
https://howtoexcelatexcel.com/blog/generate-…excel-data-set/
Hope this will help your inspiration
-
-
Can you post a sample of File 1 and File 2? You can de-sensitize the data if necessary and you can include a few dozen rows of data. Also, what do you want to do if there is an employee with the same details in File 1?
Please see attached sample file. I need to see if sheet1 employees exist in my database (sheet2) based on their name, father's name, mobile and email. For example, (Sl.1) from sheet one exists in sheet two as (Sl. 29.)
If there is a match found, I just need them to be moved to Sheet 3 or a new book.
-
Hello,
In your 'New Sheet' , the Employee Contact No 8565875655 appears twice (row 5 and row 9) with, for example, two different first names
Could the combination of the the Employee Contact No with, say, the Family Name ... be your Unique Identifier ...?
-
Hello,
In your 'New Sheet' , the Employee Contact No 8565875655 appears twice (row 5 and row 9) with, for example, two different first names
Could the combination of the the Employee Contact No with, say, the Family Name ... be your Unique Identifier ...?
Currently, I just go for exact matches and reject anything that is even slightly different. Contact numbers don't repeat in my files, but since this is a dummy db I created just to demonstrate what I need, I've not noticed that. My bad.
-
Currently, I just go for exact matches and reject anything that is even slightly different. Contact numbers don't repeat in my files, but since this is a dummy db I created just to demonstrate what I need, I've not noticed that. My bad.
Sadly in your Database, even by concatenating your 6 Columns B & C & D & E & F & G ... you cannot generate a Unique Identifier ...!!!
-
Hello again,
Understand you cannot share your real life data ...
But
If you were to rebuild your dummy database ... you should stick to both your actual coherence AND incoherence ...in order to replicate the underlying logic of the problems you are facing with your file ...
-
-
Currently, I just go for exact matches and reject anything that is even slightly different. Contact numbers don't repeat in my files, but since this is a dummy db I created just to demonstrate what I need, I've not noticed that. My bad.
Sorry ... but read your message # 14 way too quickly ...
Did you say : " Contact numbers don't repeat in my files "
If you really mean it ... look no further for your Unique Identifier ... !!!
-
Hello again,
Understand you cannot share your real life data ...
But
If you were to rebuild your dummy database ... you should stick to both your actual coherence AND incoherence ...in order to replicate the underlying logic of the problems you are facing with your file ...
Agreed and noted Sir.
-
Make sure that both workbooks are open. Place this macro in File1 and run it from there. Change the sheet names and workbook name to suit your needs.
Code
Display MoreSub CompareData() Application.ScreenUpdating = False Dim desWS As Worksheet, srcWS As Worksheet, arr1 As Variant, arr2 As Variant, dic As Object Set srcWS = ThisWorkbook.Sheets("Database") Set desWS = Workbooks("File2.xlsx").Sheets("Sheet1") arr1 = srcWS.Range("G2", srcWS.Range("G" & Rows.Count).End(xlUp)).Value arr2 = desWS.Range("G2", desWS.Range("G" & Rows.Count).End(xlUp)).Value Set dic = CreateObject("Scripting.Dictionary") For i = 1 To UBound(arr1, 1) dic.Add arr1(i, 1), Nothing Next i For i = 1 To UBound(arr2, 1) If dic.Exists(arr2(i, 1)) Then desWS.Rows(i + 1).Interior.ColorIndex = 3 End If Next i Application.ScreenUpdating = True End Sub
-
Possibly...
Code
Display MoreSub SearchDB() Dim v As Variant, n As Variant Dim i As Long, j As Long With ThisWorkbook v = .Worksheets("Database").UsedRange.Resize(, 9) n = .Worksheets("New").UsedRange For i = LBound(v) To UBound(v) For j = LBound(n) To UBound(n) If v(i, 2) = n(j, 2) Then 'first name If v(i, 3) = n(j, 3) Then 'last name If v(i, 7) = n(j, 7) Then 'mobile If v(i, 8) = n(j, 8) Then 'email 'puts the matching new serial number to col I of the database v(i, 9) = n(j, 1) End If End If End If End If Next j Next i .Worksheets("Database").UsedRange.Resize(, 9) = v End With End Sub
-
Make sure that both workbooks are open. Place this macro in File1 and run it from there. Change the sheet names and workbook name to suit your needs.
Code
Display MoreSub CompareData() Application.ScreenUpdating = False Dim desWS As Worksheet, srcWS As Worksheet, arr1 As Variant, arr2 As Variant, dic As Object Set srcWS = ThisWorkbook.Sheets("Database") Set desWS = Workbooks("File2.xlsx").Sheets("Sheet1") arr1 = srcWS.Range("G2", srcWS.Range("G" & Rows.Count).End(xlUp)).Value arr2 = desWS.Range("G2", desWS.Range("G" & Rows.Count).End(xlUp)).Value Set dic = CreateObject("Scripting.Dictionary") For i = 1 To UBound(arr1, 1) dic.Add arr1(i, 1), Nothing Next i For i = 1 To UBound(arr2, 1) If dic.Exists(arr2(i, 1)) Then desWS.Rows(i + 1).Interior.ColorIndex = 3 End If Next i Application.ScreenUpdating = True End Sub
Thank you! I'm getting a runtime error "This key is already associated with an element of this collection".
The debug points to - (dic.Add arr1(i, 1), Nothing)
-
-
Participate now!
Don’t have an account yet? Register yourself now and be a part of our community!