Compare 2 Sheets with 1 million rows and 50 columns and highlights the difference with color

revanth 
July 22, 2020 at 10:17 AM 
Thread is marked as Resolved.
We will be implementing some important changes during 25th and 26th May 2024 which may result in an outage period of the website. We anticipate this process may take a couple of hours and so we apologise in advance for any inconvenience.



Yes if available then check the other columns for difference if not available then colour total row of active Sheet

No idea if this will be any quicker than the above but you may get a speed bump. I figured it was worth a shot. Haven't tested it because I don't have workbooks with a million rows!
Code
Display MoreSub compareData() Dim a, b, c, d, i&, j&, k& Dim sheetA As Worksheet Dim sheetB As Worksheet Set sheetA = Sheets("Sheet1") Set sheetB = Sheets("Sheet2") a = sheetA.Range("A1", sheetA.Cells(sheetA.Rows.Count, 1).End(xlUp).Resize(, 50)).Value b = sheetB.Range("A1", sheetB.Cells(sheetB.Rows.Count, 1).End(xlUp).Resize(, 50)).Value ReDim c(1 To UBound(b), 1 To 1) c(1, 1) = "Different?" Set d = CreateObject("Scripting.Dictionary") For i = 2 To UBound(a) d(a(i, 1)) = i Next i Application.ScreenUpdating = False For i = 2 To UBound(b) If Not d.exists(b(i, 1)) Then sheetB.Cells(i, 1).Resize(, UBound(b, 2)).Interior.ColorIndex = 45 c(i, 1) = "YES" Else k = d(b(i, 1)) For j = 2 To UBound(b, 2) If b(i, j) <> a(k, j) Then sheetB.Cells(i, j).Interior.Color = vbRed c(i, 1) = "YES" End If Next j End If Next i sheetB.Range("A1").Offset(, UBound(b, 2)).Resize(UBound(c)).Value = c Application.ScreenUpdating = True Set d = Nothing Erase a Erase b Erase c End Sub

Hi, on above code c(i,1) means adding one more column and adding yes right ?



Hi revanth,
I tested this code comparing two sheets of data each with 41 columns and 520,000 rows, it took 12.29 seconds to complete. It does not do everything you are after yet but if it you think it is in the ballpark of what you are after let me know.
Currently it finds the first difference in a row then highlights the entire row and moves to the next row.
Code
Display MoreOption Explicit Sub compare() Application.ScreenUpdating = False Application.Calculation = xlCalculationManual Dim StartTime As Double Dim SecondsElapsed As Double StartTime = Timer Dim ws1 As Worksheet: Set ws1 = Sheet1 Dim ws2 As Worksheet: Set ws2 = Sheet2 Dim MyArr1 As Variant 'MyArr1 = ws1.Range("A1").CurrentRegion.Value2 Dim MyArr2 As Variant 'MyArr2 = ws2.Range("A1").CurrentRegion.Value2 Dim i As Long, x As Long Dim MaxRW As Long, CRW As Long, IncRW As Long, MaxCL As Long MaxRW = ws1.Cells(Rows.Count, 1).End(xlUp).Row MaxCL = ws1.UsedRange.Columns.Count CRW = 1 IncRW = 50000 While CRW < MaxRW MyArr1 = ws1.Range(ws1.Cells(CRW, 1), ws1.Cells(CRW + IncRW, MaxCL)) MyArr2 = ws2.Range(ws2.Cells(CRW, 1), ws2.Cells(CRW + IncRW, MaxCL)) For x = LBound(MyArr1, 1) To UBound(MyArr1, 1) For i = LBound(MyArr1, 2) To UBound(MyArr1, 2) If MyArr1(x, i) <> MyArr2(x, i) Then ws2.Cells(x + CRW  1, i).EntireRow.Interior.Color = vbRed GoTo MyNxti End If Next i MyNxti: Next x CRW = CRW + IncRW Wend SecondsElapsed = Round(Timer  StartTime, 2) Application.ScreenUpdating = True Application.Calculation = xlCalculationAutomatic MsgBox "This code ran successfully in " & SecondsElapsed & " seconds", vbInformation End Sub

HI Trunten,
I tried your code on the large data set and it returned an out of memory error, I was getting the same when loading the entire sheet into an array so I cut the array into pieces. when I cut it down to 90,000 rows per sheet it ran but I cancelled after about 20 seconds. The highlight is colouring numerous cells which are the same on both sheets ( I only included a couple of differences throughout, but thousands of cells are being highlighted).
I obtained some large data sets from Here then copied and pasted them a few times to get into the ridiculous range.
Regards
Justin

Probably should have guessed that would be too big for an array!!
i thought I had to compare rows that had the same id in column A (and then highlight the entire row if the id was not on the first sheet) so that may explain why you’re seeing so many “reds”. Could’ve got the wrong end of the stick though

Maybe, I am not entirely sure what he is after yet. I was just interested in how long it would take for an array solution to work through that many cells... I think 12 seconds is not too bad, though it may increase significantly depending on the number of differences and the additional requirements.

It took 3.25 min to compare sheet 1 and sheet2


Made a few tweaks to mine to try and fit in memory and up the speed a little. It still took 35 seconds but I created a spreadsheet with 2 sheets each with 50 columns and 500,000 rows which took over a minute just to open on my machine so I'm pretty happy with that
Code
Display MoreSub compareData() Dim a, b, c, d, i&, j&, k&, cols&, t, rowA, rowB Dim sheetA As Worksheet Dim sheetB As Worksheet t = Timer Set sheetA = Sheets("Sheet1") Set sheetB = Sheets("Sheet2") cols = sheetA.UsedRange.Columns.Count a = sheetA.Range("A1", sheetA.Cells(sheetA.Rows.Count, 1).End(xlUp)).Value b = sheetB.Range("A1", sheetB.Cells(sheetB.Rows.Count, 1).End(xlUp)).Value ReDim c(1 To UBound(b), 1 To 1) c(1, 1) = "Different?" Set d = CreateObject("Scripting.Dictionary") For i = 2 To UBound(a) d(a(i, 1)) = i Next i Application.ScreenUpdating = False For i = 2 To UBound(b) If Not d.exists(b(i, 1)) Then sheetB.Cells(i, 1).Resize(, cols).Interior.ColorIndex = 45 c(i, 1) = "YES" Else k = d(b(i, 1)) rowA = sheetA.Cells(k, 1).Resize(, cols) rowB = sheetB.Cells(i, 1).Resize(, cols) For j = 2 To cols 'If sheetB.Cells(i, j).Value <> sheetA.Cells(k, j).Value Then If rowA(1, j) <> rowB(1, j) Then sheetB.Cells(i, j).Interior.Color = vbRed c(i, 1) = "YES" End If Next j End If Next i sheetB.Range("A1").Offset(, cols).Resize(UBound(c)).Value = c Application.ScreenUpdating = True Set d = Nothing Erase a Erase b Erase c Debug.Print Round(Timer  t, 2) End Sub

What version of excel are you running, and how old is the computer system? Maybe if it is a 32 bit version of excel? That is a big jump in time for a not particularly large jump in size (about a 25% increase considering you had 9 extra columns). so you would expect it to be closer to 20 seconds. Anyway, that is the trouble with humungous data sets. You could try your hand at R.

Participate now!
Don’t have an account yet? Register yourself now and be a part of our community!