I have a directory with a hundred or so csv files to import. Some rows/records are duplicated in each file.
My goal is to combine all the files on one worksheet while removing the duplicate rows.
I have two macros I found in other Oz posts that now import the csv files into worksheets and then combine them into one worksheet while removing the dupes but it runs progressively slower and slower as the number of records increases until it essentially hangs completely.
I was hoping someone could look at the attached macros and perhaps suggest some enhancements/fixes.
One obvious fix would be to not remove the dupes after each worksheet is copied but if I try to add all the files first, the 65K row limit gets in the way so my only option was to run the dupe remover as each worksheet is copied over since my VB skills are insufficient to make it only remove dupes when the row limit is going to be surpassed.
Sub Copy_Sheets_To_DataSheet_RemoveDupes() Dim shtName As Worksheet Dim lstRow As Long For Each shtName In Worksheets If Left(shtName.Name, 14) = "CollectionTool" Then shtName.Select Range(Selection, ActiveCell.SpecialCells(xlCellTypeLastCell)).Copy Worksheets("Data").Select Range("A1").Select Selection.End(xlDown).Offset(1, 0).Select Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _ :=False, Transpose:=False End If 'remove dupes Application.Run "DeleteDuplicates" Next End Sub
Public Sub DeleteDuplicates() Sheets("Data").Select Dim StartRow As Long, CheckRG As Range, ColToCheck As Range Dim UniqCells() As String, UniqCount As Long, i As Long, DelRG As Range 'Set row to start searching for duplicates StartRow = 2 'Determine the range of used cells Set CheckRG = Intersect(Rows(StartRow & ":65536"), ActiveSheet.UsedRange) If CheckRG Is Nothing Then Exit Sub 'Nothing to check 'Ask the user what column should be tested for duplicate values Do Set ColToCheck = Range("c:c") 'Application.InputBox(prompt:="Which column should be" & vbNewLine & "checked for duplicates?", Title:="Please select single column", Type:=8) '*****You can remove the "Range("c:c") and uncomment the above line. This asks the user to select the column where duplicate values are to be checked Loop Until ColToCheck.Columns.Count = 1 'Begin the process of removing duplicates by loading unique values into string array ' and setting the duplicate rows into a range to delete later on UniqCount = 0 ReDim UniqCells(0) For i = 1 To CheckRG.Rows.Count If Not InSArray(UniqCells, Intersect(CheckRG.Rows(i).EntireRow, ColToCheck).Text) Then ReDim Preserve UniqCells(UniqCount) UniqCells(UniqCount) = Intersect(CheckRG.Rows(i).EntireRow, ColToCheck).Text UniqCount = UniqCount + 1 Else If DelRG Is Nothing Then Set DelRG = CheckRG.Rows(i).EntireRow Else Set DelRG = Union(DelRG, CheckRG.Rows(i).EntireRow) End If End If Next i 'Delete duplicate rows If Not DelRG Is Nothing Then 'If there are duplicate cells... Application.ScreenUpdating = False 'Turn off screen updating DelRG.Delete 'Delete duplicate range Application.ScreenUpdating = True 'Turn on screen updating End If End Sub