I have a directory with a hundred or so csv files to import. Some rows/records are duplicated in each file.
My goal is to combine all the files on one worksheet while removing the duplicate rows.
I have two macros I found in other Oz posts that now import the csv files into worksheets and then combine them into one worksheet while removing the dupes but it runs progressively slower and slower as the number of records increases until it essentially hangs completely.
I was hoping someone could look at the attached macros and perhaps suggest some enhancements/fixes.
One obvious fix would be to not remove the dupes after each worksheet is copied but if I try to add all the files first, the 65K row limit gets in the way so my only option was to run the dupe remover as each worksheet is copied over since my VB skills are insufficient to make it only remove dupes when the row limit is going to be surpassed.
Sub Copy_Sheets_To_DataSheet_RemoveDupes()
Dim shtName As Worksheet
Dim lstRow As Long
For Each shtName In Worksheets
If Left(shtName.Name, 14) = "CollectionTool" Then
shtName.Select
Range(Selection, ActiveCell.SpecialCells(xlCellTypeLastCell)).Copy
Worksheets("Data").Select
Range("A1").Select
Selection.End(xlDown).Offset(1, 0).Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
End If
'remove dupes
Application.Run "DeleteDuplicates"
Next
End Sub
Display More
Public Sub DeleteDuplicates()
Sheets("Data").Select
Dim StartRow As Long, CheckRG As Range, ColToCheck As Range
Dim UniqCells() As String, UniqCount As Long, i As Long, DelRG As Range
'Set row to start searching for duplicates
StartRow = 2
'Determine the range of used cells
Set CheckRG = Intersect(Rows(StartRow & ":65536"), ActiveSheet.UsedRange)
If CheckRG Is Nothing Then Exit Sub 'Nothing to check
'Ask the user what column should be tested for duplicate values
Do
Set ColToCheck = Range("c:c")
'Application.InputBox(prompt:="Which column should be" & vbNewLine & "checked for duplicates?", Title:="Please select single column", Type:=8)
'*****You can remove the "Range("c:c") and uncomment the above line. This asks the user to select the column where duplicate values are to be checked
Loop Until ColToCheck.Columns.Count = 1
'Begin the process of removing duplicates by loading unique values into string array
' and setting the duplicate rows into a range to delete later on
UniqCount = 0
ReDim UniqCells(0)
For i = 1 To CheckRG.Rows.Count
If Not InSArray(UniqCells, Intersect(CheckRG.Rows(i).EntireRow, ColToCheck).Text) Then
ReDim Preserve UniqCells(UniqCount)
UniqCells(UniqCount) = Intersect(CheckRG.Rows(i).EntireRow, ColToCheck).Text
UniqCount = UniqCount + 1
Else
If DelRG Is Nothing Then
Set DelRG = CheckRG.Rows(i).EntireRow
Else
Set DelRG = Union(DelRG, CheckRG.Rows(i).EntireRow)
End If
End If
Next i
'Delete duplicate rows
If Not DelRG Is Nothing Then 'If there are duplicate cells...
Application.ScreenUpdating = False 'Turn off screen updating
DelRG.Delete 'Delete duplicate range
Application.ScreenUpdating = True 'Turn on screen updating
End If
End Sub
Display More