Remove Duplicate Rows

  • I have a directory with a hundred or so csv files to import. Some rows/records are duplicated in each file.
    My goal is to combine all the files on one worksheet while removing the duplicate rows.
    I have two macros, found in other Oz posts, that import the csv files into worksheets and then combine them into one worksheet while removing the dupes. The problem is that it runs progressively slower as the number of records increases, until it essentially hangs completely.


    I was hoping someone could look at the attached macros and perhaps suggest some enhancements/fixes.


    One obvious fix would be to not remove the dupes after each worksheet is copied, but if I try to add all the files first, the 65K row limit gets in the way. So my only option was to run the dupe remover as each worksheet is copied over, since my VB skills are insufficient to make it remove dupes only when the row limit is about to be exceeded.




    Code
    Function InSArray(ByRef vArray() As String, ByVal vValue As String) As Boolean
        ' Linear scan: walks the whole array until vValue is found.
        ' Each call is O(n), so checking every new row against everything
        ' already collected gets slower as the combined data grows.
        Dim i As Long
        For i = LBound(vArray) To UBound(vArray)
            If vArray(i) = vValue Then
                InSArray = True
                Exit Function
            End If
        Next i
        InSArray = False
    End Function
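
    As a side note (this is not from either of the attached macros): the linear scan in InSArray re-walks the whole array for every row checked, which is a big part of why the run time grows so quickly. A Scripting.Dictionary gives a constant-time Exists lookup instead. A minimal sketch, assuming each row has already been reduced to a single string key and the dictionary is created once up front:

    Code
    Function SeenBefore(ByVal seen As Object, ByVal vValue As String) As Boolean
        ' seen is created once with CreateObject("Scripting.Dictionary")
        If seen.Exists(vValue) Then
            SeenBefore = True
        Else
            seen.Add vValue, True   ' remember this value for later checks
            SeenBefore = False
        End If
    End Function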
  • Re: speed up duplicate row removal while avoiding 65k row limit


    Hi.


    It occurs to me there are other ways to do this, but I built this version, so try it out!
    What it does:


    - Copies data from sheet x to sheet Data
    - Checks ONLY the copied data from that sheet for dupes and kills them


    - Rinse and repeat for each sheet, e.g. copy data from sheet x, check only that data for dupes and delete if found


    - Only when all sheets are done does it do a final check on all data in Sheet 'Data' for dupes and delete them if found (all data is checked once, not repeatedly)


    The difference is that your version was copying a sheet, checking ALL data for dupes and deleting them, then copying the next sheet and checking _everything_ again. So every sheet that is copied slows the process, as more and more data is checked repeatedly. (A rough sketch of the per-sheet flow is below.)
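
    The actual macro isn't reproduced here, but the flow described above looks roughly like the sketch below. The sheet name "Data" comes from the thread; everything else (the DeleteDupes helper, keying duplicates on column A, header handling) is an assumption for illustration only, not the posted code.

    Code
    Sub CombineAndDedupeSketch()
        ' Rough illustration of the flow described above -- not the actual macro.
        Dim wsData As Worksheet, ws As Worksheet
        Dim firstNewRow As Long, lastRow As Long

        Set wsData = ThisWorkbook.Worksheets("Data")

        For Each ws In ThisWorkbook.Worksheets
            If ws.Name <> wsData.Name Then
                ' Copy this sheet's data below whatever is already on Data
                ' (header handling is omitted in this sketch)
                firstNewRow = wsData.Cells(wsData.Rows.Count, 1).End(xlUp).Row + 1
                ws.UsedRange.Copy wsData.Cells(firstNewRow, 1)
                ' Dedupe ONLY the block just copied from this sheet
                lastRow = wsData.Cells(wsData.Rows.Count, 1).End(xlUp).Row
                DeleteDupes wsData.Rows(firstNewRow & ":" & lastRow)
            End If
        Next ws

        ' One final pass over everything in Data once all sheets are in
        lastRow = wsData.Cells(wsData.Rows.Count, 1).End(xlUp).Row
        DeleteDupes wsData.Rows("1:" & lastRow)
    End Sub

    Sub DeleteDupes(ByVal rng As Range)
        ' Stand-in dupe remover: assumes column A identifies a duplicate row.
        Dim seen As Object, toKill As Range, r As Long, key As String
        Set seen = CreateObject("Scripting.Dictionary")
        For r = 1 To rng.Rows.Count
            key = CStr(rng.Cells(r, 1).Value)
            If seen.Exists(key) Then
                ' Collect duplicate rows and delete them all in one go at the end
                If toKill Is Nothing Then
                    Set toKill = rng.Rows(r)
                Else
                    Set toKill = Union(toKill, rng.Rows(r))
                End If
            Else
                seen.Add key, True
            End If
        Next r
        If Not toKill Is Nothing Then toKill.EntireRow.Delete
    End Sub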


    Let me know.
