Get unique items from large array fast

  • Hi everyone,


    I have 2 large ranges:
    A1:A50000
    B1:B50000


    Both contain duplicates.
    I'd like to determine which unique values from range A are contained in range B.


    I have considered and ruled out these methods:
    a) Looping through arrays and building an array of unique values
    b) Range.AdvancedFilter Unique:=True (sketched below, for reference)
    c) Creating pivot tables
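
    For reference, this is roughly what I mean by option (b): a minimal sketch assuming the data has a header cell in A1 (AdvancedFilter treats the first cell of the range as a header) and using an illustrative sheet name and output cell in column D.

    Code
    Sub UniqueViaAdvancedFilter()
        ' Copy the distinct values of column A to column D.
        ' AdvancedFilter uses the first cell of the source range as a header label.
        With Worksheets("Sheet1")
            .Range("A1:A50001").AdvancedFilter Action:=xlFilterCopy, _
                CopyToRange:=.Range("D1"), Unique:=True
        End With
    End Sub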


    I found this nice-looking method on Ozgrid:

    Code
    ' Keep only the first occurrence of each value (COUNTIF over an
    ' expanding range = 1), then strip the FALSE entries with Filter.
    Dim V As Variant
    With Range("A1:A50000")
        V = Filter(Evaluate("TRANSPOSE(IF(COUNTIF(OFFSET(" & _
            .Address & ",,,ROW(1:" & .Rows.Count & "))," & _
            .Address & ")=1," & .Address & "))"), False, 0)
    End With


    But it still takes a long time to process.


    What would you suggest as the fastest way?

  • Re: Get unique items from large array fast


    Try this


    If you want to display the result, add this line above "End Sub" (change [d2] to whichever cell you want the list display to start in).

    Code
    [d2].Resize(UBound(z)) = Application.Transpose(z)


  • Re: Get unique items from large array fast


    You're welcome.


  • Re: Get unique items from large array fast


    You may well find that using a Dictionary rather than an ArrayList will give a further speed improvement.


    Thanks to Pike for pointing out to me the advantage of Dictionary over ArrayList :)
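
    For illustration, a minimal late-bound sketch of the Dictionary approach applied to the original question (not the exact code posted in this thread), assuming the data sits in A1:A50000 and B1:B50000 of the active sheet:

    Code
    Sub UniqueAInB_Dictionary()
        Dim a As Variant, b As Variant, i As Long
        Dim dB As Object, dOut As Object

        a = Range("A1:A50000").Value          ' read both columns in one shot (2-D arrays)
        b = Range("B1:B50000").Value

        Set dB = CreateObject("Scripting.Dictionary")
        Set dOut = CreateObject("Scripting.Dictionary")

        ' Index every value found in column B
        For i = 1 To UBound(b, 1)
            dB(b(i, 1)) = Empty
        Next i

        ' Keep each column A value once, but only if it also exists in B
        For i = 1 To UBound(a, 1)
            If dB.Exists(a(i, 1)) Then dOut(a(i, 1)) = Empty
        Next i

        MsgBox dOut.Count & " unique values of A are present in B"
    End Sub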


  • Re: Get unique items from large array fast


    KjBox, thanks again.


    I think I'll stick with the Collections.
    Dictionaries seem to be way slower over such large arrays.


    I also found it strange that when I had 100K rows rather than 50K,
    the application could transpose only somewhat over 34K items,
    unlike all 50K in the original example.
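
    One way around the Transpose ceiling seems to be writing the result as an n-by-1 array instead of transposing it; a rough sketch, assuming z is the 1-D result array and [d2] the output anchor from the earlier reply:

    Code
    Sub WriteWithoutTranspose(z As Variant)
        Dim out() As Variant, i As Long, n As Long

        n = UBound(z) - LBound(z) + 1
        ReDim out(1 To n, 1 To 1)              ' n rows x 1 column

        For i = 1 To n
            out(i, 1) = z(LBound(z) + i - 1)   ' copy the 1-D array into the column shape
        Next i

        [d2].Resize(n, 1).Value = out          ' written directly, no Application.Transpose
    End Sub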


    But anyway, your help was very detailed and much appreciated.
    Keep up the good work!

  • Re: Get unique items from large array fast


    Sorry, my mistake for using

    Code
    y = Application.Transpose(.Columns(2))


    Normally a one-dimensional array, when used in Application.Match, is slightly faster than a two-dimensional array. I forgot about the size of your data.
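
    To illustrate the two ways of loading a column (the ranges here are illustrative, not the posted procedure): .Value returns a 2-D array of any size, while Application.Transpose flattens it to 1-D but runs into the Transpose limits on data as large as yours. Application.Match accepts either form:

    Code
    Sub LoadColumnTwoWays()
        Dim y1 As Variant, y2 As Variant, res As Variant

        With Range("A1:B50000")
            y2 = .Columns(2).Value                    ' 2-D: y2(1 To 50000, 1 To 1)
            y1 = Application.Transpose(.Columns(2))   ' 1-D: y1(1 To 50000), size-limited
        End With

        res = Application.Match(Range("A1").Value, y2, 0)   ' Match is happy with either array
        If IsError(res) Then MsgBox "Not found" Else MsgBox "Found at position " & res
    End Sub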


    I have corrected that and modified the Collections.ArrayList code to make it more efficient.


    Try both these procedures to see which is faster for you. You will get a message box saying how many seconds it took for the code to run.


    I would be interested to know the times taken for your full set of data; even on a very small set of test data I find the Dictionary method to be appreciably faster.
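
    The usual way to get that kind of timing message is VBA's Timer function wrapped around the work; a minimal sketch (the commented call is a placeholder for whichever procedure is being timed):

    Code
    Sub TimeIt()
        Dim t As Double
        t = Timer                                   ' seconds since midnight
        ' Call DoTheWork                            ' <- run the procedure being timed here
        MsgBox Format(Timer - t, "0.00") & " seconds"
    End Sub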


  • Re: Get unique items from large array fast


    I guess this is a good comparison of Collection versus Dictionary: https://colinlegg.wordpress.co…or-unique-values-vba-udf/
    Of course, even at 50% slower, does 1.5 seconds really seem that much longer than 1 second? Part of the speed advantage is likely due to Collections being case insensitive.
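
    To see that Collection key behaviour in isolation, a small self-contained sketch ("ABC" and "abc" collide because keys are compared without regard to case):

    Code
    Sub CollectionKeyCase()
        Dim c As New Collection

        c.Add 1, Key:="ABC"
        On Error Resume Next
        c.Add 2, Key:="abc"                ' raises run-time error 457: duplicate key
        If Err.Number <> 0 Then MsgBox "Collection keys ignore case"
        On Error GoTo 0
    End Sub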


    In some speed tests that I looked at some time back, the speed varied significantly across multiple calls. If it is a one-off deal, the comparison should be based on a fresh start of Excel for each test. Speed can also be affected by the way the macros are run.


    KjBox, like you, I think it might be fun to run some speed tests on a large matrix. All things being equal, speed-wise, early-binding methods should be faster than late-binding methods. https://support.microsoft.com/en-us/kb/245115



    Here are a few unique routines that I use.


  • Re: Get unique items from large array fast


    Hi, thank you both, this is really useful.


    With 2 columns of 100K rows of data,
    Collections took 5 seconds while Dictionaries took only 3.1 seconds.
    Both are case insensitive.
    Is there a way to apply Text compare rather than binary? Still, I can work around it with the UPPER worksheet function just in case.
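
    The UPPER-style workaround would just normalise every value before it becomes a key; a rough sketch with a late-bound Dictionary (whose default compare mode is binary, i.e. case-sensitive):

    Code
    Sub UcaseWorkaround()
        Dim d As Object, v As Variant
        Set d = CreateObject("Scripting.Dictionary")   ' default CompareMode = BinaryCompare

        For Each v In Array("Apple", "APPLE", "apple", "Pear")
            d(UCase$(CStr(v))) = Empty                 ' normalise so case differences collapse
        Next v

        MsgBox d.Count & " unique items when case is ignored"   ' shows 2
    End Sub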


    KjBox: Application.Match failed with a type mismatch; it probably expected a range as the 2nd argument (rather than y).

  • Re: Get unique items from large array fast


    You can do Text Compare when using a Dictionary.
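
    A minimal sketch of that setting; CompareMode has to be set while the Dictionary is still empty:

    Code
    Sub DictionaryTextCompare()
        Dim d As Object
        Set d = CreateObject("Scripting.Dictionary")
        d.CompareMode = vbTextCompare      ' 1 = text compare; must be set before any keys are added

        d("Apple") = Empty
        d("APPLE") = Empty                 ' now treated as the same key

        MsgBox d.Count                     ' shows 1
    End Sub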



  • Re: Get unique items from large array fast


    Quote

    KjBox: Application.Match failed with a type mismatch; it probably expected a range as the 2nd argument (rather than y).


    Application.Match works perfectly well with either an array or a worksheet range. Which columns is your data in?
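
    To illustrate with small, made-up ranges: Application.Match accepts either a worksheet range or an in-memory array as its second argument, and it returns an error value (rather than raising one) when nothing matches:

    Code
    Sub MatchRangeOrArray()
        Dim arr As Variant, res As Variant

        arr = Range("B1:B10").Value                    ' 2-D Variant array read from the sheet

        res = Application.Match(Range("A1").Value, Range("B1:B10"), 0)   ' lookup against the range
        If IsError(res) Then Debug.Print "Range: no match" Else Debug.Print "Range: row " & res

        res = Application.Match(Range("A1").Value, arr, 0)               ' lookup against the array
        If IsError(res) Then Debug.Print "Array: no match" Else Debug.Print "Array: row " & res
    End Sub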


  • Re: Get unique items from large array fast


    I showed how to set the compare mode for the Dictionary object in my example, and how to use early binding and late binding.


    Most will code in early binding but post in late binding. There are lots of pros and cons for both methods of binding. I usually post late-binding solutions, but if the user takes the time to learn early binding, IntelliSense alone is worth it since they are just learning. Imagine trying to program against the Excel object model if it were not early bound for you already; it would just be a scripting language like VBS then.
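
    For reference, the two binding styles look like this for the Dictionary object; early binding needs a reference to Microsoft Scripting Runtime (Tools > References), late binding does not:

    Code
    Sub BindingStyles()
        ' Early binding: compile-time type, IntelliSense, slightly faster calls.
        ' Requires Tools > References > Microsoft Scripting Runtime.
        Dim dEarly As Scripting.Dictionary
        Set dEarly = New Scripting.Dictionary
        dEarly.CompareMode = TextCompare          ' Scripting enum, available with the reference set

        ' Late binding: no reference needed, resolved at run time.
        Dim dLate As Object
        Set dLate = CreateObject("Scripting.Dictionary")
        dLate.CompareMode = vbTextCompare         ' use the VBA constant (1) instead
    End Sub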
