combine 1000's of csv files and append Lic#, filename and date to each row

  • I have approx 15,000 csv files that I need to combine into a single csv file (or worksheet)


    The filenames look like this:
    40259395_Consumer Portfolio Services Inc _Displaying records 1 through 10 of 37_051012.csv
    40259395_Consumer Portfolio Services Inc _Displaying records 11 through 20 of 37_051012.csv
    40259395_Consumer Portfolio Services Inc _Displaying records 21 through 30 of 37_051012.csv


    The csv file contents contain a header row followed by a maximum of 10 data rows like this;

    Displaying records 1 through 10 of 37 Next 1020304050Full NameLicense NumberLicense Type
    Ayuyao , Ma.Micaela Irene Bennette 40276870 Sales Agent
    Azul , Carlo Magno II 40175001 Debt Collector
    Bingham , Martin 40288241 Sales Agent
    Bruce , Carol
    Burgins , Colin 40288215 Sales Agent
    Burgins , Imelda Michelle Grace 40283925 Sales Agent
    Coker , Marlon 40288270 Sales Agent
    Coleman , Lona 20402350 Sales Agent
    Collins, Jr. , William 40166027 Sales Agent
    Cox , Ta'shalanda 40288245 Sales Agent


    I would like to combine the csv files while appending the licence# and name of the company contained in the filename along with date contained at the end of the filename to each record.
    I have tried using Ron DeBruin’s code (http://www.rondebruin.nl/csv.htm) for combining csv files into an xls file, however, it has trouble with the header row in each file and prepends it to the last record in the previous file. Also, from what I can see his code is appending each file as whole using the windows file copy so appending the company name and date to each row is not an option. I have viewed many posts on combining files but nothing quite like this.

    Misc
    99.9% of the rows/records contain only 1 comma and no other punctuation.
    99.9% of the rows/records contain a name, lic# and type – if the lic#/type is missing, the row can be ignored or dropped
    Appending the company name is not critical as long as the company lic# is appended.

    If this is too big a project I’ll understand, but if that is the case if someone could please tell me how to delete the first row in every csv file that would be a huge help. If it could delete the first row AND insert the file name that would be even better as I could then run Ron DeBruin's code on the resulting files and then use some basic excel formulas to grab the details I need to populate the rows. Thanks!

  • Re: combine 1000's of csv files and append Lic#, filename and date to each row


    It is unclear which part is the name/Lic#/Lic type.etc.
    Can you upload your desired result from your data provided?

  • Re: combine 1000's of csv files and append Lic#, filename and date to each row


    Quote from jindon;608453

    It is unclear which part is the name/Lic#/Lic type.etc.
    Can you upload your desired result from your data provided?


    Hi Jindon, thank you for taking an interest.


    The layout of the CSV files, after the first row, is;


    Surname, FirstName(s) Licence# LicenceType


    Ex. Bingham , Martin John 40288241 Sales Agent


    The desired output is


    Surname, FirstName(s) Licence# LicenceType CompanyLic# Company Name Date


    Ex. Bingham , Martin John 40288241 Sales Agent 40259395 ABC Corp 01/01/2012


    I have attached an example to illustrate as well. Thank you.

  • Re: combine 1000's of csv files and append Lic#, filename and date to each row


    try

  • Re: combine 1000's of csv files and append Lic#, filename and date to each row


    Hi Jindon, that worked perfectly! Thank you so much!! This will be useful in so many ways. All the best to you!!


    If it's not asking too much, could you please give me a brief description of what the regexp expression is doing? I have been trying to figure it using some regexp boards I found but so far no luck.


    Thanks again!

  • Re: combine 1000's of csv files and append Lic#, filename and date to each row


    Code
    .Pattern = "([^,]+),(([^,]+),)? *([ \D]+)(\d+)(.*)"


    [^,]+ = charcters except comma
    ()? = none or one of character set inside the brackets
    * = one or more space
    [ \D] = space and non-numeric characters
    \d+ = numeric characters
    .* = any characters


    When you use backward reference to the submatches, count the open bracket from the left and the index will start from 0 for submatches property and 1 for the Replace method.


    Does this help?

Participate now!

Don’t have an account yet? Register yourself now and be a part of our community!