Determine column labels based on received data

  • I receive half a dozen text files each day containing 38 to 51 columns of data but no column labels. They are csv files saved with a .txt extension so excel will open them with the import text wizard. The column labels are provided later via email, although sometimes not at all.


    Columns 1-38 never change and are not an issue. Columns 39 - 51, however, can change in both content and order.


    My goal is to develop a script that can look at the data in the columns and either assign the appropriate column label or, ideally, move the data into the appropriate column in the 'master' data table (“Table2”).


    The variable columns (39-51) can be any of the following;


    2. Account Number - (1 to 16 digits, always arrives in text format, never blank)
    10. Home Phone# - (messy, often blank, may contains alpha chars)
    38. Sub Product ID - (3 to 7 alpha characters, can be blank)
    Append Score 7 Value - (2 digits, often blank)
    Last Worked Date - (never blank, always in 08/02/2017 4:08:23 PM format or “0000-00-00 00:00:00”)
    Last Worked By - (often blank, 9 alpha chars, 1st 3 chars always the same)
    1st Additional phone - (messy, often blank and/or contains alpha chars as well)
    2nd Additional phone - (messy, often blank and/or contains alpha chars as well)
    Email Address 1 - (often blank else a typically formatted email address)
    Email Address 2 - (often blank else a typically formatted email address)
    17. Assigned Balance - (never blank, always in proper number format with 2 decimals) This column also has the distinction of always being the last column when it is present.

    I have attached a sample file containing the raw data as it is received on the first tab and with the headers on the 2nd tab. I know this is a big ask but if anyone can at least provide some suggestions for how to approach it, I would greatly appreciate it. Basically, I’m looking for anything that can tell me ‘this column contains phone numbers” vs “this column contains dates and this one is the balance because it’s a number with 2 digits and never blank” etc. Any suggestions appreciated!

Participate now!

Don’t have an account yet? Register yourself now and be a part of our community!