Help Extracting Data From String

  • Hello,


    I am having a problem working a formula to get the data I need. I have attached a test spreadsheet.

    The raw data comes in on column A and I need the values between the <value> and </value> text for each cell. There can be up to 5 <value> and </value> entries in each cell of raw data so I need up to 5 columns of values for each raw data entry.


    My approach -

    Figure out how many value and /value entries were in each cell of data which is my column G

    Then figure out in which potential columns there was value and /value data in - For instance in cell A9 there is data between the values for Invoiced and Hours so I need blank cells in R9, T9, and V9, 20 in cell S9, and 86 in cell U9.

    So my idea was to get find the location of the first <value> and first </value> by using Find in M9 and N9 and then I was going to attempt to use Left or some combo using left, right, mid, ect to get the 20. However 20 is 2 characters and there is a 10 character difference in the values in M9 and N9 and the entries will never have a fixed number of characters it could be anything between 2 and 40 characters.


    I hope I have made it somewhat clear of what I need help with. I know my skill level and I will need help with how to extract the first 2 potential value and /value combos, from there I am sure I can figure out how to get the value in the remaining combos if there is data to extract.


    roster test.xlsx


    Thank you in advance.


    Grant

  • Hi Grant,


    I think I came up with something that will work.


    roster test JR.xlsx


    My approach was very similar, except I did not try to find the Nth pair of <Value> and </Value>. Instead, I found the NEXT <Value> and </Value> pair, appearing after the custom field's name in the string.


    I used the SEARCH function's optional third argument, [start_num], to begin searching for "<Value>" part-way through the string. This skips any instances of "<Value>" and "</Value>" related to other custom fields, that are closer to the beginning of the string. From there, I plugged the positions into the MID function to retrieve the data.


    I was able to use the same formula across B6:F32, to fill in the data for the 5 fields. The formula is fairly difficult to parse, so I added a walkthrough (highlighted yellow), which describes the "helper columns" I used to arrive at the final formula.


    I hope this helps!


    Thank you,

    James

Participate now!

Don’t have an account yet? Register yourself now and be a part of our community!