PDF Adobe copy to excel

  • Good day


    I have a folder with 120 pdf invoice files, I need to copy data from each file to excel sheet (data shows qty,price, name of product etc. ).


    Are there vba codes which can loop through folder and make copies to excel.


    Thanks

  • The concept is simple.


    1. Batch to get filenames: aFFs() does that.
    2. Get the content for each pdf. Using Acrobat, ReadAcrobatDocument() does that.
    3. Iterate the filenames and insert the content.


    Since I don't have Acrobat on this computer nor a sample of your files, I can not finish (3) for you. There are a few ways to do that. e.g. Split() and Range's Resize.


    I could show you how to poke the (2) into a txt file. If you attached 1 or 2 of those txt files, I could then finish (3) for you. For now, view the contents of the Immediate Window (Ctrl+G) after you run Main().


    To use this, add the Acrobat object. Alt+F11 to open VBE, Tools > References > Acrobat > OK
    Change the path for aFFs() to suit.
    Run Main().

  • Seems like that function worked fine for me when I tested it with an installed cdrom Acrobat Pro v9 probably. It might error if the PDF was password protected or not readable I guess.


    Maybe try testing with just one pdf file in the folder generated by Excel. I can't test myself until Wednesday.


    Some 3rd party programs can probably get the content, if readable. Most are not free and limited even for a 30 day free trial. If you find one that you want to try and it is free, at least for a while, let me know.


    I guess you could use "free" online converters. I don't trust those though. e.g. http://pdftotext.com/


    If you know vb.net or c#.net, iTextSharp.dll might be able to get the content. I haven't used it in a while so I don't remember if it can or not. I made a lesson on how to use it once.

  • Your reference should simply be, Acrobat.


    I guess in your testing that you changed the value for aFFs()? The one in the test file was my path.


    Lets' go ahead and change the passed variable type.

    Code
    s = ReadAcrobatDocument(cstr(e))
  • Thanks a lot Mr. Kenneth
    I already selected "acrobat" but after clicking "References" I found it changed to "Adobe Acrobat 10.0 Type Library"
    I don't think this was the problem as the trick of using the CSTR is working well and now it is importing data from the PDF file to the excel file ..


    Another point: All the data is imported in A2 only .. How can the data be imported .. each piece of data into a cell?

  • As I said in post #4, I would need the data to know what would be the best way to parse it. See code at end to see how to create those txt files..


    If you like, ofuscate the content in the text file(s) and post the smallest and largest text file. Or, record a macro to import one or two of the txt files and post that. If needed, I can then show you how to automate it. If all of the data was structured consistently, parsing is more static and easier to code. Part of proper parsing is to know the data type for the columns. I suspect that your data won't be structured consistently.


    As you can see, the txt files will save to your temp folder (Win+R, %temp%, OK) named 1.txt, 2.txt, etc.

  • I am not sure that a copy and paste from the pdf would be the same as the text from acrobat which was the point of post #12. The copy paste and import had 12 columns, A:L in row 2 basically. The last column, L2, had several parts to it.

  • Mr. Hobson
    again thanks for all your help.



    What would be the best way of getting the text from sample file “Copy Pall Corp Inv 2”(In Yellow) into excel .


    For example if the invoice had 12 line of item name, price etc. would it be possible to get each line into 12 rows with columns for items, price etc. The Purchase order number would be repeated 12 time in one column next to each row.


    The current code copy the file contents into one cell in Excel.

  • As I said, the data has to parsed since it is simply a long string. I need the content to see what would work best. Ergo, my request for the acrobat content from the text files created in my last macro post. There is no sense parsing a copy/paste of the content for the PDF that you attached as it may vary from the acrobat content.


    There are several methods one can use to parse data. I might parse by location/position, key words, etc.


    So, from your PDF file, what part(s) did you want? You marked in yellow highlight in the PDF. Can you make an xlsx with columns and values below from the PDF that you want? Would all the PDF's be structured the same? Obviously, one would think so...


    Too bad your PDF is not an FDF, form. Those data fields are more readily imported.

  • Mr. HOBSON



    I save 2 pdf invoices to excel ; see tabs (“Save to excel Many” and “Save to excel one”).
    The invoices are all the same format in PDF
    The tab “Ideal data’ is how I would love my final data in excel to look.
    All PDf invoice in folder would be summarize in one excel tab; showing.


    PO#
    Invoice #
    Invoice date
    Line
    Item code
    Product
    Ordered
    Shipped
    Balance
    Price


    Total



    THANKS

  • I did not forget about you. It may be a day or three before I get to this and hopefully get something close for that one file anyway. It just depends on how it goes when I start.


    I was able to get the content as shown in the sheet 1.pdf. From here, it is a matter of figuring out how to parse it. Many on the forums should know how to do that. The hard part is the table as it is not all structured the same. I can envision that some Description columns might only have one line or maybe more than two per item's "row". The other fields are easy to parse.


    For giggles, I tried the FDF field method but no luck there.

  • Thanks for all your help Mr Hobson


    What about looping through folder with PDF files save as word files,,, then loop through word files and copy tables to excelrow by row, column by column.


    The below code does not work... the attempt here is to loop through folder with word docx and copy tables to excel tab. any help on fixing code is appreciated.


    [VBA]Sub ZxZXZXZOpnWord()
    Dim WordDoc As String
    Dim WdDoc As Variant
    Dim oAPP As Object
    Dim tableStart As Integer
    Dim TableNo As Integer 'table number in Word
    Dim wdhrd As Word.Document
    path = "m:\FmmPDFtest\"
    file = Dir(path & "*.docx")


    Set oAPP = CreateObject(Class:="Word.Application")
    oAPP.Visible = True
    WdDoc = oAPP.Documents.Open(path & file)




    With WdDoc


    TableNo = WdDoc.tables.Count


    tableTot = WdDoc.tables.Count


    resultRow = 4


    For tableStart = 1 To 1 ' tableTot
    With .tables(tableStart)
    'copy cell contents from Word table cells to Excel cells
    For iRow = 1 To .Rows.Count
    For iCol = 1 To .Columns.Count
    Cells(resultRow, iCol) = WorksheetFunction.Clean(.cell(iRow, iCol).Range.Text)
    Next iCol
    resultRow = resultRow + 1
    Next iRow
    End With
    resultRow = resultRow + 1
    Next tableStart
    End With


    End Sub[/VBA]

Participate now!

Don’t have an account yet? Register yourself now and be a part of our community!