Good day
I have a folder with 120 pdf invoice files, I need to copy data from each file to excel sheet (data shows qty,price, name of product etc. ).
Are there vba codes which can loop through folder and make copies to excel.
Thanks
Good day
I have a folder with 120 pdf invoice files, I need to copy data from each file to excel sheet (data shows qty,price, name of product etc. ).
Are there vba codes which can loop through folder and make copies to excel.
Thanks
Get the content is the biggest part. Do you have Adobe Acrobat? This is not the Reader version. It is the Pro version.
Adobe Acrobat X1; Version 11.0.18
Thanks for your response
The concept is simple.
1. Batch to get filenames: aFFs() does that.
2. Get the content for each pdf. Using Acrobat, ReadAcrobatDocument() does that.
3. Iterate the filenames and insert the content.
Since I don't have Acrobat on this computer nor a sample of your files, I can not finish (3) for you. There are a few ways to do that. e.g. Split() and Range's Resize.
I could show you how to poke the (2) into a txt file. If you attached 1 or 2 of those txt files, I could then finish (3) for you. For now, view the contents of the Immediate Window (Ctrl+G) after you run Main().
To use this, add the Acrobat object. Alt+F11 to open VBE, Tools > References > Acrobat > OK
Change the path for aFFs() to suit.
Run Main().
Sub Main()
Dim a, e, s, r As Range
a = aFFs("C:\Users\lenovo1\Dropbox\Excel\Outlook\emailPDFinvoices\*.pdf")
For Each e In a
s = ReadAcrobatDocument(e)
Set r = Cells(Rows.Count, "A").End(xlUp).Offset(1)
Debug.Print vbLf, e, vbLf, s, vbLf, vbLf
r.Value = s
Next e
End Sub
'http://www.eileenslounge.com/viewtopic.php?f=30&t=5907
'Add reference: Acrobat
Public Function ReadAcrobatDocument(strFileName As String) As String
'Note: A Reference to the Adobe Library must be set in Tools|References!
Dim AcroApp As CAcroApp, AcroAVDoc As CAcroAVDoc, AcroPDDoc As CAcroPDDoc
Dim AcroHiliteList As CAcroHiliteList, AcroTextSelect As CAcroPDTextSelect
Dim PageNumber, PageContent, Content, I, j
Set AcroApp = CreateObject("AcroExch.App")
Set AcroAVDoc = CreateObject("AcroExch.AVDoc")
If AcroAVDoc.Open(strFileName, vbNull) <> True Then Exit Function
' The following While-Wend loop shouldn't be necessary but timing issues may occur.
While AcroAVDoc Is Nothing
Set AcroAVDoc = AcroApp.GetActiveDoc
Wend
Set AcroPDDoc = AcroAVDoc.GetPDDoc
For I = 0 To AcroPDDoc.GetNumPages - 1
Set PageNumber = AcroPDDoc.AcquirePage(I)
Set PageContent = CreateObject("AcroExch.HiliteList")
If PageContent.Add(0, 9000) <> True Then Exit Function
Set AcroTextSelect = PageNumber.CreatePageHilite(PageContent)
' The next line is needed to avoid errors with protected PDFs that can't be read
On Error Resume Next
For j = 0 To AcroTextSelect.GetNumText - 1
Content = Content & AcroTextSelect.GetText(j)
Next j
Next I
ReadAcrobatDocument = Content
AcroAVDoc.Close True
AcroApp.Exit
Set AcroAVDoc = Nothing: Set AcroApp = Nothing
End Function
'3rd party, pdf2text.exe
'http://www.vbaexpress.com/kb/getarticle.php?kb_id=977
'Manual method using Distiller:
'http://dmcritchie.mvps.org/excel/pdf.htm
'http://www.library.mcgill.ca/edrs/Services/publications/howto/PDFtoXLS/PDFtoExcel.html
'Set extraSwitches, e.g. "/ad", to search folders only.
'MyDir should end in a "\" character unless searching by wildcards, e.g. "x:\test\t*
'Command line switches for the shell's Dir, http://ss64.com/nt/dir.html
Function aFFs(myDir As String, Optional extraSwitches = "", _
Optional tfSubFolders As Boolean = False) As Variant
Dim s As String, a() As String, v As Variant
Dim b() As Variant, I As Long
If tfSubFolders Then
s = CreateObject("Wscript.Shell").Exec("cmd /c dir " & _
"""" & myDir & """" & " /b /s " & extraSwitches).StdOut.ReadAll
Else
s = CreateObject("Wscript.Shell").Exec("cmd /c dir " & _
"""" & myDir & """" & " /b " & extraSwitches).StdOut.ReadAll
End If
a() = Split(s, vbCrLf)
If UBound(a) = -1 Then
Debug.Print myDir & " not found.", vbCritical, "Macro Ending"
Exit Function
End If
ReDim Preserve a(0 To UBound(a) - 1) As String 'Trim trailing vblfcr
For I = 0 To UBound(a)
If Not tfSubFolders Then
s = Left$(myDir, InStrRev(myDir, "\"))
'add the folder name
a(I) = s & a(I)
End If
Next I
aFFs = sA1dtovA1d(a)
End Function
Function sA1dtovA1d(strArray() As String) As Variant
Dim varArray() As Variant, I As Long
ReDim varArray(LBound(strArray) To UBound(strArray))
For I = LBound(strArray) To UBound(strArray)
varArray(I) = CVar(strArray(I))
Next I
sA1dtovA1d = varArray()
End Function
Display More
Did you add the acrobat reference?
Yes I have installed the acrobat reference
Seems like that function worked fine for me when I tested it with an installed cdrom Acrobat Pro v9 probably. It might error if the PDF was password protected or not readable I guess.
Maybe try testing with just one pdf file in the folder generated by Excel. I can't test myself until Wednesday.
Some 3rd party programs can probably get the content, if readable. Most are not free and limited even for a 30 day free trial. If you find one that you want to try and it is free, at least for a while, let me know.
I guess you could use "free" online converters. I don't trust those though. e.g. http://pdftotext.com/
If you know vb.net or c#.net, iTextSharp.dll might be able to get the content. I haven't used it in a while so I don't remember if it can or not. I made a lesson on how to use it once.
Thanks a lot for your interest Mr. Kenneth
I have prepared a PDF that is exported from excel so as to test it .. And I have added acrobat reference
But I got the same error
Thanks a lot Mr. Kenneth
I already selected "acrobat" but after clicking "References" I found it changed to "Adobe Acrobat 10.0 Type Library"
I don't think this was the problem as the trick of using the CSTR is working well and now it is importing data from the PDF file to the excel file ..
Another point: All the data is imported in A2 only .. How can the data be imported .. each piece of data into a cell?
As I said in post #4, I would need the data to know what would be the best way to parse it. See code at end to see how to create those txt files..
If you like, ofuscate the content in the text file(s) and post the smallest and largest text file. Or, record a macro to import one or two of the txt files and post that. If needed, I can then show you how to automate it. If all of the data was structured consistently, parsing is more static and easier to code. Part of proper parsing is to know the data type for the columns. I suspect that your data won't be structured consistently.
As you can see, the txt files will save to your temp folder (Win+R, %temp%, OK) named 1.txt, 2.txt, etc.
Sub Main()
Dim a, e, s$, r As Range, i As Long, p$, fso As Object
a = aFFs("C:\Users\lenovo1\Dropbox\Excel\Outlook\emailPDFinvoices\*.pdf")
p = VBA.Environ$("temp") & "\"
Set fso = CreateObject("Scripting.FileSystemObject")
For Each e In a
s = ReadAcrobatDocument(CStr(e))
i = i + 1
fso.createtextfile(p & i & ".txt").writeline s
Set r = Cells(Rows.Count, "A").End(xlUp).Offset(1)
Next e
Set fso = Nothing
End Sub
Display More
Thank you very much for great and awesome assistance Mr. Kenneth
Best Regards
Thanks MR. Hobson
See attachment of sample pdf file
I am not sure that a copy and paste from the pdf would be the same as the text from acrobat which was the point of post #12. The copy paste and import had 12 columns, A:L in row 2 basically. The last column, L2, had several parts to it.
Mr. Hobson
again thanks for all your help.
What would be the best way of getting the text from sample file “Copy Pall Corp Inv 2”(In Yellow) into excel .
For example if the invoice had 12 line of item name, price etc. would it be possible to get each line into 12 rows with columns for items, price etc. The Purchase order number would be repeated 12 time in one column next to each row.
The current code copy the file contents into one cell in Excel.
As I said, the data has to parsed since it is simply a long string. I need the content to see what would work best. Ergo, my request for the acrobat content from the text files created in my last macro post. There is no sense parsing a copy/paste of the content for the PDF that you attached as it may vary from the acrobat content.
There are several methods one can use to parse data. I might parse by location/position, key words, etc.
So, from your PDF file, what part(s) did you want? You marked in yellow highlight in the PDF. Can you make an xlsx with columns and values below from the PDF that you want? Would all the PDF's be structured the same? Obviously, one would think so...
Too bad your PDF is not an FDF, form. Those data fields are more readily imported.
Mr. HOBSON
I save 2 pdf invoices to excel ; see tabs (“Save to excel Many” and “Save to excel one”).
The invoices are all the same format in PDF
The tab “Ideal data’ is how I would love my final data in excel to look.
All PDf invoice in folder would be summarize in one excel tab; showing.
PO#
Invoice #
Invoice date
Line
Item code
Product
Ordered
Shipped
Balance
Price
Total
THANKS
I did not forget about you. It may be a day or three before I get to this and hopefully get something close for that one file anyway. It just depends on how it goes when I start.
I was able to get the content as shown in the sheet 1.pdf. From here, it is a matter of figuring out how to parse it. Many on the forums should know how to do that. The hard part is the table as it is not all structured the same. I can envision that some Description columns might only have one line or maybe more than two per item's "row". The other fields are easy to parse.
For giggles, I tried the FDF field method but no luck there.
Thanks for all your help Mr Hobson
What about looping through folder with PDF files save as word files,,, then loop through word files and copy tables to excelrow by row, column by column.
The below code does not work... the attempt here is to loop through folder with word docx and copy tables to excel tab. any help on fixing code is appreciated.
[VBA]Sub ZxZXZXZOpnWord()
Dim WordDoc As String
Dim WdDoc As Variant
Dim oAPP As Object
Dim tableStart As Integer
Dim TableNo As Integer 'table number in Word
Dim wdhrd As Word.Document
path = "m:\FmmPDFtest\"
file = Dir(path & "*.docx")
Set oAPP = CreateObject(Class:="Word.Application")
oAPP.Visible = True
WdDoc = oAPP.Documents.Open(path & file)
With WdDoc
TableNo = WdDoc.tables.Count
tableTot = WdDoc.tables.Count
resultRow = 4
For tableStart = 1 To 1 ' tableTot
With .tables(tableStart)
'copy cell contents from Word table cells to Excel cells
For iRow = 1 To .Rows.Count
For iCol = 1 To .Columns.Count
Cells(resultRow, iCol) = WorksheetFunction.Clean(.cell(iRow, iCol).Range.Text)
Next iCol
resultRow = resultRow + 1
Next iRow
End With
resultRow = resultRow + 1
Next tableStart
End With
End Sub[/VBA]
Don’t have an account yet? Register yourself now and be a part of our community!