Copy text from a opened PDF File to Excel

  • Hi,
    Im trying to copy text from a PDF File to an excel workbook, I have managed to open the PDF File with a userform the user selects the file to open using Application.GetOpenFilename then I use a Function I found to open the PDF file, so far so good, but then when I try to use sendkeys to select all and copy it doesnt work, this file ( Macro ) is going to be used on several computers on several places so I dont know the path of the program used to open the PDF File, any ideas on how to select and copy the text from the PDF File ? Thanks in advance !


    Code
    Public Bitacora
    Dim Bitacora2 As String
    Sub Bitacora_a_Excel()
    Application.ScreenUpdating = False
    UserForm1.Show
    Bitacora2 = Bitacora
    Call OpenAnyFile(Bitacora2)
    Application.Wait (Now + TimeValue("0:00:10"))
    SendKeys ("^a")
    SendKeys ("^c")


    This if the code in the user form


  • Hi Kenneth thaks for your reply, the pdf files that I´m trying to select and copy are all text and have the same format, if I open manually one of these PDF with Adobe Reader I can select all and copy using CTR + A and CTRL + C, but I want to automatize these process for the final user, by the way I cant use a third party program.


    This is the function I use to open the PDF file, but after the file is opened I cant select and copy, the sendkeys ("^a") select the text (code) inside vba instead of selecting the text in the PDF file, watching your code on the other thread I see or at least I think that I need the path of the adobe reader, but the thing is that this excel file is going to be used by several final users on several computers and I dont know the path of their adobe readers, it is posible to track or identify the path after the PDF file is opened, after the "objshell.open" line ? so I can use it on the code of the other thread.


    Code
    Function OpenAnyFile(strPath As String)
      Set objshell = CreateObject("Shell.Application")
      objshell.Open (strPath)
      End Function


    Regards Alex

  • I've done some PDF extraction in the past and it can get tricky.


    As you've indicated you can use SendKeys but that works within Excel rather than Acrobat.


    The sequence of events you need are:
    1. Launch the PDF file (open it)
    2. Activate the opened PDF file
    3. SendKeys to highlight and copy all
    4. Re-activate the Excel workbook that will store the pasted content
    5. Dump the copy into an Excel sheet (presumably)
    6. Close the PDF file


    To launch the PDF file, I use

    Code
    ActiveWorkbook.FollowHyperlink "\path\filename.pdf", NewWindow:=True


    To activate the PDF file gets a little trickier as it depends on whether Acrobat Reader or Acrobot Pro is installed.


    I have used something like the following to determine if Acrobat Reader or Acrobat Pro is installed. This seems to work OK with the later versions of Acrobat.



    Once the PDF file is activated the SendKeys commands can be used.
    Again, this can get tricky. If the Page Display in Acrobat is set by default to Enable Scrolling, then Ctrl+A will highlight the entire file, otherwise only the active page.
    One way around this is to send the Shift+Ctrl+H key sequence to put acrobat into auto-scroll mode.


    Then send the Ctrl+A and finally Ctrl+C key commands.


    Timing can get tricky too so best to put some wait commands in.


    I use the following code to close the PDF file, which again requires using the FindWindow code.


    To put that altogether, you will also need to know the caption of the workbook where you want to paste the data in order to re-activate the window.


    To get the window caption, use this on the workbook you want to paste data to:

    Code
    sCaption = wbPDF.Application.Caption


    wbPDF is the workbook object for where the data will be pasted into


    Putting that altogether is along the following lines:


    The JustFileName code is as follows:

    Code
    'returns just the filename from the fully qualified filename
    Function JustFileName(sFullPath As String) As String
      JustFileName = sFullPath
      If InStrRev(sFullPath, "\") = 0 Then Exit Function
      
      JustFileName = Mid(sFullPath, InStrRev(sFullPath, "\") + 1)
    End Function



    As with all code involving esoteric functions that manipulate other applications and SendKeys, this may or may not work for you.


    I have used code similar to this on a limited basis and it seems to work on the specific files I have needed it for and using the specific version/s of Acrobat I've been able to test it on.

Participate now!

Don’t have an account yet? Register yourself now and be a part of our community!