Copying content from web page creates error with or without .Document

  • I have looked at over 50 posts through various forums, working through at least 5-6 examples, and continue to get an error.

    Of those one was: Pull Web Page Into Worksheet


    I would be happy if any web browser worked, so if you have an existing code that would make your response easier with a preferred browser, don't let my first attempts with IE deter you.


    The main barrier that I predict is that it has multiple pause points and differing url's for them before the content I want shows up. It's an intranet site but I was able to get it to still pull in a cell from the final site but not .


    URL 1 is an intranet search engine, enter value into input box, click and it sends you to URL 2 (which has search string added on) but is just progress bars, URL 3 (with unique tail each search) is now loaded with the table (or whole page is fine) that I need copied.


    They follow this formatting:

    URL 1 --- https://abc.XYZ.org/.../v12/

    URL 2 --- https://abc.XYZ.org/.../cgi/..…engines=Bing&flags=genius

    URL 3 --- https://abc.XYZ.org/.../v12//main.html?path=110746_29570


    So I've attempted where I control IE and enter in the input box and click as well as where I just create the URL 2 as a search string. I can get URL 3 to come up as expected but idk if it's because it's a new URL (not new window) it's not seeing it as the active object to copy from?


    I've tried so many things outside of code as well, clearing cache/cookies, thinking maybe I don't have the right reference library I am now up to this mess...



    I have reviewed concepts of capturing a url from an existing window, copying by tags and class, etc. But I keep getting errors like:


    Method 'Busy' of object 'IWebBrowser2' failed, or if I lead it with Document then that is the method it stops on.


    Do While objIE.Document.ReadyState <> 4 Or objIE.Busy = True: DoEvents: Loop

    Do Until objIE.Document.ReadyState = 4: DoEvents: Loop


    Automation error


    Unspecificed error



    I cannot add extensions to Chrome as they are blocked by admin as well as cannot install things or I'd attempt to get a macro recorder for my browser.



    Thank you for all your help.



    I'll paste code here, but since I can't give you the site, idk how much use it will be other than to see some of the different approaches I tried. Here's a blank workbook with the code inside: CopyWebContent.xlsm


    This site may be an ok surrogate for URL 3: http://disclosure.bursamalaysi…Access/viewHtml?e=2891609




  • Have you attempted any scraping with Power Query?


    External Content www.youtube.com
    Content embedded from external sources will not be displayed without your consent.
    Through the activation of external content, you agree that personal data may be transferred to third party platforms. We have provided more information on this in our privacy policy.

  • Thanks, Alan! I was digging into this a bit since my post. And I'm 90% sure if I can get past this ready state issue, capturing the newest url, and possibly the credential requirements, I won't have any trouble at all getting capturing I need from the URL 3... Ok, maybe that's still a lot to tackle.


    Also in my perusing solutions I also appreciated from Pike regarding pulling the table:

    VBA retrieve Internet Explorer HTML Web Table


    I guess I missed the boat on realizing there is a "Get Data" ... "From Web" option under the Data tab. Obvious to those who it is obvious to, but not those of use who all the options in the ribbon still can kind of blur together. For other newbs like myself, this is how I got some assist in recording a macro for the finer details.

  • Okay, so I feel like I've tried a bajillion things and was attempting to think outside the box so I could at least try my other code.


    Almost there:

    ready state issue, found workaround

    capturing the newest url, got it

    credential requirements, non applicable



    So this works well enough, and with my intranet urls it can pull it up the first time and it's fine so I know it's not a credentials thing.


    I put this code with the surrogate URL.


    This portion:

    Code
    "OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=""http://disclosure.bursamalaysia.com/FileAccess/viewHtml?e=2891609"";Extended Properties="""""), Destination:=Range("$A$11")).QueryTable


    With mine being like this:

    URL 1 --- https://abc.XYZ.org/.../v12/

    URL 2 --- https://abc.XYZ.org/.../cgi/..…engines=Bing&flags=genius

    URL 3 --- https://abc.XYZ.org/.../v12//main.html?path=110746_29570

    Then gets split:


    Code
    "OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=""https://abc.XYZ.org/.../v12//main.html?path=" _
    , "=110746_29570"";Extended Properties="""""), Destination:=Range("$A$11")).QueryTable


    Which actually could be cool if it recognized strings. What am I missing that it won't allow me to put a string (URLtail) here or any of the other portions where it is putting entire URLs? If I take out any quotations it gets immediately upset expecting a list separator.


    Thanks again for your thoughts!


    Code
    "OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=""https://abc.XYZ.org/.../v12//main.html?path=" _
    , "URLtail"";Extended Properties="""""), Destination:=Range("$A$11")).QueryTable





Participate now!

Don’t have an account yet? Register yourself now and be a part of our community!