Extract URLs from website, name of attachment, and save attachment to a folder $80.00

  • I need an Excel file that does the following:


    UPDATE: #1 and #2 can be on the same button if that is easier for coding purposes; that is, when the code extracts the URLs, it may be easiest to grab the file names at the same time.


    1. A button with code that extracts URLs from a given website that is entered in a cell. I only need URLs that link to an attachment (e.g., .PDF, .XLS, .XLSX, .JPEG, .MSG, .DOC, .DOCX, etc.). I may not have listed every file type I will encounter, so the Excel file needs to extract URLs for all attachment file types on a given website. (A rough sketch of #1 and #2 follows this list.)


    2. A button with code that extracts the file name (with extension) as displayed on the website (e.g., the example file attached to this post is named Download Internet Files Automatically - User-Defined File Names - SAMPLE.XLSM, and that is the file name I want written to the Excel file).


    3. A button with code that saves the attachment files (from the website) to a folder selected by the user (see the Download Files button in the attached example file; a sketch of #3 and #5 appears after the note below).


    4. A button to Clear URLs and File Names (see Clear button in the attached example file).


    5. A button to Select a directory (see Select Folder button in the attached example file).
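
    To convey the idea, a rough sketch of what #1 and #2 might look like (late-bound VBA, untested against any real site; the URL cell B1, the output starting at row 5, and the extension list are assumptions, not final requirements):

        Sub ExtractAttachmentURLs()
            Dim http As Object, doc As Object, a As Object
            Dim exts As Variant, ext As Variant, href As String, r As Long

            ' Extensions to treat as "attachments" -- extend as needed
            exts = Array(".pdf", ".xls", ".xlsx", ".xlsm", ".jpg", ".jpeg", _
                         ".msg", ".doc", ".docx", ".ppt", ".pptx", ".zip")

            ' Fetch the page whose address is entered in B1
            Set http = CreateObject("MSXML2.XMLHTTP")
            http.Open "GET", Range("B1").Value, False
            http.send

            ' Parse the response into an HTML DOM
            Set doc = CreateObject("htmlfile")
            doc.body.innerHTML = http.responseText

            r = 5
            For Each a In doc.getElementsByTagName("a")
                href = a.href   ' note: relative links resolve as "about:..." here
                For Each ext In exts
                    ' Keep only links whose URL ends with a known attachment extension
                    If LCase$(Right$(href, Len(ext))) = LCase$(ext) Then
                        Cells(r, 1).Value = href                                  ' URL
                        Cells(r, 2).Value = Mid$(href, InStrRev(href, "/") + 1)   ' file name
                        r = r + 1
                        Exit For
                    End If
                Next ext
            Next a
        End Sub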


    The attached file is ONLY an example to help convey what I need; please create a new Excel file with a similar layout. The attached example pulls all URLs, even ones NOT tied to an attachment, which is NOT what I want: as noted in #1, I only want URLs related to an attachment. The example also only works for PDF files and not other file types; it did not correctly save a PowerPoint file that I tested. The attached example file and its code come from a file I found online, written by Christos Samaras.
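
    For #3 and #5, a minimal sketch along common lines (URLDownloadToFile from urlmon.dll is one widely used way to save a URL to disk; the Declare goes at the top of the module, and the folder cell B2 and column layout are again assumptions):

        ' VBA7 declaration; on pre-2010 Office drop PtrSafe and change LongPtr to Long
        Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" _
            Alias "URLDownloadToFileA" (ByVal pCaller As LongPtr, _
            ByVal szURL As String, ByVal szFileName As String, _
            ByVal dwReserved As LongPtr, ByVal lpfnCB As LongPtr) As Long

        Sub SelectFolder()
            ' #5: let the user pick the target directory and remember it in B2
            With Application.FileDialog(msoFileDialogFolderPicker)
                If .Show = -1 Then Range("B2").Value = .SelectedItems(1)
            End With
        End Sub

        Sub DownloadFiles()
            ' #3: save each listed URL under its listed file name
            Dim r As Long
            r = 5
            Do While Cells(r, 1).Value <> ""
                If URLDownloadToFile(0, Cells(r, 1).Value, _
                        Range("B2").Value & "\" & Cells(r, 2).Value, 0, 0) = 0 Then
                    Cells(r, 3).Value = "Saved"    ' 0 = success
                Else
                    Cells(r, 3).Value = "Failed"
                End If
                r = r + 1
            Loop
        End Sub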

  • Re: Extract URLs from website, name of attachment, and save to a folder $80.00


    Ok. I was hoping to have it within a day, but I can be flexible if it takes longer.


    Quote from S M C;766081

    I will look at this. How much time do you have?

  • Re: Extract URLs from website, name of attachment, and save to a folder $80.00


    I would like it sooner than Sunday; I would like it no later than Thursday or Friday. Thanks.


    Quote from S M C;766088

    How flexible is 'longer'? Can you wait till Sunday?

  • Re: Extract URLs from website, name of attachment, and save attachment to a folder $80.00


    Some questions.


    1. Exactly which website do you want this to work with?
    2. Is it only this one website you need it to work with, or do you need it to work with other websites as well?
    3. For a/the given website, do you want it to scrape URLs and download files for ONLY that specific website URL, or is the given website URL a 'starting point' and you want to scrape URLs and download files for all sibling pages and child pages as well?

  • Re: Extract URLs from website, name of attachment, and save attachment to a folder $80.00


    1. I will be working with multiple websites, but the websites will only be accessible for a user once they log into the main website. I cannot disclose which websites due to privacy protections.


    2. I need it to work with other websites as well - websites that will potentially have attachments. For example, this website (with my post) contains the attachment file named Download Internet Files Automatically - User-Defined File Names - SAMPLE.xlsm.


    3. For a/the given website, I want the Excel application to download files ONLY for that specific website URL.


    Thanks


    Quote from John_w;766127

    Some questions.


    1. Exactly which website do you want this to work with?
    2. Is it only this one website you need it to work with, or do you need it to work with other websites as well?
    3. For a/the given website, do you want it to scrape URLs and download files for ONLY that specific website URL, or is the given website URL a 'starting point' and you want to scrape URLs and download files for all sibling pages and child pages as well?

  • Re: Extract URLs from website, name of attachment, and save attachment to a folder $80.00


    UPDATE: #1 and #2 can be on the same button if that is easier for coding purposes; that is, when the code extracts the URLs, it may be easiest to grab the file names at the same time.


  • Re: Extract URLs from website, name of attachment, and save attachment to a folder $80.00


    Quote from RG26;766132

    1. I will be working with multiple websites, but the websites will only be accessible for a user once they log into the main website. I cannot disclose which websites due to privacy protections.


    Multiple websites means your requirement is too open-ended for me to proceed with this work; the code to handle a specific website would probably have to be specific to that website, i.e. not generic to all websites. Plus, the log-in aspect makes it potentially more difficult if you want the code to do the log-in.


    Therefore open to other developers.


    PS: you might want to look at https://www.httrack.com/, which, with suitable filters and settings, should be able to handle your request; it also has a command-line interface that should be callable from Excel VBA. I can't really help much with it because it's many years since I used the HTTrack UI, and I've never driven it from Excel.
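
    For example, if HTTrack is installed, a call along these lines might work (untested; the install path, output folder, and filter patterns are guesses -- check the HTTrack documentation):

        Sub RunHTTrack()
            Dim cmd As String
            ' Mirror the page in B1 to depth 2, keeping only the listed attachment types
            cmd = """C:\Program Files\WinHTTrack\httrack.exe"" " & _
                  """" & Range("B1").Value & """" & _
                  " -O ""C:\Downloads\Mirror"" -r2 +*.pdf +*.doc* +*.xls*"
            Shell cmd, vbNormalFocus
        End Sub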

  • Re: Extract URLs from website, name of attachment, and save attachment to a folder $80.00


    Ok, thanks for the info. I do not want the application to handle the log-in part because the users will already be logged on, and I also do not want user names or passwords in the code. The example I attached pulls all URLs from any website, not only the URLs associated with an attachment, so I am hoping someone very familiar with connecting to websites via VBA can create the Excel application I requested.
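
    One wrinkle: a plain XMLHTTP request will not carry the browser's logged-in session. A sketch of one common workaround from the IE era -- driving the InternetExplorer object, which shares Internet Explorer's session cookies (this assumes the users log in through IE):

        Sub GetPageViaIE()
            Dim ie As Object, doc As Object
            Set ie = CreateObject("InternetExplorer.Application")
            ie.Visible = False
            ie.navigate Range("B1").Value
            ' Wait until the page has finished loading (4 = READYSTATE_COMPLETE)
            Do While ie.Busy Or ie.readyState <> 4
                DoEvents
            Loop
            Set doc = ie.document
            ' ... scan doc.getElementsByTagName("a") as in the extraction sketch ...
            ie.Quit
        End Sub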


    Quote from John_w;766140

    Multiple websites means your requirement is too open-ended for me to proceed with this work; the code to handle a specific website would probably have to be specific to that website, i.e. not generic to all websites. Plus, the log-in aspect makes it potentially more difficult if you want the code to do the log-in.


    Therefore open to other developers.


    PS: you might want to look at https://www.httrack.com/, which, with suitable filters and settings, should be able to handle your request; it also has a command-line interface that should be callable from Excel VBA. I can't really help much with it because it's many years since I used the HTTrack UI, and I've never driven it from Excel.

  • Re: Extract URLs from website, name of attachment, and save attachment to a folder $80.00


    I don't know if you are still looking for an answer, but as previously stated, the problem is (to use your example) that even though the link for your sample workbook shows the text Download Internet Files Automatically - User-Defined File Names - SAMPLE.xlsm, the actual link is "attachment.php?attachmentid=68388&d=1456280246". As a result it is very difficult to parse the HTML of a dynamic website such as Ozgrid. On a static page you would have more luck, as the anchor links would most likely refer to the true file name. In the case of, for example, this Ozgrid page, you would have to use the HTML DOM and node attributes to get the file name, which is the inner text of the node. As John_w said, a general solution would be extremely difficult to create because there are so many ways to write a web page. If you had a set of web pages whose HTML happened, by some fluke, to be similar, then you could write a solution; even regular expressions would be an option (albeit, in my opinion, not the best one).

    If you were to inspect the source of the pages you want to parse for file names and find they are similar, then perhaps you could get a solution. I am not sure how far you want to pursue it, though it is an interesting problem in that it is a challenge. I think the best approach is to use the DOM, get all the nodes by tag (anchor), and hope the inner text contains the proper file name; a sketch follows below. It is easy to modify your sample workbook to do this, but whether it would work on the pages you are interested in cannot be known unless you let a developer look at their source.

    One final option is to run the program John_w linked to, then parse the downloaded directories on your hard drive for the file extensions of interest and read the results back into an Excel workbook.
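
    A rough sketch of the DOM approach described above (the attachment.php test and the cell layout are specific to this Ozgrid example):

        Sub ScanAnchorsForNames()
            Dim http As Object, doc As Object, a As Object
            Dim nm As String, r As Long

            Set http = CreateObject("MSXML2.XMLHTTP")
            http.Open "GET", Range("B1").Value, False
            http.send

            Set doc = CreateObject("htmlfile")
            doc.body.innerHTML = http.responseText

            r = 5
            For Each a In doc.getElementsByTagName("a")
                nm = Trim$(a.innerText)
                ' The href is an opaque attachment.php?... link, so hope the
                ' visible link text is the true file name (e.g. "... - SAMPLE.xlsm")
                If InStr(1, a.href, "attachment.php", vbTextCompare) > 0 _
                   And InStr(nm, ".") > 0 Then
                    Cells(r, 1).Value = a.href   ' download URL
                    Cells(r, 2).Value = nm       ' display name with extension
                    r = r + 1
                End If
            Next a
        End Sub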

    Regards,
    Anthony

    You have your way. I have my way. As for the right way, the correct way, and the only way, it does not exist.



