Web Scraping city code question, specific site

  • I can't for the life of my re-code a custom project of ours to deal with changes on the following website. I'm not 100% sure what the developer was going after in the code below, but maybe the site changed?? I removed part of the code that used to attempt to scrape whether the number is a cell or landline that doesn't appear to be available any longer. I hope I didn't mess anything up.


    The code doesn't error. It's just that the result is ALWAYS the same data = "Call" << That's what it scrapes for every record now.


    The following function queries a page like this https://check-caller.net/1-240505-0-index, from check-caller.net.




    Using the html code there's two possible areas to scrape the data I need. (CITY)


    1) <head>
    <title>240-505-00## Silver Spring &#x2714; CALLER ID &#x2714; CALLER NAME check-caller.NET</title>


    2)<div class="entry-content">
    You want to know who called you? You want to know if the number 240-505-00## is safe or unsafe? check-caller.NET is a Real Time Directory with Caller ID lookup: White Pages & Yellow Pages. Add your feedback and your Voting about your call, please!
    <br>Call from Silver Spring, MD / Maryland <div class="center">



    If anyone knows the code to scrape the city from one or both locations above, please let me know. :)

  • Re: Web Scraping city code question, specific site


    The code below extracts the city from the <title> tag within the <head> tag, and is based on a new discovery when assigning the responseText from a XMLhttpRequest to a HTMLDocument.


    I'll leave it to you to incorporate the above method into your code.


    Analysis of two different methods of assigning the responseText string from a XMLhttpRequest to a HTMLDocument

    1. The usual method, using early binding of the HTMLDocument object, is:


    Code
    Dim HTMLdoc As HTMLDocument  'Or As Object
        Set HTMLdoc = New HTMLDocument
        HTMLdoc.body.innerHTML = XMLHttpRequest.responseText

    However, because that is assigning the responseText to the body element it results in an empty <head> tag and the <title> tag is lost:


    HTML
    <HTML><HEAD></HEAD>
    <BODY>


    You might think that assigning the responseText to the parent of the body element, i.e. the main html element itself, would work:


    Code
    HTMLdoc.body.parentElement.innerHTML = XMLHttpRequest.responseText
    'Or
    HTMLdoc.DocumentElement.innerHTML = XMLHttpRequest.responseText

    but both give the error "Could not set the innerHTML property. Invalid target element for this operation."


    2. Another method of assigning the responseText to a HTMLDocument uses late binding of the HTMLDocument object and the Write method:


    Code
    Dim HTMLdoc As Object
        Set HTMLdoc = CreateObject("HTMLfile")
        HTMLdoc.Open
        HTMLdoc.write XMLHttpRequest.responseText
        HTMLdoc.Close

    This time the <head> and <title> tags are fully populated in the HTMLDocument object:


    HTML
    <!DOCTYPE html>
    <html lang="en">
    <head>
    <title>240-505-00## Silver Spring &#x2714; CALLER ID &#x2714; CALLER NAME check-caller.NET</title>
    <meta name="description" content="Find out who owns the number 240-505-00## and some special information &#x2714; Click here to find out who calls you." />
    :
    :
    </head>

    and we can extract the title tag text and parse the city from it.

  • Re: Web Scraping city code question, specific site


    Thanks so much for trying John! I hope it can help others down the road that are further ahead than myself. :)


    I couldn't even get my URL query string to work without a syntax or object defined error (i.e. querys = "http://www.check-caller.com/1-" & ThisWorkbook.Sheets("PhoneToCity").Cells(x, 4).Value & ThisWorkbook.Sheets("PhoneToCity").Cells(x, 5).Value & "-" & ThisWorkbook.Sheets("PhoneToCity").Cells(x, 7).Value & "-index")


    I tried every combination I could think of multiple double quotes, single quotes and &s. :(


    After giving up and using the fixed URL you used in your example I get (run-time error '1004': Application-definded or object defined error) when I try to get city into the worksheet using... ThisWorkbook.Sheets("PhoneToCity").Cells(x, 11).Value = city


    It's probably all for not anyway, as the only way I could even get it to begin execution was to change the name from a sub to a function. And I'm certain this is not simply interchangeable!


    Sorry to take up your time John. fwiw here's the code I was trying at the end of my efforts...



  • Re: Web Scraping city code question, specific site


    Quote from wittonline;796878

    After giving up and using the fixed URL you used in your example I get (run-time error '1004': Application-definded or object defined error) when I try to get city into the worksheet using... ThisWorkbook.Sheets("PhoneToCity").Cells(x, 11).Value = city

    Because the x variable is undefined and therefore zero. You should put Option Explicit at the top of the module to trap these sort of errors.


    No reason why this can't be a Sub procedure, as long as it has a unique name.

  • Re: Web Scraping city code question, specific site


    What stinks is that I feel that I'm still a mile away, but then when I replace the string below with 'MsgBox City' your code IS pulling the city; so we're not that far off...I don't think. :smash:


    Here's what I've tried. I went back to a Sub and a traditional 'Call' in the main Macro, mainphone, below...


    I even tried playing with this...

    Code
    With ThisWorkbook.Sheets("PhoneToCity")
           .Range(.Cells(x, 11)).Value = city
        End With


    But still get (run-time error '1004': Application-definded or object defined error) as well as on the original...

    Code
    ThisWorkbook.Sheets("PhoneToCity").Cells(x, 11).Value = city


    See that I added Option Explicit. I can send an abridged version of the source file if that will help. :)


    And I haven't been able to avoid a syntax or object defined error using my URL query string...

    Code
    URL = "http://www.check-caller.com/1-" & ThisWorkbook.Sheets("PhoneToCity").Cells(x, 4).Value & ThisWorkbook.Sheets("PhoneToCity").Cells(x, 5).Value & "-" & ThisWorkbook.Sheets("PhoneToCity").Cells(x, 7).Value & "-index"



  • Re: Web Scraping city code question, specific site


    UPDATE: I think that the other macros work, because they're Functions which correlate to Sub mainphone() and see the progression values 'For x = 2 To 5'. Normally I use a popup to define y and I use 'For x = 2 To y' where y = the total number rows of data.


    So I either need to use a Function or somehow get your code to refer to the 'For x = 2 To y' formula. Even if I'm off base RE: The Macro vs. Function part, the key is that this is where x is defined and it progresses through one row after the other until y is reached.

  • Re: Web Scraping city code question, specific site


    Obviously I wasn't explicit enough. If x = 0 then Cells(x, 11) is an invalid cell reference, hence the error. Remember, it is Cells(row number, column number) and there is no such row number on a sheet as row 0.


    Put x = 1 somewhere before that line to define the x variable and then the line should work. But, as I said, I'll leave it to you to incorporate my code to suit.

  • Re: Web Scraping city code question, specific site


    I knew it wouldn't work, but I was curious if it would for the first record in row 2 and it did! The problem that I've been trying to communicate is that x needs to continue to grow based on the number of records, so like the other macros(functions) it needs to refer to the main macro 'mainphone' so it knows to go through the list of macros(functions) one for each site, then start over with x=3, go through the list again (sites) and then x=4 and so on...


    This is what I'm having trouble with. Just adding x=2 works perfect for the data on row 2, but when it comes around again and now it needs to look at the data on row 3, x=3 and so on...

Participate now!

Don’t have an account yet? Register yourself now and be a part of our community!