
How to acquire data from a website via HTTP calls to a SQL web tool

Comments

1 comment

    Christina Costello

    Hi Mark, just posting the resolution that you and Stony came up with here so that others can use it.

    Essentially, this task requires creating a string of Filter and HTTP nodes. The Filter node sets up the URL and BODY you need, and the HTTP node then sends that request, much as a browser would.

    If the website has an official API for retrieving the data, the task is easier. If there is no formal API, you will need to simulate the steps a desktop browser goes through to retrieve the data.
    The first difficulty you will encounter is how to log in to the site. Any given website may involve any one (or more) of these:

    1. Open website, no credentials
    2. Trivial login – username/password only
    3. Secured trivial – where they’ve added an SSL certificate that you must install locally before you can access the site
    4. Secured access using username/password within the headers of the HTTP standard
    5. Secured access using username/password in the body of a HTTP POST request
    6. You make one request to the site, and get back a cookie that grants you access
    7. You make one request to the site, and get back a session variable that is then used to go deeper
    8. OAuth1, OAuth2
    9. Kerberos
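
    Schemes 4 and 5 differ only in where the credentials travel. Here is a minimal sketch using Python's standard library; the base URL, paths, and form field names are hypothetical placeholders, so substitute whatever your target site actually uses:

    ```python
    import base64
    import urllib.parse
    import urllib.request

    BASE = "https://example.com"  # hypothetical target site

    def basic_auth_request(user: str, pw: str) -> urllib.request.Request:
        # Scheme 4: credentials travel in the Authorization header
        # of the HTTP standard (Basic auth: base64 of "user:password").
        token = base64.b64encode(f"{user}:{pw}".encode()).decode()
        return urllib.request.Request(
            BASE + "/data", headers={"Authorization": f"Basic {token}"}
        )

    def post_body_request(user: str, pw: str) -> urllib.request.Request:
        # Scheme 5: credentials travel as form fields in the POST body.
        # The field names ("username"/"password") vary per site.
        body = urllib.parse.urlencode({"username": user, "password": pw}).encode()
        return urllib.request.Request(BASE + "/login", data=body, method="POST")

    # Inspecting the built requests shows where the credentials end up:
    hdr_req = basic_auth_request("alice", "secret")
    body_req = post_body_request("alice", "secret")
    ```

    In graph terms, a Filter node would play the role of assembling these URL and BODY strings, and the HTTP node would send them.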


    Often, this task requires a string of 3 to 5 HTTP nodes to make it work. But it's different for every website, and there will be a fair bit of trial and error involved.
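
    For the cookie and session schemes (6 and 7 above), the key point when chaining requests is that every step must share state, the way a browser session does. A rough sketch with Python's standard library, using hypothetical URLs (the actual network calls are left commented out):

    ```python
    import http.cookiejar
    import urllib.request

    # One cookie jar shared by the whole chain: any Set-Cookie returned by an
    # earlier step is automatically replayed on every later step.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

    # Hypothetical chain of three requests, mirroring a string of HTTP nodes:
    # opener.open("https://example.com/login-page")          # step 1: receive cookie
    # opener.open("https://example.com/login", data=b"...")  # step 2: authenticate
    # opener.open("https://example.com/export")              # step 3: fetch the file
    ```

    If each HTTP node instead opened a fresh connection with no shared cookies, the site would treat every request as a brand-new anonymous visitor.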

    The next difficulty is that several sites make it hard to reach your files even after you've logged in. That will likely require another round of coding to retrieve the correct file. In Google Chrome, with no page open, open the Developer Tools (F12) -> Network tab, then go through the full set of steps to download a file. The requests Chrome traces there are most likely the steps you need to emulate with discrete nodes. You can obviously ignore the requests for images and logos, but the login steps should be listed there.
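
    Once you have a traced request from the Network tab, emulating it usually comes down to reproducing the headers the site actually checks (right-clicking an entry and choosing "Copy as cURL" shows exactly what the browser sent). A hedged sketch, where the URL, cookie, and header values are all hypothetical stand-ins for whatever Chrome recorded:

    ```python
    import urllib.request

    def replay_traced_request(url: str, session_cookie: str) -> urllib.request.Request:
        # Reproduce the headers Chrome recorded for the download step; sites
        # commonly check at least the User-Agent and the session cookie.
        return urllib.request.Request(url, headers={
            "User-Agent": "Mozilla/5.0",   # present as a desktop browser
            "Cookie": session_cookie,      # carried over from the login step
        })

    # Both values below are hypothetical examples:
    req = replay_traced_request("https://example.com/export/inventory.csv",
                                "sessionid=abc123")
    ```

    The request object can then be sent with `urllib.request.urlopen(req)` once the values match your trace.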

    Note: you should plan for the fact that any of the schemes above will likely only stay “alive” for 6–18 months. The website will probably make alterations, and you'll often have to start over with coding the logic, so keeping it working in the long term will require ongoing maintenance.

    The attached graph provides an example of this; it works for a single website. The website in this example requires a CSRF “token” before you can actually issue the login request. Your target website might not require that.
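
    As a rough illustration of that CSRF step (the field name "csrf_token" and the form layout here are hypothetical; inspect your site's login form for the real hidden field):

    ```python
    import re
    import urllib.parse

    def extract_csrf_token(html: str) -> str:
        # Step 1: the login page typically embeds the token in a hidden
        # <input> field; scrape it out of the page source.
        m = re.search(r'name="csrf_token"\s+value="([^"]+)"', html)
        if not m:
            raise ValueError("no CSRF token found on the login page")
        return m.group(1)

    def build_login_body(user: str, pw: str, token: str) -> bytes:
        # Step 2: the token must be echoed back alongside the credentials
        # in the login POST, or the site rejects the request.
        return urllib.parse.urlencode({
            "username": user, "password": pw, "csrf_token": token,
        }).encode()

    # Hypothetical login-page fragment, standing in for the first GET:
    sample = '<form><input type="hidden" name="csrf_token" value="t0k3n"></form>'
    token = extract_csrf_token(sample)
    body = build_login_body("alice", "secret", token)
    ```

    In the graph, this is two chained nodes: one HTTP node fetches the login page, and a Filter node extracts the token and builds the body for the login POST.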

     

    Attached files

    Shop_Inventory.brg

     

