HTTP node: NoHttpResponseException
Dear community,
I can't get the following to work in either LAE 6.1 or Data360 Analyze (both give the same error message):
I'm trying to use the GET method in the HTTP node to check around 30,000 URLs, all from the same domain. All I want to know is the status message for each one (such as 200 OK).
Some of these URLs are 'bad' and consistently generate the following error message:
"A connection error occurred for GET request to http://XXXXXX: Details: org.apache.http.NoHttpResponseException: XXXXX failed to respond.
Error Code: brain.node.http.transportProtocolError"
If I paste such a 'bad' URL into a web browser, it gives a response like "Hmmm... can’t reach this page".
So what I would expect is for the HTTP node NOT to just stop in the middle of processing 30,000 URLs, but to simply continue and collect such errors on a second output pin (for example). Unfortunately, that functionality does not exist (yet); a second output pin cannot even be added...
Is there perhaps another way to solve or work around this issue? The thing is: I have no way of knowing beforehand whether a URL is 'bad'.
Any help would be greatly appreciated!
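The behaviour described above, where failed URLs are collected separately instead of aborting the whole run, can be sketched in plain Python. This is a minimal illustration, not an existing HTTP node feature: `partition_urls` and its `check` argument are hypothetical names, and the two returned lists merely stand in for the "normal" and "error" output pins.

```python
from typing import Callable, Iterable, List, Tuple


def partition_urls(
    urls: Iterable[str],
    check: Callable[[str], int],
) -> Tuple[List[Tuple[str, int]], List[Tuple[str, str]]]:
    """Run `check` on every URL and never stop on a failure.

    Returns two lists, mimicking two output pins:
      ok:     (url, status_code) for every URL where `check` returned
      errors: (url, error_text) for every URL where `check` raised
    """
    ok: List[Tuple[str, int]] = []
    errors: List[Tuple[str, str]] = []
    for url in urls:
        try:
            ok.append((url, check(url)))
        except Exception as exc:  # any failure is routed to the "error pin"
            errors.append((url, f"{type(exc).__name__}: {exc}"))
    return ok, errors


if __name__ == "__main__":
    # Hypothetical stand-in checker so the sketch runs without network access.
    def fake_check(url: str) -> int:
        if "bad" in url:
            raise ConnectionError(f"{url} failed to respond")
        return 200

    good, bad = partition_urls(
        ["http://example.com/a", "http://example.com/bad", "http://example.com/b"],
        fake_check,
    )
    print(good)  # [('http://example.com/a', 200), ('http://example.com/b', 200)]
    print(bad)   # [('http://example.com/bad', 'ConnectionError: http://example.com/bad failed to respond')]
```

In a real flow, `check` would be a function that actually performs the GET request; the key design point is that the `try`/`except` sits inside the loop, so one unreachable host costs one error row rather than the whole run.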
Best regards, Bart.
-
I deleted my first reply and replaced it with this data flow. It gives three different options for checking whether a web page exists:
1. Using the subprocess module: it executes a curl command and returns its output.
2. Using the os module: another curl command, but you won't see its output, just a return code.
3. Using urllib2: probably the most correct way to do it. You must set the DontRunInContainer parameter for it to work; the documentation describes how to do this in more detail.
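Option 3 might look roughly like the following. This is a minimal sketch, not the attached data flow: it assumes Python 3, where urllib2 was merged into `urllib.request` (under Python 2 the equivalent calls are `urllib2.urlopen`, `urllib2.HTTPError` and `urllib2.URLError`), and `check_status` is a hypothetical helper name. It still assumes the DontRunInContainer setting from option 3 is in place so the node is allowed to make outbound connections.

```python
import http.client
import urllib.error
import urllib.request
from typing import Optional, Tuple


def check_status(url: str, timeout: float = 10.0) -> Tuple[Optional[int], Optional[str]]:
    """Return (status_code, None) if the server answered, else (None, error_text).

    HTTP error statuses such as 404 or 500 still count as an answer.
    """
    try:
        # Request() is built inside the try so malformed URLs are also caught.
        req = urllib.request.Request(url, method="GET")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, None
    except urllib.error.HTTPError as exc:
        # The server responded with a 4xx/5xx code; urllib raises for these.
        return exc.code, None
    except (OSError, http.client.HTTPException, ValueError) as exc:
        # No usable response: DNS failure, refused or reset connection,
        # timeout, an empty reply (the "failed to respond" case, raised as
        # http.client.RemoteDisconnected) or a malformed URL.
        return None, f"{type(exc).__name__}: {exc}"
```

Because `check_status` never raises, calling it in a loop over all 30,000 URLs yields one row per URL, either with a status code or with an error text, instead of stopping at the first unreachable page.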
Hope this helps!