PHP - building a scraper - find final destination of redirected URL?

Status
Not open for further replies.

Icecube

Up 24h/day
Mar 14, 2007
Europe
Hi guys,

This scraper has to parse pages where links to external sites are served through a redirect.
I mean, instead of linking straight to example.com, they send you to http://link.thissite.com/some-weird-url&anykind=ofcrap, which then redirects you to example.com.

How can I find out where it redirects to?

Suggestions, links, guides, whatever, all welcome.

Should it be done with HTTP headers?
That's the only way I can imagine, but I've never worked with headers, so again, any suggestion is welcome.
 


Looks like exactly what I need.

Say I want to store the last location in some variable... how would I do it?

Will it put the last location in the scraped array instead of the plain scraped URL? Sounds too good to be true...

I'm still learning curl. I've only just read smaxor's tutorials, so I can't say I really know how to use it yet.
 
Server response headers are the solution.

You have to grep for "Location: (.+?)\r\n". If you're using file_get_contents, use
Code:
$http_response_header

That's an array, IIRC, that PHP fills in after the request.
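
A rough sketch of that idea, in case it helps. The helper name last_location is made up for illustration (it is not a PHP built-in), and it assumes file_get_contents is left on its default behaviour of following redirects, so that $http_response_header holds the header lines of every response in the chain and the last Location line is the final redirect target. That value can be a relative URL, which this sketch does not resolve.

```php
<?php
// Hypothetical helper (not a PHP built-in): return the value of the last
// "Location:" header in a list of raw header lines, or null if none exists.
function last_location(array $headers): ?string
{
    $location = null;
    foreach ($headers as $line) {
        // Header names are case-insensitive; the value runs to the end of the line.
        if (preg_match('/^Location:\s*(.+)$/i', $line, $m)) {
            $location = trim($m[1]);
        }
    }
    return $location;
}

// Usage sketch (needs network access, so it is left commented out):
// file_get_contents('http://link.thissite.com/some-weird-url&anykind=ofcrap');
// $final = last_location($http_response_header);
```

If there was no redirect at all, the helper returns null and the URL you requested is already the final one.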

 
Do you really need the last URL? With FOLLOWLOCATION, curl follows the whole redirection chain and returns the final page, which I presume is what you're after for a scraper.
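
If you do want the final URL in a variable as well, something along these lines should work. It is only a sketch: the function name final_url and the timeout and redirect limits are my own choices, not anything from this thread. It relies on curl_getinfo with CURLINFO_EFFECTIVE_URL, which reports the last URL curl actually requested after following redirects.

```php
<?php
// Hypothetical helper: follow HTTP redirects with curl and return the final
// URL, or null if the request fails. Limits below are arbitrary assumptions.
function final_url(string $url, int $maxRedirects = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_FOLLOWLOCATION => true,          // follow Location: headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // stop on redirect loops
        CURLOPT_RETURNTRANSFER => true,          // do not echo the response
        CURLOPT_NOBODY         => true,          // HEAD request: skip the body
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT        => 15,
    ]);

    if (curl_exec($ch) === false) {
        return null; // network error, timeout, too many redirects, ...
    }

    // Last URL requested after all redirects were followed.
    return curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
}

// Usage sketch (needs network access):
// $final = final_url('http://link.thissite.com/some-weird-url&anykind=ofcrap');
```

Two caveats: this only sees HTTP redirects, not meta refresh or JavaScript ones, and some servers answer HEAD requests differently, so drop CURLOPT_NOBODY if the results look off.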
 
I'm not sure which one I should use. I need to store the final site where the user lands after clicking that link; I don't care what hell of redirections he goes through to get there.

I haven't tried the suggestions yet.
 
The solution has already been spelled out for you. If you use CURL, use my solution. Otherwise, test the other methods given.
 