PHP/cURL / Perl coders.

Status
Not open for further replies.

kasodo

New member
Sep 7, 2006
Before I go to a freelance site to find someone for this project, I thought I would check the community for an experienced PHP/cURL / Perl coder. The project involves scraping a website along with a number of other tasks, and it will require a very solid background in these languages. PM me for more details.
 


If this helps, I'm releasing some stuff similar to what you're wanting:
Blue Hat SEO-Advanced SEO Tactics » Complete Guide To Scraping Pt. 2 - Crawling
That is a universal site crawler. Coming out in the next few days will be a universal site scraper that works in conjunction with that script. I can't give away too much, but it'll let you do something along the lines of
$var1 = "grab any text between here";
$var2 = "and here";
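For anyone wondering what "grab any text between here and here" might look like in practice, here is a minimal sketch. It is not the unreleased scraper; the function name text_between() and the sample page are made up for illustration, and it only handles the first match.

```php
<?php
// Hypothetical helper (not from the Blue Hat SEO script): returns the text
// found between two marker strings, or null when either marker is missing.
function text_between(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;                 // start marker not found
    }
    $from += strlen($start);         // skip past the start marker itself

    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return null;                 // end marker not found after the start
    }
    return substr($haystack, $from, $to - $from);
}

// Made-up sample input standing in for a fetched page.
$var1 = '<title>';
$var2 = '</title>';
$page = '<html><head><title>Example page</title></head></html>';

echo text_between($page, $var1, $var2), "\n";   // prints "Example page"
```

Plain string searching like this is brittle when the markup changes, so treat it as a starting point rather than a finished scraper.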

So you may not have to hire anyone. Good luck in your quest though.
 
This is more of a script for automating a login process and then performing actions once logged in. Scraping is only a minor part of the script. You can make pretty good money using what I want made.
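To give an idea of the login part, here is a rough sketch using PHP's cURL extension. The URLs and form field names are placeholders I made up, since the real site isn't named here, and login_options() and login_and_fetch() are hypothetical helper names. It assumes a plain form login with session cookies and no CSRF token or JavaScript.

```php
<?php
// Build the cURL options for submitting a login form.
// The field names passed in $fields are whatever the target form expects.
function login_options(string $loginUrl, array $fields): array
{
    return [
        CURLOPT_URL            => $loginUrl,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        CURLOPT_COOKIEFILE     => '',    // empty string turns on the in-memory cookie engine
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow the post-login redirect
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 30,
    ];
}

// Log in, then fetch a second page on the same handle so the session
// cookies from the login response are sent along with it.
function login_and_fetch(string $loginUrl, array $fields, string $nextUrl)
{
    $ch = curl_init();
    curl_setopt_array($ch, login_options($loginUrl, $fields));
    curl_exec($ch);                               // step 1: submit the login form

    curl_setopt_array($ch, [
        CURLOPT_URL     => $nextUrl,
        CURLOPT_HTTPGET => true,                  // switch back from POST to GET
    ]);
    return curl_exec($ch);                        // step 2: page body, or false on failure
}

// Example call with placeholder values (commented out so nothing hits the network):
// $html = login_and_fetch(
//     'https://example.com/login',
//     ['username' => 'me', 'password' => 'secret'],
//     'https://example.com/account'
// );
```

Any further "actions" would be more requests on the same handle, which is why the cookie engine is switched on up front.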
 
For building a desktop scraper, I recommend the Chilkat Spider ActiveX control, and it's free!

Here's some info about it:

Spider ActiveX Component for Crawling the Web
  • Crawl a single website.
  • Accumulate outbound links for crawling other websites.
  • Cache pages so future crawls can fetch from cache.
  • Robots.txt compliant.
  • Fetch the HTML content of each page crawled.
  • Able to crawl HTTPS pages.
  • Define "avoid" patterns to avoid URLs matching specific wildcard patterns.
  • Define "avoid" patterns for avoiding matching outbound links.
  • Read and connect timeouts.
  • Maximum URL size to avoid ever-growing URLs.
  • Maximum response size to avoid pages with very large or infinite content.
  • Wind-down count to set a limit on pages spidered per site.
Using this control I've built huge databases of scraped content. Perfect for BH projects. But you still need to run the scraped content through a Markov chain (or a similar method) if you want it to rank.
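If you would rather stay in PHP, a few of the features listed above (wildcard "avoid" patterns, a maximum URL size, and a cap on pages per site) can be approximated with something like the sketch below. This is not the Chilkat API; should_crawl() and select_urls() are names invented for this example, and the limits are arbitrary.

```php
<?php
// Decide whether a URL is worth crawling.
// $avoidPatterns are shell-style wildcards, e.g. "*/login*" or "*.pdf".
function should_crawl(string $url, array $avoidPatterns, int $maxUrlSize = 200): bool
{
    if (strlen($url) > $maxUrlSize) {
        return false;                    // guards against ever-growing URLs
    }
    foreach ($avoidPatterns as $pattern) {
        if (fnmatch($pattern, $url)) {
            return false;                // URL matches an avoid pattern
        }
    }
    return true;
}

// Pick at most $maxPages crawlable URLs from a list, keeping their order.
function select_urls(array $urls, array $avoidPatterns, int $maxPages): array
{
    $selected = [];
    foreach ($urls as $url) {
        if (count($selected) >= $maxPages) {
            break;                       // per-site page limit reached
        }
        if (should_crawl($url, $avoidPatterns)) {
            $selected[] = $url;
        }
    }
    return $selected;
}

// Made-up sample data.
$avoid = ['*/login*', '*.pdf'];
$found = [
    'http://example.com/',
    'http://example.com/login.php',
    'http://example.com/report.pdf',
    'http://example.com/about',
    'http://example.com/contact',
];
print_r(select_urls($found, $avoid, 2));
```

With the sample data this keeps the home page and the about page, then stops at the two-page cap. Robots.txt handling, caching and timeouts are left out to keep the sketch short.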
 