Extend Node.js module for web page scraping and interaction

Closed · Posted May 8, 2015 · Paid on delivery

We need a robust web scraping/interaction module that's easy to use from code and uses ES6 ("Harmony") Promises.

-----USE CASE-----

We plan to scrape 10-20 websites regularly and to automate routine interactive tasks on several hundred more. Our current solution uses PhantomJS through the node-phantom-simple wrapper: it interacts with the web page and returns data by executing scripts within the page's JavaScript context. It is serviceable, but our interaction scripts, which will soon number several hundred, need to be as concise as possible, so the module needs more features to become a comprehensive solution.

Unlike a traditional web scraping system, this module's primary goal is to interact with web pages (for example, to automate account signups) rather than to extract data; scraping is just a handy side benefit.

-----TECHNICAL REQUIREMENTS-----

This module needs to be able to handle at least the following, but we're looking for developers who can suggest and implement other relevant features too:

• Generalize to support WebDriver and Selenium, not just PhantomJS

• Authentication: HTTP Basic at minimum; per-site TLS client certificates would be nice

• Better handling of error conditions: must be able to throw errors when certain selectors are detected, since sites usually deliver user-friendly, styled error messages this way

• Better handling of page transitions: must not just wait for a selector, but also throw if the page finishes loading and the selector is not there

• File downloads: must be able to click buttons, follow redirects to PDFs and CSVs, and return the files as data
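The error-condition and page-transition requirements can be combined into a single check that runs once a page has finished loading. What follows is a minimal sketch under stated assumptions, not a definitive design: `page.snapshot(selectors)` is a HYPOTHETICAL backend adapter method (not part of node-phantom-simple or any WebDriver binding) that resolves to an object mapping each matching selector to its text, and `checkTransition`, `afterLoad`, `expect`, and `errorSelectors` are illustrative names. It uses native ES6 Promises and avoids destructuring, in keeping with the runtime notes in this posting.

```javascript
'use strict';

// Decide the outcome of a page transition from a snapshot of matched selectors.
// `snapshot` maps each selector that matched to its text content.
// Kept synchronous and pure so it can be unit-tested without a browser.
function checkTransition(snapshot, opts) {
  const has = function (sel) {
    return Object.prototype.hasOwnProperty.call(snapshot, sel);
  };

  // 1. Error selectors take priority: sites usually render styled,
  //    user-friendly error messages into known elements.
  for (let i = 0; i < opts.errorSelectors.length; i++) {
    const sel = opts.errorSelectors[i];
    if (has(sel)) {
      const err = new Error('Site reported an error (' + sel + '): ' + snapshot[sel]);
      err.selector = sel;
      err.siteMessage = snapshot[sel];
      throw err;
    }
  }

  // 2. The page has finished loading, so a missing expected selector is a
  //    hard failure rather than something to keep waiting for.
  if (!has(opts.expect)) {
    throw new Error('Page loaded but expected selector not found: ' + opts.expect);
  }
  return snapshot[opts.expect];
}

// Promise-returning wrapper. `page.snapshot(selectors)` is a HYPOTHETICAL
// adapter method: each backend (PhantomJS, WebDriver/Selenium) would implement
// it and resolve with { selector: text } for the selectors that matched.
function afterLoad(page, opts) {
  const selectors = opts.errorSelectors.concat([opts.expect]);
  return page.snapshot(selectors).then(function (snapshot) {
    return checkTransition(snapshot, opts);
  });
}

// Example with a stub page standing in for a real backend.
const stubPage = {
  snapshot: function () {
    return Promise.resolve({ '.flash-error': 'That email is already registered.' });
  }
};

afterLoad(stubPage, { expect: '#welcome', errorSelectors: ['.flash-error'] })
  .then(function (text) { console.log('Signed up:', text); })
  .catch(function (err) { console.log(err.message); });
// logs: Site reported an error (.flash-error): That email is already registered.
```

Checking error selectors before the expected selector means a styled error message wins even when the expected element is also present. Keeping the decision logic separate from the backend call lets the same rule apply unchanged whether the snapshot comes from PhantomJS or WebDriver.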

We run on [login to view URL], so you get most of the proposed Harmony features, with the notable exception of destructuring assignment. On top of that, we use the bluebird promise library for its convenience.

-----WHY YOU SHOULD CARE-----

We are a manpower-limited startup, so if you perform well on this job, there is plenty of potential for repeat business. This job is a "screener" of sorts: we're looking for someone who works well with us and whom we can hire again in the future. Because we may have to post another screener if the person we hire turns out not to be a good fit, we only want to commit a small sum, in case it takes several tries to find the right candidate. We hope to receive offers with a range of price/feature-set combinations, but we expect to spend about $500. You could convince me to spend more, but you'd have to justify it by wowing me somehow.

PS: I'm a developer just like you, and I know my stuff. That means I'll be opinionated about coding style and quality when it matters, so be warned: you won't get away with any hand-waving on this project. On the other hand, it means your boss won't be a clueless end user, and will respect talent!

Skills: JavaScript, Node.js, Software Architecture

Project ID: #7636011

About the project

5 proposals · Remote project · Active Jun 14, 2015

5 freelancers are bidding on average $505 for this job

mantislin: $444 USD in 6 days (139 reviews, rating 7.3)

Hi sir, I am a scraping expert and have done many similar projects; please check my feedback and you will see. Can you tell me more details? Then I will provide demo data for you. Thanks, Kimi

ibapi: $473 USD in 10 days (102 reviews, rating 6.8)

A proposal has not yet been provided

qexon: $500 USD in 30 days (9 reviews, rating 4.7)

Hi, we are interested in this project. We are an IT team of experts and professionals. We are known for our hard work and quality, and client satisfaction is our main concern. We strive to give the best results to …

aftabbajaj1: $444 USD in 10 days (0 reviews, rating 0.0)

Hi, I am a professional in this task. I would like to assist you with your projects. Get it done professionally; get it done right the first time. I am here to build a long-term relationship. I would like to get all details from …

codeheadmsu: $666 USD in 10 days (0 reviews, rating 0.0)

A proposal has not yet been provided