Article Scraping Script

Cancelled Posted Sep 26, 2007 Paid on delivery

This project should be a web-based solution (i.e. PHP). Here's how it would work:

1. The user is asked to specify a text file (.txt) to upload. This text file would contain only keywords, such as "above ground pools", "acid reflux cures", "awnings", "bariatrics", "recumbent bike", and so on.
2. The program would then go to an article directory (I will supply the URLs) and grab the first 20 articles found for each keyword. The program would strip all HTML code from the articles and remove all author and article source information, so that only the plain body text of each article remains. The text of each article would be saved to its own .txt file (random file names can be used).
3. The program would create a zip file of all the individual text files for each keyword. So for the "above ground pools" keyword, all of those text files would go into a file named "[url removed, login to view]", and so on.
4. The program would then display a link to each of the zip files, so the user can simply click a link to download each zip file.
5. The program would have a "Remove Zip Files" button, which the user would click to delete all the zip files from the server after they have been downloaded.
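The keyword parsing, text cleanup, packaging, link and removal steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not a finished solution: the posting asks for PHP, and Python is used here only to illustrate the logic. Fetching the articles is left out because the directory URLs were never given. The author/source markers in `TRAILER_MARKERS` are hypothetical (each directory formats its bylines differently), and the zip naming scheme and the `/downloads/` link path are illustrative stand-ins, since the real file name is elided in the posting.

```python
import re
import secrets
import zipfile
from html import escape
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import quote

# Hypothetical trailer markers: everything from the first line that starts
# with one of these is treated as author/source info and dropped. Each real
# article directory will need its own rules.
TRAILER_MARKERS = ("about the author", "article source:")


def parse_keywords(text: str) -> list[str]:
    """Step 1: one keyword per line; surrounding spaces and blank lines are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script>/<style> and breaking lines at block tags."""

    BLOCK_TAGS = {"p", "br", "div", "li", "h1", "h2", "h3"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True turns &amp; etc. into characters
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Step 2 (cleanup half): strip markup, then cut the author/source trailer."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    paragraphs = []
    for raw in "".join(parser.parts).splitlines():
        line = re.sub(r"\s+", " ", raw).strip()
        if line.lower().startswith(TRAILER_MARKERS):
            break  # everything from this marker on is treated as boilerplate
        if line:
            paragraphs.append(line)
    return "\n\n".join(paragraphs)


def zip_articles(keyword: str, articles: list[str], out_dir: Path) -> Path:
    """Step 3: one zip per keyword holding one randomly named .txt per article."""
    slug = re.sub(r"[^a-z0-9]+", "-", keyword.lower()).strip("-")  # hypothetical naming scheme
    zip_path = out_dir / f"{slug}.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for text in articles:
            zf.writestr(f"{secrets.token_hex(8)}.txt", text)
    return zip_path


def render_links(zip_paths: list[Path], base_url: str = "/downloads/") -> str:
    """Step 4: HTML list of download links; base_url is a hypothetical public path."""
    items = [
        f'<li><a href="{escape(base_url + quote(p.name))}">{escape(p.name)}</a></li>'
        for p in zip_paths
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


def remove_zip_files(out_dir: Path) -> int:
    """Step 5 ("Remove Zip Files"): delete every .zip in out_dir and return the count."""
    removed = 0
    for path in out_dir.glob("*.zip"):
        path.unlink()
        removed += 1
    return removed
```

Writing each article straight into the archive with `ZipFile.writestr` skips the intermediate .txt files on disk while producing the same zip contents described in step 3; in a PHP build, `ZipArchive::addFromString` would play the same role.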

## Deliverables

1) Complete and fully functional working program(s) in executable form, as well as complete source code of all work done.

2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):

a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment: Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.

b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.

3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, third-party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement.)

## Platform

Linux based web server

Skills: Engineering, MySQL, PHP, Software Architecture, Software Testing, Web Hosting, Website Management, Website Testing

Project ID: #3330495

## About the project

3 proposals. Remote project. Active Oct 2, 2007.

3 freelancers are bidding an average of $65 for this job:

- bahe: $59.5 USD in 3 days (147 reviews, rating 6.5)
- rylkov: $85 USD in 3 days (62 reviews, rating 5.5)
- matthewscript: $51 USD in 3 days (9 reviews, rating 3.2)