Save all content of webpages (including FRAMES)

The project is designed to create a piece of software that will save all content of webpages (including FRAMES) for any given list of URLs. Basically, the software should do the following:

• After execution, it should ask the user to paste a list of URLs from Excel

• For each URL, it should save the full contents (including content of all FRAMES) of the page located at that URL into a separate folder on the hard drive

Now the full details:

• The software has to be Windows-based

• It can be written using any programming language

• The most important requirement is the ability to save ALL contents, in particular the content of FRAMES. It should save all files separately (the HTML file and its related CSS files, images and JavaScript files) – it should basically save the page “faithfully” – just as browsers see it (please see the following note)

• For this reason, it might be easier (might not be – we don’t know the best way and this is just an option to consider) to create this software in the form of a Google Chrome extension or a Mozilla Firefox add-on, because both Chrome and Firefox can save all contents of pages as they display them – with frames, images, etc. (Chrome’s default “Save As” does that, while Firefox uses another add-on – “Mozilla Archive Format” – to save pages “faithfully”). However, we are not sure if Chrome and Firefox have any disk write APIs, so this might not work. For your own testing purposes, it might be a good idea to compare the results with the way Chrome saves pages.

• The software must have the following adjustable parameters:

- Minimum pause before processing the next URL (in seconds) – MIN_WAIT

- Maximum pause before processing the next URL (in seconds) – MAX_WAIT

- Download folder (folder on the hard drive)
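As a rough illustration only, the three parameters could be held in one small settings object and written to disk, which would also cover the request further down that the download folder be remembered between runs. This is a minimal Python sketch under assumptions: the class name Settings, the functions save_settings and load_settings, the JSON file format and the default values are all hypothetical and not part of the brief.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class Settings:
    """The three adjustable parameters from the brief (defaults are assumed)."""
    min_wait: float = 2.0              # MIN_WAIT, in seconds
    max_wait: float = 10.0             # MAX_WAIT, in seconds
    download_folder: str = "downloads" # hypothetical default folder

    def validate(self) -> None:
        # The pause range only makes sense if 0 <= MIN_WAIT <= MAX_WAIT.
        if self.min_wait < 0 or self.max_wait < self.min_wait:
            raise ValueError("expected 0 <= MIN_WAIT <= MAX_WAIT")


def save_settings(settings: Settings, path: Path) -> None:
    """Write the settings to a JSON file so the next run can reuse them."""
    settings.validate()
    path.write_text(json.dumps(asdict(settings), indent=2), encoding="utf-8")


def load_settings(path: Path) -> Settings:
    """Read settings saved by a previous run, or fall back to the defaults."""
    if not path.exists():
        return Settings()
    settings = Settings(**json.loads(path.read_text(encoding="utf-8")))
    settings.validate()
    return settings
```

A caller might run load_settings(Path("settings.json")) at start-up to pre-fill the form and save_settings(...) when the user presses start.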

• This is how the software should work:

- User starts the software

- The software asks for a list of URLs

- It should be capable of accepting lists of up to 10,000 URLs

- We need to make input easy. We produce links in Excel, so we should simply select a range of cells with URLs (in one column), copy them and paste them into the software.

- Then we should be able to set two pause parameters – MIN_WAIT and MAX_WAIT – the min and max pause between finishing processing one URL and moving on to the next one. For example, MIN_WAIT = 2 sec, MAX_WAIT = 10 sec. Then for each URL that the software is about to load, it should wait a random number of seconds between MIN_WAIT and MAX_WAIT before attempting to open and save it.

- Then we should be able to select the download folder. By default, the software should remember the previous choice.

- Then we should hit a “start” button and for each URL the software should do the following:

a) Create a new folder for the contents of this URL within the Download folder. The individual folder’s name should follow this format: “YYYY-MM-DD-HH-MM-SS”, which is basically the time of creation.

b) Save all contents of this URL into this individual folder.

c) Add a line to the program log (see below).

d) Generate a random number of seconds between MIN_WAIT and MAX_WAIT and wait that number of seconds before moving on to the next URL.

e) Logging. The software should maintain a log file (text file) of all URLs that have been processed. For each URL it should save one line of text using the following format: “YYYY-MM-DD-HH-MM-SS: URL” – the timestamp should be the same as the timestamp in the folder name for any given URL
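Steps a) to e) could be sketched roughly as follows in Python. This is only an illustration under assumptions: the function name process_urls, the log file name log.txt and the save_page callable (which stands in for whatever actually saves the page and its frames) are hypothetical. The brief does not say what should happen when two URLs fall within the same second, so this sketch simply moves the timestamp forward by one second until the folder name is free. It also skips the pause after the last URL, which is a guess at what is wanted.

```python
import random
import time
from datetime import datetime, timedelta
from pathlib import Path
from typing import Callable, Iterable

STAMP_FORMAT = "%Y-%m-%d-%H-%M-%S"  # "YYYY-MM-DD-HH-MM-SS" from the brief
LOG_NAME = "log.txt"                # hypothetical log file name


def process_urls(
    urls: Iterable[str],
    download_folder,
    min_wait: float,
    max_wait: float,
    save_page: Callable[[str, Path], None],       # hypothetical page saver
    sleep: Callable[[float], None] = time.sleep,  # replaceable for testing
    now: Callable[[], datetime] = datetime.now,   # replaceable for testing
) -> int:
    """Run steps a) to e) for every URL and return how many were processed."""
    root = Path(download_folder)
    root.mkdir(parents=True, exist_ok=True)
    url_list = list(urls)
    for index, url in enumerate(url_list, start=1):
        # a) One folder per URL, named after its creation time.
        moment = now()
        while (root / moment.strftime(STAMP_FORMAT)).exists():
            moment += timedelta(seconds=1)  # assumption: avoid name clashes
        stamp = moment.strftime(STAMP_FORMAT)
        folder = root / stamp
        folder.mkdir()
        # b) Save the page contents into that folder.
        save_page(url, folder)
        # c) and e) Append one log line with the same timestamp as the folder.
        with (root / LOG_NAME).open("a", encoding="utf-8") as log:
            log.write(f"{stamp}: {url}\n")
        # d) Random pause between MIN_WAIT and MAX_WAIT before the next URL.
        if index < len(url_list):
            sleep(random.uniform(min_wait, max_wait))
    return len(url_list)
```

Passing sleep and now as parameters is a design choice that lets the loop be tested without real waiting. In normal use both can be left at their defaults.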

- The software must be able to work “quietly” – either in the tray or (if part of a browser) in the taskbar. Basically, it shouldn’t pop up for each URL or anything like it – the user should be able to use the PC for other tasks while the software is running.

- Finally, the software should have a line with progress text to show that, for example, “120 of 1500 URLs processed”.
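Two of the smaller pieces in the workflow above, reading the list pasted from Excel and producing the progress line, might look like this in Python. When a column of cells is copied from Excel, the clipboard text normally has one cell per line, with tabs between columns if more than one column was selected. The sketch assumes that layout and keeps only the first column. The function names parse_pasted_urls and progress_text are hypothetical.

```python
def parse_pasted_urls(clipboard_text: str) -> list[str]:
    """Turn text pasted from an Excel column into a list of URLs."""
    urls = []
    for line in clipboard_text.splitlines():
        # Keep only the first column if several columns were copied.
        cell = line.split("\t", 1)[0].strip()
        if cell:  # skip empty cells and trailing blank lines
            urls.append(cell)
    return urls


def progress_text(done: int, total: int) -> str:
    """Build the progress line requested in the brief."""
    return f"{done} of {total} URLs processed"
```

The sketch does not check that each cell is a well-formed URL and does not remove duplicates, since the brief does not ask for either.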

Please see more details in the attached file.
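For bidders considering a stand-alone program rather than a browser extension, the frame requirement roughly means that each saved page has to be scanned for frame documents as well as for images, stylesheets and scripts, and each frame then has to be fetched and scanned in turn. The Python sketch below shows only the scanning step, using the standard library HTML parser. The names ResourceCollector and collect_resources are hypothetical. A static scan like this will not see frames or files that a page adds through JavaScript, so its results may differ from what Chrome saves.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of frames and page assets found in HTML source."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.frames = []  # documents loaded by <frame> and <iframe>
        self.assets = []  # images, scripts and stylesheets

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        src = attributes.get("src")
        if tag in ("frame", "iframe") and src:
            self.frames.append(urljoin(self.base_url, src))
        elif tag in ("img", "script") and src:
            self.assets.append(urljoin(self.base_url, src))
        elif tag == "link" and attributes.get("href"):
            rel = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.assets.append(urljoin(self.base_url, attributes["href"]))


def collect_resources(html: str, base_url: str):
    """Return (frame URLs, asset URLs) referenced by one HTML document."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.frames, collector.assets
```

A downloader built on this would save every asset URL into the page's folder, then fetch each frame URL and call collect_resources again on the frame's own HTML, using the frame's URL as the new base.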

Skills: Software Architecture


About the Employer:
(0 reviews) Sunbury, United Kingdom

Project ID: #1511927

3 freelancers are bidding on average $800 for this job


I can do it

$750 USD in 20 days
(9 Reviews)

Hi, I'm a proficient network systems programmer. I've done a similar project before. Please check PM. Angad.

$900 USD in 11 days
(1 Review)

I'm new here, but have extensive experience in software development, especially with .NET technologies. I'm planning to deliver what you need as a C# solution that will make HTTP requests, parse the HTML response and s

$750 USD in 15 days
(0 Reviews)