The project you are proposing needs careful planning, and a key decision is which scraping platform to use. The most important reasons are probably these:
- Development Speed: The less time development takes, the lower the cost. Here I always advise a visual programming environment with a large toolset.
- Efficiency: You need to download documents fast, but you also need a strategy so that you do not overload the servers. You need a technology that supports multithreaded downloading together with rules that prevent overload.
- Maintenance: This one is really important, since you are working with different sites. You need to take into account that a site is likely to change, breaking your parsing logic. In this case I don't recommend doing any programming; again, use a visual environment that lets you write robust expressions that rarely break and that are easy to identify and correct when they do.
- Data Integration: What do you want to do with the data after you've extracted it? You need a platform that supports this.
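To illustrate the Efficiency point, here is a minimal Python sketch of multithreaded downloading with a rule that spaces out requests to the same host. It is only an illustration of the idea, not the platform I am offering. The names `HostThrottle`, `download_all`, and `fetch_url`, and the default worker count and delay, are assumptions made for this example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse
from urllib.request import urlopen


class HostThrottle:
    """Enforce a minimum delay between requests sent to the same host."""

    def __init__(self, min_delay):
        self.min_delay = min_delay
        self._lock = threading.Lock()
        self._next_slot = {}  # host -> earliest time the next request may start

    def wait(self, url):
        host = urlparse(url).netloc
        with self._lock:
            now = time.monotonic()
            # Reserve the next free time slot for this host.
            slot = max(now, self._next_slot.get(host, now))
            self._next_slot[host] = slot + self.min_delay
        # Sleep outside the lock so other hosts are not held up.
        time.sleep(max(0.0, slot - now))


def fetch_url(url):
    """Download one document (hypothetical default fetcher)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def download_all(urls, fetch=fetch_url, workers=8, min_delay=1.0):
    """Download URLs in parallel, throttled per host; results keep input order."""
    throttle = HostThrottle(min_delay)

    def task(url):
        throttle.wait(url)
        return fetch(url)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, urls))
```

With this layout, requests to different sites run in parallel, while two requests to the same site are never started less than `min_delay` seconds apart. A real project would also need retries, error handling, and respect for each site's robots.txt.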
Finally, I am an expert web scraper with more than 10 years of experience in web scraping and data integration. I have extracted billions of records from ecommerce websites for product repricing, stock sync, etc.
Please contact me by PM so that I can give you more details about my offer. Basically, it consists of an affordable, scalable, visual scraping platform plus a few hours of my work to scrape each website.