Closed

Need an expert data scientist developer to develop a set of scrapers to scrape firmware files and their info from several vendors websites -- 2

The deliverable is a set of crawlers, based on the Scrapy framework, that download and synchronize all product firmware (including all versions) from the web pages of a predefined list of vendors and store the firmware information (metadata) in an SQLite DB. The mandatory metadata fields are: Manufacturer, Model, Version, Type, Name, Release Date (if available), Download link, and the calculated SHA-2 hash of the file, e.g. (Cisco, Video Surveillance 6030 IP Camera, 2.7.0, IP Camera, [login to view URL], 21/08/2015, "link", "Sha2"). There is also a non-mandatory binary field that indicates whether the device is discontinued, depending on whether the vendor's website provides that information. The firmware files themselves will be stored in the file system and referenced in SQLite. The developer is required to follow the DB schema and code templates provided by us. It is also the developer's responsibility to test each crawler and ensure the solution is complete, i.e. that it fully covers the firmware files and product pages.
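As a rough illustration of the metadata record described above, the sketch below stores one firmware entry in SQLite together with a hash of the downloaded file. The table name `firmware`, the column names, and the helper functions are hypothetical (the real schema and code templates are provided by the employer), and SHA-256 is assumed as the SHA-2 variant.

```python
import hashlib
import sqlite3

# Hypothetical schema for illustration only; the real one is supplied by the employer.
SCHEMA = """
CREATE TABLE IF NOT EXISTS firmware (
    id            INTEGER PRIMARY KEY,
    manufacturer  TEXT NOT NULL,
    model         TEXT NOT NULL,
    version       TEXT NOT NULL,
    type          TEXT NOT NULL,
    name          TEXT NOT NULL,
    release_date  TEXT,              -- optional: only if the vendor publishes it
    download_link TEXT NOT NULL UNIQUE,
    sha256        TEXT NOT NULL,     -- assumption: SHA-256 as the SHA-2 variant
    discontinued  INTEGER,           -- optional: 0/1, NULL when unknown
    file_path     TEXT NOT NULL      -- location of the file on disk
)
"""


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large firmware images are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_firmware(conn, record, file_path):
    """Insert one metadata record; optional fields default to NULL."""
    row = {
        "release_date": None,
        "discontinued": None,
        **record,
        "sha256": sha256_of_file(file_path),
        "file_path": str(file_path),
    }
    conn.execute(
        """INSERT INTO firmware
           (manufacturer, model, version, type, name, release_date,
            download_link, sha256, discontinued, file_path)
           VALUES
           (:manufacturer, :model, :version, :type, :name, :release_date,
            :download_link, :sha256, :discontinued, :file_path)""",
        row,
    )
    conn.commit()
```

The `UNIQUE` constraint on `download_link` is one possible way to keep repeated runs from inserting the same file twice; the provided schema may handle this differently.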

There are no GUI components on the server that runs the crawlers, so headless browsing mode should be used.

Solution Scope

1. Crawlers will be written per vendor. This is required because each vendor website will have its own implementation of the firmware download page.

2. The user should be able to pause and resume crawling jobs.
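One way to meet the pause/resume requirement is Scrapy's job directory: when a crawl is started with the `JOBDIR` setting, Scrapy persists its scheduler state there, and running the same command again resumes the crawl after a graceful stop. The sketch below only builds that command line; the spider name `cisco` and the `crawls` directory are hypothetical.

```python
from pathlib import Path


def resumable_crawl_command(spider_name, state_root="crawls"):
    """Build a `scrapy crawl` command that keeps its state in a per-spider JOBDIR."""
    job_dir = Path(state_root) / spider_name
    return ["scrapy", "crawl", spider_name, "-s", f"JOBDIR={job_dir.as_posix()}"]


if __name__ == "__main__":
    # Running the same command again after a graceful stop resumes the crawl.
    print(" ".join(resumable_crawl_command("cisco")))
    # scrapy crawl cisco -s JOBDIR=crawls/cisco
```

Using one job directory per spider keeps the state of different vendor crawlers separate, which matters because a job directory should not be shared between spiders.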

3. Crawlers should detect previously downloaded files and download only new or updated content and firmware files. On its first run, each crawler will download all available firmware files; subsequent runs will download only the firmware files added since the last crawl. This will be achieved by checking the data in SQLite and skipping files that have already been downloaded and processed.
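A minimal sketch of the skip logic described above, assuming a hypothetical `firmware` table with a `download_link` column (the real schema is provided by the employer): links already recorded in SQLite are filtered out before any download is scheduled.

```python
import sqlite3


def filter_new_links(conn, candidate_links):
    """Return only the links that are not yet recorded in the database."""
    known = {row[0] for row in conn.execute("SELECT download_link FROM firmware")}
    return [link for link in candidate_links if link not in known]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE firmware (download_link TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO firmware VALUES ('https://example.com/fw-1.0.bin')")
    found = ["https://example.com/fw-1.0.bin", "https://example.com/fw-1.1.bin"]
    print(filter_new_links(conn, found))  # ['https://example.com/fw-1.1.bin']
```

A link-based check like this would miss a file that the vendor replaces under an unchanged URL; detecting that case would need an extra signal such as the version string or a re-computed hash.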

4. Before writing a crawler, the developer is required to manually analyze each provided vendor site to identify the following required information:

a. URLs for the firmware download page, including all of the firmware versions for each product

b. URLs/files for each product that contain the information to be scraped: "Manufacturer", "Model", "Version", "Type", "Release Date", "if the product is discontinued"

c. Credential Requirements (Simple Signups, Specific Signups, No Signups)

d. Any Captcha on the page

e. Any honeypot traps

5. If a vendor site requires credentials for firmware download, the developer is required to sign up for an account using a Gmail address dedicated to this project.

6. The script will try to imitate human-like behaviour (within limits) while scraping web pages, and will use Tor if required, so that any scraper/crawler detection logic on the vendor site can be evaded. This can be achieved by adding random delays and random view times, and by avoiding honeypot traps identified through manual analysis.
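The random delays mentioned above could be sketched as a small helper like the following. The function name and the default bounds are hypothetical; within Scrapy itself, the `DOWNLOAD_DELAY` and `RANDOMIZE_DOWNLOAD_DELAY` settings provide a similar effect between requests.

```python
import random
import time


def human_like_pause(min_seconds=2.0, max_seconds=8.0, sleep=time.sleep):
    """Sleep for a random duration and return the number of seconds chosen.

    The `sleep` parameter exists so the delay can be stubbed out in tests.
    """
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Calling `human_like_pause()` between page views gives each visit a different "view time"; the bounds would need tuning per vendor site.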

Solution Brief

*The crawler set is expected to cover 100 vendors (each vendor could be quite different from the others). Milestones are defined per vendor, and each milestone is at most 50€, paid after we verify the completeness of the crawler and see no errors. The developer MUST test the completeness of each crawler before delivering it to us and present evidence of test completion in the form of a populated SQLite database for that vendor.

*The NDA must be signed before the beginning of the project.

*Please apply only if you have fully read and understood the project and agree with the conditions.

Skills: Scrapy, Web Scraping, Python, Data Science, Software Development

About the Employer:
(4 reviews) Brussels, Belgium

Project ID: #25625646

19 freelancers are bidding on average €3950 for this job

mmadi

Hello Zahra K., Please discuss with me more in details about your project (Need an expert data scientist developer to develop a set of scrapers to scrape firmware files and their info from several vendors websites - ...

€3500 EUR in 36 days
(9 Reviews)
6.0
RaspberryOculus

Hi I understand the firmware and I have built a couple projects using a scrappy and python and I can crawl thousand some website with my current system please send me a private chat message

€3176 EUR in 8 days
(4 Reviews)
6.0
dstepanenko

Hello, I'm data scientist with huge expertise and mathematician with a number of publications. Also I'm participant and problem writer of many algorithm competitions (Topcoder, ACM ICPC). Feel free to cont...

€4000 EUR in 7 days
(28 Reviews)
5.5
arturkandalyan

Hi sir, I am a Web & Data Scraping Expert who have career for 6 years over. I am very happy to bid on your job. I have already worked on several similar projects for collect data & contact & business information such a...

€4000 EUR in 7 days
(6 Reviews)
4.8
nikhil929

Greetings, Encourage Infotech is a Team of Leading Python developers that specializes in the Scraping and Data mining Industry. our Previous work includes Scraping in the Retail and Gambling Industry to Analyse the Up...

€4000 EUR in 7 days
(11 Reviews)
4.3
Palakash21

I am a full-time Full Stack Freelance having 7+ years of experience and have team working on Web App Developer-Designer (Specialising in CRM, ERP, Ecommerce, Website Developing & Designing, Android Apps, Ios Apps, web-...

€4000 EUR in 7 days
(4 Reviews)
4.0
bisquitue

Hi there, I have several years of experience building advanced web scraping scripts. My price will €35,- per site and I will be able to create scrapers for the 100 vendors in 20 days. Hit me up with a PM if you're in...

€3500 EUR in 20 days
(4 Reviews)
3.2
Infogrex123

Hi, I hope you are doing well! Infogrex Technologies is an IT technology company based in Hyderabad, India. We are a diverse team of Data Scientists, Market Researchers, Analysts, Programmers, and Project Managers....

€3242 EUR in 10 days
(1 Review)
2.8
rahulkumartest

Hi, I hope you are doing well. I am a full-time freelancer working as a data science developer having 3+ years of experience and I worked with more than 100+ clients. Let's have a chat to discuss in detail as per your...

€3200 EUR in 30 days
(1 Review)
2.0
bitnetservices

This will be going project better to put on hourly basis since every vendor website is different and every scrape code will be different from previous one. Although data collected will be dumped in necessary format,...

€4000 EUR in 7 days
(1 Review)
1.2
altr1m

Hi, Manager! How are you? I have gone through your requirements and I am very pleased because this job is a good fit for my skill set. I have been working as a full-stack developer for 7+ years and I have a good experi...

€4000 EUR in 20 days
(1 Review)
0.4
ayninfo

Dear Sir, Greetings for the day !!! AYN INFOTECH is India's Fastest Growing IT Software Consulting Company with the latest Tech Stack powered by AI. We have a huge amount of experience in Web and application deve...

€5000 EUR in 45 days
(0 Reviews)
0.0
johnprogramming2

Hey!, I’ve carefully checked your requirements and really interested in this job. I’m full stack node.js developer working at large-scale apps as a lead developer with U.S. and European teams. I’m offering best qualit...

€4440 EUR in 7 days
(0 Reviews)
0.0
nalliancetech

Narinder Alliance Technologies LLC An IT Consulting and Software Development company. We have a team expert in Web Designing, Application Development and Databases. We have worked on various projects across various in...

€5000 EUR in 120 days
(0 Reviews)
0.0
AdamenkoV

Dear Client, I have rich experience about scrapping script making using peppeteer or apify, seleinum. Also, I can make scrapping script using python. I have made scrapping script before about youtube, amazon, raktuen,...

€4000 EUR in 7 days
(0 Reviews)
0.0
jaswanthrocky666

Hi, Relevant Skills and Experience:: PHP, Andriod, Software architecture, Mysql, Reactive Native,HTML, CSS, Bootstrap, Javascript, Jquery, Angular JS, Node JS, C programming, Python, Java, Wordpress, Drupal, Joomla,...

€4000 EUR in 7 days
(0 Reviews)
0.0
adev90

Hi I read your all project requirements carefully. I can start with NDA sign and finish it with correct scraping. After build scraping tool, I can manage and improve it's functions for a long term. Regards.

€4000 EUR in 30 days
(0 Reviews)
0.0
deremwit

I am an Information Technology enthusiast with a Bachelor of Science degree in Information Technology.I specialize in data science with sub skills in Data Analysis and Visualization in Python,Excel and Tableau,software...

€4000 EUR in 3 days
(0 Reviews)
0.0
Rogerfalcone

Hi, sir How are you? Thanks for taking your valuable time reviewing my proposal. I have seen your project description very carefully. I am pretty sure that I am the best candidate for this job. For why , I have a rich...

€4000 EUR in 35 days
(0 Reviews)
0.0