I need a web scraping application that scrapes product reviews from the Polish website ceneo.pl. It has to work as an ETL process (Extract, Transform, Load): extract the raw data, transform it into clean, useful data, and finally load it into a database.
Data to be scraped for each review (by class):
a. product-reviewer (if this information is missing, it should write "Anonim")
c. datetime (when posted)
h. vote-yes-(some number?), vote-no-(...): how many thumbs up and how many thumbs down. If the count is "0", the field should remain empty in the database.
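To make the two transform rules above concrete, here is a rough sketch of what I mean. It is written in Python only to illustrate the logic. The function name `clean_review`, the dictionary keys and the sample values are assumptions for illustration, not the real structure of the page.

```python
from typing import Optional


def clean_review(raw: dict) -> dict:
    """Apply the "Anonim" rule and the empty-votes rule to one raw review."""
    # Hypothetical input keys; the real scraper may name them differently.
    reviewer = (raw.get("product-reviewer") or "").strip()

    def votes(value) -> Optional[int]:
        # A count of 0 (or a missing value) stays empty, i.e. NULL in the database.
        count = int(value or 0)
        return count if count > 0 else None

    return {
        "reviewer": reviewer or "Anonim",
        "posted_at": raw.get("datetime"),
        "votes_yes": votes(raw.get("vote-yes")),
        "votes_no": votes(raw.get("vote-no")),
    }
```

For example, a raw review with a blank reviewer, "0" thumbs up and "3" thumbs down would come out with the reviewer "Anonim", an empty thumbs-up field and 3 thumbs down.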
4. Data to be scraped for each product:
a. Device Type
d. Additional comments
First, the user needs to input the ID of a product to scrape (e.g. in http://www.ceneo.pl/37164441 the number is the ID). Then there should be 2 buttons:
[Extract Transform Load] - runs the process and stops after each step: Extract -> output of the raw data [continue button], Transform -> output of the clean data [continue button], Load -> output of the data loaded to the DB [back to choices button].
[ETL] - runs the whole process at once and shows the final data in the database, with the same back to choices button.
The back to choices button should ask "Do you want to clear DB?": if yes, clear the DB and show the home page; if no, just show the home page.
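As a small illustration of the ID input, the sketch below accepts either the bare number or a full product URL. It is in Python purely as an example, and the function name `product_id_from_input` and the decision to accept both forms are my assumptions.

```python
import re


def product_id_from_input(text: str) -> str:
    """Return the numeric product ID from a bare ID or a ceneo.pl product URL."""
    text = text.strip()
    # Case 1: the user typed just the number.
    if re.fullmatch(r"[0-9]+", text):
        return text
    # Case 2: the user pasted a URL such as http://www.ceneo.pl/37164441
    match = re.match(r"(?:https?://)?(?:www\.)?ceneo\.pl/([0-9]+)", text)
    if match:
        return match.group(1)
    raise ValueError("not a valid product ID or ceneo.pl product URL")
```

Anything that is neither a number nor a ceneo.pl product URL is rejected with an error, so the Extract step never starts with a bad ID.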
The application must also have a button on the home page to clear the database, to show how it is loaded, and the app should delete every file used for the ETL process once it is finished. Of course there can't be any duplicates in the database. There should be information about how many reviews were scraped. Of course it should work for all subpages, if there are any.
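One possible way to meet the "no duplicates" and "how many reviews" requirements together is a unique constraint plus an insert that skips existing rows. The sketch below uses Python with an in-memory SQLite database only as a stand-in for MySQL (where `INSERT IGNORE` serves the same purpose). The table layout, the function name `load_reviews` and the choice of (product, reviewer, date) as the uniqueness key are all assumptions.

```python
import sqlite3


def load_reviews(conn: sqlite3.Connection, product_id: str, reviews: list) -> int:
    """Insert reviews, silently skipping duplicates; return how many were new."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS reviews (
               product_id TEXT NOT NULL,
               reviewer   TEXT NOT NULL,
               posted_at  TEXT NOT NULL,
               votes_yes  INTEGER,
               votes_no   INTEGER,
               UNIQUE (product_id, reviewer, posted_at)
           )"""
    )
    before = conn.total_changes
    conn.executemany(
        # Rows that violate the UNIQUE constraint are ignored, not inserted.
        "INSERT OR IGNORE INTO reviews VALUES (?, ?, ?, ?, ?)",
        [
            (product_id, r["reviewer"], r["posted_at"],
             r.get("votes_yes"), r.get("votes_no"))
            for r in reviews
        ],
    )
    conn.commit()
    # total_changes only counts rows that were actually inserted.
    return conn.total_changes - before
```

Running the same load twice would then report the new reviews the first time and zero the second time, which is the behaviour I expect when the same product is scraped again.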
I prefer PHP with the cURL and simplehtmldom libraries, HTML5 and CSS3 of course, and MySQL for the database.
It does not have to look extraordinary, it just has to work.
I need this app no later than 12 January 2017.
Thanks in advance!