
The project's goal is to develop a focused web crawler that produces categorizable web links to be structured within an open source database

€1500-3000 EUR

Closed
Posted over 10 years ago


Paid on delivery
Project Summary: The project's goal is to develop a focused web crawler that produces categorizable web links to be structured within an open source database. The web crawler should be language independent and allow high user flexibility in terms of both the sources and the keyword combinations to be crawled on a daily basis.

From an IT perspective, this could mean programming a "focused web crawler" that searches specific domains (mostly news and specific industry sources), indexes the content of the resulting pages, and filters that content with an intelligent algorithm ("text search") that takes into account a given selection of keyword combinations. We are open to discussing other ways of realizing the project if the freelancer can convincingly argue for a better, easier, or more cost-efficient methodology.

A typical daily crawl for one language could involve about 500 sources and about 200 keyword combinations. We would expect the crawler to find about 5-50 new results (links) per daily crawl. The resulting links and metadata (such as frequency of keywords found, date, source, MIME type) should subsequently be stored in a database for further analysis.

Required capabilities:
• Experience in Python as the preferred programming language, alternatively Java
• Experience in Lucene/Solr/Nutch as the preferred frameworks and technologies; alternative search technologies may be considered
• Experience with the open source databases needed for the input data (keyword combinations, web sources) and the output data (links, metadata)

Contracting:
• The project's time frame is estimated at around 4 weeks: 12 days for developing the application and 8 days for testing/modifying.
• The proposed fee would range between €1,500 and €3,000, depending on the candidate's experience. Part of the fee will also depend on the quality and completeness of the resulting links.
• The IP rights and the entire code of the final product will stay with the customer. During the testing phase, the customer should have full access to the final test version, without any limitations.

If interested, please write an e-mail to: [REMOVED BY FREELANCER.COM ADMIN] with your comments and conditions. Thank you
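The filtering and storage step described in the summary could be sketched roughly as follows. This is only an illustration under stated assumptions, not the client's specification: it assumes each keyword combination is a tuple of single words that must all appear on a page, uses a simple Unicode-aware word tokenizer (which would not suit languages written without spaces), and stores results in SQLite as a stand-in for whichever open source database is chosen. The function names, the `results` table layout, and the example URL are hypothetical.

```python
import re
import sqlite3
from collections import Counter
from datetime import date


def tokenize(text):
    """Split text into lowercase word tokens (\\w is Unicode-aware in Python 3)."""
    return re.findall(r"\w+", text.lower())


def match_combinations(text, combinations):
    """Return {combination: {keyword: count}} for combinations whose keywords all occur."""
    counts = Counter(tokenize(text))
    hits = {}
    for combo in combinations:
        freqs = {kw: counts[kw.lower()] for kw in combo}
        if all(freqs.values()):  # every keyword of the combination was found
            hits[combo] = freqs
    return hits


def store_hits(conn, url, source, mime_type, found_on, hits):
    """Store one row per matched combination; re-crawled duplicates are ignored."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS results (
               url TEXT, source TEXT, mime_type TEXT, found_on TEXT,
               combination TEXT, frequency INTEGER,
               UNIQUE (url, combination))"""
    )
    rows = [
        (url, source, mime_type, found_on, " + ".join(combo), sum(freqs.values()))
        for combo, freqs in hits.items()
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO results VALUES (?, ?, ?, ?, ?, ?)", rows
        )
    return len(rows)


# Example with an in-memory database and a made-up page.
conn = sqlite3.connect(":memory:")
page = "Solar energy subsidies rise as solar capacity grows in Germany."
combos = [("solar", "subsidies"), ("wind", "subsidies")]
hits = match_combinations(page, combos)
store_hits(conn, "https://example.com/news/1", "example.com", "text/html",
           date.today().isoformat(), hits)
```

In a real crawl, the page text would come from the fetch and indexing stage (for example Nutch/Solr, as named in the requirements), and the `UNIQUE (url, combination)` constraint is one simple way to keep only new links across daily runs.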
Project ID: 4864590

About the project

3 proposals
Remote project
Active 11 yrs ago

3 freelancers are bidding on average €2,645 EUR for this job
Let's start!
€2,000 EUR in 15 days
5.0 (11 reviews)
3.3
3.3
Hello. We have a senior software developer who can develop the web crawler you need. Please check the PM to see the CV of our developer and reasons why to choose us. Looking forward to your reply.
€3,157 EUR in 0 day
0.0 (0 reviews)
0.0
0.0
I'm very interested in this project. I'm a telecommunications engineer currently doing a PhD in complex systems sciences. I'm very proficient in maths, algorithms, data mining and statistics. I'm a good programmer in Java and Python, and also in Matlab/Octave and R. I'm well used to doing database projects. Check my profile to see the work I've done.
€2,777 EUR in 3 days
0.0 (0 reviews)
0.0
0.0

About the client

Berlin, Germany
0.0
0
Member since Aug 26, 2013

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)