File System Crawler

We are building a desktop search engine and need a file system crawler. It should be designed so that additional filter modules can be added to support indexing of new document types. It should also crawl recursively: for example, it should be able to index a Word document attached to an email inside a zipped Outlook PST file, provided all the necessary filter modules are installed. It should also support multiple languages.
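To make the pluggable-filter and recursive-crawling requirements concrete, here is a minimal sketch in Python. It is only an illustration, not a required design: the `FILTERS` registry, the `register` decorator, the `crawl` function, and the `!/` separator for nested paths are all hypothetical names chosen for this example. It covers only `.txt`, `.zip`, and `.eml` using the standard library; formats such as PST or Word would need third-party filter libraries.

```python
import io
import os
import zipfile
from email import message_from_bytes
from typing import Callable, Dict, Iterator, List, Tuple

# A filter takes raw bytes and returns (extracted_text, embedded_children),
# where each child is a (name, bytes) pair that is crawled in turn.
Child = Tuple[str, bytes]
Filter = Callable[[bytes], Tuple[str, List[Child]]]

FILTERS: Dict[str, Filter] = {}  # hypothetical registry: extension -> filter


def register(*extensions: str):
    """Decorator that installs a filter for one or more file extensions."""
    def wrap(func: Filter) -> Filter:
        for ext in extensions:
            FILTERS[ext.lower()] = func
        return func
    return wrap


@register(".txt")
def text_filter(data: bytes):
    # Plain text has no embedded documents.
    return data.decode("utf-8", errors="replace"), []


@register(".zip")
def zip_filter(data: bytes):
    # An archive has no text of its own; every member becomes a child.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        children = [(info.filename, archive.read(info))
                    for info in archive.infolist() if not info.is_dir()]
    return "", children


@register(".eml")
def eml_filter(data: bytes):
    # The plain-text body is indexed; attachments become children.
    message = message_from_bytes(data)
    body_parts, children = [], []
    for part in message.walk():
        if part.is_multipart():
            continue
        payload = part.get_payload(decode=True) or b""
        filename = part.get_filename()
        if filename:
            children.append((filename, payload))
        elif part.get_content_type() == "text/plain":
            charset = part.get_content_charset() or "utf-8"
            body_parts.append(payload.decode(charset, errors="replace"))
    return "\n".join(body_parts), children


def crawl(name: str, data: bytes, max_depth: int = 8) -> Iterator[Tuple[str, str]]:
    """Yield (virtual_path, text) for a document and everything nested in it."""
    handler = FILTERS.get(os.path.splitext(name)[1].lower())
    if handler is None or max_depth < 0:
        return  # unsupported type, or nesting deeper than we allow
    text, children = handler(data)
    if text:
        yield name, text
    for child_name, child_data in children:
        for path, child_text in crawl(child_name, child_data, max_depth - 1):
            yield f"{name}!/{path}", child_text
```

Under these assumptions, supporting a new document type means registering one more function, and the crawler itself does not change. The `max_depth` limit is a simple guard against deeply nested or malicious archives.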

Interested bidders should describe their experience in this field and submit a proposed project plan covering the crawler design (diagrams preferred), timeline, and bid price. We also prefer bidders who offer ideas on integrating the crawler with existing open source search engines such as SWISH++. The use of third-party filter libraries is encouraged to speed up development.


## Deliverables

The following is the list of formats the crawler should support:

Document formats:

Adobe Acrobat Reader (.pdf)

Adobe PageMaker 4.0, 5.0, 6.0, 6.5 (.pm4, .pm5, .pm6, .p65, .pmd)

AmiPro (.sam)

ChiWriter (.chi)

Compressed HTML (.chm)

Hyper Text Markup Language (.html)

Help files (.hlp)

Microsoft Excel 2, 3, 4, 5, 95, 97, 2000, XP (.xls)

Microsoft Power Point (.ppt)

Microsoft Word (.doc)

Microsoft Word for Macintosh (.mcw)

Microsoft Word Templates (.dot)

Microsoft Write (.wri)

Plain Text (.txt)

PROMPT translator (.std)

Rich Text Format (.rtf)

Word and Deed (.w&d)

Word Perfect (.wpd)

Works for Windows (.wps)

XML Extensible Markup Language (.xml)

Email formats:

Outlook 97/98/2000/2002(XP)/2003

Outlook Express

Microsoft Exchange 95/97/98/2000/2001/2002/2003

.MSG, .EML messages

.MBX Unix mailboxes

Searching within e-mail message attachments

Archive formats:

PKARC by PKWARE (.arc)

ARJ by ARJ Software (.arj)

Cabinet by Microsoft (.cab)

GZIP (.gz)

RAR by [url removed, login to view] (.rar)

ZIP/PKZIP by PKWARE (.zip)



1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.

2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):

a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.

b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.

3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).

## Platform


Skills: .NET, C Programming, C# Programming, Database Administration, Engineering, MySQL, PHP, Software Architecture, Software Testing, SQL, Visual Basic


About the Employer:
( 8 reviews ) Hong Kong

Project ID: #3457883

3 freelancers are bidding on average $2859 for this job



$34 USD in 14 days
(2 Reviews)


$42.5 USD in 14 days
(5 Reviews)


$8500 USD in 14 days
(0 Reviews)