Create a Tesseract configuration to split documents at Patch Code T pages

Closed Posted 6 years ago Paid on delivery

We have two types of documents:

- multipage PDF files (which may already contain OCR text)

- multipage TIFF files
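Because the input type is chosen by a parameter, a tool can also sanity-check that parameter against the file's signature before splitting. The Java sketch below assumes hypothetical parameter values `pdf`, `tif` and `tiff`; the signatures themselves (`%PDF-` for PDF, `II*\0` or `MM\0*` for TIFF) are the standard file headers of those formats.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Locale;

public class InputTypeCheck {

    enum InputType { PDF, TIFF }

    // Maps the user-supplied type parameter to an input type (hypothetical values).
    static InputType parseType(String param) {
        switch (param.toLowerCase(Locale.ROOT)) {
            case "pdf":
                return InputType.PDF;
            case "tif":
            case "tiff":
                return InputType.TIFF;
            default:
                throw new IllegalArgumentException("unknown input type: " + param);
        }
    }

    // Compares the first bytes of a file with the format signature:
    // PDF starts with "%PDF-", TIFF with "II*\0" (little-endian) or "MM\0*" (big-endian).
    static boolean matchesSignature(byte[] head, InputType type) {
        if (type == InputType.PDF) {
            return startsWith(head, new byte[] {'%', 'P', 'D', 'F', '-'});
        }
        return startsWith(head, new byte[] {'I', 'I', 42, 0})
            || startsWith(head, new byte[] {'M', 'M', 0, 42});
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Reads up to 8 leading bytes, enough for both signatures.
    static byte[] readHead(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return in.readNBytes(8);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: InputTypeCheck pdf|tiff <file>");
            return;
        }
        InputType type = parseType(args[0]);
        byte[] head = readHead(Path.of(args[1]));
        System.out.println(matchesSignature(head, type) ? "ok" : "file does not look like " + type);
    }
}
```

A mismatch would typically abort the run rather than guess the type, so that a wrong parameter is reported instead of silently producing broken output.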

These files contain standardized Patch Code T separator pages.

Samples of Patch Code T:

- first sample document (URL not available), page 11

- second sample document (URL not available), page 75

Your job is to provide us with a shell script that:

- takes either a PDF file or a TIFF file as input (selectable by a parameter)

- parses the file and splits it at each Patch Code T separator page into multiple files (of the same file type)

- runs OCR on the content (OCR must be switchable on or off)
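Once every page has been flagged as a separator or not, the split itself reduces to computing page ranges. A minimal Java sketch, assuming that separator pages are dropped from the output and that consecutive separators produce no empty file (the job description specifies neither behaviour):

```java
import java.util.ArrayList;
import java.util.List;

public class SeparatorSplit {

    // Given one flag per page (true = Patch Code T separator page), returns the
    // 0-based, inclusive page ranges of the documents between separators.
    // Assumption: separator pages are not copied into any output file, and
    // runs of separators (or a leading/trailing separator) yield no empty file.
    static List<int[]> documentRanges(boolean[] isSeparator) {
        List<int[]> ranges = new ArrayList<>();
        int start = -1; // first page of the open document, or -1 if none is open
        for (int page = 0; page < isSeparator.length; page++) {
            if (isSeparator[page]) {
                if (start >= 0) {
                    ranges.add(new int[] {start, page - 1}); // close current document
                    start = -1;
                }
            } else if (start < 0) {
                start = page; // first content page after a separator
            }
        }
        if (start >= 0) {
            ranges.add(new int[] {start, isSeparator.length - 1}); // last document
        }
        return ranges;
    }

    public static void main(String[] args) {
        boolean[] flags = {false, false, true, false, true, true, false};
        for (int[] r : documentRanges(flags)) {
            System.out.println("pages " + r[0] + "-" + r[1]);
        }
    }
}
```

For the sample flags in `main` this prints `pages 0-1`, `pages 3-3` and `pages 6-6`. Writing each range out (for example with Apache PDFBox for PDF input, or an image writer for TIFF) is not shown here.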

Ensure that the Patch Code page may contain arbitrary content between the code bars (as in the samples).
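One way to tolerate arbitrary content between the bars is to examine whole columns instead of single scanlines: a patch bar darkens most of a column, while text between the bars darkens only a few pixels per column. The sketch below assumes a binarized page (`true` = dark) with vertical bars, a 60% dark-share threshold and a 1.5x wide/narrow ratio; all three are assumptions. The expected wide/narrow sequence for Patch Code T is passed in rather than hard-coded, because it has to be taken from the Patch Code specification.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PatchBarDetector {

    // Measures the widths of vertical bars on a binarized page (true = dark).
    // A column counts as part of a bar when at least minDarkShare of its pixels
    // are dark; text or graphics between the bars darken only a small share of
    // each column and are therefore skipped.
    static List<Integer> barWidths(boolean[][] page, double minDarkShare) {
        int height = page.length;
        int width = page[0].length;
        List<Integer> widths = new ArrayList<>();
        int run = 0; // width of the bar currently being scanned
        for (int x = 0; x < width; x++) {
            int dark = 0;
            for (int y = 0; y < height; y++) {
                if (page[y][x]) {
                    dark++;
                }
            }
            if ((double) dark / height >= minDarkShare) {
                run++;
            } else if (run > 0) {
                widths.add(run);
                run = 0;
            }
        }
        if (run > 0) {
            widths.add(run); // bar touching the right edge
        }
        return widths;
    }

    // Labels each bar 'W' (wide) or 'N' (narrow). Assumption: a bar is wide
    // when it is at least 1.5 times as wide as the narrowest bar found.
    static String classify(List<Integer> widths) {
        if (widths.isEmpty()) {
            return "";
        }
        int narrowest = Collections.min(widths);
        StringBuilder pattern = new StringBuilder();
        for (int w : widths) {
            pattern.append(w >= 1.5 * narrowest ? 'W' : 'N');
        }
        return pattern.toString();
    }

    // True when the page shows the expected wide/narrow sequence. The sequence
    // for Patch Code T is a parameter and must come from the specification.
    static boolean isPatchPage(boolean[][] page, String expectedPattern) {
        return classify(barWidths(page, 0.6)).equals(expectedPattern);
    }

    public static void main(String[] args) {
        // Synthetic 10x20 page: four bars plus a short "text" stroke between two of them.
        boolean[][] page = new boolean[10][20];
        int[][] bars = {{1, 4}, {7, 8}, {11, 14}, {17, 18}};
        for (int[] bar : bars) {
            for (int y = 0; y < page.length; y++) {
                for (int x = bar[0]; x <= bar[1]; x++) {
                    page[y][x] = true;
                }
            }
        }
        page[5][9] = true;
        page[5][10] = true;
        System.out.println(classify(barWidths(page, 0.6)));
    }
}
```

Running `main` prints `WNWN` for the synthetic page; the short stroke in columns 9 and 10 is ignored. `WNWN` is only test data here, not a statement about the Patch T pattern. Real pages would first have to be rasterized and thresholded, and rotated if the bars run horizontally.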


A Java implementation is also acceptable as an alternative to a shell script.
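For the Java variant, the two switches from the task list (input type and OCR on/off) map naturally to command-line options. This sketch uses hypothetical flag names `--type` and `--ocr` and builds one `tesseract` call per page image. `tesseract <image> <outbase> pdf` is Tesseract's standard way to write a searchable PDF, while without the `pdf` config it writes plain text to `<outbase>.txt`; choosing PDF output only for PDF input is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitterOptions {

    static final String USAGE = "usage: --type pdf|tiff [--ocr on|off] <file>";

    final String type;   // "pdf" or "tiff"
    final boolean ocr;   // whether OCR should run
    final String input;  // path of the file to split

    SplitterOptions(String type, boolean ocr, String input) {
        this.type = type;
        this.ocr = ocr;
        this.input = input;
    }

    // Parses hypothetical options: --type pdf|tiff [--ocr on|off] <input file>.
    // OCR defaults to off when --ocr is not given (an assumption).
    static SplitterOptions parse(String[] args) {
        String type = null;
        boolean ocr = false;
        String input = null;
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("--type") && i + 1 < args.length) {
                type = args[++i];
            } else if (args[i].equals("--ocr") && i + 1 < args.length) {
                String value = args[++i];
                if (!value.equals("on") && !value.equals("off")) {
                    throw new IllegalArgumentException("--ocr expects on or off");
                }
                ocr = value.equals("on");
            } else {
                input = args[i];
            }
        }
        if (input == null || !("pdf".equals(type) || "tiff".equals(type))) {
            throw new IllegalArgumentException(USAGE);
        }
        return new SplitterOptions(type, ocr, input);
    }

    // Builds the Tesseract call for one page image of a split-off document.
    // Returns an empty list when OCR is switched off.
    static List<String> ocrCommand(SplitterOptions opts, String image, String outBase) {
        List<String> cmd = new ArrayList<>();
        if (!opts.ocr) {
            return cmd;
        }
        cmd.add("tesseract");
        cmd.add(image);
        cmd.add(outBase);
        if (opts.type.equals("pdf")) {
            cmd.add("pdf"); // searchable PDF output for PDF input (assumed choice)
        }
        return cmd;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println(USAGE);
            return;
        }
        SplitterOptions opts = parse(args);
        // The list can be passed to new ProcessBuilder(cmd).inheritIO().start().
        System.out.println(ocrCommand(opts, "part-001.tif", "part-001"));
    }
}
```

Tesseract reads images, not PDFs, so the pages of a PDF input would need to be rasterized before this step.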

Skills: Java, OCR, Shell Script

Project ID: #15704623

About the project

4 proposals Remote project Active 6 years ago

4 freelancers are bidding on average €90 for this job

iitmshanker

A proposal has not yet been provided

€200 EUR in 2 days
(1 Review)
3.2
ranzhie07

I have an existing project ready that is similar to your needs; I use enhanced Tesseract OCR. Stay tuned, I'm still working on this proposal.

€61 EUR in 0 days
(1 Review)
1.1
sreejith1993

Hi, I hope you are doing well. I have relevant experience in parsing PDF files and reading TIFF images using Java. I have also worked with the Tesseract OCR engine, and I assure you I am the best fit.

€45 EUR in 10 days
(0 Reviews)
0.0
livezingy

Thanks very much for your invitation! I'm not the right person, because I'm not familiar with shell scripting or Java. I came here to say thanks and sorry. Best wishes!

€55 EUR in 10 days
(0 Reviews)
0.0