Create a Tesseract configuration to split documents based on patchcode T pages
€15-50 EUR
Paid on delivery
We have two types of documents:
- multipage PDF files (which may already contain OCR-detected text)
- multipage TIFF files
These files contain standardized patchcode T separator pages.
Samples of the patchcode T:
- [login to view URL] on page 11
- [login to view URL] on page 75
Your job is to provide us with a shell script which
- takes either a PDF file or a TIFF file as input (selectable by parameter)
- parses the file and splits it at each patchcode T page into multiple files (of the same file type)
- runs OCR on the content (switchable on/off to decide whether OCR is performed)
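The parameters described above could be handled roughly as follows. This is only a sketch of the option parsing, not the requested script: the function name parse_args, the flags -t and -o, and the choice of OCR being off by default are all assumptions made for illustration.

```shell
#!/bin/sh
# parse_args: hypothetical option parser for the requested script.
#   -t pdf|tiff   input file type (required)
#   -o            enable OCR (off by default; this default is an assumption)
#   FILE          the input file
# On success it sets the variables filetype, ocr and input.
parse_args() {
    filetype=""
    ocr="off"
    OPTIND=1                      # reset so the function can be called repeatedly
    while getopts "t:o" opt; do
        case "$opt" in
            t) filetype=$OPTARG ;;
            o) ocr="on" ;;
            *) return 2 ;;        # unknown option or missing argument
        esac
    done
    shift $((OPTIND - 1))         # drop the parsed options, keep the file name

    case "$filetype" in
        pdf|tiff) ;;
        *) echo "file type must be pdf or tiff" >&2; return 2 ;;
    esac
    [ $# -eq 1 ] || { echo "exactly one input file expected" >&2; return 2; }
    input=$1
}
```

Called as parse_args -t pdf -o scan.pdf, this would leave filetype=pdf, ocr=on and input=scan.pdf for the rest of the script to act on.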
Ensure the patchcode page can have arbitrary content between the code lines (as in the samples).
A Java implementation is also acceptable as an alternative to the shell script.
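To make the splitting requirement concrete, here is a minimal sketch of how page ranges could be derived once the separator pages are known. Detecting the patchcode T pages is not shown. The function name split_ranges is hypothetical, and the sketch assumes that separator pages are dropped from the output files, which the description does not state.

```shell
#!/bin/sh
# split_ranges TOTAL [SEP ...]
# Prints one "first-last" page range (1-based, inclusive) per output document.
# TOTAL is the page count of the input file; each SEP is the page number of a
# detected separator page, given in ascending order.
# Assumption: the separator pages themselves are not copied to any output.
split_ranges() {
    total=$1
    shift
    start=1
    for sep in "$@"; do
        # Emit the pages before this separator, unless there are none
        # (separator on the first page, or two separators in a row).
        if [ "$sep" -gt "$start" ]; then
            echo "$start-$((sep - 1))"
        fi
        start=$((sep + 1))
    done
    # Emit whatever follows the last separator.
    if [ "$start" -le "$total" ]; then
        echo "$start-$total"
    fi
}
```

For a 10-page file with separators on pages 4 and 8, split_ranges 10 4 8 prints 1-3, 5-7 and 9-10 on separate lines. Each range could then be passed to whatever PDF or TIFF page-extraction tool the final script uses.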
Project ID: #15704623
About the project
4 freelancers are bidding an average of €90 for this job
I have an existing project ready that is similar to your needs; I use enhanced Tesseract OCR. Stay tuned, I'm still working on this proposal.
Hi, I hope you are doing fine. I have relevant experience in parsing PDFs and reading TIFF images using Java. I have also worked with the Tesseract OCR engine, and I assure you I am the best fit.