Development of MS file support

¥25,000-75,000 JPY

Closed

Posted over 9 years ago

Paid on delivery

We are developing an online translation management system called YarakuZen. In YarakuZen, users can manage translation with dictionaries, machine translation, translation memory and human translation in one place. It currently supports only plain text, and we want to add support for three kinds of MS Office documents in OOXML format.

The spec/flow will be:

1. Import a zipped docx/xlsx/pptx file into an internal (folder) structure of OOXML parts.
2. Extract translatable text together with its XML tags, as minimally complete as possible: for docx, for example, everything under <w:p> except tags that are unrelated to layout. It is OK to leave OLE documents (for instance, an xlsx embedded in a docx) unprocessed for the time being.
3. YarakuZen gets the extracted texts, translates them, and sends them back to the OOXML library.
4. Replace the extracted texts with the translated ones, keeping the layout intact.
5. Export the manipulated OOXML as the translated docx/xlsx/pptx.

Requirements:

• Support DOCX, XLSX and PPTX in the Office 2010 standard
• PHP 5.4+
• Delivery deadline: 8/25 (Mon)

Nice-to-have bonus points:

1. Concatenate text chunks that are split by unrelated tags; in other words, restore the text as a human would read it, as one whole segment.
2. Collapse/fold the tags in between extracted texts, and expand/unfold them when putting the translated texts back.

An example of the requirement and the two bonus points above:

<w:p w:rsidR="00983391" w:rsidRDefault="009A1105">
  <w:pPr> <w:rPr> <w:rFonts w:hint="eastAsia"/> </w:rPr> </w:pPr>
  <w:r> <w:rPr> <w:rFonts w:hint="eastAsia"/> </w:rPr> <w:t>あいうえ</w:t> </w:r>
  <w:proofErr w:type="gramStart"/>
  <w:r> <w:rPr> <w:rFonts w:hint="eastAsia"/> </w:rPr> <w:t>お</w:t> </w:r>
  <w:proofErr w:type="gramEnd"/>
</w:p>

Following the spec/flow, the expected intermediate data could be:

<w:r> <w:rPr> <w:rFonts w:hint="eastAsia"/> </w:rPr> <w:t>あいうえ</w:t> </w:r>
<w:r> <w:rPr> <w:rFonts w:hint="eastAsia"/> </w:rPr> <w:t>お</w:t> </w:r>

Note that <w:proofErr w:type="gramStart"/> is removed between the two <w:r> blocks above.

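The five-step flow can be sketched roughly as follows. This is an illustration only, written in Python for brevity even though the posting asks for PHP 5.4+. The names `paragraph_runs`, `translate_docx` and the `translate` callback (a stand-in for the round trip to YarakuZen) are hypothetical. It assumes the text to translate sits in `<w:t>` elements of runs that are direct children of `<w:p>` in `word/document.xml`; runs nested in hyperlinks, headers, footers and other parts are not handled here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Standard WordprocessingML namespace; registered so "w:" survives serializing.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)
W = "{%s}" % W_NS
DOC_PART = "word/document.xml"  # main body part of a .docx package


def paragraph_runs(paragraph):
    """Step 2: keep the <w:r> children of a <w:p>; skip e.g. <w:proofErr>."""
    return [child for child in paragraph if child.tag == W + "r"]


def translate_docx(docx_bytes, translate):
    """Steps 1-5: unzip, extract run texts, translate, replace, re-zip.

    `translate` is a hypothetical callback mapping source text to target text.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == DOC_PART:
                root = ET.fromstring(data)
                for paragraph in root.iter(W + "p"):
                    for run in paragraph_runs(paragraph):
                        for t in run.findall(W + "t"):
                            t.text = translate(t.text or "")
                # xml_declaration needs Python 3.8 or later.
                data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
            dst.writestr(item, data)  # all other parts are copied unchanged
    return out.getvalue()
```

Only the text inside `<w:t>` is touched, so `<w:rPr>`, `<w:proofErr>` and the rest of the paragraph stay where they were, which is one way to keep the layout intact. A real `document.xml` declares many more namespaces than `w`, and ElementTree rewrites unregistered prefixes on output, so a production version would need to register or otherwise preserve them.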
Bonus point 1 expects the intermediate data to be:

<w:t>あいうえお</w:t>

Bonus point 2, on the other hand, expects the library to keep track of the tags and generate intermediate data such as:

<tag1>あいうえお</tag1>

If point 1 is too ambiguous to develop, point 2 may still be feasible independently, as:

<tag1>あいうえ</tag1> <tag2>お</tag2>

Please get back to me with a rough estimate.
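The two bonus points could be approached along these lines. This is again a hedged Python sketch, not the PHP deliverable, and `fold`, `unfold`, `run_format` and the `merge` flag are hypothetical names. It assumes that adjacent runs may be concatenated when their serialized `<w:rPr>` is identical, and that the translated text contains no literal `<tagN>` markers of its own. The sample paragraph is the one from the posting, with the `xmlns:w` declaration added so that it parses on its own.

```python
import re
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS

SAMPLE = (
    '<w:p xmlns:w="%s" w:rsidR="00983391" w:rsidRDefault="009A1105">'
    '<w:pPr><w:rPr><w:rFonts w:hint="eastAsia"/></w:rPr></w:pPr>'
    '<w:r><w:rPr><w:rFonts w:hint="eastAsia"/></w:rPr><w:t>あいうえ</w:t></w:r>'
    '<w:proofErr w:type="gramStart"/>'
    '<w:r><w:rPr><w:rFonts w:hint="eastAsia"/></w:rPr><w:t>お</w:t></w:r>'
    '<w:proofErr w:type="gramEnd"/>'
    '</w:p>' % W_NS
)


def run_text(run):
    return "".join(t.text or "" for t in run.findall(W + "t"))


def run_format(run):
    """Serialized <w:rPr>, used to decide whether two runs look alike."""
    rpr = run.find(W + "rPr")
    return b"" if rpr is None else ET.tostring(rpr)


def fold(paragraph, merge=False):
    """Bonus 2: one <tagN> pair per run; with merge=True (bonus 1), adjacent
    runs with identical formatting share a tag. Returns (segment, groups)."""
    groups = []
    for run in paragraph.findall(W + "r"):  # non-run children are skipped
        if run.find(W + "t") is None:
            continue
        if merge and groups and run_format(groups[-1][0]) == run_format(run):
            groups[-1].append(run)
        else:
            groups.append([run])
    segment = "".join(
        "<tag%d>%s</tag%d>" % (i, "".join(run_text(r) for r in group), i)
        for i, group in enumerate(groups, start=1)
    )
    return segment, groups


def unfold(translated, groups):
    """Write translated text back: the first <w:t> of each group receives it,
    and every other <w:t> in the group is emptied."""
    for number, text in re.findall(r"<tag(\d+)>(.*?)</tag\1>", translated, re.S):
        targets = [t for run in groups[int(number) - 1] for t in run.findall(W + "t")]
        targets[0].text = text
        for extra in targets[1:]:
            extra.text = ""


paragraph = ET.fromstring(SAMPLE)
print(fold(paragraph)[0])              # <tag1>あいうえ</tag1><tag2>お</tag2>
print(fold(paragraph, merge=True)[0])  # <tag1>あいうえお</tag1>
```

The translator only ever sees the short placeholder tags, while the `groups` list keeps the link back to the original runs, so `unfold` can restore the text without touching `<w:rPr>` or `<w:proofErr>`.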

Project ID: 6290476

About the project

3 proposals

Remote project

Active 10 yrs ago

3 freelancers are bidding on average ¥76,650 JPY for this job

@khaledkee

I will try to make it customizable and general. The input would be a file upload field in a form. I'm fine with both the MS Office 2010 and 2007 standards. I can manipulate DOCX and XLSX files, but PPTX has some restrictions. The required deadline is rather tight, but I'll try to stick to it. I can work with the PHP 5.4+ file system and other functions without any problem. I've developed a lot of word processors, so I expect extracting text will be fine.

As I understand the bonus points, you want only the text. I see them as requirements, not bonuses. Regarding point 3: I'm not sure I can do anything with OLE; it is rather difficult. I can extract non-ambiguous texts, meaning paragraphs, tables, shapes, SmartArt, WordArt, headers, footers, hyperlinks, and other objects that store text in a <w:t> tag. That excludes ActiveX objects, embedded objects, watermarks (possible only for Word, not PowerPoint), reviews, and footnotes.

There will be a different library for each file extension. I mean I won't use a standard OOXML library, simply because one doesn't exist for PHP. I have to understand the input/output mechanism of YarakuZen, so I invite you, sir, to contact me to discuss further details.

I plan to start with Word, since I already have multiple ways to manipulate Word documents programmatically. Then I would move on to Excel. It may be easy, though I will extract only cell content (no graphs or other objects in the current plan). In the PowerPoint phase I have only one or two ways, so I'm restricted to some conditions.

¥100,000 JPY in 18 days