Grepping an OpenOffice Writer file

Hi,

I want to use a sed/grep combination on an OpenOffice template file to change just a few keywords. But when I ran grep -a TIME work1.doc, nothing happened. Even without the -a flag, the result is the same.

work1.doc is a Thai-language OpenOffice template file for Writer.

Any help and hints are welcome.
 
An OpenOffice file (such as a Writer .odt) is actually just a ZIP archive. You will thus need to extract it, edit the content (probably content.xml), and rebuild the ZIP archive.

tar -xf can extract a ZIP archive but cannot rebuild one; you will need something like archivers/zip.
 

Would you please demonstrate? I cannot find any XML file in the .doc file.
 
Fred seems to think that your document is a ZIP file, in which case you would do:

Code:
unzip work1.doc

And that would result in some files being extracted from your document.

However, I don't personally believe that the .doc is a ZIP file.

Generally, .doc files are binary files: the formatting and text are encoded in a complex binary format rather than stored as plain text, so you cannot reliably grep or replace text in them from the command line. It may be possible to save your document in some XML-based format, in which case such greps and replacements may in fact be possible.

If you want to read your .doc file as raw text (or bytes), I suggest you open it with less or vi. That may shed some light on the format it's in. Who knows, it may indeed be a ZIP file.
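A quicker check than paging through the binary is to look at the first few bytes. ZIP archives (and therefore ODT files) start with PK\003\004, while Word 97-2003 .doc files are OLE2 compound documents starting with D0 CF 11 E0 A1 B1 1A E1. A minimal sketch using only od and tr; the function name doc_format is hypothetical, and file(1) gives a similar answer if it is installed.

```shell
#!/bin/sh
# doc_format FILE: print "zip", "ole2" or "unknown" based on the magic bytes.
# Hypothetical helper; only checks the first 8 bytes of the file.
doc_format() {
    # Dump the first 8 bytes as lowercase hex pairs and join them.
    magic=$(od -An -tx1 -N8 "$1" | tr -d ' \n')
    case $magic in
        504b0304*)        echo zip ;;   # "PK\003\004": ZIP (ODT, DOCX, ...)
        d0cf11e0a1b11ae1) echo ole2 ;;  # OLE2 compound file (legacy .doc)
        *)                echo unknown ;;
    esac
}
```

Running doc_format work1.doc would show whether unzip is worth trying at all.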
 

Apologies, there are XML tags in that file too:
Code:
[~] % grep -n -a xml work1.doc
81:<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="3.1.1-111">
82:   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
84:            xmlns:dc="http://purl.org/dc/elements/1.1/">
88:            xmlns:xap="http://ns.adobe.com/xap/1.0/">
95:            xmlns:xapMM="http://ns.adobe.com/xap/1.0/mm/"
96:            xmlns:stRef="http://ns.adobe.com/xap/1.0/sType/ResourceRef#">
105:            xmlns:tiff="http://ns.adobe.com/tiff/1.0/">
113:            xmlns:exif="http://ns.adobe.com/exif/1.0/">
120:            xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/">
[~] %

But that is not what I'm interested in. I simply want to replace some words like TIME, CUSTOMERS and ISOTOPES with the real values entered later.

Many thanks indeed for all the help and hints.
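For replacing placeholders such as TIME, CUSTOMERS and ISOTOPES inside an unpacked content.xml, the values typed in later may contain characters that are special to XML (& and <) or to a sed replacement (&, / and \). A minimal sketch that sidesteps sed by doing a literal substitution in POSIX awk and escaping the value for XML first. The function name fill_placeholder is hypothetical, and it assumes each placeholder appears unbroken in content.xml (OpenOffice can split a word across <text:span> elements if its formatting changes mid-word).

```shell
#!/bin/sh
# fill_placeholder WORD VALUE  (filter: stdin -> stdout)
# Hypothetical helper: replaces every literal WORD with VALUE, escaping VALUE
# for XML text first. Both are passed through the environment so awk does not
# interpret backslashes in them.
fill_placeholder() {
    [ -n "$1" ] || return 1    # an empty WORD would never advance the search
    WORD=$1 VALUE=$2 awk '
        BEGIN {
            w = ENVIRON["WORD"]
            v = ENVIRON["VALUE"]
            # "\\&" is a literal & in a gsub replacement; escape & first
            gsub(/&/, "\\&amp;", v)
            gsub(/</, "\\&lt;", v)
            gsub(/>/, "\\&gt;", v)
        }
        {
            out = ""
            line = $0
            # Plain string search: no regex or replacement metacharacters.
            while ((i = index(line, w)) > 0) {
                out = out substr(line, 1, i - 1) v
                line = substr(line, i + length(w))
            }
            print out line
        }'
}
```

Calls can be chained, e.g. fill_placeholder TIME '10:30' < content.xml | fill_placeholder CUSTOMERS 'Smith & Sons' > content.new, after which content.new replaces content.xml before rezipping.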
 
You mentioned an "open office template", and there is an "openoffice 3.1.0" tag on this thread, so I assumed that you were dealing with ODT files, which are definitely ZIP archives containing (among other things) an XML file with the text of your document.

If you are instead talking about DOC files produced by Word, then, as you discovered and as others pointed out, this is not the case, and the job is going to be harder. You may want to look into the Win32::Word::* packages for Perl, or their equivalent in your favourite language.
 
Thanks indeed for your hints. I am now reading, or more specifically studying, http://search.cpan.org/~dami/MsOffice-Word-HTML-Writer-0.07/lib/MsOffice/Word/HTML/Writer.pm, which I found through your link.

The story is that a girl prepares MS Word documents for her boss, and she complains that she has to type almost exactly the same things for every customer. I offered to help her, using my limited knowledge of grep/sed to replace just a few variables like DATE, CUSTOMERS, PRICE and so on. I took REAL.doc and turned it into work1.doc with OpenOffice.

But the real world is not so simple, and that's why I am asking.

Many thanks indeed for all the help and hints; more suggestions are welcome.
 