Viewing docx files

Any suggestions on a docx viewer? Just want to view them, not edit.
OpenOffice is too much and AbiWord seems to have trouble with docx.
 
LibreOffice is quite big as well, but it somehow manages to compile a lot faster than OpenOffice.org, so it might be worth a try.
 
Beastie said:
If all you want to do is read them, then uncompress them and read the contents using any text editor.
What command would I use to do that?
 
Thanks @Beastie! That's really useful. Unfortunately the content shows up as document.xml. So what does one do with that? A text editor shows garbage. Konqueror reads it, but runs some words together. Not bad though, and it will do in a pinch. Firefox just displays a mess of markup. I wonder if there's a simple program (command line is best) to convert .xml to plain text.
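For anyone following along, the unpacking step works because a .docx file is an ordinary zip archive, with the main text stored in word/document.xml. A minimal sketch, assuming unzip(1) is installed; docx_xml is just a made-up helper name and report.docx is a placeholder file:

```shell
# docx_xml is a hypothetical helper: print the main document part of a
# .docx to standard output without unpacking the archive to disk.
docx_xml() {
    # -p writes the named archive member to stdout instead of a file
    unzip -p "$1" word/document.xml
}

# Usage (report.docx is a placeholder name):
#   unzip -l report.docx          # list everything inside the archive
#   docx_xml report.docx | less   # page through the raw XML
```

The output is still raw XML, so this only covers getting at the file, not making it readable.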
 
cpm said:
You can use xmlto(1) for that purpose, as follows:
% xmlto txt document.xml

I tried that with several documents from two different sources and the result is this:
Code:
Document /home/ole/tmp/word/document.xml does not validate
 
OJ said:
I tried that with several documents from two different sources and the result is this:
Code:
Document /home/ole/tmp/word/document.xml does not validate

You need to pass the --skip-validation option or fix the document syntax :)
 
cpm said:
You need to pass the --skip-validation option or fix the document syntax :)
Oops, sorry I forgot to mention that I already tried that. Perhaps Microsoft has their own proprietary format for XML since that just gives a .txt file with a great pile of markup. Like this:

Code:
<w:document><w:body><w:p><w:pPr><w:jc></w:jc><w:rPr><w:b></w:b><w:i></w:i>
<w:sz></w:sz><w:szCs></w:szCs><w:u></w:u></w:rPr></w:pPr><w:r><w:rPr><w:b></
w:b><w:i></w:i><w:sz></w:sz><w:szCs></w:szCs><w:u></w:u></w:rPr><w:t>Attn:
Residents of </w:t></w:r><w:proofErr></w:proofErr><w:r><w:rPr><w:b></w:b><w:i>
</w:i><w:sz></w:sz><w:szCs></w:szCs><w:u></w:u></w:rPr><w:t>Coalmont</w:t></
 
You can strip all the XML tags out of word/document.xml. Note the -p, which makes unzip write the file to standard output so it can be piped:
% unzip -p document.docx word/document.xml | sed 's#</w:p>#\n\n#g;s#<[^>]*>##g'
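A slightly fuller sketch of the same tag-stripping idea, in case it helps. It rests on a few assumptions: the function names xml_to_text and docx_to_text are made up, unzip(1) is installed, and paragraph ends are written as a backslash followed by a real newline, because POSIX leaves \n in a sed replacement unspecified and not every sed turns it into a newline. It also decodes the five predefined XML entities, which the plain tag strip leaves behind.

```shell
# xml_to_text (hypothetical name): read WordprocessingML on stdin and
# write rough plain text on stdout.
xml_to_text() {
    # 1. end of paragraph -> blank line (backslash-newline is the portable form)
    # 2. drop every remaining tag
    # 3. decode the predefined XML entities, &amp; last so that text such
    #    as "&amp;lt;" is not decoded twice
    sed -e 's#</w:p>#\
\
#g' \
        -e 's#<[^>]*>##g' \
        -e 's/&lt;/</g' -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' -e "s/&apos;/'/g" \
        -e 's/&amp;/\&/g'
}

# docx_to_text (hypothetical name): the same thing applied to a .docx file.
docx_to_text() {
    unzip -p "$1" word/document.xml | xml_to_text
}

# Usage (document.docx is a placeholder):
#   docx_to_text document.docx | less
```

This is still a crude filter: tabs, line breaks inside a paragraph, tables, footnotes, headers, and footers are either dropped or flattened.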
 