This is sort of a placeholder post. I'm busy meeting a deadline, but this should help future Steve, and anyone else, who needs to turn a LyX document into a Word document while keeping the formatting mostly sane. Broken, but sane.
- Export as LaTeX (plain).
- Run latex <name of tex file, with or without extension>
- Run
- Run
- Run
- Try to run htlatex <filename> "html,0,charset=utf-8" "" -dhtml/
  - html: format to output
  - 0: normally chapters go into their own page, putting 0 here forces everything into a single page
  - charset=utf-8: let us be civilised
  - -dhtml/: puts the output files in a html sub-directory. Note that you can't have a space between -d and the html/
- If the above fails with something like 'illegal storage address', and you get a warning about tex4ht.env not being found, then you need to find where it is in your TeX installation, and either:
  - export TEX4HTENV so that it points at that file, and try again, or
  - copy tex4ht.env into your working directory. This approach also lets you change some export parameters locally. More on this later...
- Open the html to verify correctness. You might object to the poor graphics quality. In this case copy tex4ht.env into the working directory if you haven't done so, and then modify it so it uses a higher density when converting images.
- See this tex.stackexchange.com answer for more details
- In my case, since dvipng was being used, I replaced all instances of
- with
- It also helps if you
  - strip away html comments
    - These look like <!-- xxx -->
  - centre align image divs
  - remove <hr/> instances
  - These changes will make the import into Libre/OpenOffice go easier
- Open the html file in Libre/OpenOffice
- File > Export > ODT
- Close html file
- Open exported ODT
- Edit > Links
- Select all links
- Break Links
- Verify that the ODT file is now much larger!
- File > Save As > Word 97 (doc)
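A rough sketch of how the htlatex step above might be scripted, including the TEX4HTENV workaround. Only the htlatex arguments and the TEX4HTENV variable come from the steps above. The function names, the tex_root argument (wherever your TeX installation lives) and creating the output directory up front are assumptions made for illustration, so treat this as a starting point rather than a tested recipe.

```python
import os
import subprocess
from pathlib import Path


def find_tex4ht_env(tex_root):
    """Return the first tex4ht.env found under tex_root, or None."""
    for dirpath, _dirnames, filenames in os.walk(tex_root):
        if "tex4ht.env" in filenames:
            return Path(dirpath) / "tex4ht.env"
    return None


def build_htlatex_command(tex_file, out_dir="html"):
    """Build the htlatex argument list used in the steps above."""
    return [
        "htlatex",
        tex_file,
        "html,0,charset=utf-8",  # html output, single page, UTF-8
        "",                      # empty argument, kept as in the command above
        "-d" + out_dir.rstrip("/") + "/",  # no space after -d, trailing slash
    ]


def run_htlatex(tex_file, tex_root=None, out_dir="html"):
    """Run htlatex, pointing TEX4HTENV at tex4ht.env if tex_root is given."""
    env = dict(os.environ)
    if tex_root is not None:
        env_file = find_tex4ht_env(tex_root)
        if env_file is not None:
            env["TEX4HTENV"] = str(env_file)
    # Assumption: make sure the output directory exists before htlatex runs.
    os.makedirs(out_dir, exist_ok=True)
    command = build_htlatex_command(tex_file, out_dir)
    return subprocess.run(command, env=env).returncode
```

Something like run_htlatex("thesis", tex_root="/usr/share/texmf") would then do the export in one go. Both the file name and the path there are placeholders, so substitute your own.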
Phew! To help future visitors, a simple python script to fix up the html as I have described is included at the end of this post. You will need lxml and cssselect installed.
Cheers,
Steve
#!/usr/bin/env python3
"""Tidy htlatex HTML output so Libre/OpenOffice imports it more easily."""
import sys

from lxml import etree
from lxml.html import parse  # doc.cssselect() also needs the cssselect package


def main(*args):
    if len(args) == 0:
        print('usage: %s <html file>' % sys.argv[0], file=sys.stderr)
        return 1

    doc = parse(args[0]).getroot()
    body = doc.cssselect('body')[0]

    # replace <hr/> with <br/> to make doc conversion easier;
    # retagging in place keeps each rule's position and trailing text
    for hr in body.cssselect('hr'):
        hr.tag = 'br'
        hr.attrib.clear()

    # remove comments because for some reason libreoffice opens up
    # html comments as document comments, slowing things down;
    # with_tail=False keeps the text that follows each comment
    etree.strip_elements(doc, etree.Comment, with_tail=False)

    # centre align all figures
    for div in doc.cssselect('div.figure'):
        div.attrib['style'] = 'text-align:center'

    # tostring() returns bytes when given an encoding, so write them raw
    sys.stdout.buffer.write(etree.tostring(doc, method='html', encoding='utf-8'))
    return 0


if __name__ == '__main__':
    sys.exit(main(*sys.argv[1:]))