Version: 6 Jan, 2006
Recent versions of commonly used word processing software (like MS Word) can save documents in HTML format (as "web pages") without requiring any knowledge of HTML (Hyper Text Markup Language). For most purposes (like student papers) this automatic conversion works well enough; however, special issues arise when transferring such documents (for example, submitting them as an attachment to an e-mail message or via the Blackboard (Bb) drop box). These are discussed in this document, which provides both step-by-step "how to" instructions for various software constellations and some explanation of the technical issues involved. Having at least some conceptual understanding of why certain steps are necessary is important for anyone who wants to become "information literate". A companion piece ("How to produce simple html (web) documents with WP or MS Word") discusses the steps of producing such documents in the first place.
As file extensions are very important for this discussion, I strongly suggest that you have Windows Explorer show file extensions (by default, it does not). To do this, go to "Tools"/"Folder Options", select the "View" tab and make sure that the box labeled "Hide file extensions for known file types" is not checked.

The convenience comes at a price, though. As MS Word 2002/3 tries to establish full equivalence between its proprietary .doc format and the .htm format, all images are stored in two formats, the .jpg format traditionally used on the web and the newer .png (portable network graphics) format which is not yet supported by all web browsers. So, the resulting .mht file is larger than necessary. Also, MS Word 2002/3 (as well as MS Word 2000) does not really generate HTML code, but rather more complex XML code which can lead to other problems.
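To see where the extra size comes from, it can help to look inside an .mht file. The following is a minimal sketch, assuming the .mht file is a MIME multipart message (which is how web archives are normally structured) and using only the Python standard library. The function name `summarize_mht` is made up for this illustration.

```python
import email
from collections import Counter


def summarize_mht(raw_bytes):
    """Return the total decoded payload size (in bytes) per content type."""
    message = email.message_from_bytes(raw_bytes)
    sizes = Counter()
    for part in message.walk():
        if part.is_multipart():
            continue  # container parts carry no payload of their own
        payload = part.get_payload(decode=True) or b""
        sizes[part.get_content_type()] += len(payload)
    return dict(sizes)
```

Calling `summarize_mht(open("miller_paper.mht", "rb").read())` on a Word-generated archive should list separate totals for entries such as `image/jpeg` and `image/png`, which makes the duplication of images visible.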
MS Office 2000 web archive add-in. This "add-in" can be downloaded for free from Microsoft. The term "add-in" is somewhat misleading as it is a separate program. If it installs correctly, the right-click menu (when used with a file list in Windows Explorer or the "Open" window in MS Word 2000) will show an additional entry "Save as web archive" when an .htm file is selected (highlighted) and an additional entry "Unpack Web Archive" when an .mht file is selected. Simply selecting these options will execute the operation (and additional files will be added to the folder). As with using MS Office XP/2002/2003, do not "filter" the initial .htm file.
Note that you must have MS IE as well as MS Outlook Express installed in order for this add-in to work. You do not have to actively use MS Outlook Express. Also, it is installed automatically with newer versions of MS IE -- unless you take special steps to prevent this (via a "custom installation"). More information is available in the MS Knowledge Base.
MS Internet Explorer (MS IE). At this point, I do not recommend using MS IE to create .mht files for HTML documents created on your computer, e.g., by opening a local .htm file and then saving it as "web archive". MS IE converts "relative" links into "absolute" links, which creates problems for "internal" links. The "tag" for an internal link is typically of the form <a href="#note1"> and is interpreted by a browser relative to the current document. Converted to an "absolute" link, the tag becomes something like <a href="file:///c:\My Documents\GSR\miller_paper.htm#note1">. Once transferred, the HTML document is not likely to be in a folder called "c:\My Documents\GSR\" again, and thus the target cannot be found.
Use MS IE and the "web archive" (mht file) feature only to archive external web pages (downloaded from the Internet), e.g., to preserve evidence from a web page likely to change. Note that the same problem with internal links occurs. However, as long as the page stays the same on the external web server and viewing takes place with a live connection to the Internet, this flaw will not become apparent.
If you don't have access to MS Word XP/2002/2003, you can't get the "MS Office 2000 web archive add-in" to work, and MS IE is thus your only option, make sure that you save HTML documents created with MS Office as "filtered", or images may not be included. (Note that this advice is the exact opposite of what you should do when creating .mht files with either MS Word 2002/3 or the add-in.) The only way to make .mht files created with MS IE fully functional is to edit the source code afterwards and change all internal links back to relative links. This can be done with any text editor, but -- if you are not somewhat familiar with HTML source code -- you may make some unintended changes as well. So, this is a last resort only.
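For readers comfortable with a little scripting, changing internal links back to relative ones can be sketched in a few lines of Python. This is an illustration only, not a tested repair tool: the function name `restore_internal_links` is made up, the sketch assumes double-quoted `href` attributes of the `file:///...#fragment` form shown above, and it works on plain HTML text (inside an .mht file the HTML may be stored in an encoded form that has to be decoded first).

```python
import re

# Matches href="file:///<path>#<fragment>" with double quotes.
# Group 1 is the document path, group 2 the fragment (e.g. "note1").
ABSOLUTE_INTERNAL_LINK = re.compile(
    r'href="file:///([^"#]*)#([^"]*)"', re.IGNORECASE
)


def restore_internal_links(html, document_name):
    """Turn absolute file:/// links into this document back into '#fragment'."""
    def replace(match):
        path, fragment = match.group(1), match.group(2)
        # Compare only the file name; accept both slash styles.
        target = re.split(r"[\\/]", path)[-1]
        if target.lower() == document_name.lower():
            return 'href="#%s"' % fragment
        return match.group(0)  # links into other documents stay untouched
    return ABSOLUTE_INTERNAL_LINK.sub(replace, html)
```

Only links whose file name matches `document_name` are rewritten, so links that deliberately point to other local documents are left alone. Keep a copy of the original file before applying anything like this.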
The first task ("identifying the files") is easy to achieve when MS Word is used to create the HTML document. If the main file is called "miller_paper.htm", then all associated files are stored in a subfolder called "miller_paper_files", which is located in the same folder as the main file.
Note that you cannot simply move the associated files to the same folder as the main file, as the specific folder location is used in the underlying HTML source code. So, you must "zip with relative path information" or -- after transfer -- the unzipped document will not display correctly.
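What "zip with relative path information" means can be shown with a short Python sketch that uses the standard `zipfile` module instead of WinZip. The function name `zip_html_document` is made up, and the sketch assumes the MS Word naming convention described above (a main .htm file plus a "_files" subfolder next to it).

```python
import os
import zipfile


def zip_html_document(main_file, zip_path):
    """Zip an HTML file and its '<name>_files' folder, keeping relative paths."""
    base_dir = os.path.dirname(os.path.abspath(main_file))
    stem = os.path.splitext(os.path.basename(main_file))[0]
    companion = os.path.join(base_dir, stem + "_files")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # The main file goes to the top level of the archive.
        archive.write(main_file, os.path.basename(main_file))
        for folder, _subfolders, names in os.walk(companion):
            for name in names:
                full_path = os.path.join(folder, name)
                # Store each file relative to the main file's folder,
                # e.g. "miller_paper_files/<image name>".
                archive.write(full_path, os.path.relpath(full_path, base_dir))
    return zip_path
```

Because every entry is stored relative to the main file's folder, unzipping the archive anywhere recreates the "miller_paper_files" subfolder next to "miller_paper.htm", which is what the HTML source code expects.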
This 4:23 min "screen movie" (a .wmv file requiring a recent version of the Windows Media Player to view) demonstrates the process in great detail -- assuming that you have WinZip installed on your computer.
When using e-mail, however, the sender should make sure that the file is sent as a separate attachment -- not displayed in the body of the message. Many mail programs (clients) offer a choice between keeping attachments separate and having them included in the body of your message. Some, however (most notably AOL mail), incorporate .htm (and .mht) files in the body of the text, offering no choice. When using such an e-mail program, it is necessary to use a workaround: add an extension like ".bin" or ".exe" to the actual file name. This will fool such mail programs into assuming that the file cannot be displayed in the body of the message, and thus it will be sent as a genuine attachment. The recipient, then, simply has to remove the extra (fake) extension.
Example: If the file "miller_paper.mht" is supposed to be sent via AOL or a similar mail program, name it "miller_paper.mht.bin" first and then add it to the message as an attachment. This, however, can create other problems, as an e-mail client may be set up to refuse "executables" as attachments (and the extensions .bin and .exe are normally used for executables). So, you may want to add a fake .pdf extension instead (this will pass), but the recipient needs to be warned that this is a fake extension only and needs to be removed upon receipt. The file will not open in the Acrobat/Adobe reader. (Changing the extension does not change the internal structure of a file!)
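The renaming on both ends can also be scripted. The following Python sketch is one possible way to do it; the function names `add_fake_extension` and `remove_fake_extension` are made up for this illustration. It only renames the file, which underlines the point that the contents stay exactly the same.

```python
import os


def add_fake_extension(path, fake=".bin"):
    """Rename e.g. 'miller_paper.mht' to 'miller_paper.mht.bin' (sender side)."""
    disguised = path + fake
    os.rename(path, disguised)
    return disguised


def remove_fake_extension(path, fake=".bin"):
    """Undo add_fake_extension (recipient side); contents are never touched."""
    if not path.lower().endswith(fake.lower()):
        raise ValueError("%r does not end with %r" % (path, fake))
    original = path[:-len(fake)]
    os.rename(path, original)
    return original
```

The sender would call `add_fake_extension("miller_paper.mht")` before attaching the file, and the recipient would call `remove_fake_extension("miller_paper.mht.bin")` after saving it. Passing `fake=".pdf"` gives the variant for mail clients that refuse executables.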