Fog Creek Software
Discussion Board




Mozilla Firebird can't save as mht...

That's too bad.

Alex
Tuesday, December 09, 2003

There's really no good reason why it couldn't, either. It's an open file format, not something Microsoft invented.

Brad Wilson (dotnetguy.techieswithcats.com)
Tuesday, December 09, 2003

It also sucks big time. The links are hard coded so if you email somebody the file, or simply change your folder structure it no longer works.

When I first heard about it I converted all my "Web Page, complete" saves to it. A grave mistake. I'm slowly changing the recoverable ones back.

Stephen Jones
Tuesday, December 09, 2003

Apparently there's a request about this already in the system:

http://bugzilla.mozilla.org/show_bug.cgi?id=40873

and before this thread descends into another open vs closed source row, someone's organising a code bounty (posh name for a whip-round if you ask me) at 

http://www.dezyne.net/codebounty/

although I think they've only got ~$200 so far.

A cynic writes
Tuesday, December 09, 2003

I prefer to create PDFs out of web pages.  Use PDFCreator from sourceforge or PDF995 (http://www.pdf995.com) and you can turn anything into PDF by printing to the driver.

NoName
Tuesday, December 09, 2003

What sucks? MHT totally encapsulates a web page into one file, you can do anything with it (move it, email it, whatever).

Now you should never keep your 'source code' this way, and probably shouldn't stick them on web sites, but as an archive format it works well.

mb
Tuesday, December 09, 2003

You can't email them with Outlook unless you zip them because something gets messed up with the MIME formatting.

Wayne
Tuesday, December 09, 2003

Before you tell me to RTFM, I still don't get how the format works. It just looks like plain-text HTML.

As far as I can figure, it doesn't use NTFS streams, because mht files used to open fine on my Windows 98 too.

Alex
Tuesday, December 09, 2003

It's just a MIME multipart message with a bit of magic to specify that an absolute url (e.g. http://www.example.com/image.gif) should be extracted from an 'attachment' in the message.
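A minimal sketch of that structure, using Python's standard `email` package. The function names `build_mht` and `find_part` are made up for illustration, and files written by IE carry additional headers that are left out here; the point is only that each part is tagged with a `Content-Location` header and looked up by the absolute URL the HTML refers to.

```python
from email import message_from_string
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mht(page_url, html, image_url, image_bytes):
    """Pack one HTML page and one GIF into a multipart/related message."""
    root = MIMEMultipart("related", type="text/html")

    # The page itself, labelled with the URL it was saved from.
    page = MIMEText(html, "html", "utf-8")
    page["Content-Location"] = page_url
    root.attach(page)

    # The image 'attachment', labelled with the absolute URL the page uses.
    image = MIMEImage(image_bytes, _subtype="gif")
    image["Content-Location"] = image_url
    root.attach(image)

    return root.as_string()


def find_part(mht_text, url):
    """Return the decoded bytes of the part stored for `url`, or None."""
    message = message_from_string(mht_text)
    for part in message.walk():
        if part.get("Content-Location") == url:
            return part.get_payload(decode=True)
    return None
```

A viewer would call something like `find_part` whenever the HTML asks for `http://www.example.com/image.gif`, instead of going out to the network.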

mb
Tuesday, December 09, 2003

I miss CTRL + TAB getting you to the address bar. Much easier key combination than ALT + D.

www.MarkTAW.com
Tuesday, December 09, 2003

Then change it back to Ctrl-Tab if you want.

Simon Lucy
Tuesday, December 09, 2003

"I prefer to create PDFs out of web pages.  Use PDFCreator from sourceforge or PDF995 (http://www.pdf995.com) and you can turn anything into PDF by printing to the driver."

To create a PDF from a web page with Mozilla Firebird for Linux, select "Print To File" and save the page as a PostScript file. This file can be converted to PDF (e.g., with ps2pdf).

ME
Wednesday, December 10, 2003

...and that works with almost any browser in Windows too. Print to file and convert the .ps (or .prn) file to pdf using the Windows version of ps2pdf.
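If you do this often, the conversion step can be scripted. A rough sketch, assuming `ps2pdf` is installed and on the PATH; the helper names `ps2pdf_command` and `convert` are hypothetical, and the output name is simply the input name with a `.pdf` extension.

```python
import shutil
import subprocess
from pathlib import Path


def ps2pdf_command(ps_path):
    """Build the ps2pdf command line for a printed-to-file .ps or .prn file."""
    source = Path(ps_path)
    target = source.with_suffix(".pdf")
    return ["ps2pdf", str(source), str(target)]


def convert(ps_path):
    """Run ps2pdf on the file, failing early if the tool is not installed."""
    if shutil.which("ps2pdf") is None:
        raise RuntimeError("ps2pdf not found on PATH")
    subprocess.run(ps2pdf_command(ps_path), check=True)
```

This only works when the printed file really is PostScript, which depends on the printer driver you printed to.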

uncronopio
Wednesday, December 10, 2003

The point is that the links within the mht are for some reason hard-coded to the path to the document. So when that path changes things start not to work. Been there, done that!

Stephen Jones
Wednesday, December 10, 2003

"...and that works with almost any browser in Windows too. Print to file and convert the .ps (or .prn) file to pdf using the Windows version of ps2pdf."

Thanks!

I'd tried it before under Windows and wasn't able to convert the output to PDF. At the time, I didn't investigate further. This time I took a closer look at the output file: it was an HP Printer Job Language file, set to use PCL instead of PostScript. If the printer language had been PostScript, then yes, I could have stripped the PJL headers and converted the remaining PostScript to PDF.

(I gather that the output depends on which drivers are installed. Under Linux/UNIX, it seems that "Print To File" always produces a PostScript file.)
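For the PostScript case, stripping the PJL wrapper might look something like the sketch below. It assumes the usual layout: PJL command lines up front, the PostScript starting at its `%!PS` header, and the job closed by the escape sequence `ESC%-12345X`. The function name `strip_pjl` is made up, and a PCL job like mine is rejected rather than converted.

```python
# The "universal exit language" escape sequence that brackets a PJL job.
UEL = b"\x1b%-12345X"


def strip_pjl(data):
    """Return the PostScript payload of a print job, without the PJL wrapper.

    Raises ValueError when no PostScript header is present, which is what
    happens when the driver emitted PCL instead.
    """
    start = data.find(b"%!PS")
    if start == -1:
        raise ValueError("no PostScript header found; the job may be PCL")

    body = data[start:]

    # Drop the trailing escape sequence and any PJL commands after it.
    end = body.find(UEL)
    if end != -1:
        body = body[:end]
    return body
```

The bytes it returns can be written to a .ps file and handed to ps2pdf.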

ME
Wednesday, December 10, 2003

"The point is that the links within the mht are for some reason hard-coded to the path to the document. So when that path changes things start not to work. Been there, done that!"

Still not quite sure what the issue is, though I think there is one. I'm guessing it's entirely related to *external* links (e.g. if you have a page which has an a href to another page, but that other page is not stored in the archive).

IE seems to turn relative links into fixed links when you save, so if you have a file at c:\a\b.htm, relatively linking to x.htm, and save it, IE will hard code it to c:\a\x.htm. Then if you move b.htm and x.htm, the link will break. Well, this is almost but not quite the same as if you moved the files outside of an MHT.
Similarly, even without that, the links are all relative to the Content-Location specified within the MHT--it's simulating an external page. So even if the link were relative, it's relative to the *original* (or specified) URL.
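To illustrate that last point, here is how a relative link resolves against a base URL, using Python's `urllib.parse.urljoin`. This is only a sketch of the general rule, not of what IE does internally, and the `resolve` helper and the example URLs are made up.

```python
from urllib.parse import urljoin


def resolve(content_location, href):
    """Resolve a link against the Content-Location recorded in the archive."""
    return urljoin(content_location, href)


# The base is the recorded URL, not wherever the .mht file lives now.
print(resolve("http://www.example.com/a/b.htm", "x.htm"))
print(resolve("file:///C:/a/b.htm", "x.htm"))
```

Moving the .mht file changes nothing in either call, because the base comes from inside the archive.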

It's the old metadata-stored-with-data problem. I don't see any good way around it, though you could rewrite the MHT to have some sort of bogus HREF (perhaps blank would make it work the way you expect). I think it's more of an implementation issue than a spec issue.

mb
Wednesday, December 10, 2003

This is the whole problem.

Now if you save as "Web Page, complete", IE will move the folder with the images to the same location that the html file is moved to, so the problem doesn't arise.

Stephen Jones
Thursday, December 11, 2003
