IE6 random crash, error in mshtml.dll
As alluded to in the previous post, over the last week I’ve had a nightmare of an IE6 random crash bug to deal with at work, with IE6 reporting an error in mshtml.dll. For those of you that care about IE6 bugs, please read on. For those of you that may have further insight, please do comment. For those of you that aren’t web developers, you should skip this post. It’s all nerdy from here on. If you’d like to live the bug with me, continue reading. Alternatively you can skip to the technical details and video of the bug itself right now.
The nightmare begins
Sticking with me? Excellent. Pull up a chair and help yourself to a coffee. *offers some coffee*
I had recently implemented a new CSS driven drop-down menu for a client, and in order to get the menu working on IE6 (which does not support the :hover pseudo-class on non-anchor elements) I had used the tried and tested csshover.htc file to get IE6 to behave like any modern browser in that regard. I tested the menu, as normal, in our browser suite and found that it worked well everywhere we tested. We put the change onto our staging server and got client approval to make the new functionality live. We launched the site live to the public.
Some time later we got a support request alerting us that someone using IE6 could not use the website in question because it was crashing their browser. A crashing browser is the worst thing you can have on a client site. It’s not like a display glitch: a crash means potential lost revenue and annoyed customers. You don’t get a worse problem short of a disaster server-side. “A crash is impossible,” I thought, “we’ve tested it in IE6 already and it doesn’t crash”. So I loaded up Virtual PC, fired up IE6, and set about navigating the live site. All was well. I clicked around for a few minutes. The browser crashed. S***.
First things first when bug hunting - find out what triggers the bug. From there you can figure out what causes it, and then fix it with a work-around. Oh but it wasn’t going to be that easy. There was no set way to trigger the crash. It wasn’t a certain page, it wasn’t a certain image, it wasn’t a certain behaviour, menu item, or sequence of clicks. It was simply a case of “navigate around the site long enough, and eventually IE6 will crash”. It would crash on a page that had worked ten times immediately before. It crashed if you clicked really fast between pages, it crashed if you patiently waited for each page to load. It crashed on page load, it crashed on page exit. It even crashed when following a link to a plain-jane .jpg image - no XHTML/JS/CSS at all! But it would not crash predictably.
Being utterly unable to find a reproducible trigger, I was left with no option but to strip back the latest modifications piece by piece until the crash behaviour stopped. At that point I’d know what was triggering the crash. So I started stripping things out. As luck would have it, removing the .htc file was one of the first things I tried, as it was the most obvious change. It stopped the crash, so as a quick fix we removed the file from the live site. The live site now had a fancy drop-down menu in all modern browsers, and defaulted to the old behaviour of loading a category page listing sub-categories on IE6. Thank god for graceful degradation, I thought. Crisis averted, live site ‘fixed’ in under four hours, now to restore the enhanced menu in IE6…
Here’s where it gets a bit crazy
With the client’s live site patched up to avoid crashes, and the trigger for the bug apparently discovered and eliminated, I set about using an alternative method to regain the drop-down menu for IE6 on our development server. I used the Son of Suckerfish method and got it working wonderfully. I tested the site for a while … and IE6 crashed. S***.
By this point I was well and truly sick of clicking around, and it was taking a long time to get the browser to crash, so I downloaded a lovely little add-on for IE6 called iMacro (there’s a version for Firefox), recorded a series of clicks as I navigated the site, and saved the Macro. From this point on I stopped manually clicking around and just set the macro to run sixty times through. The macro consisted of navigating through 18 pages. 60x18 = 1080 clicks. Crashes were a lot less common without the htc file, but they still happened. It took anywhere from 7 clicks up to 702 clicks before a crash occurred.
First suspect: the JavaScript. No luck; with JS disabled, it still crashed.
Must be the new CSS I put in for the menu - not too surprising given IE6’s failings with CSS in general. I removed the new CSS. It still crashed. I reverted to the old CSS files entirely. It still crashed. Incredulous, I loaded up a back-up version of the site prior to our latest modifications and ran that through the test macro. The old site, sans new menu and other changes, crashed. Oh s***.
It is fair to say that the old site was even harder to crash, but crash it assuredly did. It’s also fair to say that with the traffic that site gets there is no way that a crash in IE6 could have gone undetected and unreported for the almost two years that site has been live. It couldn’t have gone unnoticed for even a week; the numbers going through it were simply too great, and someone would have seen it and reported it. So something somewhere had changed, either on the server or client-side with a Windows Update. We suspect the Windows Update, considering that only the day before all our PCs at work had been hit with a batch of updates, and our server has had no settings changes in the time of the site changes.
Having ruled out the new features, functions, and styles as the cause, I was left with stripping back the site architecture until the crashes stopped. At that point I’d know where the problem was. JS had already been removed, so I removed the CSS. 1080 clicks - no crash. Ah HA!
I disabled CSS block by block, until I ended up with a screen.css (the main CSS file) consisting of absolutely nothing but commented out CSS and an @import at the top. It still crashed.
Must be something in the imported file. I looked at the imported file. It was essentially blank, a hold-over from a template, it had no actual CSS in it. I emptied the file anyway, and saved it. screen.css now was importing a totally blank file, and otherwise contained no CSS at all. It still crashed.
My jaw hit the floor.
It had to be HTML. It can’t be JS because there isn’t any being loaded. It can’t be the CSS because there’s no CSS being applied. I started stripping HTML out in huge chunks. Still it crashed. Until I ended up with what follows in the test case:
Making IE6 crash (randomly) with three one-line CSS files and two identical 9 line HTML files. All validated.
There appear to be two (perhaps three, I’ll get to that) elements required to get IE6 to randomly crash simply by navigating around a website for long enough. Take a look at the test case where, if you click the two links (essentially loading a new version of exactly the same page), IE6 will crash with an error in mshtml.dll.
After much re-arranging and general fiddling it seems two things are required to trigger the crash:
- the print media stylesheet link must precede the screen media stylesheet link in the page
- there must be an @import directive in the screen.css file (you can place it anywhere, it just has to be there. Doesn’t even need to load a real file)
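The two conditions above can be turned into a rough check. What follows is only a sketch in Python, not part of the original test case: the function names, the blank.css file name, and the sample markup are all hypothetical, and it assumes the stylesheets are attached with ordinary link elements. It deliberately counts any @import in the screen stylesheet, wherever it sits, since position did not seem to matter.

```python
from html.parser import HTMLParser
import re


class StylesheetLinkCollector(HTMLParser):
    """Collect the media value of every <link rel="stylesheet">, in source order."""

    def __init__(self):
        super().__init__()
        self.medias = []

    def handle_starttag(self, tag, attrs):
        # Self-closing XHTML tags (<link ... />) also end up here.
        if tag != "link":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if "stylesheet" in attributes.get("rel", "").lower().split():
            self.medias.append(attributes.get("media", "all").lower())


def first_index(medias, wanted):
    """Index of the first stylesheet whose media mentions `wanted`, or None."""
    for index, media in enumerate(medias):
        if wanted in media:
            return index
    return None


def print_precedes_screen(html):
    """Condition one: a print stylesheet is linked before a screen stylesheet."""
    collector = StylesheetLinkCollector()
    collector.feed(html)
    first_print = first_index(collector.medias, "print")
    first_screen = first_index(collector.medias, "screen")
    if first_print is None or first_screen is None:
        return False
    return first_print < first_screen


def has_import(css):
    """Condition two: the screen stylesheet contains an @import anywhere."""
    return re.search(r"@import\b", css, re.IGNORECASE) is not None


def matches_crash_pattern(html, screen_css):
    """True when both conditions described above are present."""
    return print_precedes_screen(html) and has_import(screen_css)


# Hypothetical markup in the shape of the two conditions.
RISKY_HEAD = """
<head>
  <link rel="stylesheet" type="text/css" href="print.css" media="print" />
  <link rel="stylesheet" type="text/css" href="screen.css" media="screen" />
</head>
"""

# The same links with the print stylesheet moved after the screen one.
FIXED_HEAD = """
<head>
  <link rel="stylesheet" type="text/css" href="screen.css" media="screen" />
  <link rel="stylesheet" type="text/css" href="print.css" media="print" />
</head>
"""

SCREEN_CSS = '@import "blank.css";'

print(matches_crash_pattern(RISKY_HEAD, SCREEN_CSS))  # True
print(matches_crash_pattern(FIXED_HEAD, SCREEN_CSS))  # False
```

A check like this only says whether a page has the same shape as the test case. It cannot say whether a given IE6 install will actually crash on it.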
Stopping the random IE6 crash
- Don’t use @import in the CSS files
- Move the print link to after the screen link
Option one worked but meant in my case duplicating huge swaths of CSS into another file (I’d been using the @import to load the normal ‘low resolution’ screen.css into the ‘high resolution’ screen_high.css - then over-riding rules where needed, thus keeping file sizes and load times down).
Option two was the least painful and it works all on its own too.
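For anyone who would rather take option one without hand-copying CSS, the duplication could in principle be done by a small build step that pastes each imported file in place of its @import rule. This is a hypothetical sketch in Python, not something used on the site in question: inline_imports and the sample rules are made up for illustration, the files are passed in as a plain dictionary rather than read from disk, and only simple @import "file"; or @import url(file); rules without media lists are handled.

```python
import re

# Matches @import "file.css"; and @import url(file.css); (quotes optional in url()).
IMPORT_RULE = re.compile(
    r"""@import\s+(?:url\(\s*)?["']?([^"'()\s;]+)["']?\s*\)?\s*;""",
    re.IGNORECASE,
)


def inline_imports(css, files):
    """Replace each simple @import rule with the text of the file it names.

    `files` maps a file name to its CSS text. Imports inside imported files
    are expanded too; circular imports are not guarded against, and an
    unknown file name raises KeyError.
    """

    def substitute(match):
        return inline_imports(files[match.group(1)], files)

    return IMPORT_RULE.sub(substitute, css)


# Made-up rules; only the two file names come from the setup described above.
files = {"screen.css": "body { font-size: 12px; }"}
screen_high = '@import "screen.css";\nbody { font-size: 14px; }'

print(inline_imports(screen_high, files))
```

The output is a single stylesheet with the low resolution rules first and the overrides after them, so the cascade stays the same while the @import itself disappears. The cost is the one already mentioned: a bigger file to download.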
The mysterious third factor
I can reproduce the crash every single time on my VM’d IE6 at work, yet with the same VM and the same IE6 on my laptop it does not crash. There are no differences between the Virtual Machines, and they both load the same website; the only difference is the hardware. My laptop is considerably faster than the work PC. So the “possible third factor” is either a slow PC being required to exhibit the crash, or a slow internet connection. The slow PC angle would seem to hold up logically: other people have experienced the crash (as indicated by the initial support request from our client), and judging by the demographics and audience of that site, the PCs its visitors use are likely to be old and slow.
Technical details and notes
The crash is known to happen in IE6 version 6.0.2900.2180.xpsp_sp2_gdr.070227-2254 (Update Versions: SP2).
It could not be replicated on an older IE6 running on a (slow) Win2000 machine with SP1.
To be certain it wasn’t a weird standards type problem I also stripped out the DOCTYPE; it still crashed. The missing DOCTYPE causes validation failure, but rest assured it still crashes with one in.