Ξ bigXi

March 30, 2006

Mystery solved

Filed under: Ajax,IE Memory Leak,JavaScript — bigxi @ 9:46 pm

QuirksMode discussed a mystery about IE memory leaks. Here is the code:

    window.onload = init;

    function init()
    {
        createLinks();
        var x = document.getElementsByTagName('a');
        for (var i=0;i<x.length;i++)
        {
            x[i].onclick = function () {
                this.firstChild.nodeValue = ' Clicked! – ';
            }
        }
    }

At first look, this code should leak because there is a circular reference:

Any link has a reference to the anonymous onclick handler.
The anonymous onclick handler forms a closure which contains a reference to the variable x.
x is an array that references all links in the page.

The fact is, however, it doesn't leak. Go test for yourself. Write a createLinks function that appends 10,000 links to the page. Bounce your browser between the page and a blank page as many times as you want. The memory doesn't grow.

Among the various explanations, Laurens van den Oever mentioned something along the lines that, in fact, the variable x is not an array but a NodeList, which upon page unload would empty itself and therefore, there's no longer a circular reference. But still some people aren't convinced. So here's my proof that he is actually right.

This is my modified test:

<html>
<body>
<script language="JavaScript">
    window.onload = init;

    function init()
    {
        createLinks();
        var x = document.getElementsByTagName('a');
        for (var i=0;i<x.length;i++)
        {
            x[i].onclick = function () {
                var msg = "";
                for (var i = 0; i < x.length; i++) {
                    msg += x[i].innerHTML + "\n";
                }
                alert("I have references to:\n" + msg);
            }
        }
    }

    function createLinks() {
        for (var i = 0; i < 5; i++) {
            var alink = document.createElement('A');
            alink.href = "#";
            alink.innerHTML = "Link #" + i;
            document.body.appendChild(alink);
        }
    }

    function removeLinksExceptFirstOne() {
        var x = document.getElementsByTagName('a');
        var cntLink = x.length;
        for (var i = 1; i < cntLink; i++) {
            // x is a live NodeList: after each removal the next link moves into index 1.
            document.body.removeChild(x[1]);
        }
    }
</script>
    <p><button onclick="removeLinksExceptFirstOne();">
    Remove Links Except First One</button></p>
</body>
</html>

The onclick handler will now report which elements it has references to. When you load the page and click on one of the links, it says it has references to all links in the page. Now click on the button to remove all links except the first one. Click on the link that's left. There's only one reference! So, indeed, when the page unloads, x will be empty. No more circular references.
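The result hinges on the NodeList being live. As a rough, DOM-free model of that behavior (makeLiveList and page are hypothetical names for illustration, not browser APIs), the difference between a live view and a copied array might look like this:

```javascript
// Hypothetical stand-in for the document: just an array of link labels.
var page = ["Link #0", "Link #1", "Link #2"];

// A "live" list re-reads the page on every access, the way a NodeList does.
function makeLiveList(source) {
    return {
        item: function (i) { return source[i]; },
        getLength: function () { return source.length; }
    };
}

var live = makeLiveList(page);   // plays the role of x
var snapshot = page.slice();     // a true array copy taken up front

// Remove everything except the first link.
page.splice(1, page.length - 1);

var liveCount = live.getLength();    // 1: the live view shrank with the page
var snapshotCount = snapshot.length; // 3: the copy still holds every link
```

The live view never owns the links, so it cannot keep them alive once they leave the page. The copy can.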

In the next test I introduce a global array y (a true one) and use that to set up the onclick handlers instead of x:

<html>
<body>
<script language="JavaScript">
    var y = [];

    window.onload = init;

    function init() {
        createLinks();
        var x = document.getElementsByTagName('a');
        for (var i = 0; i < x.length; i++) {
            y[i] = x[i];
        }

        for (var i=0;i<y.length;i++)
        {
            y[i].onclick = function () {
                var msg = "";
                for (var i = 0; i < y.length; i++) {
                    msg += y[i].innerHTML + "\n";
                }
                alert("I have references to:\n" + msg);
            }
        }
    }

    function createLinks() {
        for (var i = 0; i < 5; i++) {
            var alink = document.createElement('A');
            alink.href = "#"
            alink.innerHTML = "Link #" + i;
            document.body.appendChild(alink);
        }
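    }

    function removeLinksExceptFirstOne() {
        var x = document.getElementsByTagName('a');
        var cntLink = x.length;
        for (var i = 1; i < cntLink; i++) {
            document.body.removeChild(x[1]);
        }
    }
</script>
    <p><button onclick="removeLinksExceptFirstOne();">
    Remove Links Except First One</button></p>
</body>
</html>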

And sure enough, the onclick handlers still have references to all links after they are removed from the page.

Now change removeLinksExceptFirstOne to this:

You'll see again that the onclick handlers no longer have references to any links.
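The listing for that change is not shown here. A minimal sketch of one version that would give the described result, assuming the function also empties the global array y (plain objects stand in for the links so the snippet runs outside a browser):

```javascript
var y = [];   // global array, as in the test above

// Hypothetical stand-ins for the five links.
for (var i = 0; i < 5; i++) {
    y[i] = { innerHTML: "Link #" + i };
}

// Same reporting logic as the onclick handler.
function report() {
    var msg = "";
    for (var i = 0; i < y.length; i++) {
        msg += y[i].innerHTML + "\n";
    }
    return msg;
}

function removeLinksExceptFirstOne() {
    // In the browser, the links would be removed from document.body here.
    // Assumption: emptying y is what drops the remaining references.
    y.length = 0;
}

var before = report();            // lists all five links
removeLinksExceptFirstOne();
var after = report();             // empty: y no longer references anything
```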

It's important to understand that closures don't necessarily mean circular references. You can break circular references to avoid memory leaks at any point of program execution. There's no need to wait until the page unloads. Often you'll find that the best place to break a circular reference is the point where the closure is formed.


March 28, 2006

IE memory leak, revisited

Filed under: Ajax,IE Memory Leak,JavaScript — bigxi @ 9:43 pm

Prelude

By now it is well known that IE (from version 4 to version 6) leaks memory with DHTML (or Ajax, if you prefer). As pointed out in numerous articles (see list at the end), the major source of memory leaks is circular references formed between JavaScript objects and DOM nodes (host objects, or ActiveX objects). A natural question one would ask is: since the problem is so well known, will Microsoft fix it? With Ajax becoming more popular each day, can we expect IE 7 to be leak-free?

The answer to that question depends on why IE leaks in the first place. Unfortunately, after searching high and low, near and far, I can only find this little snippet from this MSDN blog entry:

"This page used to say that IE tears down the div when the page is navigated away, but it turns out that that's not right. Though IE did briefly do that, the application compatibility lab discovered that there were actually web pages that broke when those semantics were implemented. (No, I don't know the details.) The IE team considers breaking existing web pages that used to work to be way, way worse than leaking a little memory here and there, so they've decided to take the hit and leak the memory in this case."

So, IE leaks because of compatibility issues. Web pages actually break when garbage is rightfully collected. I can't help but wonder what those web pages look like and who owns them. Since the leak mainly concerns circular references involving ActiveX objects, can we boldly assume that the offending pages have ActiveX controls embedded? Maybe some of them belong to Microsoft? If that's the case, it is indeed much better to take the "hit" and leak the memory here and there.

Looks like we have to live with IE memory leaks until the compatibility issues are gone.

Coping with IE memory leak

Many articles on the web refer to Joel Webber's DHTML Leaks Like a Sieve, which has since disappeared from the web. Joel Webber also wrote a nice little tool called Drip to detect memory leaks in IE, which was once slashdotted and then also lost for a while, until it was found again on OutOfHanwell.

The "official" dose of medicine from Microsoft is probably Justin Rogers' Understanding and Solving Internet Explorer Leak Patterns on MSDN, which many use as the blueprint for IE memory leak remedies. In his article, Justin describes various patterns that lead to IE leaks and ways to fix them. Interestingly enough, he also describes two other patterns besides the circular reference leak we are all familiar with: the DOM insertion order leak and the dynamic scripting leak, which are described in the "Cross-page Leaks" and "Pseudo-Leaks" sections of the article, respectively. I will also talk about these types of leaks later in this article.

To avoid leaks caused by circular references, the suggestions offered are:

Do not form circular references
If you have to form circular references (especially with event handlers through closures, which are so easy and convenient), break them up with an onunload handler when the page unloads.
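The second suggestion can be sketched roughly as follows. This is only an illustration: registry, addHandler and releaseHandlers are hypothetical names, not the API of any particular library, and a plain object stands in for a DOM node so the snippet runs anywhere.

```javascript
// Hypothetical registry of every handler that was attached.
var registry = [];

function addHandler(element, name, handler) {
    element[name] = handler;
    registry.push({ element: element, name: name });
}

// Detach every registered handler so that no
// element -> handler -> closure -> element loop survives.
function releaseHandlers() {
    for (var i = 0; i < registry.length; i++) {
        registry[i].element[registry[i].name] = null;
    }
    registry.length = 0;
}

// In a browser this would be wired up as: window.onunload = releaseHandlers;
var div = { innerHTML: "Div #0" };   // stand-in for a DOM node
addHandler(div, "onclick", function () { return div.innerHTML; });
releaseHandlers();
```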

Along the lines of the second suggestion are some more complex and systematic schemes to register event handlers so that the event handlers are automatically unregistered upon page unload, therefore breaking any circular references that may be formed during event registration. Among them are:

EventCache by Mark Wubben
EventManager by Keith Gaughan

However, as I will discuss below, onunload handlers may be effective for cross-page leaks, but they generally can't handle same-page leaks.

What's a memory leak anyway?

In the C and C++ world, the answer is simple. Any memory that is no longer referenceable but yet not released is leaked. When you allocate memory, your application footprint grows. When you release memory, the footprint shrinks.

Not so clear-cut for JavaScript. With JavaScript, you no longer have direct control of memory usage. There's a garbage collector running in the background and it decides how and when to reclaim a piece of memory that's no longer used. The garbage collector does so by determining which objects are no longer accessible and therefore can be deallocated from memory. It can detect circular loops among "garbage" by an algorithm called "mark and sweep", i.e., if a group of objects hold references to each other but are nonetheless no longer referenced by anything from the active execution path of the program, they can be marked as garbage collectively and cleaned up.
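To make the idea concrete, here is a toy mark-and-sweep pass over a hand-built object graph. It is a sketch of the general technique only, not of how any real JavaScript engine is implemented; the refs property and the heap array are assumptions made for the example.

```javascript
// Mark phase: visit everything reachable from obj.
function mark(obj, marked) {
    if (marked.indexOf(obj) !== -1) { return; }   // already visited
    marked.push(obj);
    for (var i = 0; i < obj.refs.length; i++) {
        mark(obj.refs[i], marked);
    }
}

// Sweep phase: anything in the heap that was not marked is garbage.
function sweep(heap, roots) {
    var marked = [];
    for (var i = 0; i < roots.length; i++) { mark(roots[i], marked); }
    var garbage = [];
    for (var j = 0; j < heap.length; j++) {
        if (marked.indexOf(heap[j]) === -1) { garbage.push(heap[j]); }
    }
    return garbage;
}

var root = { name: "root", refs: [] };
var live = { name: "live", refs: [] };
var a = { name: "a", refs: [] };
var b = { name: "b", refs: [] };
root.refs.push(live);
a.refs.push(b);   // a and b reference each other ...
b.refs.push(a);   // ... but nothing reachable from root references them

var garbage = sweep([root, live, a, b], [root]);   // a and b
```

The cycle between a and b does not save them: reachability from the roots is all that counts.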

With JavaScript, you can have variables and object references going out of scope and yet see the memory consumption going up. That does not necessarily mean that there is a memory leak. It may simply be that the garbage collector hasn't had a chance to collect the garbage yet. Conversely, if the garbage collector is at work while you create new objects, you may see the memory usage shrink.

Therefore, for JavaScript, there is a leak when the garbage collector is unable to clean up some garbage (as opposed to simply not cleaning it up at this moment). To determine that the garbage collector is unable to clean up, you have to repeat your test many, many times and see the memory consumption grow without bound where you'd otherwise expect a flat line.

When you navigate the browser from one web page to another, none of the objects in the document of the previous page should be left over to the new page. Those objects used to construct the previous page should be cleaned up, sooner or later. If there are leftovers from the previous pages, and they accumulate and never go away, you have a cross-page memory leak. Likewise, if you are dynamically adding and removing objects but nonetheless stay on the same page, the garbage collector should be able to reclaim the resources used up by the removed elements (sooner or later). If the removed objects accumulate and never go away, you have a same-page leak.

Cross-page leaks are bad enough that you'll eventually have to close your browser and restart it to make it functional again. Same-page leaks are less harmful but may be important for some Ajax applications where the whole application is fitted in one page and you never navigate away from it unless exiting the application.

Since the page is never unloaded, onunload handlers do nothing to remedy same-page leaks. In fact, schemes like EventManager or EventCache guarantee that there is a same-page leak.
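One way to read that last claim, assuming such a scheme keeps a list of every registered element until unload (the registry below is a hypothetical stand-in, not the code of either library):

```javascript
var registry = [];   // hypothetical unload-time registry

function addHandler(element, name, handler) {
    element[name] = handler;
    registry.push({ element: element, name: name });
}

// Hypothetical page: the elements currently attached to the document.
var attached = [];

for (var i = 0; i < 3; i++) {
    var div = { innerHTML: "Div #" + i };   // stand-in for a DOM node
    addHandler(div, "onclick", function () {});
    attached.push(div);
}

// The application removes every element but stays on the same page.
attached.length = 0;

// The registry still references all three elements, so they stay reachable
// until an unload that, in a one-page application, never comes.
var stillReferenced = registry.length;   // 3
```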

Closures are your friends

There once was a saying: "closures are your friends". But since the IE bully appeared, people have been shying away from closures whenever they can (or can't). However, there's a little secret about our quiet friends that makes it easier to befriend them again. And that's what I'm going to tell you here.

Let's start with a simple test (test1):

<html>
   <body>
        <button onclick="startTest();">Start!</button>
        <script language="JavaScript">
            function startTest() {
                for (var i = 0; i < 5000; i++) {
                    var element = document.createElement("DIV");
                    element.innerHTML = "Div #" + i;
                    hookupEvent(element);
                    document.body.appendChild(element);
                }
            }

            function hookupEvent(element) {
                element.onclick = function() { alert('Clicked: ' + this.innerHTML); }
            }
        </script>
   </body>
</html>

Does this page leak? Definitely. The DOM node element has a reference to an anonymous JavaScript function (the onclick handler), which in turn has a reference back to element through the closure formed inside hookupEvent. There is a circular reference loop encompassing both JavaScript and DOM objects. Test it for yourself. Here's how: load the page, click "Start!", reload page, click "Start!", …, and watch the memory usage grow with the Windows Task Manager. Repeat until you are satisfied.

Now let's modify test1 a bit and call it test2:

<html>
   <body>
        <button onclick="startTest();">Start!</button>
        <script language="JavaScript">
            function startTest() {
                for (var i = 0; i < 5000; i++) {
                    var element = document.createElement("DIV");
                    element.innerHTML = "Div #" + i;
                    element.onclick = function() { alert('Clicked: ' + element.innerHTML); };
                    document.body.appendChild(element);
                }
            }
        </script>
   </body>
</html>

Is there a leak here? It definitely seems so. There's still a closure and the circular references still exist. Test it, please!

Did you see the memory consumption grow? No? Are you surprised?

Now click the "Start!" button and bring the DIVs back. Click on Div #0. What does it say? "Clicked: Div #4999"?! In fact, no matter which DIV you click, it always says "Clicked: Div #4999".

As it turns out, JavaScript closures do not capture values at the moment they are formed. Instead, they capture the enclosing scope, and the variables in that scope can still be updated after the closure is formed. In test1, each call to hookupEvent creates a new scope with its own element, so 5000 scopes each hold on to a different DIV. In test2, every handler shares the single scope of startTest, and by the end of the loop its element variable points to the last DIV. In effect, test2 behaves like this:

<html>
   <body>
        <button onclick="startTest();">Start!</button>
        <script language="JavaScript">
            function startTest() {
                for (var i = 0; i < 5000; i++) {
                    var element = document.createElement("DIV");
                    element.innerHTML = "Div #" + i;
                    element.onclick = onclickHandler;
                    document.body.appendChild(element);
                }

                function onclickHandler() {
                    alert('Clicked: ' + element.innerHTML);
                }
            }
        </script>
   </body>
</html>

Strictly speaking, there's still a leak in test2. But instead of leaking 5000 elements, only the last element is leaked.
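The same capture behavior can be reproduced without a DOM. In this sketch plain objects stand in for the DIVs, and sharedScope, separateScopes and hookup are hypothetical names that mirror test2 and test1 respectively:

```javascript
// Like test2: every handler closes over the one element variable of this function.
function sharedScope() {
    var handlers = [];
    for (var i = 0; i < 3; i++) {
        var element = { innerHTML: "Div #" + i };
        handlers.push(function () { return "Clicked: " + element.innerHTML; });
    }
    return handlers;
}

// Like test1: each call creates a fresh scope with its own element.
function hookup(element) {
    return function () { return "Clicked: " + element.innerHTML; };
}

function separateScopes() {
    var handlers = [];
    for (var i = 0; i < 3; i++) {
        handlers.push(hookup({ innerHTML: "Div #" + i }));
    }
    return handlers;
}

var shared = sharedScope()[0]();        // "Clicked: Div #2"
var separate = separateScopes()[0]();   // "Clicked: Div #0"
```

With the shared scope only the last object stays referenced by the closures, which matches the single leaked element in test2.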

Taking insight from the previous example, we can now easily make our test1 leak-free:

            function hookupEvent(element) {
                element.onclick = function() { alert('Clicked: ' + this.innerHTML); }
                element = null;
            }

The only thing I did was to set element to null after the event handler is attached. And suddenly we are leak free! The closure is still there, but there's no circular reference anymore.

In general, we should understand that closures don't necessarily mean circular references. And in some instances breaking up circular references is actually easier to do at the point of closure formation than at page unload time.

What about insertion order leaks?

One leak pattern that is presented in Justin Rogers' MSDN article is the "insertion order leak", which as far as I know isn't discussed anywhere else.

Here's the test that leaks (test3):

                var hostElement = document.getElementById("hostElement");
                var parentDiv = document.createElement("<div onClick='foo()'>");
                var childDiv = document.createElement("<div onClick='foo()'>");
                parentDiv.appendChild(childDiv);
                hostElement.appendChild(parentDiv);

And the test that does not leak (test4):

                var hostElement = document.getElementById("hostElement");
                var parentDiv = document.createElement("<div onClick='foo()'>");
                var childDiv = document.createElement("<div onClick='foo()'>");
                hostElement.appendChild(parentDiv);
                parentDiv.appendChild(childDiv);

The only difference between test3 and test4 is the order of attachment to the DOM tree. If the dynamic elements are pre-assembled first, there is a leak; if they are attached directly to the DOM without pre-assembly, there's no leak.

However, I found the explanations in Justin's article somewhat hard to believe. So I modified the tests as follows:

test5:

                var parentDiv = document.createElement("<div>");
                parentDiv.onclick = function() { foo(); }
                var childDiv = document.createElement("<div>");
                childDiv.onclick = function() { foo(); }
                parentDiv.appendChild(childDiv);
                hostElement.appendChild(parentDiv);

test6:

                var parentDiv = document.createElement("<div>");
                parentDiv.onclick = function() { foo(); }
                var childDiv = document.createElement("<div>");
                childDiv.onclick = function() { foo(); }
                hostElement.appendChild(parentDiv);
                parentDiv.appendChild(childDiv);

And there's no leak either way! Then I ran these two tests:

test7:

                var parentDiv = document.createElement("<div onClick='foo()'>");
                parentDiv.innerHTML = "parent";
                var childDiv = document.createElement("<div onClick='foo()'>");
                childDiv.innerHTML = "child";
                parentDiv.appendChild(childDiv);
                hostElement.appendChild(parentDiv);

test8:

                var parentDiv = document.createElement("<div onClick='foo()'>");
                parentDiv.innerHTML = "parent";
                var childDiv = document.createElement("<div onClick='foo()'>");
                childDiv.innerHTML = "child";
                hostElement.appendChild(parentDiv);
                parentDiv.appendChild(childDiv);

WARNING: don't ever try test7 or test8 5000 times, it may crash your machine!

So what's my take? I don't believe there's such a thing as insertion order leak – at least not as demonstrated by test3 and test4. The leak has something to do with the way IE parses and processes the string passed to document.createElement.

BTW, none of the leaks that appeared in the above tests is cross-page. Memory usage returns to normal when you refresh the page.

Dynamic scripting leaks are real leaks

In his MSDN article, Justin Rogers demonstrated another type of leak with this test:

    <body>
        <button onclick="LeakMemory()">Memory Leaking Insert</button>
        <script id="hostElement">function foo() { }</script>
    </body>
</html>

Basically, when you click the "Memory Leaking Insert" button, the script element is re-written 5000 times. The memory footprint grows each time you click the button and never falls back, unless you leave the page. However, I don't quite agree with Justin's classification of this leak as a "pseudo-leak". I believe it's a real leak in the sense that scripts that are no longer needed and no longer reachable cannot be garbage collected. Somehow, I think this leak is related to the "insertion order leak" discussed above.
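The listing above does not include the LeakMemory function itself. A sketch of what the description implies, assuming it simply reassigns the script element's text 5000 times; the doc parameter and the mock document are additions made here so the snippet can run outside a browser:

```javascript
// Rewrites the script element's text 5000 times.
// In a browser, call it as LeakMemory(document).
function LeakMemory(doc) {
    var hostElement = doc.getElementById("hostElement");
    for (var i = 0; i < 5000; i++) {
        hostElement.text = "function foo() { }";
    }
}

// Mock document that only counts writes to the script element's text.
var writes = 0;
var mockScript = {};
Object.defineProperty(mockScript, "text", {
    set: function (value) { writes++; },
    get: function () { return "function foo() { }"; }
});
var mockDocument = {
    getElementById: function (id) {
        return id === "hostElement" ? mockScript : null;
    }
};

LeakMemory(mockDocument);
```

The mock only confirms the number of rewrites. The memory growth itself can only be observed in IE with the Task Manager, as with the earlier tests.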

Resources
