Posted by Alex Hastings on 02/25/2013 - Aptiverse
One of our upcoming projects aims to deliver a rich, real-time experience directly in the browser. In the course of this work, our engineers bumped into a series of elusive performance issues that seemed to affect only the beta testers who were using Google Chrome.
At first, we suspected that the fault was ours. But when we finally looked under the hood of what is widely regarded as the fastest browser on the market, we discovered a series of design decisions that elude conventional benchmarks, yet undoubtedly affect the performance of many real-world sites across the web.

Chrome history and caching behavior is broken
In a majority of web browsers, the size of the browser history and document cache is capped in one way or another: for example, if you have not visited facebook.com for a couple of weeks, any record of this will eventually disappear down the memory hole.
This is not the case for Chrome: the browser keeps all cached information indefinitely. Perhaps this is driven by some hypothetical assumptions about browsing performance; perhaps it is simply driven by the desire to collect more information to provide you with more relevant ads. Whatever the reason, the outcome is simple: over time, cache lookups get progressively more expensive. Some of this is unavoidable, and some may be made worse by a faulty hash map implementation in infinite_cache.cc.
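To make the contrast concrete, here is a minimal sketch of the kind of size-bounded LRU index that a capped cache implies; once full, inserting a new entry evicts the least-recently-used one, so lookup cost stays flat. All class and method names here are hypothetical, not Chrome's actual code.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Minimal size-bounded LRU index: once `capacity` entries are stored,
// inserting a new key evicts the least-recently-used one. An unbounded
// cache simply omits the eviction step, so the backing table - and the
// cost of maintaining it - grows without limit.
class LruIndex {
 public:
  explicit LruIndex(std::size_t capacity) : capacity_(capacity) {}

  void Put(const std::string& key, const std::string& value) {
    auto it = map_.find(key);
    if (it != map_.end()) {
      order_.erase(it->second.second);       // refresh an existing entry
    } else if (map_.size() >= capacity_) {
      map_.erase(order_.back());             // evict least-recently-used key
      order_.pop_back();
    }
    order_.push_front(key);
    map_[key] = {value, order_.begin()};
  }

  const std::string* Get(const std::string& key) {
    auto it = map_.find(key);
    if (it == map_.end()) return nullptr;
    // Move the entry to the front; list iterators survive splice().
    order_.splice(order_.begin(), order_, it->second.second);
    return &it->second.first;
  }

  std::size_t Size() const { return map_.size(); }

 private:
  std::size_t capacity_;
  std::list<std::string> order_;  // front = most recently used
  std::unordered_map<std::string,
                     std::pair<std::string, std::list<std::string>::iterator>>
      map_;
};
```

An unbounded variant is the same structure minus the eviction branch, which is exactly the behavior described above.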
To quantify the cache-driven slowdown in Chrome, we asked 20 volunteers to use their favorite browsers while simultaneously running a special piece of instrumentation that measured the latency between an attempt to initiate navigation and the initiation of network traffic.
For Chrome, the initial latency proved to be minimal, but rapidly degraded over the course of 30 days, as the size of the cache increased without upper bound. In comparison, the same measurement for Firefox showed slightly worse initial performance, but also a much less pronounced decline, levelling off in about two days.
Over time, every new resource loaded within Chrome will degrade the browser's performance. Because the cache is sharded by domain name, this effect is particularly pronounced for any web applications that make use of the history.pushState() API. The use of this particular API is precisely what caused problems for us.

WebKit layout engine suffers from recursion-related bottlenecks
According to one of the popular schools of software engineering, writing short and self-documenting methods leads to more readable and safer code. This philosophy is embraced by the developers working on WebKit: in fact, the code responsible for rendering a typical web page averages just 2.1 effective C++ statements per function in WebKit, compared to 6.3 for Firefox - and an estimated count of 7.1 for Internet Explorer.

Unfortunately, this coding strategy appears to have a distinct disadvantage in a browser: the extensive use of dynamic class inheritance prevents the compiler from efficiently inlining short functions, and imposes a significant performance penalty due to nearly constant vtable lookups and the inherent overhead of pushing parameters to the stack and passing control to another location in the codebase.
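The mechanism behind this penalty is easy to illustrate. In the sketch below (all type names hypothetical, not actual WebKit classes), a call through a base-class pointer must go through the vtable, so the compiler generally cannot inline it; the same call on the concrete type can collapse to a constant at compile time.

```cpp
// A virtual call dispatches through the object's vtable: the compiler
// sees only an opaque base reference and usually cannot inline the
// callee. The non-virtual path, with the concrete type known, is
// trivially inlinable.
struct Node {
  virtual ~Node() = default;
  virtual int Width() const = 0;
};

struct TextNode : Node {
  int Width() const override { return 42; }
};

// Indirect call via the vtable: one extra load plus a call that the
// optimizer cannot easily see through.
int MeasureVirtual(const Node& n) { return n.Width(); }

// Concrete type known at the call site: the body can be inlined and
// folded down to the constant 42.
int MeasureDirect(const TextNode& n) { return n.Width(); }
```

When nearly every function is as short as `Width()`, the fixed cost of the indirect call can rival the cost of the work the function actually does.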
To estimate the impact of this problem, we measured a cost-weighted ratio of vtable lookups and parameter pushes to the remaining code, as sampled when visiting some of the most popular destinations on the Internet:
For sites such as Facebook or Amazon, Chrome experiences an overhead peaking at 20% of the overall page rendering time. Interestingly, the overhead is much lower on Google's own properties; we believe this is due to targeted browser optimizations that favor certain document layouts, such as a deeply-nested DOM with relatively little branching.

Process isolation makes cross-page communications too slow
Chrome uses process isolation to ensure that a programming error on a particular website generally would not impact the stability of other web applications loaded in the browser. The mechanism is based on a relatively dated concept, superseded by compile-time code validation techniques such as Google's own Native Client; nevertheless, Google positions Native Client as a competitor to existing standards, rather than leveraging it to improve the stability and security of WebKit and V8.
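Any data that crosses such a process boundary must travel through OS-mediated IPC, and each leg of the trip involves a context switch. The POSIX-only sketch below (a fork plus a pair of pipes; nothing here is Chrome's actual IPC layer) measures one such roundtrip, where the scheduler handoffs, not the one-byte payload, dominate the latency.

```cpp
#include <chrono>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Measures one parent->child->parent roundtrip over a pair of pipes.
// Each leg forces a write, a blocking read, and an OS-mediated context
// switch; these switches, not the payload size, dominate the latency.
// Returns the roundtrip time in microseconds, or -1.0 on failure.
double PipeRoundtripMicros() {
  int to_child[2], to_parent[2];
  if (pipe(to_child) != 0 || pipe(to_parent) != 0) return -1.0;

  pid_t pid = fork();
  if (pid < 0) return -1.0;
  if (pid == 0) {                  // child: echo one byte back
    char b;
    read(to_child[0], &b, 1);
    write(to_parent[1], &b, 1);
    _exit(0);
  }

  char b = 'x';
  auto start = std::chrono::steady_clock::now();
  write(to_child[1], &b, 1);       // request
  read(to_parent[0], &b, 1);       // reply
  auto end = std::chrono::steady_clock::now();

  waitpid(pid, nullptr, 0);        // reap the child
  close(to_child[0]);  close(to_child[1]);
  close(to_parent[0]); close(to_parent[1]);
  return std::chrono::duration<double, std::micro>(end - start).count();
}
```

Even this bare-bones roundtrip typically costs on the order of microseconds to tens of microseconds; a real browser adds serialization, queueing, and scheduling jitter on top.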
An unintended consequence of this design is that all attempts to synchronize state between two processes are inevitably expensive: the synchronization must occur over a low-throughput, queue-based IPC subsystem, accompanied by resource-intensive and unpredictably timed context switches mediated by the operating system. To understand the scale of this effect, we looked at the latency of synchronous, cross-document DOM writes and window.postMessage() roundtrips, measured at constant intervals during a normal browsing session. For Chrome, we observed wildly divergent and unpredictable timings.

Closing words
Our findings paint an interesting picture: Google Chrome performs admirably in a range of commonly-tested scenarios, but at the same time, suffers from a significant number of fundamental design issues that undermine its performance for the applications that are starting to emerge today.
Some of these issues - such as the "infinite history" or the antiquated style of process isolation - may be driven by Google's business needs, rather than the well-being of the Internet as a whole. Until this situation changes, our only recourse is to test a lot - and always have a backup plan.