Author Topic: Blog: A closer look at the performance of Google Chrome  (Read 375 times)

Offline menotu

  • PCLinuxOS Tester
  • Super Villain
  • *******
  • Posts: 15316
  • ┌∩┐(◕_◕)┌∩┐
Blog: A closer look at the performance of Google Chrome
« on: February 27, 2013, 05:15:27 AM »
Posted by Alex Hastings  on 02/25/2013 - Aptiverse

One of our upcoming projects aims to deliver a rich, real-time experience directly in the browser. In the course of this work, our engineers bumped into a series of elusive performance issues that seemed to affect only the beta testers who were using Google Chrome.

At first, we suspected that this was our fault. But when we finally looked under the hood of seemingly the fastest browser on the market, we discovered a series of design decisions that may elude conventional benchmarks, but undoubtedly affect the performance of many real-world sites all across the web.

Chrome history and caching behavior is broken

In a majority of web browsers, the size of the browser history and document cache is capped in one way or another: for example, if you have not visited facebook.com for a couple of weeks, any record of this will eventually disappear down the memory hole.

This is not the case for Chrome: the browser keeps all cached information indefinitely. Perhaps this is driven by some hypothetical assumptions about browsing performance, or perhaps it is simply driven by the desire to collect more information to provide you with more relevant ads. Whatever the reason, the outcome is simple: over time, cache lookups get progressively more expensive. Some of this is unavoidable, and some may be made worse by a faulty hash map implementation in infinite_cache.cc.
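For contrast, here is a minimal Python sketch of the kind of size cap the post attributes to other browsers. It is only an illustration of least-recently-used eviction: the class name BoundedCache and its methods are hypothetical and are not taken from any browser's source.

```python
from collections import OrderedDict


class BoundedCache:
    """Least-recently-used cache that never holds more than `capacity` entries.

    Hypothetical illustration only; not modelled on any real browser cache.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest entry first, newest last

    def get(self, url):
        """Return the cached body for `url`, or None if it is not cached."""
        if url not in self._entries:
            return None
        self._entries.move_to_end(url)  # mark as most recently used
        return self._entries[url]

    def put(self, url, body):
        """Store `body` under `url`, evicting the oldest entries if over capacity."""
        self._entries[url] = body
        self._entries.move_to_end(url)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the least recently used entry

    def __len__(self):
        return len(self._entries)
```

With a cap in place, the number of entries (and so the cost of any per-entry bookkeeping) stays bounded no matter how long the browsing session lasts.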

To quantify the cache-driven slowdown in Chrome, we asked 20 volunteers to use their favorite browsers while simultaneously running a special piece of instrumentation that measured the latency between an attempt to initiate navigation and the initiation of network traffic.

For Chrome, the initial latency proved to be minimal, but rapidly degraded over the course of 30 days, just as the size of the cache increased without upper bound:



In comparison, the same measurement for Firefox showed slightly worse initial performance, but also a much less pronounced decline, levelling off in about two days:



Over time, every new resource loaded within Chrome will degrade the browser's performance. Because the cache is sharded by domain name, this effect is particularly pronounced for any web applications that make use of the history.pushState() API. The use of this particular API is precisely what caused problems for us.

WebKit layout engine suffers from recursion-related bottlenecks

According to one of the popular schools of software engineering, writing short and self-documenting methods leads to more readable and safer code. This philosophy is embraced by the developers working on WebKit: in fact, the code responsible for rendering a typical web page averages just 2.1 effective C++ statements per function in WebKit, compared to 6.3 for Firefox - and an estimated count of 7.1 for Internet Explorer.

Unfortunately, this coding strategy appears to have a distinct disadvantage in a browser: the extensive use of dynamic class inheritance prevents the compiler from efficiently inlining short functions as necessary, and imposes a significant performance impact due to nearly constant vtable lookups and the inherent overhead of pushing parameters to the stack and passing control to another location in the codebase.

To estimate the impact of this problem, we measured a cost-weighted ratio of vtable lookups and parameter pushes to the remaining code, as sampled when visiting some of the most popular destinations on the Internet:



For sites such as Facebook or Amazon, Chrome experiences an overhead peaking at 20% of the overall page rendering time. Interestingly, the overhead is much lower on Google's properties; we believe this is because of targeted browser optimizations that favor certain document layouts, such as deeply-nested DOM with relatively little branching.

Process isolation makes cross-page communications too slow

Chrome uses process isolation to ensure that a programming error on a particular website generally would not impact the stability of other web applications loaded in the browser. The mechanism is based on a relatively dated concept, superseded by compile-time code validation techniques such as Google's own Native Client; nevertheless, Google positions Native Client as a competitor to existing standards, rather than leveraging it to improve the stability and security of WebKit and V8.

An unintended consequence of this design is that all attempts to synchronize the state between two processes are inevitably very expensive. This is because the synchronization needs to occur over a low-throughput, queue-based IPC subsystem, accompanied by resource-intensive and unpredictably timed context switches mediated by the operating system. To understand the scale of this effect, we looked at the latency of synchronous, cross-document DOM writes and window.postMessage() roundtrips. The measurements were performed at constant intervals during a normal browsing session. For Chrome, we observed wildly divergent and unpredictable timings:
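As a rough analogy for this kind of round-trip measurement, the Python sketch below times echo messages sent to a separate process over OS pipes. It is not the authors' instrumentation and does not involve a browser or window.postMessage(); the names ECHO_CHILD and measure_roundtrips are hypothetical.

```python
import statistics
import subprocess
import sys
import time

# Source of the child process: echo every received line straight back.
ECHO_CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line)\n"
    "    sys.stdout.flush()\n"
)


def measure_roundtrips(samples=200):
    """Return a list of round-trip latencies, in seconds, one per echo message."""
    child = subprocess.Popen(
        [sys.executable, "-c", ECHO_CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered pipes on the parent side
    )
    latencies = []
    try:
        for i in range(samples):
            message = f"ping {i}\n"
            start = time.perf_counter()
            child.stdin.write(message)
            child.stdin.flush()
            reply = child.stdout.readline()  # blocks until the child answers
            latencies.append(time.perf_counter() - start)
            if reply != message:
                raise RuntimeError(f"unexpected reply: {reply!r}")
    finally:
        child.stdin.close()  # end of input lets the child exit its loop
        child.wait()
    return latencies


if __name__ == "__main__":
    timings = measure_roundtrips()
    print(f"median: {statistics.median(timings) * 1e6:.1f} us")
    print(f"max:    {max(timings) * 1e6:.1f} us")
```

Comparing the median with the maximum gives a crude view of how much individual round trips vary; the actual numbers depend entirely on the machine and operating system.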


Closing words

Our findings paint an interesting picture: Google Chrome performs admirably in a range of commonly-tested scenarios, but at the same time, suffers from a significant number of fundamental design issues that undermine its performance for the applications that are starting to emerge today.

Some of these issues - such as the "infinite history" or the antiquated style of process isolation - may be driven by Google's business needs, rather than the well-being of the Internet as a whole. Until this situation changes, our only recourse is to test a lot - and always have a backup plan.

Full blog
« Last Edit: February 27, 2013, 05:40:26 AM by menotu »
PCLinuxOS 32bit KDE 4.10.1; kernel-3.4.11-pclos1.bfs & 64bit 3.2.18bfs; NVidia GeForce 8400GS 1GB 310.19 driver

Sony Vaio SVE1513A4ESI Laptop, Intel Core i5, 2.6GHz, 6GB RAM, 750GB, 15.6" Intel HD Graphics 4000

Offline menotu

  • PCLinuxOS Tester
  • Super Villain
  • *******
  • Posts: 15316
  • ┌∩┐(◕_◕)┌∩┐
Re: Blog: A closer look at the performance of Google Chrome
« Reply #1 on: February 27, 2013, 05:46:04 AM »
From Aaron Seigo - Wednesday, 27 Feb 2013

process separation

When Google Chrome first came out sporting its process separation feature where each tab is in its own process, it was broadly hailed as the best thing ever. The idea was to increase stability and security.

This was during a time when Plasma Desktop was still facing a number of implementation hurdles that impacted stability. So a number of well-meaning people decided that I should be informed about this revolutionary new idea in Chrome and every component in Plasma Desktop should be put into its own process.

This was applying a solution from one problem space to another that just didn't map at all. For technical reasons it was not feasible and I offered alternatives. Still, there was a good year or so when pretty much every single day I would get at least one message from someone suggesting we redesign Plasma to be separated across different processes. Some were kinder than others, but after a while even the nicest of people sounded like fingernails on a chalkboard.

Since then we've moved on to the alternative approaches we identified well before Chrome was even available: interpreted rather than compiled code. Today we're walking with both feet into the brave new world of QML. As of Qt5 we have QML2 which brings with it scene graph based rendering that can be done in hardware where available and a runtime that makes writing beautiful interfaces that are also stable very easy. It has been a long road to this point and has involved countless dozens of bright people working hard.

In the meantime, how is that process separation in Chrome working out? Here's the conclusion of one analysis based on actually measuring things (from the Aptiverse post; also see above):

"The [process separation] mechanism is based on a relatively dated concept, superseded by compile-time code validation techniques such as Google's own Native Client [..]

An unintended consequence of this design is that all attempts to synchronize the state between two processes are inevitably very expensive. [..] It seems that the apparent improvements in the stability of Chrome come at a very significant cost; and that unless the architecture is revised substantially, it may hinder the development of tightly-integrated and highly responsive web apps."

I don't hold quite so dim a view of process separation used in the right places as one might come away with after reading that article, but it is a cautionary tale about believing in silver bullets. Different problems are different and very few design concepts carry only benefit with no cost.


http://aseigo.blogspot.co.uk/2013/02/process-separation.html