PostgreSQL La base de donnees la plus sophistiquee au monde.

PostgreSQL Use Case : "Le bon coin" website

Paris, le march 6, 2012,

Christophe Legendre, technical director, talks about using PostgreSQL for the site LeBonCoin.fr.

leboncoin.fr

For those unfamiliar with LeBonCoin.fr, according to Alexa Internet it is the most visited site for personal and professional classifieds in France. (see http://www.alexa.com/topsites/countries/FR for the complete list)

History

The site started in April 2006 and owes its original format to a Swedish group who started the project with blocket.se.

Because of its Swedish orgin, leboncoin.fr ran on MySQL but scaling and performance issues caused them to look into migrating to PostgreSQL.

Christophe Legendre came into the scene in 2008 when the version of PostgreSQL used in production was 8.1, hosted on a small HP server (G5) that had just eight discs for about 800,000 ads.

Since then, the site's growth has been maintained by upgrades to its hardware as well as to PostgreSQL. In just a four year span, everything has been multiplied by over a factor of 10! Not just in hardware but in ads. Today, the site hosts no fewer than 17.6 million classified ads.

The site is currently run on an HP Proliant DL980 with an HP 3PAR V800 bay. Just one machine, but what a machine! Judge for yourself:

  • DL980
    • 160 cores
    • 1 Terabyte of RAM
  • 3PAR V800
    • 192 discs (32 SSDs and 160 fiber-channels)
    • 400 Gigabytes of cache

There is an automatic tiering mechanism that swaps data to and from the faster parts of the bay (the SSDs) based on usage.

This bay regularly handles 80,000 I/O per second. That's pretty impressive but this machine is capable of 200,000 I/O per second. That's some comfortable margin right there.

The main database isn't even all that big, weighing in at 2.8 TB. That's certainly a respectable size, but a far cry from the record setters the PostgreSQL community has seen in the last few years.

However, the database is under some heavy constraints. One characteristic that seems surprising at first: the ratio of writes to reads on the database is 70/30. On would think a site for classified ads would have the opposite ratio. The reason for it is leboncoin.fr makes massive use of web caches (varnish) in order to read as seldom as possible from the database.

The writes are due to about 500,000 new ads published every day, and the several thousand others that get modified.

As for the infrastructure, it goes without saying that it's 100% Open Source: CentOS, Apache, Varnish, PostgreSQL… and Slony!

Why Slony?

Slony is a well known replication tool for PostgreSQL. It allows you to create Master/Slave PostgreSQL clusters. More precisely, it enables asynchronous and asymmetrical replication. Slony has had this capability for a long time, well before replication was integrated into PostgreSQL 9.0. One of its characteristics is that it can replicate between PostgreSQL servers of different versions, something that cannot be said about the integrated replication system.

That's the reason it was chosen at leboncoin.fr, it allows the site to upgrade PostgreSQL versions without ever stopping service.

To upgrade the master, just promote the slave to be the new master, take the old master offline and upgrade it, then repromote it to master once it has received all the data from the temporary master. Then do the same thing for the slave and its slaves.

That's another particularity of Slony: it allows so-called “cascading” replication, where a slave can have its own slaves. This style of cascading replication should be available in PostgreSQL 9.2.

Of course, with the advancements of PostgreSQL, Christophe Legendre isn't ruling out getting rid of Slony one day, but it isn't a priority for the moment. PostgreSQL 9.2 would have to have cascading replication, and then testing to ensure it's all viable. It most certainly will be, it's just a matter of time.

And so, thanks to Slony, leboncoin.fr uses the very latest minor version of PostgreSQL 8.4.

The site is redundant in two different datacenters, the first Slony slave being the failover if ever the master had the slightest hiccup. In fact, the entire infrastructure is duplicated at two remote sites.

These two sites are connected by two 10 GB fibers. The Internet transit is also handled by two 10 GB fibers. That's more than enough to handle the 3 GB of bandwidth currently used.

PostgreSQL Configuration

The PostgreSQL configuration on the master is also jaw-dropping, if you've ever modified postgresql.conf. Here are some highlights:

  • max_connections = 600
  • shared_buffers = 130 GB
  • effective_cache_size = 750 GB
  • work_mem = 24 MB
  • random_page_cost = 2
  • about 160 WAL files in rotation

To be sure, few PostgreSQL users can boast 750 GB of effective cache or 140 GB of shared buffers!

When asked about the ease of configuring PostgreSQL, Christophe Legendre doesn't mince words: “I challenge anyone to do easier than postgresql.conf!”.

Any problems with PostgreSQL?

On this question, Christophe Legendre was very clear. Yes, there was, once, a power problem at the main site. When the power came back on, PostgreSQL went into crash recovery mode and managed to put the database into a stable state with no apparent loss of data.

Christophe Legendre swears that “after that kind of event, one gains unlimited confidence in PostgreSQL!”.

He sees two major inconveniences with using PostgreSQL however.

Index Creation

Although PostgreSQL can create indexes without locking (see CREATE INDEX […] CONCURRENTLY […]), it still takes too long! Fortunately, it's not something that is done every day.

Partitioning

The biggest table has about 1.7 billion rows.

Obviously, partitioning was considered, but while reading the PostgreSQL documentation, Christophe Legendre made the same remark that many others do: it's way too complicated to set up, and maintenance is high. For example, PostgreSQL doesn't automatically create new child tables.

In the end, this table wasn't partitioned, but that doesn't pose a problem since it represents 180 GB of data and 350 GB of index. The platform can easily cope.

This is also a work in progress for the PostgreSQL developer community: we know partitioning needs to be reworked. Today, discussion is ongoing about the syntax.

Conclusion

Christophe Legendre talked more about the ease of use of PostgreSQL and the fact that he was able to become a skilled user all on his own. It's an undeniable advantage in his eyes: it's a simple motor with only one configuration file, and detailed documentation, translated into his native French. He uses the system views and tables (pg_stat*) to easily obtain all the statistics he needs on a daily basis.

In addition to the documentation, he reads the C source code, which allows him to know exactly what's going on when he wonders about some things in PostgreSQL.

There is no doubt about it, PostgreSQL is a crucial piece of the leboncoin.fr infrastructure, and time has show that it was an excellent choice.

The website's popularity is also proof of that, with 200 million hits and 4 million unique visitors every day. Few infrastructures can boast that kind of traffic.

 
en/temoignages/le_bon_coin.txt · Dernière modification : 2012/03/07 20:31 de vik