PostgreSQL at Skype

Friday, 04/04/2008 08:34

(This text is a modified version of a "lightning talk" I gave at the PostgreSQL Anniversary Summit.)

Source: https://developer.skype.com/SkypeGarage/DbProjects/SkypePostgresqlWhitepaper

Translator's note: Part of this article was translated to illustrate the statement that "Skype has been using PostgreSQL as the main database for most of its business needs", for a company with tens of millions of users worldwide that runs one of the best-known text, voice, and video chat services to date. It makes one point: free and open source software today works well not only in mission-critical applications such as those of the US military, US banks, the US stock market, the Japanese stock market, German aviation, and the French police, but also in world-leading, high-quality applications and services with an enormous number of users, a load that in many cases proprietary software may not be able to handle.

Skype has been using PostgreSQL as the main database for most of our business needs right from the start, and we have been quite successful in growing the databases as the company has grown, both in number of users and in the complexity of the services we offer.

We have grown our databases in ways that are probably familiar to many who have gone from a tiny startup to a large company.

First we had one database, then added another, functionally separate one, then a third, then split the second one into two parts by functionality, then partitioned both parts again, then replicated some of the read-mostly data out of one of the resulting partitions, and so on.


SLIDE 1: DB growth, logical (vertical?) partitioning

We hired some consultants, who evaluated our database setups and did not find any big faults with them, which was both good and bad.

Good for me, as it validated the DBA work I had done so far, but bad because there were no more easy tricks for getting orders-of-magnitude performance growth on our hardware platform (dual or quad Opterons with SCSI RAID).

After evaluating the then-current situation, we discussed future growth scenarios with each of the consultants. That is, we tried to lay out a path for growing our database capacity so that we could still function well when we have 1 billion users.

We chose 1 billion users as the target for our calculations just to get a nice round number; we hope there will be more.

Skype's P2P technology makes this easier on the servers than traditional client-server approaches, but we still saw the need for at least 100x growth, which no single computer, with however many CPUs, however much RAM, and however good a SAN, could provide, at least not using PostgreSQL. (And we were not looking at replacing PostgreSQL, not among the first options anyway.)

Fortunately, most of our data can be partitioned by username, so we decided to skip the cycle of buying bigger and bigger single servers and do the horizontal partitioning right away.
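To make "partitioned by username" concrete, here is a minimal Python sketch of hash-based horizontal partitioning. It is an illustration only: `partition_for` is a hypothetical helper, and `zlib.crc32` stands in for whatever database-side hash is actually used.

```python
import zlib


def partition_for(username: str, n_partitions: int) -> int:
    """Map a username to a partition index in [0, n_partitions)."""
    # zlib.crc32 is only a stand-in for a database-side hash function;
    # any stable hash works as long as every caller uses the same one.
    return zlib.crc32(username.encode("utf-8")) % n_partitions


# Every row belonging to a user lives on the partition chosen by the name,
# so a lookup by username touches exactly one server.
users = ["alice", "bob", "carol", "dave"]
placement = {name: partition_for(name, 4) for name in users}
```

Because the mapping depends only on the username and the partition count, any front end can compute it without consulting a central directory.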

(From this point on, the original text is left untranslated and is included for reference only.)

PL/Proxy

One really good decision we made at the very start was to do all DB access through functions, which has allowed me to do all kinds of performance tweaks behind the scenes without disturbing any frontend servers.

Doing it all through functions also showed us how we could do the horizontal partitioning in a way that would be easy to manage and still scalable (or vice versa ;D).

We wrote another embedded PL language, PL/Proxy, so that we can replace the original PL/pgSQL or PL/Python or any other PL function with a PL/Proxy function that has the same input parameters and the same return type, and the language handler will call the same function in the right partition.

So how does the language handler know which partition to call?

The info needed for choosing the right partition is the "source code" of the PL/Proxy function!

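CREATE FUNCTION pwd_check(text, text) RETURNS boolean
AS $$
BEGIN
    -- PERFORM runs the query, discards the rows, and sets FOUND
    PERFORM 1 FROM users WHERE name = $1 AND pwd = $2;
    IF FOUND THEN RETURN true;
    ELSE RETURN false;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- Syntax as presented in the talk; released PL/Proxy versions write the body
-- as CLUSTER 'userdb_cluster'; RUN ON hashtext($1);
CREATE FUNCTION pwd_check(text, text) RETURNS boolean
AS $$
    CLUSTER userdb_cluster;
    PARTITION BY hashtext($1);
$$ LANGUAGE plproxy;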

SLIDE2: - PL/PgSQL function --> PL/Proxy function
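The idea of a proxy function with the same signature that forwards the call to the right partition can be sketched in Python as follows. This is a toy model, not PL/Proxy itself: `Partition`, `Proxy`, and `add_user` are hypothetical names, and `zlib.crc32` stands in for `hashtext`.

```python
import zlib
from typing import Dict, List


class Partition:
    """Hypothetical stand-in for one partition database with a users table."""

    def __init__(self) -> None:
        self.users: Dict[str, str] = {}

    def pwd_check(self, name: str, pwd: str) -> bool:
        # Plays the role of the original function that queries local data.
        return self.users.get(name) == pwd


class Proxy:
    """Exposes the same pwd_check signature and forwards to one partition."""

    def __init__(self, partitions: List[Partition]) -> None:
        self.partitions = partitions

    def _route(self, key: str) -> Partition:
        # Stand-in for hashing the first argument, as in hashtext($1).
        h = zlib.crc32(key.encode("utf-8"))
        return self.partitions[h % len(self.partitions)]

    def add_user(self, name: str, pwd: str) -> None:
        self._route(name).users[name] = pwd

    def pwd_check(self, name: str, pwd: str) -> bool:
        # Same name, same arguments, same return type as the partition's
        # function; the caller cannot tell that a proxy sits in between.
        return self._route(name).pwd_check(name, pwd)


proxy = Proxy([Partition(), Partition()])
proxy.add_user("alice", "s3cret")
ok = proxy.pwd_check("alice", "s3cret")
```

The front end keeps calling `pwd_check(name, pwd)` exactly as before; only the function body behind that name changes.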

A setup for PL/Proxy using two partitions with WAL-shipping hot standbys looks like this. This setup guarantees that no data is lost when a single server crashes. To be fully redundant you would need two PL/Proxy servers with failover.


SLIDE3: DB instance running PL/Proxy

PL/Proxy servers can be configured to connect to all databases and thus form a uniform "DB-bus", which is scalable (if there are throughput problems at the proxy level, just add more proxies) and redundant (if one fails, just connect to another).

If any partition server fails, only data on that partition is unavailable during partition server failover.

If there are performance problems with partition servers, just add more partitions.


SLIDE4: scaling PL/Proxy clusters.
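One way to keep "just add more partitions" manageable is to keep the partition count a power of two and pick the partition by masking the low bits of the hash. The text does not say how repartitioning is done, so the sketch below is an assumption for illustration, and `partition_index` and `destinations_after_doubling` are hypothetical helpers. With this scheme, doubling the count sends each key either to its old partition or to exactly one new one.

```python
from typing import Tuple


def partition_index(hash_value: int, n_partitions: int) -> int:
    """Pick a partition by masking the low bits of the hash."""
    if n_partitions <= 0 or n_partitions & (n_partitions - 1):
        raise ValueError("n_partitions must be a power of two")
    return hash_value & (n_partitions - 1)


def destinations_after_doubling(old_index: int, old_count: int) -> Tuple[int, int]:
    """After doubling, a key from old_index lands on one of these two partitions."""
    return (old_index, old_index + old_count)


# Check the claim for a range of hash values when going from 4 to 8 partitions.
for h in range(10_000):
    old = partition_index(h, 4)
    new = partition_index(h, 8)
    assert new in destinations_after_doubling(old, 4)
```

In practice this means each old partition can be split on its own: copy it to a new server, then drop the rows that no longer belong on each copy.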

WHERE ARE WE NOW

PL/Proxy is running on production servers with up to 16 partitions

able to pass 1000-2000 requests/sec on PC-type dual Opteron servers, depending on the amount of stats and logging

only basic functionality is implemented

reconfiguring (adding more partitions) requires a restart

WHERE WOULD WE LIKE TO BE

add different types of proxy functions

functions that run the same function on all partitions

functions that merge results preserving order

reconfiguring without restart - the PL/Proxy handler should just block during the reconfigure and then re-run the blocked function

SLIDE5: - status of PL/Proxy
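The two wished-for proxy function types, running the same function on all partitions and merging results while preserving order, can be sketched in Python as below. This shows the idea only, not PL/Proxy code: `run_on_all` and `merge_ordered` are hypothetical helpers, and each partition is modelled as a plain list of names.

```python
import heapq
from typing import Callable, List


def run_on_all(partitions: List[List[str]],
               func: Callable[[List[str]], List[str]]) -> List[List[str]]:
    """Call the same function on every partition; keep one result per partition."""
    return [func(rows) for rows in partitions]


def merge_ordered(sorted_results: List[List[str]]) -> List[str]:
    """Merge per-partition results that are already sorted into one sorted list."""
    # heapq.merge only compares the heads of its inputs, so the inputs
    # never need to be concatenated and re-sorted as a whole.
    return list(heapq.merge(*sorted_results))


partitions = [["dave", "alice"], ["erin", "bob"], ["carol"]]
per_partition = run_on_all(partitions, sorted)
merged = merge_ordered(per_partition)
```

Each partition sorts only its own rows, and the proxy does a cheap streaming merge instead of a full sort.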

Skytools: PgQ and Londiste

PL/Proxy is our solution for making the OLTP part almost infinitely scalable, but just being able to do zillions of small transactions solves only part of the problem. You often need to get the data generated by the OLTP part copied or moved into other databases or external systems, be it an OLAP database, the presentation layer of some web application, or a system for printing invoices or sending emails.

Initially we were quite successful at using Slony-I for that, but as the complexity and loads grew, Slony-I started to cause us greater and greater pains. We thought about how to solve these issues and found out that we don't need the full power and complexity of Slony.

So we lifted some really cool ideas (and code) from Slony and developed a queueing and replication toolkit.

These are covered in ../SkyTools
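As a rough picture of why a queue helps here, the Python sketch below has the OLTP side append events once while each downstream system reads them in batches at its own pace. It is a generic illustration and not the PgQ API: `EventQueue` and its methods are hypothetical names.

```python
from typing import Any, Dict, List


class EventQueue:
    """Append-only event log with an independent read position per consumer."""

    def __init__(self) -> None:
        self.events: List[Any] = []
        self.positions: Dict[str, int] = {}

    def register(self, consumer: str) -> None:
        # A new consumer starts at the current end of the log.
        self.positions[consumer] = len(self.events)

    def publish(self, event: Any) -> None:
        self.events.append(event)

    def next_batch(self, consumer: str, max_events: int = 100) -> List[Any]:
        start = self.positions[consumer]
        return self.events[start:start + max_events]

    def finish_batch(self, consumer: str, n_events: int) -> None:
        # Advance only after the consumer has processed the batch, so a
        # crash in the middle leads to a re-read rather than a lost event.
        self.positions[consumer] += n_events


queue = EventQueue()
queue.register("olap")
queue.register("mailer")
queue.publish({"user": "alice", "op": "signup"})
queue.publish({"user": "bob", "op": "signup"})

batch = queue.next_batch("olap")
queue.finish_batch("olap", len(batch))
```

The producer never waits for consumers, and a slow consumer such as the mailer does not hold back a fast one such as the OLAP loader.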

Enjoy!

Translated by: Lê Trung Nghĩa

ltnghia@yahoo.com