Monday, April 11, 2011

MySQL Cluster API, the NDB API

As mentioned in a previous blog post, the programming API on the client side is a very important part of the performance of MySQL Cluster. Every API that is used to access the Data Server in MySQL Cluster goes through the NDB API. The NDB storage handler also uses the NDB API, which is what makes data residing in MySQL Cluster accessible from the MySQL APIs.

The good performance of the programming API rests on its ability to batch operations in various ways. In the early history of MySQL Cluster, the MySQL Storage Engine API had very few interfaces that could handle multiple records at a time. As we progressed, the Storage Engine API has gained several interfaces that can handle multiple records at a time. There is even development work, presented at the UC 2010, in which the Storage Engine API can push entire queries down to the storage engine, even join queries. This work has also been presented at a recent webinar with engineers.

The NDB API uses a model where one first defines the operation to issue towards the database. The calls that build an operation don't interact with the actual database. The actual message is sent to the data node only after the execute method has been called. The NDB API is designed to handle batching of operations at two levels. The first level is batching inside one thread: one can open several transactions in parallel within the same thread and execute them in parallel with one execute call. The second level is that several threads can work in parallel, and each of those threads can also be executing multiple transactions in parallel.
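The define-first, execute-later model can be illustrated with a toy sketch. This is not the NDB API: `ToyDataNode`, `ToyTransaction`, `define_read` and `execute` are hypothetical names invented for this illustration, and the "network" is just a counter. It only shows that defining operations sends nothing, and that one execute call carries all the defined operations in a single message.

```python
class ToyDataNode:
    """Hypothetical stand-in for a data node; counts the messages it receives."""

    def __init__(self, rows):
        self.rows = rows
        self.messages_received = 0

    def handle(self, keys):
        # One call here models one message arriving over the network.
        self.messages_received += 1
        return [self.rows.get(key) for key in keys]


class ToyTransaction:
    """Hypothetical transaction object that queues operations until execute()."""

    def __init__(self, node):
        self.node = node
        self.pending = []

    def define_read(self, key):
        # Only records the operation locally; nothing is sent yet.
        self.pending.append(key)

    def execute(self):
        # All operations defined so far travel in a single message.
        keys, self.pending = self.pending, []
        return self.node.handle(keys)


node = ToyDataNode({1: "a", 2: "b", 3: "c"})
trans = ToyTransaction(node)
for key in (1, 2, 3):
    trans.define_read(key)

messages_before_execute = node.messages_received  # still 0 at this point
results = trans.execute()                         # one message for three reads
```

After `execute()` the toy data node has received exactly one message even though three reads were defined, which is the effect that batching aims for.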

So the possibilities for parallelism using the NDB API are tremendous. Much of the cost of accessing a database is paid in the networking, so by using parallel transactions inside a thread (called the Asynchronous NDB API) and by using the multithreaded capabilities of the NDB API, it is possible to decrease the networking cost greatly by making TCP/IP packets larger. The cost of sending a TCP/IP packet is mostly Fixed_cost + #Bytes * Byte_cost. In the past the fixed cost was about the same as the cost of sending 60 bytes. This extra cost of small messages has to be paid both in the server part and in the client part. Thus it pays off very well to send larger messages. When the message size approaches 1 kByte, the extra cost is down to about 6-7% compared to infinite-sized messages, whereas a 200 byte message carries about 30% extra cost.
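The arithmetic behind those percentages can be checked with a small sketch. It assumes the fixed cost equals the cost of sending 60 bytes, as stated above, and measures everything in "byte costs"; the function names and the example of ten 100-byte operations are made up for illustration.

```python
FIXED_COST_BYTES = 60  # fixed cost per packet, expressed as equivalent bytes


def packet_cost(num_bytes, fixed_cost_bytes=FIXED_COST_BYTES):
    """Cost of one packet in units of Byte_cost: Fixed_cost + #Bytes * Byte_cost."""
    return fixed_cost_bytes + num_bytes


def extra_cost_ratio(num_bytes, fixed_cost_bytes=FIXED_COST_BYTES):
    """Extra cost relative to an infinite-sized message, where only bytes cost."""
    return fixed_cost_bytes / num_bytes


for size in (200, 1000):
    print(f"{size} byte message: {extra_cost_ratio(size):.0%} extra cost")

# Hypothetical workload: ten operations of 100 bytes each.
separate = 10 * packet_cost(100)   # ten small packets
batched = packet_cost(10 * 100)    # one packet carrying all ten
print(f"separate: {separate}, batched: {batched}")
```

With these assumptions a 200 byte message pays 60/200 = 30% extra and a 1000 byte message pays 60/1000 = 6% extra, in line with the figures above. Sending the ten operations separately costs 1600 units, against 1060 when they share one packet.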

An additional benefit of batching is that there will be fewer context switches, since several messages can be handled together without a context switch between them.

You can learn more about performance optimization of your own applications by reading this whitepaper.
