Thursday, October 20, 2016

MySQL Cluster 7.5 is GA, best cluster release ever

I have been fairly quiet on my blog for some time. We've been very busy
developing new features for MySQL Cluster 7.5 and ensuring that the
quality is improved even further.

We're now very pleased to release a new version of MySQL Cluster.

MySQL Cluster 7.5 contains a number of new features that make MySQL
Cluster even better.
1) You can declare a table as a READ_BACKUP table. This means that
updating transactions receive the commit acknowledgement a little
bit later, which ensures that any of the replicas can always be used
for reading. We use the nearest replica for committed reads. For
locking reads we still use the primary replica to avoid deadlocks.

For applications that are mostly read-focused, you can make this the
default by setting the ndb-read-backup config variable to 1 in the
MySQL Server configuration. All tables created from this server will
then have the READ_BACKUP flag set.
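
As a rough illustration of the per-table form, the sketch below builds
a CREATE TABLE statement that sets the flag through the table comment.
The NDB_TABLE=READ_BACKUP=1 comment syntax is the one described in the
MySQL Cluster 7.5 reference manual. The helper name, the table name and
the columns are made up for this example, and the statement is only
built as a string, not sent to a server.

```python
def read_backup_ddl(table, columns):
    """Build a CREATE TABLE statement for an NDB table with READ_BACKUP set.

    `table` and `columns` are illustrative inputs; identifiers are not
    escaped, so this is not meant for untrusted input.
    """
    column_list = ", ".join(columns)
    return (
        f"CREATE TABLE {table} ({column_list}) "
        'ENGINE=NDB COMMENT="NDB_TABLE=READ_BACKUP=1"'
    )


# Hypothetical table used only for illustration.
print(read_backup_ddl("t1", ["id INT PRIMARY KEY", "val VARCHAR(32)"]))
```

With ndb-read-backup set to 1 in the MySQL Server the comment should
not be needed, since the flag is then the default for new tables.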

2) You can declare a table as a FULLY_REPLICATED table. This means
that the table will get as many replicas as there are nodes in the
cluster.

Fully replicated tables (global tables) provide a very important way
to scale your applications using MySQL Cluster. Many applications have
small tables that are mainly read. These tables can now be fully
replicated and thus be read from any data node in the cluster.

Another potential use of fully replicated tables is to use MySQL Cluster
for read scalability. You can start out with a 2-node cluster where both
nodes are able to read the data. If you then need to scale to 4 nodes,
you add 2 more nodes to the cluster and create one more node group
for the 2 new nodes. Then you can reorganise the tables such that they
are fully replicated on all 4 nodes. You can continue in this manner
all the way until you have 48 nodes. All through this process any
MySQL Server can be used to both read and write the data in the cluster.

If the application is an SQL application, it is a good idea to place
the MySQL Server on the same computer as a data node. In this way
MySQL Cluster gives you a very easy way of scaling up your
application for reads.

A third way to use fully replicated tables is to place hot-spot rows
that are often read and seldom updated into fully replicated tables.

It is possible to set a config variable ndb-fully-replicated to 1 to
ensure that all tables are created as fully replicated. This should
obviously only be used for read scaling applications.

3) We have made it easier to vary the number of partitions in a table.
Normally we will have 2 partitions per LDM thread per node group. So
this means in a 4 node cluster with 2 node groups that have 4 LDMs per
node, there would be 16 partitions in a normal table.
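
The arithmetic behind that number can be written out as a small
sketch. The function and parameter names are made up for this example
and it only restates the rule above, it is not how the data nodes
compute it internally.

```python
def default_partitions(node_groups, ldm_threads_per_node,
                       partitions_per_ldm_per_node_group=2):
    """Partitions in a normal table: 2 per LDM thread per node group."""
    return (partitions_per_ldm_per_node_group
            * ldm_threads_per_node
            * node_groups)


# 4 data nodes in 2 node groups, 4 LDM threads per node.
print(default_partitions(node_groups=2, ldm_threads_per_node=4))  # prints 16
```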

Now it is possible to choose fewer partitions while keeping the number
linked to the number of node groups in the cluster. For example, one
can define a table to have 1 partition per node group. This means that
the command ALTER TABLE REORGANIZE will add more partitions if node
groups have been added since the table was last altered, while keeping
the number at, in this case, 1 partition per node group. We support
seven different variants here.

The idea with this is to be able to decrease the number of partitions
for smaller tables while still using the automated partitioning
features of MySQL Cluster.
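
To get a feeling for the difference, the sketch below compares a
normal table with a small table that uses 1 partition per node group
as node groups are added. The 4 LDM threads per node and the helper
name are assumptions for the example. Only the two rules stated in
the text are modelled, not the other variants.

```python
LDM_THREADS_PER_NODE = 4  # assumed, as in the 16-partition example


def partition_counts(node_groups):
    """Return (normal table, 1-per-node-group table) partition counts."""
    normal = 2 * LDM_THREADS_PER_NODE * node_groups
    reduced = 1 * node_groups
    return normal, reduced


# Each row is the state after a reorganise at that cluster size.
for node_groups in (1, 2, 4):
    normal, reduced = partition_counts(node_groups)
    print(f"{node_groups} node groups: normal={normal}, reduced={reduced}")
```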

4) In earlier versions of MySQL Cluster we introduced the possibility to
use special send threads to do all the sending. In 7.5 the send threads
have been redesigned such that they now share the burden of sending with
all the other block threads in the NDB data nodes. The block threads will
assist when they are not very loaded. So the send thread assistance is
adaptive based on the CPU usage in the threads.

In order to develop this feature the NDB data nodes now have a very good
view of how much CPU each thread is using. This information is used
for internal algorithms, but it is also available as new NDBINFO tables.

5) Earlier versions of MySQL Cluster had a scaling issue when executing
a very large number of scans in parallel. In MySQL Cluster 7.5 we have
effectively removed this limitation. It is now possible to handle many
millions of scans per second for each table on each data node, so the
main limitation is now the CPU available for processing in general.

6) We have also continued working on improving our testing to ensure that
we always deliver better quality with new releases.

7) We have also worked on improving the scalability of the NDB API to
ensure that we deliver high performance even in situations with hundreds
or even thousands of concurrent threads working against the cluster.

8) Last but not least MySQL Cluster 7.5 is based on MySQL 5.7. So this means
that we get a lot of new optimizer features from the MySQL Server, we get
improved scalability of the MySQL Server parts and a lot more.

My personal favorites among these features are definitely the fully
replicated tables and read backup tables. They open up a new category
of application optimisations in a scalable system.
