Tuesday, November 19, 2013

MySQL Cluster run-time environment: Part 2: How to configure

The first choice in selecting the run-time environment is the choice of binary. The original NDB run-time environment was single-threaded: everything except the IO threads executed in a single thread. This environment still exists and is selected by running with the binary ndbd. The new multithreaded environment is selected by running with the binary ndbmtd. ndbmtd can clearly scale far beyond ndbd, but ndbd can still have advantages in environments that have a low load and need to be optimised for latency. Since ndbd performs socket receive, signal execution and socket send in the same thread, it can deliver shorter latency at the cost of scalability. Most of the rest of this description discusses ndbmtd configuration. The ThreadConfig variable doesn't apply to running with ndbd since ndbd uses a hard-coded thread configuration consisting of 1 main thread.

When configuring the MySQL Cluster run-time environment the most important variable is ThreadConfig. One can alternatively use MaxNoOfExecutionThreads together with LockExecuteThreadToCPU, as we will show below. There is also LockMaintThreadsToCPU that can be used to bind IO threads to a specific CPU, but it is now recommended to lock io threads using ThreadConfig instead, since it has more options on how to do this. RealtimeScheduler and SchedulerSpinTimer can be used to set real-time scheduling and spin time on threads. If ThreadConfig is also used then these variables only provide default settings that can be overridden by the ThreadConfig setting. SchedulerExecutionTimer is not applicable when running with ndbmtd; it is only applicable when running with ndbd.
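To illustrate how the global defaults interact with per-thread overrides, here is a minimal config.ini sketch; the values are illustrative assumptions on my part, not recommendations:

[ndbd default]
RealtimeScheduler=1     # default: threads use real-time scheduling
SchedulerSpinTimer=50   # default spin time in microseconds
# A per-thread setting inside ThreadConfig, e.g. tc={count=1,realtime=0},
# overrides these defaults for that thread type.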

LockPagesInMemory is part of the MySQL Cluster run-time environment and can have a heavy impact on response times. Its meaning hasn't changed and its setting is independent of the rest of the settings discussed in this blog.

NoOfFragmentLogParts is an important variable that should be set equal to the number of ldm threads when the number of ldm threads is larger than 4; it cannot be set to anything smaller than 4. One can set NoOfFragmentLogParts larger than the number of ldm threads, but there is no advantage of this that I can think of. It can be set to the following values: 4, 6, 8, 12, 16, 24, 32.
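As a minimal config.ini sketch of these two standalone settings (the values are illustrative assumptions, not recommendations):

[ndbd default]
LockPagesInMemory=1       # lock the data node's memory to prevent swapping
NoOfFragmentLogParts=4    # must be >= the number of ldm threads, minimum 4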

The types of threads we have are ldm, tc, main, rep, send, recv, io and wd threads. These were discussed in part 1 of this blog. main and rep have a fixed thread count of one each. The wd threads are always three with specific roles, although from a configuration point of view the number of wd threads is one. ldm, tc, send and recv threads have a configurable number of threads. There can be 1, 2, 4, 6, 8, 12, 16, 24 or 32 ldm threads, anywhere between 1 and 32 tc threads, anywhere between 1 and 16 send threads and likewise between 1 and 16 recv threads. The number of ldm, tc, send, recv, main, rep and wd threads is fixed once the configuration is given; when the ndbmtd process is started these threads are started and not stopped until the process is stopped. The io threads are different: they are handled by a dynamic pool of threads.

The io thread pool is controlled by two variables: InitialNoOfOpenFiles gives the initial number of io threads (one thread handles one open file), and MaxNoOfOpenFiles specifies the maximum number of io threads that will be created. DiskIoThreadPool can be used to set the number of parallel accesses to disk data files; it doesn't affect the number of io threads, it only means that more than one thread at a time can access a disk data file.
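As a concrete sketch of the io pool settings (the values are illustrative assumptions on my part):

[ndbd default]
InitialNoOfOpenFiles=27   # initial number of io threads (one per open file)
MaxNoOfOpenFiles=40       # maximum number of io threads that can be created
DiskIoThreadPool=2        # parallel accesses to each disk data file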

The preferred manner of configuring most of this environment is by using the ThreadConfig variable. This variable accepts a string and the string is parsed. I will start by giving an example.

ThreadConfig="ldm={count=4,cpubind=0-3,spintime=50},tc={count=1,cpubind=4,realtime=1},main={count=1},rep={count=1},send={count=1,cpubind=5},recv={count=1,cpubind=6},io={cpubind=7},wd={cpubind=7}"

In this example we see that for each thread type we can specify count, the number of threads of this type. count must be 1 for main and rep, and also for io and wd. For io and wd threads the count is actually ignored, since io threads are handled with a dynamic pool and the wd threads are 3 specific threads (the WatchDog, SocketServer and SocketClient threads).

cpubind can be given for all thread types. cpubind is a comma-separated list of CPUs where one can also use '-' to indicate a range of CPUs, so e.g. 0-3 is equivalent to 0,1,2,3. Threads are assigned to CPUs in order, so for the ldm's in the above config the first ldm thread will be bound to CPU 0, the second ldm thread to CPU 1, the next to CPU 2 and the last ldm thread to CPU 3. The CPU numbering here is the ordering imposed by the OS. On Linux one can get details on this by running 'cat /proc/cpuinfo'; this provides exact information about the CPU socket the CPU belongs to, the CPU core it belongs to and which of the hyperthreads it runs on, and it also lists a bunch of other things. One must have at least one CPU per thread assigned here, otherwise an error will occur. Having more CPUs specified is ok, but only the first ones provided will actually be used. cpubind always means that the thread can only execute on the specified CPU.

Setting realtime=1 means that the thread will use the real-time scheduler of the OS. To protect the OS from starvation we yield in this case if we execute for too long with real-time priority, allowing other threads to get some work done as well. As can be seen from the example we can use the real-time setting on a subset of the threads. The global configuration variable RealtimeScheduler is still used and sets the default value for a thread, so if this is set to 1, then we need to set realtime=0 on a thread to ensure we don't get real-time scheduling on it. Use of the real-time scheduler is mainly intended to lower the variance of the latency. Using it isn't likely to improve performance except possibly in a highly loaded system where other processes don't require real-time scheduling. Given that the OS expects real-time threads to execute only for short stretches with high requirements on response time, it is not recommended to mix this configuration option with spintime on the same thread.

Setting spintime=50 means that we will spend at least 50 microseconds executing signals and waiting for more signals before putting the thread to sleep again. This parameter is also mainly intended to improve latency: by keeping the threads ready to go, we can react faster to a query. The spintime cannot be set higher than 500 microseconds. Setting spintime consumes considerably more CPU resources and provides only a tiny bit of extra throughput.

In the latest version of MySQL Cluster 7.x we also introduced the ability to bind threads to a set of CPUs. Each CPU can only belong to one CPU set. On Solaris a cpu set is exclusive to the data node's usage; on Linux it only sets the whereabouts of the MySQL Cluster data node threads, and other threads can still be scheduled on the same CPUs.

As an example we could use the following config:
ThreadConfig="ldm={count=4,cpuset=0-3},tc={count=1,cpuset=4-7},main={count=1,cpuset=4-7},rep={count=1,cpuset=4-7},send={count=1,cpuset=4-7},recv={count=1,cpuset=4-7},io={cpuset=4-7},wd={cpuset=4-7}"

In this example we have the same number of threads as in the previous example, but we have divided the threads into two CPU sets. The first CPU set covers CPUs 0-3 and is used only by the 4 ldm threads. The other CPU set covers CPUs 4-7 and takes care of the remaining threads in the data node. In this manner we can arbitrarily configure scheduling domains. cpuset has the advantage that the OS gets some liberty to do dynamic scheduling; the default config with no CPU sets or CPU bindings means that we have one CPU set consisting of all CPUs. By using cpuset and cpubind together we can mix usage of the OS scheduler and our own fixed scheduler, as shown below. Usually we want the threads with the most load to have their own CPUs; for other threads with more variable load it makes sense to use cpusets so that the OS can dynamically schedule the threads depending on the current load of the system.
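As a sketch of mixing the two (my own example, not from the manual): bind the heavily loaded ldm threads with cpubind and leave the remaining threads in a shared cpuset:

ThreadConfig="ldm={count=4,cpubind=0-3},tc={count=1,cpuset=4-7},main={count=1,cpuset=4-7},rep={count=1,cpuset=4-7},send={count=1,cpuset=4-7},recv={count=1,cpuset=4-7},io={cpuset=4-7},wd={cpuset=4-7}"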

The thread types that have a variable number of threads are currently ldm, tc, send and recv. In 7.3.3 we have extended the support of threads such that we can have 32 ldm threads, 32 tc threads, 16 send threads and 16 recv threads. tc, send and recv can have an arbitrary number within those limits, and it also works to change the ThreadConfig variable, at least after an initial node restart. However the number of ldm threads has more restrictions: first of all it cannot be changed other than by starting up a completely new cluster, and we only allow for the following numbers of ldm threads: 1, 2, 4, 6, 8, 12, 16, 24 and 32. A very important part of the config which is easy to forget is that if we increase the number of ldm threads beyond 4, then we also need to set NoOfFragmentLogParts to at least the number of ldm threads. Normally one would simply set this variable to the same value as the number of ldm threads, as in the sketch below.
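For example, a hedged sketch of an 8-ldm configuration with the matching NoOfFragmentLogParts (the thread counts and CPU numbers are illustrative assumptions):

NoOfFragmentLogParts=8
ThreadConfig="ldm={count=8,cpubind=0-7},tc={count=4,cpubind=8-11},main={count=1},rep={count=1},send={count=2,cpubind=12-13},recv={count=2,cpubind=14-15}"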

A final note on the usage of ThreadConfig is that one can even divide one thread type into several groups, so one could write ldm={cpubind=0},ldm={cpubind=1},ldm={cpubind=2},ldm={cpubind=3} as equivalent to ldm={count=4,cpubind=0-3}.
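The point of splitting a type into groups is that each group can carry its own settings. A hypothetical sketch (my own, following the group syntax above) where only the first two ldm threads spin:

ThreadConfig="ldm={cpubind=0,spintime=50},ldm={cpubind=1,spintime=50},ldm={cpubind=2},ldm={cpubind=3},tc={count=1,cpubind=4},main={count=1},rep={count=1},send={count=1,cpubind=5},recv={count=1,cpubind=6}"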

If one wants to avoid discovering the best run-time configuration oneself, one can still use the MaxNoOfExecutionThreads variable together with LockExecuteThreadToCPU to get a simpler, but less flexible, way to configure the run-time environment. The number of ldm, tc, send and recv threads is in this case derived from MaxNoOfExecutionThreads through a lookup in a table which is found in the code (there is a program to generate this table).

Here is the table (it is found in mt_thr_config.cpp, both the table and the program generating it). Each entry in the table has five fields: the first one is the value of MaxNoOfExecutionThreads, and the next four are the number of ldm threads, number of tc threads, number of send threads and finally number of recv threads.

  static const struct entry
  {
    Uint32 M;    // value of MaxNoOfExecutionThreads
    Uint32 lqh;  // number of ldm threads
    Uint32 tc;   // number of tc threads
    Uint32 send; // number of send threads
    Uint32 recv; // number of recv threads
  } table[] = {
    { 9, 4, 2, 0, 1 },
    { 10, 4, 2, 1, 1 },
    { 11, 4, 3, 1, 1 },
    { 12, 6, 2, 1, 1 },
    { 13, 6, 3, 1, 1 },
    { 14, 6, 3, 1, 2 },
    { 15, 6, 3, 2, 2 },
    { 16, 8, 3, 1, 2 },
    { 17, 8, 4, 1, 2 },
    { 18, 8, 4, 2, 2 },
    { 19, 8, 5, 2, 2 },
    { 20, 8, 5, 2, 3 },
    { 21, 8, 5, 3, 3 },
    { 22, 8, 6, 3, 3 },
    { 23, 8, 7, 3, 3 },
    { 24, 12, 5, 2, 3 },
    { 25, 12, 6, 2, 3 },
    { 26, 12, 6, 3, 3 },
    { 27, 12, 7, 3, 3 },
    { 28, 12, 7, 3, 4 },
    { 29, 12, 8, 3, 4 },
    { 30, 12, 8, 4, 4 },
    { 31, 12, 9, 4, 4 },
    { 32, 16, 8, 3, 3 },
    { 33, 16, 8, 3, 4 },
    { 34, 16, 8, 4, 4 },
    { 35, 16, 9, 4, 4 },
    { 36, 16, 10, 4, 4 },
    { 37, 16, 10, 4, 5 },
    { 38, 16, 11, 4, 5 },
    { 39, 16, 11, 5, 5 },
    { 40, 16, 12, 5, 5 },
    { 41, 16, 12, 5, 6 },
    { 42, 16, 13, 5, 6 },
    { 43, 16, 13, 6, 6 },
    { 44, 16, 14, 6, 6 },
    { 45, 16, 14, 6, 7 },
    { 46, 16, 15, 6, 7 },
    { 47, 16, 15, 7, 7 },
    { 48, 24, 12, 5, 5 },
    { 49, 24, 12, 5, 6 },
    { 50, 24, 13, 5, 6 },
    { 51, 24, 13, 6, 6 },
    { 52, 24, 14, 6, 6 },
    { 53, 24, 14, 6, 7 },
    { 54, 24, 15, 6, 7 },
    { 55, 24, 15, 7, 7 },
    { 56, 24, 16, 7, 7 },
    { 57, 24, 16, 7, 8 },
    { 58, 24, 17, 7, 8 },
    { 59, 24, 17, 8, 8 },
    { 60, 24, 18, 8, 8 },
    { 61, 24, 18, 8, 9 },
    { 62, 24, 19, 8, 9 },
    { 63, 24, 19, 9, 9 },
    { 64, 32, 16, 7, 7 },
    { 65, 32, 16, 7, 8 },
    { 66, 32, 17, 7, 8 },
    { 67, 32, 17, 8, 8 },
    { 68, 32, 18, 8, 8 },
    { 69, 32, 18, 8, 9 },
    { 70, 32, 19, 8, 9 },
    { 71, 32, 20, 8, 9 },
    { 72, 32, 20, 8, 10}
  };
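Reading the table: setting MaxNoOfExecutionThreads=16 selects the row { 16, 8, 3, 1, 2 }, i.e. 8 ldm, 3 tc, 1 send and 2 recv threads. A minimal config.ini sketch of this simpler approach (the CPU list is an illustrative assumption on my part):

MaxNoOfExecutionThreads=16
NoOfFragmentLogParts=8    # the 8 ldm threads require at least 8 log parts
LockExecuteThreadToCPU=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15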

3 comments:

Cuong Doan said...

Thank you for this blog and also thank you for being a good Christian.

Cuong Doan said...

Hello Mikael,

Thank you for this blog.

I am trying the example threadconfig you posted:
ThreadConfig="ldm={count=4,cpuset=0-3},tc={count=1,cpuset=4-7},main={count=1,cpuset=4-7},rep={count=1,cpuset=4-7},send={count=1,cpuset=4-7},recv={count=1,cpuset=4-7},io={cpuset=4-7},wd={cpuset=4-7}"

When I started ndb_mgmd the following errors were generated:
at line 128: IO threads explicitly bound, but IDX_BLD threads not. Binding IDX_BLD to 0,1,2,3,4,5,6,7.

I then started the ndb_mgmd with an earlier working config.ini; one management node started correctly and the second management node seems to start correctly, but it doesn't. When I do a ps -ef | grep ndb_mgmd I don't see a process.

I hope you can explain what the errors mean and how to resolve the issue I encountered. I am trying to erase all RPMs and install again.

Thanks in advance.

Best Regards,

Cuong Doan Nguyen





Mikael Ronstrom said...

The information on IDX_BLD is not an error, it is an informational message. The problem with the non-starting ndb_mgmd is harder to figure out without looking at the logs.