Thursday, February 16, 2006

How to define a table that uses disk data in MySQL Cluster

Before creating a table that stores non-indexed fields on
disk in MySQL Cluster, it is necessary to create a LOGFILE
GROUP and a TABLESPACE.

CREATE LOGFILE GROUP lg1
ADD UNDOFILE 'undofile.dat'
INITIAL_SIZE 16M
UNDO_BUFFER_SIZE = 1M
ENGINE = NDB;

This creates a file on each node for storing UNDO log records.

CREATE TABLESPACE ts1
ADD DATAFILE 'datafile.dat'
USE LOGFILE GROUP lg1
INITIAL_SIZE 12M
ENGINE NDB;

This creates a file on each node for storing the disk data parts
of disk-based tables. All tables in the TABLESPACE are connected
to the LOGFILE GROUP previously defined.
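
One way to sanity-check that the files were created is to query the
INFORMATION_SCHEMA.FILES table, assuming it is populated for NDB disk
data files in your 5.1 build:

SELECT FILE_NAME, FILE_TYPE, LOGFILE_GROUP_NAME, FREE_EXTENTS
FROM INFORMATION_SCHEMA.FILES
WHERE ENGINE = 'ndbcluster';

This should list both the undo file and the data file defined above.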

Now we are ready to define the actual disk-based table.

CREATE TABLE t1 (a INT, b INT, c INT, d INT, e INT,
PRIMARY KEY(a), INDEX(a,b))
TABLESPACE ts1 STORAGE DISK
ENGINE=NDB;

This defines a table where a and b will be stored in memory since they
are part of indexes. c, d and e will be stored on disk since they are
not part of any index.
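
If a non-indexed column should nevertheless stay in memory, a
column-level STORAGE clause can be used. A sketch, assuming the server
version supports STORAGE on individual columns of NDB tables (table
name t2 is made up for illustration):

CREATE TABLE t2 (a INT,
b INT,
c INT STORAGE MEMORY,
PRIMARY KEY(a))
TABLESPACE ts1 STORAGE DISK
ENGINE=NDB;

Here b goes to disk as before, while c is kept in memory even though it
is not indexed.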

To add a new file to the tablespace later, one does:
ALTER TABLESPACE ts1
ADD DATAFILE 'datafile02.dat'
INITIAL_SIZE 4M
ENGINE=NDB;

and similarly for the LOGFILE GROUP:
ALTER LOGFILE GROUP lg1
ADD UNDOFILE 'undofile02.dat'
INITIAL_SIZE = 4M
ENGINE=NDB;

Tablespaces and logfile groups can be dropped as well, but only when
all objects in them have been dropped. So to drop everything after the
above changes one does:

DROP TABLE t1;
ALTER TABLESPACE ts1
DROP DATAFILE 'datafile.dat'
ENGINE=NDB;
ALTER TABLESPACE ts1
DROP DATAFILE 'datafile02.dat'
ENGINE=NDB;
DROP TABLESPACE ts1
ENGINE=NDB;
DROP LOGFILE GROUP lg1
ENGINE=NDB;

Tuesday, February 14, 2006

5.1.6 release crammed with goodies

The new 5.1.6 release comes crammed with goodies for MySQL Cluster:

1) It is now possible to replicate between clusters.
2) Non-indexed attributes can now be stored on disk.
3) Disk-stored attributes use a new page cache; data is stored in
tablespaces and UNDO log records for disk-stored attributes are
stored in LOGFILE GROUPs.
4) The default partitioning scheme has been changed to even distribution;
the previous linear hashing scheme can still be used through
PARTITION BY LINEAR KEY().
5) Automatic partitioning can still be used, but NDB tables can also be
partitioned by the user. All the partitioning defined in the MySQL manual can
also be used to partition MySQL Cluster tables.
6) With user-defined partitioning it is possible to use heterogeneous
machines by specifying NODEGROUP when defining a partition. This defines
in which node group a partition is to be stored.
7) ndb_restore can handle restoring backups created in one cluster with
user-defined partitioning and later restored in a cluster with a
different set of node groups.
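
A sketch of point 6, assuming a cluster that has node groups 0 and 1
(the table name is made up for illustration):

CREATE TABLE t_ng (a INT PRIMARY KEY)
ENGINE=NDB
PARTITION BY KEY(a)
(PARTITION p0 NODEGROUP 0,
PARTITION p1 NODEGROUP 1);

Each partition is then stored in the node group named in its definition,
so larger node groups can be given more partitions.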

Friday, February 10, 2006

Cluster disk data presentation at VLDB 2005

At VLDB 2005 in Trondheim, Norway I gave a presentation that explained
the architecture of the disk data implementation in MySQL Cluster.
It also explained the thinking behind the new node recovery
algorithm introduced as part of this work.

Presentation:

http://www.vldb2005.org/program/slides/wed/s1108-ronstrom.pdf


Paper:

http://www.vldb2005.org/program/paper/wed/p1108-ronstrom.pdf

Partition Pruning uncovered

The partitioning development for 5.1 is close to completion (apart from the
stabilisation phase). An important part of this development is the ability to
prune away partitions not used in a query. Here follow some guidelines on how
the optimiser handles this at the moment.

For most partition functions the optimiser can only handle equality
conditions, thus:
where partition_field = 2
or
where partition_field IN (2,3,4)

In this case the optimiser can prune independently of what the partition
function is. It does so by applying the partition function to each
equality value, thus:
partition_function(2) = partition 3
partition_function(3) = partition 5
partition_function(4) = partition 3

It then sets a bit in the partition_used bitmap for each partition found,
to indicate to the executor which partitions to scan.

So this will work for all partition functions.
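
A sketch of equality pruning on a plain HASH-partitioned table; the
table name is made up, and EXPLAIN PARTITIONS (available in 5.1) shows
which partitions the optimiser kept:

CREATE TABLE t_hash (a INT)
PARTITION BY HASH(a) PARTITIONS 4;

EXPLAIN PARTITIONS
SELECT * FROM t_hash WHERE a IN (2,3,4);

The partitions column of the EXPLAIN output should list only the
partitions that the three values hash to.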

Range optimisations will work for the following partition functions when
using RANGE or LIST partitioning:

PARTITION BY X(f(partition_field)) where f(partition_field) is:
f(partition_field) = partition_field
f(partition_field) = YEAR(partition_field)
f(partition_field) = TO_DAYS(partition_field)
(if partition_field is of a date type, or convertible to one, in the
last two examples)

Range optimisations will also work for all partition functions and all
partitioning types in the following case: when the range is either a list
of values (1,3,5) or a short range, e.g. BETWEEN 3 AND 5, which is simply
converted to the list (3,4,5).

For RANGE partitions the actual range of partitions is found by applying
the partition function to the endpoints:
partition_function(start_range_value) = partition 3
partition_function(end_range_value) = partition 5
=> scan partitions 3, 4 and 5

For LIST partitions it is done slightly differently:
partition_function(start_range_value) = x
partition_function(end_range_value) = y
(the partition functions used here must be increasing or decreasing)
Then all list values from x to y are scanned (they are kept in a sorted
list to enable quick lookup of a partition given a value) and for each
list value found the corresponding partition is marked as used.

For the range types that work with all partition types, the partition
function is evaluated for each value in the interval and the
corresponding partition is marked for use.

These optimisations also apply to subpartitions: conditions on only
subpartition fields give pruning of subpartitions only, and vice versa
for partitions, with pruning on both when there are conditions on both
at the same time.

So to use an example

CREATE TABLE t1 (a date)
PARTITION BY RANGE(YEAR(a))
(PARTITION p0 VALUES LESS THAN (1995), ...
PARTITION p9 VALUES LESS THAN (2004));

SELECT * FROM t1 WHERE a >= '1995-01-01' AND a <= '1997-12-31';

Here we have a range optimised function =>
year(start_range_value) = year('1995-01-01') = 1995 => p0
year(end_range_value) = year('1997-12-31') = 1997 => p2

Thus p0 -> p2 will be scanned, that is p0, p1 and p2.

If we replace RANGE by LIST and all else is kept constant, then
we will scan list values from 1995 to 1997 and find that p0, p1 and p2
are to be scanned.

For HASH-partitioned variants there will not be any optimisation in this
case, because the interval is too wide: the data type of the condition
is date and there are more than a thousand dates in the range.
However, if the user makes a slight change to the data model and instead
uses the following:

CREATE TABLE t1 (a_year int)
PARTITION BY HASH(any_function(a_year))
PARTITIONS 10;

SELECT * FROM t1 WHERE a_year >= 1995 AND a_year <= 1997;

Then the range [1995,1997] will always be good enough for partition pruning.
And thus a maximum of 3 partitions will be scanned in the query.
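
To see which partitions actually survive pruning, EXPLAIN PARTITIONS
(available in 5.1) can be run on the query above. A sketch, with
YEAR-less plain HASH chosen as an arbitrary stand-in for any_function:

CREATE TABLE t1 (a_year INT)
PARTITION BY HASH(a_year)
PARTITIONS 10;

EXPLAIN PARTITIONS
SELECT * FROM t1 WHERE a_year >= 1995 AND a_year <= 1997;

The partitions column of the output should name at most three
partitions, one per year in the interval.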