Planet MySQL

聚合内容
Planet MySQL - http://www.planetmysql.org/
更新时间: 14 小时 33 分钟 前

Farewell CHM, hello EPUB!

周一, 2010/09/06 - 17:23

For a long time, the MySQL Documentation Team has been providing CHM files for most MySQL documentation we publish. Like many other formats, CHM-format docs can be downloaded from http://dev.mysql.com/doc. CHM (Compiled HTML Help) has been the de facto standard help file format on Windows since 1997, but the technology behind it is outdated and has all kinds of quirks. The successor format introduced with Windows Vista is AP Help, but it hasn't taken off in practice so far. So, with CHM being outdated and AP Help spread anything but widely, lots of vendors have started providing documentation on Windows in PDF or HTML format.

Building CHM-format documentation is a challenge of its own. I'll not go into details here, so let me just state that it requires a dedicated Windows box (or VM), and while it can be automated using Power Shell commands, there's no way to find out whether or not a CHM file was built correctly, except by manual inspection. This makes it different from all other documentation formats where technical QA is done (successfully) in an automated fashion.

With the increasing complexity and size of our documentation (the MySQL 5.1 Manual contains more than 1.6 million words now!), providing CHM has become more and more of a pain, because builds tend to break more often. We've stopped shipping CHM with the MySQL Server on Windows months ago because we simply couldn't guarantee that the help file shipped with the software would work. Also, we're running short on hardware resources, so we'd rather stop wasting the resources we have on building a format that's of limited use, anyway.

This is why we'll stop providing CHM for any of the documentation we publish.

To alleviate potential pains anyone might have with this decision, let me tell you that we've started providing EPUB-format docs. EPUB (see http://en.wikipedia.org/wiki/EPUB) is an open standard format for screen readers, mobile or not, and is fairly easy (and not resource-intensive!) to compile. Thanks to Lenz for suggesting to build EPUB!

Go to http://dev.mysql.com/doc to grab MySQL documentation in EPUB format. To read EPUB on desktop machines, I use a Firefox add-on, unsurprisingly called epubreader, which loads EPUB documents fast and renders them nicely. That said, please be aware that EPUB can't do anything about the fact that the MySQL Reference Manual is huge, so downloading it to a mobile device can take a while. The MySQL 5.1 Manual is currently a whopping 15 MB!


PlanetMySQL Voting: Vote UP / Vote DOWN

VLDB 2010

周一, 2010/09/06 - 08:22

I will be at VLDB 2010 next week.  If anyone on this blog is attending and wants to catch up to discuss start ups and innovation in DB, NoSQL, Big Data etc drop me a line and I will try to meet up.


PlanetMySQL Voting: Vote UP / Vote DOWN

XS4ALL offer IPv6 connectivity to retail customers

周一, 2010/09/06 - 06:38

Good news. I was told by a colleague that the Dutch ISP XS4ALL is offering IPv6 connectivity to its retail customers. You can see here although the comments are in Dutch.  They also provide a list of ADSL routers which should work for their service. The Cisco name may not be surprising but this is good publicity for Draytek and AVM FRITZ!box for their products. Hopefully it will also stimulate other SOHO router providers into the act to get their names on the list.  Let us hope that more ISPs start to offer this sort of service to their customers.

I’m not sure if there’s a list of residential ISPs in each country which provide IPv6 connectivity. If not it might be worth making one and updating it as new providers off this service.

I checked my ISP, Jazztel, and was not really surprised to see that neither technical support or the sales staff really new what I was talking about when I asked if they were planning on offering IPv6 support. That’s unfortunate, but I don’t think any other Spanish ISP is any better.  That is any other residential ISP in Spain. I expect some of the larger bigger ISPs are likely to off this to business customers.


PlanetMySQL Voting: Vote UP / Vote DOWN

Integrating MySQL and Hadoop - or - A different approach on using CSV files in MySQL

周一, 2010/09/06 - 05:46

We use both MySQL and Hadoop a lot. If you utilize each system to its strengths then this is a powerful combination. One problem we are constantly facing is to make data extracted from our Hadoop cluster available in MySQL.

The problem

Look at this simple example: Let’s say we have a table customer:

CREATE TABLE customer {

    id UNSIGNED INT NOT NULL,
    firstname VARCHAR(100) NOT NULL,
    lastname VARCHAR(100) NOT NULL,
    city VARCHAR(100) NOT NULL,

    PRIMARY KEY(id)
}

In addition to that we store orders customers made in Hadoop. An order includes: customerId, date, itemId, price. Note that these structures serve as a very simplified example.

Let’s say we want to find the first 50 customers, that placed at least one order sorted by firstname ascending. If both tables were in MySQL we could use a single SQL statement like:

SELECT DISTINCT c.id, c.firstname FROM customer c JOIN ORDER o ON c.id = o.customerId ORDER BY c.firstname ASC LIMIT 50

Having the orders in Hadoop we have basically two options:

  1. We write a Map-Reduce job that reads all customers from MySQL and joins them with the orders stored in Hadoop’s HDFS. The output is sorted by firstname ascending. From the result we use only the first 50 entries.
  2. We write a Map-Reduce job to extract all distinct customerIds, write them to a table in MySQL and use a SELECT with a JOIN.

In most cases option 2 will be the better choice if we have a non-trivial number of rows in our customer table. And that’s for three reasons:

  1. MySQL is not optimized for streaming rows. As our Map-Reduce job would always have to read the whole table, we would stream a lot.
  2. You cannot easily write something like a LIMIT clause in Map-Reduce. Even if you could, you’d likely have to read through all customer entries anyway. So the amount of data processed by the Map-Reduce job is significant higher if you use aproach 1.
  3. If you just started to move to Hadoop, most of the data structures like categories, product information etc. are still kept in MySQL and most of the business logic relies on SQL. In most application you would not move all your data to Hadoop anyway. So storing Hadoop’s result in MySQL simply integrates better with your existing application.

So, storing Map-Reduce results in MySQL seems to be the better option most of the time. But you still have to write all customerIds extracted by the Map-Reduce job into a table. And that is a performance killer. Even if you use HEAP tables it puts a lot of pressure on MySQL. Other options like CSV storage engine are not feasible since they do not provide any keys. And joining without a key is never a good idea.

Introducing: MySQL UDF csv_find() and csv_get()

One of the big advantages of Map-Reduce is that it produces output sorted by whatever we want. So we could output sorted CSV files. And we could perform binary search on these sorted CSV files. Great!

I have written two MySQL User Defined Functions (UDF) that provide find and get functionality on sorted CSV files.

How to use it?

Taking our example from above we transfer the resulting CSV file from HDFS to the local filesystem of our MySQL server and write a query like this:

SELECT * FROM customer WHERE csv_find(‘/tmp/myHadoopResult.csv’, customer.id) = 1 ORDER BY firstname ASC LIMIT 50

And this is a lot faster than inserting the Map-Reduce result into a table. It might even be faster than our original SELECT statement where we assumed both tables customer and order are in MySQL since we are not using a JOIN at all. More on performance later on.

How does it work?

On initialization of csv_find the CSV file will be loaded into memory using mmap. And since the first column of the CSV file is sorted in ascending order we can simply use binary search on each call to csv_find.

If you need to access other columns of a CSV use csv_get(<file expression>, <key expression>, <column expression>). Example:

SELECT customer.lastname, csv_get(‘purchases.csv’, customer.id, 2) AS price FROM customer

assuming that column 2 contains the price of a product purchased.

Prerequisites

The following assumptions are made and must be met by your CSV files:

  • Column delimiter is ‘\t’ and row delimiter is ‘\n’. You can change this at compile time.
  • The first column must be sorted in UTF-8 binary ascending order. “binary” means that it has to be sorted by byte value and not by a specific collation. For example ‘ä’ (0xc3 0xb6) comes after ‘z’ (0x7a). In bash you would sort a file like this: LC_ALL=C sort < input.csv > ordered.csv

    Remember that sorting comes for free in Map-Reduce.

  • No escaping is done. If you need it, you could do the following: First, escape everything in your CSV, say by replacing ‘\n’ with ‘\\n’ and then use csv_find or csv_get like this: csv_find(<file expression>, REPLACE(<key expression>, ‘\n‘, ‘\\n’))
  • Some MySQL APIs (at least JDBC) treat results of an UDF as binary data. You have to explicitly cast the return value of csv_get like this: SELECT CAST(csv_get(<file expression>, <key expression>, <column expression>) AS CHAR)

For more information take look into the source code documentation.

Usage patterns other than integrating with Hadoop

We use csv_find and csv_get not only to integrate with Hadoop but to integrate multiple MySQL servers. To make data from one MySQL server available in another you could export it like this:

SELECT * FROM customer WHERE  <some condition> ORDER BY BINARY id ASC INTO OUTFILE ‘/tmp/customer.csv’

Then copy the file over to the other MySQL server (or use NFS). Of course you could use FEDERATED storage engine. We decided not to because it has/had some glitches.

Another useful application is to replace complicated JOINs or SUBSELECTs. MySQL is good at performing some JOINs but really poor at a lot others, especially SUBSELECTs.

A brief performance evaluation

First we create a test CSV file:

#> for a in $(seq 1000000 2000000); do echo $a >> /tmp/random.csv; done

Then we load it into a table:

mysql> CREATE TABLE rand (id VARCHAR(255) NOT NULL, PRIMARY KEY(id));
Query OK, 0 rows affected (0.00 sec)

mysql> LOAD DATA INFILE ‘/tmp/random.csv’ INTO TABLE rand;
Query OK, 1000001 rows affected (5.60 sec)

To test performance of JOIN vs. csv_find we create a second table containing the same rows:

mysql> CREATE TABLE rand2 (id VARCHAR(255) NOT NULL, PRIMARY KEY(id));
Query OK, 0 rows affected (0.01 sec)

mysql> LOAD DATA INFILE ‘/tmp/random.csv’ INTO TABLE rand2;
Query OK, 1000001 rows affected (5.75 sec)

We see that importing 1 million rows already took 5.75 seconds.

Now lets compare the actual JOIN and csv_find:

mysql> SELECT COUNT(*) FROM rand JOIN rand2 ON rand.id = rand2.id;
1 row IN SET (2.37 sec)

mysql> SELECT COUNT(*) FROM rand WHERE csv_find(‘/tmp/random.csv’, id) = 1;
1 row IN SET (1.83 sec)

We see 1.83 seconds for csv_find vs. 2.37 seconds for a JOIN.

Taking the time spent in LOAD DATA into account we even have 1.83 seconds vs. 8.12 seconds meaning csv_find is 4 times faster.

Since most Map-Reduce jobs do not use LOAD DATA but a ton of INSERT statements the real performance might be even worse. Not to mention the load massive INSERTs put on the MySQL server.

rand2 is an InnoDB table. Let’s retry with a memory table:

mysql> SET max_heap_table_size = 64 * 1024 * 1024;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE rand3 (id VARCHAR(10) NOT NULL, PRIMARY KEY(id)) ENGINE=HEAP;
Query OK, 0 rows affected (0.00 sec)

mysql> LOAD DATA INFILE ‘/tmp/random.csv’ INTO TABLE rand3;
Query OK, 1000001 rows affected (1.94 sec)
Records: 1000001  Deleted: 0  Skipped: 0  Warnings: 0

mysql> SELECT COUNT(*) FROM rand JOIN rand3 ON rand.id = rand3.id;
1 row IN SET (1.80 sec)

As you can see, execution time of both queries is nearly equal, but we still need 1.94 seconds to load the data into table. Thus csv_find is still twice as fast compared to a JOIN on a HEAP table.

But, did you notice this statement? SET max_heap_table_size = 64 * 1024 * 1024

We had to raise the maximum heap table size since the contents of our 7.7 MB test file would not fit into the default 16 megabytes.

Actually the HEAP table uses about 50 MB of RAM compared to just exactly 7.7 MB for csv_find.

And because RAM is a limiting factor you cannot use HEAP all that often anyway. csv_find and csv_get allocate as much memory as the file size. You can limit the maximum allowed file size at compile time.

Where to download?

Download mysql_udf_csv_binary_search-0.1.tar.gz.

This package includes instructions on how to install (see README) as well as a comprehensive test suite containing both, unit and integration tests (make test).

A note on Windows: Since I don’t use Windows there are no build instructions for this OS. I tried to write portable code but since I cannot test it, I don’t know if it is working. It would be great if someone out there could contribute a Windows version.

Code has been tested on MySQL version 5.1.41 as well as 5.0.83.

Final words

This package provides fast and simple integration of sorted CSV files coming from any source.

Comments and improvements are welcome.


PlanetMySQL Voting: Vote UP / Vote DOWN

Micro-benchmarking pthread_cond_broadcast()

周一, 2010/09/06 - 00:38

In my work on group commit for MariaDB, I have the following situation:

A group of threads are going to participate in group commit. This means that one of the threads, called the group leader, will run an fsync() for all of them, while the other threads wait. Once the group leader is done, it needs to wake up all of the other threads.

The obvious way to do this is to have the group leader call pthread_cond_broadcast() on a condition that the other threads are waiting for with pthread_cond_wait():

bool wakeup= false; pthread_cond_t wakeup_cond; pthread_mutex_t wakeup_mutex

Waiter: pthread_mutex_lock(&wakeup_mutex); while (!wakeup) pthread_cond_wait(&wakeup_cond, &wakeup_mutex); pthread_mutex_unlock(&wakeup_mutex); // Continue processing after group commit is now done.

Group leader:

pthread_mutex_lock(&wakeup_mutex); wakeup= true; pthread_cond_broadcast(&wakeup_cond); pthread_mutex_unlock(&wakeup_mutex);

Note the association of the condition with a mutex. This association is inherent in the way pthread condition variables work. The mutex must be locked when calling into pthread_mutex_wait(), and will be obtained again before the call returns. (Check the man page for pthread_cond_wait() for details).

Now, when I think about how these condition variables work, something strikes me as somewhat odd.

The idea is that the broadcast signals every waiting thread to wake up. However, because of the associated mutex, only one thread will actually be able to wake up; this thread will obtain a lock on the mutex, and all other to-be-awoken threads will now have to wait for this mutex! Only after the first thread releases this mutex will the next thread wakeup holding the mutex, then after releasing the third thread can wake up, and so on.

So if we have say 100 threads waiting, the last one will have to wait for the first 99 threads to each be scheduled and each release the mutex, one after the other in a completely serialised fashion.

But what I really want is to just let them all run at once in parallel (or at least as many as my machine has spare cores for). There is another way to achieve this, by simply using a separate condition and mutex for each thread, and have the group leader signal each one individually:

Waiter: pthread_mutex_lock(&me->wakeup_mutex); while (!me->wakeup) pthread_cond_wait(&me->wakeup_cond, &me->wakeup_mutex); pthread_mutex_unlock(&me->wakeup_mutex);

Group leader:

for waiter in <all waiters> pthread_mutex_lock(&waiter->wakeup_mutex); waiter->wakeup= true; pthread_cond_signal(&wakeup_cond); pthread_mutex_unlock(&wakeup_mutex);

This way, every waiter is free to start running as soon as woken up by the leader; no waiters have to wait for one another. This seems advantageous, especially as number of cores increases (rumours are that 48 core machines are becoming commodity).

"Seems" advantageous. But is it really? Let us micro-benchmark it.

For this, I start up 5000 threads. Each thread goes to wait on a condition, either a single shared one, or distinct in each thread. The main program then signals the threads to wakeup, either with a single pthread_cond_broadcast(), or with one pthread_cond_signal() per thread. Each thread records the time they woke up, and the main program collects these times and computes how long it took between starting to signal the condition(s) and wakeup of the last thread. (Here is the full C source code for the test program).

I ran the program on an Intel quad Core i7 with hyperthreading enabled, the most parallel machine I have easy access to. The results is the following:
pthread_cond_broadcast(): 46.9 msec pthread_cond_signal(): 17.6 msec
Conclusion: pthread_cond_broadcast() is slower, as I speculated. I would expect the effect to be more pronounced on systems with more cores; it would be interesting if readers with access to such systems could try the test program and comment below on the results.


PlanetMySQL Voting: Vote UP / Vote DOWN

My Opinion on NoSQL DBs

周日, 2010/09/05 - 18:59
I'll let the following express my opinion about NoSQL



and..



PlanetMySQL Voting: Vote UP / Vote DOWN

Why MySQL replication is better than mysqlbinlog for recovery

周日, 2010/09/05 - 00:12

You have a backup, and you have the binary logs between that backup and now. You need to do point-in-time recovery (PITR) for some reason. What do you do? The traditional answer is “restore the backup and then use mysqlbinlog to apply the binary logs.” But there’s a much better way to do it.

The better way is to set up a server instance with no data, and load the binary logs into it. I call this a “binlog server.” Then restore your backup and start the server as a replication slave of the binlog server. Let the roll-forward of the binlogs happen through replication, not through the mysqlbinlog tool.

Why is this better? Because replication is a more tested way of applying binary logs to a server. The results are much more likely to be correct, in my opinion. Plus, replication is easier and more convenient to use. You can do nice things like START SLAVE UNTIL, skip statements, stop and restart without having to figure out where you left off, and so on.

Replication also has the ability to correctly reproduce more types of changes than mysqlbinlog does. Try this with statement-based replication:

insert into tbl(col) values(connection_id());

That’ll work just fine through replication, because the SQL thread on the slave will change its connection ID to match the original. It won’t work through mysqlbinlog.

Related posts:

  1. MySQL disaster recovery by promoting a slave
  2. Progress on High Performance MySQL Backup and Recovery chapter
  3. How MySQL replication got out of sync
  4. High Performance MySQL, Second Edition: Backup and Recovery
  5. How to make MySQL replication reliable


PlanetMySQL Voting: Vote UP / Vote DOWN

Cassandra and Ganglia

周六, 2010/09/04 - 06:42


I finally got some time to do some house cleaning. One of my nagging low-hanging fruit stuff was stop running jconsole on one screen to see the state of all my cassandra boxes. I created a ganglia script to graph what is above. Above I am showing all the cassandra servers and their total row read stages as a gauge. Meaning that basically I am graphing the delta of the change between ganglia script runs. This gives me the reads over time based on deltas between runs.


How I have it set up is:

All data exposed by JMX to produce tpstats and cfstats is graphed via ganglia. The pattern for each graph is as follows

cass_{stat_class}_{key}

stat_class - tpc, tpp, tpa means complete, pending, active respectively
key - would be message deserialization for instance.

For column family stats I graph the keyspace stats as well as the specific column family stats exposed by cfstats. For instance below:



If your interested in the scripts I'll send it to you or put it up on code.google.com, its written in perl OOP perl and takes the same approach of packaging that maatkit tool kit for mySQL by Xarb and crew does (puts all the "classes" in the file as the application).

GmetricDelegate is the parent package
GmetricCassandra extends GmetricDelegate and overloads getData as well as defines what is an absolute stats vrs a gauge.

As you can see the pattern I also have
GmetricInnoDB
GmetricMySQL

and so on.

then on each server I run

/usr/bin/perl -w /home/scripts/ganglia_gmetric.pl --module=GmetricCassandra

this then talks to Ganglia through gmetric to report the stats.
PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Cluster: 5 Steps to Getting Started, then 5 More to Scale for the Web

周五, 2010/09/03 - 22:10

Join us for a live and interactive webinar session where we will demonstrate how to start an evaluation of the MySQL Cluster database in 5 easy steps, and then how to expand your deployment for web & telecoms-scale services.

Just register here:

http://www.mysql.com/news-and-events/web-seminars/display-566.html


Getting Started will describe how to:

  • Get the software
  • Install it
  • Configure it
  • Run it
  • Test it

Scaling for HA and the web will describe how to:

  • Review the requirements for a HA configuration
  • Install the software on more servers
  • Update & extend the configuration from a single host to 4
  • Roll out the changes
  • On-line scaling to add further nodes

When: Wednesday, September 08, 2010: 09:00 Pacific time (America)

Wed, Sep 08: 11:00 Central time (America) Wed, Sep 08: 12:00 Eastern time (America) Wed, Sep 08: 16:00 UTC Wed, Sep 08: 17:00 Western European time

The presentation will be approximately 45 minutes long followed by Q&A.


PlanetMySQL Voting: Vote UP / Vote DOWN

dbbenchmark.com – configuring OpenBSD for MySQL benchmarking

周五, 2010/09/03 - 11:51

Here are some quick commands for installing the proper packages and requirements for the MySQL dbbenchmark program.

export PKG_PATH="ftp://openbsd.mirrors.tds.net/pub/OpenBSD/4.7/packages/amd64/" pkg_add -i -v wget wget http://dbbenchmark.googlecode.com/files/dbbenchmark-version-0.1.beta_rev26.tar.gz pkg_add -i -v python Ambiguous: choose package for python a 0: 1: python-2.4.6p2 2: python-2.5.4p3 3: python-2.6.3p1 Your choice: 2 pkg_add -i -v py-mysql pkg_add -i -v mysql pkg_add -i -v mysql-server ln -s /usr/local/bin/python2.5 /usr/bin/python gzip -d dbbenchmark-version-0.1.beta_rev26.tar.gz tar -xvf dbbenchmark-version-0.1.beta_rev26.tar cd dbbenchmark-version-0.1.beta_rev26 ./dbbenchmark.py --print-sql - login to mysql and execute sql commands ./dbbenchmark.py
PlanetMySQL Voting: Vote UP / Vote DOWN

Replication and “the lost binlog”

周五, 2010/09/03 - 09:26

Unless you set sync_binlog = 1, a system crash on the master will likely fail any slave with an “Client requested master to start replication from impossible position” error. Generally, this kind of situation requires manual intervention. When we see this, we make sure things indeed failed “past the end” of a binlog (i.e. the bit that didn’t get to the physical platter before the crash), reposition the slave to the next binlog, and use the Maatkit tools to ensure the slave is properly synced.

sync_binlog=1 is a problem in itself, because it makes the server not just do one fsync per commit, but several and that’s serious overhead. sync_binlog is actually not a boolean but a “fsync binlog every N commits” where 0 meaning “never”. So you could set it to 10 (fsync every 10 commits) and thus reduce the loss a little bit while not doing too much harm to performance. But it’s not ideal and won’t always prevent the above error.


PlanetMySQL Voting: Vote UP / Vote DOWN

dbbenchmark.com – automated installer now available

周五, 2010/09/03 - 05:40

As previously mentioned, Darren Cassar has been working on a new automated installer for the DBbenchmark program. It’s now available for download: click here. All you need to do is save it to the directory that you want to install to and then make sure it’s executable: “chmod 700 installer.sh”, then run it “./installer.sh”.


PlanetMySQL Voting: Vote UP / Vote DOWN

How long Innodb Shutdown may take

周五, 2010/09/03 - 05:40

How long it may take MySQL with Innodb tables to shut down ? It can be quite a while.
In default configuration innodb_fast_shutdown=ON the main job Innodb has to do to complete shutdown is flushing dirty buffers. The number of dirty buffers in the buffer pool varies depending on innodb_max_dirty_pages_pct as well as workload and innodb_log_buffer_size and can be anywhere from 10 to 90% in the real life workloads. Innodb_buffer_pool_pages_dirty status will show you the actual data. Now the flush speed also depends on number of factors. First it is your storage configuration – you may be looking at less than 200 writes/sec for single entry level hard drive to tens of thousands of writes/sec for high end SSD card. Flushing can be done using multiple threads (in XtraDB and Innodb Plugin at least) so it scales well with multiple hard drives. The second important variable is your workload, especially how dirty pages would line up on the hard drive. If there are a lot of sequential pages which are dirty Innodb will be able to use larger size IOs – up to 1MB flushing dirty pages which can be a lot faster than flushing data page by page.

So if we have system with single hard drive doing 200 IO/ssc, 48G buffer pool which is 90% dirty and completely random page writes we’ll look at 13500 seconds or about 5min per 1GB of Buffer pool size.
This is worse case scenario though it is quite common in practice to see shutdown time of about 1min per GB of buffer pool per hard drive.

Baron has written a nice post how to decrease innodb shutdown time which you may want to read on this topic.

Entry posted by peter | No comment

Add to: | | | |


PlanetMySQL Voting: Vote UP / Vote DOWN

dbbenchmark.com – vote on next supported OS now!

周五, 2010/09/03 - 04:24

So far the benchmarking script supports Linux, FreeBSD, and OSX. I’m installing virtual machines today to get ready for development on the next OS that the community wants to have supported. Vote today for your choice. Development will begin Friday 2010-09-03.

Note: There is a poll embedded within this post, please visit the site to participate in this post's poll.
PlanetMySQL Voting: Vote UP / Vote DOWN

LucidDB has a new Logo/Mascot

周五, 2010/09/03 - 01:53

At yesterdays Eigenbase Developer Meetup at SQLstream’s offices in San Francisco we arrived at a new logo for LucidDB.  DynamoBI is thrilled to have supported and funded the design contest to arrive at our new mascot.  Over the coming months you’ll see the logo make it’s way out to the existing luciddb.org sites, wiki sites, etc.  I’m really happy to have a logo that matches the nature of our database - BAD ASS!


PlanetMySQL Voting: Vote UP / Vote DOWN

Loading Tables with TokuDB 4.0

周五, 2010/09/03 - 01:47

Often, the first step in evaluating and deploying a database is to load an existing dataset into the database. In the latest version, TokuDB makes use of multi-core parallelism to speed up loading (and new index creation). Using the loader, MySQL tables using TokuDB load 5x-8x faster than with previous versions of TokuDB.

Measuring Load Performance

We generated several different datasets to measure the performance of TokuDB when doing a LOAD DATA INFILE … command. To characterize performance, we vary

  • rows to load
  • keys per row
  • row length (including keys)

All generated keys, including the primary, are random, 8-byte values. The remaining data, needed to pad out the row length to specified length, is text.

Two files files are produced as part of data generation.

  1. data file, containing ‘|’ separated fields
  2. sql file, containing the CREATE TABLE command corresponding to the generated data

For instance, if the number of keys is 3 and the row length is 256 bytes, the following SQL statement is produced:

CREATE TABLE load_table (\ val0 BIGINT UNSIGNED NOT NULL,\ val1 BIGINT UNSIGNED NOT NULL,\ val2 BIGINT UNSIGNED NOT NULL,\ pad VARCHAR(232) NOT NULL,\ PRIMARY KEY (val0),\ KEY valkey1 (val1),\ KEY valkey2 (val2)\ ) ENGINE=tokudb

We can make the data generation program available if anyone is interested.

Load Test

A simple shell script

  • creates the test table
  • performs a LOAD DATA INFILE <datafile> INTO TABLE load_table FIELDS TERMINATED BY ‘|’
  • returns execution time

For the experiments to be meaningful, we created datasets that do not fit in memory.

Results

We ran our benchmark on an Amazon Web Services c1.large node with 8 cores and 7 GB of memory. The test loads 100M rows (NOT pre-sorted). The data file was on a 2 disk RAID-0, the MySQL DB files on a different 2 disk RAID-0.

TokuDB Version 3 (~single-threaded) v. TokuDB Version 4 (multi-threaded) Keys Row Len v3 rows/s v4 rows/s Speedup 1 64 27K 142K 5.1 4 64 13K 82K 6.2 1 256 7K 54K 7.2 4 256 5K 43K 8.2 Other metrics

Several metrics can be used to measure performance:

  • rows per second : data insert rate
  • key-value pairs per second : indicates how fast the primary and secondary indexes are being created
  • MB/s : how much raw data is being added to the database

Metrics for TokuDB v4:

Keys Row Len Rows/sec KV-pairs/sec MB/sec 1 64 142K 142K 9.1 4 64 82K 330K 5.3 1 256 54K 54K 13.9 4 256 43K 173K 11.1

These results show

  1. significant parallelization (we believe larger CPU core count machines will see even larger benefits)
  2. a significant jump in absolute load performance
  3. speed-ups are not limited to tables with many keys – even the 1 key tables are 5-7x faster

We will report further results, especially speedups on larger CPU count machines, as they become available.


PlanetMySQL Voting: Vote UP / Vote DOWN

Speaking at MySQL Sunday

周四, 2010/09/02 - 22:31
I am speaking at MySQL Sunday. The title for my talk is Success with MySQL and I will focus on things that  operations and users can do to make a MySQL deployment succeed. There are many interesting talks scheduled for Sunday, including several at the same time as mine. I hope to see you there.
PlanetMySQL Voting: Vote UP / Vote DOWN

Cluster - spins/contentions and thread stuck in..

周四, 2010/09/02 - 21:35
I get a number of question about contentions/"stuck in..". So here comes some explanation to:
  • Contention
  • Thread Stuck in
  • What you can do about it
In 99% of the cases the contentions written out in the out file of the data nodes (ndb_X_out.log) is nothing to pay attention to.

sendbufferpool waiting for lock, contentions: 6000 spins: 489200
sendbufferpool waiting for lock, contentions: 6200 spins: 494721

Each spin is read from the L1 cache (4 cycles on a Nehalem (3.2GHz), so about a nanosecond).
1 spin = 1.25E-09 seconds (1.25ns)

In the above we have:
(494721-489200)/(6200-6000)= 27 spins/contention
Time spent on a contention=27 x 1.25E-09=3.375E-08 seconds (0.03375 us)

So we don't have a problem..

Another example (here is a lock guarding a job buffer (JBA = JobBuffer A, in short it handles signals for heartbeats and some other small things, all traffic goes over JobBuffer B).

jbalock thr: 1 waiting for lock, contentions: 145000 spins: 3280892543
jbalock thr: 1 waiting for lock, contentions: 150000 spins: 3403539479

(3403539479-3280892543)/(150000-145000)=24529 spins/contention
Time spent on a contention: 3.06613E-05 seconds (30.66us )

This is a bit higher than I would have expected and I think more analysis is needed. However, i tend not to get these contentions on a busy system.

Ndb kernel thread X is stuck in ..

Ndb kernel thread 4 is stuck in: Job Handling elapsed=100 Watchdog: User time: 82 System time: 667
Ndb kernel thread 4 is stuck in: Job Handling elapsed=200
Watchdog: User time: 82 System time: 668
Ndb kernel thread 4 is stuck in: Job Handling elapsed=300 Watchdog: User time: 82 System time: 669
Ndb kernel thread 4 is stuck in: Job Handling elapsed=400 Watchdog: User time: 82 System time: 670

Here the important is to look at how User time and System time behaves.
If User time is constant (as it is here - 82ms), but the System time is growing (667, 668 etc) which indicates that the OS kernel is busy.
Slow network? Sub-optimal kernel version? NIC drivers? swapping? some kernel process using too much cpu?

If User time is growing it is probably because the ndb kernel is overloaded.

What can you do about this?
  • In config.ini:
    RealtimeScheduler=1
    LockExecThreadToCPU=[cpuids]
  • check that cpuspeed is not running ( yum remove cpuspeed )
  • .. and finally ask us to optimize more!
Also, pay attention if you get the contentions on an idle Cluster or a busy Cluster.
PlanetMySQL Voting: Vote UP / Vote DOWN

Oracle's MySQL - What's New? Live event in Milan on Sept, 28

周四, 2010/09/02 - 18:48
Join us at this live event in Milan to better understand what’s new with MySQL. You will learn more about the current and future state of MySQL, now part of the Oracle family of products. We will also cover Oracle’s investment in MySQL aiming to make it even a better MySQL.

In particular the following topics will be discussed:
  • Oracle’s MySQL Strategy
  • What’s New for:
    • The MySQL Server
    • MySQL Cluster
    • MySQL Enterprise
    • MySQL Workbench
Stay tuned because we are organizing a similar event in Rome that will be announced soon. Attendance is free, but you’ll need to register in advance. Seats are limited, register today!

When:



PlanetMySQL Voting: Vote UP / Vote DOWN

İstanbul 2010 Monty Program Ab Firma Toplantısı ve Konferans

周四, 2010/09/02 - 18:31

(English version) Herhalde bildiğiniz gibi, biz tamamen sanal şirketiz, yani ana merkez gib bir şeyimiz yok. Hepimiz evinden çalışıyor ve biz sadece bir ya da iki kez yılda gerçek hayatta buluşuyorus. Bu sene kararımız İstanbul‘a düştü.  Firmadaki tek Türk ben olduğumdan dolayı toplantının organizesi bana düştü.

İlk adım olarak böyle bir toplantının özelikleri nedir ve en iyi şekilde nasıl hazır edilir diye araştırdım. Kaj Arnö bunun ayrıntılarını blog’unda açıklamış bile: How to arrange a physical meeting in a virtual organisation. Kaj ile İstanbul’a 2008′de gelmiştik ve çeşitli konferanslar vermiştik.

İlk toplantı tarihi ile ilgili bir anket yaptık ve İstanbul’daki toplantı Ekim ayına karar verdik. Tam olarak, Perşembe, 7 Ekim’den Salı, 12 Ekim 2010′a kadar.

Yaklaşık 30 kişilik toplantıyı düzenlemek pek bir basit görev değil, bu nedenle toplantıyı hazırlanmak için bana bir asistan tahsis edildi – oldukçada ünlü bir asistan  – My Widenius. My (okunuşu Mü) MySQL’ın My’sü ve ona bir mariadb.org e-posta adresinle ulaşabilisiniz.

Eğer Ekim’in başında İstanbul’da iseniz, sizi bizim toplantıya davet etmekten mutluluk duyarız. Üç toplantı günleri olacaktır: Cuma 8 Ekim, Cumartesi 9 Ekim ve Pazar 10 Ekim 2010. Toplantıların çoğu herkese açık olacaktır. Misafirlerimiz olarak Facebook, Percona ve Intel bizlen olucak.

Ayrıca yerel (İstanbul) kullanıcı grupları ve bizim toplantıya ilgilenlerini arıyoruz. Sizi İstanbul’da görmek üzere, …


PlanetMySQL Voting: Vote UP / Vote DOWN