a:4:{s:5:"child";a:1:{s:0:"";a:1:{s:3:"rss";a:1:{i:0;a:6:{s:4:"data";s:3:"
";s:7:"attribs";a:1:{s:0:"";a:1:{s:7:"version";s:3:"2.0";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:1:{s:0:"";a:1:{s:7:"channel";a:1:{i:0;a:6:{s:4:"data";s:217:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:1:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:12:"Planet MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:27:"http://www.planetmysql.org/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Sat, 10 Jul 2010 10:15:01 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"language";a:1:{i:0;a:5:{s:4:"data";s:2:"en";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:42:"Planet MySQL - http://www.planetmysql.org/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"item";a:50:{i:0;a:6:{s:4:"data";s:53:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:25:"DBD::mysql 4.015 Released";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:43:"urn:lj:livejournal.com:atom1:capttofu:26687";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:42:"http://capttofu.livejournal.com/26687.html";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:891:"I'm glad to let everyone out in Perl Land know that DBD::mysql 4.015 is released. Per the changelog:* BUG #56664 fixed t/40blobs.t skip_all logic (W. Phillip Moore)* BUG #57253 Fixed iteration past end of string (crash). (Chris Butler)* Added a new parameter for old behavior- mysql_bind_comment_placeholders which will make it possible to have placeholders bound within comments for those who really want that behavior.* Fixed bind_type_guessing - always on now. Numbers will not be automatically quoted as they are now.You can get the latest release at:http://search.cpan.org/~capttofu/DBD-mysql-4.015/ file: $CPAN/authors/id/C/CA/CAPTTOFU/DBD-mysql-4.015.tar.gz size: 132029 bytes md5: 4d80bb5000b97bbfbe156140b9560c20Also, the latest source:git://github.com/CaptTofu/DBD-mysql.gitThanks for using DBD::mysql and reporting bugs! --Patrick 'CaptTofu' Galbraith";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Sat, 10 Jul 2010 04:01:15 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:3:{i:0;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:16:"database clients";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:10:"dbd::mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:1306:"I'm glad to let everyone out in Perl Land know that DBD::mysql 4.015 is released. Per the changelog:
* BUG #56664 fixed t/40blobs.t skip_all logic (W. Phillip Moore)
* BUG #57253 Fixed iteration past end of string (crash). (Chris Butler)
* Added a new parameter for old behavior- mysql_bind_comment_placeholders which
will make it possible to have placeholders bound within comments for those who really
want that behavior.
* Fixed bind_type_guessing - always on now. Numbers will not be automatically quoted as they are now.
You can get the latest release at:
http://search.cpan.org/~capttofu/DBD-mysql-4.015/
file: $CPAN/authors/id/C/CA/CAPTTOFU/DBD-mysql-4.015.tar.gz
size: 132029 bytes
md5: 4d80bb5000b97bbfbe156140b9560c20
Also, the latest source:
git://github.com/CaptTofu/DBD-mysql.git
Thanks for using DBD::mysql and reporting bugs!
--Patrick 'CaptTofu' Galbraith
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:17:"Patrick Galbraith";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:1;a:6:{s:4:"data";s:63:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:32:"Load-balancing for MySQL Cluster";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:38:"http://mysqlblog.fivefarmers.com/?p=51";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:77:"http://mysqlblog.fivefarmers.com/2010/07/09/load-balancing-for-mysql-cluster/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:3062:"Shortly after I wrote my last post regarding some advanced Connector/J load-balancing properties, Anirudh published a post describing configuration of RHEL LVS for load-balancing and failover of MySQL Cluster SQL nodes. It’s an interesting post, and I admit I know very little about RHEL LVS, but it reminded me of problems I experienced when trying to set up load-balanced ColdFusion(!) servers at my last job, years back. We ended up with a nice hardware load-balancer sitting in front of multiple ColdFusion web servers. The problems we found were that our application depended upon session state, which was stored (of course) on a single web server. The load-balancer allowed us to define sticky sessions, which is what we did, but it cost us.
We couldn’t really balance load – we could balance session counts, sort of. Every time a new session started, the balancer would pick which server would handle that session – for the full duration of the session. Some sessions might be short and little load, while others may be very long and represent a huge amount of load.
We also had a limited HA solution. We implemented a heartbeat function so that when a web server went offline, the load-balancer would re-route affected users to an available server. But because the session data was stored on the original server, the user had to log in again and recreate session data. If the user was in the middle of a complex transaction, too bad.
The above problem also made maintenance a pain. We could reconfigure the load-balancer on the fly to stop using a specific server for new sessions, but we couldn’t take that web server offline until all of the user sessions on that machine terminated. That might take 5 minutes, or it might take 5 hours.
As I said, I’m no LVS expert, but I would expect similar problems when using it as a load-balancer for MySQL Cluster. I suspect that only new connection requests are balanced, making persistent connections (like common Java connection pools) “sticky” to whatever machine the connection was originally assigned. You probably cannot balance load at anything less than “connection” level, while Connector/J will rebalance after transactions or communications errors. And anytime you lack the ability to redistribute load except at new connections, taking servers offline for maintenance will be problematic (Connector/J 5.1.13 provides a new mechanism to facilitate interruption-free maintenance, which I intend to blog about later).
This means that it probably works best when using other connectors which don’t support load-balancing, or with applications that don’t use persistent connections, but I wouldn’t use it instead of Connector/J’s load-balancing, and I definitely would not use it with Connector/J’s load-balancing – Connector/J won’t understand that multiple MySQL server instances live behind a single address, and won’t be able to coordinate load-balancing with LVS.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Sat, 10 Jul 2010 04:00:42 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:5:{i:0;a:5:{s:4:"data";s:4:"Java";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:11:"Connector/J";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:14:"Load-balancing";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:13:"MySQL Cluster";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:3593:"
Shortly after I wrote my last post regarding some advanced Connector/J load-balancing properties, Anirudh published a post describing configuration of RHEL LVS for load-balancing and failover of MySQL Cluster SQL nodes. It’s an interesting post, and I admit I know very little about RHEL LVS, but it reminded me of problems I experienced when trying to set up load-balanced ColdFusion(!) servers at my last job, years back. We ended up with a nice hardware load-balancer sitting in front of multiple ColdFusion web servers. The problems we found were that our application depended upon session state, which was stored (of course) on a single web server. The load-balancer allowed us to define sticky sessions, which is what we did, but it cost us.
We couldn’t really balance load – we could balance session counts, sort of. Every time a new session started, the balancer would pick which server would handle that session – for the full duration of the session. Some sessions might be short and little load, while others may be very long and represent a huge amount of load.
We also had a limited HA solution. We implemented a heartbeat function so that when a web server went offline, the load-balancer would re-route affected users to an available server. But because the session data was stored on the original server, the user had to log in again and recreate session data. If the user was in the middle of a complex transaction, too bad.
The above problem also made maintenance a pain. We could reconfigure the load-balancer on the fly to stop using a specific server for new sessions, but we couldn’t take that web server offline until all of the user sessions on that machine terminated. That might take 5 minutes, or it might take 5 hours.
As I said, I’m no LVS expert, but I would expect similar problems when using it as a load-balancer for MySQL Cluster. I suspect that only new connection requests are balanced, making persistent connections (like common Java connection pools) “sticky” to whatever machine the connection was originally assigned. You probably cannot balance load at anything less than “connection” level, while Connector/J will rebalance after transactions or communications errors. And anytime you lack the ability to redistribute load except at new connections, taking servers offline for maintenance will be problematic (Connector/J 5.1.13 provides a new mechanism to facilitate interruption-free maintenance, which I intend to blog about later).
This means that it probably works best when using other connectors which don’t support load-balancing, or with applications that don’t use persistent connections, but I wouldn’t use it instead of Connector/J’s load-balancing, and I definitely would not use it with Connector/J’s load-balancing – Connector/J won’t understand that multiple MySQL server instances live behind a single address, and won’t be able to coordinate load-balancing with LVS.
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:11:"Todd Farmer";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:2;a:6:{s:4:"data";s:58:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:29:"Guidelines for generating XML";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:73:"tag:www.rooftopsolutions.nl,2010:guidelines-for-generating-xml/1278694395";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:89:"http://feedproxy.google.com/~r/bijsterespoor/~3/VGdaa0laqNY/guidelines-for-generating-xml";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:4682:"Over the last little while I've come across quite a few XML feed generators written in PHP, with varying degrees of 'correctness'. Even though generating XML should be very simple, there's still quite a bit of pitfalls I feel every PHP or (insert your language)-developer should know about.1. You are better off using an XML libraryThis is the first and foremost rule. Most people end up generating their xml using simple string concatenation, while there are many dedicated tools out there that really help you generate your own XML.In PHP land the best example is XMLWriter. It is actually quite easy to use:<?php $xmlWriter = new XMLWriter();$xmlWriter->openMemory();$xmlWriter->startDocument('1.0','UTF-8');$xmlWriter->startElement('root');$xmlWriter->text('Contents of the root tag');$xmlWriter->endElement(); // root$xmlWriter->endDocument();echo $xmlWriter->outputMemory(); ?>Granted, XMLWriter is verbose, but you have to worry a lot less about escaping and validating your xml documents.2. Understand UnicodeDo you know the difference between a byte, a character and a codepoint? If you don't, I'd probably think twice about hiring you. It's absolutely shocking how many programmers are out there that don't understand the basics of unicode, UTF-8 and how it relates to the web.An often-heard excuse for not having to care for non-ascii characters, such as people in English speaking countries. However, if you need to use the euro-sign (€) or if you deal with people copy-pasting from word documents, you most definitely will come across problems.A simple call to utf8_encode is not actually enough. If some of your source-data was already encoded as UTF-8 you will end up losing data. Only use utf8_encode if you know your source-data is encoded as ISO-8859-1.The one true way to go about it, is to make sure that every step of the way in your web application is UTF-8. Including your HTTP/HTML contenttype, MySQL database and anything that basically ingests data for your application (email, csv importers, xml readers, web services). Once you are absolutely sure every part in your application is UTF-8, and converted any old data things will start to behave correctly.3. CDATA is never a solutionIt might be tempting to solve any encoding issues by simply surrounding it with <![CDATA[ and ]]>. This might make sure that XML parsers don't throw an error when reading, but they still have 'incorrect' characters. If your XML document has CDATA tags, or you think you need CDATA, you are probably wrong.More often than not using CDATA actually stems from encoding problems (see section 2). CDATA is not a method to encode binary characters, xml parsers will still throw errors if they come across certain byte sequences. If you do really need to encode binary data in XML, the best way is to use something like base64_encode instead.If your XML feed uses CDATA because of encoding issues you actually defer your problem to the consumer of your XML feed. So instead of seeing 'weird characters' on your side, the person that reads your xml feed now has no good way to detect which encoding was actually used. If it's for example an RSS feed you're generating, this can result in RSS readers throwing errors, or characters showing up incorrectly.4. Be liberal with whitespaceAn error like "unexpected character at line 1, column 176456" is much harder to debug than "line 5078, column 24". Whitespace between xml tags does usually not have any significance, so you can add as much indentation and linebreaks (\n) as you want. Note that tools such as XMLWriter will indent for you automatically.5. Be verboseEven though you might easily figure out that <ORD_NR> means 'order number', there's no reason to actually state it as <order-number>. Note that the following rules appear to fall in favor for most people: Use lowercase for tags and attribute names. Use dashes (-) to separate words, not underscores (_). Minimize the use of attributes, nested tags allow more flexibility.6. Be careful with entitiesThe only valid entities in XML are < (<), > (>) & (&) and " ("), so any other entity will simply not work and throw errors.HTML DTD's add many entities, so if you're mostly used to using HTML you might expect other entities to work. If your source-data already has entities, you might have to get rid of these first.In PHP it means you should use htmlspecialchars, instead of htmlentities.Feel free to discuss, disagree, or add on to this list in the comments, I'm happy to hear your experiences.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Fri, 09 Jul 2010 17:19:59 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:4:{i:0;a:5:{s:4:"data";s:5:"cdata";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:3:"php";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:7:"unicode";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:3:"xml";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:6649:"Over the last little while I've come across quite a few XML feed generators written in PHP, with varying degrees of 'correctness'. Even though generating XML should be very simple, there's still quite a bit of pitfalls I feel every PHP or (insert your language)-developer should know about.
1. You are better off using an XML library
This is the first and foremost rule. Most people end up generating their xml using simple string concatenation, while there are many dedicated tools out there that really help you generate your own XML.
In PHP land the best example is XMLWriter. It is actually quite easy to use:
<?php
$xmlWriter = new XMLWriter();
$xmlWriter->openMemory();
$xmlWriter->startDocument('1.0','UTF-8');
$xmlWriter->startElement('root');
$xmlWriter->text('Contents of the root tag');
$xmlWriter->endElement(); // root
$xmlWriter->endDocument();
echo $xmlWriter->outputMemory();
?>
Granted, XMLWriter is verbose, but you have to worry a lot less about escaping and validating your xml documents.
2. Understand Unicode
Do you know the difference between a byte, a character and a codepoint? If you don't, I'd probably think twice about hiring you. It's absolutely shocking how many programmers are out there that don't understand the basics of unicode, UTF-8 and how it relates to the web.
An often-heard excuse for not having to care for non-ascii characters, such as people in English speaking countries. However, if you need to use the euro-sign (€) or if you deal with people copy-pasting from word documents, you most definitely will come across problems.
A simple call to utf8_encode is not actually enough. If some of your source-data was already encoded as UTF-8 you will end up losing data. Only use utf8_encode if you know your source-data is encoded as ISO-8859-1.
The one true way to go about it, is to make sure that every step of the way in your web application is UTF-8. Including your HTTP/HTML contenttype, MySQL database and anything that basically ingests data for your application (email, csv importers, xml readers, web services). Once you are absolutely sure every part in your application is UTF-8, and converted any old data things will start to behave correctly.
3. CDATA is never a solution
It might be tempting to solve any encoding issues by simply surrounding it with <![CDATA[ and ]]>. This might make sure that XML parsers don't throw an error when reading, but they still have 'incorrect' characters. If your XML document has CDATA tags, or you think you need CDATA, you are probably wrong.
More often than not using CDATA actually stems from encoding problems (see section 2). CDATA is not a method to encode binary characters, xml parsers will still throw errors if they come across certain byte sequences. If you do really need to encode binary data in XML, the best way is to use something like base64_encode instead.
If your XML feed uses CDATA because of encoding issues you actually defer your problem to the consumer of your XML feed. So instead of seeing 'weird characters' on your side, the person that reads your xml feed now has no good way to detect which encoding was actually used. If it's for example an RSS feed you're generating, this can result in RSS readers throwing errors, or characters showing up incorrectly.
4. Be liberal with whitespace
An error like "unexpected character at line 1, column 176456" is much harder to debug than "line 5078, column 24". Whitespace between xml tags does usually not have any significance, so you can add as much indentation and linebreaks (\n) as you want. Note that tools such as XMLWriter will indent for you automatically.
5. Be verbose
Even though you might easily figure out that <ORD_NR> means 'order number', there's no reason to actually state it as <order-number>. Note that the following rules appear to fall in favor for most people:
- Use lowercase for tags and attribute names.
- Use dashes (-) to separate words, not underscores (_).
- Minimize the use of attributes, nested tags allow more flexibility.
6. Be careful with entities
The only valid entities in XML are < (<), > (>) & (&) and " ("), so any other entity will simply not work and throw errors.
HTML DTD's add many entities, so if you're mostly used to using HTML you might expect other entities to work. If your source-data already has entities, you might have to get rid of these first.
In PHP it means you should use htmlspecialchars, instead of htmlentities.
Feel free to discuss, disagree, or add on to this list in the comments, I'm happy to hear your experiences.

PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:9:"Evert Pot";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:3;a:6:{s:4:"data";s:38:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:5:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:37:"New home for Dallas MySQL Users Group";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:70:"tag:blogger.com,1999:blog-8575059197193667898.post-4070331892183595324";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:82:"http://dave-stokes.blogspot.com/2010/07/new-home-for-dallas-mysql-users-group.html";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:470:"The Dallas Oracle Users Group has voted unanimously to accept the North Texas MySQL Users Group as a special interest group within their organization. This will give us wide exposure to the local DBA world, provide more venues for presentation (Irving and Plano), and the ability to have food & drink at meetings. New presentations are being worked on for presentations including 'No SQL -- is traditional row based SQL in trouble'? Watch here for meeting information.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Fri, 09 Jul 2010 15:06:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:984:"The Dallas Oracle Users Group has voted unanimously to accept the North Texas MySQL Users Group as a special interest group within their organization. This will give us wide exposure to the local DBA world, provide more venues for presentation (Irving and Plano), and the ability to have food & drink at meetings.
New presentations are being worked on for presentations including 'No SQL -- is traditional row based SQL in trouble'? Watch here for meeting information.
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:11:"Dave Stokes";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:4;a:6:{s:4:"data";s:63:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:39:"DBJ: Introduction to Multi-Master MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:34:"http://oracleopensource.com/?p=151";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:78:"http://oracleopensource.com/2010/07/09/dbj-introduction-to-multi-master-mysql/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:185:"This month on Database Journal we talk about multi-master MySQL using circular replication to achieve high availability.
Read more at DatabaseJournal – Intro to Multi-Master MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Fri, 09 Jul 2010 15:00:10 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:5:{i:0;a:5:{s:4:"data";s:15:"databasejournal";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:17:"high availability";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:3:"MMM";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:18:"multi-master mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:504:"This month on Database Journal we talk about multi-master MySQL using circular replication to achieve high availability.
Read more at DatabaseJournal – Intro to Multi-Master MySQL
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:9:"Sean Hull";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:5;a:6:{s:4:"data";s:68:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:90:"Free webinar – Scaling web apps with MySQL (an alternative to the MEMORY storage engine)";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:32:"http://www.clusterdb.com/?p=1155";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:116:"http://www.clusterdb.com/mysql/free-webinar-scaling-web-apps-with-mysql-an-alternative-to-the-memory-storage-engine/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:1521:"Mat Keep and I will be presenting this free webinar on Wednesday 14 July.
The MEMORY storage engine has been widely adopted by MySQL users to provide near-instant responsiveness with use cases such as caching and web session management. As these services evolve to support more users, so the scalability and availability demands can start to exceed the capabilities of the MEMORY storage engine.
The MySQL Cluster database, which itself can be implemented as a MySQL storage engine, is a viable alternative to address these evolving web service demands. MySQL Cluster can be configured and run in the same way as the MEMORY storage engine (ie on a single host with no replication and no persistence). As web services evolve, any of these attributes can then be added in any combination to deliver higher levels of scalability, availability and database functionality, especially for those workloads which predominately access data by the primary key.
As always, the webinar is free of charge but you will need to register here.
Time:
Wed, Jul 14: 06:00 Hawaii time
Wed, Jul 14: 09:00 Pacific time (America)
Wed, Jul 14: 10:00 Mountain time (America)
Wed, Jul 14: 11:00 Central time (America)
Wed, Jul 14: 12:00 Eastern time (America)
Wed, Jul 14: 16:00 UTC
Wed, Jul 14: 17:00 Western European time
Wed, Jul 14: 18:00 Central European time
Wed, Jul 14: 19:00 Eastern European time
If you can’t make the live webinar then register anyway and you’ll get sent a link to the recording after the event.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Fri, 09 Jul 2010 14:23:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:6:{i:0;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:13:"MySQL Cluster";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:17:"MySQL Cluster CGE";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:3:"web";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:7:"Webcast";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:7:"webinar";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:2369:"Mat Keep and I will be presenting this free webinar on Wednesday 14 July. 
The MEMORY storage engine has been widely adopted by MySQL users to provide near-instant responsiveness with use cases such as caching and web session management. As these services evolve to support more users, so the scalability and availability demands can start to exceed the capabilities of the MEMORY storage engine.
The MySQL Cluster database, which itself can be implemented as a MySQL storage engine, is a viable alternative to address these evolving web service demands. MySQL Cluster can be configured and run in the same way as the MEMORY storage engine (ie on a single host with no replication and no persistence). As web services evolve, any of these attributes can then be added in any combination to deliver higher levels of scalability, availability and database functionality, especially for those workloads which predominately access data by the primary key.
As always, the webinar is free of charge but you will need to register here.
Time:
- Wed, Jul 14: 06:00 Hawaii time
- Wed, Jul 14: 09:00 Pacific time (America)
- Wed, Jul 14: 10:00 Mountain time (America)
- Wed, Jul 14: 11:00 Central time (America)
- Wed, Jul 14: 12:00 Eastern time (America)
- Wed, Jul 14: 16:00 UTC
- Wed, Jul 14: 17:00 Western European time
- Wed, Jul 14: 18:00 Central European time
- Wed, Jul 14: 19:00 Eastern European time
If you can’t make the live webinar then register anyway and you’ll get sent a link to the recording after the event.
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:13:"Andrew Morgan";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:6;a:6:{s:4:"data";s:53:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:46:"Linux cluster stack Debian packages for lenny!";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:34:"http://fghaas.wordpress.com/?p=546";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:85:"http://fghaas.wordpress.com/2010/07/09/linux-cluster-stack-debian-packages-for-lenny/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:406:"Thanks to Martin Loschwitz, official Linux cluster stack Debian packages are now available for Debian lenny. Check out the the lenny-backports repository on backports.org (and your favorite local mirror).
When upgrading from Heartbeat 2.1.3, this webinar recording may come in handy. The webinar covers an upgrade to squeeze, but the upgrade procedure is identical if you’re staying on lenny.
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Fri, 09 Jul 2010 06:00:48 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:3:{i:0;a:5:{s:4:"data";s:6:"Debian";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:9:"Heartbeat";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:9:"Pacemaker";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:2088:"Thanks to Martin Loschwitz, official Linux cluster stack Debian packages are now available for Debian lenny. Check out the the lenny-backports repository on backports.org (and your favorite local mirror).
When upgrading from Heartbeat 2.1.3, this webinar recording may come in handy. The webinar covers an upgrade to squeeze, but the upgrade procedure is identical if you’re staying on lenny.

PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:15:"Florian G. Haas";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:7;a:6:{s:4:"data";s:68:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:88:"Using EXPLAIN EXTENDED / SHOW WARNINGS to Help Troubleshoot Inefficient Queries in MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:35:"http://www.chriscalender.com/?p=118";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:35:"http://www.chriscalender.com/?p=118";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:5228:"When examining the execution plan of troublesome queries in MySQL, most users are aware of using EXPLAIN. However, an often overlooked, yet very helpful extension of EXPLAIN, is EXPLAIN EXTENDED coupled with the SHOW WARNINGS command.
The reason being is because it provides a little more information about how the optimizer processes the query, and thus it could help to quickly identify a problem that you might not otherwise recognize with just EXPLAIN.
For instance, here is a common query which could be inefficient:
SELECT id FROM t WHERE id='1';
And here is the CREATE TABLE output:
mysql> show create table t\G
*************************** 1. row ***************************
Table: t
Create Table: CREATE TABLE `t` (
`id` decimal(10,0) default NULL,
KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
You can see it’s a very basic table, and a very basic query.
And looking at the EXPLAIN output, everything still appears normal:
mysql> explain select id from t where id='1';
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
| 1 | SIMPLE | t | ref | id | id | 6 | const | 1 | Using where; Using index |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
1 row in set (0.00 sec)
However, now let’s look at the EXPLAIN EXTENDED and SHOW WARNINGS output:
mysql> explain extended select id from t where id='1';
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
| 1 | SIMPLE | t | ref | id | id | 6 | const | 1 | Using where; Using index |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+----------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+----------------------------------------------------------------------------------------------+
| Note | 1003 | select `testing`.`t`.`id` AS `id` from `testing`.`t` where (`testing`.`t`.`id` = _latin1'1') |
+-------+------+----------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
In the WHERE clause of the SHOW WARNINGS output, you see this:
where (`testing`.`t`.`id` = _latin1'1')
The _latin1′1′ is key, as it shows the value of 1 is getting cast as a latin1 value. This is because we surrounded the 1 with single quotes in the query.
Note that this does not happen if you do not surround the 1 with single quotes:
mysql> show warnings;
+-------+------+-------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------------------------------+
| Note | 1003 | select `testing`.`t`.`id` AS `id` from `testing`.`t` where (`testing`.`t`.`id` = 1) |
+-------+------+-------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
In this example, the constant value of 1 is being cast, which is not horrible, but still not as efficient as it should be. However, depending on how the query is written, I’ve seen where it might choose to cast/convert the row retrived from the database (especially in a JOIN scenario - it will need to convert one column if they differ). These conversions or casts can be very inefficient (especially if within a JOIN or worse, a sub-query), and would not be otherwise caught without using EXPLAIN EXTENDED / SHOW WARNINGS.
And in addition to the above example, it is common to see similar types of conversions occurring if you try to match two columns whose character sets differ. This can be another easy-to-overlook problem, which can lead to poor performance, and you wouldn’t know it wasn’t running efficiently unless you used EXPLAIN EXTENDED (or unless it brings your system to a halt during a heavy load ..).
So in conclusion, consider using EXPLAIN EXTENDED / SHOW WARNINGS in lieu of just EXPLAIN when analyzing your queries.
Note: If you are using pre-5.1.46, you may want to consider running both EXPLAIN and EXPLAIN EXTENDED to ensure the optimization plans match. This is due to the following bug which existed:
http://bugs.mysql.com/bug.php?id=47669
Click here to give this article a “Thumbs Up” on Planet MySQL !
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Fri, 09 Jul 2010 01:37:27 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:6:{i:0;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:16:"charset mismatch";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:7:"EXPLAIN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:16:"EXPLAIN EXTENDED";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:18:"query optimization";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:13:"SHOW WARNINGS";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:6285:"When examining the execution plan of troublesome queries in MySQL, most users are aware of using EXPLAIN. However, an often overlooked, yet very helpful extension of EXPLAIN, is EXPLAIN EXTENDED coupled with the SHOW WARNINGS command.
The reason being is because it provides a little more information about how the optimizer processes the query, and thus it could help to quickly identify a problem that you might not otherwise recognize with just EXPLAIN.
For instance, here is a common query which could be inefficient:
SELECT id FROM t WHERE id='1';
And here is the CREATE TABLE output:
mysql> show create table t\G
*************************** 1. row ***************************
Table: t
Create Table: CREATE TABLE `t` (
`id` decimal(10,0) default NULL,
KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
You can see it’s a very basic table, and a very basic query.
And looking at the EXPLAIN output, everything still appears normal:
mysql> explain select id from t where id='1';
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
| 1 | SIMPLE | t | ref | id | id | 6 | const | 1 | Using where; Using index |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
1 row in set (0.00 sec)
However, now let’s look at the EXPLAIN EXTENDED and SHOW WARNINGS output:
mysql> explain extended select id from t where id='1';
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
| 1 | SIMPLE | t | ref | id | id | 6 | const | 1 | Using where; Using index |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+----------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+----------------------------------------------------------------------------------------------+
| Note | 1003 | select `testing`.`t`.`id` AS `id` from `testing`.`t` where (`testing`.`t`.`id` = _latin1'1') |
+-------+------+----------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
In the WHERE clause of the SHOW WARNINGS output, you see this:
where (`testing`.`t`.`id` = _latin1'1')
The _latin1′1′ is key, as it shows the value of 1 is getting cast as a latin1 value. This is because we surrounded the 1 with single quotes in the query.
Note that this does not happen if you do not surround the 1 with single quotes:
mysql> show warnings;
+-------+------+-------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------------------------------+
| Note | 1003 | select `testing`.`t`.`id` AS `id` from `testing`.`t` where (`testing`.`t`.`id` = 1) |
+-------+------+-------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
In this example, the constant value of 1 is being cast, which is not horrible, but still not as efficient as it should be. However, depending on how the query is written, I’ve seen where it might choose to cast/convert the row retrived from the database (especially in a JOIN scenario - it will need to convert one column if they differ). These conversions or casts can be very inefficient (especially if within a JOIN or worse, a sub-query), and would not be otherwise caught without using EXPLAIN EXTENDED / SHOW WARNINGS.
And in addition to the above example, it is common to see similar types of conversions occurring if you try to match two columns whose character sets differ. This can be another easy-to-overlook problem, which can lead to poor performance, and you wouldn’t know it wasn’t running efficiently unless you used EXPLAIN EXTENDED (or unless it brings your system to a halt during a heavy load ..).
So in conclusion, consider using EXPLAIN EXTENDED / SHOW WARNINGS in lieu of just EXPLAIN when analyzing your queries.
Note: If you are using pre-5.1.46, you may want to consider running both EXPLAIN and EXPLAIN EXTENDED to ensure the optimization plans match. This is due to the following bug which existed:
http://bugs.mysql.com/bug.php?id=47669
Click here to give this article a “Thumbs Up” on Planet MySQL
!
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:14:"Chris Calender";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:8;a:6:{s:4:"data";s:63:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:60:"MySQL/MariaDB replication: applying events on the slave side";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:49:"http://kristiannielsen.livejournal.com/13382.html";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:49:"http://kristiannielsen.livejournal.com/13382.html";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:12741:"
Working on a new set of replication APIs in MariaDB, I have
given some thought
to the generation of replication events on the master server.
But there is another side of the equation: to apply the generated events on a
slave server. This is something that most replication setups will need (unless
they replicate to non-MySQL/MariaDB slaves). So it will be good to provide a
generic interface for this, otherwise every binlog-like plugin implementation
will have to re-invent this themselves.
A central idea in the current design for generating events is that we do not
enforce a specific content of events. Instead, the API provides accessors for
a lot of different information related to each event, allowing the plugin
flexibility in choosing what to include in a particular event format. For
example, one plugin may request column names for a row-based UPDATE event;
another plugin may not need them and can avoid any overhead related to column
names simply by not requesting them.
To get the same flexibility on the slave side, the roles of plugin and API are
reversed. Here, the plugin will have a certain pre-determined (by the
particular event format implemented) set of information related to the event
available. And the API must make do with whatever information it is provided
(or fail gracefully if essential information is missing).
My idea is that the event application API will provide corresponding events to
the events in the generation API. Each application event will have "provider"
methods corresponding to the accessor methods of the generator API. So the
plugin that wants to apply an event can obtain an event generator object, call
the appropriate provider methods for all the information available, and
finally ask the API to execute the event with the provided context
information.
This is only an abstract idea at this point; there are lots of details to take
care of to make this idea into a concrete design proposal. And I have not
fully decided if such an API will be part of the replication project or
not. But I like the idea so far.
Understanding how MySQL binlog events are applied on the slave
I wanted to get a better understanding of what is involved in an event
application API like the one described above. So I did a similar exercise to
the one I wrote about in my
last post,
where I went through in detail all the information that the existing MySQL
binlog format includes. This time I went through the details in the code that
applies MySQL binlog events on a slave.
Again, I concentrate on the actual events that change the database, ignoring
(most) details that relate only to the particular binlog format used by MySQL
(and there are quite a few :-).
At the top level, the slave SQL thread
(in exec_relay_log_event()) reads events
(next_event()) from the relay logs and executes
(apply_event_and_update_pos()) them.
There are a number of intricate details here relating to switching to a new
relay log (and purging old ones), and re-winding in the relay log to re-try a
failed transaction (eg. in case of deadlock or the like). This is mostly
specific to the particular binlog implementation.
The actual data changes are mostly done in the do_apply_event()
methods of each event class in sql/log_event.cc. I will go
briefly through this method for the events that are used to change actual data
in current MySQL replication. It is relatively easy to read a particular
detail out of the code, since it is all located in
these do_apply_event() methods.
(One problem however is that there are special cases sprinkled across the code
where special action is taken (or not taken) when running in the slave SQL
thread. I have not so far tried to determine the full list of such special
cases, or access how many there are).
Query_log_event
The main task done here is to set up various context in the THD,
execute the query, and then perform necessary cleanup. If the query throws an
error, there is also some fairly complex logic to handle this error correctly;
for example to ignore certain errors, to require same error as on master (if
the query failed on the master in some particular way), and to re-try the
query/transaction for certain errors (like deadlock).
There are also some hard-coded special cases for NDB (this seems to be a
common theme in the replication code).
The main think that to my eyes make this part of the code complex is the set
of actions taken to prepare context before executing the query, and to clean
up after execution. Each individual step in the code is in fact relatively
easy to follow (and often the commenting is quite good). The problem is that
there are so many individual steps. It is very hard to feel sure that exactly
this set of actions is sufficient (and that none are redundant for that
matter).
For example, code like this (not complete, just a random part of the setup):
thd->set_time((time_t)when);
thd->set_query((char*)query_arg, q_len_arg);
VOID(pthread_mutex_lock(&LOCK_thread_count));
thd->query_id = next_query_id();
VOID(pthread_mutex_unlock(&LOCK_thread_count));
thd->variables.pseudo_thread_id= thread_id;
It seems very easy to forget to assign query_id or whatever if
one was to write this from scratch. That is something I would really like to
improve in an API for replication plugins: it should be possible to understand
completely exactly what setup and cleanup is needed around event execution,
and there should be appropriate methods to achieve such setup/cleanup.
Another thing that is interesting in the code
for Query_log_event::do_apply_event() is that the code
does strcmp() on the query against keywords like COMMIT,
SAVEPOINT, and ROLLBACK, and follows a fairly different execution path for
these. This seems to bypass the SQL parser! But in reality, these particular
queries are generated on the master with special code in the server that
carefully restricts the possible format (eg. no whitespace or comments
etc). So in effect, this is just a way to hack in special event types for
these special queries without actually adding new binlog format events.
Rand_log_event, Intvar_log_event, and User_var_log_event
The action taken for these events is essentially to update
the THD with the information in the event: value of random
seed, LAST_INSERT_ID/INSERT_ID,
or @user_variable.
Xid_log_event
On the slave side, this event is essentially a COMMIT. One thing that springs
to mind is how different the code to handle this event is from the special
code in Query_log_event that handles a query "COMMIT". Again, it
is very hard to tell from the code if this is a bug, or if the
different-looking code is in fact equivalent.
Begin_load_query_log_event, Append_block_log_event, Delete_file_log_event, and Execute_load_query_log_event
This implements the LOAD DATA INFILE query, which needs special
handling as it originally references a file on the master server (or on the
client machine). The actual execution of the query is handled the same way as
for normal queries (Execute_load_query_log_event is a sub-class
of Query_log_event), but some preparation is needed first to
write the data from the events in the relay log into a temporary file on the
slave.
The main thing one notices about this code is how it handles re-writing the
LOAD DATA query to use a new temporary file name on the
slave. Take for example this query:
LOAD DATA CONCURRENT /**/ LOCAL INFILE 'foobar.dat' REPLACE INTO /**/ TABLE ...
The event includes offsets into the query string of the two places
marked /**/ in the example. This part of the query is then
re-written in the slave code (so it is not just replacing the
filename). Again, the code by-passes the SQL parser, it just so happens that
the SQL syntax in this case is sufficiently simple that this is not too
hackish to do. If one were to check, one would probably see that any user comments
in this particular part of the query string disappear in the slave
binlog if --log-slave-updates is enabled.
Table_map_log_event
This just links a struct RPL_TABLE_LIST into a list, containing
information about the tables described by the event. The actual opening of the
table is done when executing the first row event (WRITE/UPDATE/DELETE).
Write_rows_log_event, Update_rows_log_event, and Delete_rows_log_event
These are what handle the application of row-based replication events on a slave.
The first of the row events causes some setup to be done, a partial extract is this:
lex_start(thd);
mysql_reset_thd_for_next_command(thd, 0);
thd->transaction.stmt.modified_non_trans_table= FALSE;
query_cache.invalidate_locked_for_write(rli->tables_to_lock);
(As remarked for Query_log_event, while row event application
setup is somewhat simpler, it still appears a bit magic that exactly these
setups are sufficient and necessary).
The code also switches to row-based binlogging for the following row
operations (this is for --log-slave-updates, as it is
not possible to binlog the application of row-based events as statement-based
events in the slave binlog). This is by the way an interesting challenge for a
generic replication API: how does one handle binlogging of events applied on a
slave, for daisy-chaining replication servers? This of course gets more
interesting if one were to use a different binlogging plugin on the slave than
on the master. I need to think more about it, but this seems to be a pretty
strong argument that a generic event application API is needed, which hooks
into event generators to properly generate all needed events for the updates
done on slaves. Another important aspect is the support of a global
transaction ID, that will identify a transaction uniquely across an entire
replication setup to make migrating slaves to a new master easier. Such a
global transaction ID also needs to be preserved in a slave binlog when
replicating events from a master.
During execution of the first row event, the code also sets the flags for
foreign key checks and unique checks that is included in the event from the master.
And it checks if the event should be skipped
(for --replicate-ignore-db and friends).
A check is made to ensure that the table(s) on the slave are compatible with
the tables on the master (as described in the table map event(s) received just
before the row event(s)). The MySQL row-based replication has a fair bit of
flexibility in terms of allowing differences in tables between master and
slave, such as allowing different column types, different storage engine, or
even extra columns on the slave table. In particular allowing extra columns
raises some issues about default values etc. for these columns, though I did
not really go into details about this.
Again, there are a number of hard-coded differences for NDB.
For Write_rows_log_event, flags need to be set to ensure that
column values for AUTO_INCREMENT and TIMESTAMP
columns are taken from the supplied values, not
auto-generated. If slave_exec_mode=IDEMPOTENT, an INSERT that
fails due to already existing row does not cause replication to fail; instead
an UPDATE is tried, or in some cases (like if there are foreign keys) a DELETE
+ re-tried INSERT. There is also code to hint storage engines about the
approximate number of rows that are part of a bulk insert.
For Update_rows_log_event and Delete_rows_log_event,
the code needs to locate the row to update/delete. This is done by primary key
if one exists on the slave table (the current binlogging always includes every
column in the before image of the row, so the primary key value of the row to
modify is always available). But there is also support for tables with no
primary key, in which the first index on the table is used to locate the row
(if any), or failing that a full table scan. This btw. is a good reminder
to not use row-based replication with primary-key-less tables with
non-trivial amount of rows: every row operation applied will need a
full table scan!
Finally, the values from the event are unpacked into a row buffer in the
format used by MySQL storage engines, and the read_set
and write_set are set up (current replication always includes all
columns in row operations), before the
actual ha_write_row(), ha_update_row(),
or ha_delete_row() call into the storage engine handler is made
to perform the actual update. Note that a single row event can include
multiple rows, which are applied one after the other.
Final words
And that is it! Quite a bit of detail, but again I found it very useful to
create this complete overview; it will make things easier when re-implementing
this in a new replication API.
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Thu, 08 Jul 2010 22:33:07 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:5:{i:0;a:5:{s:4:"data";s:12:"freesoftware";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:7:"mariadb";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:11:"programming";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:11:"replication";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:13987:"
Working on a new set of replication APIs in MariaDB, I have
given some thought
to the generation of replication events on the master server.
But there is another side of the equation: to apply the generated events on a
slave server. This is something that most replication setups will need (unless
they replicate to non-MySQL/MariaDB slaves). So it will be good to provide a
generic interface for this, otherwise every binlog-like plugin implementation
will have to re-invent this themselves.
A central idea in the current design for generating events is that we do not
enforce a specific content of events. Instead, the API provides accessors for
a lot of different information related to each event, allowing the plugin
flexibility in choosing what to include in a particular event format. For
example, one plugin may request column names for a row-based UPDATE event;
another plugin may not need them and can avoid any overhead related to column
names simply by not requesting them.
To get the same flexibility on the slave side, the roles of plugin and API are
reversed. Here, the plugin will have a certain pre-determined (by the
particular event format implemented) set of information related to the event
available. And the API must make do with whatever information it is provided
(or fail gracefully if essential information is missing).
My idea is that the event application API will provide corresponding events to
the events in the generation API. Each application event will have "provider"
methods corresponding to the accessor methods of the generator API. So the
plugin that wants to apply an event can obtain an event generator object, call
the appropriate provider methods for all the information available, and
finally ask the API to execute the event with the provided context
information.
This is only an abstract idea at this point; there are lots of details to take
care of to make this idea into a concrete design proposal. And I have not
fully decided if such an API will be part of the replication project or
not. But I like the idea so far.
Understanding how MySQL binlog events are applied on the slave
I wanted to get a better understanding of what is involved in an event
application API like the one described above. So I did a similar exercise to
the one I wrote about in my
last post,
where I went through in detail all the information that the existing MySQL
binlog format includes. This time I went through the details in the code that
applies MySQL binlog events on a slave.
Again, I concentrate on the actual events that change the database, ignoring
(most) details that relate only to the particular binlog format used by MySQL
(and there are quite a few :-).
At the top level, the slave SQL thread
(in exec_relay_log_event()) reads events
(next_event()) from the relay logs and executes
(apply_event_and_update_pos()) them.
There are a number of intricate details here relating to switching to a new
relay log (and purging old ones), and re-winding in the relay log to re-try a
failed transaction (eg. in case of deadlock or the like). This is mostly
specific to the particular binlog implementation.
The actual data changes are mostly done in the do_apply_event()
methods of each event class in sql/log_event.cc. I will go
briefly through this method for the events that are used to change actual data
in current MySQL replication. It is relatively easy to read a particular
detail out of the code, since it is all located in
these do_apply_event() methods.
(One problem however is that there are special cases sprinkled across the code
where special action is taken (or not taken) when running in the slave SQL
thread. I have not so far tried to determine the full list of such special
cases, or access how many there are).
Query_log_event
The main task done here is to set up various context in the THD,
execute the query, and then perform necessary cleanup. If the query throws an
error, there is also some fairly complex logic to handle this error correctly;
for example to ignore certain errors, to require same error as on master (if
the query failed on the master in some particular way), and to re-try the
query/transaction for certain errors (like deadlock).
There are also some hard-coded special cases for NDB (this seems to be a
common theme in the replication code).
The main think that to my eyes make this part of the code complex is the set
of actions taken to prepare context before executing the query, and to clean
up after execution. Each individual step in the code is in fact relatively
easy to follow (and often the commenting is quite good). The problem is that
there are so many individual steps. It is very hard to feel sure that exactly
this set of actions is sufficient (and that none are redundant for that
matter).
For example, code like this (not complete, just a random part of the setup):
thd->set_time((time_t)when);
thd->set_query((char*)query_arg, q_len_arg);
VOID(pthread_mutex_lock(&LOCK_thread_count));
thd->query_id = next_query_id();
VOID(pthread_mutex_unlock(&LOCK_thread_count));
thd->variables.pseudo_thread_id= thread_id;
It seems very easy to forget to assign query_id or whatever if
one was to write this from scratch. That is something I would really like to
improve in an API for replication plugins: it should be possible to understand
completely exactly what setup and cleanup is needed around event execution,
and there should be appropriate methods to achieve such setup/cleanup.
Another thing that is interesting in the code
for Query_log_event::do_apply_event() is that the code
does strcmp() on the query against keywords like COMMIT,
SAVEPOINT, and ROLLBACK, and follows a fairly different execution path for
these. This seems to bypass the SQL parser! But in reality, these particular
queries are generated on the master with special code in the server that
carefully restricts the possible format (eg. no whitespace or comments
etc). So in effect, this is just a way to hack in special event types for
these special queries without actually adding new binlog format events.
Rand_log_event, Intvar_log_event, and User_var_log_event
The action taken for these events is essentially to update
the THD with the information in the event: value of random
seed, LAST_INSERT_ID/INSERT_ID,
or @user_variable.
Xid_log_event
On the slave side, this event is essentially a COMMIT. One thing that springs
to mind is how different the code to handle this event is from the special
code in Query_log_event that handles a query "COMMIT". Again, it
is very hard to tell from the code if this is a bug, or if the
different-looking code is in fact equivalent.
Begin_load_query_log_event, Append_block_log_event, Delete_file_log_event, and Execute_load_query_log_event
This implements the LOAD DATA INFILE query, which needs special
handling as it originally references a file on the master server (or on the
client machine). The actual execution of the query is handled the same way as
for normal queries (Execute_load_query_log_event is a sub-class
of Query_log_event), but some preparation is needed first to
write the data from the events in the relay log into a temporary file on the
slave.
The main thing one notices about this code is how it handles re-writing the
LOAD DATA query to use a new temporary file name on the
slave. Take for example this query:
LOAD DATA CONCURRENT /**/ LOCAL INFILE 'foobar.dat' REPLACE INTO /**/ TABLE ...
The event includes offsets into the query string of the two places
marked /**/ in the example. This part of the query is then
re-written in the slave code (so it is not just replacing the
filename). Again, the code by-passes the SQL parser, it just so happens that
the SQL syntax in this case is sufficiently simple that this is not too
hackish to do. If one were to check, one would probably see that any user comments
in this particular part of the query string disappear in the slave
binlog if --log-slave-updates is enabled.
Table_map_log_event
This just links a struct RPL_TABLE_LIST into a list, containing
information about the tables described by the event. The actual opening of the
table is done when executing the first row event (WRITE/UPDATE/DELETE).
Write_rows_log_event, Update_rows_log_event, and Delete_rows_log_event
These are what handle the application of row-based replication events on a slave.
The first of the row events causes some setup to be done, a partial extract is this:
lex_start(thd);
mysql_reset_thd_for_next_command(thd, 0);
thd->transaction.stmt.modified_non_trans_table= FALSE;
query_cache.invalidate_locked_for_write(rli->tables_to_lock);
(As remarked for Query_log_event, while row event application
setup is somewhat simpler, it still appears a bit magic that exactly these
setups are sufficient and necessary).
The code also switches to row-based binlogging for the following row
operations (this is for --log-slave-updates, as it is
not possible to binlog the application of row-based events as statement-based
events in the slave binlog). This is by the way an interesting challenge for a
generic replication API: how does one handle binlogging of events applied on a
slave, for daisy-chaining replication servers? This of course gets more
interesting if one were to use a different binlogging plugin on the slave than
on the master. I need to think more about it, but this seems to be a pretty
strong argument that a generic event application API is needed, which hooks
into event generators to properly generate all needed events for the updates
done on slaves. Another important aspect is the support of a global
transaction ID, that will identify a transaction uniquely across an entire
replication setup to make migrating slaves to a new master easier. Such a
global transaction ID also needs to be preserved in a slave binlog when
replicating events from a master.
During execution of the first row event, the code also sets the flags for
foreign key checks and unique checks that is included in the event from the master.
And it checks if the event should be skipped
(for --replicate-ignore-db and friends).
A check is made to ensure that the table(s) on the slave are compatible with
the tables on the master (as described in the table map event(s) received just
before the row event(s)). The MySQL row-based replication has a fair bit of
flexibility in terms of allowing differences in tables between master and
slave, such as allowing different column types, different storage engine, or
even extra columns on the slave table. In particular allowing extra columns
raises some issues about default values etc. for these columns, though I did
not really go into details about this.
Again, there are a number of hard-coded differences for NDB.
For Write_rows_log_event, flags need to be set to ensure that
column values for AUTO_INCREMENT and TIMESTAMP
columns are taken from the supplied values, not
auto-generated. If slave_exec_mode=IDEMPOTENT, an INSERT that
fails due to already existing row does not cause replication to fail; instead
an UPDATE is tried, or in some cases (like if there are foreign keys) a DELETE
+ re-tried INSERT. There is also code to hint storage engines about the
approximate number of rows that are part of a bulk insert.
For Update_rows_log_event and Delete_rows_log_event,
the code needs to locate the row to update/delete. This is done by primary key
if one exists on the slave table (the current binlogging always includes every
column in the before image of the row, so the primary key value of the row to
modify is always available). But there is also support for tables with no
primary key, in which the first index on the table is used to locate the row
(if any), or failing that a full table scan. This btw. is a good reminder
to not use row-based replication with primary-key-less tables with
non-trivial amount of rows: every row operation applied will need a
full table scan!
Finally, the values from the event are unpacked into a row buffer in the
format used by MySQL storage engines, and the read_set
and write_set are set up (current replication always includes all
columns in row operations), before the
actual ha_write_row(), ha_update_row(),
or ha_delete_row() call into the storage engine handler is made
to perform the actual update. Note that a single row event can include
multiple rows, which are applied one after the other.
Final words
And that is it! Quite a bit of detail, but again I found it very useful to
create this complete overview; it will make things easier when re-implementing
this in a new replication API.
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:16:"Kristian Nielsen";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:9;a:6:{s:4:"data";s:38:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:5:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:34:"Upgrading Cassandra 0.5.1 to 0.6.3";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:59:"tag:blogger.com,1999:blog-31421954.post-8880555644188705213";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:72:"http://mysqldba.blogspot.com/2010/07/upgrading-cassandra-051-to-063.html";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:2832:"Every month or so a node randomly diesEQX root@cass01:/opt/cassandra/bin# ./nodeprobe -host localhost -port 8181 ringAddress Status Load Range Ring facebook_1301003235_1301003235 10.129.28.22 Down 15.77 GB 9ZehBzpHHwnxiPJU ||Trying to get info from the host, the reads timeout.java.net.SocketTimeoutException: Read timed outDoing an lsof -p on the java proc I see that it is holding open a bunch of sockets. So the node itself is hanging on something internal is my assumption.Looking at /var/log/cassandra/system.log I see that the last rotation happened Jun 8th over a month ago and no new log is being written to. THe issue is the node just died today. So this seems like a bug to me.Now since Cassandra does not tell me what the problem is, I assume that there is a bug in this version and searching Cassandra Jira bug database I see that a lot of stuff is fixed as well as added. So might as well as upgrade.Before I upgrade I wanted to do research to see if anyone else has. To my surprise there doesn't seem to be any blog talking about upgrading from 0.5 to 0.6.3I know its rather easy but there is some new stuff in 0.6.3 that is turned on by default: So let's see what changes in the confdiff /opt/cassandra/conf /opt/apache-cassandra-0.6.3/confI see that in storage.xml there is some new XML attributes for the ColumnFamily tag such as RowsCached, new tags called HintedHandoffEnabled, Authenticator, DiskAccessMode, RowWarningThresholdInMB.Additional to this I noticed that a lot of XML tags are missing. A rolling upgrade is just not possible and is mentioned in NEWS.txtThus in my application I set this $GLOBALS['cfg']['disable_nosql_feature'] = 1; I have about 40 toggles to play with, a very helpful process to enable dynamically code with out breaking your site.now time for an upgrade without the service running:Steps: Shut down Cassandra: dsh -g cassandra "pkill java" # same thing as stop-server rpm -e cassandra-0.5.1 rpm -ivh cassandra-0.6.3.rpm /opt/cassandra/bin/cassandra Done. Note what the hell is cassandra-0.6.3.rpm, it's an rpm I created that has my storage-conf.xmllog4j.propertiescassandra.in.shAfter Upgrading:***************************************************************WARNING: ./nodeprobe is obsolete, use ./nodetool instead***************************************************************Address Status Load Range Ring facebook_1301003235_1301003235 10.129.28.22 Up 11.75 GB 9ZehBzpHHwnxiPJU ||Now what is left to do it change my ganglia scripts / nagios scripts to use nodetool instead of nodeprobe.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Thu, 08 Jul 2010 19:58:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:4674:"Every month or so a node randomly dies
EQX root@cass01:/opt/cassandra/bin# ./nodeprobe -host localhost -port 8181 ring
Address Status Load Range Ring
facebook_1301003235_1301003235
10.129.28.22 Down 15.77 GB 9ZehBzpHHwnxiPJU |<--|
10.129.28.23 Up 7.59 GB facebook_100000471858343_1514390063 | |
10.129.28.14 Up 4.59 GB facebook_100000846936312 | |
10.129.28.20 Up 12.94 GB facebook_1301003235_1301003235 |-->|
Trying to get info from the host, the reads timeout.
java.net.SocketTimeoutException: Read timed out
Doing an lsof -p on the java proc I see that it is holding open a bunch of sockets. So the node itself is hanging on something internal is my assumption.
Looking at /var/log/cassandra/system.log I see that the last rotation happened Jun 8th over a month ago and no new log is being written to. THe issue is the node just died today. So this seems like a bug to me.
Now since Cassandra does not tell me what the problem is, I assume that there is a bug in this version and searching Cassandra Jira bug database I see that a lot of stuff is fixed as well as added. So might as well as upgrade.
Before I upgrade I wanted to do research to see if anyone else has. To my surprise there doesn't seem to be any blog talking about upgrading from 0.5 to 0.6.3
I know its rather easy but there is some new stuff in 0.6.3 that is turned on by default: So let's see what changes in the conf
diff /opt/cassandra/conf /opt/apache-cassandra-0.6.3/conf
I see that in storage.xml there is some new XML attributes for the ColumnFamily tag such as RowsCached, new tags called HintedHandoffEnabled, Authenticator, DiskAccessMode, RowWarningThresholdInMB.
Additional to this I noticed that a lot of XML tags are missing. A rolling upgrade is just not possible and is mentioned in NEWS.txt
Thus in my application I set this $GLOBALS['cfg']['disable_nosql_feature'] = 1; I have about 40 toggles to play with, a very helpful process to enable dynamically code with out breaking your site.
now time for an upgrade without the service running:
Steps:
- Shut down Cassandra: dsh -g cassandra "pkill java" # same thing as stop-server
- rpm -e cassandra-0.5.1
- rpm -ivh cassandra-0.6.3.rpm
- /opt/cassandra/bin/cassandra
Done. Note what the hell is cassandra-0.6.3.rpm, it's an rpm I created that has my storage-conf.xml
log4j.properties
cassandra.in.sh
After Upgrading:
***************************************************************
WARNING: ./nodeprobe is obsolete, use ./nodetool instead
***************************************************************
Address Status Load Range Ring
facebook_1301003235_1301003235
10.129.28.22 Up 11.75 GB 9ZehBzpHHwnxiPJU |<--|
10.129.28.23 Up 3.04 GB facebook_100000471858343_1514390063 | |
10.129.28.14 Up 2.33 GB facebook_100000846936312 | |
10.129.28.20 Up 4.4 GB facebook_1301003235_1301003235 |-->|
Now what is left to do it change my ganglia scripts / nagios scripts to use nodetool instead of nodeprobe.
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:17:"Dathan Pattishall";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:10;a:6:{s:4:"data";s:68:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:57:"Optimizing SQL Performance – The Art of Elimination";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:38:"http://ronaldbradford.com/blog/?p=3005";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:92:"http://ronaldbradford.com/blog/optimizing-sql-performance-the-art-of-elimination-2010-07-08/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:3490:"The most efficient performance optimization of a SQL statement is to eliminate it. Cary Millsap’s recent Kaleidoscope presentation again highlighted that improving performance is function of code path. Removing code will improve performance.
You may think that it could be hard to eliminate SQL, however when you know every SQL statement that is executed in your code path obvious improvements may be possible. In the sequence SQL was implemented sometimes easy observations can lead to great gains. Let me provide some actual client examples that were discovered by using the MySQL General Log.
Example 1
5 Query SELECT * FROM `artist`
5 Query SELECT * FROM `artist`
5 Query SELECT * FROM `artist`
5 Query SELECT * FROM `artist`
5 Query SELECT * FROM `artist`
5 Query SELECT * FROM `artist` WHERE (ArtistID = 196 )
5 Query SELECT * FROM `artist` WHERE (ArtistID = 2188 )
5 Query SELECT * FROM `artist`
5 Query SELECT * FROM `artist`
5 Query SELECT * FROM `artist`
In this example, the following was executed for a single page load. Not only did I find a bug where full-table scans occurred rather then being qualified, there were many repeating and unnecessary occurrences.
Example 2
SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes'
SELECT option_value FROM wp_options WHERE option_name = 'aiosp_title_format' LIMIT 1
SELECT option_value FROM wp_options WHERE option_name = 'ec3_show_only_even' LIMIT 1
SELECT option_value FROM wp_options WHERE option_name = 'ec3_num_months' LIMIT 1
SELECT option_value FROM wp_options WHERE option_name = 'ec3_day_length' LIMIT 1
SELECT option_value FROM wp_options WHERE option_name = 'ec3_hide_event_box' LIMIT 1
SELECT option_value FROM wp_options WHERE option_name = 'ec3_advanced' LIMIT 1
SELECT option_value FROM wp_options WHERE option_name = 'ec3_navigation' LIMIT 1
SELECT option_value FROM wp_options WHERE option_name = 'ec3_disable_popups' LIMIT 1
SELECT option_value FROM wp_options WHERE option_name = 'sidebars_widgets' LIMIT 1
This is a stock Wordpress installation and highlights a classic Row at a Time (RAT) processing.
Example 3
SELECT * FROM activities_theme WHERE theme_parent_id=0
SELECT * FROM activities_theme WHERE theme_parent_id=1
SELECT * FROM activities_theme WHERE theme_parent_id=2
SELECT * FROM activities_theme WHERE theme_parent_id=11
SELECT * FROM activities_theme WHERE theme_parent_id=16
In this client example, again RAT processing, I provided a code improvement to run these multiple queries in a single statement, otherwise known as Chunk At a Time (CAT) processing. It’s not rocket science however the elimination of the network component of several SQL statements can greatly reduce page load time.
SELECT *
FROM activities_theme
WHERE theme_parent_id in (0,1,2,11,16)
Example 4
The following represents one of the best improvement. During capture, the following query was executed 6,000 times over a 5 minute period. While you make think this is acceptable, the value passed wae 0. The pages_id is an auto_increment column which by definition does not have a 0 value. In this instance, a simple boundary condition in the code would eliminate this query.
SELECT pages_id, pages_livestats_code, pages_title,
pages_parent, pages_exhibid, pages_theme,
pages_accession_num
FROM pages WHERE pages_id = 0
There are many tips to improving and optimizing SQL. This is the simplest and often overlooked starting point.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Thu, 08 Jul 2010 17:19:24 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:6:{i:0;a:5:{s:4:"data";s:9:"Databases";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:12:"Professional";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:12:"cary millsap";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:11:"performance";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:14:"query analysis";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:4098:"The most efficient performance optimization of a SQL statement is to eliminate it. Cary Millsap’s recent Kaleidoscope presentation again highlighted that improving performance is function of code path. Removing code will improve performance.
You may think that it could be hard to eliminate SQL, however when you know every SQL statement that is executed in your code path obvious improvements may be possible. In the sequence SQL was implemented sometimes easy observations can lead to great gains. Let me provide some actual client examples that were discovered by using the MySQL General Log.
Example 1
5 Query SELECT * FROM `artist`
5 Query SELECT * FROM `artist`
5 Query SELECT * FROM `artist`
5 Query SELECT * FROM `artist`
5 Query SELECT * FROM `artist`
5 Query SELECT * FROM `artist` WHERE (ArtistID = 196 )
5 Query SELECT * FROM `artist` WHERE (ArtistID = 2188 )
5 Query SELECT * FROM `artist`
5 Query SELECT * FROM `artist`
5 Query SELECT * FROM `artist`
In this example, the following was executed for a single page load. Not only did I find a bug where full-table scans occurred rather then being qualified, there were many repeating and unnecessary occurrences.
Example 2
SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes'
SELECT option_value FROM wp_options WHERE option_name = 'aiosp_title_format' LIMIT 1
SELECT option_value FROM wp_options WHERE option_name = 'ec3_show_only_even' LIMIT 1
SELECT option_value FROM wp_options WHERE option_name = 'ec3_num_months' LIMIT 1
SELECT option_value FROM wp_options WHERE option_name = 'ec3_day_length' LIMIT 1
SELECT option_value FROM wp_options WHERE option_name = 'ec3_hide_event_box' LIMIT 1
SELECT option_value FROM wp_options WHERE option_name = 'ec3_advanced' LIMIT 1
SELECT option_value FROM wp_options WHERE option_name = 'ec3_navigation' LIMIT 1
SELECT option_value FROM wp_options WHERE option_name = 'ec3_disable_popups' LIMIT 1
SELECT option_value FROM wp_options WHERE option_name = 'sidebars_widgets' LIMIT 1
This is a stock Wordpress installation and highlights a classic Row at a Time (RAT) processing.
Example 3
SELECT * FROM activities_theme WHERE theme_parent_id=0
SELECT * FROM activities_theme WHERE theme_parent_id=1
SELECT * FROM activities_theme WHERE theme_parent_id=2
SELECT * FROM activities_theme WHERE theme_parent_id=11
SELECT * FROM activities_theme WHERE theme_parent_id=16
In this client example, again RAT processing, I provided a code improvement to run these multiple queries in a single statement, otherwise known as Chunk At a Time (CAT) processing. It’s not rocket science however the elimination of the network component of several SQL statements can greatly reduce page load time.
SELECT *
FROM activities_theme
WHERE theme_parent_id in (0,1,2,11,16)
Example 4
The following represents one of the best improvement. During capture, the following query was executed 6,000 times over a 5 minute period. While you make think this is acceptable, the value passed wae 0. The pages_id is an auto_increment column which by definition does not have a 0 value. In this instance, a simple boundary condition in the code would eliminate this query.
SELECT pages_id, pages_livestats_code, pages_title,
pages_parent, pages_exhibid, pages_theme,
pages_accession_num
FROM pages WHERE pages_id = 0
There are many tips to improving and optimizing SQL. This is the simplest and often overlooked starting point.
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:15:"Ronald Bradford";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:11;a:6:{s:4:"data";s:63:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:28:"PBMS is in the Drizzle tree!";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:70:"tag:blogger.com,1999:blog-5313622847232042654.post-5933636087855764469";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:63:"http://bpbdev.blogspot.com/2010/07/pbms-is-in-drizzle-tree.html";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:2000:"If you haven't already heard PBMS is now part of the Drizzle tree.Getting it there was a fair bit of work but not as much as I had thought it would be. The process of getting it to work with Drizzle and running it thorough Hudson has improved the code a lot. It is amazing what some compilers will catch that others will let by. I am now a firm believer in treating all compiler warnings as errors.I am just in the process of updating the PBMS plugin so that it will build and install the PBMS client library (libpbmscl.so) as well as the plugin. The PBMS client library is a standalone library that can be used to access the PBMS daemon weather it is running as part of MySQL or Drizzle. So a PBMS client library built with Drizzle can be used to access a PBMS daemon running as part of MySQL and vice-versa.There is also PHP extension for PBMS that is basically just a wrapper for the library. Currently this is part of the PBMS project on launchpad but I am working on getting it into pecl. The PHP extension has a set of test cases with it which is what I use to test PBMS with.If anybody is interested in taking on the task of creating a python module for PBMS I would be happy to provide what ever help you may need. I think it would just be a wrapper around the PBMS client library almost identical to the PHP extension. I would recoment just taking the PHP extension and converting it.Now that I have PBMS in Drizzle I am planning on getting replication working with PBMS. I have decided that the best way to do this is to do a bit of work rearranging how PBMS uses the BLOB URLs to reference the BLOB data. The URLs already contain a server, database, and table id so the plan is to change things so that PBMS can handle BLOB URLS from other servers being inserted or referenced. Once this is working then I will automatically have replication and 95% of what is needed to support clustered servers.I will go into detail on this in a later posting which will include pretty pictures.Barry";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Thu, 08 Jul 2010 16:49:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:5:{i:0;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:3:"PHP";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:5:"BLOBS";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:4:"PBMS";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:7:"Drizzle";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:2457:"If you haven't already heard PBMS is now part of the Drizzle tree.
Getting it there was a fair bit of work but not as much as I had thought it would be. The process of getting it to work with Drizzle and running it thorough Hudson has improved the code a lot. It is amazing what some compilers will catch that others will let by. I am now a firm believer in treating all compiler warnings as errors.
I am just in the process of updating the PBMS plugin so that it will build and install the PBMS client library (libpbmscl.so) as well as the plugin. The PBMS client library is a standalone library that can be used to access the PBMS daemon weather it is running as part of MySQL or Drizzle. So a PBMS client library built with Drizzle can be used to access a PBMS daemon running as part of MySQL and vice-versa.
There is also PHP extension for PBMS that is basically just a wrapper for the library. Currently this is part of the PBMS project on launchpad but I am working on getting it into pecl. The PHP extension has a set of test cases with it which is what I use to test PBMS with.
If anybody is interested in taking on the task of creating a python module for PBMS I would be happy to provide what ever help you may need. I think it would just be a wrapper around the PBMS client library almost identical to the PHP extension. I would recoment just taking the PHP extension and converting it.
Now that I have PBMS in Drizzle I am planning on getting replication working with PBMS. I have decided that the best way to do this is to do a bit of work rearranging how PBMS uses the BLOB URLs to reference the BLOB data. The URLs already contain a server, database, and table id so the plan is to change things so that PBMS can handle BLOB URLS from other servers being inserted or referenced. Once this is working then I will automatically have replication and 95% of what is needed to support clustered servers.
I will go into detail on this in a later posting which will include pretty pictures.
Barry
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:12:"Barry Leslie";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:12;a:6:{s:4:"data";s:68:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:29:"What do MySQL Consultants do?";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:38:"http://ronaldbradford.com/blog/?p=3029";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:71:"http://ronaldbradford.com/blog/what-do-mysql-consultants-do-2010-07-08/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:2989:"One role of a MySQL consultant is to review an existing production system. Sometimes you have sufficient time and access, and other times you don’t. If I am given a limited time here is a general list of things I look at.
Review Server architecture, OS, Memory, Disks (including raid and partition type), Network etc
Review server load and identify physical bottleneck
Look at all running processes
Look specifically at MySQL processes
Review MySQL Error Log
Determine MySQL version
Look at MySQL configuration (e.g. /etc/my.cnf)
Look at running MySQL Variables
Look at running MySQL status (x n times)
Look at running MySQL INNODB status (x n times) if used
Get Database and Schema Sizes
Get Database Schema
Review Slow Query Log
Capture query sample via SHOW FULL PROCESSLIST (locked and long running)
Analyze Binary Log file
Capture all running SQL
Here are some of the commands I would run.
2. Review server load and identify physical bottleneck
$ vmstat 5 720 > vmstat.`date +%y%m%d.%H%M%S`.txt
4. Look at MySQL processes
$ ps -eopid,fname,rss,vsz,user,command | grep -e "RSS" -e "mysql"
PID COMMAND RSS VSZ USER COMMAND
5463 grep 764 5204 ronald grep -e RSS -e mysql
13894 mysqld_s 596 3936 root /bin/sh /usr/bin/mysqld_safe
13933 mysqld 4787812 5127208 mysql /usr/sbin/mysqld --basedir=/usr --datadir=/vol/mysql/mysqldata --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock
13934 logger 608 3840 root logger -p daemon.err -t mysqld_safe -i -t mysqld
$ ps -eopid,fname,rss,vsz,user,command | grep " mysqld " | grep -v grep | awk '{print $3,$4}'
4787820 5127208
5. Review MySQL Error Log
The error log can be found in various different places based on the operating system and configuration. It is important to find the right log, the SHOW GLOBAL VARIABLES LIKE ‘log_error’ will determine the location.
This is generally overlooked, however this can quickly identify some underlying problems with a MySQL environment.
7. Look at MySQL configuration
$ [ -f /etc/my.cnf ] && cat /etc/my.cnf
$ [ -f /etc/mysql/my.cnf ] && cat /etc/mysql/my.cnf
$ find / -name "*my*cnf" 2>/dev/null
8. Look at running MySQL Variables
$ mysqladmin -uroot -p variables
9. Look at running MySQL status (x n times)
$ mysqladmin -uroot -p extended-status
It is important to run this several times at regular intervals, say 60 seconds, 60 minutes, or 24 hours.
I also have dedicated scripts that can perform this. Check out Log MySQL Stats.
11. Get Database and Schema Sizes
Check out my scripts on my MySQL DBA page
14. Capture Locked statements
Check out my script for Capturing MySQL sessions.
15. Analyze Binary Log file
Check out my post on using mk-query-digest.
16. Capture all SQL
Check out my post on DML Stats per table
Moving forward
Of course the commands I run exceeds this initial list, and gathering this information is only ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Thu, 08 Jul 2010 16:18:41 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:6:{i:0;a:5:{s:4:"data";s:9:"Databases";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:12:"Professional";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:10:"monitoring";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:20:"performance analysis";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:6:"tuning";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:3935:"One role of a MySQL consultant is to review an existing production system. Sometimes you have sufficient time and access, and other times you don’t. If I am given a limited time here is a general list of things I look at.
- Review Server architecture, OS, Memory, Disks (including raid and partition type), Network etc
- Review server load and identify physical bottleneck
- Look at all running processes
- Look specifically at MySQL processes
- Review MySQL Error Log
- Determine MySQL version
- Look at MySQL configuration (e.g. /etc/my.cnf)
- Look at running MySQL Variables
- Look at running MySQL status (x n times)
- Look at running MySQL INNODB status (x n times) if used
- Get Database and Schema Sizes
- Get Database Schema
- Review Slow Query Log
- Capture query sample via SHOW FULL PROCESSLIST (locked and long running)
- Analyze Binary Log file
- Capture all running SQL
Here are some of the commands I would run.
2. Review server load and identify physical bottleneck
$ vmstat 5 720 > vmstat.`date +%y%m%d.%H%M%S`.txt
4. Look at MySQL processes
$ ps -eopid,fname,rss,vsz,user,command | grep -e "RSS" -e "mysql"
PID COMMAND RSS VSZ USER COMMAND
5463 grep 764 5204 ronald grep -e RSS -e mysql
13894 mysqld_s 596 3936 root /bin/sh /usr/bin/mysqld_safe
13933 mysqld 4787812 5127208 mysql /usr/sbin/mysqld --basedir=/usr --datadir=/vol/mysql/mysqldata --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock
13934 logger 608 3840 root logger -p daemon.err -t mysqld_safe -i -t mysqld
$ ps -eopid,fname,rss,vsz,user,command | grep " mysqld " | grep -v grep | awk '{print $3,$4}'
4787820 5127208
5. Review MySQL Error Log
The error log can be found in various different places based on the operating system and configuration. It is important to find the right log, the SHOW GLOBAL VARIABLES LIKE ‘log_error’ will determine the location.
This is generally overlooked, however this can quickly identify some underlying problems with a MySQL environment.
7. Look at MySQL configuration
$ [ -f /etc/my.cnf ] && cat /etc/my.cnf
$ [ -f /etc/mysql/my.cnf ] && cat /etc/mysql/my.cnf
$ find / -name "*my*cnf" 2>/dev/null
8. Look at running MySQL Variables
$ mysqladmin -uroot -p variables
9. Look at running MySQL status (x n times)
$ mysqladmin -uroot -p extended-status
It is important to run this several times at regular intervals, say 60 seconds, 60 minutes, or 24 hours.
I also have dedicated scripts that can perform this. Check out Log MySQL Stats.
11. Get Database and Schema Sizes
Check out my scripts on my MySQL DBA page
14. Capture Locked statements
Check out my script for Capturing MySQL sessions.
15. Analyze Binary Log file
Check out my post on using mk-query-digest.
16. Capture all SQL
Check out my post on DML Stats per table
Moving forward
Of course the commands I run exceeds this initial list, and gathering this information is only
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:15:"Ronald Bradford";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:13;a:6:{s:4:"data";s:53:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:48:"LinuxTag presentation now available for download";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:34:"http://fghaas.wordpress.com/?p=541";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:88:"http://fghaas.wordpress.com/2010/07/08/linuxtag-presentation-now-available-for-download/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:515:"A live recording of my LinuxTag 2010 presentation entitled Storage Done Right: Building a Resilient, Distributed, Highly Available Open Source iSCSI SAN is now available from our web site. If you want to find out how to build a complete SAN from 100% open source, do take a look!
I do apologize for the less-than-optimal sound quality. I did the recording myself with my laptop mike, so unfortunately there’s quite a bit of clipping in the audio track. I hope my ramblings are still somewhat audible.
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Thu, 08 Jul 2010 15:26:38 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:3:{i:0;a:5:{s:4:"data";s:11:"Conferences";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:7:"Storage";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:9:"Technical";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:1975:"A live recording of my LinuxTag 2010 presentation entitled Storage Done Right: Building a Resilient, Distributed, Highly Available Open Source iSCSI SAN is now available from our web site. If you want to find out how to build a complete SAN from 100% open source, do take a look!
I do apologize for the less-than-optimal sound quality. I did the recording myself with my laptop mike, so unfortunately there’s quite a bit of clipping in the audio track. I hope my ramblings are still somewhat audible.

PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:15:"Florian G. Haas";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:14;a:6:{s:4:"data";s:58:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:33:"RHEL LVS setup for MySQL DB Nodes";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:34:"http://dbperf.wordpress.com/?p=132";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:73:"http://dbperf.wordpress.com/2010/07/08/rhel-lvs-setup-for-mysql-db-nodes/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:3237:"I was configuring MySQL Cluster where the application servers use a properties file to connect to the MySQL Data/Storage node (I configured both Data and Storage nodes on same physical server).
I want the application to use a single IP address, to access the DB servers in cluster. Since if any of the nodes fail, the application servers should still be able to query the databases.
Hence I thought of using LVS so that multiple application servers can access the DB servers through VIP. To achieve the same I have configured LVS, thought of sharing the same.
RHEL Linux LVS setup:
=====================
Pre-Requisites:
===============
REDHAT Linux Cluster Packages
VIRTUAL IP
Configuration for LB server:
============================
Configure packet forwarding on LB server
vi /etc/sysctl.conf
net.ipv4.ip_forward = 1 # change to 1,
Install piranha package from redhat cluster group
Once installed, configure the password:
# piranha-passwd
for autostart the LVS services, below are the commands
# chkconfig pulse on
# chkconfig piranha-gui on
# chkconfig httpd on
Start the http and piranha services
# service httpd start
# service piranha-gui start
Open http://localhost:3636 in a Web browser to access the Piranha Configuration Tool.
Click on the Login button and enter piranha for the Username and the administrative password you created in the Password field.
OR
Configure file to set up LVS
vi /etc/sysconfig/ha/lvs.cf # Sample configuration file for 11.1 and 11.4
serial_no = 47
primary = 172.16.11.1
service = lvs
backup_active = 1
backup = 172.16.11.4
heartbeat = 1
heartbeat_port = 539
keepalive = 3
deadtime = 6
network = direct
debug_level = NONE
monitor_links = 0
syncdaemon = 0
virtual MySQL {
active = 1
address = 172.16.11.10 bond0:1
vip_nmask = 255.255.255.255
port = 3306
expect = “UP”
use_regex = 1
send_program = “/root/mysql_mon.sh %h”
load_monitor = none
scheduler = wlc
protocol = tcp
timeout = 5
reentry = 3
quiesce_server = 0
server Server1 {
address = 172.16.11.2
active = 1
weight = 1
}
server Server2 {
address = 172.16.11.3
active = 1
weight = 1
}
}
Restart the pulse services on the LVS Routers
# service pulse restart
Verify the LVS. Check the ipvsadm entries:
[root@ServerLB|172.16.11.1~]~ # ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 172.16.11.10:mysql wlc
-> Server1 :mysql Route 1 0 0
-> Server2 :mysql Route 1 0 0
MySQL Real Servers configuration:
=================================
Run below commands on both the MySQL nodes
echo 1 > /proc/sys/net/ipv4/conf/lo/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/lo/arp_announce
Configure loopback on both the MySQL nodes
ifconfig lo:0 172.16.11.10 netmask 255.255.255.255 up
OR (Permanent changes)
vi /etc/sysconfig/network-scripts/ifcfg-lo:0
DEVICE=lo:0
IPADDR=172.16.11.10
NETMASK=255.255.255.255
NETWORK=172.16.11.0
BROADCAST=172.16.11.255
ONBOOT=yes
NAME=loopback
To bring up the IP alias the ifup command is used:
/sbin/ifup lo
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Thu, 08 Jul 2010 11:59:10 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:4:{i:0;a:5:{s:4:"data";s:23:"Database Configurations";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:27:"Configuration steps for LVS";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:24:"RHEL LVS Setup for MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:21:"Step to configure LVS";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:5175:"I was configuring MySQL Cluster where the application servers use a properties file to connect to the MySQL Data/Storage node (I configured both Data and Storage nodes on same physical server).
I want the application to use a single IP address, to access the DB servers in cluster. Since if any of the nodes fail, the application servers should still be able to query the databases.
Hence I thought of using LVS so that multiple application servers can access the DB servers through VIP. To achieve the same I have configured LVS, thought of sharing the same.
RHEL Linux LVS setup:
=====================
Pre-Requisites:
===============
REDHAT Linux Cluster Packages
VIRTUAL IP
Configuration for LB server:
============================
Configure packet forwarding on LB server
vi /etc/sysctl.conf
net.ipv4.ip_forward = 1 # change to 1,
Install piranha package from redhat cluster group
Once installed, configure the password:
# piranha-passwd
for autostart the LVS services, below are the commands
# chkconfig pulse on
# chkconfig piranha-gui on
# chkconfig httpd on
Start the http and piranha services
# service httpd start
# service piranha-gui start
Open http://localhost:3636 in a Web browser to access the Piranha Configuration Tool.
Click on the Login button and enter piranha for the Username and the administrative password you created in the Password field.
OR
Configure file to set up LVS
vi /etc/sysconfig/ha/lvs.cf # Sample configuration file for 11.1 and 11.4
serial_no = 47
primary = 172.16.11.1
service = lvs
backup_active = 1
backup = 172.16.11.4
heartbeat = 1
heartbeat_port = 539
keepalive = 3
deadtime = 6
network = direct
debug_level = NONE
monitor_links = 0
syncdaemon = 0
virtual MySQL {
active = 1
address = 172.16.11.10 bond0:1
vip_nmask = 255.255.255.255
port = 3306
expect = “UP”
use_regex = 1
send_program = “/root/mysql_mon.sh %h”
load_monitor = none
scheduler = wlc
protocol = tcp
timeout = 5
reentry = 3
quiesce_server = 0
server Server1 {
address = 172.16.11.2
active = 1
weight = 1
}
server Server2 {
address = 172.16.11.3
active = 1
weight = 1
}
}
Restart the pulse services on the LVS Routers
# service pulse restart
Verify the LVS. Check the ipvsadm entries:
[root@ServerLB|172.16.11.1~]~ # ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 172.16.11.10:mysql wlc
-> Server1 :mysql Route 1 0 0
-> Server2 :mysql Route 1 0 0
MySQL Real Servers configuration:
=================================
Run below commands on both the MySQL nodes
echo 1 > /proc/sys/net/ipv4/conf/lo/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/lo/arp_announce
Configure loopback on both the MySQL nodes
ifconfig lo:0 172.16.11.10 netmask 255.255.255.255 up
OR (Permanent changes)
vi /etc/sysconfig/network-scripts/ifcfg-lo:0
DEVICE=lo:0
IPADDR=172.16.11.10
NETMASK=255.255.255.255
NETWORK=172.16.11.0
BROADCAST=172.16.11.255
ONBOOT=yes
NAME=loopback
To bring up the IP alias the ifup command is used:
/sbin/ifup lo

PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:16:"Anirudh Tamsekar";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:15;a:6:{s:4:"data";s:43:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:48:"The benchmark you’re reading is probably wrong";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:36:"http://www.rethinkdb.com/blog/?p=448";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:84:"http://www.rethinkdb.com/blog/2010/07/the-benchmark-youre-reading-is-probably-wrong/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:3853:"Mikeal Rogers wrote a blog post on MongoDB performance and durability. In one of the sections, he writes about the request/response model, and makes the following statement:
MongoDB, by default, doesn’t actually have a response for writes.
In response, one of 10gen employees (the company behind MongoDB) made the following comment on Hacker News:
We did this to make MongoDB look good in stupid benchmarks.
The benchmark in question shows a single graph, which demonstrates that MongoDB is 27 times faster than CouchDB on inserting one million rows. At the first glance, the benchmark immediately looks silly if you’ve ever done serious benchmarking before. CouchDB people are smart, inserting such a small number of elements is a relatively simple feature, and it’s almost certain that either they would have fixed something that simple or they had a very good reason not to (in which case the benchmark is likely measuring apples and oranges).
Let’s do some back of the envelope math. Roundtrip latency on a commodity network for a small packet can range from 0.2ms to 0.8ms. A single rotational drive can do 15000RPM / 60sec = 250 operations per second (resulting in close to 5ms latency in practice), and a single Intel X25-m SSD drive can do about 7000 write operations per second (resulting in close to 0.15ms latency).
The benchmark demonstrates that CouchDB takes an average of 0.5ms per document to insert one million documents, while MongoDB does the same in 0.01ms. Clearly the rotational drives are too slow to play a part in the measurement, and the SSD drives are probably too fast to matter for CouchDB and too slow to matter for MongoDB. However, CouchDB appears to be awfully close to commonly encountered network latencies, while MongoDB inserts each document 50 times faster than commodity network latency.
At first observation, it appears likely that the CouchDB client library is configured to wait for the socket to receive a response from the database server before sending the next insert, while the MongoDB client is configured to continue sending insert requests without waiting for a response. If this is true, the benchmark compares apples and oranges and tells you absolutely nothing about which database engine is actually faster at inserting elements. It doesn’t measure how fast each engine handles insertion when the dataset fits into memory, when the dataset spills onto disk, or when there are multiple concurrent clients (which is a whole different can of worms). It doesn’t even begin to address the more subtle issues of whether the potential bottlenecks for each database might reside in the virtual memory configuration, or the file system, or the operating system I/O scheduler, or some other part of the stack, because each database uses each one of these components slightly differently. What the benchmark likely measures is something that is never mentioned – the latency of the network stack for CouchDB, and something entirely unrelated for MongoDB.
Unfortunately most benchmarks published online have similar crucial flaws in the methodology, and since many people make decisions based on this information, software vendors are forced to modify the default configuration of their products to look good on these benchmarks. There is no easy solution – performing proper benchmarks is very error-prone, time consuming work. It’s good to be very skeptical about benchmarks that show a large performance difference but don’t carefully discuss the methodology and potential pitfalls. As Brad Pitt’s character says at the end of Inglourious Basterds – “Long story short, we hear a story too good to be true, it ain’t”.
Interested in working at RethinkDB? We’re hiring – please see our jobs page for more details.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Thu, 08 Jul 2010 08:01:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:1:{i:0;a:5:{s:4:"data";s:13:"Announcements";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:4410:"Mikeal Rogers wrote a blog post on MongoDB performance and durability. In one of the sections, he writes about the request/response model, and makes the following statement:
MongoDB, by default, doesn’t actually have a response for writes.
In response, one of 10gen employees (the company behind MongoDB) made the following comment on Hacker News:
We did this to make MongoDB look good in stupid benchmarks.
The benchmark in question shows a single graph, which demonstrates that MongoDB is 27 times faster than CouchDB on inserting one million rows. At the first glance, the benchmark immediately looks silly if you’ve ever done serious benchmarking before. CouchDB people are smart, inserting such a small number of elements is a relatively simple feature, and it’s almost certain that either they would have fixed something that simple or they had a very good reason not to (in which case the benchmark is likely measuring apples and oranges).
Let’s do some back of the envelope math. Roundtrip latency on a commodity network for a small packet can range from 0.2ms to 0.8ms. A single rotational drive can do 15000RPM / 60sec = 250 operations per second (resulting in close to 5ms latency in practice), and a single Intel X25-m SSD drive can do about 7000 write operations per second (resulting in close to 0.15ms latency).
The benchmark demonstrates that CouchDB takes an average of 0.5ms per document to insert one million documents, while MongoDB does the same in 0.01ms. Clearly the rotational drives are too slow to play a part in the measurement, and the SSD drives are probably too fast to matter for CouchDB and too slow to matter for MongoDB. However, CouchDB appears to be awfully close to commonly encountered network latencies, while MongoDB inserts each document 50 times faster than commodity network latency.
At first observation, it appears likely that the CouchDB client library is configured to wait for the socket to receive a response from the database server before sending the next insert, while the MongoDB client is configured to continue sending insert requests without waiting for a response. If this is true, the benchmark compares apples and oranges and tells you absolutely nothing about which database engine is actually faster at inserting elements. It doesn’t measure how fast each engine handles insertion when the dataset fits into memory, when the dataset spills onto disk, or when there are multiple concurrent clients (which is a whole different can of worms). It doesn’t even begin to address the more subtle issues of whether the potential bottlenecks for each database might reside in the virtual memory configuration, or the file system, or the operating system I/O scheduler, or some other part of the stack, because each database uses each one of these components slightly differently. What the benchmark likely measures is something that is never mentioned – the latency of the network stack for CouchDB, and something entirely unrelated for MongoDB.
Unfortunately most benchmarks published online have similar crucial flaws in the methodology, and since many people make decisions based on this information, software vendors are forced to modify the default configuration of their products to look good on these benchmarks. There is no easy solution – performing proper benchmarks is very error-prone, time consuming work. It’s good to be very skeptical about benchmarks that show a large performance difference but don’t carefully discuss the methodology and potential pitfalls. As Brad Pitt’s character says at the end of Inglourious Basterds – “Long story short, we hear a story too good to be true, it ain’t”.
Interested in working at RethinkDB? We’re hiring – please see our jobs page for more details.
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:15:"Slava Akhmechet";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:16;a:6:{s:4:"data";s:68:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:15:"PBMS in Drizzle";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:40:"http://www.flamingspork.com/blog/?p=2061";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:60:"http://www.flamingspork.com/blog/2010/07/08/pbms-in-drizzle/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:939:"Some of you may have noticed that blob streaming has been merged into the main Drizzle tree recently. There are a few hooks inside the Drizzle kernel that PBMS uses, and everything else is just in the plug in.
For those not familiar with PBMS it does two things: provide a place (not in the table) for BLOBs to be stored (locally on disk or even out to S3) and provide a HTTP interface to get and store BLOBs.
This means you can do really neat things such as have your BLOBs replicated, consistent and all those nice databasey things as well as easily access them in a scalable way (everybody knows how to cache HTTP).
This is a great addition to the AlsoSQL arsenal of Drizzle. I’m looking forward to it advancing and being adopted (now much easier that it’s in the main repository)
Share this on Facebook
Tweet This!
Share this on del.icio.us
Digg this!
Post on Google Buzz
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Thu, 08 Jul 2010 06:53:45 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:6:{i:0;a:5:{s:4:"data";s:4:"code";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:7:"drizzle";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:4:"BLOB";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:4:"http";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:4:"pbms";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:9:"streaming";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:2194:"Some of you may have noticed that blob streaming has been merged into the main Drizzle tree recently. There are a few hooks inside the Drizzle kernel that PBMS uses, and everything else is just in the plug in.
For those not familiar with PBMS it does two things: provide a place (not in the table) for BLOBs to be stored (locally on disk or even out to S3) and provide a HTTP interface to get and store BLOBs.
This means you can do really neat things such as have your BLOBs replicated, consistent and all those nice databasey things as well as easily access them in a scalable way (everybody knows how to cache HTTP).
This is a great addition to the AlsoSQL arsenal of Drizzle. I’m looking forward to it advancing and being adopted (now much easier that it’s in the main repository)
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:13:"Stewart Smith";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:17;a:6:{s:4:"data";s:63:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:48:"Connector/J’s load-balancing failover policies";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:38:"http://mysqlblog.fivefarmers.com/?p=39";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:89:"http://mysqlblog.fivefarmers.com/2010/07/07/connectorjs-load-balancing-failover-policies/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:3956:"Connector/J provides a useful load-balancing implementation for Cluster or multi-master deployments. As of Connector/J 5.1.12, this same implementation is used under the hood for balancing load between read-only slaves with ReplicationDriver. When trying to balance workload between multiple servers, though, the driver has to decide when it’s safe to swap servers – doing so in the middle of a transaction would not make applications very happy. Many of the same principles which apply to autoReconnect also apply here – you don’t want to lose important state information.
As a result, Connector/J will only try to pick a new server when one of the following happen:
At transaction boundaries (transactions are explicitly committed or rolled back)
A communication exception (SQL State starting with “08″) is encountered
When a SQLException matches conditions defined by user, using the extension points defined by the loadBalanceSQLStateFailover, loadBalanceSQLExceptionSubclassFailover or loadBalanceExceptionChecker properties.
The third condition is new, and revolves around three new properties introduced with Connector/J 5.1.13. It allows you to control which SQLExceptions trigger failover. Let’s examine each of the new properties in detail.
loadBalanceExceptionChecker
The loadBalanceExceptionChecker property is really the key. This takes a fully-qualified class name which implements the new com.mysql.jdbc.LoadBalanceExceptionChecker interface. This interface is very simple, and you only need to implement the following method:
public boolean shouldExceptionTriggerFailover(SQLException ex)
In goes a SQLException, out comes a boolean. True triggers a failover, false does not. Easy!
You can use this to implement your own custom logic. An example where this might be useful is when dealing with transient errors with MySQL Cluster, where certain buffers may be overloaded. At the 2010 MySQL Conference, Mark Matthews and I presented a simple example during our tutorial which does this:
public class NdbLoadBalanceExceptionChecker
extends StandardLoadBalanceExceptionChecker {
public boolean shouldExceptionTriggerFailover(SQLException ex) {
return super.shouldExceptionTriggerFailover(ex)
|| checkNdbException(ex);
}
private boolean checkNdbException(SQLException ex){
// Have to parse the message since most NDB errors
// are mapped to the same DEMC, sadly.
return (ex.getMessage().startsWith("Lock wait timeout exceeded") ||
(ex.getMessage().startsWith("Got temporary error")
&& ex.getMessage().endsWith("from NDB")));
}
}
The code above extends com.mysql.jdbc.StandardLoadBalanceExceptionChecker, which is the default implementation. There’s a few convenient shortcuts built into this, for those who want to have some level of control using properties, without writing Java code. This default implementation uses the two remaining properties: loadBalanceSQLStateFailover and loadBalanceSQLExceptionSubclassFailover.
loadBalanceSQLStateFailover
The loadBalanceSQLStateFailover property allows you to define a comma-delimited list of SQLState code prefixes, against which a SQLException is compared. If the prefix matches, failover is triggered. So, for example, the following would trigger a failover if a given SQLException starts with “00″, or is “12345″:
loadBalanceSQLStateFailover=00,12345
loadBalanceSQLExceptionSubclassFailover
This property can be used in conjunction with loadBalanceSQLStateFailover or on it’s own. If you want certain subclasses of SQLException to trigger failover, simply provide a comma-delimited list of fully-qualified class or interface names to check against. For example, say you want all SQLTransientConnectionExceptions to trigger failover:
loadBalanceSQLExceptionSubclassFailover=java.sql.SQLTransientConnectionException
That’s all there is to it!";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Thu, 08 Jul 2010 04:43:55 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:5:{i:0;a:5:{s:4:"data";s:4:"Java";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:11:"Connector/J";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:14:"Load-balancing";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:13:"MySQL Cluster";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:4914:"Connector/J provides a useful load-balancing implementation for Cluster or multi-master deployments. As of Connector/J 5.1.12, this same implementation is used under the hood for balancing load between read-only slaves with ReplicationDriver. When trying to balance workload between multiple servers, though, the driver has to decide when it’s safe to swap servers – doing so in the middle of a transaction would not make applications very happy. Many of the same principles which apply to autoReconnect also apply here – you don’t want to lose important state information.
As a result, Connector/J will only try to pick a new server when one of the following happen:
- At transaction boundaries (transactions are explicitly committed or rolled back)
- A communication exception (SQL State starting with “08″) is encountered
- When a SQLException matches conditions defined by user, using the extension points defined by the loadBalanceSQLStateFailover, loadBalanceSQLExceptionSubclassFailover or loadBalanceExceptionChecker properties.
The third condition is new, and revolves around three new properties introduced with Connector/J 5.1.13. It allows you to control which SQLExceptions trigger failover. Let’s examine each of the new properties in detail.
loadBalanceExceptionChecker
The loadBalanceExceptionChecker property is really the key. This takes a fully-qualified class name which implements the new com.mysql.jdbc.LoadBalanceExceptionChecker interface. This interface is very simple, and you only need to implement the following method:
public boolean shouldExceptionTriggerFailover(SQLException ex)
In goes a SQLException, out comes a boolean. True triggers a failover, false does not. Easy!
You can use this to implement your own custom logic. An example where this might be useful is when dealing with transient errors with MySQL Cluster, where certain buffers may be overloaded. At the 2010 MySQL Conference, Mark Matthews and I presented a simple example during our tutorial which does this:
public class NdbLoadBalanceExceptionChecker
extends StandardLoadBalanceExceptionChecker {
public boolean shouldExceptionTriggerFailover(SQLException ex) {
return super.shouldExceptionTriggerFailover(ex)
|| checkNdbException(ex);
}
private boolean checkNdbException(SQLException ex){
// Have to parse the message since most NDB errors
// are mapped to the same DEMC, sadly.
return (ex.getMessage().startsWith("Lock wait timeout exceeded") ||
(ex.getMessage().startsWith("Got temporary error")
&& ex.getMessage().endsWith("from NDB")));
}
}
The code above extends com.mysql.jdbc.StandardLoadBalanceExceptionChecker, which is the default implementation. There’s a few convenient shortcuts built into this, for those who want to have some level of control using properties, without writing Java code. This default implementation uses the two remaining properties: loadBalanceSQLStateFailover and loadBalanceSQLExceptionSubclassFailover.
loadBalanceSQLStateFailover
The loadBalanceSQLStateFailover property allows you to define a comma-delimited list of SQLState code prefixes, against which a SQLException is compared. If the prefix matches, failover is triggered. So, for example, the following would trigger a failover if a given SQLException starts with “00″, or is “12345″:
loadBalanceSQLStateFailover=00,12345
loadBalanceSQLExceptionSubclassFailover
This property can be used in conjunction with loadBalanceSQLStateFailover or on it’s own. If you want certain subclasses of SQLException to trigger failover, simply provide a comma-delimited list of fully-qualified class or interface names to check against. For example, say you want all SQLTransientConnectionExceptions to trigger failover:
loadBalanceSQLExceptionSubclassFailover=java.sql.SQLTransientConnectionException
That’s all there is to it!
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:11:"Todd Farmer";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:18;a:6:{s:4:"data";s:73:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:45:"The XLDB4 Conference for Very Large Databases";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:36:"http://www.pythian.com/news/?p=14307";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:62:"http://www.pythian.com/news/14307/conferences-including-xldb4/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:1893:"Ronald saved me a post by giving his feedback on a few Oracle conferences that now have MySQL content.
My opinion is pretty much a summary of Ronald’s post, so I won’t repeat it here. Instead, I’ll post about a conference he did not, the 4th Extremely Large Databases Conference. I am particularly interested in any MySQL folks planning to attend (I would expect Tokutek to be represented, and maybe even the
"http://www.calpont.com/">Calpont folks).
Most of this is directly from an e-mail I received from Jacek Becla, who had a keynote at the 2008 MySQL User Conference and Expo. If you also received this e-mail, please feel free to skip ahead to my viewpoints on the various Oracle conferences (or just skip altogether).
4th Extremely Large Databases (XLDB4) Conference
October 6-7, 2010
SLAC National Accelerator Laboratory
Menlo Park, California
http://www.xldb.org/4
XLDB, now including a conference open to all interested parties, is the gathering place for the community managing and analyzing data at extreme scale.
Leading practitioners from science, industry, and academia will discuss lessons learned and real-world solutions for handling terabytes and petabytes of data. Experts will present emerging trends that will significantly impact how extremely large databases are built and used.
More information, including the program, and online registration are available at http://www.xldb.org/4. Space is limited, so if you would like to attend, make sure to register soon. Early registration ($100) ends on July 31; late registration is $150.
If you are uncertain whether to attend, I would recommend reading the article recently written by Curt Monash:
http://www.dbms2.com/2010/07/01/why-you-should-go-to-xldb4/
For questions, please contact us by email using
xldb-admin at slac.stanford.edu or contact me [Jacek] directly at
becla at slac.stanford.edu.
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Wed, 07 Jul 2010 23:00:25 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:7:{i:0;a:5:{s:4:"data";s:16:"Group Blog Posts";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:5:"jacek";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:6:"Oracle";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:8:"petabyte";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:7:"Pythian";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:6;a:5:{s:4:"data";s:4:"xldb";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:2691:"Ronald saved me a post by giving his feedback on a few Oracle conferences that now have MySQL content.
My opinion is pretty much a summary of Ronald’s post, so I won’t repeat it here. Instead, I’ll post about a conference he did not, the 4th Extremely Large Databases Conference. I am particularly interested in any MySQL folks planning to attend (I would expect Tokutek to be represented, and maybe even the
"http://www.calpont.com/">Calpont folks).
Most of this is directly from an e-mail I received from Jacek Becla, who had a keynote at the 2008 MySQL User Conference and Expo. If you also received this e-mail, please feel free to skip ahead to my viewpoints on the various Oracle conferences (or just skip altogether).
4th Extremely Large Databases (XLDB4) Conference
October 6-7, 2010
SLAC National Accelerator Laboratory
Menlo Park, California
http://www.xldb.org/4
XLDB, now including a conference open to all interested parties, is the gathering place for the community managing and analyzing data at extreme scale.
Leading practitioners from science, industry, and academia will discuss lessons learned and real-world solutions for handling terabytes and petabytes of data. Experts will present emerging trends that will significantly impact how extremely large databases are built and used.
More information, including the program, and online registration are available at http://www.xldb.org/4. Space is limited, so if you would like to attend, make sure to register soon. Early registration ($100) ends on July 31; late registration is $150.
If you are uncertain whether to attend, I would recommend reading the article recently written by Curt Monash:
http://www.dbms2.com/2010/07/01/why-you-should-go-to-xldb4/
For questions, please contact us by email using
xldb-admin at slac.stanford.edu or contact me [Jacek] directly at
becla at slac.stanford.edu.
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:16:"Sheeri K. Cabral";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:19;a:6:{s:4:"data";s:68:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:23:"Timing your SQL queries";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:38:"http://ronaldbradford.com/blog/?p=3024";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:66:"http://ronaldbradford.com/blog/timing-your-sql-queries-2010-07-07/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:965:"When working interactively with the MySQL client, you receive feedback of the time the query took to complete to a granularity of 10 ms.
Enabling profiling is a simple way to get more a more accurate timing of running queries. In the following example you can see the time the kernel took to run an explain, the query, and alter, and repeat explain and query.
mysql> set profiling=1;
mysql> EXPLAIN SELECT ...
mysql> SELECT ...
mysql> ALTER ...
mysql> show profiles;
+----------+------------+-------------------------
| Query_ID | Duration | Query
+----------+------------+-------------------------
| 1 | 0.00036500 | EXPLAIN SELECT sbvi.id a
| 2 | 0.00432700 | SELECT sbvi.id as sbvi_i
| 3 | 2.83206100 | alter table sbvi drop in
| 4 | 0.00047500 | explain SELECT sbvi.id a
| 5 | 0.00367100 | SELECT sbvi.id as sbvi_i
+----------+------------+-------------------------
More information at Show Profiles documentation page.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Wed, 07 Jul 2010 18:48:32 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:6:{i:0;a:5:{s:4:"data";s:9:"Databases";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:12:"Professional";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:11:"performance";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:13:"show profiles";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:6:"timing";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:1282:"When working interactively with the MySQL client, you receive feedback of the time the query took to complete to a granularity of 10 ms.
Enabling profiling is a simple way to get more a more accurate timing of running queries. In the following example you can see the time the kernel took to run an explain, the query, and alter, and repeat explain and query.
mysql> set profiling=1;
mysql> EXPLAIN SELECT ...
mysql> SELECT ...
mysql> ALTER ...
mysql> show profiles;
+----------+------------+-------------------------
| Query_ID | Duration | Query
+----------+------------+-------------------------
| 1 | 0.00036500 | EXPLAIN SELECT sbvi.id a
| 2 | 0.00432700 | SELECT sbvi.id as sbvi_i
| 3 | 2.83206100 | alter table sbvi drop in
| 4 | 0.00047500 | explain SELECT sbvi.id a
| 5 | 0.00367100 | SELECT sbvi.id as sbvi_i
+----------+------------+-------------------------
More information at Show Profiles documentation page.
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:15:"Ronald Bradford";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:20;a:6:{s:4:"data";s:73:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:16:"Federated Tables";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:33:"http://www.mysqlfanboy.com/?p=234";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:52:"http://www.mysqlfanboy.com/2010/07/federated-tables/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:3267:"Your searching for how to create a join across two databases on two different servers and it can’t be done directly. select d1.a, d2.b from db1@server1 join db2@server2 where db1.c = db2.c; does not work.
You learn about federated databases. The federated storage engine allows accesses data in tables of remote databases. Now how do you make it work?
1) Check if the federated storage engine is supported. Federation is OFF by default!
mysql> show engines;
+------------+---------+----------------------------------------------------------------+
| Engine | Support | Comment |
+------------+---------+----------------------------------------------------------------+
| InnoDB | YES | Supports transactions, row-level locking, and foreign keys |
| MyISAM | DEFAULT | Default engine as of MySQL 3.23 with great performance |
| BLACKHOLE | YES | /dev/null storage engine (anything you write to it disappears) |
| CSV | YES | CSV storage engine |
| MEMORY | YES | Hash based, stored in memory, useful for temporary tables |
| FEDERATED | YES | Federated MySQL storage engine |
| ARCHIVE | YES | Archive storage engine |
| MRG_MYISAM | YES | Collection of identical MyISAM tables |
+------------+---------+----------------------------------------------------------------+
If it is not “Support”ed (on) you need to add ‘federated=ON‘ to the [mysqld] section of your /etc/my.cnf file. I found this section to be a bit troublesome. It must be ‘=ON’ not ‘=YES” or even ‘=on’. Most options allow these but the federated options is picky. I’m running MySQL Enterprise 5.1.37.sp1.
2) If you don’t already have the database created, create the database on the storage server. By ‘storage server’ I mean the one where the data will be written to disk.
I like to create a user just for the purpose of connection the federated copy of the database to the true database. This way, if the password gets changed or the user deleted, the federated system can continue to connect.
mysql> CREATE DATABASE xfiles;
mysql> USE xfiles;
mysql> CREATE TABLE cases(
Name VARCHAR(20),
case TINYINT(3),
) ENGINE = INNODB;
3) Now you can create the federated version of your data on the remote system.
mysql> CREATE DATABASE xfiles;
mysql> USE xfiles;
mysql> CREATE TABLE cases(
Name VARCHAR(20),
case TINYINT(3),
) ENGINE = FEDERATED
CONNECTION = 'mysql://skiner:c0nsper@fbi/xfiles/cases';
4) Check your work. The table status should show Engine: FEDERATED.
mysql> use xfiles;
mysql> show table status\G
Now you can add records to the table and the data should show up in select on either server.
Enjoy.
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Wed, 07 Jul 2010 15:55:47 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:7:{i:0;a:5:{s:4:"data";s:17:"Tips & Tricks";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:4:"Data";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:6:"Engine";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:9:"Federated";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:13:"remote server";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:6;a:5:{s:4:"data";s:6:"Tables";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:3879:"Your searching for how to create a join across two databases on two different servers and it can’t be done directly. select d1.a, d2.b from db1@server1 join db2@server2 where db1.c = db2.c; does not work.
You learn about federated databases. The federated storage engine allows accesses data in tables of remote databases. Now how do you make it work?
1) Check if the federated storage engine is supported. Federation is OFF by default!
mysql> show engines;
+------------+---------+----------------------------------------------------------------+
| Engine | Support | Comment |
+------------+---------+----------------------------------------------------------------+
| InnoDB | YES | Supports transactions, row-level locking, and foreign keys |
| MyISAM | DEFAULT | Default engine as of MySQL 3.23 with great performance |
| BLACKHOLE | YES | /dev/null storage engine (anything you write to it disappears) |
| CSV | YES | CSV storage engine |
| MEMORY | YES | Hash based, stored in memory, useful for temporary tables |
| FEDERATED | YES | Federated MySQL storage engine |
| ARCHIVE | YES | Archive storage engine |
| MRG_MYISAM | YES | Collection of identical MyISAM tables |
+------------+---------+----------------------------------------------------------------+
If it is not “Support”ed (on) you need to add ‘federated=ON‘ to the [mysqld] section of your /etc/my.cnf file. I found this section to be a bit troublesome. It must be ‘=ON’ not ‘=YES” or even ‘=on’. Most options allow these but the federated options is picky. I’m running MySQL Enterprise 5.1.37.sp1.
2) If you don’t already have the database created, create the database on the storage server. By ‘storage server’ I mean the one where the data will be written to disk.
I like to create a user just for the purpose of connection the federated copy of the database to the true database. This way, if the password gets changed or the user deleted, the federated system can continue to connect.
mysql> CREATE DATABASE xfiles;
mysql> USE xfiles;
mysql> CREATE TABLE cases(
Name VARCHAR(20),
case TINYINT(3),
) ENGINE = INNODB;
3) Now you can create the federated version of your data on the remote system.
mysql> CREATE DATABASE xfiles;
mysql> USE xfiles;
mysql> CREATE TABLE cases(
Name VARCHAR(20),
case TINYINT(3),
) ENGINE = FEDERATED
CONNECTION = 'mysql://skiner:c0nsper@fbi/xfiles/cases';
4) Check your work. The table status should show Engine: FEDERATED.
mysql> use xfiles;
mysql> show table status\G
Now you can add records to the table and the data should show up in select on either server.
Enjoy.

PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:12:"Mark Grennan";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:21;a:6:{s:4:"data";s:58:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:78:"So if I don't call myself 'open source vendor', then everything is fine? (yes)";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:25:"291 at http://openlife.cc";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:103:"http://openlife.cc/blogs/2010/july/so-if-he-doesnt-call-himself-open-source-vendor-then-everything-fine";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:410:"A lot has been written for and against open core now. Yet in the end, a couple tweets can catch all that is needed:
scurryn @h_ingo -- So as long as 'an open core vendor' doesn't call themselves 'an open source vendor' then everything's fine?
h_ingo @scurryn: pretty much. I think I owe everyone one more blog post to answer that question with a few more details.
(Twitter)
This is that blog post.
read more";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Wed, 07 Jul 2010 13:10:31 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:4:{i:0;a:5:{s:4:"data";s:15:"Business models";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:6:"Ethics";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:11:"Open Source";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:1073:"A lot has been written for and against open core now. Yet in the end, a couple tweets can catch all that is needed:
scurryn @h_ingo -- So as long as 'an open core vendor' doesn't call themselves 'an open source vendor' then everything's fine?
h_ingo @scurryn: pretty much. I think I owe everyone one more blog post to answer that question with a few more details.
(Twitter)
This is that blog post.
read more
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:11:"Henrik Ingo";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:22;a:6:{s:4:"data";s:48:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:137:"Translation of "Chapter 6. Locks and deadlocks." of "Methods for searching errors in SQL application" just published.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:71:"http://blogs.sun.com/svetasmirnova/entry/translation_of_chapter_6_locks";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:71:"http://blogs.sun.com/svetasmirnova/entry/translation_of_chapter_6_locks";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:620:"This is new part which contains information about what to do if problem is repeatable only when queries run concurrently.
Chapter 6. Locks and deadlocks.
In the last part we discussed how to find cause of the problem in case
if it is always repeatable. But there are cases when problem occurs
only under particular circumstances.
For example, such easy query can run long enough:
mysql> select * from t;
+-----+
| a |
+-----+
| 0 |
| 256 |
+-----+
2 rows in set (3 min 18.71 sec)
...
Rest of the chapter is here
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Wed, 07 Jul 2010 11:55:41 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:2:{i:0;a:5:{s:4:"data";s:3:"Sun";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:1281:"This is new part which contains information about what to do if problem is repeatable only when queries run concurrently.
Chapter 6. Locks and deadlocks.
In the last part we discussed how to find cause of the problem in case
if it is always repeatable. But there are cases when problem occurs
only under particular circumstances.
For example, such easy query can run long enough:
mysql> select * from t;
+-----+
| a |
+-----+
| 0 |
| 256 |
+-----+
2 rows in set (3 min 18.71 sec)
...
Rest of the chapter is here
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:14:"Sveta Smirnova";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:23;a:6:{s:4:"data";s:53:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:47:"Implicit casting you don’t want to see around";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:36:"http://code.openark.org/blog/?p=2344";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:79:"http://code.openark.org/blog/mysql/implicit-casting-you-dont-want-to-see-around";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:3184:"In Beware of implicit casting, I have outlined the dangers of implicit casting. Here’s a few more real-world examples I have tackled:
Number-String comparisons
Much like in programming languages, implicit casting is made to numbers when at least one of the arguments is a number. Thus:
mysql> SELECT 3 = '3.0';
+-----------+
| 3 = '3.0' |
+-----------+
| 1 |
+-----------+
1 row in set (0.00 sec)
mysql> SELECT '3' = '3.0';
+-------------+
| '3' = '3.0' |
+-------------+
| 0 |
+-------------+
The second query consists of pure strings comparison. It has no way to determine that number comparison should be made.
Direct DATE arithmetics
The first query seems to work, but is completely incorrect. The second explains why. The third is a total mess.
mysql> SELECT DATE('2010-01-01')+3;
+----------------------+
| DATE('2010-01-01')+3 |
+----------------------+
| 20100104 |
+----------------------+
1 row in set (0.00 sec)
mysql> SELECT DATE('2010-01-01')-3;
+----------------------+
| DATE('2010-01-01')-3 |
+----------------------+
| 20100098 |
+----------------------+
1 row in set (0.00 sec)
mysql> SELECT '2010-01-01' - 3;
+------------------+
| '2010-01-01' - 3 |
+------------------+
| 2007 |
+------------------+
1 row in set, 1 warning (0.00 sec)
Number-String comparisons, big integers
Look at the following crazy comparisons:
mysql> SELECT 1234 = '1234';
+---------------+
| 1234 = '1234' |
+---------------+
| 1 |
+---------------+
mysql> SELECT 123456789012345678 = '123456789012345678';
+-------------------------------------------+
| 123456789012345678 = '123456789012345678' |
+-------------------------------------------+
| 0 |
+-------------------------------------------+
mysql> SELECT 123456789012345678 = '123456789012345677';
+-------------------------------------------+
| 123456789012345678 = '123456789012345677' |
+-------------------------------------------+
| 1 |
+-------------------------------------------+
The amazing result of the last two comparisons may strike as odd. Actually, it may strike as a bug, and indeed when a customer approached me with this behavior I was at loss for words. But this is documented. The manual describes the cases for casting, then states: “… In all other cases, the arguments are compared as floating-point (real) numbers. …”
Lessons learned:
Be careful when comparing strings with floating point values. Matching depends on how both are represented.
Avoid converting temporal types to strings when doing date manipulation.
Avoid direct math on temporal types.
Avoid casting BIGINTs represented by strings. Casting will turn out to use FLOATs and may be incorrect.
Last but not least:
Use the proper data types for your data’s representation. When dealing with numbers, use numbers. When dealing with temporal values, use temporal types.
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Wed, 07 Jul 2010 08:53:37 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:3:{i:0;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:10:"Data Types";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:3:"SQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:3868:"In Beware of implicit casting, I have outlined the dangers of implicit casting. Here’s a few more real-world examples I have tackled:
Number-String comparisons
Much like in programming languages, implicit casting is made to numbers when at least one of the arguments is a number. Thus:
mysql> SELECT 3 = '3.0';
+-----------+
| 3 = '3.0' |
+-----------+
| 1 |
+-----------+
1 row in set (0.00 sec)
mysql> SELECT '3' = '3.0';
+-------------+
| '3' = '3.0' |
+-------------+
| 0 |
+-------------+
The second query consists of pure strings comparison. It has no way to determine that number comparison should be made.
Direct DATE arithmetics
The first query seems to work, but is completely incorrect. The second explains why. The third is a total mess.
mysql> SELECT DATE('2010-01-01')+3;
+----------------------+
| DATE('2010-01-01')+3 |
+----------------------+
| 20100104 |
+----------------------+
1 row in set (0.00 sec)
mysql> SELECT DATE('2010-01-01')-3;
+----------------------+
| DATE('2010-01-01')-3 |
+----------------------+
| 20100098 |
+----------------------+
1 row in set (0.00 sec)
mysql> SELECT '2010-01-01' - 3;
+------------------+
| '2010-01-01' - 3 |
+------------------+
| 2007 |
+------------------+
1 row in set, 1 warning (0.00 sec)
Number-String comparisons, big integers
Look at the following crazy comparisons:
mysql> SELECT 1234 = '1234';
+---------------+
| 1234 = '1234' |
+---------------+
| 1 |
+---------------+
mysql> SELECT 123456789012345678 = '123456789012345678';
+-------------------------------------------+
| 123456789012345678 = '123456789012345678' |
+-------------------------------------------+
| 0 |
+-------------------------------------------+
mysql> SELECT 123456789012345678 = '123456789012345677';
+-------------------------------------------+
| 123456789012345678 = '123456789012345677' |
+-------------------------------------------+
| 1 |
+-------------------------------------------+
The amazing result of the last two comparisons may strike as odd. Actually, it may strike as a bug, and indeed when a customer approached me with this behavior I was at loss for words. But this is documented. The manual describes the cases for casting, then states: “… In all other cases, the arguments are compared as floating-point (real) numbers. …”
Lessons learned:
- Be careful when comparing strings with floating point values. Matching depends on how both are represented.
- Avoid converting temporal types to strings when doing date manipulation.
- Avoid direct math on temporal types.
- Avoid casting BIGINTs represented by strings. Casting will turn out to use FLOATs and may be incorrect.
Last but not least:
- Use the proper data types for your data’s representation. When dealing with numbers, use numbers. When dealing with temporal values, use temporal types.
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:12:"Shlomi Noach";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:24;a:6:{s:4:"data";s:53:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:39:"High Availability MySQL Cookbook review";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:30:"http://fernandoipar.com/?p=204";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:75:"http://fernandoipar.com/2010/07/07/high-availability-mysql-cookbook-review/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:1832:"High Availability MySQL Cookbook (Alex Davies, Packt Publishing) presents different approaches to achieve high availability with MySQL.
The bulk of the book is dedicated to MySQL Cluster, with shorter sections on:
MySQL replication
shared storage
block level replication
performance tuning
The recipes are clear and well explained, based on a CentOS distribution, and it seems any technically skilled person could follow them without issues.
What’s lacking are some design aspects. Based on this material, one probably wouldn’t be able to decide what the best high availability architecture is for a given problem. Actually, one may even be tempted to think MySQL Cluster is the best fit for most scenarios, given the percentage of the book dedicated to it. Nevertheless, there’s a section about Cluster limitations and potential problems, so the cautious reader won’t be tempted to choose this solution for every new project.
I also found that some important considerations regarding replication are missing.
The reader is instructed to rely on Seconds_Behind_Master alone to monitor replication, and there’s no mention to the situations that can cause as slave to go out of sync, nor of a process to fix this problem.
However, this book is a useful addition to any MySQL practitioner’s library, provided you don’t expect to rely only on it to design and deploy your MySQL based highly available services.
Related posts:MySQL Certification self study I’m taking the MySQL Certification exams soon, and while I’d...Using MySQL sandbox for testing MySQL Sandbox is a great tool for quickly deploying test...Indexing text columns in MySQL This time, I’m talking about indexes for string typed columns....
Related posts brought to you by Yet Another Related Posts Plugin.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Wed, 07 Jul 2010 04:03:17 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:3:{i:0;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:12:"Availability";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:6:"Review";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:2909:"High Availability MySQL Cookbook (Alex Davies, Packt Publishing) presents different approaches to achieve high availability with MySQL.
The bulk of the book is dedicated to MySQL Cluster, with shorter sections on:
- MySQL replication
- shared storage
- block level replication
- performance tuning
The recipes are clear and well explained, based on a CentOS distribution, and it seems any technically skilled person could follow them without issues.
What’s lacking are some design aspects. Based on this material, one probably wouldn’t be able to decide what the best high availability architecture is for a given problem. Actually, one may even be tempted to think MySQL Cluster is the best fit for most scenarios, given the percentage of the book dedicated to it. Nevertheless, there’s a section about Cluster limitations and potential problems, so the cautious reader won’t be tempted to choose this solution for every new project.
I also found that some important considerations regarding replication are missing.
The reader is instructed to rely on Seconds_Behind_Master alone to monitor replication, and there’s no mention to the situations that can cause as slave to go out of sync, nor of a process to fix this problem.
However, this book is a useful addition to any MySQL practitioner’s library, provided you don’t expect to rely only on it to design and deploy your MySQL based highly available services.
Related posts:
- MySQL Certification self study I’m taking the MySQL Certification exams soon, and while I’d...
- Using MySQL sandbox for testing MySQL Sandbox is a great tool for quickly deploying test...
- Indexing text columns in MySQL This time, I’m talking about indexes for string typed columns....
Related posts brought to you by Yet Another Related Posts Plugin.
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:13:"Fernando Ipar";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:25;a:6:{s:4:"data";s:88:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:55:"A review of Guerrilla Capacity Planning by Neil Gunther";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:33:"http://www.xaprb.com/blog/?p=1946";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:93:"http://www.xaprb.com/blog/2010/07/06/a-review-of-guerrilla-capacity-planning-by-neil-gunther/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:10462:"Guerrilla Capacity Planning
Guerrilla Capacity Planning. By Neil J. Gunther, Springer 2007. Page count: about 200 pages, plus appendixes. (Here’s a link to the publisher’s site.)
Of all the books I’ve reviewed, this one has taken me the longest to study first. That’s because there is a lot of math involved, and Neil Gunther knows a lot more about it than I do. Here’s the short version: I’m learning how to use this in the real world, but that’s going to take many months, probably years. I’ve already spent about 10 months studying this book, and have read it all the way through twice — parts of it five times or more. Needless to say, if I didn’t think this was a book with value, I wouldn’t be doing that. But you’ll only get out of this book what you put in. If you want to learn a wholly new way to understand software and hardware scalability, and how to do capacity planning as a result, then buy the book and set aside some study time. But don’t think you’re going to breeze through this book and end up with a simple N-step method to take capacity forecasts to your boss. If you want that, buy John Allspaw’s book instead. (If you’re reading this blog post, you need that book.)
I don’t want to spend a lot of time talking about Neil’s method, because honestly the book isn’t about the method first and foremost, and I think many readers will have a hard time digging the capacity planning method out of the math-ness. This book is, in a sense, a textbook or workbook for his training courses. It begins with a lot of general topics, such as how managers think about capacity, risk, what’s needed in the world of businesses that are driven by Wall Street, ITIL, and so on. Then there’s the mathematical background for the rest of the book, things like significant digits and expressing error.
The part of the book that I’m still studying begins in Chapter 4, which introduces ways to quantify scalability. The math begins with Amdahl’s Law, which you may have heard of. It turns out that not only can this be used to understand how much overall speedup is possible by speeding up part of a process, which is how I’m used to using it, but it can be used to model what is possible with parallelization. (I think I actually learned this in my university classes, but I’d forgotten its uses in parallel computing since then.) Anyway, it’s a straightforward model that makes intuitive sense and is easy to accept. I believe in it because it’s so logical and simple, and because I’ve worked with it for a long time. That’s the last bit of math in this book that I can understand so solidly, because after that, we get into a lot of things that have to do with interaction between concurrently performed work, and nothing is ever intuitive about that domain.
Now, when you’re talking about scalability, you generally are working with scalability of concurrent systems, and queueing theory is Topic Number One. Proper queueing theory is correctly modeled, under certain very restricted conditions, by the Erlang C formula. This is a complex bit of math, and although I believe in it, I don’t understand it enough to know how it’s derived or proven to be correct. Well, there’s no Erlang C math in this book. Neil Gunther goes a completely different direction. Instead of modeling the impact of queueing through the math that describes the model, he creates a new model. Let’s leave the model for later, and just look at what’s nice about not using Erlang C math to model computer system scalability:
The Erlang C formula requires complex calculations.
It is valid only in restricted conditions, and it’s a lot of work to prove that your workload conforms.
It models queueing delay, but it doesn’t model coherency delay.
It requires inputs such as service time, which are difficult or impossible to measure accurately.
Someone once said that all models are wrong, but some models are useful. Neil Gunther heads in the direction of a more useful model. First, he proves that two parameters are necessary and sufficient to create a realistic model. Next, he introduces another parameter into Amdahl’s Law to account for coherency delay. The resulting (still simple) equation models serial delay (the reverse of parallel speedup) and coherency delay. Now we have a model for how a system scales under a given workload as you increasingly parallelize the hardware. This is the universal scalability model. From the mathematical point of view, it’s the crowning achievement of this book. I’m very much summarizing, by the way. There’s a lot to think about in developing such a model, so the reader gets quite a tour de force here. Along the way Neil shows how you can arrive at the same surprising result through an entirely different route, without even using Amdahl’s Law as a starting point.
There are other models. Neil discusses these. They all have problems. Some don’t model what we know can happen in the real world — retrograde scaling — where performance can decrease when you add more power to a system. Others are physically impossible, predicting negative speedup. Negative speedup means the system’s performance goes below zero. As in, you ask it to do work, and it, uh, takes back work it’s already done? Impossible. So it certainly looks like Neil’s model is the strongest contender. By the way, Craig Shallahamer’s book on forecasting Oracle performance uses the universal scalability model, although without the mathematical rigor.
Now, the problem is how to apply this in the real world. To model a system’s performance, you have to know the value of those two magical parameters. How on earth can you find these values? This seems to be just as hard as Erlang C math. But Neil shows the second most remarkable thing: if you transform the universal scalability model around a bit, then you get a polynomial of degree two. This is exciting because if you take some measurements of your system’s observed performance at different points on the scalability curve (holding the work per processor constant, and adding more processors), and then transform those measurements in an equivalent manner, you can fit a regression curve through those points. Now you can reverse the transformations to the equation, plug in the coefficients of the quadratic equation that resulted from the curve-fitting, and out come the parameters you need for the universal scalability equation! Final result: you can extrapolate out beyond your observations and predict the system’s scalability.
We’re not done. All of this was about hardware scalability: “how much faster will this system run if I add more CPUs?” Software scalability is next. Neil goes back to the basics, starting with how Amdahl’s Law applies to software speedup, and essentially covers all the same ground we’ve already covered, but this time modeling what happens when you hold the hardware constant, and increase the concurrency of the workload the software is serving. It turns out that exactly the same scalability model holds for software as it did for hardware. This is why he calls it the universal scalability model. But not only that, it works for multi-tier architectures of arbitrary complexity.
And this is why I say I am not competent to really prove or disprove the validity of the whole thing. It makes sense to me that even a multi-tier architecture can conform to a model with two parameters. As we know in the real world, there is usually a single worst bottleneck, a weakest link. And therefore no matter how complex the architecture, the dominant factors limiting scalability are still coherency and/or queueing at the bottleneck, and how much you can parallelize (Amdahl’s Law). Thus, the universal scalability model intuitively might be valid for such architectures. But proving it — wow, that’s way beyond me. I know my limits. I’m taking it all on faith, experience, and intuition at this point.
In my mind, the results Neil Gunther derives up to this point in the book would have been plenty. However, there’s lots more left in the book. The rest of the book is about how to use the model for capacity planning, but surprisingly, it’s not about just how to use the universal scalability model. It’s about Guerrilla Capacity Planning in the real world. Right after exploring software scalability, he dives into virtualization for a whole chapter — and then shows you how to measure, model, and predict the scalability of various virtualization technologies. Next chapter: web site capacity planning. After that? “Gargantuan Computing: GRIDs and P2P.” Yep, he analyzes the scalability limits of Gnutella and friends. And then, apparently just because he can, he dissects arguments about network traffic in general (read: “how scalable is the Internet?”). I can’t pretend to understand all this myself. I’m just following along.
I have a feeling that Neil Gunther is kind of like Einstein: his real gift is his ability to create thought experiments that make the model accessible to mortals. Maybe someday he’ll be a legend you learn about in CS101 classes, or maybe someday he’ll be proven wrong like Newton, who knows. In the meantime, I’m going to keep working on applying it all in the real world, especially to MySQL, and see what comes of it. The fact that I’m still doing that bears out what I said earlier: you aren’t going to just waltz through this book and come away with a clear picture of how to work through a capacity planning method. You’ll have some work to do. If you want an elegant and simple capacity planning method, then you should buy John Allspaw’s The Art Of Capacity Planning instead.
Related posts:A review of The Art of Capacity Planning by John AllspawA review of Forecasting Oracle Performance by Craig ShallahamerA review of SQL and Relational Theory by C. J. DateA review of Optimizing Oracle Performance by Cary MillsapA Review of Beginning Database Design by Clare Churcher";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Wed, 07 Jul 2010 01:54:45 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:10:{i:0;a:5:{s:4:"data";s:6:"Review";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:3:"SQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:17:"Capacity Planning";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:12:"Cary Millsap";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:17:"Craig Shallahamer";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:12:"John Allspaw";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:6;a:5:{s:4:"data";s:12:"Neil Gunther";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:7;a:5:{s:4:"data";s:15:"Queueing Theory";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:8;a:5:{s:4:"data";s:11:"Scalability";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:9;a:5:{s:4:"data";s:27:"Universal Scalability Model";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:13120:"
Guerrilla Capacity Planning
Guerrilla Capacity Planning. By Neil J. Gunther, Springer 2007. Page count: about 200 pages, plus appendixes. (Here’s a link to the publisher’s site.)
Of all the books I’ve reviewed, this one has taken me the longest to study first. That’s because there is a lot of math involved, and Neil Gunther knows a lot more about it than I do. Here’s the short version: I’m learning how to use this in the real world, but that’s going to take many months, probably years. I’ve already spent about 10 months studying this book, and have read it all the way through twice — parts of it five times or more. Needless to say, if I didn’t think this was a book with value, I wouldn’t be doing that. But you’ll only get out of this book what you put in. If you want to learn a wholly new way to understand software and hardware scalability, and how to do capacity planning as a result, then buy the book and set aside some study time. But don’t think you’re going to breeze through this book and end up with a simple N-step method to take capacity forecasts to your boss. If you want that, buy John Allspaw’s book instead. (If you’re reading this blog post, you need that book.)
I don’t want to spend a lot of time talking about Neil’s method, because honestly the book isn’t about the method first and foremost, and I think many readers will have a hard time digging the capacity planning method out of the math-ness. This book is, in a sense, a textbook or workbook for his training courses. It begins with a lot of general topics, such as how managers think about capacity, risk, what’s needed in the world of businesses that are driven by Wall Street, ITIL, and so on. Then there’s the mathematical background for the rest of the book, things like significant digits and expressing error.
The part of the book that I’m still studying begins in Chapter 4, which introduces ways to quantify scalability. The math begins with Amdahl’s Law, which you may have heard of. It turns out that not only can this be used to understand how much overall speedup is possible by speeding up part of a process, which is how I’m used to using it, but it can be used to model what is possible with parallelization. (I think I actually learned this in my university classes, but I’d forgotten its uses in parallel computing since then.) Anyway, it’s a straightforward model that makes intuitive sense and is easy to accept. I believe in it because it’s so logical and simple, and because I’ve worked with it for a long time. That’s the last bit of math in this book that I can understand so solidly, because after that, we get into a lot of things that have to do with interaction between concurrently performed work, and nothing is ever intuitive about that domain.
Now, when you’re talking about scalability, you generally are working with scalability of concurrent systems, and queueing theory is Topic Number One. Proper queueing theory is correctly modeled, under certain very restricted conditions, by the Erlang C formula. This is a complex bit of math, and although I believe in it, I don’t understand it enough to know how it’s derived or proven to be correct. Well, there’s no Erlang C math in this book. Neil Gunther goes a completely different direction. Instead of modeling the impact of queueing through the math that describes the model, he creates a new model. Let’s leave the model for later, and just look at what’s nice about not using Erlang C math to model computer system scalability:
- The Erlang C formula requires complex calculations.
- It is valid only in restricted conditions, and it’s a lot of work to prove that your workload conforms.
- It models queueing delay, but it doesn’t model coherency delay.
- It requires inputs such as service time, which are difficult or impossible to measure accurately.
Someone once said that all models are wrong, but some models are useful. Neil Gunther heads in the direction of a more useful model. First, he proves that two parameters are necessary and sufficient to create a realistic model. Next, he introduces another parameter into Amdahl’s Law to account for coherency delay. The resulting (still simple) equation models serial delay (the reverse of parallel speedup) and coherency delay. Now we have a model for how a system scales under a given workload as you increasingly parallelize the hardware. This is the universal scalability model. From the mathematical point of view, it’s the crowning achievement of this book. I’m very much summarizing, by the way. There’s a lot to think about in developing such a model, so the reader gets quite a tour de force here. Along the way Neil shows how you can arrive at the same surprising result through an entirely different route, without even using Amdahl’s Law as a starting point.
There are other models. Neil discusses these. They all have problems. Some don’t model what we know can happen in the real world — retrograde scaling — where performance can decrease when you add more power to a system. Others are physically impossible, predicting negative speedup. Negative speedup means the system’s performance goes below zero. As in, you ask it to do work, and it, uh, takes back work it’s already done? Impossible. So it certainly looks like Neil’s model is the strongest contender. By the way, Craig Shallahamer’s book on forecasting Oracle performance uses the universal scalability model, although without the mathematical rigor.
Now, the problem is how to apply this in the real world. To model a system’s performance, you have to know the value of those two magical parameters. How on earth can you find these values? This seems to be just as hard as Erlang C math. But Neil shows the second most remarkable thing: if you transform the universal scalability model around a bit, then you get a polynomial of degree two. This is exciting because if you take some measurements of your system’s observed performance at different points on the scalability curve (holding the work per processor constant, and adding more processors), and then transform those measurements in an equivalent manner, you can fit a regression curve through those points. Now you can reverse the transformations to the equation, plug in the coefficients of the quadratic equation that resulted from the curve-fitting, and out come the parameters you need for the universal scalability equation! Final result: you can extrapolate out beyond your observations and predict the system’s scalability.
We’re not done. All of this was about hardware scalability: “how much faster will this system run if I add more CPUs?” Software scalability is next. Neil goes back to the basics, starting with how Amdahl’s Law applies to software speedup, and essentially covers all the same ground we’ve already covered, but this time modeling what happens when you hold the hardware constant, and increase the concurrency of the workload the software is serving. It turns out that exactly the same scalability model holds for software as it did for hardware. This is why he calls it the universal scalability model. But not only that, it works for multi-tier architectures of arbitrary complexity.
And this is why I say I am not competent to really prove or disprove the validity of the whole thing. It makes sense to me that even a multi-tier architecture can conform to a model with two parameters. As we know in the real world, there is usually a single worst bottleneck, a weakest link. And therefore no matter how complex the architecture, the dominant factors limiting scalability are still coherency and/or queueing at the bottleneck, and how much you can parallelize (Amdahl’s Law). Thus, the universal scalability model intuitively might be valid for such architectures. But proving it — wow, that’s way beyond me. I know my limits. I’m taking it all on faith, experience, and intuition at this point.
In my mind, the results Neil Gunther derives up to this point in the book would have been plenty. However, there’s lots more left in the book. The rest of the book is about how to use the model for capacity planning, but surprisingly, it’s not about just how to use the universal scalability model. It’s about Guerrilla Capacity Planning in the real world. Right after exploring software scalability, he dives into virtualization for a whole chapter — and then shows you how to measure, model, and predict the scalability of various virtualization technologies. Next chapter: web site capacity planning. After that? “Gargantuan Computing: GRIDs and P2P.” Yep, he analyzes the scalability limits of Gnutella and friends. And then, apparently just because he can, he dissects arguments about network traffic in general (read: “how scalable is the Internet?”). I can’t pretend to understand all this myself. I’m just following along.
I have a feeling that Neil Gunther is kind of like Einstein: his real gift is his ability to create thought experiments that make the model accessible to mortals. Maybe someday he’ll be a legend you learn about in CS101 classes, or maybe someday he’ll be proven wrong like Newton, who knows. In the meantime, I’m going to keep working on applying it all in the real world, especially to MySQL, and see what comes of it. The fact that I’m still doing that bears out what I said earlier: you aren’t going to just waltz through this book and come away with a clear picture of how to work through a capacity planning method. You’ll have some work to do. If you want an elegant and simple capacity planning method, then you should buy John Allspaw’s The Art Of Capacity Planning instead.
Related posts:
- A review of The Art of Capacity Planning by John Allspaw
- A review of Forecasting Oracle Performance by Craig Shallahamer
- A review of SQL and Relational Theory by C. J. Date
- A review of Optimizing Oracle Performance by Cary Millsap
- A Review of Beginning Database Design by Clare Churcher
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:22:"Baron Schwartz (xaprb)";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:26;a:6:{s:4:"data";s:78:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:49:"Upcoming Conferences with dedicated MySQL content";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:38:"http://ronaldbradford.com/blog/?p=3009";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:92:"http://ronaldbradford.com/blog/upcoming-conferences-with-dedicated-mysql-content-2010-07-06/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:2383:"We recently held a dedicated MySQL Track at ODTUG Kaleidoscope 2010 conference for 4 days. This is the first of many Oracle events that will begin to include dedicated MySQL content.
If your attending OSCON 2010 in the next few weeks you will see a number of MySQL presentations.
MySQL will be represented at Open World 2010 in September with MySQL Sunday. Giuseppe has created a great one page summary of speakers. This event is described as technical sessions, an un-conference and an fireside chat with Edward Screven. I’ve seen tickets listed at $50 or $75 for the day.
Open SQL Camp will be held in Germany in August, and Boston in October. This is a great FREE event that includes technical content not just on MySQL but other open source databases and data stores.
You will also find dedicated MySQL tracks in Europe at the German Oracle Users Group (DOAG) conference in November and the United Kingdom Oracle Users Group (UKOUG) in November that I am planning on attending.
In 2011 there is already a lineup of events that will all contain multiple tracks of MySQL content.
April – Collaborate 11 in Orlando, FL. (no landing page 2010 site)
April – O’Reilly MySQL Conference in Santa Clara, CA (no landing page 2010 site)
June – Kaleidoscope 2011 in Long Beach, CA
For the MySQL community the introduction of various large Oracle conferences may be confusing. From my perspective I describe the big three as.
Oracle Open World is targeted towards marketing. This includes product announcements, case studies and first class events.
Collaborate is targeted towards deployment and includes 3 different user groups, the IOUG representing the Oracle Database, the Oracle Applications User Group, and the Quest Group.
ODTUG Kaleidoscope is targeted towards development. This includes the tools and technologies for developers and DBA’s to do your job.
Having just attended Kaleidoscope 2010, and being a relative unknown I left with a great impression of an open, technical and welcoming event. There was a great atmosphere, great events with excellent food for breakfast, lunch and dinner and I now have a long list of new friends. This conference very much reflected being part of a greater extended family, the experience I have enjoyed at previous MySQL conferences. I’ve already committed to being involved next year.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Tue, 06 Jul 2010 21:51:28 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:8:{i:0;a:5:{s:4:"data";s:9:"Databases";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:17:"Kaleidoscope 2010";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:12:"MySQL Events";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:22:"MySQL User Conferences";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:6:"Oracle";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:6;a:5:{s:4:"data";s:24:"Oracle/MySQL Conferences";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:7;a:5:{s:4:"data";s:12:"Professional";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:3373:"We recently held a dedicated MySQL Track at ODTUG Kaleidoscope 2010 conference for 4 days. This is the first of many Oracle events that will begin to include dedicated MySQL content.
If your attending OSCON 2010 in the next few weeks you will see a number of MySQL presentations.
MySQL will be represented at Open World 2010 in September with MySQL Sunday. Giuseppe has created a great one page summary of speakers. This event is described as technical sessions, an un-conference and an fireside chat with Edward Screven. I’ve seen tickets listed at $50 or $75 for the day.
Open SQL Camp will be held in Germany in August, and Boston in October. This is a great FREE event that includes technical content not just on MySQL but other open source databases and data stores.
You will also find dedicated MySQL tracks in Europe at the German Oracle Users Group (DOAG) conference in November and the United Kingdom Oracle Users Group (UKOUG) in November that I am planning on attending.
In 2011 there is already a lineup of events that will all contain multiple tracks of MySQL content.
For the MySQL community the introduction of various large Oracle conferences may be confusing. From my perspective I describe the big three as.
- Oracle Open World is targeted towards marketing. This includes product announcements, case studies and first class events.
- Collaborate is targeted towards deployment and includes 3 different user groups, the IOUG representing the Oracle Database, the Oracle Applications User Group, and the Quest Group.
- ODTUG Kaleidoscope is targeted towards development. This includes the tools and technologies for developers and DBA’s to do your job.
Having just attended Kaleidoscope 2010, and being a relative unknown I left with a great impression of an open, technical and welcoming event. There was a great atmosphere, great events with excellent food for breakfast, lunch and dinner and I now have a long list of new friends. This conference very much reflected being part of a greater extended family, the experience I have enjoyed at previous MySQL conferences. I’ve already committed to being involved next year.
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:15:"Ronald Bradford";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:27;a:6:{s:4:"data";s:78:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:55:"Making “Insert Ignore” Fast, by Avoiding Disk Seeks";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:26:"http://tokutek.com/?p=1577";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:76:"http://tokutek.com/2010/07/making-insert-ignore-fast-by-avoiding-disk-seeks/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:3565:"
In my post from three weeks ago, I explained why the semantics of normal ad-hoc insertions with a primary key are expensive because they require disk seeks on large data sets. Towards the end of the post, I claimed that it would be better to use “replace into” or “insert ignore” over normal inserts, because the semantics of these statements do NOT require disk seeks. In my post last week, I explained how the command “replace into” can be fast with TokuDB’s fractal trees. Today, I explain how “insert ignore” can be fast, using a strategy that is very similar to what we do with “replace into”.
The semantics of “insert ignore” are similar to that of “replace into”:
if the primary (or unique) key does not exist: insert the new row
if the primary (or unique) key does exist: do nothing
B-trees have the same problem with “insert ignore” that they have with “replace into”. They perform a lookup of the primary key, incurring a disk seek. We have already shown how fractal trees do not incur this disk seek for “replace into”, so let’s see how we can avoid disk seeks with “insert ignore”.
The only difference with “replace into” is when the primary (or unique) key exists, instead of overwriting the old row with the new row, we disregard the new row. So, all we need to do is tweak our tombstone messaging scheme (that we use for deletes and “replace into”) so that when “insert ignore” commands do not overwrite old rows with new rows. Similar to deletes and replace into, with this scheme, “insert ignore” can be two orders of magnitude faster than insertions into a B-tree.
Here is what we do. We insert a message into the fractal tree, with a new message “ii”, to signify that we are doing an “insert ignore”. The only difference between this message and the normal “i” message for insertions is what we do on queries and merges. On queries, if the message is an “ii”, then the value in the LOWER node is read, and not the higher node. On merges, if the higher node has a message of “ii”, the value in the LOWER node takes precedence over the value in the higher node.
Let’s look at an example that is similar to what we looked at for “replace into”:
create table foo (a int, b int, primary key (a));
Suppose the fractal tree for this table looks as follows:
-
- -
- - - -
....
(i (1,1)) (i (2,2)) (i (3,3)) (i (4,4)) ... (i (1000,1000)) ... (i (2^32, 2^32))
The ‘i’ stands for insertion message. Now suppose we do:
insert ignore into foo values (1000, 1001).
With fractal trees, we insert (ii (1000,1001)) into the top node. The tree then looks as such:
(ii (1000,1001))
- -
- - - -
....
(i (1,1)) (i (2,2)) (i (3,3)) (i (4,4)) ... (i (2^32, 2^32))
So upon querying the key ’1000′, a cursor notices that (1000,1001) has a message of “ii”. If it finds another value for the key 1000 in a lower node, it reads that value, otherwise, it reads (1000,1001). Because (1000,1000) is located in a lower node, the cursor returns (1000,1000) to the user. On merges, the message in the lower node, (1000,1000) overwrites the message in the higher node, (1000,1001).
While “insert ignore” can be fast, there are caveats (indexes, triggers, replication), just as there are with “replace into”. In a future posting, I will get into some of them.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Tue, 06 Jul 2010 20:57:15 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:8:{i:0;a:5:{s:4:"data";s:8:"TokuView";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:6:"B-tree";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:9:"disk seek";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:13:"Fractal Trees";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:6:"insert";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:13:"insert ignore";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:6;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:7;a:5:{s:4:"data";s:6:"TokuDB";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:4215:"
In my post from three weeks ago, I explained why the semantics of normal ad-hoc insertions with a primary key are expensive because they require disk seeks on large data sets. Towards the end of the post, I claimed that it would be better to use “replace into” or “insert ignore” over normal inserts, because the semantics of these statements do NOT require disk seeks. In my post last week, I explained how the command “replace into” can be fast with TokuDB’s fractal trees. Today, I explain how “insert ignore” can be fast, using a strategy that is very similar to what we do with “replace into”.
The semantics of “insert ignore” are similar to that of “replace into”:
- if the primary (or unique) key does not exist: insert the new row
- if the primary (or unique) key does exist: do nothing
B-trees have the same problem with “insert ignore” that they have with “replace into”. They perform a lookup of the primary key, incurring a disk seek. We have already shown how fractal trees do not incur this disk seek for “replace into”, so let’s see how we can avoid disk seeks with “insert ignore”.
The only difference with “replace into” is when the primary (or unique) key exists, instead of overwriting the old row with the new row, we disregard the new row. So, all we need to do is tweak our tombstone messaging scheme (that we use for deletes and “replace into”) so that when “insert ignore” commands do not overwrite old rows with new rows. Similar to deletes and replace into, with this scheme, “insert ignore” can be two orders of magnitude faster than insertions into a B-tree.
Here is what we do. We insert a message into the fractal tree, with a new message “ii”, to signify that we are doing an “insert ignore”. The only difference between this message and the normal “i” message for insertions is what we do on queries and merges. On queries, if the message is an “ii”, then the value in the LOWER node is read, and not the higher node. On merges, if the higher node has a message of “ii”, the value in the LOWER node takes precedence over the value in the higher node.
Let’s look at an example that is similar to what we looked at for “replace into”:
create table foo (a int, b int, primary key (a));
Suppose the fractal tree for this table looks as follows:
-
- -
- - - -
....
(i (1,1)) (i (2,2)) (i (3,3)) (i (4,4)) ... (i (1000,1000)) ... (i (2^32, 2^32))
The ‘i’ stands for insertion message. Now suppose we do:
insert ignore into foo values (1000, 1001).
With fractal trees, we insert (ii (1000,1001)) into the top node. The tree then looks as such:
(ii (1000,1001))
- -
- - - -
....
(i (1,1)) (i (2,2)) (i (3,3)) (i (4,4)) ... (i (2^32, 2^32))
So upon querying the key ’1000′, a cursor notices that (1000,1001) has a message of “ii”. If it finds another value for the key 1000 in a lower node, it reads that value, otherwise, it reads (1000,1001). Because (1000,1000) is located in a lower node, the cursor returns (1000,1000) to the user. On merges, the message in the lower node, (1000,1000) overwrites the message in the higher node, (1000,1001).
While “insert ignore” can be fast, there are caveats (indexes, triggers, replication), just as there are with “replace into”. In a future posting, I will get into some of them.
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:13:"Tokuview Blog";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:28;a:6:{s:4:"data";s:38:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:5:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:45:"Prepared statements - Are they useful or not?";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:70:"tag:blogger.com,1999:blog-9144505959002328789.post-2956693581031294843";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:91:"http://karlssonondatabases.blogspot.com/2010/07/prepared-statements-are-they-useful-or.html";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:5675:"MySQL has supported prepared statements since version 5.0, but the use of the non-prepared statement API is still much more popular. Here, I'll explain what prepared statements are in short, and show some examples and why you would, and would not, use them.The concept of a prepared statement is that it a SQL statement that is lacking any parameters or literals and with these being replaced by placeholders. Then the statement is parsed and optimized without those values, and the placeholders are referenced to program variables. Once this is done, to execute the prepared statement, you set the program variables to the appropriate values and then execute the statement. You may change the program variables referencing the placeholder values and re-execute the SQL statement as many times as you want, without having to re-parse or optimize the statement.The different steps then are:Prepare the statement - You prepare the statement with the appropriate placeholders (which are question marks), like: SELECT foo FROM bar WHERE foobar = ?Bind the placeholder values to program variables.Bind the result columns to program variables (if there are any columns returned that is).Set the program variables that are bound to the statement placeholder literals.Execute the prepared statement.If there are columns returned, i.e. this is a SELECT or SHOW statement for example, store the result, fetch the rows and the free the result.Repeat from step 4 until done.Free the prepared statement.This doesn't seem so complicated, right? Well, there are more than one way of dealing with prepared statements in MySQL. One way is the way employed by the PREPARE and EXECUTE SQL statements. This is different than what I described above, but the basics are there. What we are to look at here though is the Prepared statement MySQL C-API. And to be clear, I look at the API from the C API level, and not from the underlying "wire" protocol, although in this case there is little difference.Working with prepared statement in MySQL, as in most other database systems, require a bit more work than playing with simple non-prepared statements. The prepare phase is done with the mysql_stmt_prepare() call, which use a MYSQL_STMT to keep track of the statement, and which has been initialized with a call to mysql_stmt_init().The next step is to bind the placeholders, and this is done with a call to mysql_stmt_bind_param(), which takes the MYSQL_STMT struct as above, as well as an array of MYSQL_BIND structs, each of them referencing one bin parameter in the statement, and each to be filled in with information to the MySQL server about where the program variable is, what type it has and a pointer to an indicator of the NULL / NOT NULL status if this variable. Note that to pass a NULL, you do not pass the string NULL, which will do just that: Use the string "NULL", instead you MUST set the is_null member indicator of the MYSQL_BIND struct.After binding parameters, you have to bind result columns, if there are any columns returned. The way this is done with the MySQL C API looks a bit awkward, but not really complicated. In essence, you call mysql_stmt_result_metadata() to get a MYSQL_RES, which is a familiar struct for any user of the classic MySQL C API, and then you use that to get any information you want on the result data from the statement.You can nod bind the result columns, and again you use an array of MYSQL_BIND structs for this. Note that you have to pass appropriate data, like space for the returned column data and for a NULL indicator. Failure to do so will cause a segmentation fault and may in extreme cases cause really bad karma.This was the complex part, now all you have to do is set the program variables referenced by the MYSQL_BIND array for the parameters, call mysql_stmt_execute() to execute the statement and then, if there are columns returned, call mysql_stmt_store_result() to get the result, then call mysql_stmt_fetch() to get each row, and for every row, you can look at the data referenced by the MYSQL_BIND struct array you passed for the column data. When you are done with the result, call mysql_stmt_free_result() and that's about it.Well, to be honest, there is a bit more to it, but this is the basics. So what are the advantages? It really should be faster, as there is a lot less parsing and optimization going on, and in many database systems, using prepared statements increases performance A LOT. Not so with MySQL, probably due to the limited amount of optimization we do, and the rather basic SQL syntax we support. But there are cases where there are distinct advantages, in particular SELECTs. One reason for this is that a SELECT is usually a more complex SQL than, say, an INSERT or a SELECT (and mostly also an UPDATE), whcih means there is more to gain from limited time in the parser and optimizer. Also, a SELECT is in the same way also more complicated to optimize. I have done some simple tests, and has seen some 50% performance gain when using prepared statements with a still simple SELECT with one join. On the other hand, I have yet to see any advantages for INSERTs, but also no performance loss either, instead, I could see just about the same performance.And are there disadvantages? Yes, I'm afraid so. For CALL statements, in/out parameters are not supported, nor is handling result sets from prepared statements. In addition, multiple statement results are not supported at all when using the prepared statement C API. Read more on the limitations in the MySQL manual.These limitations are the worst aspects of using prepared statements, and I really hope we will be able to fix them soon./Karlsson";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Tue, 06 Jul 2010 12:20:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:6639:"MySQL has supported prepared statements since version 5.0, but the use of the non-prepared statement API is still much more popular. Here, I'll explain what prepared statements are in short, and show some examples and why you would, and would not, use them.
The concept of a prepared statement is that it a SQL statement that is lacking any parameters or literals and with these being replaced by placeholders. Then the statement is parsed and optimized without those values, and the placeholders are referenced to program variables. Once this is done, to execute the prepared statement, you set the program variables to the appropriate values and then execute the statement. You may change the program variables referencing the placeholder values and re-execute the SQL statement as many times as you want, without having to re-parse or optimize the statement.
The different steps then are:
- Prepare the statement - You prepare the statement with the appropriate placeholders (which are question marks), like: SELECT foo FROM bar WHERE foobar = ?
- Bind the placeholder values to program variables.
- Bind the result columns to program variables (if there are any columns returned that is).
- Set the program variables that are bound to the statement placeholder literals.
- Execute the prepared statement.
- If there are columns returned, i.e. this is a SELECT or SHOW statement for example, store the result, fetch the rows and the free the result.
- Repeat from step 4 until done.
- Free the prepared statement.
This doesn't seem so complicated, right? Well, there are more than one way of dealing with prepared statements in MySQL. One way is the way employed by the PREPARE and EXECUTE SQL statements. This is different than what I described above, but the basics are there. What we are to look at here though is the Prepared statement MySQL C-API. And to be clear, I look at the API from the C API level, and not from the underlying "wire" protocol, although in this case there is little difference.
Working with prepared statement in MySQL, as in most other database systems, require a bit more work than playing with simple non-prepared statements. The prepare phase is done with the mysql_stmt_prepare() call, which use a MYSQL_STMT to keep track of the statement, and which has been initialized with a call to mysql_stmt_init().
The next step is to bind the placeholders, and this is done with a call to mysql_stmt_bind_param(), which takes the MYSQL_STMT struct as above, as well as an array of MYSQL_BIND structs, each of them referencing one bin parameter in the statement, and each to be filled in with information to the MySQL server about where the program variable is, what type it has and a pointer to an indicator of the NULL / NOT NULL status if this variable. Note that to pass a NULL, you do not pass the string NULL, which will do just that: Use the string "NULL", instead you MUST set the is_null member indicator of the MYSQL_BIND struct.
After binding parameters, you have to bind result columns, if there are any columns returned. The way this is done with the MySQL C API looks a bit awkward, but not really complicated. In essence, you call mysql_stmt_result_metadata() to get a MYSQL_RES, which is a familiar struct for any user of the classic MySQL C API, and then you use that to get any information you want on the result data from the statement.
You can nod bind the result columns, and again you use an array of MYSQL_BIND structs for this. Note that you have to pass appropriate data, like space for the returned column data and for a NULL indicator. Failure to do so will cause a segmentation fault and may in extreme cases cause really bad karma.
This was the complex part, now all you have to do is set the program variables referenced by the MYSQL_BIND array for the parameters, call mysql_stmt_execute() to execute the statement and then, if there are columns returned, call mysql_stmt_store_result() to get the result, then call mysql_stmt_fetch() to get each row, and for every row, you can look at the data referenced by the MYSQL_BIND struct array you passed for the column data. When you are done with the result, call mysql_stmt_free_result() and that's about it.
Well, to be honest, there is a bit more to it, but this is the basics. So what are the advantages? It really should be faster, as there is a lot less parsing and optimization going on, and in many database systems, using prepared statements increases performance A LOT. Not so with MySQL, probably due to the limited amount of optimization we do, and the rather basic SQL syntax we support. But there are cases where there are distinct advantages, in particular SELECTs. One reason for this is that a SELECT is usually a more complex SQL than, say, an INSERT or a SELECT (and mostly also an UPDATE), whcih means there is more to gain from limited time in the parser and optimizer. Also, a SELECT is in the same way also more complicated to optimize. I have done some simple tests, and has seen some 50% performance gain when using prepared statements with a still simple SELECT with one join. On the other hand, I have yet to see any advantages for INSERTs, but also no performance loss either, instead, I could see just about the same performance.
And are there disadvantages? Yes, I'm afraid so. For CALL statements, in/out parameters are not supported, nor is handling result sets from prepared statements. In addition, multiple statement results are not supported at all when using the prepared statement C API. Read more on the limitations in the MySQL manual.
These limitations are the worst aspects of using prepared statements, and I really hope we will be able to fix them soon.
/Karlsson
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:15:"Anders Karlsson";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:29;a:6:{s:4:"data";s:63:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:58:"Open source or Open Core or Commercial... Does it matter??";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:70:"tag:blogger.com,1999:blog-9144505959002328789.post-2147849925358915781";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:90:"http://karlssonondatabases.blogspot.com/2010/07/oen-source-or-open-core-or-commercial.html";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:7090:"This is my 2 cents in the Open Source vs. Open Code vs. Commercial debate. And it's a long one...Maybe some of you reading this are offended already, but bear with me, I'll get there. The way I see the Open Source model, having worked with OSS at MySQL for 6+ years now, is that this is a great way of developing software. Not brilliant, but great, but I'll get there also.Users of OSS, in my mind, are OSS users for one or more of three reasons:It's Open - The users using OSS for this reason believes that being open is in and of itself a great thing, enough so to use OSS even when non-OSS is less expensive and/or better.Cost - OSS is typically less expensive than non-OSS, and this is the reason these users get here. There are then 2 subgroups here, one that represents users that just aren't funded at all, many websites are in this category, the users building Joomla and Drupal sites and the like, I think you get the point. The second group are those that have funding, but would rather spend their money on luxury items and a new car than of a software license.Technology - This is a category that many think they are in, but I don't think this is mostly not the case. These are the users on a unique piece of software that is either not existing as non-OSS, or where the OSS variations are so much more powerful than the commercial counterparts. In all honesty, although I am aware these cases exist, I do not think that that there are THAT many. But there are those there Cost + Technology plays in, i.e. even though a commercial option exists, it is just too expensive.OK, so now we know (what I think) are the reasons that Open Source exists, is in wide use and is growing. For the first group, the ones that see Openness as a good enough reason in and of itself, I think this is a smaller group of the total number of users. But that openness is not really, in my mind, well defined.If Oracle would take the sourcecode for the Oracle database and release it under GPL, then it would be Open in most peoples mind I guess. But that piece of code is massive, and few people outside the Oracle developers would have the time, resources and knowledge to understand, extend and modify it, so what how Open is it really then? I think to an extent MySQL is case in point here, although it is GPL licenced and the sourcecode is open and free, there are few outside contributors, as compared to the large number of users. I think most users building a website using Drupal cares much about MySQL being open or closed or whatever. I think most of them care about the cost being low. And one sure could argue that low cost comes from the source being open, that is probably true to a large extent, but that doesn't mean that commercial software or non-OSS also can be low cost (shareware for example).What this boils down to, in my mind then, is that although we all enjoy the low cost of OSS, less care about it really being open and if so how, and more about it being inexpensive. And I say that as someone who doesn't actually mind reading sourcecode, and this is something I do on a regular basis, read and sometimes tinker with the MySQL source. But I really do not think that I am typical here.And all this is not to say that there is something wrong with OSS, quite the opposite, but often it is more about cost than actual openness. And this is worrying, but there are exceptions. Linux is one such example, although the kernel is since long ago developed by a rather small closely knit community, utilities and programs surrounding and extending the kernel, such as modules, the GNU packages and that stuff, are developed separately from this group, by individuals or groups of them with specific needs or knowledge. The key here is the open interfaces. You don't have to understand every aspect of the Linux kernel to develop a well working and efficient utility or even kernel module.But I do not think that even Linux is developed enough in this area as I would like it to see. To me, who really believe that Open Source Software is a good thing and an excellent model for development, I would like to see an even more "contributor friendly" architecture. I think Unix got a long way here in it's early days, with the principles of simple and easy to use APIs (like pipes) and programs could do one hing and do it well. But those days are gone now, that was 30 - 40 years ago or so, and we need to develop things, and I haven't seen that happening. FSF and GPL and all that defines to extent the framework for distributed software in terms of legalities and many other aspects, but there is little help in how to make the software that can now in theory be read by anyone truly open. If we assume that Oracle made their sourcecode GPL, but did not provide any documentation on how the sourcecode works (which is not a requirement of GPL) and removed all the sourcecode comments (which is not a requirement either), how open would that be, really? I do not think it would help much in terms of openness, to be honest. Sure, it would be open for someone who wanted to hickjack some intricate part of the Oracle sourcecode, but that would need a large investment in investigating the code, so this would probably only be reasonable for a some other large corporate entity. But the code would really be open for the rest of us.Instead of discussing Open Source vs. Open Code vs. Commercial, I think it would be much more interesting to discuss how we develop software that truly is contributor friendly. Code that is easy to add to, code that lives in an environment where changes and additions can easily be made, reviewed and tested. Code that allows itself to be built by anyone, anywhere so that I can test my code on a 16 CPU x86 box somewhere in australia, provided by a nice person I don't even know, although I am located in Sweden. Code that is required to have proper commenting, proper structured APIs and natural points for injecting new and changed code. And above all, code that lets someone with excellent domain knowledge (in for example indexing algorithms, GIS, text search, APIs, disk management etc., if we talk about databases) to write code and test, without being a database expert or even knowing the inner details of the system he/she writes code in, and not being brilliant developers themselves.Is this a dream? Maybe, Is Drizzle the answer (I know someone will suggest that), and I say no, it's just not enough, it's just more of the same (plugins), it doesn't really provide anything new in how we develop things or how those developments are published and distributed.In short, I think the Open Source vs. Open Code debate is just nitpicking and boring. Neither model just isn't good enough to be truly friendly and open for contribution. The difference lies more in how and with what we we can commercialize our efforts, which is a valid concern, but my main concern, as you can see, is that I believe that neither model is truly open. And I would rather see a truly contributor friendly Open Code model than the current state of affairs./Karlsson";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Tue, 06 Jul 2010 10:19:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:5:{i:0;a:5:{s:4:"data";s:10:"commercial";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:4:"core";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:4:"open";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:6:"source";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:7716:"This is my 2 cents in the Open Source vs. Open Code vs. Commercial debate. And it's a long one...
Maybe some of you reading this are offended already, but bear with me, I'll get there. The way I see the Open Source model, having worked with OSS at MySQL for 6+ years now, is that this is a great way of developing software. Not brilliant, but great, but I'll get there also.
Users of OSS, in my mind, are OSS users for one or more of three reasons:
- It's Open - The users using OSS for this reason believes that being open is in and of itself a great thing, enough so to use OSS even when non-OSS is less expensive and/or better.
- Cost - OSS is typically less expensive than non-OSS, and this is the reason these users get here. There are then 2 subgroups here, one that represents users that just aren't funded at all, many websites are in this category, the users building Joomla and Drupal sites and the like, I think you get the point. The second group are those that have funding, but would rather spend their money on luxury items and a new car than of a software license.
- Technology - This is a category that many think they are in, but I don't think this is mostly not the case. These are the users on a unique piece of software that is either not existing as non-OSS, or where the OSS variations are so much more powerful than the commercial counterparts. In all honesty, although I am aware these cases exist, I do not think that that there are THAT many. But there are those there Cost + Technology plays in, i.e. even though a commercial option exists, it is just too expensive.
OK, so now we know (what I think) are the reasons that Open Source exists, is in wide use and is growing. For the first group, the ones that see Openness as a good enough reason in and of itself, I think this is a smaller group of the total number of users. But that openness is not really, in my mind, well defined.
If Oracle would take the sourcecode for the Oracle database and release it under GPL, then it would be Open in most peoples mind I guess. But that piece of code is massive, and few people outside the Oracle developers would have the time, resources and knowledge to understand, extend and modify it, so what how Open is it really then? I think to an extent MySQL is case in point here, although it is GPL licenced and the sourcecode is open and free, there are few outside contributors, as compared to the large number of users. I think most users building a website using Drupal cares much about MySQL being open or closed or whatever. I think most of them care about the cost being low. And one sure could argue that low cost comes from the source being open, that is probably true to a large extent, but that doesn't mean that commercial software or non-OSS also can be low cost (shareware for example).
What this boils down to, in my mind then, is that although we all enjoy the low cost of OSS, less care about it really being open and if so how, and more about it being inexpensive. And I say that as someone who doesn't actually mind reading sourcecode, and this is something I do on a regular basis, read and sometimes tinker with the MySQL source. But I really do not think that I am typical here.
And all this is not to say that there is something wrong with OSS, quite the opposite, but often it is more about cost than actual openness. And this is worrying, but there are exceptions. Linux is one such example, although the kernel is since long ago developed by a rather small closely knit community, utilities and programs surrounding and extending the kernel, such as modules, the GNU packages and that stuff, are developed separately from this group, by individuals or groups of them with specific needs or knowledge. The key here is the open interfaces. You don't have to understand every aspect of the Linux kernel to develop a well working and efficient utility or even kernel module.
But I do not think that even Linux is developed enough in this area as I would like it to see. To me, who really believe that Open Source Software is a good thing and an excellent model for development, I would like to see an even more "contributor friendly" architecture. I think Unix got a long way here in it's early days, with the principles of simple and easy to use APIs (like pipes) and programs could do one hing and do it well. But those days are gone now, that was 30 - 40 years ago or so, and we need to develop things, and I haven't seen that happening. FSF and GPL and all that defines to extent the framework for distributed software in terms of legalities and many other aspects, but there is little help in how to make the software that can now in theory be read by anyone truly open. If we assume that Oracle made their sourcecode GPL, but did not provide any documentation on how the sourcecode works (which is not a requirement of GPL) and removed all the sourcecode comments (which is not a requirement either), how open would that be, really? I do not think it would help much in terms of openness, to be honest. Sure, it would be open for someone who wanted to hickjack some intricate part of the Oracle sourcecode, but that would need a large investment in investigating the code, so this would probably only be reasonable for a some other large corporate entity. But the code would really be open for the rest of us.
Instead of discussing Open Source vs. Open Code vs. Commercial, I think it would be much more interesting to discuss how we develop software that truly is contributor friendly. Code that is easy to add to, code that lives in an environment where changes and additions can easily be made, reviewed and tested. Code that allows itself to be built by anyone, anywhere so that I can test my code on a 16 CPU x86 box somewhere in australia, provided by a nice person I don't even know, although I am located in Sweden. Code that is required to have proper commenting, proper structured APIs and natural points for injecting new and changed code. And above all, code that lets someone with excellent domain knowledge (in for example indexing algorithms, GIS, text search, APIs, disk management etc., if we talk about databases) to write code and test, without being a database expert or even knowing the inner details of the system he/she writes code in, and not being brilliant developers themselves.
Is this a dream? Maybe, Is Drizzle the answer (I know someone will suggest that), and I say no, it's just not enough, it's just more of the same (plugins), it doesn't really provide anything new in how we develop things or how those developments are published and distributed.
In short, I think the Open Source vs. Open Code debate is just nitpicking and boring. Neither model just isn't good enough to be truly friendly and open for contribution. The difference lies more in how and with what we we can commercialize our efforts, which is a valid concern, but my main concern, as you can see, is that I believe that neither model is truly open. And I would rather see a truly contributor friendly Open Code model than the current state of affairs.
/Karlsson
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:15:"Anders Karlsson";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:30;a:6:{s:4:"data";s:63:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:32:"New webinar recordings available";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:34:"http://fghaas.wordpress.com/?p=538";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:72:"http://fghaas.wordpress.com/2010/07/06/new-webinar-recordings-available/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:463:"A couple of new webinar recordings are available from our web site:
Upgrading to Pacemaker on Debian squeeze tells you all there is to know about upgrading to the Pacemaker cluster manager on the upcoming Debian release, “squeeze”.
High Availability for Oracle databases shows you how to leverage the Linux cluster stack to deploy highly available Oracle databases.
Like our live webinars, the recordings are of course free of charge. Enjoy!
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Tue, 06 Jul 2010 06:29:33 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:5:{i:0;a:5:{s:4:"data";s:6:"Debian";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:8:"Linux-HA";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:6:"Oracle";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:9:"Pacemaker";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:7:"Webinar";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:2121:"A couple of new webinar recordings are available from our web site:
Like our live webinars, the recordings are of course free of charge. Enjoy!

PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:15:"Florian G. Haas";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:31;a:6:{s:4:"data";s:58:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:51:"mk-query-digest, query comments and the query cache";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:43:"http://www.mysqlperformanceblog.com/?p=3283";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:98:"http://www.mysqlperformanceblog.com/2010/07/05/mk-query-digest-query-comments-and-the-query-cache/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:4938:"I very much like the fact that MySQL allows you to embed comments into SQL statements. These comments are extremely convenient, because they are written into MySQL log files as part of the query. This includes the general log, the binary log and the slow query log. Maatkit includes tools which interact with these logs, including mk-query-digest. This tool, in particular, has a very nice option called --embedded-attributes which can process data embedded in query comments.
The support for embedded attributes makes some cool tricks possible. Peter and I co-presented a talk at this past MySQL Conference and Expo. In this talk I presented my Instrumentation-for-PHP class as a demonstration instrumentation application for the LAMP stack. Instrumentation-for-php collects information about the PHP process such as wall time, cpu time, mysql query times, etc, and automatically places this information into the Apache environment. It also includes support for "augmenting" SQL queries with additional information such as the PHP source line and user session information by embedding this information in a specially formatted SQL comment. This information can then be cross-referenced between MySQL logs and Apache logs.
PLAIN TEXT
SQL:
SELECT 1
-- File: myfile.php Line: 25 Function: myFunc SessionID: a1b2c3d4e5f6a7b8c9d0aef UserID: swany78
Notice that the query ends with a comment, and that this comment includes important information, such as the PHP session ID, and the application user name. This allows mk-query digest to discover these attributes and use them for aggregation. You could also filter on the embedded information to display only queries from a particular file, particular session or particular user for example.
The --embedded-attributes option takes two parts. The first part is a Perl regular expression which identifies which comment will be treated as the one with attributes. The second part is another expression used to split the comment into attribute key/value pairs.
For example:
--embedded-attributes '^-- [^\n]+','(\w+): ([^ ]+)'
The first part: '^-- [^\n]+' matches everything starting with -- and ending with newline.
The second part includes two expressions, each enclosed by parenthesis so that they will be captured. The first matching expression:(\w+): matches one or more words followed by colon and a space. The second match expression:([^ ]+)captures everything up to the first space.
A problem with this approach is that MySQL treats identical queries with different comments as different queries. This essentially renders the query cache inoperative when this approach is taken. A cache that can't return any results is worse than not having a cache at all. The cache won't return any result if each query has some sort of unique identifier in it. Evicting queries will become very frequent when every query is unique and the query cache will use additional resources for no effective benefit.
Luckily the problem is mostly* fixed in Percona Server 11.
I wrote a simple test script to demonstrate the difference. The test loops twice over a set of query patterns. Each pattern is executed 1000 times in each iteration of the loop. The queries differ only by comments. In the best case scenario where comments are stripped we would expect 1000 misses and then all hits from this test. You can enable query cache stripping in Percona Server at runtime:
PLAIN TEXT
SQL:
mysql> SET global query_cache_strip_comments='ON';
Query OK, 0 rows affected (0.00 sec)
Query
Hits
Misses
Hits
Misses
Stripping
No Stripping
no comment
1000
1000
1000
1000
random at start
2000
0
0
2000
random at middle
2000
0
0
2000
static at start
2000
0
1000
1000
As you can see, the first loop of the queries (that is, those with no comment at all) get 1000 misses. Everything beyond that is a hit when stripping is enabled. Contrast this with the results where stripping is not enabled. There are many more misses due to the random values in the query comments. Also notice the query with static comments. There are 1000 hits and 1000 misses because MySQL treats identical queries with identical comments as the same. When comments are stripped every one of these queries is answered by the cache.
If you are benefiting from the query cache, but would like better instrumentation in your queries, consider using Percona Server and turning on the --strip-query-cache-comments feature.
* Percona Server 11.1 still has a problem with # and -- comments at the start of a query(before SELECT), and in some cases, with varying white-space. These issues will be rectified in the next release. I've updated instrumentation-for-php to place the comment at the end of the query to avoid these problems.
Entry posted by Justin Swanhart |
2 comments
Add to: | | | | ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Tue, 06 Jul 2010 02:31:47 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:4:{i:0;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:8:"comments";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:14:"percona server";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:11:"query cache";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:8279:"I very much like the fact that MySQL allows you to embed comments into SQL statements. These comments are extremely convenient, because they are written into MySQL log files as part of the query. This includes the general log, the binary log and the slow query log. Maatkit includes tools which interact with these logs, including mk-query-digest. This tool, in particular, has a very nice option called --embedded-attributes which can process data embedded in query comments.
The support for embedded attributes makes some cool tricks possible. Peter and I co-presented a talk at this past MySQL Conference and Expo. In this talk I presented my Instrumentation-for-PHP class as a demonstration instrumentation application for the LAMP stack. Instrumentation-for-php collects information about the PHP process such as wall time, cpu time, mysql query times, etc, and automatically places this information into the Apache environment. It also includes support for "augmenting" SQL queries with additional information such as the PHP source line and user session information by embedding this information in a specially formatted SQL comment. This information can then be cross-referenced between MySQL logs and Apache logs.
SQL:
-
SELECT 1
-
-- File: myfile.php Line: 25 Function: myFunc SessionID: a1b2c3d4e5f6a7b8c9d0aef UserID: swany78
Notice that the query ends with a comment, and that this comment includes important information, such as the PHP session ID, and the application user name. This allows mk-query digest to discover these attributes and use them for aggregation. You could also filter on the embedded information to display only queries from a particular file, particular session or particular user for example.
The --embedded-attributes option takes two parts. The first part is a Perl regular expression which identifies which comment will be treated as the one with attributes. The second part is another expression used to split the comment into attribute key/value pairs.
For example:
--embedded-attributes '^-- [^\n]+','(\w+): ([^ ]+)'
The first part: '^-- [^\n]+' matches everything starting with -- and ending with newline.
The second part includes two expressions, each enclosed by parenthesis so that they will be captured. The first matching expression:(\w+): matches one or more words followed by colon and a space. The second match expression:([^ ]+)captures everything up to the first space.
A problem with this approach is that MySQL treats identical queries with different comments as different queries. This essentially renders the query cache inoperative when this approach is taken. A cache that can't return any results is worse than not having a cache at all. The cache won't return any result if each query has some sort of unique identifier in it. Evicting queries will become very frequent when every query is unique and the query cache will use additional resources for no effective benefit.
Luckily the problem is mostly* fixed in Percona Server 11.
I wrote a simple test script to demonstrate the difference. The test loops twice over a set of query patterns. Each pattern is executed 1000 times in each iteration of the loop. The queries differ only by comments. In the best case scenario where comments are stripped we would expect 1000 misses and then all hits from this test. You can enable query cache stripping in Percona Server at runtime:
SQL:
-
mysql> SET global query_cache_strip_comments='ON';
-
Query OK, 0 rows affected (0.00 sec)
| Query
| Hits
| Misses
| Hits
| Misses |
|
| Stripping |
No Stripping |
| no comment
| 1000
| 1000
| 1000
| 1000 |
| random at start
| 2000
| 0
| 0
| 2000 |
| random at middle
| 2000
| 0
| 0
| 2000 |
| static at start
| 2000
| 0
| 1000
| 1000 |
As you can see, the first loop of the queries (that is, those with no comment at all) get 1000 misses. Everything beyond that is a hit when stripping is enabled. Contrast this with the results where stripping is not enabled. There are many more misses due to the random values in the query comments. Also notice the query with static comments. There are 1000 hits and 1000 misses because MySQL treats identical queries with identical comments as the same. When comments are stripped every one of these queries is answered by the cache.
If you are benefiting from the query cache, but would like better instrumentation in your queries, consider using Percona Server and turning on the --strip-query-cache-comments feature.
* Percona Server 11.1 still has a problem with # and -- comments at the start of a query(before SELECT), and in some cases, with varying white-space. These issues will be rectified in the next release. I've updated instrumentation-for-php to place the comment at the end of the query to avoid these problems.
Entry posted by Justin Swanhart |
2 comments
Add to:
|
|
|
| 
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:15:"Justin Swanhart";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:32;a:6:{s:4:"data";s:33:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:2:{s:0:"";a:5:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:24:"InnoDB fuzzy checkpoints";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:53:"http://www.facebook.com/note.php?note_id=408059000932";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:53:"http://www.facebook.com/note.php?note_id=408059000932";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:3285:"InnoDB uses fuzzy checkpoints. If you search, you can find some details on this. I don't think the behavior of it is widely understood. There are few details in SHOW INNODB STATUS unless you use the Facebook patch and it is easy to miss this problem.
InnoDB uses a background thread to request and perform writes for dirty pages when there are too many dirty pages. This is done by background threads to avoid delaying foreground threads and to use IO capacity when a server is otherwise idle. I draw a distinction between request and perform. Background IO threads perform the IO operation in Linux by calling write or pwrite. By request I mean the work done to find the dirty pages to be written back to disk.
InnoDB must also write dirty pages back to disk to enforce the fuzzy checkpoint constraint. The open InnoDB transaction log files have a minimum and maximum LSN which I will call minOpenLogLSN and maxOpenLogLSN. The constraint is that no dirty page can have a change LSN less than minOpenLogLSN. Were InnoDB to reach that state then all INSERT, UPDATE and DELETE statements would block until pages were flushed. This can briefly hurt response time. Therefore, InnoDB implements two limits to flush dirty pages before this constraint is reached: async limit, sync limit. Both limits are checked by user threads before they make a page dirty.
Unlike enforcement of the max dirty page limit, this is not enforced by the main background thread and I think it should be. When the either limit is reached, the user thread finds enough dirty pages to be written and then blocks until the writes are done.
It turns out that async limit isn't really async. The user thread blocks for the IO to complete whether the sync or async limits have been reached. This can delay your user threads. The delay can be annoying for a server that has many concurrent transactions. The delay can be serious for a replication slave as the replication SQL thread will experience most of the latency. This can cause a lot of replication lag. I opened feature request 55004 for this and attached a patch. I think this can be fixed by adding a background thread to enforce the async limit so that user threads would only get delayed by the sync limit.
If you want to understand the code, then read feature request 55004. The basic issue is that while write IO requests are usually done by background IO threads (or real async IO in the latest versions of the InnoDB plugin) any thread that requests dirty page writes will block for the writes to complete. I think the intent of the code was to not block as when the async limit is reached there is a call to log_preflush_pool_modified_pages with sync=FALSE while the function is called with sync=TRUE when the sync limit is reached. And log_preflush_pool_modified_pages doesn't call buf_flush_wait_batch_end when sync=FALSE.
Alas, buf_flush_batch always calls buf_flush_buffered_writes and buf_flush_buffered_writes does:
one large write for the doublewrite buffer
fsync to force the doublewrite buffer write to disk
write database pages in place from the doublewrite buffer
force database page writes to disk
The async and sync limits mentioned above have the relationship:
minOpenLogLSN < asyncLimitLSN < syncLimitLSN < maxOpenLogLSN
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Mon, 05 Jul 2010 16:38:01 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:4029:"InnoDB uses fuzzy checkpoints. If you search, you can find some details on this. I don't think the behavior of it is widely understood. There are few details in SHOW INNODB STATUS unless you use the Facebook patch and it is easy to miss this problem.
InnoDB uses a background thread to request and perform writes for dirty pages when there are too many dirty pages. This is done by background threads to avoid delaying foreground threads and to use IO capacity when a server is otherwise idle. I draw a distinction between request and perform. Background IO threads perform the IO operation in Linux by calling write or pwrite. By request I mean the work done to find the dirty pages to be written back to disk.
InnoDB must also write dirty pages back to disk to enforce the fuzzy checkpoint constraint. The open InnoDB transaction log files have a minimum and maximum LSN which I will call minOpenLogLSN and maxOpenLogLSN. The constraint is that no dirty page can have a change LSN less than minOpenLogLSN. Were InnoDB to reach that state then all INSERT, UPDATE and DELETE statements would block until pages were flushed. This can briefly hurt response time. Therefore, InnoDB implements two limits to flush dirty pages before this constraint is reached: async limit, sync limit. Both limits are checked by user threads before they make a page dirty.
Unlike enforcement of the max dirty page limit, this is not enforced by the main background thread and I think it should be. When the either limit is reached, the user thread finds enough dirty pages to be written and then blocks until the writes are done.
It turns out that async limit isn't really async. The user thread blocks for the IO to complete whether the sync or async limits have been reached. This can delay your user threads. The delay can be annoying for a server that has many concurrent transactions. The delay can be serious for a replication slave as the replication SQL thread will experience most of the latency. This can cause a lot of replication lag. I opened feature request 55004 for this and attached a patch. I think this can be fixed by adding a background thread to enforce the async limit so that user threads would only get delayed by the sync limit.
If you want to understand the code, then read feature request 55004. The basic issue is that while write IO requests are usually done by background IO threads (or real async IO in the latest versions of the InnoDB plugin) any thread that requests dirty page writes will block for the writes to complete. I think the intent of the code was to not block as when the async limit is reached there is a call to log_preflush_pool_modified_pages with sync=FALSE while the function is called with sync=TRUE when the sync limit is reached. And log_preflush_pool_modified_pages doesn't call buf_flush_wait_batch_end when sync=FALSE.
Alas, buf_flush_batch always calls buf_flush_buffered_writes and buf_flush_buffered_writes does:
- one large write for the doublewrite buffer
- fsync to force the doublewrite buffer write to disk
- write database pages in place from the doublewrite buffer
- force database page writes to disk
The async and sync limits mentioned above have the relationship:
minOpenLogLSN < asyncLimitLSN < syncLimitLSN < maxOpenLogLSN
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:33;a:6:{s:4:"data";s:43:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:45:"On “open core” versus Open Source";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:27:"http://jcole.us/blog/?p=405";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:73:"http://jcole.us/blog/archives/2010/07/05/on-open-core-versus-open-source/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:2051:"It seems like everyone in the MySQL community has been chiming in on the so-called “open core” versus Open Source debate. It seems like at least one potential opinion has been unsaid so far, and it’s the one I share: It doesn’t matter if you’re closed source, “open core”, or Open Source, as long as you are completely honest with your customers and/or your users about which of those you are, and that you communicate any changes, particularly in a more closed direction.
I think this is the problem that MySQL (regardless of owners) have suffered in the past — they faced pressure to make money (which is fine), and they decided to used closed source approaches to do so (which is fine), and they picked some features, which they’d promised their users for a long time, to do it with (which is fine, although arguably not that nice), and they failed to communicate it well to either users or customers (which is tragic). The end result is, as could be expected, a bad taste in both users’ and customers’ (and in many cases, even employees’) mouths. In addition to this failure to communicate, they’ve also fallen over pretty terribly when others attempt to do the communication for them, as demonstrated when they made the decision (as far as I know, overturned) two years ago to offer some features only on the Enterprise side of the Community-Enterprise split. If you’re being completely honest and open with your communications to your customers and users, a post like mine should spark a “so what, we already heard that” instead of a huge storm of debate.
From my perspective, MySQL has never been “true” Open Source, as the documentation (which I helped to write a large part of as an early employee of MySQL) is not free, and never has been.
All of this is just fine, but don’t be surprised when the public is critical of you, and isn’t clamoring for the mountain tops to sing your “open source” praises.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Mon, 05 Jul 2010 16:19:06 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:1:{i:0;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:2529:"It seems like everyone in the MySQL community has been chiming in on the so-called “open core” versus Open Source debate. It seems like at least one potential opinion has been unsaid so far, and it’s the one I share: It doesn’t matter if you’re closed source, “open core”, or Open Source, as long as you are completely honest with your customers and/or your users about which of those you are, and that you communicate any changes, particularly in a more closed direction.
I think this is the problem that MySQL (regardless of owners) have suffered in the past — they faced pressure to make money (which is fine), and they decided to used closed source approaches to do so (which is fine), and they picked some features, which they’d promised their users for a long time, to do it with (which is fine, although arguably not that nice), and they failed to communicate it well to either users or customers (which is tragic). The end result is, as could be expected, a bad taste in both users’ and customers’ (and in many cases, even employees’) mouths. In addition to this failure to communicate, they’ve also fallen over pretty terribly when others attempt to do the communication for them, as demonstrated when they made the decision (as far as I know, overturned) two years ago to offer some features only on the Enterprise side of the Community-Enterprise split. If you’re being completely honest and open with your communications to your customers and users, a post like mine should spark a “so what, we already heard that” instead of a huge storm of debate.
From my perspective, MySQL has never been “true” Open Source, as the documentation (which I helped to write a large part of as an early employee of MySQL) is not free, and never has been.
All of this is just fine, but don’t be surprised when the public is critical of you, and isn’t clamoring for the mountain tops to sing your “open source” praises.
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:11:"Jeremy Cole";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:34;a:6:{s:4:"data";s:53:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:55:"Do you really want autoReconnect to silently reconnect?";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:38:"http://mysqlblog.fivefarmers.com/?p=19";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:99:"http://mysqlblog.fivefarmers.com/2010/07/05/do-you-really-want-autoreconnect-to-silently-reconnect/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:3716:"Chances are, if you write Java applications using MySQL’s Connector/J driver, you’ve run across the autoReconnect property. I remember that when I first found it, it seemed I had found the grail itself. “No more nasty connection closed error messages,” I thought. Except … it doesn’t really work that way, does it? I’ve seen this question asked many times in many different contexts: “Why doesn’t Connector/J just reconnect to MySQL and re-issue my statement, instead of throwing this Exception?”
There are actually a number of reasons, starting with loss of transactional integrity. The MySQL Manual states that “there is no safe method of reconnecting to the MySQL server without risking some corruption of the connection state or database state information.” Imagine the following series of statements:
conn.createStatement().execute(
"UPDATE checking_account SET balance = balance - 1000.00 WHERE customer='Todd'");
conn.createStatement().execute(
"UPDATE savings_account SET balance = balance + 1000.00 WHERE customer='Todd'");
conn.commit();
Now, what happens if the connection to the server dies after the UPDATE to checking_account? If no Exception is thrown, and the application never learns about the problem, it keeps going. Except that the server never committed the transaction, so that gets rolled back. But then you start a new transaction by increasing the savings_account balance by 5. Your application never got an Exception, so it kept plodding through, eventually commiting. But the commit only applies to the changes made in the new connection, which means you’ve just increased the savings_account balance by 5 without the corresponding reduction to checking_account. Instead of transferring $1000.00, you just gave me $1000.00. Thanks!
“So?” you say. “I run with auto-commit enabled. That won’t cause any problems like that.”
Actually, it can be worse. When Connector/J encounters a communication problem, there’s no way to know whether the server processed the currently-executing statement or not. The following theoretical states are equally possible:
The server never received the statement, and therefore nothing happened on the server.
The server received the statement, executed it in full, but the response never got to the client.
If you are running with auto-commit enabled, you simply cannot guarantee the state of data on the server when a communication exception is encountered. The statement may have reached the server; it may have not. All you know is that communication died at some point, before the client received confirmation (or data) from the server. This doesn’t just affect auto-commit statements, though – imagine if the communication problem pops up during Connection.commit(). Did it commit on the server before communication died? Or did the server never receive the COMMIT request? Ugh.
There’s also transaction-scoped contextual data to be concerned about. For example:
Temporary tables
User-defined variables
Server-side prepared statements
These things all die when connections die, and if your application uses any of them, any number of ugly things could happen- some silently – if a connection is re-established, but the application plods on unknowingly.
The bottom line is that communication errors generate conditions which may well be unsafe for the driver to silently reconnect and retry, and the application should be notified. As an application developer, how you handle that information is up to you, but you should be glad that Connector/J notifies you.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Mon, 05 Jul 2010 15:00:54 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:3:{i:0;a:5:{s:4:"data";s:4:"Java";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:11:"Connector/J";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:4556:"Chances are, if you write Java applications using MySQL’s Connector/J driver, you’ve run across the autoReconnect property. I remember that when I first found it, it seemed I had found the grail itself. “No more nasty connection closed error messages,” I thought. Except … it doesn’t really work that way, does it? I’ve seen this question asked many times in many different contexts: “Why doesn’t Connector/J just reconnect to MySQL and re-issue my statement, instead of throwing this Exception?”
There are actually a number of reasons, starting with loss of transactional integrity. The MySQL Manual states that “there is no safe method of reconnecting to the MySQL server without risking some corruption of the connection state or database state information.” Imagine the following series of statements:
conn.createStatement().execute(
"UPDATE checking_account SET balance = balance - 1000.00 WHERE customer='Todd'");
conn.createStatement().execute(
"UPDATE savings_account SET balance = balance + 1000.00 WHERE customer='Todd'");
conn.commit();
Now, what happens if the connection to the server dies after the UPDATE to checking_account? If no Exception is thrown, and the application never learns about the problem, it keeps going. Except that the server never committed the transaction, so that gets rolled back. But then you start a new transaction by increasing the savings_account balance by 5. Your application never got an Exception, so it kept plodding through, eventually commiting. But the commit only applies to the changes made in the new connection, which means you’ve just increased the savings_account balance by 5 without the corresponding reduction to checking_account. Instead of transferring $1000.00, you just gave me $1000.00. Thanks!
“So?” you say. “I run with auto-commit enabled. That won’t cause any problems like that.”
Actually, it can be worse. When Connector/J encounters a communication problem, there’s no way to know whether the server processed the currently-executing statement or not. The following theoretical states are equally possible:
- The server never received the statement, and therefore nothing happened on the server.
- The server received the statement, executed it in full, but the response never got to the client.
If you are running with auto-commit enabled, you simply cannot guarantee the state of data on the server when a communication exception is encountered. The statement may have reached the server; it may have not. All you know is that communication died at some point, before the client received confirmation (or data) from the server. This doesn’t just affect auto-commit statements, though – imagine if the communication problem pops up during Connection.commit(). Did it commit on the server before communication died? Or did the server never receive the COMMIT request? Ugh.
There’s also transaction-scoped contextual data to be concerned about. For example:
- Temporary tables
- User-defined variables
- Server-side prepared statements
These things all die when connections die, and if your application uses any of them, any number of ugly things could happen- some silently – if a connection is re-established, but the application plods on unknowingly.
The bottom line is that communication errors generate conditions which may well be unsafe for the driver to silently reconnect and retry, and the application should be notified. As an application developer, how you handle that information is up to you, but you should be glad that Connector/J notifies you.
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:11:"Todd Farmer";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:35;a:6:{s:4:"data";s:43:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:34:"How is join_buffer_size allocated?";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:43:"http://www.mysqlperformanceblog.com/?p=3252";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:81:"http://www.mysqlperformanceblog.com/2010/07/05/how-is-join_buffer_size-allocated/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:3109:"When examining MySQL configuration, we quite often want to know how various buffer sizes are used. This matters because some buffers (sort_buffer_size for example) are allocated to their full size immediately as soon as they are needed, but others are effectively a "max size" and the corresponding buffers are allocated only as big as needed (key_buffer_size). There are many examples of this. What about join_buffer_size?
I saw a my.cnf with a 128M join_buffer_size the other day and needed to research this quickly before I gave advice. The join buffer is a special case. Unlike many of the buffers that are allocated per-thread (i.e. per-connection), this one is allocated per-join-per-thread, in special cases. A join buffer is allocated to cache rows from each table in a join when the join can't use an index. This is because we know that the nested loop is effectively going to do a table scan on the inner table -- it has to, because there's no index. If the query joins several tables this way, you'll get several join buffers allocated, for example, this one will have two:
select * from a_table
join b_table on b.col1 = a.col1
join c_table on c.col2 = b.col2
You can see these un-indexed queries in SHOW STATUS as Select_full_join, and you want zero of them. Anyway, back to the question about how the buffer is allocated. Its meaning is actually "minimum join buffer size." Here's the code, in sql/sql_select.cc in 5.1.47 source:
14176 /*****************************************************************************
14177 Fill join cache with packed records
14178 Records are stored in tab->cache.buffer and last record in
14179 last record is stored with pointers to blobs to support very big
14180 records
14181 ******************************************************************************/
14182
14183 static int
14184 join_init_cache(THD *thd,JOIN_TAB *tables,uint table_count)
14185 {
... snip ...
14268 cache->length=length+blobs*sizeof(char*);
14269 cache->blobs=blobs;
14270 *blob_ptr=0; /* End sequentel */
14271 size=max(thd->variables.join_buff_size, cache->length);
14272 if (!(cache->buff=(uchar*) my_malloc(size,MYF(0))))
14273 DBUG_RETURN(1); /* Don't use cache */ /* purecov: inspected */
14274 cache->end=cache->buff+size;
14275 reset_cache_write(cache);
14276 DBUG_RETURN(0);
14277 }
On line 14271 the server decides the size of the cache: it is the greater of the join_buffer_size or the size that's been determined to be needed. And it's allocated all at once. So a 128M join_buffer_size is indeed very bad!
And this leads me to an interesting possibility to run the server out of memory:
set session join_buffer_size = 1 << 30; # 1GB
select * from
(select 1 union select 1) as x1
join (select 1 union select 1) as x2
join....
That should try to allocate 1GB of memory for each join. If you execute this and nothing bad happens, you might be seeing this bug: http://bugs.mysql.com/55002
Entry posted by Baron Schwartz |
One comment
Add to: | | | | ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Mon, 05 Jul 2010 14:56:31 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:1:{i:0;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:5349:"When examining MySQL configuration, we quite often want to know how various buffer sizes are used. This matters because some buffers (sort_buffer_size for example) are allocated to their full size immediately as soon as they are needed, but others are effectively a "max size" and the corresponding buffers are allocated only as big as needed (key_buffer_size). There are many examples of this. What about join_buffer_size?
I saw a my.cnf with a 128M join_buffer_size the other day and needed to research this quickly before I gave advice. The join buffer is a special case. Unlike many of the buffers that are allocated per-thread (i.e. per-connection), this one is allocated per-join-per-thread, in special cases. A join buffer is allocated to cache rows from each table in a join when the join can't use an index. This is because we know that the nested loop is effectively going to do a table scan on the inner table -- it has to, because there's no index. If the query joins several tables this way, you'll get several join buffers allocated, for example, this one will have two:
select * from a_table
join b_table on b.col1 = a.col1
join c_table on c.col2 = b.col2
You can see these un-indexed queries in SHOW STATUS as Select_full_join, and you want zero of them. Anyway, back to the question about how the buffer is allocated. Its meaning is actually "minimum join buffer size." Here's the code, in sql/sql_select.cc in 5.1.47 source:
14176 /*****************************************************************************
14177 Fill join cache with packed records
14178 Records are stored in tab->cache.buffer and last record in
14179 last record is stored with pointers to blobs to support very big
14180 records
14181 ******************************************************************************/
14182
14183 static int
14184 join_init_cache(THD *thd,JOIN_TAB *tables,uint table_count)
14185 {
... snip ...
14268 cache->length=length+blobs*sizeof(char*);
14269 cache->blobs=blobs;
14270 *blob_ptr=0; /* End sequentel */
14271 size=max(thd->variables.join_buff_size, cache->length);
14272 if (!(cache->buff=(uchar*) my_malloc(size,MYF(0))))
14273 DBUG_RETURN(1); /* Don't use cache */ /* purecov: inspected */
14274 cache->end=cache->buff+size;
14275 reset_cache_write(cache);
14276 DBUG_RETURN(0);
14277 }
On line 14271 the server decides the size of the cache: it is the greater of the join_buffer_size or the size that's been determined to be needed. And it's allocated all at once. So a 128M join_buffer_size is indeed very bad!
And this leads me to an interesting possibility to run the server out of memory:
set session join_buffer_size = 1 << 30; # 1GB
select * from
(select 1 union select 1) as x1
join (select 1 union select 1) as x2
join....
That should try to allocate 1GB of memory for each join. If you execute this and nothing bad happens, you might be seeing this bug: http://bugs.mysql.com/55002
Entry posted by Baron Schwartz |
One comment
Add to:
|
|
|
| 
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:22:"MySQL Performance Blog";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:36;a:6:{s:4:"data";s:68:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:42:"More on the open core : the pragmatic view";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:59:"tag:blogger.com,1999:blog-16959946.post-8398729666730250861";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:77:"http://datacharmer.blogspot.com/2010/07/more-on-open-core-pragmatic-view.html";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:278:" I joined the number of those who have a public opinion on the open core debate. Roberto Galoppini has graciously accepted to host a post on this topic in his Commercial Open Source Software blog. Please read it directly from there:Open to the core - The pragmatic freedomEnjoy!";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Mon, 05 Jul 2010 14:03:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:6:{i:0;a:5:{s:4:"data";s:11:"open source";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:9:"open core";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:9:"community";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:6:"oracle";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:8:"business";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:1177:"
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:14:"Giuseppe Maxia";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:37;a:6:{s:4:"data";s:53:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:54:"Tales of the Trade #4: new home for the MySQL dolphins";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:36:"http://code.openark.org/blog/?p=2655";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:87:"http://code.openark.org/blog/mysql/tales-of-the-trade-4-new-home-for-the-mysql-dolphins";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:0:"";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Mon, 05 Jul 2010 12:18:55 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:3:{i:0;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:5:"Humor";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:6:"Planet";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:573:"
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:12:"Shlomi Noach";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:38;a:6:{s:4:"data";s:58:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:27:"Vote for Maria’s new name";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:35:"http://www.bytebot.net/blog/?p=1825";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:83:"http://feedproxy.google.com/~r/ColinCharles/~3/bXS5N4hT7NE/vote-for-marias-new-name";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:725:"Remember the idea behind renaming Maria, the storage engine? People regularly get confused with Maria the storage engine, and MariaDB the database. We got a whole lot of good names, and here’s a short-listed amount of names (short-listed by Monty Program employees down to about 15), and now its up to YOU to help us choose what the new name should be. Quick, go take the survey.
Voting ends on Friday, July 9, 2010, at 23:59UTC.
The winner will be announced on the first day of OSCON on July 19 2010. The winner walks away with a Meerkat net-top from System76.
Related posts:Quick notes: Monty Program Group Blog; Rename Maria
Monty speaks about Maria
Trying to reliably make MyISAM crash; Maria is sturdy as
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Mon, 05 Jul 2010 07:52:44 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:4:{i:0;a:5:{s:4:"data";s:7:"MariaDB";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:7:"contest";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:6:"rename";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:3239:"Remember the idea behind renaming Maria, the storage engine? People regularly get confused with Maria the storage engine, and MariaDB the database. We got a whole lot of good names, and here’s a short-listed amount of names (short-listed by Monty Program employees down to about 15), and now its up to YOU to help us choose what the new name should be. Quick, go take the survey.
Voting ends on Friday, July 9, 2010, at 23:59UTC.
The winner will be announced on the first day of OSCON on July 19 2010. The winner walks away with a Meerkat net-top from System76.
Related posts:
- Quick notes: Monty Program Group Blog; Rename Maria
- Monty speaks about Maria
- Trying to reliably make MyISAM crash; Maria is sturdy as



PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:13:"Colin Charles";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:39;a:6:{s:4:"data";s:103:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:52:"Open Source Saves Malaysian Government RM188 Million";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:35:"http://www.bytebot.net/blog/?p=1821";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:111:"http://feedproxy.google.com/~r/ColinCharles/~3/TfCPFfBMg5M/open-source-saves-malaysian-government-rm188-million";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:1565:"Back in January 2009, we found out that the Malaysian Government had saved about RM40 million using open source. In a little over a year, that number has been topped: over the past six years, the total costs savings are now quoted to be RM188.39 million (USD$58.54 million)! That’s a hell of a lot of money for software licenses, don’t you think?
Worth noting is that before the OSS Master Plan started, there were zero companies supporting OSS registered with the Ministry of Finance. Now more than half of the 4,000 companies do (53% is the quoted number). For more information, read the latest newsletter from MAMPU’s OSCC. Key takeaways:
Saved RM188.39 million on software licenses over six years
Successful OSS adoption in 691 government agencies by the end of 2009 (till April 1 2010, the number looks like it has increased to 699 agencies).
In total, 95% of agencies are adopting some form of OSS solution, 87% are using it for back-end infrastructure (here its clear there’s Linux, MySQL in use), and 66% are using OSS on the desktop! (via OpenOffice.org and Firefox)*
* – Software use extrapolated from the actual OSS Master plan, and what was in the report in January 2009. I’m sure Joomla! is also used quite heavily, but never recall seeing it as the choice for CMS in the plan.
Related posts:Open Source saves Malaysian Government RM40 million
Free and Open Source Software: Use and Production by the Brazilian Government
Malaysian Government releases first Open Source software package – MyMeeting
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Mon, 05 Jul 2010 07:40:32 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:13:{i:0;a:5:{s:4:"data";s:13:"FLOSSAdvocacy";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:7:"Mozilla";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:14:"Open Standards";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:10:"OpenOffice";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:7:"firefox";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:10:"government";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:6;a:5:{s:4:"data";s:8:"malaysia";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:7;a:5:{s:4:"data";s:5:"MAMPU";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:8;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:9;a:5:{s:4:"data";s:14:"openoffice.org";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:10;a:5:{s:4:"data";s:10:"opensource";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:11;a:5:{s:4:"data";s:4:"OSCC";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:12;a:5:{s:4:"data";s:17:"software licenses";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:4458:"
Back in January 2009, we found out that the Malaysian Government had saved about RM40 million using open source. In a little over a year, that number has been topped: over the past six years, the total costs savings are now quoted to be RM188.39 million (USD$58.54 million)! That’s a hell of a lot of money for software licenses, don’t you think?
Worth noting is that before the OSS Master Plan started, there were zero companies supporting OSS registered with the Ministry of Finance. Now more than half of the 4,000 companies do (53% is the quoted number). For more information, read the latest newsletter from MAMPU’s OSCC. Key takeaways:
- Saved RM188.39 million on software licenses over six years
- Successful OSS adoption in 691 government agencies by the end of 2009 (till April 1 2010, the number looks like it has increased to 699 agencies).
- In total, 95% of agencies are adopting some form of OSS solution, 87% are using it for back-end infrastructure (here its clear there’s Linux, MySQL in use), and 66% are using OSS on the desktop! (via OpenOffice.org and Firefox)*
* – Software use extrapolated from the actual OSS Master plan, and what was in the report in January 2009. I’m sure Joomla! is also used quite heavily, but never recall seeing it as the choice for CMS in the plan.
Related posts:
- Open Source saves Malaysian Government RM40 million
- Free and Open Source Software: Use and Production by the Brazilian Government
- Malaysian Government releases first Open Source software package – MyMeeting



PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:13:"Colin Charles";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:40;a:6:{s:4:"data";s:83:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:72:"Speaking in the US - San Francisco User Group - Community Summit - OSCON";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:59:"tag:blogger.com,1999:blog-16959946.post-8444082858256593736";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:84:"http://datacharmer.blogspot.com/2010/07/speaking-in-us-san-francisco-user-group.html";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:364:" On July 15th and 16th I will be in San Francisco for a few meetings, and it will be my pleasure to meet the San Francisco MySQL User Group, where I will talk about MySQL Sandbox.From there, I will continue to Portland, to attend the Community Leadership Forum and, of course OSCON, where I will have three talks: two in the main event, and one at the Intel booth.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Mon, 05 Jul 2010 05:00:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:9:{i:0;a:5:{s:4:"data";s:5:"intel";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:10:"conference";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:9:"community";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:13:"san francisco";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:5:"oscon";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:6;a:5:{s:4:"data";s:10:"leadership";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:7;a:5:{s:4:"data";s:10:"user group";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:8;a:5:{s:4:"data";s:6:"oracle";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:1544:"

PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:14:"Giuseppe Maxia";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:41;a:6:{s:4:"data";s:73:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:54:"MySQL track at the German Oracle User Group conference";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:64:"http://blogs.sun.com/datacharmer/entry/mysql_track_at_the_german";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:64:"http://blogs.sun.com/datacharmer/entry/mysql_track_at_the_german";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:1043:"
As we have seen for other events, the MySQL community has been invited to attend and participate in conferences organized by the Oracle User Groups.
After the past, present and future events in the United States, now we start with Europe.
There is a MySQL Track at the DOAG 2010 Conference, the main event of the German Oracle User Groups, and the CfP expires on July 10th.
The event is of course important for German speakers, but English speakers are also accepted.
As the other events in the US, this is a good occasion for MySQL users to get acquainted with the independent Oracle user group organization, and find common business needs. There are many MySQL users among the Oracle User Group members, and much curiosity about this small database that powers the Internet.
If you want to get a talk at this conference, feel free to submit a proposal. Or simply mark the dates: November 16th to 18th.
MySQL users, don't be shy!
Follow Paul McCullagh's example, and get ready to explore a yet uncharted but promising territory.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Mon, 05 Jul 2010 04:00:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:7:{i:0;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:9:"community";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:10:"conference";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:4:"doag";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:6:"groups";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:6;a:5:{s:4:"data";s:4:"user";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:2131:"
|
As we have seen for other events, the MySQL community has been invited to attend and participate in conferences organized by the Oracle User Groups.
After the past, present and future events in the United States, now we start with Europe.
There is a MySQL Track at the DOAG 2010 Conference, the main event of the German Oracle User Groups, and the CfP expires on July 10th.
| |
The event is of course important for German speakers, but English speakers are also accepted.
As the other events in the US, this is a good occasion for MySQL users to get acquainted with the independent Oracle user group organization, and find common business needs. There are many MySQL users among the Oracle User Group members, and much curiosity about this small database that powers the Internet.
If you want to get a talk at this conference, feel free to submit a proposal. Or simply mark the dates: November 16th to 18th.
MySQL users, don't be shy!
Follow Paul McCullagh's example, and get ready to explore a yet uncharted but promising territory.
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:14:"Giuseppe Maxia";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:42;a:6:{s:4:"data";s:78:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:59:"A review of Cloud Application Architectures by George Reese";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:33:"http://www.xaprb.com/blog/?p=1937";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:97:"http://www.xaprb.com/blog/2010/07/04/a-review-of-cloud-application-architectures-by-george-reese/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:6538:"Cloud Application Architectures
Cloud Application Architectures. By George Reese, O’Reilly 2009. (Here’s a link to the publisher’s site).
This is a great book on how to build apps in the cloud! I was happy to see how much depth it went into. It’s short — 150 pages plus some appendixes — so I was expecting it to be a superficial overview. But it isn’t. It is thorough. And it is also obviously built on his own experience building very specific applications that he uses to run his business — he isn’t preaching about stuff he doesn’t know first-hand. Finally, George Reese is a good writer! It’s impressive. This is how he covers so much ground with so much depth in so few pages, and it all makes sense. He takes a side trip every now and then, but it’s always in the right place at the right time — how to do a snapshot for backups, for example — and isn’t distracting. For a technical book, it has an amazing narrative flow.
The book begins with an intro to cloud computing in general, with definitions and an explanation of different models, plus cost estimates of traditional IT, managed hosting, and cloud computing for an app. There’s a brief overview of the Amazon platform. This book is mostly about Amazon, and states that up front. There are references and comparisons to other providers throughout, and later there’ll be two appendixes on GoGrid and Rackspace, each written by a representative of that company. I was happy that the author brought in people to write those, instead of doing it himself. They are non-promotional in nature, and quite short. That adds value to the book, which would have been fine without them, honestly.
Back to chapter two now — a deeper introduction to Amazon, moving through all the major components, but especially EC2, S3, and EBS. Here we also start to see a focus on the platform as a whole — availability zones, security, redundancy, reliability. These topics are treated fairly and woven into every chapter. It’s clear that the author doesn’t want to isolate these topics, but rather explain them in context so your mind is always on them as each new topic is introduced. Chapter 3 picks all this up again: considering a move into the cloud? More cost comparisons, more explanations of concepts such as availability and how they translate into the Amazon cloud. Performance, disaster recovery and a few other topics show up here.
Chapter 4 is about how to build an app in the cloud: web app design, making multiple machines work together, handling failure, building AMIs, privacy, and operating databases (especially MySQL) in the cloud. The privacy section is particularly good. I’d recommend this to anyone building an app that might process personally identifiable information or financial information, in or out of the cloud. And as I said already, this is one of the types of things he weaves into the whole book. Chapter 5 picks right up and keeps going: it’s about security. Data security, regulatory compliance, network security, host security, how to respond if there’s a breach. And then Chapter 6 is on disaster recovery: planning, implementing, managing.
Chapter 7 is titled “scaling,” but it’s more than that. It starts with capacity planning. Here’s one of my favorite quotes: “some think they no longer need to engage in capacity planning… [others] think of tens or hundreds of thousands of dollars in consulting fees. Both thoughts are dangerous myths…” There’s a reference to John Allspaw’s excellent book on capacity planning. (I saw that he was a tech reviewer for this book, too.) This chapter covers how you predict and provision for capacity needs in the cloud, including the “automatic scaling” holy grail, how it can bite you, and how to keep that from happening. It also talks about how you scale vertically in the cloud. It doesn’t talk about why it’s hard to really be sure about your capacity needs in the cloud, but that’s okay given the other material covered in the chapter.
And that’s it! After this, it’s 3 appendixes. One is an AWS reference, and then there’s the two on GoGrid and Rackspace.
What’s to criticize? Well, not a lot really. I read every word in this book, I promise. Here’s what I noticed: he talked about database corruption from unexpected shutdowns — he should have said “use InnoDB,” because that’s pretty much a MyISAM problem. He talked about taking backups from replication slaves — he should have said “don’t just trust replication, verify it with mk-table-checksum.” I also think he encourages a little too much trust that cloud providers are always magically going to have the capacity you need; it felt a bit naive, but this is actually a fundamental point in whether you’re going to use the cloud or not. Nobody knows how much excess capacity Amazon has, and as we know, weird things happen. But if you’re going to embrace a cloud platform, you’re going to have to trust to a certain extent.
A couple other things to nitpick: in Chapter 1, when talking about availability, he writes “[if] even 1 minute of downtime in a year is entirely unacceptable, you almost certainly want to opt for a managed services environment… [if] 99.995% is good enough, you can’t beat the cloud.” But these numbers are unrealistic and don’t have enough context to explain what he means. Finally, in a couple of places he talks about algorithms for generating unique identifiers and dealing with concurrent access, but these don’t have a deep enough explanation to prevent novices from shooting themselves in the foot with wrong assumptions such as a timestamp will always increase between each subsequent access. But a savvy developer will recognize those problems and won’t be bitten.
This book is the first one to go onto my list of essential books in a while. I’ll be keeping this one on my own bookshelf.
Related posts:Review of Scalable Internet Architectures by Theo SchlossnagleA review of The Art of Capacity Planning by John AllspawUnder-provisioning: the curse of the cloudA review of Pentaho Solutions by Roland Bouman and Jos van DongenA review of Web Operations by John Allspaw and Jesse Robbins";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Mon, 05 Jul 2010 02:36:28 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:8:{i:0;a:5:{s:4:"data";s:6:"Review";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:3:"SQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:3:"AWS";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:15:"Cloud Computing";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:12:"George Reese";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:6:"GoGrid";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:6;a:5:{s:4:"data";s:12:"John Allspaw";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:7;a:5:{s:4:"data";s:9:"Rackspace";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:8684:"
Cloud Application Architectures
Cloud Application Architectures. By George Reese, O’Reilly 2009. (Here’s a link to the publisher’s site).
This is a great book on how to build apps in the cloud! I was happy to see how much depth it went into. It’s short — 150 pages plus some appendixes — so I was expecting it to be a superficial overview. But it isn’t. It is thorough. And it is also obviously built on his own experience building very specific applications that he uses to run his business — he isn’t preaching about stuff he doesn’t know first-hand. Finally, George Reese is a good writer! It’s impressive. This is how he covers so much ground with so much depth in so few pages, and it all makes sense. He takes a side trip every now and then, but it’s always in the right place at the right time — how to do a snapshot for backups, for example — and isn’t distracting. For a technical book, it has an amazing narrative flow.
The book begins with an intro to cloud computing in general, with definitions and an explanation of different models, plus cost estimates of traditional IT, managed hosting, and cloud computing for an app. There’s a brief overview of the Amazon platform. This book is mostly about Amazon, and states that up front. There are references and comparisons to other providers throughout, and later there’ll be two appendixes on GoGrid and Rackspace, each written by a representative of that company. I was happy that the author brought in people to write those, instead of doing it himself. They are non-promotional in nature, and quite short. That adds value to the book, which would have been fine without them, honestly.
Back to chapter two now — a deeper introduction to Amazon, moving through all the major components, but especially EC2, S3, and EBS. Here we also start to see a focus on the platform as a whole — availability zones, security, redundancy, reliability. These topics are treated fairly and woven into every chapter. It’s clear that the author doesn’t want to isolate these topics, but rather explain them in context so your mind is always on them as each new topic is introduced. Chapter 3 picks all this up again: considering a move into the cloud? More cost comparisons, more explanations of concepts such as availability and how they translate into the Amazon cloud. Performance, disaster recovery and a few other topics show up here.
Chapter 4 is about how to build an app in the cloud: web app design, making multiple machines work together, handling failure, building AMIs, privacy, and operating databases (especially MySQL) in the cloud. The privacy section is particularly good. I’d recommend this to anyone building an app that might process personally identifiable information or financial information, in or out of the cloud. And as I said already, this is one of the types of things he weaves into the whole book. Chapter 5 picks right up and keeps going: it’s about security. Data security, regulatory compliance, network security, host security, how to respond if there’s a breach. And then Chapter 6 is on disaster recovery: planning, implementing, managing.
Chapter 7 is titled “scaling,” but it’s more than that. It starts with capacity planning. Here’s one of my favorite quotes: “some think they no longer need to engage in capacity planning… [others] think of tens or hundreds of thousands of dollars in consulting fees. Both thoughts are dangerous myths…” There’s a reference to John Allspaw’s excellent book on capacity planning. (I saw that he was a tech reviewer for this book, too.) This chapter covers how you predict and provision for capacity needs in the cloud, including the “automatic scaling” holy grail, how it can bite you, and how to keep that from happening. It also talks about how you scale vertically in the cloud. It doesn’t talk about why it’s hard to really be sure about your capacity needs in the cloud, but that’s okay given the other material covered in the chapter.
And that’s it! After this, it’s 3 appendixes. One is an AWS reference, and then there’s the two on GoGrid and Rackspace.
What’s to criticize? Well, not a lot really. I read every word in this book, I promise. Here’s what I noticed: he talked about database corruption from unexpected shutdowns — he should have said “use InnoDB,” because that’s pretty much a MyISAM problem. He talked about taking backups from replication slaves — he should have said “don’t just trust replication, verify it with mk-table-checksum.” I also think he encourages a little too much trust that cloud providers are always magically going to have the capacity you need; it felt a bit naive, but this is actually a fundamental point in whether you’re going to use the cloud or not. Nobody knows how much excess capacity Amazon has, and as we know, weird things happen. But if you’re going to embrace a cloud platform, you’re going to have to trust to a certain extent.
A couple other things to nitpick: in Chapter 1, when talking about availability, he writes “[if] even 1 minute of downtime in a year is entirely unacceptable, you almost certainly want to opt for a managed services environment… [if] 99.995% is good enough, you can’t beat the cloud.” But these numbers are unrealistic and don’t have enough context to explain what he means. Finally, in a couple of places he talks about algorithms for generating unique identifiers and dealing with concurrent access, but these don’t have a deep enough explanation to prevent novices from shooting themselves in the foot with wrong assumptions such as a timestamp will always increase between each subsequent access. But a savvy developer will recognize those problems and won’t be bitten.
This book is the first one to go onto my list of essential books in a while. I’ll be keeping this one on my own bookshelf.
Related posts:
- Review of Scalable Internet Architectures by Theo Schlossnagle
- A review of The Art of Capacity Planning by John Allspaw
- Under-provisioning: the curse of the cloud
- A review of Pentaho Solutions by Roland Bouman and Jos van Dongen
- A review of Web Operations by John Allspaw and Jesse Robbins
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:22:"Baron Schwartz (xaprb)";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:43;a:6:{s:4:"data";s:58:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:63:"If you're selling to your community... you've got it backwards.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:25:"290 at http://openlife.cc";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:89:"http://openlife.cc/blogs/2010/july/if-youre-selling-your-community-youve-got-it-backwards";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:932:"My ongoing dialogue with Matthew Aslett inspired me to read more of his recent writings. An excellent piece Do not sell anything to your community is based on a blog post by Stephen Walli.
Inspired by Stephen, I also looked into a set of slides I recently created and will try that style for this post...
Aslett and Stephen make a great point:
the conversion of community users into paying customers has long been a concern for open source-related vendors. It has also long been a source of friction, with vendors that offer proprietary extensions being accused of “bait and switch” or otherwise undermining the value of the open source software in an attempt compel community users into becoming paying customers. In recent years the next generation of start-ups has learned that the best way to encourage a frictionless relationship between a vendor and its community is not to attempt to “convert” users at all.
read more";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Sun, 04 Jul 2010 21:24:32 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:4:{i:0;a:5:{s:4:"data";s:15:"Business models";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:9:"Community";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:11:"Open Source";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:1555:"My ongoing dialogue with Matthew Aslett inspired me to read more of his recent writings. An excellent piece Do not sell anything to your community is based on a blog post by Stephen Walli.
Inspired by Stephen, I also looked into a set of slides I recently created and will try that style for this post...
Aslett and Stephen make a great point:
the conversion of community users into paying customers has long been a concern for open source-related vendors. It has also long been a source of friction, with vendors that offer proprietary extensions being accused of “bait and switch” or otherwise undermining the value of the open source software in an attempt compel community users into becoming paying customers. In recent years the next generation of start-ups has learned that the best way to encourage a frictionless relationship between a vendor and its community is not to attempt to “convert” users at all.
read more
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:11:"Henrik Ingo";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:44;a:6:{s:4:"data";s:113:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:60:"A review of Web Operations by John Allspaw and Jesse Robbins";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:33:"http://www.xaprb.com/blog/?p=1924";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:98:"http://www.xaprb.com/blog/2010/07/03/a-review-of-web-operations-by-john-allspaw-and-jesse-robbins/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:4721:"Web Operations
Web Operations. By John Allspaw and Jesse Robbins, O’Reilly 2010, with a chapter by myself. (Here’s a link to the publisher’s site).
I wrote a chapter for this book, and it’s now on shelves in bookstores near you. I got my dead-tree copy today and read everyone else’s contributions to it. It’s a good book. A group effort such as this one is necessarily going to have some differences in style and even overlapping content, but overall it works very well. It includes chapters from some really smart people, some of whom I was not previously familiar with. John and Jesse obviously have good connections. A lot of the folks are from Flickr.
Here are the highlights in my opinion.
Theo Schlossnagle, who has a place on my list of essential books, opens things with an overview of what web operations really is, and why it’s hard. Don’t skip this. Theo’s introduction is concise and thoughtful.
Eric Ries discusses the benefits of continuous deployment. He is right on the money. Right out of college I spent 3 years as a developer at a company with very little engineering discipline, and then left for another company built by a small ace team practicing extreme programming. Eric nails the benefits of continuous deployment — he really gets it. I hadn’t heard of Eric before, but now I’ve subscribed to his blog.
John Allspaw (whose book on capacity planning is also on my list of essentials) and Richard Cook discuss how complex systems fail. This chapter appeared in part as a whitepaper and blog post on John’s blog, and is expanded in this book. I have spent a lot of time examining failures for clients, and as VP of Consulting, also a lot of time examining Percona’s own mistakes. I fully agree with the conclusions in this chapter. A few key points: there is never a single root cause; our desire to find one blinds us and keeps us from learning; true failures are inherently unpredictable and happen only when a series of things fails; avoiding failure requires experience with failure. This echoes another book I’ve read recently, The Black Swan.
Brian Moon’s chapter on unexpected traffic spikes. If you get a chance to hear Brian speak, take it. He’s an engaging guy with interesting and relevant stories to tell. Stories are always a better experience than bullet points.
Jake Loomis’s chapter on postmortems. My own research into prevention of emergencies agrees almost perfectly with his list of things to do on page 225. Read this chapter carefully! Now, knowing how to put this into action is hard — very hard — but at least you’ll have a place to start. The worst compliment I ever got after fixing a system that’d run out of hard drive space (due to utter lack of basic monitoring) was that I’d “saved the day.” Baloney. Postmortems can be a great way to learn your infrastructure’s weaknesses and prevent emergencies in the future. I’m fully confident that this particular client will again deploy new servers without adding them into Nagios, and the results will be predictable.
Naturally, my chapter about choosing a relational database architecture for web applications (skewed towards MySQL). There is a chapter on NoSQL databases by Eric Florenzano as well, but it is more introductionary-level.
What wasn’t so good? I didn’t get a lot of value out of John’s interview with Heather Champ, on community management and web operations. I did not think the interview format worked well in a book full of essays. But that might just be me. Also, a couple of places in two or three chapters felt a bit rant-ish without a lot of clear actionable advice; I think readers won’t get so much out of this.
Overall, though, this is a great book, badly needed, on a topic that is simply not yet recognized for its true importance. As Theo writes, we’re seeing the emergence of web operations as a very large profession; it’s one whose definition is not yet formalized or agreed-upon, but that’ll change. It’s too important not to. Jesse’s introduction repeats this sentiment: the world now relies on the web, and so the world relies also on the engineers who make it run. Web operations is work that matters.
Related posts:A review of The Art of Capacity Planning by John AllspawMy chapter in the forthcoming Web Operations bookReview of Scalable Internet Architectures by Theo SchlossnagleA review of Cloud Application Architectures by George ReeseA review of Cacti 0.8 Network Monitoring by Dinangkur Kundu and S. M. Ibrahim Lavlu";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Sun, 04 Jul 2010 02:02:27 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:15:{i:0;a:5:{s:4:"data";s:10:"Operations";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:6:"Review";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:3:"SQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:9:"Sys Admin";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:10:"Brian Moon";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:15:"Eric Florenzano";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:6;a:5:{s:4:"data";s:9:"Eric Ries";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:7;a:5:{s:4:"data";s:13:"Heather Champ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:8;a:5:{s:4:"data";s:11:"Jake Loomis";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:9;a:5:{s:4:"data";s:13:"Jesse Robbins";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:10;a:5:{s:4:"data";s:12:"John Allspaw";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:11;a:5:{s:4:"data";s:5:"NoSQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:12;a:5:{s:4:"data";s:12:"Richard Cook";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:13;a:5:{s:4:"data";s:17:"Theo Schlossnagle";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:14;a:5:{s:4:"data";s:14:"Web Operations";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:6962:"
Web Operations
Web Operations. By John Allspaw and Jesse Robbins, O’Reilly 2010, with a chapter by myself. (Here’s a link to the publisher’s site).
I wrote a chapter for this book, and it’s now on shelves in bookstores near you. I got my dead-tree copy today and read everyone else’s contributions to it. It’s a good book. A group effort such as this one is necessarily going to have some differences in style and even overlapping content, but overall it works very well. It includes chapters from some really smart people, some of whom I was not previously familiar with. John and Jesse obviously have good connections. A lot of the folks are from Flickr.
Here are the highlights in my opinion.
- Theo Schlossnagle, who has a place on my list of essential books, opens things with an overview of what web operations really is, and why it’s hard. Don’t skip this. Theo’s introduction is concise and thoughtful.
- Eric Ries discusses the benefits of continuous deployment. He is right on the money. Right out of college I spent 3 years as a developer at a company with very little engineering discipline, and then left for another company built by a small ace team practicing extreme programming. Eric nails the benefits of continuous deployment — he really gets it. I hadn’t heard of Eric before, but now I’ve subscribed to his blog.
- John Allspaw (whose book on capacity planning is also on my list of essentials) and Richard Cook discuss how complex systems fail. This chapter appeared in part as a whitepaper and blog post on John’s blog, and is expanded in this book. I have spent a lot of time examining failures for clients, and as VP of Consulting, also a lot of time examining Percona’s own mistakes. I fully agree with the conclusions in this chapter. A few key points: there is never a single root cause; our desire to find one blinds us and keeps us from learning; true failures are inherently unpredictable and happen only when a series of things fails; avoiding failure requires experience with failure. This echoes another book I’ve read recently, The Black Swan.
- Brian Moon’s chapter on unexpected traffic spikes. If you get a chance to hear Brian speak, take it. He’s an engaging guy with interesting and relevant stories to tell. Stories are always a better experience than bullet points.
- Jake Loomis’s chapter on postmortems. My own research into prevention of emergencies agrees almost perfectly with his list of things to do on page 225. Read this chapter carefully! Now, knowing how to put this into action is hard — very hard — but at least you’ll have a place to start. The worst compliment I ever got after fixing a system that’d run out of hard drive space (due to utter lack of basic monitoring) was that I’d “saved the day.” Baloney. Postmortems can be a great way to learn your infrastructure’s weaknesses and prevent emergencies in the future. I’m fully confident that this particular client will again deploy new servers without adding them into Nagios, and the results will be predictable.
- Naturally, my chapter about choosing a relational database architecture for web applications (skewed towards MySQL). There is a chapter on NoSQL databases by Eric Florenzano as well, but it is more introductionary-level.
What wasn’t so good? I didn’t get a lot of value out of John’s interview with Heather Champ, on community management and web operations. I did not think the interview format worked well in a book full of essays. But that might just be me. Also, a couple of places in two or three chapters felt a bit rant-ish without a lot of clear actionable advice; I think readers won’t get so much out of this.
Overall, though, this is a great book, badly needed, on a topic that is simply not yet recognized for its true importance. As Theo writes, we’re seeing the emergence of web operations as a very large profession; it’s one whose definition is not yet formalized or agreed-upon, but that’ll change. It’s too important not to. Jesse’s introduction repeats this sentiment: the world now relies on the web, and so the world relies also on the engineers who make it run. Web operations is work that matters.
Related posts:
- A review of The Art of Capacity Planning by John Allspaw
- My chapter in the forthcoming Web Operations book
- Review of Scalable Internet Architectures by Theo Schlossnagle
- A review of Cloud Application Architectures by George Reese
- A review of Cacti 0.8 Network Monitoring by Dinangkur Kundu and S. M. Ibrahim Lavlu
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:22:"Baron Schwartz (xaprb)";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:45;a:6:{s:4:"data";s:38:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:5:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:63:"MySQL Support has a good home at Oracle – and we’re hiring!";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:37:"http://mysqlblog.fivefarmers.com/?p=7";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:37:"http://mysqlblog.fivefarmers.com/?p=7";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:347:"There’s exciting news about Ulf Sandberg starting up a new MySQL services company. I respect Ulf tremendously – he’s a great guy with good business sense. That said, others have suggested that staff from the MySQL Support team need a new home – an assertion that simply is untrue. MySQL Support has a great home [...]";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Sat, 03 Jul 2010 21:57:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:560:"There’s exciting news about Ulf Sandberg starting up a new MySQL services company. I respect Ulf tremendously – he’s a great guy with good business sense. That said, others have suggested that staff from the MySQL Support team need a new home – an assertion that simply is untrue. MySQL Support has a great home [...]
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:11:"Todd Farmer";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:46;a:6:{s:4:"data";s:38:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:5:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:35:"Where have all the MySQL DBAs gone?";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:69:"tag:blogger.com,1999:blog-375697951860081841.post-9201771631212091364";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:74:"http://www.jonathanlevin.co.uk/2010/07/where-have-all-mysql-dbas-gone.html";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:4769:"I have a friend who works as a MySQL recruiter. He recently told me that he cannot find any more MySQL DBAs in the UK or in the neighbouring countries. His words were "I've exhausted the market".
I myself know that there is a shortage of MySQL DBAs and that a lot of recruitment agencies are looking for them. So I started to wonder how come the situation is the way it is?
Experience curve
Usually what I see in companies, is that they start out with the developers taking care of the databases along side their usual coding duties. At some point, they complain to management that they need a professional DBA to handle the database. When the complaints get loud enough, management looks for a new DBA.
The problem is that the DBA they want needs to have 3-5 years experience as a MySQL DBA.
It seems to me that there is no "phasing into" being a MySQL DBA or at least, you are expected to know a lot to begin with.
As my recruiter friend said, DBAs come from 2 streams: developer or sysadmin.
I, myself, came from the developer side. I started as a developer and got more and more interested in the database. At some point, I was the only person in the company that knew anything about databases, so I was the first one to turn to when there was a problem. Next thing I knew, I was a DBA.
Since I come from a developer background, I have some understanding of the code and how it interacts with the database. I usually put an emphasis on queries and schema design over server settings.
DBAs from sysadmin background, start as a system or network administrator that are at some point assigned responsibility over the databases (with or without their consent) .
Most jobs do actually prefer a DBA from a sysadmin background over one from a developer background.
The problem with phasing into being a MySQL DBA position, is that you would need to have other responsibilities apart from looking after databases. Eventually, you can phase into a position that needs most of your time to do that.
Its just that the in between is such a large chasm to cross, that many people don't really try to (especially developers).
Its not that interesting
For some people being a DBA most of the time isn't that interesting. I have seen DBAs opening their own companies or trying to become managers. I, myself, do a lot of data projects in addition to looking after the databases.
Companies that try to find a DBA, because the other developers complained for one, will almost definitely not have a lot of work for a that DBA a few months after they come in.
Unless you work for a huge company like facebook or a consultancy like pythian, you will not find enough work as a MySQL DBA to avoid you from being bored.
Salary
There is also a problem with salaries. From my experience, most companies look at MySQL DBAs as PHP developers with a bit extra. And in most companies (but recently this has changed a bit), PHP developers get paid quit low.
(To be honest, when I say "most companies" I mean LAMP shops or LAMP e-commerce companies. Usually, if you are the type of company to adopt open source, you wont have enough money to pay people well. This is of course changing and my current company is one of those exceptions, but more often then not, its that way.)
So accordingly, MySQL DBAs get offered salaries similar to mid to high level PHP developers.
This can be fine for some people, but when you look over at the MS SQL and Oracle side.. it leave a lot to be desired. And back to the point of "phasing into" being a DBA, the proprietary databases have all sorts of levels. From beginner at "not so bad salary" to expert at "extremely well paid salary" to consultant at "obscenely high salary". There are very few MySQL DBA consultancy jobs that I have seen (almost all of them in London) and the pay is literately half of Oracle or MS SQL consultancy jobs.
It would be nice to see more balanced salaries for DBAs across the different technologies and recruiters like my friend try to achieve that.
What needs to be done
I think its a difficult situation, to be honest. Even if companies try to encourage more people to become MySQL DBAs, it will still take some years for anything to change. This adds to the problem, I believe, of companies accepting MySQL as their database of choice. They can find MS SQL and Oracle DBAs easy enough (they just have to pay a lot), but how can they justify MySQL when it takes (at least) 3-6 months to secure a MySQL DBA (if at all).
However, companies can try to accommodate DBAs by allowing for more mixed positions (either developer or sysadmins) as well as training.
It will be interesting seeing this space in the market develop.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Sat, 03 Jul 2010 19:04:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:5469:"I have a friend who works as a MySQL recruiter. He recently told me that he cannot find any more MySQL DBAs in the UK or in the neighbouring countries. His words were "I've exhausted the market".
I myself know that there is a shortage of MySQL DBAs and that a lot of recruitment agencies are looking for them. So I started to wonder how come the situation is the way it is?
Experience curve
Usually what I see in companies, is that they start out with the developers taking care of the databases along side their usual coding duties. At some point, they complain to management that they need a professional DBA to handle the database. When the complaints get loud enough, management looks for a new DBA.
The problem is that the DBA they want needs to have 3-5 years experience as a MySQL DBA.
It seems to me that there is no "phasing into" being a MySQL DBA or at least, you are expected to know a lot to begin with.
As my recruiter friend said, DBAs come from 2 streams: developer or sysadmin.
I, myself, came from the developer side. I started as a developer and got more and more interested in the database. At some point, I was the only person in the company that knew anything about databases, so I was the first one to turn to when there was a problem. Next thing I knew, I was a DBA.
Since I come from a developer background, I have some understanding of the code and how it interacts with the database. I usually put an emphasis on queries and schema design over server settings.
DBAs from sysadmin background, start as a system or network administrator that are at some point assigned responsibility over the databases (with or without their consent) .
Most jobs do actually prefer a DBA from a sysadmin background over one from a developer background.
The problem with phasing into being a MySQL DBA position, is that you would need to have other responsibilities apart from looking after databases. Eventually, you can phase into a position that needs most of your time to do that.
Its just that the in between is such a large chasm to cross, that many people don't really try to (especially developers).
Its not that interesting
For some people being a DBA most of the time isn't that interesting. I have seen DBAs opening their own companies or trying to become managers. I, myself, do a lot of data projects in addition to looking after the databases.
Companies that try to find a DBA, because the other developers complained for one, will almost definitely not have a lot of work for a that DBA a few months after they come in.
Unless you work for a huge company like facebook or a consultancy like pythian, you will not find enough work as a MySQL DBA to avoid you from being bored.
Salary
There is also a problem with salaries. From my experience, most companies look at MySQL DBAs as PHP developers with a bit extra. And in most companies (but recently this has changed a bit), PHP developers get paid quit low.
(To be honest, when I say "most companies" I mean LAMP shops or LAMP e-commerce companies. Usually, if you are the type of company to adopt open source, you wont have enough money to pay people well. This is of course changing and my current company is one of those exceptions, but more often then not, its that way.)
So accordingly, MySQL DBAs get offered salaries similar to mid to high level PHP developers.
This can be fine for some people, but when you look over at the MS SQL and Oracle side.. it leave a lot to be desired. And back to the point of "phasing into" being a DBA, the proprietary databases have all sorts of levels. From beginner at "not so bad salary" to expert at "extremely well paid salary" to consultant at "obscenely high salary". There are very few MySQL DBA consultancy jobs that I have seen (almost all of them in London) and the pay is literately half of Oracle or MS SQL consultancy jobs.
It would be nice to see more balanced salaries for DBAs across the different technologies and recruiters like my friend try to achieve that.
What needs to be done
I think its a difficult situation, to be honest. Even if companies try to encourage more people to become MySQL DBAs, it will still take some years for anything to change. This adds to the problem, I believe, of companies accepting MySQL as their database of choice. They can find MS SQL and Oracle DBAs easy enough (they just have to pay a lot), but how can they justify MySQL when it takes (at least) 3-6 months to secure a MySQL DBA (if at all).
However, companies can try to accommodate DBAs by allowing for more mixed positions (either developer or sysadmins) as well as training.
It will be interesting seeing this space in the market develop.
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:14:"Jonathan Levin";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:47;a:6:{s:4:"data";s:58:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:25:"A life among databases...";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:70:"tag:blogger.com,1999:blog-9144505959002328789.post-2283667862031248389";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:73:"http://karlssonondatabases.blogspot.com/2010/07/life-among-databases.html";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:3433:"A long time ago, in the early 1980's, I decided to change jobs. I was a young guy, with no real experience of commercial software or anything like that, rather I was a self-taught sysadmin for an ancient UNix system. The company I worked for was in the Telco business, so I looked for another job where I could develop myself and at the same to use my telco knowledge.I found a Telco startup, privately held. I must say that the fact that it was privately held meant nothing to me at the time. Nothing. They were building a system, the servers were VAXes running VMS, and again a became a sysadmin.Having been sysadmin at this company for a while, building up the central datacenter (to be honest, in todays words, that was what I was doing, but at the time, I had no real clue. I was mosr enthusiastic and ready to take on any task than I was smart or intelligent or knew what I was doing, really). But I wanted to develop myself, and at the previous job I had learnt to code in C (it was a Unix system after all), so I slowly migrated into database development, managing sysadmin duties on the side.Still, I wasn't truly professional I think. But I was willing to work and I was persistent and just wouldn't let go. I came to the office dressed in a pair of Jeans and a T-Shirt, and wasn't really aware that sometimes it would be a good idea to dress up or something (this was the 1980's still, so that might have been a good idea back then).As a developer, I realized that the system used a database, and a SQL database! I had no clue whatsoever what this was. But I started writing code creating tables and working my way through this, learning as I went. That you needed something called an "INDEX" became obvious to me after I had shown my latest creating to my colleagues and the things was just soooo slow. In the end, I actually picked up the manual for that "SQL Database", whatever THAT was.After about a year at this company, they decided to move their operations abroad, and I wanted to stay in Sweden, so I went looking for another job. The company behind the SQL Database I had used was looking for people it seemed, so I applied for a job. What it was like working for a US based software company was something I had no clue about. I got the job and turned up for my duties as a support engineer dressed in Jeans and a T-Shirt, and got to work. What a support engineer was really supposed to be doing wasn't something I really knew, it was more along the lines of people calling me with questions, and I tried to help them, as best as I could.The company in question was Oracle. And Oracle really did support me, and courage me, to develop myself, to go to training classes (I didn't ask for these, I was just sent away on them), to take on other jobs inside the organization to to develop my technical and business skills.For all this, I am grateful. Oracle largely shaped me for my future career among database companies, and if that is a good or bad thing is up for you to decide. Now I'm back at Oracle, and I still enjoy it. I am aware that not everyone will agree with me here, but I am glad to be back, after nearly 20 years after I left, and many things with this geart company is still around.All in all, I'm sure Oracle is a good home for MySQL. You may think differently, but I am honored to work for Oracle, and even more so with MySQL at Oracle. Frankly, I can't see that it can get any better./Karlsson";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Sat, 03 Jul 2010 12:44:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:4:{i:0;a:5:{s:4:"data";s:3:"sql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:8:"database";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:6:"oracle";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:9959:"A long time ago, in the early 1980's, I decided to change jobs. I was a young guy, with no real experience of commercial software or anything like that, rather I was a self-taught sysadmin for an ancient UNix system. The company I worked for was in the Telco business, so I looked for another job where I could develop myself and at the same to use my telco knowledge.
I found a Telco startup, privately held. I must say that the fact that it was privately held meant nothing to me at the time. Nothing. They were building a system, the servers were VAXes running VMS, and again a became a sysadmin.
Having been sysadmin at this company for a while, building up the central datacenter (to be honest, in todays words, that was what I was doing, but at the time, I had no real clue. I was mosr enthusiastic and ready to take on any task than I was smart or intelligent or knew what I was doing, really). But I wanted to develop myself, and at the previous job I had learnt to code in C (it was a Unix system after all), so I slowly migrated into database development, managing sysadmin duties on the side.
Still, I wasn't truly professional I think. But I was willing to work and I was persistent and just wouldn't let go. I came to the office dressed in a pair of Jeans and a T-Shirt, and wasn't really aware that sometimes it would be a good idea to dress up or something (this was the 1980's still, so that might have been a good idea back then).
As a developer, I realized that the system used a database, and a SQL database! I had no clue whatsoever what this was. But I started writing code creating tables and working my way through this, learning as I went. That you needed something called an "INDEX" became obvious to me after I had shown my latest creating to my colleagues and the things was just soooo slow. In the end, I actually picked up the manual for that "SQL Database", whatever THAT was.
After about a year at this company, they decided to move their operations abroad, and I wanted to stay in Sweden, so I went looking for another job. The company behind the SQL Database I had used was looking for people it seemed, so I applied for a job. What it was like working for a US based software company was something I had no clue about. I got the job and turned up for my duties as a support engineer dressed in Jeans and a T-Shirt, and got to work. What a support engineer was really supposed to be doing wasn't something I really knew, it was more along the lines of people calling me with questions, and I tried to help them, as best as I could.
The company in question was Oracle. And Oracle really did support me, and courage me, to develop myself, to go to training classes (I didn't ask for these, I was just sent away on them), to take on other jobs inside the organization to to develop my technical and business skills.
For all this, I am grateful. Oracle largely shaped me for my future career among database companies, and if that is a good or bad thing is up for you to decide. Now I'm back at Oracle, and I still enjoy it. I am aware that not everyone will agree with me here, but I am glad to be back, after nearly 20 years after I left, and many things with this geart company is still around.
All in all, I'm sure Oracle is a good home for MySQL. You may think differently, but I am honored to work for Oracle, and even more so with MySQL at Oracle. Frankly, I can't see that it can get any better.
/Karlsson
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:15:"Anders Karlsson";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:48;a:6:{s:4:"data";s:63:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:21:"Monty doesn't say ...";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:56:"http://blogs.sun.com/datacharmer/entry/monty_doesn_t_say";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:56:"http://blogs.sun.com/datacharmer/entry/monty_doesn_t_say";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:2227:"Monty says:
During the last 2 years, I have seen a lot of the people that originally worked at MySQL AB and who joined Sun together with me, go away in different directions. More than 50 % of them have already left Sun/Oracle.
Matthew Montgomery commented...
I'm curious where you are getting that > 50% number? How would you have any access to that sort of HR information? Sounds like one of those numbers that falls under the "83% of all statistics are made up" rule.
And Monty answered
The MySQL Alumini in linkedin group has close to 200 members, a majority that also joined Sun and this group doesn't include all people that have left. I have also heard the number 50 % from MySQL people that recently left Oracle, so I have good reasons to believe that this figure is reasonable accurate.
Then I left a comment, but I haven't seen it published yet. So here goes
Monty,
Your figure is far from being accurate.
The MySQL Alumni group includes at least 75 people (hand counted, there may be more) who are still working with MySQL at Oracle.
The Sakila Alumni is a group of former MySQL AB employees. I am also part of that group, but I still work with the MySQL team at Oracle.
You'd be also pleased to acknowledge that, as of January 16, 2008, when MySQL AB was acquired by Sun, the number of all-time hired employees was way over 600. On that day, there were already more than 200 former MySQL AB employees.
If you have better information that doesn't come from hearsay, I'd like to know.
Giuseppe
Actually, if you go to the MySQL Alumni group on LinkedIn, and click on "members", and then on "advanced search", you get this summary:
So, according to LinkedIn, out of 194 Alumni, 49 are at Oracle, 49 at Sun Microsystems, and 21 in a time machine still at MySQL. That makes 120 people out of 194 who are still with the team, unless their "current company" is not up to date.
And, to make things more clear, let me add something. I was challenged to publish the current staffing numbers to settle the matter. Let me remind everyone who wants to get involved in this silly claim that the burden of proof lies with the claimant. I have no obligation to release company confidential records to dispel FUD.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Sat, 03 Jul 2010 08:35:41 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:5:{i:0;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:9:"community";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:5:"monty";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:6:"oracle";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:3238:"Monty says:
During the last 2 years, I have seen a lot of the people that originally worked at MySQL AB and who joined Sun together with me, go away in different directions. More than 50 % of them have already left Sun/Oracle.
Matthew Montgomery commented...
I'm curious where you are getting that > 50% number? How would you have any access to that sort of HR information? Sounds like one of those numbers that falls under the "83% of all statistics are made up" rule.
And Monty answered
The MySQL Alumini in linkedin group has close to 200 members, a majority that also joined Sun and this group doesn't include all people that have left. I have also heard the number 50 % from MySQL people that recently left Oracle, so I have good reasons to believe that this figure is reasonable accurate.
Then I left a comment, but I haven't seen it published yet. So here goes
Monty,
Your figure is far from being accurate.
The MySQL Alumni group includes at least 75 people (hand counted, there may be more) who are still working with MySQL at Oracle.
The Sakila Alumni is a group of former MySQL AB employees. I am also part of that group, but I still work with the MySQL team at Oracle.
You'd be also pleased to acknowledge that, as of January 16, 2008, when MySQL AB was acquired by Sun, the number of all-time hired employees was way over 600. On that day, there were already more than 200 former MySQL AB employees.
If you have better information that doesn't come from hearsay, I'd like to know.
Giuseppe
Actually, if you go to the MySQL Alumni group on LinkedIn, and click on "members", and then on "advanced search", you get this summary:
So, according to LinkedIn, out of 194 Alumni, 49 are at Oracle, 49 at Sun Microsystems, and 21 in a time machine still at MySQL. That makes 120 people out of 194 who are still with the team, unless their "current company" is not up to date.
And, to make things more clear, let me add something. I was challenged to publish the current staffing numbers to settle the matter. Let me remind everyone who wants to get involved in this silly claim that the burden of proof lies with the claimant. I have no obligation to release company confidential records to dispel FUD.
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:14:"Giuseppe Maxia";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:49;a:6:{s:4:"data";s:58:"
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:22:"Announcing TokuDB v4.0";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:26:"http://tokutek.com/?p=1565";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:50:"http://tokutek.com/2010/07/announcing-tokudb-v4-0/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:1749:"Tokutek is pleased to announce immediate availability of TokuDB for MySQL, version 4.0. It is designed for continuous querying and analysis of large volumes of rapidly arriving and changing data, while maintaining full ACID properties.
New in TokuDB v4.0 is our multi-threaded Fast Loader. Capable of utilizing all available CPU cores, the Fast Loader greatly accelerates creating a new index or initializing a database with a bulk load. On an 8 core machine, loading operations speed up by over 4x and with 16 cores the acceleration is even greater.
Other improvements in v4.0 include:
Shorter Logfiles – The logs needed for ACID transactions are now considerably
smaller. In addition to saving disk space, smaller logs means faster recovery in many cases.
Faster Logfile Reclamation – When a transaction commits, the space associated with logging for that transaction is reclaimed more quickly. This helps keep disk space requirements to a minimum.
Support for READ COMMITTED isolation level – Fulfills a very popular customer request.
This new release builds on TokuDB’s unique combination of capabilities:
10x-50x faster indexing for faster querying
Full support for ACID transactions
Short recovery time (seconds or minutes, not hours or days)
Immunity to database aging to eliminate performance degradation and maintenance headaches
5x-15x data compression for reduced disk use and lower storage costs
Because of its high indexing performance and transaction support, TokuDB is well suited to Web applications that must simultaneously store and query large volumes of rapidly arriving data, including:
Logfile Analysis
High-speed Webcrawling
Real-time clickstream analysis
Social Networking
eCommerce Personalization
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Sat, 03 Jul 2010 05:43:36 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:4:{i:0;a:5:{s:4:"data";s:8:"TokuView";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:12:"Announcement";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:6:"TokuDB";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:2210:"Tokutek is pleased to announce immediate availability of TokuDB for MySQL, version 4.0. It is designed for continuous querying and analysis of large volumes of rapidly arriving and changing data, while maintaining full ACID properties.
New in TokuDB v4.0 is our multi-threaded Fast Loader. Capable of utilizing all available CPU cores, the Fast Loader greatly accelerates creating a new index or initializing a database with a bulk load. On an 8 core machine, loading operations speed up by over 4x and with 16 cores the acceleration is even greater.
Other improvements in v4.0 include:
- Shorter Logfiles – The logs needed for ACID transactions are now considerably
smaller. In addition to saving disk space, smaller logs means faster recovery in many cases.
- Faster Logfile Reclamation – When a transaction commits, the space associated with logging for that transaction is reclaimed more quickly. This helps keep disk space requirements to a minimum.
- Support for READ COMMITTED isolation level – Fulfills a very popular customer request.
This new release builds on TokuDB’s unique combination of capabilities:
- 10x-50x faster indexing for faster querying
- Full support for ACID transactions
- Short recovery time (seconds or minutes, not hours or days)
- Immunity to database aging to eliminate performance degradation and maintenance headaches
- 5x-15x data compression for reduced disk use and lower storage costs
Because of its high indexing performance and transaction support, TokuDB is well suited to Web applications that must simultaneously store and query large volumes of rapidly arriving data, including:
- Logfile Analysis
- High-speed Webcrawling
- Real-time clickstream analysis
- Social Networking
- eCommerce Personalization
PlanetMySQL Voting:
Vote UP /
Vote DOWN";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:13:"Tokuview Blog";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}}}}}}}}}}}}s:4:"type";i:128;s:7:"headers";a:7:{s:4:"date";s:29:"Sat, 10 Jul 2010 10:17:37 GMT";s:6:"server";s:22:"Apache/2.2.15 (Fedora)";s:13:"last-modified";s:29:"Sat, 10 Jul 2010 10:16:17 GMT";s:13:"accept-ranges";s:5:"bytes";s:14:"content-length";s:6:"376256";s:10:"connection";s:5:"close";s:12:"content-type";s:8:"text/xml";}s:5:"build";s:14:"20090627192103";}