PHP & MySQL Tits Bits

Thursday, August 27, 2009

“MySQL server has gone away” Part 1: max_allowed_packet.


Most MySQL users have seen this rather cryptic error message at some point: “MySQL server has gone away”. The MySQL documentation describes lots of possible reasons for it here: http://dev.mysql.com/doc/refman/5.1/en/gone-away.html
However, I think this page is of little help for most users. Dozens of reasons are listed, but apart from the trivial ones (the physical connection was lost, the MySQL server or the machine it runs on has crashed, etc.), only a few of them are common in our experience, and many of those listed are not.
Here we will discuss one situation that, in our experience, happens very frequently for people working across multiple servers: if a client sends an SQL statement longer than the server's max_allowed_packet setting, the server simply disconnects the client. The next query from the same client instance then finds that the ‘MySQL server has gone away’. At least that is how recent server versions behave.
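A client can avoid the silent disconnect by comparing a statement's size with the server's limit before sending it. The sketch below is a minimal illustration in PHP with mysqli. It is not a documented MySQL or SQLyog technique. The function names and the 1 KB safety margin are assumptions.

```php
<?php
/**
 * Hypothetical helper: read the server's limit once per connection.
 * Requires the mysqli extension and an open connection.
 */
function get_max_allowed_packet(mysqli $db): int
{
    $result = $db->query('SELECT @@max_allowed_packet');
    if ($result === false) {
        throw new RuntimeException($db->error);
    }
    $row = $result->fetch_row();
    return (int) $row[0];
}

/**
 * Hypothetical helper: true if $sql should fit within $maxPacket bytes.
 * strlen() counts bytes (not characters), which is what the limit measures.
 * The default 1024-byte margin for protocol overhead is an assumption,
 * not a documented figure.
 */
function statement_fits(string $sql, int $maxPacket, int $margin = 1024): bool
{
    return strlen($sql) + $margin <= $maxPacket;
}
```

Typical use is `if (!statement_fits($sql, get_max_allowed_packet($db))) { ... }`. The client can then split the statement or report its size, instead of leaving the user with a later ‘gone away’.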
1) But the documentation at http://dev.mysql.com/doc/refman/5.1/en/error-messages-client.html also lists another client error:
Error: 2020 (CR_NET_PACKET_TOO_LARGE) Message: Got packet bigger than ‘max_allowed_packet’ bytes
along with
Error: 2006 (CR_SERVER_GONE_ERROR) Message: MySQL server has gone away
Actually, I have not seen the ‘got packet bigger ..’ error myself for many years, not since MySQL 3.23 or 4.0. I am uncertain whether a recent server will still sometimes return ‘got packet bigger’, or whether this error message has itself ‘gone away’. If the ‘got packet bigger’ message is still relevant with recent servers, it would be nice to have it documented under what conditions it occurs and when only ‘gone away’ will. If this error message is now ‘historical’, it should either be removed from the documentation or be marked as reserved but no longer used. But it would of course be much preferable to have the ‘got packet bigger’ error returned when that is the problem. It tells what the problem is; “MySQL server has gone away” does not tell anything specific. So ‘got packet bigger’ is a *much* better message than ‘gone away’. Also, ‘got packet bigger’ is listed among client errors and not among server errors, as I would have expected. So maybe I am misunderstanding something here?
Does anybody know whether, and why, ‘got packet bigger’ now effectively seems to have ‘gone away’ too?
And most important: why disconnect the client at all? There are reconnect options, of course, but they do not really help here. After a reconnect, executing the same query just repeats the whole thing.
2) I never understood why MySQL sticks with the default 1M setting for [mysqld] when it is 16M for [mysqldump] in the configuration ‘templates’ shipped with the MySQL server (I have tried to ‘hint’ this to them several times over the last 3-4 years). Obviously they realize that 1M is often too little for backup/restore, since they use a larger setting for mysqldump. However, users use all sorts of other tools for backup: other script-based tools running on the server, third-party (and MySQL) GUI clients, web-based tools (hosting control panels, phpMyAdmin), backup/restore routines shipped with or built into applications, etc. On hosted servers, users often cannot run mysqldump at all (at least not on shared servers). Furthermore, sysadmins are often unwilling to change configuration settings. Users are then left with the option of generating SINGLE INSERTS to ensure cross-server exports/imports, with horrible restore performance as a consequence (and it still fails with a well-grown MEDIUMBLOB). I deliberately use the term ‘exports/imports’ and not ‘backup/restore’ because this also applies to various tools that can connect to two or more servers at a time and copy data using various techniques without actually generating a file.
The max_allowed_packet problem described here has been a big problem for us over time. I do not think MySQL fully realises its importance, mostly because our tools and the tools/clients shipped with the server are used primarily by different segments of users (with some significant overlap, of course). We now handle this problem 100% in SQLyog: we generate the largest BULK INSERTS possible, up to 16M, everywhere when transferring data from one server to another with all the available methods. But we cannot prevent a user who wants BULK INSERTS from generating an SQL dump on one server that will not import on another because the BULK INSERTS are too large. We can only handle that when we are connected to both servers.
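The sketch below packs rows into the largest INSERT statements that stay under a byte limit. It is an illustration under assumptions, not SQLyog's actual code. Each row is assumed to be already escaped and formatted as a tuple such as (1,'abc'), the table name is assumed to be trusted, and the function name is hypothetical.

```php
<?php
/**
 * Hypothetical sketch: group pre-escaped row tuples into multi-row INSERT
 * statements, each at most $maxBytes long (including the trailing ';').
 * A single tuple larger than the limit on its own still ends up alone in
 * a statement, because it cannot be split.
 * Assumes $table is a trusted name (it is not escaped here).
 */
function build_bulk_inserts(string $table, array $tuples, int $maxBytes): array
{
    $prefix = 'INSERT INTO `' . $table . '` VALUES ';
    $statements = [];
    $current = '';

    foreach ($tuples as $tuple) {
        // If ",<tuple>" plus the closing ";" would exceed the limit,
        // close the current statement and start a new one.
        if ($current !== '' && strlen($current) + 1 + strlen($tuple) + 1 > $maxBytes) {
            $statements[] = $current . ';';
            $current = '';
        }
        $current .= ($current === '' ? $prefix : ',') . $tuple;
    }

    if ($current !== '') {
        $statements[] = $current . ';';
    }
    return $statements;
}
```

With a 1000-byte limit, `build_bulk_inserts('t', ['(1)', '(2)', '(3)'], 1000)` yields the single statement INSERT INTO `t` VALUES (1),(2),(3); while a 30-byte limit yields three one-row statements. For a dump meant to be portable, $maxBytes has to respect the smallest max_allowed_packet among the target servers. A tool can only know that value when it is connected to them.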
3) One solution would be to allow max_allowed_packet as a SESSION variable. After a long period of uncertainty about this (refer to http://bugs.mysql.com/bug.php?id=22891 and http://bugs.mysql.com/bug.php?id=32223), it is now clear that it is not, and will not be, possible to override the GLOBAL setting for the SESSION. I regret this! It would be very nice to be able to put “SET max_allowed_packet ..” at the top of an SQL script, for instance.
4) And actually, most basically, I also do not really understand why a max_allowed_packet setting is required at all. Of course it makes sense for a server admin to be able to stop misbehaving users from bombing the server with statements containing 1 GB WHERE clauses! But then we are not talking about 1M, rather something like 16, 64 or 100M as a critical threshold, I think.
Also, I am not sure whether the setting is used to allocate a fixed-size memory buffer for the query string, or whether it is related to handling network packets or something else. I have wondered for quite some time whether such a restriction could be avoided. Is this implementation a deliberate choice, or rather a consequence of the coding techniques currently used? I would like to get rid of it!

Original Post

Sunday, November 23, 2008

PHP UTF-8 cheatsheet

When we started building DropSend, we decided to support all languages worldwide from the start. The interface is currently in English only, but the application can send, store, sort and process your data in whatever language you want. As a result, we have a good number of customers out east.

To support worldwide languages, you need to use UTF-8 encoding for your web pages, emails and application, rather than ISO 8859-1 or another common western encoding, since these don't support characters used in languages such as Japanese and Chinese.

Happily, UTF-8 is identical to ASCII for the basic Latin characters, so existing data that uses only those characters needs no conversion to start using UTF-8. But there are a number of other issues to deal with. In particular, UTF-8 is a multibyte encoding: one character can be represented by one or more bytes. This causes trouble for PHP, because the language parses and processes strings as bytes, not characters. As a result it makes mincemeat of multibyte strings, for example by splitting characters 'in half', bodging up regular expressions, and rendering email unreadable.
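Here is a small illustration of the byte/character mismatch. It assumes the mbstring extension is installed (see step 2). The string is written with escape sequences so the example does not depend on the source file's own encoding.

```php
<?php
// "café": 4 characters, but "é" (U+00E9) takes 2 bytes in UTF-8.
$word = "caf\xC3\xA9";

$bytes = strlen($word);                    // PHP counts bytes: 5
$chars = mb_strlen($word, 'UTF-8');        // mbstring counts characters: 4

// substr() cuts at byte 4, splitting "é" in half and leaving invalid UTF-8.
$broken = substr($word, 0, 4);
$brokenIsValid = mb_check_encoding($broken, 'UTF-8');

// mb_substr() cuts at character 4, keeping "é" intact.
$intact = mb_substr($word, 0, 4, 'UTF-8');
```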

There are a number of great articles online about UTF-8 and how it works (Joel Spolsky's comes to mind), but very few about how to actually get it working with PHP and iron out all the bugs. So, to save you the time we put in, here is a quick cheatsheet and some notes on a few common issues.




1. Update your database tables to use UTF-8


CREATE DATABASE db_name
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci
;

-- Changes the default for tables created later; existing tables are untouched
ALTER DATABASE db_name
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci
;

-- Changes the table default AND converts its existing text columns
ALTER TABLE tbl_name
CONVERT TO CHARACTER SET utf8
COLLATE utf8_general_ci
;


2. Install the mbstring extension for PHP

Windows:
download the dll if it's not in your PHP extensions folder, and
uncomment the relevant line in your php.ini file:
extension=php_mbstring.dll
Linux (Red Hat, CentOS, Fedora): yum install php-mbstring, then restart the web server

3. Configure mbstring

Do this in php.ini, httpd.conf or .htaccess. (In httpd.conf or .htaccess, write each setting as 'php_value name value', without the equals sign or the ';' comment, and use 'php_flag' for On/Off settings.)

mbstring.language  = Neutral ; Set default language to Neutral(UTF-8) (default)
mbstring.internal_encoding = UTF-8 ; Set default internal encoding to UTF-8
mbstring.encoding_translation = On ; HTTP input encoding translation is enabled
mbstring.http_input = auto ; Set HTTP input character set detection to auto
mbstring.http_output = UTF-8 ; Set HTTP output encoding to UTF-8
mbstring.detect_order = auto ; Set default character encoding detection order to auto
mbstring.substitute_character = none ; Do not print invalid characters
default_charset = UTF-8 ; Default character set for auto content type header
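On shared hosts where php.ini, httpd.conf and .htaccess are all off-limits, some of these settings can be applied at runtime instead. This is a minimal sketch. It assumes the code runs at the very top of every request, for example from a common include file. Input-side settings such as mbstring.encoding_translation and mbstring.http_input act before the script starts, so they are not covered.

```php
<?php
// Fail early if step 2 (installing mbstring) was skipped.
if (!extension_loaded('mbstring')) {
    throw new RuntimeException('The mbstring extension is not loaded');
}

mb_internal_encoding('UTF-8');       // like mbstring.internal_encoding = UTF-8
mb_http_output('UTF-8');             // like mbstring.http_output = UTF-8
mb_substitute_character('none');     // like mbstring.substitute_character = none
ini_set('default_charset', 'UTF-8'); // like default_charset = UTF-8
```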

4. Deal with non-multibyte-safe functions in PHP

The fast-and-loose way to do this is with the following php configuration:

mbstring.func_overload = 7 ; All non-multibyte-safe functions are overloaded with the mbstring alternatives

But there are problems with this. php.net has a warning about this potentially affecting the whole server. And even if that isn't an issue for you, mbstring can make a mess of binary strings.

So, a better route is to search your application code for the following functions and replace them with mbstring's 'slot-in' alternatives:

mail()  -> mb_send_mail()
strlen() -> mb_strlen()
strpos() -> mb_strpos()
strrpos() -> mb_strrpos()
substr() -> mb_substr()
strtolower() -> mb_strtolower()
strtoupper() -> mb_strtoupper()
substr_count() -> mb_substr_count()
ereg() -> mb_ereg()
eregi() -> mb_eregi()
ereg_replace() -> mb_ereg_replace()
eregi_replace() -> mb_eregi_replace()
split() -> mb_split()
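The swap is mostly mechanical, but offsets and case mapping change meaning. Here is a short sketch, assuming mbstring is loaded. Passing 'UTF-8' explicitly avoids depending on mbstring.internal_encoding.

```php
<?php
$text = "caf\xC3\xA9 au lait"; // "café au lait"

// strpos() returns a byte offset; "é" occupies bytes 3 and 4.
$byteOffset = strpos($text, 'au');
// mb_strpos() returns a character offset, usable with mb_substr().
$charOffset = mb_strpos($text, 'au', 0, 'UTF-8');

// mb_strtoupper() maps "é" to "É"; strtoupper() works byte by byte and
// cannot uppercase a multibyte character.
$upper = mb_strtoupper($text, 'UTF-8');

// Character offsets and mb_substr() agree with each other.
$tail = mb_substr($text, $charOffset, null, 'UTF-8');
```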

5. Sort out HTML entities

The htmlentities() function doesn't handle multibyte strings correctly unless you pass it the character set. To save time, you'll want to create a wrapper function and use it instead:

/**
 * Encodes HTML safely for UTF-8. Use instead of htmlentities.
 *
 * @param string $var
 * @return string
 */
function html_encode($var)
{
    return htmlentities($var, ENT_QUOTES, 'UTF-8');
}

6. Check content-type headers

Check through your code for any text-based content-type headers, and append the UTF-8 charset so the browser knows what it's working with:

header('Content-type: text/html; charset=UTF-8');

You should also repeat this at the top of HTML pages:

<meta http-equiv="Content-type" content="text/html; charset=UTF-8" />
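If content-type headers are built in many places, a small helper can make the charset hard to forget. The function name, and the rule of appending the charset only to text/* types, are assumptions for illustration.

```php
<?php
/**
 * Hypothetical helper: build a Content-Type header line, adding the UTF-8
 * charset to text-based types (text/html, text/plain, text/css, ...).
 */
function content_type_header(string $mimeType): string
{
    $value = 'Content-Type: ' . $mimeType;
    if (strncmp($mimeType, 'text/', 5) === 0) {
        $value .= '; charset=UTF-8';
    }
    return $value;
}

// Usage: header(content_type_header('text/html'));
```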

7. Update email scripts

Email can be tricky. You'll need to update the content-type of any emails and text-based MIME parts to use UTF-8 encoding. You'll also need to change the way headers are encoded to use UTF-8. mbstring provides a function, mb_encode_mimeheader(), to handle this for you, but it makes a mess of address lists: you'll need to encode the name and address parts separately, then compile them into an address list.

Be sure to encode the subject and other headers too: Korean speakers, for example, will tend to write the subject in Korean.
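Encoding the display names and addresses separately might look like the sketch below. The helper name and the [address => name] array shape are assumptions. Only mb_encode_mimeheader() is a real mbstring function.

```php
<?php
/**
 * Hypothetical helper: encode only the display name of each recipient, so
 * the address itself stays plain ASCII, then join them into a list.
 * $recipients maps address => display name ('' for no name).
 */
function encode_address_list(array $recipients): string
{
    $parts = [];
    foreach ($recipients as $address => $name) {
        if ($name === '') {
            $parts[] = $address;
        } else {
            // 'B' selects a base64 encoded-word, e.g. =?UTF-8?B?...?=
            $parts[] = mb_encode_mimeheader($name, 'UTF-8', 'B') . ' <' . $address . '>';
        }
    }
    return implode(', ', $parts);
}

// The subject is encoded the same way, as a whole ("안녕" here).
$subject = mb_encode_mimeheader("\xEC\x95\x88\xEB\x85\x95", 'UTF-8', 'B');
```

The message body still needs its own Content-Type: text/plain; charset=UTF-8 header.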

8. Check binary files and strings

Finally, double-check any binary files and strings handled by PHP, particularly uploads, downloads and encryption. In some cases it may be necessary to revert to ASCII just before a download or before processing a binary string.
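When mbstring.func_overload is on, strlen() and substr() silently become character-based. That corrupts binary data such as file chunks or ciphertext. One workaround is to ask mbstring for byte semantics explicitly via its '8bit' encoding. The helper names in this sketch are hypothetical.

```php
<?php
/** Hypothetical helper: length in bytes, even under mbstring.func_overload. */
function byte_length(string $data): int
{
    return function_exists('mb_strlen') ? mb_strlen($data, '8bit') : strlen($data);
}

/** Hypothetical helper: byte-based substring, even under mbstring.func_overload. */
function byte_substr(string $data, int $start, int $length): string
{
    return function_exists('mb_substr')
        ? mb_substr($data, $start, $length, '8bit')
        : substr($data, $start, $length);
}
```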

Wednesday, August 27, 2008

PHP

PHP tit's n bits