Convert PHP serialized data to Unicode

This content originally appeared on Lea Verou’s blog and was authored by Lea Verou

I recently had to convert the database of a large Greek website from single-byte Greek to Unicode (UTF-8). One of the problems I faced was the stored PHP serialized data: PHP stores the length of each string (in bytes) inside the serialized data, and since every Greek letter takes two bytes in UTF-8 instead of one, the stored serialized strings could no longer be unserialized after the conversion.
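To make the problem concrete, here is a minimal sketch (PHP 7+, for the `\u{...}` escapes; the word and variable names are only illustrative) of a Greek string whose text was converted to UTF-8 while its stored length was not:

```php
<?php
// A Greek word: 4 characters, but 8 bytes once it is encoded as UTF-8
$word = "\u{0393}\u{03B5}\u{03B9}\u{03B1}"; // "Γεια"

// Serializing it now stores the byte length, 8
echo serialize($word), "\n"; // s:8:"Γεια";

// Data serialized before the conversion stored 4 (one byte per letter);
// converting only the text to UTF-8 leaves that 4 in place
$stale = 's:4:"' . $word . '";';

// unserialize() reads 4 bytes, finds no closing '";' right after them and gives up
var_dump(@unserialize($stale)); // bool(false)
```

Only the number is wrong; the text itself is fine, which is why recounting the stored lengths is enough to repair the data.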

I couldn’t find anything on this while searching for a solution, and I don’t want anyone else to go through the same frustration, so here is a little function I wrote to recount the string lengths:

function recount_serialized_bytes($text) {
	$offset = 0;

	// Find the next string header s:<length>:" at or after $offset.
	// The pattern is plain ASCII, so byte-wise matching (no /u) is safe on
	// UTF-8 text and cannot fail because $offset lands inside a character.
	while(preg_match('/s:([0-9]+):"/', $text, $matches, PREG_OFFSET_CAPTURE, $offset)) {
		$number = (int) $matches[1][0]; // stored length: characters, since the source was single-byte
		$pos = $matches[1][1];          // byte offset of the first digit
		$digits = strlen($matches[1][0]);

		// Character offset where the string's content starts (after the digits and :")
		$pos_chars = mb_strlen(substr($text, 0, $pos), "UTF-8") + $digits + 2;

		// Read $number characters and measure them in bytes
		$str = mb_substr($text, $pos_chars, $number, "UTF-8");
		$new_number = strlen($str);
		$new_digits = strlen((string) $new_number);

		if($number != $new_number) {
			// Change stored number
			$text = substr_replace($text, (string) $new_number, $pos, $digits);
		}

		// Resume searching right after this string's content
		$offset = $pos + $new_digits + 2 + $new_number;
	}

	return $text;
}

My initial approach was to do it entirely with regular expressions, but the PHP serialized data format is not a regular language, so it cannot be parsed properly with regular expressions alone. Every regex-only approach I tried failed on edge cases, and my data had lots of them (I even had nested serialized strings!).
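To illustrate the kind of edge case that defeats a regex-only approach, here is a hypothetical `naive_recount()` (an illustrative name, not part of the original post) that assumes each string's content ends at the first `";`. The sample data is plain ASCII, so its lengths are already correct, yet the naive recount still corrupts it:

```php
<?php
// Hypothetical regex-only recount, for illustration: it assumes each string's
// content ends at the first '";' and rewrites the length to that content's byte count.
function naive_recount(string $text): string
{
    return preg_replace_callback(
        '/s:[0-9]+:"(.*?)";/s',
        function (array $m): string {
            return 's:' . strlen($m[1]) . ':"' . $m[1] . '";';
        },
        $text
    );
}

// '";' is perfectly legal inside a PHP string
$data = serialize(['quote' => 'a";b']);
echo $data, "\n";   // a:1:{s:5:"quote";s:4:"a";b";}

$broken = naive_recount($data);
echo $broken, "\n"; // a:1:{s:5:"quote";s:1:"a";b";}

var_dump(@unserialize($broken)); // bool(false)
```

The regex has no way of knowing that the `4` already told it where the string ends, which is exactly the information a parser would use.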

Note that this only works when converting from single-byte encoded data, since it assumes the stored lengths are the string lengths in characters. Admittedly, it’s not my best code, and it could be optimized in many ways (for instance, it recounts the characters from the start of the text for every string it finds). It was something I had to write quickly, for a one-time conversion that only I would run. However, it works smoothly and has been tested with lots of different serialized data. I know that not many people will find it useful, but it’s going to be a lifesaver for the few who need it.
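Since the format isn’t regular, a more robust alternative is to walk the serialized value token by token instead of searching for `s:N:"` headers. The following is a hypothetical sketch, not the function above; `recount_value()` and `recount_serialized()` are illustrative names. Like the function above, it assumes every stored length counts characters and that the text is now valid UTF-8. It copies class names and non-string tokens unchanged, treats the contents of a string (including any serialized data nested inside it) as opaque, and throws on token types it doesn’t handle, such as `C:` and `E:`.

```php
<?php
// Hypothetical sketch: recount string lengths by walking the serialized
// structure token by token, so string contents are never searched for headers.
// Assumes every stored s:N length counts characters (single-byte source data)
// and that the text is now valid UTF-8. C: and E: tokens are not handled.

function recount_value(string $t, int &$i): string
{
    switch ($t[$i]) {
        case 'N':                       // null: "N;"
            $i += 2;
            return 'N;';

        case 'b':                       // bool, int, float and references:
        case 'i':                       // copy everything up to the next ';'
        case 'd':
        case 'r':
        case 'R':
            $end = strpos($t, ';', $i);
            $token = substr($t, $i, $end - $i + 1);
            $i = $end + 1;
            return $token;

        case 's':                       // s:<chars>:"<content>";
            $colon = strpos($t, ':', $i + 2);
            $chars = (int) substr($t, $i + 2, $colon - $i - 2);
            $start = $colon + 2;        // skip ':"'
            $j = $start;
            $len = strlen($t);
            for ($n = 0; $n < $chars; $n++) {
                $j++;                   // lead byte of one UTF-8 character
                while ($j < $len && (ord($t[$j]) & 0xC0) === 0x80) {
                    $j++;               // its continuation bytes
                }
            }
            $content = substr($t, $start, $j - $start);
            $i = $j + 2;                // skip '";'
            return 's:' . strlen($content) . ':"' . $content . '";';

        case 'a':                       // a:<count>:{...}
        case 'O':                       // O:<len>:"<class>":<count>:{...}
            $brace = strpos($t, '{', $i);
            $head = substr($t, $i, $brace - $i + 1);
            $prefix = substr($head, 0, -2);                 // drop ':{'
            $count = (int) substr($prefix, strrpos($prefix, ':') + 1);
            $i = $brace + 1;
            $body = '';
            for ($k = 0; $k < 2 * $count; $k++) {           // keys and values alternate
                $body .= recount_value($t, $i);
            }
            $i++;                                           // skip '}'
            return $head . $body . '}';

        default:
            throw new UnexpectedValueException("Unsupported token at byte $i");
    }
}

function recount_serialized(string $t): string
{
    $i = 0;
    return recount_value($t, $i);
}

// Usage: a stale length of 4 for a 4-letter Greek word becomes 8
$word = "\u{0393}\u{03B5}\u{03B9}\u{03B1}"; // "Γεια"
echo recount_serialized('a:1:{i:0;s:4:"' . $word . '";}'), "\n"; // a:1:{i:0;s:8:"Γεια";}
```

Because it never looks inside string contents, a `";` or an `s:N:"` inside a value can’t confuse it, and it moves forward through the text once instead of recounting characters from the start of the text for every string.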


APA

Lea Verou | Sciencx (2011-02-13T00:00:00+00:00) Convert PHP serialized data to Unicode. Retrieved from https://www.scien.cx/2011/02/13/convert-php-serialized-data-to-unicode/
