htmLawed: Fix corruption of UTF-8 encoded text
On some combinations of operating system, PHP, and libpcre versions, `\s` matches the ISO-8859-x non-breaking space, 0xa0. When that happens, the regular expression munges the UTF-8 encoding of the non-breaking space, 0xc2a0, into 0xc220, which is not a valid UTF-8 sequence. When the result is inserted into a UTF-8 field in MySQL, the text is truncated at the first invalid character.
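
The byte-level corruption can be reproduced outside PHP. The following is a minimal Python sketch, not the htmLawed code itself: the two patterns and the helper names (`LATIN1_AWARE_WS`, `ASCII_ONLY_WS`, `collapse`, `is_valid_utf8`) are hypothetical, and the first pattern only simulates a byte-oriented `\s` that also matches 0xa0.

```python
import re

# Assumption: simulates a byte-oriented `\s` that also matches the
# ISO-8859-x non-breaking space (0xa0), as described above.
LATIN1_AWARE_WS = re.compile(rb"[ \t\n\r\f\v\xa0]")

# Assumption: an explicit ASCII-only whitespace class, shown for contrast.
ASCII_ONLY_WS = re.compile(rb"[ \t\n\r\f\v]")


def collapse(pattern, data):
    """Replace every byte matched by `pattern` with an ASCII space."""
    return pattern.sub(b" ", data)


def is_valid_utf8(data):
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# U+00A0 (non-breaking space) is encoded in UTF-8 as the bytes 0xc2 0xa0.
original = "a\u00a0b".encode("utf-8")

# The 0xa0 continuation byte is replaced, leaving a dangling 0xc2 lead byte.
munged = collapse(LATIN1_AWARE_WS, original)

# The ASCII-only class does not touch either byte of the sequence.
safe = collapse(ASCII_ONLY_WS, original)
```

Here `munged` contains 0xc2 followed by 0x20 and no longer decodes as UTF-8, while `safe` is byte-for-byte identical to `original`.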