Perl 5.6

Perl Source

Perl builds on many platforms. If you would rather install from a binary, see the binary downloads page (especially for Windows).

How to install from source

Download perl-5.28.1.tar.gz (for example with wget), then run:

    tar -xzf perl-5.28.1.tar.gz
    cd perl-5.28.1
    ./Configure -des -Dprefix=$HOME/localperl
    make
    make test
    make install

Read both INSTALL and README.yoursystem in the perl-5.28.1 directory for more detailed information.

You can read this piece and dive into all the technical details and idiosyncrasies of perl and Unicode. Or you can skip straight to fixing your code. The perluniintro and perlunicode manpages have documented Unicode support since perl 5.6.0. Perl 5.8 has better Unicode support and even more Unicode-related documentation.

In addition to perluniintro and perlunicode, it includes the encoding and perldoc -f open manpages, among others; the list is not complete. The major problem with this documentation is its volume. A normal programmer won’t read it. (Cool programmers don’t read documentation, as we all know.) Most programmers don’t even have to read it all, because to start working with Unicode you just need to know the basic facts and rules. I somehow got into several different kinds of trouble with Unicode in Perl, both in 5.6 and 5.8, in several different projects. It was always about processing and generating data in UTF-8 encoding.

The two main problems I’ve seen are UTF-8 data getting double-encoded and the “Wide character in print” warning.

Having said the above, reading or at least browsing through the manpages mentioned above is still a good way to understand and solve your Unicode problems. If you don’t have time for that now, read on.

The basic facts you need to know

There is a distinction between bytes and characters. Characters are Unicode characters, and one character may consist of several bytes. There is a “utf8” flag on every scalar value, which might be “on” or “off”.
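A minimal sketch of the distinction, assuming the Encode module that ships with perl 5.8 and later. The variable names are mine, purely for illustration; the same two bytes are looked at once as bytes and once as a decoded character.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "é" (U+00E9) as raw UTF-8 data: two bytes, utf8 flag off.
my $bytes = "\xC3\xA9";

# The same data decoded into a character string: one character, utf8 flag on.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length=%d flag=%s\n",
    length($bytes), utf8::is_utf8($bytes) ? 'on' : 'off';
printf "chars: length=%d flag=%s\n",
    length($chars), utf8::is_utf8($chars) ? 'on' : 'off';

# Prints:
# bytes: length=2 flag=off
# chars: length=1 flag=on
```

Checking utf8::is_utf8 like this is handy for debugging, but it is probably not something to build program logic on.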

When the flag is “on”, perl treats the value as a string of characters; when it is “off”, as a string of bytes. That is the source of many perl/Unicode problems. If you take a string with the utf8 flag off and concatenate it with another string that has the utf8 flag on, perl converts the first one to UTF-8. This may sound okay and obvious.
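Here is a rough sketch of that mixing, assuming the flag-off string really holds UTF-8 bytes (say, read from a file without any decoding) and that no locale or encoding pragma is in effect, in which case perl takes each byte to be one Latin-1 character. The variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Flag off: the UTF-8 bytes for "é", as they would arrive undecoded.
my $bytes = "\xC3\xA9";

# Flag on: a character string holding U+263A.
my $chars = "\x{263A}";

# Mixing them: perl upgrades $bytes by treating each byte as one
# Latin-1 character, so "é" turns into the two characters "Ã" and "©".
my $mixed = $bytes . $chars;

# Encoding the result for output yields double-encoded UTF-8 for "é".
my $out = encode('UTF-8', $mixed);
printf "mixed: %d characters, %d bytes once encoded\n",
    length($mixed), length($out);

# Decoding the bytes explicitly before mixing leaves nothing to guess.
my $fixed = decode('UTF-8', $bytes) . $chars;
printf "fixed: %d characters, %d bytes once encoded\n",
    length($fixed), length(encode('UTF-8', $fixed));

# Prints:
# mixed: 3 characters, 7 bytes once encoded
# fixed: 2 characters, 5 bytes once encoded
```

The two extra bytes in the first result are the double-encoding: "é" has been encoded to UTF-8 twice.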

But then you think: how? Perl needs to know the encoding of the string data before converting it, and perl will try to guess it. The algorithm perl uses for guessing is documented (it uses some defaults and maybe checks your locale), but my suggestion is: never let perl do that. In my experience, this is the reason for double-encoded UTF-8 strings in 99% of cases.

An example

Imagine you have two variables with Unicode data in them, and you print those variables.

Open FILE, '.