If you work with websites, sooner or later you will find that the encoding a site advertises is not the encoding of the bytes the server actually sends you.
This might show up as something like this:

Encoding::UndefinedConversionError: "\xC3" from ASCII-8BIT to UTF-8

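In this case, the string (that is, the page body) is tagged as ASCII-8BIT, Ruby's name for raw binary data, while it actually holds text in some other encoding. The "\xC3" byte is a typical lead byte of a two-byte UTF-8 sequence, so the content may well be UTF-8 already.
This happened to me while trying to dump some JSON, and I ended up having to look at the encode method in Ruby 1.9: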

ok_string = bad_string.encode("UTF-8", :invalid => :replace, :undef => :replace)
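
To make the effect of those options concrete, here is a minimal sketch that reproduces the error and then applies the replacing encode call. The sample bytes (the UTF-8 form of "café", deliberately tagged as binary) are my own assumption for illustration, not data from a real site.

```ruby
# Hypothetical sample: the UTF-8 bytes for "café", tagged as binary
# (ASCII-8BIT), the way a mislabelled response body might arrive.
bad_string = "caf\xC3\xA9".dup.force_encoding(Encoding::ASCII_8BIT)

# A plain encode fails, because Ruby has no mapping from bytes >= 0x80
# in binary data to UTF-8 characters.
error = nil
begin
  bad_string.encode("UTF-8")
rescue Encoding::UndefinedConversionError => e
  error = e
end

# With the replace options, each unconvertible byte is swapped for a
# replacement character instead of raising.
ok_string = bad_string.encode("UTF-8", :invalid => :replace, :undef => :replace)

puts error.class
puts ok_string.encoding
puts ok_string.valid_encoding?
```

Note that the accented character is lost here: both of its bytes are replaced, which is the trade-off described next.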

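This works OK for my purposes, as I don't really care about a few missing accents or umlauts: every byte that cannot be converted is simply replaced by a placeholder character. If you actually want to keep the document intact, you might want to look at the .force_encoding() method instead, which relabels the string with the encoding you name without changing its bytes.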
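
As a rough sketch of the force_encoding route: relabel the bytes as UTF-8, check whether the result is valid, and only fall back to the lossy replacement if it is not. The helper name to_utf8 is hypothetical, and the guess that the real encoding is UTF-8 is an assumption; a page in another encoding would need a different label.

```ruby
# Hypothetical helper: try to keep the text intact by relabelling the
# bytes as UTF-8, and only replace bytes when that guess is wrong.
def to_utf8(raw)
  # force_encoding changes the label only; the bytes stay untouched.
  relabelled = raw.dup.force_encoding(Encoding::UTF_8)
  return relabelled if relabelled.valid_encoding?

  # The bytes are not valid UTF-8, so treat them as binary and replace
  # everything that cannot be converted.
  raw.dup.force_encoding(Encoding::ASCII_8BIT)
     .encode("UTF-8", :invalid => :replace, :undef => :replace)
end

# UTF-8 bytes tagged as binary: the accent survives.
utf8_bytes = "caf\xC3\xA9".dup.force_encoding(Encoding::ASCII_8BIT)
puts to_utf8(utf8_bytes)

# A lone 0xE9 byte (as in Latin-1) is not valid UTF-8, so it is replaced.
latin1_bytes = "caf\xE9".dup.force_encoding(Encoding::ASCII_8BIT)
puts to_utf8(latin1_bytes)
```

The valid_encoding? check matters because force_encoding never raises on bad bytes; without it, an invalid string would only fail later, for example when dumping JSON.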
