Tuesday, March 9, 2010

Visual Studio 2008 text encoding: ISO-8859-1 versus UTF-8

I ran into a problem where ASP.NET pages came back with the wrong character encoding for text rendered through server controls.
In the code I specified UTF-8, but Firefox's page properties reported the encoding as ISO-8859-1 (Western European (ISO), code page 28591).
One result was that an &nbsp; passed through a control, such as a LinkButton, appeared as Â (U+00C2 Latin Capital Letter A With Circumflex, Alt+0194) followed by the actual non-breaking space.

  • In ISO-8859-1, the non-breaking space is the single byte 0xA0.
  • In UTF-8, the non-breaking space is the two bytes 0xC2, 0xA0.
  • The bytes 0xC2, 0xA0 read as ISO-8859-1 come out as "Â " (where the second character is a non-breaking space).
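The byte arithmetic in the list above can be checked with a short console sketch. It assumes a .NET console project that allows top-level statements (C# 9 or later); the variable names are illustrative only. It encodes U+00A0 both ways, then decodes the UTF-8 bytes as ISO-8859-1, which is what the browser effectively did.

```csharp
using System;
using System.Text;

// The non-breaking space U+00A0, i.e. what &nbsp; stands for
string nbsp = "\u00A0";

// UTF-8 needs two bytes for U+00A0: 0xC2 0xA0
byte[] utf8Bytes = Encoding.UTF8.GetBytes(nbsp);

// ISO-8859-1 (code page 28591) stores it as the single byte 0xA0
Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
byte[] latin1Bytes = latin1.GetBytes(nbsp);

// Reading the UTF-8 bytes as ISO-8859-1 maps each byte to one character:
// 0xC2 becomes U+00C2 (Â) and 0xA0 becomes U+00A0 (non-breaking space)
string misread = latin1.GetString(utf8Bytes);

Console.WriteLine(BitConverter.ToString(utf8Bytes));   // C2-A0
Console.WriteLine(BitConverter.ToString(latin1Bytes)); // A0
Console.WriteLine(misread == "\u00C2\u00A0");          // True
```

The stray Â is therefore not a character ASP.NET added; it is the first byte of the two-byte UTF-8 sequence, shown on its own because the browser was told (or assumed) the wrong encoding.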

A minimal test page that reproduces the problem:

<%@ Page Language="C#" AutoEventWireup="true" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title> 
    <script runat="server">
        protected void Page_Load(object sender, EventArgs e)
        {
            // Bug: "Content-Type:" is the header name and does not belong in the value
            HttpContext.Current.Response.ContentType = "Content-Type: text/html; charset=UTF-8";
            lnkTest.Text += "1";
        }
    </script>
</head>
<body>
    <form id="form1" runat="server">
    <div>
        <p>Before&nbsp;After</p>
        <asp:LinkButton ID="lnkTest" runat="server" Text="Before&nbsp;After" />
    </div>
    </form>
</body>
</html>

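The fix is to remove the "Content-Type: " prefix from the value. Response.ContentType holds only the header's value (the MIME type plus optional parameters such as charset); ASP.NET supplies the header name itself. Change the content type line as follows: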

HttpContext.Current.Response.ContentType = "text/html; charset=UTF-8";
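To see why the original value misbehaves, here is a small console sketch that runs both strings through a standard media-type parser. It uses System.Net.Http.Headers.MediaTypeHeaderValue from modern .NET (a console project with top-level statements is assumed), which is not what ASP.NET 3.5 or Firefox use internally, so treat it as an illustration of the header grammar rather than a reproduction of either.

```csharp
using System;
using System.Net.Http.Headers;

// The value from the original Page_Load, with the header name baked in
string broken = "Content-Type: text/html; charset=UTF-8";

// The corrected value: just the media type plus its charset parameter
string fixedValue = "text/html; charset=UTF-8";

// "Content-Type:" is not a valid type/subtype pair, so parsing fails
bool brokenParses = MediaTypeHeaderValue.TryParse(broken, out _);

// The corrected value parses, and the charset parameter is recoverable
bool fixedParses = MediaTypeHeaderValue.TryParse(fixedValue, out MediaTypeHeaderValue? parsed);

Console.WriteLine($"broken parses: {brokenParses}");
Console.WriteLine($"fixed parses: {fixedParses}, type={parsed?.MediaType}, charset={parsed?.CharSet}");
```

When a client cannot find a usable charset in the header, it presumably falls back to its own default, which for a Western European Firefox install would match the ISO-8859-1 seen above.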

See Also: