Tuesday, March 9, 2010

Visual Studio 2008 text-encoding iso-8859-1 versus UTF-8

I ran into an issue where ASP.NET pages came back with the wrong character encoding for text processed through server controls.
In the code I specified UTF-8, but when I checked the page properties in Firefox the encoding was ISO-8859-1 (Western European (ISO), code page 28591).
One result was that a non-breaking space (&nbsp;) that had been encoded through a control, such as a LinkButton, appeared as Â (U+00C2 Latin Capital Letter A With Circumflex, Alt+0194) followed by the actual non-breaking space.

  • The non-breaking space character in ISO-8859-1 is the single byte 0xA0.
  • The non-breaking space character in UTF-8 is the two bytes 0xC2 0xA0.
  • The bytes 0xC2 0xA0 viewed as ISO-8859-1 come out as "Â " (where the second character is a non-breaking space).
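
The three points above can be checked with a small console program. This is only a sketch, separate from the ASP.NET page; the class and method names (NbspDemo, Latin1Bytes, Utf8Bytes, ReadAsLatin1) are made up for illustration.

```csharp
using System;
using System.Text;

public static class NbspDemo
{
    // ISO-8859-1 (code page 28591), looked up by name.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Encode text as ISO-8859-1 bytes.
    public static byte[] Latin1Bytes(string text) { return Latin1.GetBytes(text); }

    // Encode text as UTF-8 bytes.
    public static byte[] Utf8Bytes(string text) { return Encoding.UTF8.GetBytes(text); }

    // Decode bytes as if they were ISO-8859-1, which is what the browser did.
    public static string ReadAsLatin1(byte[] bytes) { return Latin1.GetString(bytes); }

    public static void Main()
    {
        string nbsp = "\u00A0"; // non-breaking space

        Console.WriteLine(BitConverter.ToString(Latin1Bytes(nbsp))); // A0
        Console.WriteLine(BitConverter.ToString(Utf8Bytes(nbsp)));   // C2-A0

        // UTF-8 bytes misread as ISO-8859-1 give U+00C2 followed by U+00A0.
        string misread = ReadAsLatin1(Utf8Bytes(nbsp));
        Console.WriteLine(misread == "\u00C2\u00A0"); // True
    }
}
```

The last line is the same effect seen in the browser: the page bytes were UTF-8, but they were decoded as ISO-8859-1.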

Here is a minimal page that reproduces the problem:

<%@ Page Language="C#" AutoEventWireup="true" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <script runat="server">
        protected void Page_Load(object sender, EventArgs e)
        {
            // Problem line: the value wrongly includes the "Content-Type: " header name.
            HttpContext.Current.Response.ContentType = "Content-Type: text/html; charset=UTF-8";
            lnkTest.Text += "1";
        }
    </script>
</head>
<body>
    <form id="form1" runat="server">
        <asp:LinkButton ID="lnkTest" runat="server" Text="Before&nbsp;After" />
    </form>
</body>
</html>

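The ContentType property expects only the header value, so the "Content-Type: " prefix should not be part of the string. Try changing the content type line as follows: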

HttpContext.Current.Response.ContentType = "text/html; charset=UTF-8";
