Lang=En: The Weight of the World, for Now…

“Content Type: UTF vs US-ASCII”

A recent Google Groups discussion focused on encoding discrepancies. Burt shared the following concern:


…I got a scare yesterday when I saw in webmaster tools that my perfectly valid XHTML site, encoded in UTF-8, was listed as having all its content in US-ASCII.

It seems that the webmaster tools (reports) (bot?) don’t react to the encoding in the meta header of the page itself, but only to the header the server sends…
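
For anyone who wants to check the same thing on their own site, here is a minimal sketch in Python that compares the two declarations Burt mentions: the charset in the server's Content-Type header and the charset in the page's meta tag. The function names and the sample values are hypothetical, made up for illustration, and the regular expressions are a rough approximation rather than a full HTML or HTTP parser. Nothing here reflects how Google's tools actually work.

```python
import re
from typing import Optional


def charset_from_header(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, if any."""
    match = re.search(r'charset\s*=\s*"?([\w.:-]+)', content_type, re.IGNORECASE)
    return match.group(1).lower() if match else None


def charset_from_meta(html: str) -> Optional[str]:
    """Return the charset declared in a <meta> tag, if any.

    Handles both the http-equiv form and the short <meta charset="..."> form.
    """
    match = re.search(r'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', html, re.IGNORECASE)
    return match.group(1).lower() if match else None


def encodings_agree(content_type: str, html: str) -> bool:
    """True only when both declarations exist and name the same charset."""
    header = charset_from_header(content_type)
    meta = charset_from_meta(html)
    return header is not None and header == meta


# Hypothetical values resembling Burt's situation: the server says US-ASCII,
# while the page itself declares UTF-8.
sample_header = "text/html; charset=US-ASCII"
sample_html = (
    '<head><meta http-equiv="Content-Type" '
    'content="text/html; charset=utf-8" /></head>'
)

print(charset_from_header(sample_header))
print(charset_from_meta(sample_html))
print(encodings_agree(sample_header, sample_html))
```

To try it against a live page, the header value can be read with the standard library, for example urllib.request.urlopen(url).headers.get("Content-Type"), and the page body passed in as the html argument. If the two disagree, the usual remedy is to change the charset the server sends, which is a server configuration matter outside this sketch.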



John Mu replied with:

In the end, as long as you can see that we’re listing your keywords and your site properly in the search results, it’s probably ok regardless of what is shown in the statistics. So far, I have only seen 2-3 cases where we incorrectly recognized the text encoding — and in those cases, the pages didn’t render properly in my browsers either, so this is definitely something you would notice.

It seems that this is no big deal, but my mind is always bent on the longevity of ‘good’ SEO decisions. As a non-programmer, how do I know that encoding standards won’t make any significant advancements tomorrow, next week, or next year? Foundations change all the time.

Beyond that, I see global language barriers getting smaller. How do I make sure that my US English site doesn’t fall through the UK translation when it’s being read by a surfer in India? It may or may not make any difference, but when I think of my $1,000, $10,000, or even $100,000 shopper abandoning a sale because what I have to say comes across mistranslated, suddenly it all becomes very important.

Besides, would it really kill anyone if Burt just wanted consistency between his code and what Google reads? And since Burt’s site is in English, which is apparently why this problem doesn’t matter too much, should non-English sites be more concerned about this kind of discrepancy?

Hmmm, I wonder… Do you?


