Should URLs be in English?

This is a question that we have been asked a few times: should website URLs be in English, even if the website content itself isn’t English?

If your site language uses the Latin script, or the characters in the ASCII set, the question isn’t too difficult. Your URLs should be in the same language as your site.

However, if your site language uses a different script, the issue is more complex. We have looked into it to understand the best practices and set out some recommendations for Amnesty websites.

Background information

Here are a few useful things to know, to better understand the question:

What is the ASCII character set?

ASCII stands for the “American Standard Code for Information Interchange”. It is a character encoding standard used in computers and electronic devices. The original set contains 128 characters: the digits 0 to 9, the upper and lower case English letters from A to Z, punctuation and other special characters, and some non-printing control characters. It was set up in the 1960s and is still used today in modern computers, in HTML, and on the Internet.

What is URL encoding?

If a URL contains characters outside the ASCII set, the URL has to be encoded. To do this, every non-ASCII character is converted to its UTF-8 bytes, and each byte is replaced with a % followed by two hexadecimal digits. This is also known as percent-encoding.
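As a minimal sketch of what this encoding looks like in practice, the Python standard library can encode a single path segment. The helper name `encode_segment` is ours and only for illustration; `quote` and `unquote` are the standard `urllib.parse` functions.

```python
from urllib.parse import quote, unquote


def encode_segment(segment: str) -> str:
    """Percent-encode one URL path segment, treating the text as UTF-8."""
    # safe="" means that "/" inside the segment is encoded as well.
    return quote(segment, safe="")


print(encode_segment("agir"))  # agir (plain ASCII letters are left alone)
print(encode_segment("é"))     # %C3%A9 (two UTF-8 bytes)
print(encode_segment("中"))    # %E4%B8%AD (three UTF-8 bytes)

# Decoding restores the original text.
print(unquote("%C3%A9"))       # é
```

Browsers usually display the decoded form in the address bar, but the encoded form is what gets sent over the network and what often appears when a link is copied and pasted.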

Which languages use mostly non-ASCII characters?

The ASCII characters are based on the Latin script. This script is used by many languages around the world, for example in Western and Central Europe, sub-Saharan Africa, the Americas, and the Pacific.

Languages using different scripts may consist almost entirely of non-ASCII characters. Examples include languages written in the Arabic, Cyrillic, Greek, Hebrew, Chinese or Thai scripts.

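Almost every language uses some non-ASCII characters. Even languages written in the Latin script often include accented or modified letters, such as é in French or ł in Polish.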
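To get a rough feel for which group a language falls into, you could measure how much of a typical slug or headline is non-ASCII. The sketch below is only an illustration: the function name `non_ascii_share` and the sample strings are our own, and a single word is far too small a sample to classify a language.

```python
def non_ascii_share(text: str) -> float:
    """Return the fraction of non-whitespace characters that are outside ASCII."""
    characters = [ch for ch in text if not ch.isspace()]
    if not characters:
        return 0.0
    # ASCII covers code points 0 to 127; anything above is non-ASCII.
    non_ascii = sum(1 for ch in characters if ord(ch) > 127)
    return non_ascii / len(characters)


# Illustrative samples only.
for sample in ["agir", "café", "новости"]:
    print(sample, non_ascii_share(sample))
```

A Latin-script word with one accented letter scores low, while a Cyrillic word scores 1.0, which is the difference the recommendations further down rely on.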

Localization vs long URLs

URLs which contain many non-ASCII characters will end up being much longer, due to the codes replacing each character. So, to decide whether to use English URLs or not, you may need to weigh up two competing search engine optimization factors:

  • using the same language in your website content and in your URLs is beneficial for search engine optimization
  • very long URLs are detrimental for search engine optimization

If your site language is made up of mostly non-ASCII characters, you will need to figure out how to get a balance between the benefits of localization and of shorter links.
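To see how quickly encoded URLs grow, here is a small sketch that compares the encoded length of an English slug with a localized one. The domain `www.example.org`, the slugs, and the helper `encoded_url_length` are hypothetical and chosen only for the comparison.

```python
from urllib.parse import quote


def encoded_url_length(base: str, segments: list[str]) -> int:
    """Return the length of the URL once every path segment is percent-encoded."""
    path = "/".join(quote(segment, safe="") for segment in segments)
    return len(f"{base}/{path}/")


# Hypothetical site and slugs, used only for comparison.
BASE = "https://www.example.org"

english = encoded_url_length(BASE, ["news"])
russian = encoded_url_length(BASE, ["новости"])  # "news" in Russian

print(english)  # 29
print(russian)  # 67
```

The Russian word is only three letters longer than the English one, yet the encoded URL is more than twice as long. With several subfolders and a full headline as the slug, the difference grows accordingly.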

Weighing it up for the Amnesty use case

Most Amnesty sites use a multilevel sub-folder structure, which makes the URLs longer. Our sites also tend to have longer slugs, sometimes using the entire headline of a press release as the slug. These URLs would get even longer if non-ASCII characters were used, which could have a negative effect on SEO.

Many Amnesty websites contain content in multiple languages. These sites face a choice between URL localization and ease of content management. On amnesty.org, which has content in English, Spanish, French and Arabic, we keep the URL slugs in English so that it is easier to monitor content performance as a central team.

Smaller sites might choose to localize URLs instead.

The direction you take here should take into account your team’s content marketing goals and the resources you have to analyze website performance across languages.

After reviewing advice from industry experts, we can provide a few recommendations for Amnesty websites:

Recommendations

Single-language sites in a language mostly made up of ASCII characters:

On sites that publish in only one language, where that language is mostly made up of ASCII characters, the text in the URL should be written in that language. It is acceptable to use non-ASCII characters when necessary, as this will improve your SEO localization efforts. It may still be worth investigating ways to keep URLs shorter, for example by avoiding too many subfolder layers on your site.

Examples: English, French (https://www.amnesty.fr/agir), Polish (https://www.amnesty.org.pl/nasze-akcje/), Kiswahili

Single-language sites in a language almost entirely made up of non-ASCII characters:

Sites that publish in only one language, where that language is mostly made up of non-ASCII characters, require a mixed approach. The domain name should be made up of ASCII characters. This can be either the English translation of the name or its phonetic spelling in the original language.

Subfolder names, as long as they are only one or two short words, can be written out in non-ASCII characters. Remember that each non-ASCII character becomes 6 or more ASCII characters once the URL is encoded, so non-ASCII subfolder names will be much longer than they initially appear. If short subfolder names are not possible, then subfolder names should be either translated or written out phonetically in ASCII characters.
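The cost per character depends on the script, because it follows from how many UTF-8 bytes each character needs. The sketch below prints the encoded width of one sample character per script; the helper `encoded_width` and the choice of sample characters are ours, for illustration only.

```python
from urllib.parse import quote


def encoded_width(character: str) -> int:
    """Return how many ASCII characters one character occupies in an encoded URL."""
    return len(quote(character, safe=""))


# One illustrative character per script.
SAMPLES = {
    "Latin": "a",
    "Latin with accent": "é",
    "Greek": "α",
    "Cyrillic": "я",
    "Chinese": "中",
    "Thai": "ก",
}

for script, character in SAMPLES.items():
    print(f"{script}: {encoded_width(character)}")
```

Characters that take two UTF-8 bytes, such as the Greek and Cyrillic samples here, come out at 6 ASCII characters each, and three-byte characters, such as the Chinese and Thai samples, come out at 9. A two-word subfolder name of around ten characters can therefore take up 60 to 90 characters in the encoded URL.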

Page level slugs, which are often longer, should either be page IDs or written out in English.

Examples: Chinese (https://zh.amnesty.org/latest/), Russian (https://eurasia.amnesty.org/publikaczii/), Arabic, Dari

Large multi-language sites where most content is translated across all languages and you want to be able to analyze multiple language versions at once:

Pick one language to be the main language. Translated content should have the same URL slug as the main page, with distinguishing language codes just after the domain.
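As a rough sketch of this structure, the function below builds one URL per language from a single shared slug, with the language code placed directly after the domain. The domain `www.example.org`, the slug, and the function name `language_versions` are hypothetical; the language codes are the usual two-letter codes for the four languages mentioned for amnesty.org.

```python
# Two-letter codes for English, Spanish, French and Arabic.
LANGUAGE_CODES = ("en", "es", "fr", "ar")


def language_versions(domain: str, slug: str, codes=LANGUAGE_CODES) -> dict[str, str]:
    """Build one URL per language that shares the same main-language slug."""
    clean_slug = slug.strip("/")
    return {code: f"https://{domain}/{code}/{clean_slug}/" for code in codes}


# Hypothetical domain and slug, for illustration only.
for code, url in language_versions("www.example.org", "latest/news").items():
    print(code, url)
```

Because every translation differs only in the language code, an analytics report can group all versions of a page by stripping that first path segment.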

Example: amnesty.org (Arabic, English, French, Spanish)

Small multi-language sites or sites where all content varies depending on the language:

The language of the URL slug should be the same as the content language. If one of the languages uses mostly non-ASCII characters, follow the advice above on single-language sites in languages almost entirely made up of non-ASCII characters.