Translate an entire WordPress site at once with TranslatePress

As a WordPress hosting specialist and WordPress webmaster, I regularly work on multilingual sites with automatic translation. For this, I generally use the TranslatePress Developer edition, with the DeepL API performing the translation. The problem? TranslatePress can't natively translate everything at once, so I had to find a workaround. Here it is, and it can save you hours!

Problem: It's not possible to translate the whole site at once.

With TranslatePress, the content of your pages and articles, and above all their URLs (URL translation is only available in the paid version), is only translated once someone has visited the page.

However, you want every translated URL to exist from the start. Otherwise URLs change after Google has already seen them, which generates 301 redirects or, even worse, 404 errors.

On smaller sites, you can visit every page and article in every language.

For a site with hundreds of pages and articles, visiting them all by hand would take hours.

So how do you go about it?

Solution: Visit the entire sitemap automatically!

The solution is as follows: create a script that visits all the pages on your site, based on its sitemap.

Your site doesn't have a sitemap?

A sitemap is a file, usually in XML format, that you can submit to search engines (typically through Google Search Console) to help them index all the pages on your site. It is close to essential by today's standards.

If you don't already have a sitemap, you can easily add one using, for example, the WordPress plugin RankMath SEO.
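
Before going further, it can be worth checking that the sitemap is reachable and actually lists URLs. Here is a minimal sketch, separate from the main script: the function names extract_locs and count_locs are my own, the URL in the usage comment is a placeholder, and the pattern assumes plain <loc> entries without CDATA wrappers.

```shell
#!/bin/bash
# Hypothetical helpers (names are illustrative, not from any plugin or tool).

# Print every URL found between <loc> and </loc> tags, one per line.
# Reads the sitemap XML from standard input; uses only grep -o and sed,
# so it does not depend on grep -P.
extract_locs() {
  grep -o '<loc>[^<]*</loc>' | sed -e 's/<loc>//' -e 's/<\/loc>//'
}

# Count the <loc> entries in the XML read from standard input.
count_locs() {
  extract_locs | wc -l | tr -d ' '
}

# Usage (placeholder URL, replace it with your own sitemap):
#   curl -s -L "https://www.votresite.fr/sitemap_index.xml" | count_locs
```

If the count is 0, the URL probably doesn't point to a sitemap, or the sitemap uses a format this simple pattern doesn't cover.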

Then you need a terminal that runs Bash. On Linux, it's native. On macOS, a terminal is built in, but the system ships Bash 3.2 and a grep without the -P option, so the script below needs a newer Bash and GNU grep there. On Windows, PowerShell doesn't run Bash scripts, so I'd advise you to install WSL. If you have SSH access to a Linux server, this obviously works perfectly.
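
The script further down relies on associative arrays (Bash 4 or newer), on grep -P (a GNU grep feature) and on curl. As a quick sanity check, a preflight sketch along these lines can tell you whether your terminal is suitable. The function names are my own and purely illustrative.

```shell
#!/bin/bash
# Preflight sketch: function names are illustrative.

# Succeeds if the given Bash major version (default: the running shell's)
# is at least 4, the first version with associative arrays (declare -A).
supports_assoc_arrays() {
  local major="${1:-${BASH_VERSION%%.*}}"
  [ "${major:-0}" -ge 4 ]
}

# Succeeds if this grep accepts -P (Perl-compatible regular expressions).
supports_grep_pcre() {
  echo "<loc>test</loc>" | grep -qP '(?<=<loc>).*?(?=</loc>)' 2>/dev/null
}

# Run all three checks and report each missing requirement on stderr.
check_requirements() {
  local status=0
  supports_assoc_arrays || { echo "Bash 4 or newer is required" >&2; status=1; }
  supports_grep_pcre || { echo "grep -P is not supported here" >&2; status=1; }
  command -v curl >/dev/null 2>&1 || { echo "curl is not installed" >&2; status=1; }
  return "$status"
}

if check_requirements; then
  echo "Environment looks OK"
fi
```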

The script

To save time, I used ChatGPT to create the following script. It visits all the URLs in your sitemap.xml via curl, and recursively follows any nested sitemaps it finds.

You can call this script "sitemap_visitor.sh", for example, make it executable (chmod +x sitemap_visitor.sh), then run it with your sitemap.

The two arguments are:

  1. Your sitemap URL
  2. The waiting time between requests, in seconds (you can set this to 0 if you trust your server and are comfortable with your translation API consumption)

For example:

./sitemap_visitor.sh https://www.votresite.fr/sitemap_index.xml 1

Script:

#!/bin/bash

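sitemap_url="$1"
delay="${2:-1}"  # Seconds to wait between requests; defaults to 1 if omitted

# Stop early if no sitemap URL was provided
if [[ -z "$sitemap_url" ]]; then
  echo "Usage: $0 <sitemap_url> [delay_in_seconds]" >&2
  exit 1
fi

declare -A visited_sitemaps  # Associative array tracking visited sitemaps (requires Bash 4+)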

# Function to visit all URLs in a sitemap
visit_sitemap_urls() {
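  local current_sitemap="$1"
  # Keep working variables local so recursive calls don't overwrite each other
  local sitemap_content urls url response_code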

  # Check if the sitemap has already been visited
  if [[ ${visited_sitemaps["$current_sitemap"]} ]]; then
    echo "Skipping already visited sitemap: $current_sitemap"
    return
  fi

  # Mark the current sitemap as visited
  visited_sitemaps["$current_sitemap"]=1

  # Fetch the sitemap
  echo "Fetching sitemap from: $current_sitemap"
  sitemap_content=$(curl -s -L "$current_sitemap")  # -s: silent, -L: follow redirects

  if [[ -z "$sitemap_content" ]]; then
    echo "Failed to fetch sitemap. Skipping."
    return
  fi

  # Extract the URLs between <loc> tags (grep -P requires GNU grep)
  urls=$(echo "$sitemap_content" | grep -oP '(?<=<loc>).*?(?=</loc>)')

  if [[ -z "$urls" ]]; then
    echo "No URLs found in the sitemap. Skipping."
    return
  fi

  echo "Found $(echo "$urls" | wc -l) URLs in the sitemap."

  # Visit each URL
  while read -r url; do
    echo "Visiting: $url"
    response_code=$(curl -o /dev/null -s -w "%{http_code}" -L "$url")  # Discard the body, keep only the final HTTP status code

    if [[ "$response_code" == "200" ]]; then
      echo "Successfully visited: $url"
    else
      echo "Failed to visit $url: HTTP $response_code"
    fi

    # Respectful crawling: wait between requests
    sleep "$delay"

    # Check if the URL is another sitemap
    if [[ "$url" == *.xml ]]; then
      echo "Found nested sitemap: $url"
      visit_sitemap_urls "$url"
    fi
  done <<< "$urls"
}

visit_sitemap_urls "$sitemap_url"

You've now translated your entire WordPress site!
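
Since the script prints one line per visited URL, its output can be saved and summarised afterwards to spot pages that didn't return HTTP 200. Here is a possible sketch, assuming the output was saved to a file, for example with ./sitemap_visitor.sh ... | tee visit.log. The file name visit.log and the function name summarize_visits are my own choices.

```shell
#!/bin/bash
# Hypothetical helper: summarise the output of sitemap_visitor.sh.
# It relies on the "Successfully visited:" and "Failed to visit" lines
# that the script prints for each URL.

summarize_visits() {
  local log ok failed
  log=$(cat)  # Read the whole log from standard input
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
  ok=$(printf '%s\n' "$log" | grep -c '^Successfully visited: ' || true)
  failed=$(printf '%s\n' "$log" | grep -c '^Failed to visit ' || true)
  echo "OK: $ok, failed: $failed"
  # List the failing URLs with their HTTP status, if any
  printf '%s\n' "$log" | grep '^Failed to visit ' || true
}

# Usage:
#   summarize_visits < visit.log
```

Any URL listed as failed is worth opening by hand, since its translation may not have been generated.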

I hope this tip saves you hours of work!

For a site of around a hundred pages, and 5 additional languages, it cost me just under €20 for the DeepL API (+€4.99 subscription fee). Your experience may vary, so be sure to set limits, whether in TranslatePress or DeepL.
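
To get a rough idea of the bill before launching the crawl, you can multiply characters per page by the number of pages and by the number of target languages. The sketch below only does that arithmetic. Every number in the example call is a hypothetical placeholder, including the price per million characters, which you should take from your own DeepL plan. Real consumption depends on what TranslatePress actually sends for translation.

```shell
#!/bin/bash
# Hypothetical estimator: all inputs are assumptions supplied by the caller.
# Arguments: characters per page, number of pages, number of target
# languages, price in EUR per million characters.

estimate_cost_eur() {
  local chars_per_page="$1" pages="$2" languages="$3" eur_per_million="$4"
  awk -v c="$chars_per_page" -v p="$pages" -v l="$languages" -v r="$eur_per_million" \
    'BEGIN {
       total = c * p * l                      # characters sent for translation
       printf "%d characters, about %.2f EUR\n", total, total / 1000000 * r
     }'
}

# Example with placeholder values: 2000 characters per page, 100 pages,
# 5 languages, 20 EUR per million characters.
estimate_cost_eur 2000 100 5 20
# prints "1000000 characters, about 20.00 EUR"
```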

PS: This script can also be used to warm up the cache of your entire site after a site dump. 😜

