As a WordPress hosting specialist and WordPress webmaster, I regularly work on multilingual sites with automatic translation. For this, I generally use TranslatePress (Developer edition) with the DeepL API to perform the translation. The problem? TranslatePress can't natively translate everything at once, so we had to find a solution. Discover it below and save hours!
Problem: It's not possible to translate the whole site at once.
With TranslatePress, the content of your pages and articles, and above all their URLs (URL translation is only available in the paid version), is only translated once someone has visited the page.
However, we want all URLs to be translated up front, so that they don't change after Google has indexed them, which would generate 301 redirects or, even worse, 404 errors.
On smaller sites, you can visit every page and article in every language.
For a site with hundreds of pages and articles, visiting them all by hand would take hours.
So how do you go about it?
Solution: Visit the entire sitemap automatically!
The solution is as follows: create a script that automatically visits every page on your site, based on its sitemap.
Your site doesn't have a sitemap?
A sitemap is a file, usually in XML format, that you can submit to search engines (typically through Google Search Console) to help them index all the pages on your site. With today's SEO standards, it's pretty much essential.
If you don't already have a sitemap, you can easily add one using, for example, the Rank Math SEO plugin for WordPress.
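If you're not sure where your sitemap lives, the sketch below tries two common locations: /sitemap_index.xml, used by Rank Math SEO and Yoast SEO, and /wp-sitemap.xml, the sitemap WordPress core has generated since version 5.5. The find_sitemap name is hypothetical, and other plugins may use other paths, so treat a "not found" as "look elsewhere", not as "no sitemap".

```shell
#!/bin/bash
# Hypothetical helper: print the first candidate sitemap URL that answers HTTP 200.
# Candidate paths are assumptions covering common setups:
#   /sitemap_index.xml  (Rank Math SEO, Yoast SEO)
#   /wp-sitemap.xml     (WordPress core sitemap, WordPress 5.5+)
find_sitemap() {
    local site="${1%/}"   # strip a trailing slash from the site URL
    local path code
    for path in /sitemap_index.xml /wp-sitemap.xml; do
        # -L follows redirects; -w prints only the final HTTP status code
        code=$(curl -o /dev/null -s -w "%{http_code}" -L "$site$path")
        if [[ "$code" == "200" ]]; then
            echo "$site$path"
            return 0
        fi
    done
    echo "No sitemap found at the usual locations for $site" >&2
    return 1
}
```

For example, `find_sitemap https://www.votresite.fr` prints the sitemap URL to pass to the script.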
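Then you need a terminal running Bash. On Linux, it's available out of the box. On macOS, the preinstalled Bash (version 3.2) is too old for this script and the bundled grep lacks the -P option, so install recent versions of both (for example with Homebrew) or run the script on a Linux machine. On Windows, PowerShell is not Bash, so I'd advise you to install WSL (Windows Subsystem for Linux). If you have SSH access to a Linux server, this obviously works perfectly.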
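The script relies on three things: Bash 4 or later (for the associative array declared with declare -A), curl, and a grep that supports -P. The sketch below checks all three before you launch a long run; the function names are hypothetical, and you would paste them into your shell (or source the file) before calling check_requirements.

```shell
#!/bin/bash
# Sketch: check the prerequisites of sitemap_visitor.sh before running it.

# Bash 4+ is required for associative arrays (declare -A).
bash_is_recent_enough() {
    (( BASH_VERSINFO[0] >= 4 ))
}

# Succeeds if the given command is available in PATH.
require_command() {
    command -v "$1" >/dev/null 2>&1 || {
        echo "Missing command: $1" >&2
        return 1
    }
}

# Succeeds if grep accepts -P (Perl-compatible regular expressions).
grep_supports_pcre() {
    echo "<loc>x</loc>" | grep -qP '(?<=<loc>)x' 2>/dev/null
}

# Runs every check and reports all problems, not just the first one.
check_requirements() {
    local status=0
    bash_is_recent_enough || { echo "Bash $BASH_VERSION is too old (4+ required)." >&2; status=1; }
    require_command curl || status=1
    grep_supports_pcre || { echo "grep does not support -P." >&2; status=1; }
    return "$status"
}
```

Usage: `check_requirements && ./sitemap_visitor.sh https://www.votresite.fr/sitemap_index.xml 1`.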
The script
To save time, I used ChatGPT to create the following script. It recursively visits all the URLs in your sitemap.xml (including nested sitemaps) with curl.
You can call this script "sitemap_visitor.sh", for example, make it executable (chmod +x sitemap_visitor.sh), then run it with your sitemap.
The script takes two arguments:
- The URL of your sitemap (or sitemap index)
- The delay, in seconds, between two requests (you can set it to 0 if you trust your server and are comfortable with your translation API consumption)
For example:
./sitemap_visitor.sh https://www.votresite.fr/sitemap_index.xml 1
Script:
#!/bin/bash
# Visit every URL listed in a sitemap (and in any nested sitemaps).
# Usage: ./sitemap_visitor.sh <sitemap_url> [delay_in_seconds]
# Requires Bash 4+ (associative arrays), curl, and GNU grep (for -P).

sitemap_url="$1"
delay="${2:-0}" # Default to no delay if the second argument is omitted

if [[ -z "$sitemap_url" ]]; then
    echo "Usage: $0 <sitemap_url> [delay_in_seconds]" >&2
    exit 1
fi

declare -A visited_sitemaps # Associative array tracking visited sitemaps

# Visit all URLs listed in a sitemap, recursing into nested sitemaps
visit_sitemap_urls() {
    local current_sitemap="$1"
    # Local variables, so recursive calls don't overwrite the caller's values
    local sitemap_content urls url response_code

    # Skip sitemaps that were already processed (avoids infinite loops)
    if [[ -n "${visited_sitemaps["$current_sitemap"]}" ]]; then
        echo "Skipping already visited sitemap: $current_sitemap"
        return
    fi

    # Mark the current sitemap as visited
    visited_sitemaps["$current_sitemap"]=1

    # Fetch the sitemap (-L follows redirects)
    echo "Fetching sitemap from: $current_sitemap"
    sitemap_content=$(curl -s -L "$current_sitemap")
    if [[ -z "$sitemap_content" ]]; then
        echo "Failed to fetch sitemap. Skipping."
        return
    fi

    # Extract the content of each <loc>...</loc> tag (PCRE lookarounds, GNU grep)
    urls=$(echo "$sitemap_content" | grep -oP '(?<=<loc>).*?(?=</loc>)')
    if [[ -z "$urls" ]]; then
        echo "No URLs found in the sitemap. Skipping."
        return
    fi
    echo "Found $(echo "$urls" | wc -l) URLs in the sitemap."

    while read -r url; do
        # Nested sitemap (e.g. listed in a sitemap index): recurse into it
        # instead of downloading it twice
        if [[ "$url" == *.xml ]]; then
            echo "Found nested sitemap: $url"
            visit_sitemap_urls "$url"
            continue
        fi

        # Regular page: request it and report the final HTTP status (-L follows redirects)
        echo "Visiting: $url"
        response_code=$(curl -o /dev/null -s -w "%{http_code}" -L "$url")
        if [[ "$response_code" == "200" ]]; then
            echo "Successfully visited: $url"
        else
            echo "Failed to visit $url: HTTP $response_code"
        fi

        # Respectful crawling: wait between requests
        sleep "$delay"
    done <<< "$urls"
}

visit_sitemap_urls "$sitemap_url"
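The extraction step relies on grep -P (Perl-compatible regular expressions), which GNU grep on Linux supports but the BSD grep shipped with macOS does not. If you're stuck with the latter, here is a sketch of a drop-in replacement using only grep -o and sed. It assumes each <loc> holds a plain URL on a single line (no CDATA section), and extract_locs is a hypothetical name.

```shell
#!/bin/bash
# Sketch: portable replacement for the grep -oP extraction (works with BSD grep).
# Assumption: each <loc> contains a plain URL with no CDATA and no line break.
# <image:loc> tags are not matched, since they don't contain the literal "<loc>".
extract_locs() {
    # grep -o prints each <loc>...</loc> match on its own line; sed strips the tags
    grep -o '<loc>[^<]*</loc>' | sed -e 's|^<loc>||' -e 's|</loc>$||'
}
```

To use it, define the function above visit_sitemap_urls and replace the extraction line with `urls=$(echo "$sitemap_content" | extract_locs)`.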
You've now translated your entire WordPress site!
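Pages that failed to load may not have been translated, so it's worth checking the run. Assuming you saved the script's output to a file (for example with `./sitemap_visitor.sh https://www.votresite.fr/sitemap_index.xml 1 | tee visit.log`), the hypothetical summarize_visits function below counts successes and failures and lists the failed URLs. It is a sketch that parses the script's own messages, `Successfully visited: <url>` and `Failed to visit <url>: HTTP <code>`.

```shell
#!/bin/bash
# Sketch: summarize the output of sitemap_visitor.sh, read from stdin.
summarize_visits() {
    local log ok failed
    log=$(cat)
    # grep -c prints 0 when nothing matches; "|| true" ignores its non-zero exit status
    ok=$(grep -c '^Successfully visited: ' <<< "$log" || true)
    failed=$(grep -c '^Failed to visit ' <<< "$log" || true)
    echo "Visited OK: $ok, failed: $failed"
    # List the failed URLs so they can be checked and revisited by hand
    grep '^Failed to visit ' <<< "$log" | sed -e 's/^Failed to visit //' -e 's/: HTTP [0-9]*$//'
}
```

Usage: `summarize_visits < visit.log`.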
I hope this tip saves you hours of work!
For a site of around a hundred pages in 5 additional languages, it cost me just under €20 in DeepL API usage (plus the €4.99 subscription fee). Your costs may vary, so be sure to set usage limits in TranslatePress, in DeepL, or both.
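To get a rough idea of your own bill before launching the script, you can reason in characters: the DeepL API bills per character translated, and every target language is billed separately. The sketch below simply multiplies those figures out. The function name is hypothetical, and the example inputs (200,000 characters of text, 5 languages, €20 per million characters) are illustrative assumptions, so plug in your own numbers and DeepL's current pricing.

```shell
#!/bin/bash
# Back-of-the-envelope translation cost estimate (sketch; all inputs are yours).
# Usage: estimate_translation_cost <characters_per_language> <target_languages> <price_per_million_chars>
# Total billed characters = source characters x number of target languages.
estimate_translation_cost() {
    awk -v chars="$1" -v langs="$2" -v price="$3" \
        'BEGIN { printf "%.2f\n", chars * langs / 1000000 * price }'
}
```

With the illustrative inputs, `estimate_translation_cost 200000 5 20` prints 20.00.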
data:image/s3,"s3://crabby-images/fc75c/fc75c3c2b70b3164f1311803c02f2148a1e68de6" alt="API cost DeepL"
PS: This solution also lets you warm up your entire site's cache after a cache purge. 😜
Already an LRob customer and want to translate your site?
Looking for a competent and committed WordPress host?
Or a webmaster?
Choose LRob!