Wagtail Multilingual Sitemap

Google-friendly sitemaps for multilingual Wagtail sites


When your website serves multilingual content, you need to generate a properly formatted sitemap file.


When you publish content in multiple languages, you need to tell Google (and other search engines) about the localized versions of each page.

To make this work properly, your site needs to provide two specific files, robots.txt and sitemap.xml, both at the root (/) of your domain. Wagtail, the Django content management system, has a sitemap framework that seems capable of handling i18n translations, and Wagtail Localize is a translation plugin that makes handling translated content a breeze. However, I couldn't get properly formatted sitemap files to generate with these tools. The language alternate tags were present and the hreflang language codes were correct, but the href attributes always pointed at the default (English) language URL.
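The symptom described above can be checked for automatically. Here is a minimal sketch, using only the Python standard library, that parses a sitemap and reports every alternate link whose href is identical to the page's own loc even though its hreflang is a different language. The function name, the default_lang parameter and the example.com URLs are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by multilingual sitemaps.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}


def alternates_pointing_at_default(sitemap_xml, default_lang="en"):
    """Return (loc, hreflang) pairs where a non-default alternate reuses the page's own URL."""
    root = ET.fromstring(sitemap_xml.strip())
    suspicious = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        for link in url.findall("xhtml:link", NS):
            hreflang = link.get("hreflang")
            # An alternate in another language should not share the default URL.
            if hreflang != default_lang and link.get("href") == loc:
                suspicious.append((loc, hreflang))
    return suspicious


# Hypothetical sitemap: the French alternate wrongly points at the English URL.
SAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://example.com/en/blog/hello/</loc>
    <xhtml:link rel="alternate" hreflang="fr" href="https://example.com/en/blog/hello/"/>
    <xhtml:link rel="alternate" hreflang="es" href="https://example.com/es/blog/hola/"/>
  </url>
</urlset>
"""

print(alternates_pointing_at_default(SAMPLE))
```

An empty list means that no alternate reuses the default-language URL.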

After spending a lot of time trying to resolve this by overriding methods, I decided to just use a Django template instead, as I had already done for the robots.txt.

This assumes that you have already configured the list of languages in your project's settings, for instance:

# my_project/settings.py

WAGTAIL_CONTENT_LANGUAGES = LANGUAGES = [
    ('en', "English"),
    ('fr', "French"),
    ('es', "Spanish"),
]

I first had to create my template files, one for robots.txt and one for the sitemap.

Here is the content of the my_project/templates/sitemap.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
{% spaceless %}
    {% for url in urlset %}
        <url>
            <loc>{{ url.location }}</loc>
            {% if url.lastmod %}
                <lastmod>{{ url.lastmod|date:'Y-m-d' }}</lastmod>
            {% endif %}
            {% if url.changefreq %}<changefreq>{{ url.changefreq }}</changefreq>{% endif %}
            {% if url.priority %}<priority>{{ url.priority }}</priority>{% endif %}
            {% if url.language %}<xhtml:link rel="alternate" hreflang="{{ url.language }}" href="{{ url.location }}"/>{% endif %}
            {% for item in url.alternates %}
                <xhtml:link rel="alternate" hreflang="{{ item.lang_code }}" href="{{ item.location }}"/>
            {% endfor %}
        </url>
    {% endfor %}
{% endspaceless %}
</urlset>

Here is the content of the my_project/templates/robots.txt template file:

User-Agent: *
Disallow: /admin/

# Sitemap files
Sitemap: {{ wagtail_site.root_url }}/sitemap.xml

Now the code.

Below is the content of the my_project/blog/views.py file.

It only queries live pages in the currently active language. For each page, we check whether there is a live translation in any of the other configured languages.

Make sure to set the values of changefreq and priority to suit your needs. The values in the script are provided as an example.

Note that the value of changefreq is only a general hint to search engines and does not necessarily reflect how often the page is actually crawled. Accepted values are:

  • always
  • hourly
  • daily
  • weekly
  • monthly
  • yearly
  • never

The value "always" should be used to describe documents that change with each access. The value "never" should be used to describe archived URLs.
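If you set these values from configuration rather than hard-coding them, a small check can catch typos early. This is a minimal sketch, not part of the original view: the helper name validate_sitemap_hints is hypothetical, and the 0.0 to 1.0 range for priority comes from the sitemaps.org protocol.

```python
# The changefreq values accepted by the sitemap protocol (listed above).
VALID_CHANGEFREQ = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def validate_sitemap_hints(changefreq, priority):
    """Return (changefreq, priority) unchanged if both are valid, else raise ValueError."""
    if changefreq not in VALID_CHANGEFREQ:
        raise ValueError(f"Unsupported changefreq: {changefreq!r}")
    priority = float(priority)
    # The protocol defines priority as a value between 0.0 and 1.0.
    if not 0.0 <= priority <= 1.0:
        raise ValueError(f"priority must be between 0.0 and 1.0, got {priority}")
    return changefreq, priority


changefreq, priority = validate_sitemap_hints("daily", 0.8)
```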

from django.conf import settings
from django.views.generic import TemplateView
# On Wagtail 3.0 and later, import these from wagtail.models instead.
from wagtail.core.models import Locale, Site

from .models import BlogPage as Article


class RobotsView(TemplateView):

    content_type = 'text/plain'
    template_name = 'robots.txt'

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        # Expose the current Wagtail site so the template can build the sitemap URL.
        context['wagtail_site'] = Site.find_for_request(self.request)
        return context


class SiteMap(TemplateView):

    content_type = 'application/xml'
    template_name = 'sitemap.xml'

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        # Example values; adjust them to suit your site.
        changefreq = 'daily'
        priority = 0.8
        context['urlset'] = []
        current_locale = Locale.get_active()
        current = current_locale.language_code

        # Only live pages in the currently active language.
        pages = Article.objects.filter(locale=current_locale).live()
        for page in pages:
            url_info = {
                # full_url is absolute; sitemap <loc> values must include the domain.
                'location': page.full_url,
                'lastmod': page.last_published_at,
                'changefreq': changefreq,
                'priority': priority,
                'alternates': [],
            }

            # Look for a live translation in each of the other configured languages.
            for lang, _name in settings.LANGUAGES:
                if lang == current:
                    continue
                # Skip languages that have no Locale record in the database yet.
                locale = Locale.objects.filter(language_code=lang).first()
                if locale is None or not page.has_translation(locale):
                    continue
                translated_page = page.get_translation(locale)
                if translated_page.live:
                    url_info['alternates'].append({
                        'location': translated_page.full_url,
                        'lang_code': lang,
                    })

            context['urlset'].append(url_info)

        return context
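The dictionary that the view builds for each page can also be sketched without Wagtail, which makes its shape easy to unit-test. The sketch below is a simplification under stated assumptions: build_url_info is a hypothetical helper, and locations_by_lang stands in for the result of the translation lookups (language code mapped to the absolute URL of the live page in that language).

```python
def build_url_info(locations_by_lang, current, changefreq="daily", priority=0.8, lastmod=None):
    """Build one sitemap entry from a {language code: absolute URL} mapping."""
    return {
        "location": locations_by_lang[current],
        "lastmod": lastmod,
        "changefreq": changefreq,
        "priority": priority,
        # Every language other than the current one becomes an alternate link.
        "alternates": [
            {"location": url, "lang_code": lang}
            for lang, url in locations_by_lang.items()
            if lang != current
        ],
    }


# Hypothetical URLs for one article that exists in English and French only.
entry = build_url_info(
    {
        "en": "https://example.com/en/blog/hello/",
        "fr": "https://example.com/fr/blog/bonjour/",
    },
    current="en",
)
```

The keys match the ones the sitemap.xml template reads: location, lastmod, changefreq, priority, and alternates with location and lang_code.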

Finally, to make it all work, edit the my_project/urls.py file and add the following:

from django.urls import path

from my_project.blog import views as blog

...

urlpatterns = [

    ...

    path('robots.txt', blog.RobotsView.as_view()),
    path('sitemap.xml', blog.SiteMap.as_view()),

    ....
]

As usual, since no two Wagtail implementations are alike, you will need to adjust blog, BlogPage and Article to match your exact configuration. I hope this works for you and saves you some time!

