The Sitemap protocol requires certain tags to be present in your XML file for it to be recognized properly. The sitemap file should also be UTF-8 encoded. This section of the tutorial explains how this works.
Basic Sitemap example
Here is a very basic example of a sitemap file that lists a single URL:
<?xml version="1.0" encoding="UTF-8"?>
You can check the next section of this tutorial for a more complicated Sitemap example.
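As a rough sketch of what a complete single-URL sitemap contains, the Python snippet below builds one with the standard library and prints it. The function name build_single_url_sitemap, the example.com address, and the date are placeholders chosen for illustration; the namespace value is the one published by sitemaps.org.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_single_url_sitemap(loc, lastmod, changefreq, priority):
    """Return a one-entry sitemap as UTF-8 encoded bytes (illustrative helper)."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = loc
    ET.SubElement(url, "lastmod").text = lastmod
    ET.SubElement(url, "changefreq").text = changefreq
    ET.SubElement(url, "priority").text = priority
    # xml_declaration=True emits the <?xml ...?> header (Python 3.8+).
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


# Placeholder values, not taken from a real site.
sitemap_bytes = build_single_url_sitemap(
    "http://www.example.com/", "2024-01-15", "monthly", "0.8"
)
print(sitemap_bytes.decode("utf-8"))
```

The printed document has a <urlset> root holding one <url> entry with its <loc>, <lastmod>, <changefreq> and <priority> children, which are the tags discussed in this section.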
XML tags in the Sitemap file
Below we will review the tags of the sitemap file one by one:
- Every Sitemap XML file must begin with an opening <urlset> tag and end with a closing </urlset> tag.
- Every "parent" entry should begin with a <url> tag and end with a </url> tag.
- Each <url> entry contains "child" tags. The required one is <loc>: the address of the page is placed between <loc> and </loc>.
- The <loc> tag expects a URL that begins with the protocol, such as "http://" or "https://".
The URL must be shorter than 2048 characters.
- The <lastmod> tag expects the date the page was last modified, in the format YYYY-MM-DD.
This tag is optional, and you do not have to update it every time you modify the document. Search engines read the dates of the documents when they crawl them.
- The <changefreq> tag is a hint for crawlers: it indicates how often the page is modified and therefore how often it should be crawled.
Note that this value may or may not affect crawler behavior, which depends solely on the search engine.
The <changefreq> tag expects one of the following values: always, hourly, daily, weekly, monthly, yearly, never.
Use "always" for pages that are dynamically generated or that change on every access. As for "never": even if you mark a page with this value, crawlers will most probably still visit it from time to time, for example once a week.
- The <priority> value can vary from 0.0 to 1.0.
It only indicates your own preference for how you would like your website to be crawled.
The default value for a page without an explicit priority is 0.5. A page with a higher value is suggested to be crawled before a page with priority 0.5, and pages with lower values after it.
Because priority is relative, it applies only within your own website. Setting a high priority on all of your pages does not mean they will be crawled more often, because the value is not used to compare different websites.
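The rules in the list above can be checked before a URL entry is written to the file. The following is a minimal sketch, not an official validator: check_entry is a hypothetical helper name, and two choices are assumptions of this sketch that follow the wording on sitemaps.org (it accepts "https://" as well as "http://", and it rejects URLs of 2048 characters or more).

```python
import re
from datetime import datetime
from urllib.parse import urlparse

ALLOWED_CHANGEFREQ = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}
DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Return a list of problems; an empty list means the entry looks fine."""
    problems = []

    # <loc> must start with the protocol and stay under the length limit.
    if urlparse(loc).scheme not in ("http", "https"):
        problems.append("loc must start with http:// or https://")
    if len(loc) >= 2048:
        problems.append("loc must be shorter than 2048 characters")

    # <lastmod> must have the YYYY-MM-DD shape and be a real calendar date.
    if lastmod is not None:
        if not DATE_SHAPE.fullmatch(lastmod):
            problems.append("lastmod must use the format YYYY-MM-DD")
        else:
            try:
                datetime.strptime(lastmod, "%Y-%m-%d")
            except ValueError:
                problems.append("lastmod is not a valid calendar date")

    # <changefreq> accepts only the seven listed keywords.
    if changefreq is not None and changefreq not in ALLOWED_CHANGEFREQ:
        problems.append("changefreq must be one of the allowed values")

    # <priority> must lie between 0.0 and 1.0 inclusive.
    if priority is not None and not 0.0 <= priority <= 1.0:
        problems.append("priority must be between 0.0 and 1.0")

    return problems


print(check_entry("http://www.example.com/", "2024-01-15", "weekly", 0.8))
print(check_entry("www.example.com/page", "15/01/2024", "sometimes", 1.5))
```

The first call describes a well-formed entry, while the second breaks four of the rules at once. Passing these checks says nothing about how a search engine will treat the page, since <changefreq> and <priority> remain hints.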
Special characters in the Sitemap file
As mentioned before, your sitemap should be UTF-8 encoded. You can choose this encoding when you save the sitemap file; almost all text editors support saving in UTF-8.
All data in the Sitemap must use entity escape codes for the characters listed below:
Character          Escape Code
Ampersand      &   &amp;
Single Quote   '   &apos;
Double Quote   "   &quot;
Greater Than   >   &gt;
Less Than      <   &lt;
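If the sitemap is generated by a script, the escaping can be done in code instead of by hand. The sketch below uses escape from Python's xml.sax.saxutils module, which replaces the ampersand and the two angle brackets by default; the two quote characters are supplied as extra replacements. The helper name escape_for_sitemap and the sample URL are made up for illustration.

```python
from xml.sax.saxutils import escape

# escape() already handles &, < and >; the quotes are added here.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_for_sitemap(value):
    """Replace the five special characters with their entity escape codes."""
    return escape(value, QUOTE_ENTITIES)


# Hypothetical URL whose query string contains an ampersand and quotes.
print(escape_for_sitemap("http://www.example.com/view?item=12&color='red'"))
# prints: http://www.example.com/view?item=12&amp;color=&apos;red&apos;
```

Apply the function once, to the raw value. Running it over text that is already escaped would turn &amp; into &amp;amp;. XML libraries that build the document for you usually perform this step themselves when they serialize it.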
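Don't forget that an uncompressed sitemap file cannot be larger than 50 MB (52,428,800 bytes) or list more than 50,000 URLs. Earlier versions of the protocol capped the size at 10 MB.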
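A size check is easy to add to a script that generates the file. The sketch below measures the UTF-8 encoded size, since the limit is counted in bytes and not in characters. The default of 50 MB is an assumption based on the current figure on sitemaps.org (older versions of the protocol used 10 MB), so the limit is left as a parameter; sitemap_size_ok is a hypothetical helper name.

```python
# Assumed limit: 50 MB = 52,428,800 bytes. Adjust if a different limit applies.
MAX_SITEMAP_BYTES = 50 * 1024 * 1024


def sitemap_size_ok(xml_text, max_bytes=MAX_SITEMAP_BYTES):
    """Return True if the UTF-8 encoded text fits within max_bytes."""
    return len(xml_text.encode("utf-8")) <= max_bytes


# Non-ASCII characters take more than one byte each in UTF-8,
# so the byte count can exceed the character count.
sample = "é" * 10
print(len(sample), len(sample.encode("utf-8")))
print(sitemap_size_ok(sample))
```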