Robots.txt — Controlling Crawling on Your PrestaShop Store

The robots.txt file is the first line of communication between your online store and search engine crawlers. Correctly configured, it protects your crawl budget, keeps crawlers away from duplicate URLs, and guides Googlebot toward your most strategically important pages. Misconfigured, it can stop crawlers from reaching your product pages and cause your search visibility to drop sharply.

What is the robots.txt file?

The robots.txt file is a plain text file placed at the root of your domain (https://your-store.com/robots.txt) that tells crawlers which parts of your site they are allowed or not allowed to explore. It follows the Robots Exclusion Protocol (REP), respected by Google, Bing, and virtually all legitimate crawlers.

Crucially, robots.txt controls crawling, not indexing. A page blocked in robots.txt can still appear in Google if it receives inbound links from other sites. To prevent indexing, use a noindex meta tag, and leave the page crawlable so that Google can actually read the tag.

Robots.txt directive syntax

A robots.txt file is made up of blocks called records, each targeting one or more user agents (robots). The key directives are:

User-agent: Identifies the targeted robot. User-agent: * applies to all robots; User-agent: Googlebot targets Google specifically.
Disallow: Forbids crawling a path. Disallow: /admin/ blocks the entire admin directory.
Allow: Explicitly permits a path even within a blocked directory. Useful for allowing CSS or JS files inside an otherwise restricted folder.
Sitemap: Points to the location of your XML sitemap. Sitemap: https://your-store.com/sitemap.xml — this is a discovery hint, not a crawl directive.
Crawl-delay: Suggests a delay between crawler requests. Ignored by Googlebot but respected by Bing and others.
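
To see how these directives are read in practice, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content and the URLs are illustrative assumptions, not a recommended configuration. Note that this parser does simple prefix matching and, as far as we know, does not interpret * wildcards inside paths, so it only approximates how Googlebot evaluates rules.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content (an assumption for this sketch).
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 10
Sitemap: https://your-store.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Disallow: is the admin directory open to a given crawler?
admin_allowed = parser.can_fetch("Googlebot", "https://your-store.com/admin/index.php")

# A path with no matching rule is allowed by default.
category_allowed = parser.can_fetch("Googlebot", "https://your-store.com/3-clothes")

# Crawl-delay: the value declared for the group that applies to this agent.
delay = parser.crawl_delay("bingbot")

# Sitemap: the list of declared sitemap URLs (Python 3.8 and later).
sitemaps = parser.site_maps()

print(admin_allowed, category_allowed, delay, sitemaps)
```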

Robots.txt for PrestaShop: the essential rules

PrestaShop generates many technical URLs that should never be indexed. A well-configured robots.txt for PrestaShop must cover the following cases:

Directories to block

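Block /admin/, /modules/, /cache/, /install/, /config/ and /tools/. These technical directories have no SEO value and unnecessarily consume your crawl budget. If you block /modules/, add Allow rules for the CSS and JavaScript files that front-office modules load, so that Google can still render your pages.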

Faceted navigation

URLs generated by product filters (color, size, brand) create thousands of duplicates. Block filtering parameters with targeted Disallow rules.
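
A wildcard rule such as /*?order= only matches when the parameter comes first in the query string, so each filter parameter generally needs a second rule for the & position. The sketch below generates both rules from a list of parameter names. The function name and the parameter names q and order are assumptions for illustration; check which parameters your theme and filter module actually put in URLs.

```python
def filter_disallow_rules(param_names):
    """Build Disallow rules covering a parameter in first (?) and later (&) position."""
    rules = []
    for name in param_names:
        rules.append(f"Disallow: /*?{name}=")
        rules.append(f"Disallow: /*&{name}=")
    return rules

# Hypothetical filter parameters; replace them with the ones your store uses.
for rule in filter_disallow_rules(["q", "order"]):
    print(rule)
```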

Transactional pages

Block /cart, /order, /login, /my-account and their variants. These pages require an active session and have no value for search engines.

Example PrestaShop robots.txt

User-agent: *
Disallow: /admin/
Disallow: /modules/
Disallow: /cache/
Disallow: /install/
Disallow: /config/
Disallow: /tools/
Disallow: /cart
Disallow: /order
Disallow: /login
Disallow: /my-account
Disallow: /*?order=
Disallow: /*?id_currency=
Allow: /modules/lexiikbridge/
Sitemap: https://your-store.com/sitemap.xml
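
To reason about how the rules above interact, the following sketch applies the matching logic described in RFC 9309 and in Google's documentation: * matches any sequence of characters, a trailing $ anchors the end of the URL, the longest matching rule wins, and Allow wins a tie. It is a simplified model for checking a rule set by hand, not Googlebot's actual implementation. The helper names and test paths are assumptions.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal parts and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules` (longest match wins)."""
    best = None  # (pattern length, is_allow); Allow wins ties because True > False
    for directive, pattern in rules:
        if pattern and pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

# A subset of the rules from the example above.
RULES = [
    ("Disallow", "/modules/"),
    ("Disallow", "/cart"),
    ("Disallow", "/*?order="),
    ("Allow", "/modules/lexiikbridge/"),
]

print(is_allowed(RULES, "/modules/lexiikbridge/front.js"))      # longer Allow rule applies
print(is_allowed(RULES, "/modules/other/front.js"))             # only Disallow: /modules/ applies
print(is_allowed(RULES, "/3-clothes?order=price.asc"))          # order is the first parameter
print(is_allowed(RULES, "/3-clothes?page=2&order=price.asc"))   # order follows '&'
```

In this model the last path is not blocked, because /*?order= only matches when order directly follows the question mark.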

Critical mistakes to avoid

Certain robots.txt mistakes can have catastrophic consequences for your search rankings:

Blocking CSS and JavaScript files: Google renders pages much as a browser does and needs your CSS and JavaScript to see the layout and content as visitors do. Blocking these files can lead Google to misjudge a page's content and mobile usability.
Disallow: / (entire root): A classic leftover from development or staging. This rule blocks crawling of the entire store. Always check for it before going live.
Blocking product images: If your /img/ directory is blocked, Google Images cannot index your photos, cutting off a significant source of traffic.
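Forgetting protocol and host variants: A robots.txt file applies only to the protocol and host it is served from. Rules at http://your-store.com/robots.txt do not cover https://your-store.com/, and each subdomain needs its own file. Make sure the HTTPS version of your store serves the file you intend.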
Combining a robots.txt block with a canonical tag: If a page is blocked by robots.txt, Google cannot crawl it and therefore never reads its canonical tag, so the canonical has no effect.
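
Some of these mistakes can be caught mechanically before a file goes live. The sketch below scans robots.txt text for a few of the risky patterns listed above. The function name, the warning messages, and the specific checks are assumptions chosen for illustration, and the check is deliberately naive: it ignores which User-agent group a rule belongs to.

```python
def find_risky_rules(robots_txt):
    """Return (line_number, warning) pairs for Disallow rules that look dangerous."""
    warnings = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() != "disallow":
            continue
        if value in ("/", "/*"):
            warnings.append((number, "blocks the entire site"))
        elif value.startswith("/img"):
            warnings.append((number, "blocks product images"))
        elif value.endswith((".css", ".js", ".css$", ".js$")):
            warnings.append((number, "blocks CSS or JavaScript files"))
    return warnings

# Illustrative input containing three of the mistakes described above.
SAMPLE = """\
User-agent: *
Disallow: /
Disallow: /img/
Disallow: /*.css$
Disallow: /cart
"""

for number, warning in find_risky_rules(SAMPLE):
    print(f"line {number}: {warning}")
```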

Robots.txt does not guarantee non-indexation

A page blocked in robots.txt can still appear in Google search results if it receives inbound links. Google will typically display the URL without a description. To reliably exclude a page, use the <meta name="robots" content="noindex"> tag or the X-Robots-Tag HTTP header, and do not block that page in robots.txt, since Google must crawl it to see the directive.
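
As a rough way to verify that a page really carries a noindex signal, the sketch below inspects a response's headers and HTML for either mechanism. It assumes you have already fetched the page and hold its headers as a dictionary and its body as a string. The function and class names are hypothetical, and the check ignores crawler-specific forms such as an X-Robots-Tag value prefixed with a user agent name.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]

def has_noindex(headers, html):
    """Return True if the headers or the HTML carry a noindex directive."""
    header_directives = []
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_directives += [d.strip().lower() for d in value.split(",")]
    meta_parser = RobotsMetaParser()
    meta_parser.feed(html)
    return "noindex" in header_directives or "noindex" in meta_parser.directives

# Illustrative inputs: one page excluded by meta tag, one by HTTP header.
print(has_noindex({}, '<head><meta name="robots" content="noindex, follow"></head>'))
print(has_noindex({"X-Robots-Tag": "noindex"}, "<html></html>"))
```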

Testing and validating your robots.txt

Google Search Console provides a robots.txt report (under Settings) that shows which robots.txt files Google found for your site, when each was last crawled, and any warnings or errors encountered while parsing them. Check it after every modification to your robots.txt.

The standalone robots.txt tester that Search Console used to offer has been retired. To check a specific page, use the URL Inspection tool, which indicates whether the page is blocked by robots.txt.
