Pre-Launch Screaming Frog Checks

Audience: Web Ops Team and Web Working Group Members.

What: A guide to how we crawl sites before a launch.

Screaming Frog is a crawling tool we use to look for broken links and other issues.

Before you go live, crawl the existing production site. This gives you a list of URLs that are likely to be indexed by Google and other search engines, which you may need in case you have to build a 301 redirect map now or later. In a spreadsheet you can swap the domain part of each URL, then crawl the staging site to check that your redirect map works (don’t forget to follow the 3xx redirects to make sure they actually return 200s rather than 404s).
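A minimal sketch of that redirect-map check is below. It assumes you have exported the production URL list to a plain text file (urls.txt, one URL per line) and that the production and staging domains are the hypothetical values shown; adjust all three to match your project.

```python
# check_redirect_map.py - a rough sketch, not the official process.
# Assumes urls.txt holds one production URL per line (exported from the
# Screaming Frog crawl). The domains below are hypothetical examples.
import requests

PROD_DOMAIN = "https://www.example.org"         # hypothetical production domain
STAGING_DOMAIN = "https://staging.example.org"  # hypothetical staging domain

with open("urls.txt") as f:
    prod_urls = [line.strip() for line in f if line.strip()]

for prod_url in prod_urls:
    # Swap the domain part, as you would in the spreadsheet.
    staging_url = prod_url.replace(PROD_DOMAIN, STAGING_DOMAIN)
    try:
        # allow_redirects=True follows the whole 3xx chain, so
        # resp.status_code is the status of the final destination.
        resp = requests.get(staging_url, allow_redirects=True, timeout=10)
    except requests.RequestException as exc:
        print(f"ERROR  {staging_url}  ({exc})")
        continue
    hops = len(resp.history)  # number of redirects followed
    flag = "OK" if resp.status_code == 200 else "CHECK"
    print(f"{flag}  {resp.status_code}  {hops} redirect(s)  {staging_url} -> {resp.url}")
```

Anything flagged CHECK (a 404, 500, or a chain that never reaches a 200) needs a redirect rule fixed before launch.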

  • Type in the URL you wish to crawl
  • If the site blocks crawlers via robots.txt (common on staging), make sure you tell the crawler to ignore the robots.txt file

We want to look for the following:

  • Protocol: we don’t want any internal HTTP links; they should all be HTTPS. This can usually be resolved with a find and replace (see the sketch after this list).
  • Response codes: 500 server errors. These may need resolving by a developer or sysadmin.
  • Response codes: 404 not found errors. We want to make sure the end users get a list of broken links; by looking at the “to” and “from” URLs these can be identified and resolved.
  • Response codes: 301 redirects. We want to reduce the number of unnecessary 301 redirects; for many of them this can be resolved with a find and replace. Crawl the redirect targets in the 3xx report to make sure they eventually return 200s.
  • External: we want to look for any strange outbound/external links.
  • External: staging URLs. When you go live, make sure the site is not using staging-site URLs; this can be resolved with a find and replace (see the sketch after this list).
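The protocol and staging-URL checks in the list above can be scripted against a Screaming Frog export. The sketch below assumes you have exported the internal URLs to internal_all.csv and that the export has an “Address” column; the filename, column name, and staging hostname are assumptions to adjust for your own export.

```python
# find_bad_links.py - a rough sketch for scanning a Screaming Frog export.
# Assumes internal_all.csv is the exported list of internal URLs with an
# "Address" column; adjust the filename and column name to match your
# export. The staging hostname is a hypothetical example.
import csv

STAGING_DOMAIN = "staging.example.org"  # hypothetical staging hostname

with open("internal_all.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        url = row.get("Address", "")
        if url.startswith("http://"):
            print(f"HTTP (not HTTPS): {url}")
        if STAGING_DOMAIN in url:
            print(f"Staging URL:      {url}")
```

Anything this prints is a candidate for a find and replace in the site content before launch.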