Benchmarking

Why might a site not be checked successfully?

Crawling a site can fail for multiple reasons.

  • The site cannot be found. It may have been moved to a new URL, or it is temporarily or permanently unavailable.
  • The site does not respond, responds with an HTTP 403 error, or times out. The web server may be blocking the crawler from accessing the site, or there may be routing issues.
  • The site is a placeholder, or it does not have more than one page.
  • The site may require JavaScript to function correctly. This includes single-page applications and sites behind some forms of DDoS protection.
  • The site may require cookies to function correctly.
  • The crawler collected invalid URLs from the site.
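
As a rough illustration of how some of the crawl failures above could be told apart, here is a minimal Python sketch. It is not the crawler's actual code: the function names, parameters, and reason strings are hypothetical, and it covers only the cases that can be read off a single response (timeout, no response, HTTP 403, a site with at most one page) plus a basic validity test for collected URLs.

```python
from typing import Optional
from urllib.parse import urlsplit


def is_valid_http_url(url: str) -> bool:
    """Return True if url parses as an absolute http(s) URL with a host."""
    try:
        parts = urlsplit(url)
        return parts.scheme in ("http", "https") and bool(parts.hostname)
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. an unclosed "[" in the host.
        return False


def classify_crawl_outcome(
    status: Optional[int], timed_out: bool, page_count: int
) -> Optional[str]:
    """Map one crawl attempt to a failure reason, or None if none applies.

    All three parameters are assumptions made for this sketch:
    status is the HTTP status code (None if nothing was received),
    timed_out says whether the request hit its time limit,
    page_count is the number of distinct pages discovered.
    """
    if timed_out:
        return "timeout"
    if status is None:
        return "no response"
    if status == 403:
        return "blocked (HTTP 403)"
    if page_count <= 1:
        return "placeholder or single page"
    return None
```

Failures caused by JavaScript or cookie requirements are not covered here, since detecting them generally needs more than a status code.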

Checking the pages on a site may also fail for multiple reasons.

  • Too many pages caused errors in the checker (we require that at least 90% of the URLs are checkable).
  • Documents served with the wrong MIME type.
  • Broken links and invalid URLs.
  • Connection issues.
  • Pages that cause memory issues or otherwise crash the checker.
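
The 90% requirement mentioned above can be sketched as a simple ratio test. This is an illustrative Python example, not the checker's real implementation: the function name and parameters are hypothetical, and it assumes the threshold is inclusive (exactly 90% passes).

```python
def meets_checkable_threshold(
    total_urls: int, failed_urls: int, min_percent: int = 90
) -> bool:
    """Return True if at least min_percent of the URLs could be checked.

    Integer arithmetic is used so that a site sitting exactly on the
    threshold is not rejected because of floating-point rounding.
    """
    if total_urls <= 0:
        # Nothing to check, so the site cannot meet the requirement.
        return False
    checkable = total_urls - failed_urls
    return checkable * 100 >= total_urls * min_percent
```

With 100 URLs, 10 failures still pass under these assumptions, while 11 failures do not.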