Crawling a page fails with a "forbidden" error (code 403)

This error can occur when you try to crawl a website that does not allow the HEAD method but responds with 403 (Forbidden) rather than 405 (Method Not Allowed).

While HTTP defines a number of methods, WebCopy uses only three of them: HEAD, GET and POST.

By default, WebCopy issues a HEAD request before crawling any URI. The response provides important information, such as the content type and length, before any content is actually downloaded. This speeds up crawls where you are excluding content types that belong to large binary files. However, if the web server doesn't support, or has disabled, the HEAD method, then any crawl of that server will fail.
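To illustrate why a header check can save time, here is a minimal Python sketch of the kind of decision a crawler could make from HEAD response headers alone. It is not WebCopy's actual implementation: the function name `should_download`, the plain-dictionary `headers` argument, and the `max_bytes` size limit are all assumptions made for this example.

```python
def should_download(headers, excluded_types, max_bytes=None):
    """Decide from HEAD response headers whether a full GET is worthwhile.

    Hypothetical helper for illustration only; `headers` is assumed to be a
    plain dict keyed by the exact header names "Content-Type" and
    "Content-Length".
    """
    # Content-Type may carry parameters, e.g. "text/html; charset=utf-8",
    # so keep only the media type before the first semicolon.
    content_type = headers.get("Content-Type", "").split(";")[0].strip().lower()
    if content_type in excluded_types:
        return False

    # Content-Length is optional; only apply the size limit when it is present
    # and numeric.
    length = headers.get("Content-Length")
    if max_bytes is not None and length is not None and length.isdigit():
        return int(length) <= max_bytes

    return True
```

For example, with `excluded_types={"application/zip"}`, a URI whose headers report `application/zip` would be skipped without transferring the file itself. When the server rejects HEAD, no such headers are available and this shortcut is lost.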

If this happens, you need to stop WebCopy from using the HEAD method. To do this, display the Project Properties dialog, select the Advanced section, then uncheck Use Header Checking. Click OK to save your changes and close the dialog, then retry the crawl.

How can I check up front if HEAD is supported?

You can use the Test URI feature of WebCopy to determine if the URI you want to crawl supports the HEAD method. Simply click Test URI from the toolbar, enter the URI of the site to test, and click Test. WebCopy will try to retrieve the headers and will notify you of any problems. You can then use the same window to switch to GET and see if that works.
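If you would rather check from outside WebCopy, the same comparison can be approximated with a short Python script that uses only the standard library. This is a rough sketch, not a reproduction of the Test URI feature: the function names `probe` and `head_is_rejected` are made up for this example, and the URL you pass in is whatever site you intend to crawl.

```python
import urllib.error
import urllib.request


def probe(url, method, timeout=10.0):
    """Send one request with the given HTTP method and return the status code."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for 4xx/5xx responses; the status code is
        # still available on the exception.
        return error.code


def head_is_rejected(url):
    """Return True when HEAD fails with 403 or 405 but GET succeeds."""
    head_status = probe(url, "HEAD")
    if head_status not in (403, 405):
        return False
    # HEAD was refused; confirm that a normal GET for the same URL works.
    return 200 <= probe(url, "GET") < 300
```

If `head_is_rejected` returns True for your start URI, the server is most likely refusing HEAD requests specifically, which is the situation where unchecking Use Header Checking should help. Connection problems such as DNS failures or timeouts are not handled here and will raise an exception.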


Comments


Gary Harding


I just found WebCopy after an alternative product failed. I needed to 'disable header checking', but it then worked! Well done on both the product and the help information!