The third tutorial covers rules. Rules allow you to configure how the web site is downloaded.

This tutorial assumes you have followed the steps in the first tutorial.

Adding a new rule

  1. Select Rules from the Project menu, or press Control+R to display the rule editor.
  2. Click the Add button to add a new blank rule and select it for editing.
  3. In the Expression field, enter \.gif. This field allows you to enter regular expressions that will be matched against each crawled URI.
  4. The new rule defaults to having the Excluded flag set, meaning any URI matching the expression will be excluded from the crawl. Changing these options enables more powerful crawl behaviour, for example downloading only images.
  5. Click OK to save the rule and close the editor.
  6. Press Shift+F5 to copy the project.

When the copy has finished, the Skipped table will show that all URLs containing .gif were skipped. A yellow icon indicates that the file was skipped due to a rule.
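
To see why every URL containing .gif is caught, the rule can be tried outside WebCopy. The sketch below uses Python's re module as a stand-in for WebCopy's own regular expression engine, which is an assumption made for illustration; for an expression this simple the two should behave alike. The URLs are made-up examples.

```python
import re

# The expression entered in step 3. It is unanchored, so it matches
# wherever ".gif" appears in the URI.
rule = re.compile(r"\.gif")

# Hypothetical URLs, used only for illustration.
urls = [
    "http://somewhere.com/test.gif",
    "http://somewhere.com/images/logo.gif?size=2",
    "http://somewhere.com/archive.gif.html",
    "http://somewhere.com/index.html",
]

# A URL is treated as skipped when the expression matches anywhere in it.
skipped = [url for url in urls if rule.search(url)]

for url in skipped:
    print("skipped:", url)
```

Note that archive.gif.html is also caught, even though it is not a GIF image.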

Editing a rule

  1. Select Rules from the Project menu, or press Control+R to display the rule editor.
  2. Select the rule from the list.
  3. Enter \.gif(?:$|#|\?) into the Expression field.
  4. Click OK to save the rule and close the editor.

If you copy the website now, you'll get the same results as before. However, the rule is now a little more robust. Instead of blindly ignoring any URL containing .gif, it will only ignore a URL which:

  • ends in .gif (http://somewhere.com/test.gif)
  • has .gif before the fragment (http://somewhere.com/test.gif#bookmark)
  • has .gif before the query string (http://somewhere.com/test.gif?value1=a)
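
The difference between the two expressions can be checked with a short script. This is a minimal sketch that again assumes Python's re module behaves like WebCopy's engine for these patterns; the last URL is a hypothetical case added to show where the two rules disagree.

```python
import re

loose = re.compile(r"\.gif")               # original rule
strict = re.compile(r"\.gif(?:$|#|\?)")    # edited rule

urls = [
    "http://somewhere.com/test.gif",
    "http://somewhere.com/test.gif#bookmark",
    "http://somewhere.com/test.gif?value1=a",
    "http://somewhere.com/test.gif.html",   # hypothetical: ".gif" mid-name
]

# Map each URL to (matched by original rule, matched by edited rule).
results = {
    url: (bool(loose.search(url)), bool(strict.search(url)))
    for url in urls
}

for url, (by_loose, by_strict) in results.items():
    print(f"{url}: original={by_loose}, edited={by_strict}")
```

Only the last URL is treated differently: the original rule matches it, while the edited rule does not, because .gif is followed by further text rather than the end of the URL, a fragment, or a query string.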

By entering regular expressions as rules, you have powerful control over what content is downloaded and what content is skipped. WebCopy includes a regular expression editor to help build and test rule expressions.

For another example of how to use rules to control the crawl, see the how to only copy images example topic.
