Cyotek WebCopy Revision History
Copy websites locally for offline browsing
Due to changes in how WebCopy determines whether or not to process a given URL, WebCopy 1.9 may behave differently from previous versions. Please report any inconsistencies to us!
Added
- Added the ability to read cookies from an external file (#461) (User Manual)
- Added the ability to read cookies from an external file (#462) (User Manual)
- Test URL dialogue now allows configuring cookies
- Added `cookie`, `cookie-jar` and `discard-session-cookies` command line parameters (User Manual)
- Added support for the legacy `compress` (#472) and non-standard `BZip2` (#473) content encodings (User Manual)
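Content encodings are undone in the reverse of the order listed in the `Content-Encoding` header. An unrecognised value should leave the body untouched, which matches the "downloaded as-is" behaviour described under Fixed (#471). The following is a minimal Python sketch of that idea, not WebCopy's implementation. The function name `decode_body` is hypothetical, and so is treating the token `bzip2` as a plain bzip2 stream. The LZW-based `compress` encoding is deliberately omitted because Python's standard library has no decoder for it.

```python
import bz2
import gzip
import zlib
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Hypothetical helper: decode an HTTP body per its Content-Encoding.

    If any listed encoding is unsupported, the original body is returned
    unchanged so the file can be saved exactly as the server sent it.
    """
    if not content_encoding:
        return body
    original = body
    tokens = [t.strip().lower() for t in content_encoding.split(",") if t.strip()]
    # Encodings are listed in the order they were applied, so undo them in reverse.
    for token in reversed(tokens):
        if token in ("gzip", "x-gzip"):
            body = gzip.decompress(body)
        elif token == "deflate":
            # "deflate" should be zlib-wrapped, but some servers send a raw
            # deflate stream, so retry with negative wbits (raw mode).
            try:
                body = zlib.decompress(body)
            except zlib.error:
                body = zlib.decompress(body, -zlib.MAX_WBITS)
        elif token == "bzip2":
            # Non-standard token; assumed here to be a plain bzip2 stream.
            body = bz2.decompress(body)
        elif token == "identity":
            continue
        else:
            # Unsupported in this sketch (e.g. "compress", "br"): keep as-is.
            return original
    return body
```

Returning the untouched original when any step is unknown avoids saving a half-decoded file.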
Changed
- Documentation improvements (#400, #443, #461, #462, #482, untracked)
- Test URL dialogue now uses load on demand for settings pages
- 401 challenges no longer display credential dialogues unless the authentication type is either `Basic` or `Digest`, as no other values have been tested due to lack of resources
- Updated mime database
Fixed
- Posting a form did not set an appropriate content type (#437)
- Custom headers were not applied when posting forms (#436)
- If a URL was previously skipped but then included in future scans, the original skip reason could be retained
- A blank error message was displayed for Brotli decompression errors (#447)
- One-time project validation checks were ignoring the content encoding settings of the project (which by default is Gzip and Deflate) and were requesting content with Brotli compression (#446)
- Brotli decompression could fail with streams larger than 65535 bytes (#448)
- The URI transformation service incorrectly attempted to add prefixes to email addresses, which in turn caused a crash if the `mailto:` reference was malformed (#450)
- A crash could occur if a content type header was malformed and was either `utf` or `utf-` (#450)
- Fixed an issue where command line arguments that could be either a file name or an unqualified URI were sometimes processed incorrectly (#441). As a result of this fix, all URIs provided on the command line must be fully qualified, e.g. `https://example.com` rather than `example.com`
- Fixed a crash that could occur when switching between empty virtual list views during a crawl and items were then subsequently added (#460)
- A crash which could occur when loading localised text is no longer fatal (#465). Note that we haven't been able to reproduce this, so if you previously received a crash after setting a language other than English, please email support@cyotek.com
- Speed and estimated download time calculations were incorrect and could cause a crash when downloading large files (#466)
- A crash would occur when editing a file that didn't have a mime type (#468)
- Speculative fix for a crash that could occur when finishing the New Project Wizard (#467)
- Fixed a crash that occurred if a 401 challenge was received and the `www-authenticate` header was a bare type (#469)
- If a website returns a non-standard `Content-Encoding` value (or one currently not supported by WebCopy), no attempt will be made to decompress the file and it will be downloaded as-is. A new setting has been added to disable this behaviour, but it is currently not exposed (#471)
- Crashes that occurred when applying project validation corrections (for example, if the base URL redirects, WebCopy will prompt to use the redirected version) were fatal (#474)
- Trying to save a CSV export with a relative filename crashed (#475)
- The quick scan diagram view could crash if invalid host names were detected (#476). This is another bug reported without context; if you have previously experienced this, please email support@cyotek.com
- The "Limit distance from base URL" setting now only applies to URLs that have a content type of `text/html`, meaning it prevents deep scanning while still allowing retrieval of all linked resources (#464)
- URLs that had exclusion rules could still be requested, depending on the combination of project settings (#481)
- The CLI would crash if the `recursive` and `output` parameters were defined and the specified output directory did not exist (#483)
- The client is no longer marked as DPI-aware, which should resolve nearly all of the problems with the application not displaying correctly on high DPI screens. This is an interim fix until DPI awareness can be properly introduced
- Fixed a crash that could occur when querying whether the scan above root setting should be enabled and the project URI was invalid (#493)
- Fixed a crash that could occur when the scan/download progress dialog was closed (#454)
- The Export CSV dialog wasn't localised correctly, resulting in seemingly two Cancel buttons (#495)
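The `utf` / `utf-` crash fixed above (#450) is a reminder that a charset taken from a `Content-Type` header is untrusted input. Below is a minimal Python sketch of defensive charset extraction. It is not WebCopy's code, and `charset_from_content_type` is a hypothetical name. It validates the name with `codecs.lookup` and returns `None` instead of raising, so a caller could fall back to encoding detection.

```python
import codecs
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Hypothetical helper: return a usable codec name, or None.

    A missing, empty or unrecognised charset never raises; the caller
    is expected to fall back to some other detection strategy.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    charset = msg.get_content_charset()  # lower-cased value, or None if absent
    if not charset:
        return None
    try:
        # codecs.lookup validates the name and gives its canonical form.
        return codecs.lookup(charset).name
    except LookupError:
        return None
```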
Removed
- The PDF meta data provider has been removed
Due to changes in how WebCopy determines whether or not to process a given URL, WebCopy 1.9 may behave differently from previous versions. Please report any inconsistencies to us!
Added
- It is now possible to read additional URLs to scan from a text file [#281] (User Manual)
- Added `no-directories`, `max-redirect` and `header` arguments (User Manual)
- Added `proxy`, `proxy-user` and `proxy-password` arguments [#337] (User Manual)
- Added `input-file` argument [#282] (User Manual)
- Added Redirects To column to the results list
- Added Local File column to the files list
- Added Local File, Redirects To, Depth and Distance columns to links lists
- List views now display a configuration menu when context clicking a column header
- The GUI client now supports many of the same command line arguments as the CLI [#403] (User Manual)
- Added a new extension remap mode, Only HTML [#365]. This option changes the extensions of downloaded files only if the content type is `text/html`; all other files are left as-is (User Manual). This setting is now also the default for new WebCopy projects
- Added a new validator to try to detect unsupported websites [#407]
- Added new URL normalisation options for forcing HTTPS [#383] (User Manual)
- Added new URL normalisation option for ignoring case [#202] (User Manual)
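Two of the new URL normalisation options, forcing HTTPS and ignoring case, can be pictured as a rewrite applied before URLs are compared. The following Python sketch illustrates this under stated assumptions and does not reproduce WebCopy's actual rules. The function `normalise_url` and its flags are hypothetical. It assumes that lower-casing the scheme and host is always safe and that URLs carry no credentials, and it only drops an explicit port when that port is 80.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_url(url: str, force_https: bool = False, ignore_case: bool = False) -> str:
    """Hypothetical sketch of two URL normalisation options."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # Scheme and host are case-insensitive; assumes no user:password in the URL.
    netloc = parts.netloc.lower()
    if force_https and scheme == "http":
        scheme = "https"
        # An explicit default HTTP port would be wrong once the scheme is HTTPS.
        if parts.port == 80:
            netloc = netloc.rsplit(":", 1)[0]
    path, query = parts.path, parts.query
    if ignore_case:
        # Only appropriate when the server treats paths case-insensitively.
        path, query = path.lower(), query.lower()
    return urlunsplit((scheme, netloc, path, query, parts.fragment))
```

Lower-casing the path is opt-in because most web servers treat paths as case-sensitive. This is consistent with WebCopy no longer treating URLs as case-insensitive by default.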
Changed
- Setup will now install Microsoft .NET 4.8 if not present
- Adding multiple URLs to scan is now easier using a free text field [#282]
- Command line tools now report unknown parameters
- Major reworking of internal decision making logic [#242]
- New WebCopy projects will default to saving headers
- The sitemap tree now limits the number of child elements to a maximum of 100 by default [#402]. This setting can be changed in the application options
- Documentation updates
- Rule Tester dialogue now includes rule components
- Reworked setting validation
- Rule expressions are now validated before crawling a site
- WebCopy no longer treats URLs as case-insensitive for new projects
- The project URL can now be set via the Quick Scan dialogue
Removed
- The Link Checker (GUI and CLI), URI Tester and XPath Tester tools have been removed from distribution due to lack of use
Fixed
- Only the last argument error was displayed when running command line tools
- WebCopy will now retry URLs that fail with "The server committed a protocol violation" exceptions
- If using the default user agent, WebCopy will now try a default browser agent if a 401 response is returned when validating the URL [#382]
- When issuing a 401 challenge dialogue, WebCopy could include additional header information in the description
- The Move Down button was incorrectly enabled when adding a new password entry, causing a crash if clicked [#394]
- Fixed a pair of conditions that could cause site map generation to nest the same tree until it crashes [#391]. This should also resolve a different crash that could occur generating a site diagram [#397]
- Cookie editor now does a better job of validating entered values
- Invalid cookies should no longer cause a crash [#396]
- WebCopy would sometimes remove file extensions that weren't really extensions [#327]
- Various performance improvements, both major and minor [#399, #404]
- Last modified date is now read from meta tags if available [#405]
- Cancelling a crawl should now abort any in-progress downloads
- Fixed an issue where reading from a hybrid stream returned null bytes up to the stream capacity after exhausting existing data
- The rule editor could allow you to select conflicting options
- A crash occurred accessing the Quick Scan dialog if the project URL wasn't set (regression) [#430]
- A crash occurred using the Quick Scan dialog if there was a problem with the crawl and the Use Browser Option was enabled [#431]
- A crash occurred accessing the Quick Scan dialog with an invalid project URL [#434]
- Fixed a crash that could occur when trying to use an invalid output path [#435]
Warning! WebCopy projects saved using 1.8 are not backwards-compatible with older versions of WebCopy
Added
- Separate 32bit and 64bit setup files are now available
Changed
- Reorganised columns in results list view to hopefully reduce confusion when URLs are skipped
- Non-crawlable URLs that are skipped during an analyse are now recorded in the results list
Fixed
- WebCopy would not copy sites using an IP address rather than a DNS name
- URL validation checks were running on the base domain and ignoring any deep linking [#385]
- WebCopy wouldn't always strip empty path segments from URLs [#358]
- PDF and RSS will no longer be downloaded when performing a site analysis
- Improvements to several windows affected when running under custom DPI scaling modes [#241]
- The minimum / maximum file size editor required values to be entered as bytes, despite the labelling requesting kibibytes [#387]
- Sometimes WebCopy didn't shrink a path correctly to fit within file system limitations [#393]
- Default documents should no longer be named `index.htm.html`
Changed
- When crawling for the first time, if the user-entered URL redirects (for example http to https or to www), WebCopy will now prompt to use the final URL [#368]
Removed
- Aero glass effects used by Windows 7 have been removed [#366]
Fixed
- Changing the URL of an existing project caused further crawling to instantly fail if the domain of the new URL did not match the old [#367] (regression in 1.8.1)
- Added a speculative fix for a hang when crawling a website with the experimental "use web browser" option enabled
- The GUI client now correctly allows unescaped URLs to be entered
- When using the "Use query string in local file names" option, WebCopy now correctly sanitises the query string
- Fixed a crash that occurred when trying to empty the destination folder using the 64bit version of WebCopy
Added
- Link Checker GUI client now allows the checking of external links to be enabled or disabled
- Link Checker GUI client now allows you to choose whether URLs belonging to parent, sibling or sub domains should be checked
- Added auto scroll option to Link Checker GUI client
- Added progress indicator to Link Checker GUI client
- Added new Use Recycle Bin option to project settings. If set and the Empty website folder before copy is also set, any deleted files will be moved to the Recycle Bin instead
- The View Links dialog now allows the display of excluded URLs to be toggled
- Added proper editor for defining web page language settings at the project level
- Added an application level setting for defining web page language settings
- List exports now present a configuration dialogue for choosing which columns to include in the export [#275]
Changed
- WebCopy will now prompt to continue if the Empty website folder before copy option is set and files are present in the destination
- The Sitemap Extension will now start from the base domain if the project URL is deep and the Crawl Above Root flag is set
- Updated mime-db to 1.44.0
- The GUI now displays a proper progress indicator and status information when remapping local files
- The CLI client now displays status information when remapping files
- The Origin Report option for new projects now defaults to Single File rather than Embedded
- WebCopy will now always send the `Accept-Language` header. If it is not defined at the project level, the application level setting is used; if that is not provided either, the current OS culture information is used
- Documentation has had a good overhaul and is in the best state it has ever been in. All help links from option dialogue boxes point where they should, and missing documentation has been added
- Expanded default contentfilters.json used by the New Project Wizard to cover other common types
- The Accepted Content Types field has been moved from the Advanced category into a category of its own, expanded to use the same type of editor as for the web site language
Removed
- The Report Problem Site extension is no longer bundled with Setup
- Removed global statistics
Fixed
- WebCopy was treating any attribute value that started with `javascript` as unsupported
- The sitemap tree could display duplicate URLs
- The sitemap tree could incorrectly display children of pages that matched a standard document pattern
- Link Checker didn't follow internal redirects
- WebCopy could incorrectly parse the URL from an `@import` at-rule if the CSS was minified and another rule contained an empty `content` declaration
- The Project Diagnostics extension now ignores data URLs when performing length checks
- Cut, Copy and Paste commands didn't work for the filter fields in list views
- Fixed a crash that could occur when ordering the sitemap
- Reworked HEAD support detection to be more robust
- 401 challenges were only processed during HEAD requests
- Fixed a performance issue running XPath queries
- Per-URL origin reports could be overwritten if URLs differed only by extension
- The New Project Wizard no longer creates duplicate rules if content types are present in multiple pre-defined groups
- Fixed a crash that could occur when closing the options dialog after switching views
- Windows that save their position and size should no longer keep increasing in size each time the window is opened and a custom font is being used with a point size above 8
- Options dialogues are now slightly more usable when using custom fonts with a point size above 8
- The Quick Scan dialogue now correctly disables the Scan button when busy, preventing a crash trying to perform multiple scans
- Setting the URL in the main window now correctly defaults to http if a scheme is not explicitly set, preventing a crash when using secondary actions such as trying to capture a form
- The New Project Wizard dialogue now ensures that user entered URLs have a default scheme applied if omitted by the user
- WebCopy could incorrectly parse blank `url` CSS functions
- Fixed inconsistencies in when the Download All Resources option would be enabled or disabled
- Fixed a crash posting a blank form definition
Added
- Added additional options to proxy server configuration, allowing the use of system proxies and user defined bypass lists
- The `poster` attribute of `video` elements is now detected
Fixed
- Fixed a crash that could occur when verifying the initial path
- The Internet Explorer DOM provider failed to process some pages if an attribute request failed with a `DISP_E_TYPEMISMATCH` error
- The Limit Crawl Depth, Limit Distance from Root URL, Maximum Files, Maximum File Size and Minimum File Size options would be processed if they had previously been set, even if subsequently disabled
- Proxy settings dialog now does a better job of validating the address
- Fixed a crash when clicking help links in stand alone tools
- Fixed a crash which could occur when using the Select URI dialog
- Speculative fix for a crash when painting list views
- Speculative fix for a crash trying to capture a form
- Speculative fix for a crash setting the same folder
Added
- Added new setting for controlling the HTTP download buffer size
- Added new setting for controlling the size of the memory cache when downloading transient content
- The Rule Tester dialog now supports rules that use content types
- Added new rule options to allow download priorities to be set
- The Browse button in the Rule Editor now allows the selection of either URLs or Content Types depending on the rule setting
Changed
- Documentation improvements
- The Custom Headers editor is now consistent with other similar editors
- If a given URL is a redirect, the new location is given higher priority in the crawl queue than existing non-redirect entries
- Various graphics and glyphs have been replaced or made consistent with the other styles
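Giving redirect targets higher priority than existing entries, together with the new download priority rule options, implies a priority-ordered crawl queue. Below is a minimal Python sketch using a binary heap. The class `CrawlQueue` and its priority constants are hypothetical and do not describe WebCopy's internal queue. Lower numbers are dequeued first, and a counter keeps first-in, first-out order among entries of equal priority.

```python
import heapq
import itertools


class CrawlQueue:
    """Hypothetical crawl queue: lower priority numbers are dequeued first."""

    REDIRECT_PRIORITY = 0
    DEFAULT_PRIORITY = 10

    def __init__(self) -> None:
        self._heap = []
        # Monotonic counter breaks ties so equal priorities stay first-in, first-out.
        self._counter = itertools.count()

    def push(self, url: str, priority: int = DEFAULT_PRIORITY) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

For example, if a redirect target is pushed with `REDIRECT_PRIORITY` after two ordinary URLs, it is still popped first.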
Removed
- The rule option Do not allow children to inherit this rule has been removed
Fixed
- WebCopy should now correctly handle non-ASCII domain names
- HTML documents above the root would not be scanned if both the Download All Resources and Crawl Above Root settings were enabled
- Certificate checking always used the `HEAD` method, even if head checking was explicitly disabled for the project
- Per-URI memory cache is now correctly cleared after fully processing a given URI
- All transient downloads now make use of a memory buffer if possible, instead of only during an analyse
- The Status column in URI lists no longer displays Skipped for redirects.
- WebCopy now uses buffer pools instead of constantly creating and destroying buffers
Added
- CSV export now allows you to choose which columns to include
- Cookies can now be set from the project properties dialog
- Double clicking a cookie from a list view will display the URL decoded value in an information window
- Cookie list context menu now includes a Copy Name/Value Pair option
- Links list now displays the relationship of entries to the project URL
- Added the ability to control the scan depth
- Content type inclusions/exclusions now support regular expressions
- Added the ability to specify how many items are displayed on the MRU list
- Added support for HTML5 character entities
- Added experimental support for crawling JavaScript enabled sites
- Added New Project wizard
- Rules now allow you to select which components the expression will be applied to
- Rules can now be applied to content types
Changed
- Setup now uses InnoSetup 6
- Cookies extension now uses the same cookies list as other parts of WebCopy
- The Feedback dialog no longer has the screenshot option set by default
- The Feedback dialog now remembers the email address, if provided
- The Feedback dialog now allows you to toggle which displays you want included in the screenshot
- Sitemaps are now always sorted alphabetically
Removed
- Due to a rewrite of how the site browser tree works, the Simplify Sitemap option no longer has meaning and has been removed
- The Sorted option has been removed from projects
Fixed
- Non-modal link property dialogues are now positioned more appropriately
- The `FirstRun` setting was not being cleared after the initial startup
- Creating or opening a project now closes any open property windows
- Cookies extension didn't always show all detected cookies
- Attempting to open a file that wasn't a WebCopy project would create a blank project, but would incorrectly assign it the filename of the non-project
- Fixed various serialization and clone omissions
- Site browser tree, sitemap extension and website diagram extension no longer builds an entire sitemap upfront
- Sitemap extension now correctly encodes text
- HTML entity encoding and decoding should now better handle surrogate pairs
- Site browser tree no longer renders any page with sub pages as a folder
- WebCopy wouldn't detect encoding correctly if it was specified only via the `<meta charset="">` element
- WebCopy wouldn't write remapped links for URLs that led to a redirect
- Checks to see if a given URL was an ancestor of the root URL were not being skipped if the domains didn't match
- The sitemap tree view wouldn't display some URLs if the crawl above root option was set
- The Download All Resources option wasn't working the way it should
Changed
- URI properties dialogs are now opened as non-modal windows if possible
Fixed
- The Quick Scan dialog temporarily restricts the maximum number of displayed pages to 200, resolving a crash that could occur on sites with thousands of detected pages
- Fixed an issue where WebCopy would always fully download files above the copy root if the Download All Resources option was enabled
- Fixed an issue where WebCopy wouldn't correctly exclude entries above the root if the Download All Resources option was enabled
- Files over 2GB in size wouldn't be downloaded
- After 2147483647 files had been downloaded, no further downloads would occur
- Setup programs were only signed with SHA256, meaning Windows Vista couldn't read the signatures
- Setup tried to install .NET 4.6.2, causing an installation failure on Windows Vista which only supports 4.6.0
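The limits in two of the fixes above, files over 2GB and the 2147483647-file ceiling, are consistent with values held in a signed 32-bit integer. That is .NET's `int`, and 2147483647 is exactly 2^31 − 1. The release notes do not say so explicitly, so treat this as an inference. The short Python sketch below shows how such a value wraps. `as_int32` is a hypothetical helper that mimics unchecked 32-bit truncation.

```python
import ctypes

INT32_MAX = 2**31 - 1  # 2147483647, the largest signed 32-bit value


def as_int32(value: int) -> int:
    """Hypothetical helper: truncate to a signed 32-bit integer (no overflow check)."""
    return ctypes.c_int32(value).value


# A 2 GiB file (2**31 bytes) no longer fits and wraps to a negative length.
two_gib = 2 * 1024**3
```

A negative length or counter explains why downloads could silently stop. Widening to a 64-bit type (`long` in .NET) removes the limit.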
Added
- It is now possible to authenticate with a website using an embedded web browser prior to copying, allowing WebCopy to work with sites that have complex login procedures or multi-factor authentication [#333]
- When copying a website with an SSL certificate, if the Ignore certificate errors option is not set, WebCopy will now display a dialog asking what to do [#329]
- Added a new option to include the original extension when remapping files [#324]
- Added a new option to include the query string in local filenames [#267]
- Added new options for limiting downloads based on file size (minimum and maximum) and on the number of files downloaded
- Added Last Modified column to various URL list views
- Added URL browser dialogs to various selection fields
Changed
- Reinstated URI editor for URI Transforms
- URL browser dialogs now keep the original selection
Fixed
- HTML entities in attribute values were not decoded when scanning for links
- Local files no longer have their extensions changed if the URI extension doesn't match the first extension in the content type database, e.g. `.jpg` files no longer get renamed to `.jpeg`
- Sitemap generation now correctly ignores redirected URLs
- WebCopy could incorrectly abort a download with an insufficient disk space error even if free space was available
Added
- Added new `nocrashreport` switch to command line clients
- Command line clients can now display solution information when reporting crashes [#201]
Fixed
- Partial output is no longer printed by CLI tools when using the `quiet` switch
- Statistics are now printed when using the `statistics` switch even if `quiet` is also specified
- All output is now correctly written to log files when the `log` switch is used, irrespective of the `quiet` switch setting
- Pressing Enter or Escape in the Capture Form dialog no longer closes the dialog if the embedded web browser has focus
Fixed
- CLI automatically assumed it was installed into a protected folder and refused to download files if the working directory was the application directory
- CLI couldn't read meta data from PDFs
- Exception reports no longer include the user name of the current user
- Exception reports no longer include the raw host name
- Clicking the Open in Browser button in the Test URL dialog crashed the application if no URI had been entered
- Crawl only exclusion rules were no longer working as expected (regression)
- Backup files had the wrong file extension (regression)
Added
- The Basic Authentication dialog now allows the prompting of future passwords to be disabled
- Preview functionality of the Test URI dialog now supports a subset of images
- Added proxy settings to Test URI dialog
- Added new options for specifying custom headers [#219]
- The Test URI dialog now allows the configuration of content encoding, custom headers and URI transforms [#296]
- Added stand-alone version of the test URI tool
- The Capture Form dialog will now try and find the best match if multiple forms are detected on a page [#230]
Changed
- The layout of the Test URI dialog has been reworked [#296]
- Setup has a new option to determine if icons should be created for stand-alone tools
- Setup has a new option to determine if experimental 64bit versions of tools should be installed
- Minor improvements to External Tools dialog
- Minor start-up improvements
- The option to save headers with the project file is now enabled by default for new projects
- Some context menu items which disappeared from virtualised lists have now been re-instated
- If the character set for an HTML document isn't explicitly specified, WebCopy will now try to autodetect an appropriate value [#303]
Removed
- Removed the Content tab from the Link Properties dialog
- Removed the unused Modified URI field from the Link Properties dialog
- Removed Find more user agents online link from the Edit User Agents dialog
- Removed the Allow Editing checkbox from the Link Properties dialog
- Removed the Disable Updates flag from link information
Fixed
- The Basic Authentication dialog truncated long realm text [#312]
- WebCopy no longer tries to unpack custom settings belonging to unloaded extensions [#278]
- WebCopy no longer stores downloaded content against the link information when a 400 or 500 series response is returned
- Output editors in the Test URI dialog now honour the Fixed Font setting
- Project files were no longer being compressed when saved
- Fixed a crash that could occur when running the Empty Meta Data report
- Backup files were not being created when saving projects
- Default external tools configurations were not added when starting WebCopy for the first time
- Some files were still downloaded even if they had been excluded via a rule (regression from 1.4)
- Editing a local file using the built-in text editor always used UTF-8 and would corrupt files using a different encoding
- The default user agent was using the file version of the WebCopy client instead of the product version
- The Quick Scan window is now resizeable and remembers its position
- Corrected some settings that weren't being cached
Added
- Added a new diagnosis extension to help investigate certain project errors which are not reproducible in current test data
- Added new exclusion options to more finely control the remap extension mode
- The `Content-Disposition` header is now supported and, if set, will help define the local filename
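A `Content-Disposition` header such as `attachment; filename="report.pdf"` can supply a better local filename than the URL. Below is a minimal Python sketch of reading it. It is not WebCopy's implementation, and `filename_from_content_disposition` is a hypothetical name. It relies on the standard library's `email.message.Message.get_filename`, which also decodes RFC 2231 `filename*` values. Any directory components are stripped, because a server-supplied name should never choose the folder.

```python
from email.message import Message
from typing import Optional


def filename_from_content_disposition(header_value: str) -> Optional[str]:
    """Hypothetical helper: extract a safe file name, or None if absent."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    name = msg.get_filename()  # None when there is no filename parameter
    if not name:
        return None
    # Keep only the final path segment, treating "/" and "\" as separators,
    # so a value such as "../../evil.exe" cannot escape the output folder.
    name = name.replace("\\", "/").rsplit("/", 1)[-1]
    return name or None
```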
Changed
- Tabbed or tree based option/property dialogs now include a search field
- Split the Copy options page into Folder and Local Files pages
Fixed
- Uninstall should no longer prompt for feedback when running Setup to upgrade an existing installation
- Fixed an issue where the Download all resources setting was switched off when opening the options dialog (regression)
- Fixed a crash which could occur when clicking the Test URI button in a form editor for a project with no base URI set
- Speculative fix for a crash which could occur when deleting an empty rule or form
- Speculative fix for a crash which could occur when displaying the Select URI dialog
- Setup was installing the Problem Site Report extension into the wrong folder, overwriting the RSS extension manifest
- Strings over 32767 bytes in size are now supported in WebCopy projects
- Pressing Enter in multi-line edit fields in the Inclusions / Exclusions option page closed the dialog
- Fixed a number of cases where modifying a collection might not mark the project as changed
- Fixed a crash that could occur if WebCopy couldn't get an encoding [#304]
Added
- WebCopy now includes a database of all registered mime types, instead of relying solely on what is registered on the local computer [#271]
- Added a new remap mode setting, all except arbitrary binary data, which will correctly remap content types to the appropriate extension regardless of the source URL, unless the content type is `application/octet-stream` [#182]
- Rule, Password and Domain Alias editors now highlight items containing invalid expressions
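The "all except arbitrary binary data" remap mode can be sketched as follows. Pick an extension from the content type, but leave `application/octet-stream` files alone, since their type says nothing about their real format. This Python sketch is illustrative only. The tiny `EXTENSIONS` table and the `remap_filename` function are hypothetical stand-ins for WebCopy's full mime database. An extension that is already acceptable for the type is kept, so a `.jpeg` file stays `.jpeg`.

```python
from pathlib import PurePosixPath

# Hypothetical, deliberately tiny table: the first entry is the preferred extension.
EXTENSIONS = {
    "text/html": (".html", ".htm"),
    "text/css": (".css",),
    "image/jpeg": (".jpg", ".jpeg", ".jpe"),
    "image/png": (".png",),
}


def remap_filename(url_path: str, content_type: str) -> str:
    """Hypothetical sketch of an 'all except arbitrary binary data' remap."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    path = PurePosixPath(url_path)
    if media_type == "application/octet-stream" or not path.name:
        # Arbitrary binary data (or a bare directory path): keep the source name.
        return url_path
    known = EXTENSIONS.get(media_type)
    if not known or path.suffix.lower() in known:
        # Unknown type, or the URL already has an acceptable extension.
        return str(path)
    return str(path.with_suffix(known[0]))
```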
Changed
- The default remap mode for all new projects is now all except arbitrary binary data (previously it was only if no extension present, probably the worst default possible) [#182]
- The Select Mime Types dialog can now load in all registered types rather than only what was detected in a given website
- The Capture Form dialog has had form detection rewritten so that it now reflects the current state of the web browser, allowing you to enter details into the form and have those captured
- The Capture Form dialog now automatically selects non-hidden fields for inclusion in the generated form definition
- URI Transforms can now be disabled without having to remove them completely
- URI Transforms didn't work correctly if the URI field was specified
- Temporarily disabled the JavaScript detected warning as it appears to be causing more confusion with users than it resolves
- Added new controls to the main window for ease of use
- Documentation updates
Fixed
- The Capture Form dialog only detected `input` elements within forms. It now correctly detects `input`, `output`, `select`, `textarea`, `object` and `button` elements, while excluding `reset`, `button` and `image` input types [#279, #285]
- The Capture Form dialog always listed (and processed) form parameters in reverse order [#284]
- The Capture Form tool didn't generate form definitions correctly for forms embedded in an `iframe`
- The URI Transforms and Domain Aliases editors now behave the same as other list editors
- The Test URL dialog didn't merge form parameters properly unless the Merge from field was fully specified in addition to the core Uri field
- Fixed a crash that could occur when posting forms and a merged parameter was `null`
- If header checking is enabled but the server doesn't support the `HEAD` method (indicated by a `405` response code), the source URI will be downloaded normally and header checking disabled for that host [#194]
- Fixed a crash which could occur when deleting rules or forms from their respective lists [#289]
- Right clicking blank areas of URI, rule or form lists now displays the context menu [#276]
- Fixed a crash which could occur when right clicking an empty URI list
- Fixed an issue where CSS remapping could crash due to a blank source value [#269]
- Fixed an issue where WebCopy could hang with a certain combination of HTML and XPath [#291]
- Fixed the layout of the main application window slightly overlapping the status bar (regression)
- Fixed an issue where it was possible that text tokens weren't replaced
- Viewing a diagram could display an error if WebCopy had been started with command line arguments
- Link origin wasn't persisted
- Clearing the link map didn't clear the sitemap tree [#257]
- Setup would display an error stating Unknown custom message name "lcid" if an appropriate version of .NET Framework was not installed and was required to be downloaded by Setup
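The `HEAD` fallback described above (#194) means a `405 Method Not Allowed` reply triggers a normal download, and `HEAD` is no longer used for that host. Below is a minimal Python sketch of the per-host bookkeeping this needs. The class `HeadSupportTracker` and its methods are hypothetical. The sketch does no networking and only decides what the caller should do next.

```python
from urllib.parse import urlsplit


class HeadSupportTracker:
    """Hypothetical per-host record of servers that reject HEAD requests."""

    def __init__(self) -> None:
        self._no_head_hosts = set()

    def should_send_head(self, url: str) -> bool:
        """True unless this URL's host has already rejected HEAD."""
        return urlsplit(url).hostname not in self._no_head_hosts

    def record_head_response(self, url: str, status: int) -> bool:
        """Return True if the caller should fall back to a normal GET."""
        if status == 405:  # Method Not Allowed
            self._no_head_hosts.add(urlsplit(url).hostname)
            return True
        return False
```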
Fixed
- The improved Quick Scan dialog crashed if the Visual Link Map extension wasn't installed
- If the URI to crawl redirected to an external URI, no feedback would be provided to the user regarding the redirect and the crawl would just appear to halt with an empty response
- A crash no longer occurs if a website returns a `Content-Encoding` header that either isn't a standard value or is one that is not supported by WebCopy
- List views now remove line breaks from displayed content
This version of WebCopy changes how rules are executed. Existing rules using the Do not allow children to inherit this rule flag may not function correctly; this flag will be removed in a subsequent update.
Added
- The Quick Scan dialog has had a major overhaul to make it usable, if not useful. While currently a work in progress, it now offers the following features [#261]
- The ability to set a limit on pages per domain during the quick scan
- A diagram of the scan results is displayed in the dialog using colour coding to show which URIs will be included in a copy, and which will not
- You can change how you expect the website to be crawled and it will automatically update the diagram to reflect the new setting
- Using the diagram you can exclude URIs from being crawled, or add excluded domains to be crawled
- Confirming the dialog no longer resets many settings in your project back to defaults
- A Rule Checker tool has been added, which takes a given URI and passes it through all rules, allowing you to see which rules are matched and which aren't
- Added new Stop processing more rules flag. This flag is automatically applied to projects created using older versions of WebCopy
- List filters previously removed as part of [#65] have now been reinstated
- List filters now support empty / not empty options
Changed
- The Rules, Forms and Password list editors now share a common base and are now consistent in how to add and edit items
- You can now re-order rules, forms and passwords in their respective editors by dragging items in the list
- Rules no longer stop executing after the first match is found, but continue through all rules, allowing for more complex scenarios
- Rule lists are no longer sorted by default making it easy to see the execution order
- When calling the CLI to download a single file and the /o argument points to an existing directory, the CLI will generate a filename based on the URI to download [#250]
- When trying to copy a website, custom expressions are now validated and the copy will not commence if any are invalid
- The Crawl Content rule flag can now be set independently of the Exclude flag. This finally allows you to create a copy job that will scan an entire website, but only keep files such as images
- Documentation updates
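The rule execution change above (all rules are evaluated, with the new Stop processing more rules flag as an escape hatch) can be illustrated with a minimal Python sketch. This is not WebCopy's implementation: the Rule class, its fields and the evaluate_rules function are hypothetical, and the sketch assumes that each matching rule overrides the exclusion decision of earlier matches unless a matching rule with the stop flag ends evaluation.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    # Hypothetical rule model: a regular expression plus two flags.
    pattern: str
    exclude: bool
    stop_processing: bool = False


def evaluate_rules(url: str, rules: list[Rule]) -> tuple[bool, list[str]]:
    """Run a URL through every rule in list order.

    Returns whether the URL ends up excluded and which patterns matched.
    Assumption: later matches override earlier ones, and a matching rule
    with stop_processing set ends evaluation immediately.
    """
    excluded = False
    matched: list[str] = []
    for rule in rules:
        if re.search(rule.pattern, url):
            matched.append(rule.pattern)
            excluded = rule.exclude
            if rule.stop_processing:
                break
    return excluded, matched


rules = [
    Rule(r"/images/", exclude=True),
    Rule(r"\.png$", exclude=False),
]
print(evaluate_rules("http://example.com/images/logo.png", rules))
```

Because evaluation no longer halts at the first match, rule order now matters, which is why the rule lists are shown unsorted in execution order.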
Deprecated
- The Reverse and Do not allow children to inherit this rule rule flags are deprecated and will be removed in a future version of WebCopy
Fixed
- Redirects with a relative Location header could be incorrectly combined into absolute URI's
- Empty analytics sessions are no longer transmitted
- Failure to obtain shell icons should no longer crash the application
- Loading a diagram didn't update UI state correctly
- Changing some diagram properties didn't cause the diagram to be updated
- URI's which had a blank charset attribute in the Content-Type header weren't processed properly
- Fixed a crash which could occur when using the CLI to open a file that wasn't a WebCopy project [#253]
- Reordering rules and forms didn't reflect properly in the user interface
- Application no longer crashes if there is an issue exporting or copying large diagram images [#262]
- CLI will no longer attempt to download if the output folder is protected [#249]
- When using custom xpath expressions, multiple expressions would be incorrectly created if the same attribute was listed multiple times
- Several more list views have been virtualized [#64, #65]
- The rule editor no longer tries to convert patterns into URI's
- Cloning a WebCopy project skipped numerous values
- The keep alive setting wasn't persisted correctly
- Fixed an issue where the Quick Scan dialog could crash with a duplicate key error [#251]
- Failure to generate the website diagram is no longer fatal [#247]
- Website diagrams are now generated directly from link information, rather than building a sitemap and generating from that - this should reduce memory requirements of creating the diagram [#247]
- Form and rule lists should now correctly update if their respective contents change
- The main results list view is now virtual which should resolve all memory issues relating to working with URI lists [#64, #65]
- The Capture Form tool will no longer crash if there is a problem creating the embedded browser
- Pressing Escape in the Capture Form tool no longer closes the window
Added
- Added new advanced options to configure which security protocols are supported [0000228]
Changed
- Images now open in the registered application for the file type
- WebCopy will now always try to find an encoding defined in HTML content before falling back to the Content-Type header
Removed
- URI lists virtualized as part of [0000064] temporarily no longer support filtering
Fixed
- WebCopy could no longer access websites that only supported newer versions of TLS [0000228]
- A cross thread exception crash could occur when accessing the Quick Scan dialog
- A crash could occur if WebCopy had difficulties processing a URI
- Attempting to access help from the project properties dialog had no effect [0000229]
- A crash could occur when clicking slices in the Website Size chart [0000208]
- Unexpected errors processing URI's will no longer abort the entire crawl
- Massive memory and performance improvements for some URI lists [0000064]
- URL parser didn't correctly handle URI's which included two : characters at any point before the / or \ characters
- Known values weren't picked up from meta tags if the contents of the name attribute weren't in lower case
- Sometimes the Capture Form tool crashed when trying to navigate to a new URL
- A third party library could cause the entire application to crash with certain HTML
- The crawl mode was reset from Sibling to Sub when re-opening the project properties dialog
- Various performance improvements
Added
- Added a new optional extension for providing feedback/smiles/frowns or support requests from within the application
- Error diagnosis dialogs now include a reference to the original report
- The Test URI dialog now includes a new tab which lists all links that were detected on the source page
Fixed
- Errors loading a cached RSS feed prevented the RSS extension from functioning [0000204]
- Fixed potential exit crash when updating statistics [0000175]
- Fixed a potential issue where the last character in a directory path could be removed [0000213]
- Fixed a crash which could occur when setting file timestamps [000216]
- Fixed a crash that could occur changing the URI of a project containing forms, and the form URI hadn't been set
- Fixed a crash that could occur when right clicking some list views
- CLI tool didn't handle invalid command line arguments as well as it could [0000223]
- The Website Size dialog could crash with a divide by zero exception [0000222]
- The WebCopy CLI now creates missing output directories when requesting to download a single URL into a file [0000224]
Added
- The Origin Report setting now has a new option to embed the original URL as a comment in the body of the HTML
Changed
- The Download All Resources option is now automatically set for new projects [0000193]
- The Directory Character option is now automatically set to / for new projects
- The Update Local Timestamps option is now automatically set for new projects
Fixed
- A crash no longer occurs when opening the Options dialog if the languages folder doesn't exist [0000187]
- A crash no longer occurs when opening the Options dialog if duplicate languages are present [0000186]
- A crash could occur when loading the sitemap and shell icons were enabled [0000190]
- Fixed a number of issues that could occur after opening or saving a project and the MRU was updated (0000161, 0000176, 0000177)
- WebCopy no longer aborts the crawl after trying to download a URL with the same name as a file system reserved word [0000195]
- WebCopy wasn't detecting flash movies in an object tag [0000173]
- Data URI's with padded data weren't processed correctly [0000197]
- Fixed a problem when trying to run the CLI with a project file that did not exist [0000200]
- Added speculative fix for a crash generating sitemaps - this is a common issue yet we've been completely unable to reproduce it in any of our test scripts or saved projects. If anyone can supply information on how to trigger this crash it would be gratefully received! [0000160]
- Fixed a regression where it was possible for redirects to get stuck in an infinite loop
- Minor improvements to URI exception reporting
- RSS entries would duplicate themselves depending on if the feed was accessed via HTTP or HTTPS. Note that a side effect of this fix will result in all entries being marked as unread
- Fixed a possible crash that could occur when trying to load a themed font [0000203]
- Fixed a crash that could occur if a rule had an empty pattern [0000188]
- WebCopy will now try to remove base tags after completing a crawl [0000191]
Added
- Added a new extension which allows users to easily submit websites that WebCopy isn't copying correctly [0000015]
- WebCopy now warns if JavaScript is detected as being in use by the website being copied [0000066]
- WebCopy now reports links it detected but couldn't process [0000031]
- Added open output folder action to Crawl Complete dialog
Changed
- Due to some curious feedback, the checks to validate digital signatures on WebCopy binaries have been reinstated
- Sitemap ordering has been changed to a simple sort order, as the natural sort took an extremely long time to run on large websites, with little benefit for the performance hit [0000004, 0000150]
- External status is now stored with a link entry instead of being calculated each time it is requested [0000004]
Fixed
- Fixed an issue where WebCopy wouldn't display content properly (for example in the Test URI dialog) if the web server returned compressed content regardless of the value of the request's Accept-Encoding header
- Fixed an issue where pages weren't processed correctly (e.g. corrupt titles in the UI) if the page wasn't using UTF8, didn't specify a charset in the Content-Type header but did specify the correct content type via a http-equiv meta tag in the document HTML [0000144]
- Fixed an issue where remapped files were always read or written as UTF8 regardless of the original encoding [0000144]
- The Test URI dialog will now automatically try to add http:// if the user just types/pastes a scheme-less value
- Fixed an issue where CSV export could fail
- Creating an automatic rule for an external URI now creates a valid rule [0000026]
- The sitemap treeview shouldn't reload itself quite as often as it previously did [0000003, 0000004]
- WebCopy now correctly processes URI's above the crawl root if the Download all resources option is set [0000154]
Added
- Added a new Keep Alive setting. Setting this to false can help prevent the "The server committed a protocol violation. Section=ResponseStatusLine" crawl failure [0000002]
- Added a new Prefix Mode setting. This setting allows you to force URI's to either have or remove the www prefix, useful for avoiding duplicated files when copying a website which uses a mix of prefixed and non-prefixed URI's
- Added the ability to replace sections of a URI when crawling documents
- Added a new report to view non-HTTP links
- (Experimental) Added new Extract Data URIs setting. Enabling this option will extract inlined images using the data: protocol into separate files.
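The Prefix Mode setting above normalizes host names so that prefixed and non-prefixed URI's map to the same local files. A minimal Python sketch of that kind of normalization follows; the apply_prefix_mode function and its "add" / "remove" mode names are hypothetical, and it assumes the authority part of the URI holds only a host and optional port (no user information).

```python
from urllib.parse import urlsplit, urlunsplit


def apply_prefix_mode(url: str, mode: str) -> str:
    """Force a URL's host to have ("add") or lack ("remove") a www. prefix.

    Hypothetical helper; any other mode leaves the URL unchanged.
    Assumes the netloc holds only host[:port], without user info.
    """
    parts = urlsplit(url)
    netloc = parts.netloc
    has_prefix = netloc.lower().startswith("www.")
    if mode == "add" and not has_prefix:
        netloc = "www." + netloc
    elif mode == "remove" and has_prefix:
        netloc = netloc[len("www."):]
    # Only the host changes; scheme, path, query and fragment are kept.
    return urlunsplit(parts._replace(netloc=netloc))


print(apply_prefix_mode("http://example.com/about", "add"))
print(apply_prefix_mode("https://www.example.com/a?x=1", "remove"))
```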
Changed
- Setup should now automatically uninstall previous versions
- Numerous changes to how plugins are discovered, loaded and configured. Due to no longer storing plugin details in the Windows Registry, this will cause any disabled plugins to be re-enabled
- WebCopy will now correctly report non-HTTP links such as mailto: or ftp: as skipped rather than silently ignoring them
- Internal engine changes [0000062]
Removed
- The project scan and repair tool is no longer included in setup
Fixed
- WebCopy could incorrectly exclude some URL's, believing them to be mailto: links
- Fixed several cases where a crash could occur when invalid path characters were present in URL segments
- Some HTML tags appeared as "Unknown" in list views
- URI's would be incorrectly combined if the relative URI was just query string and the source URI already had a query string
- Download percentages were calculated incorrectly
- Fixed a crash that could occur after copying a website if the Update local time stamps option was set
- Report viewer didn't show external URI's
- Fixed case-sensitivity issues in some built-in reports
- Fixed a crash that could occur if non-NBT files were present in report folders with the rpt extension
- Fixed a startup crash if the addins folder didn't exist [0000112]
- Fixed a crash that could occur when trying to calculate the depth of a URI [0000116]
- Fixed a crash that occurred if a project with a blank URI was opened and the user then attempted to browse to the blank URI [0000120]
- Fixed an issue where Setup sometimes wouldn't replace files
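The query string fix above matches the reference resolution rules of RFC 3986: when the relative reference is only a query string, the base path is kept and only the query is replaced. Python's standard urllib.parse.urljoin follows those rules, so it can serve as a reference for the expected results (this is only an illustration, not how WebCopy itself combines URI's).

```python
from urllib.parse import urljoin

base = "http://example.com/products/list.html?page=1"

# A relative reference consisting only of a query string keeps the base
# path but replaces the base query entirely (RFC 3986, section 5.2.2).
print(urljoin(base, "?page=2"))

# For comparison, a relative path reference drops the base query.
print(urljoin(base, "detail.html"))
```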
Added
- The original Include Subdomains option has been replaced with a new set of more comprehensive options, allowing for copying of sibling domains, linked resources, or everything
- Added a new Download all resources option. When set, WebCopy will download any non-HTML linked resource, regardless of source domain. If the file would normally be excluded, it will only be downloaded, not crawled
- Although the UI editor for additional hosts stated regular expressions could be used, this was never implemented. Regular expressions can now be used with additional hosts
- Added a new Sitemap plugin that will generate a simple HTML sitemap of all downloaded files
- Added a new Cookie Viewer plugin that allows a global view of cookies created during a crawl
- Added native support for the picture element
- The Capture Form window now remembers its size and position
- Added an address bar to the Capture Form tool to allow access to hidden login URI's, or for any other type of manual navigation
- Added a Scan button to the Capture Form tool allowing a manual scan for forms in the page if WebCopy failed to detect them initially
- The maximum redirect chain length setting can now be configured from the Advanced options group
- Added a new option to control if external redirects should also be followed
- Added support for the brotli compression algorithm
- Added a new option to control if the results list should automatically scroll to show the active item while performing a website scan
Changed
- Now requires Microsoft .NET 4.6
- When following redirects after posting form data, all built-in skip rules are ignored, so if a post to one site directs to another to complete the post, WebCopy will now always follow the redirect
- The Create Desktop Icon option in setup is no longer checked by default
- The Test URL dialog now uses the proxy settings of the currently open project
- The Website Links dialog has been slightly redesigned to prevent a crash when working with projects containing many thousands of links
- Information dialogs accessed from list views now display the selected context in an easier to read format than plain CSV
- The default maximum redirect chain length has been increased from 5 to 25
- HTTP Compression options have been moved from the Advanced options group into their own dedicated group
- Options for processing redirects have been moved from the Advanced options group into their own dedicated group
- Minor performance improvements
- Minor optimizations to reduce memory load
Removed
- Removed the setup option for creating a Quick Launch icon
- Removed support for opening legacy XML based WebCopy projects
- Due to offline help always being outdated due to the general weakness of the product manuals, the offline help files are no longer included in the setup, and requesting help will always display the online version
Fixed
- Problems loading or saving the user agent store should no longer be fatal
- Some dialogs only supported local help requests and were unable to show help if the local help file was not available
- Fixed a crash that occurred downloading a file if the Content-Encoding header of the HTTP response was set to identity
- Fixed a memory leak in the sitemap component
- A number of HTML 5 specific tags were listed as Unknown in crawl result list views
- The UI now correctly reports if part of a crawl was aborted due to too many redirects
- Crawling the same project multiple times in succession reused the cookies from the previous crawl
- Trying to call wcopy.exe with just the file name of an existing WebCopy project always displayed a message about unsupported protocols and refused to continue
- Fixed an unauthorized access crash that could occur when using the Capture Form tool (regression)
- Fixed an issue where external links could appear in some lists even when filtering options were set to exclude them
- Fixed a number of issues which prevented automatically logging into websites where the post URI was different to the get URI and value merging was required, or the post returned 302 and the new location must be read to complete the login
- Fixed an issue where occasionally the Capture Form tool didn't refresh available forms after navigating to a page
- Fixed an issue where refreshing a page in the Capture Form tool didn't
- The CLI tool no longer incorrectly reports failures to download a single file as an application exception
- The CLI tool now correctly outputs the reason why a given URI failed when performing recursive downloading
- The Capture Form tool now correctly detects forms that are contained within frames
- Export to CSV option featured when context clicking some list views didn't correctly escape the CSV
- Fixed an issue where the entire application was terminated if CSV export failed
- Fixed an issue where URI's that were both invalid and very long could crash WebCopy
- Fixed a crash that could occur when attempting to remap a CSS file
- Fixed an issue where URI segments containing spaces (or other encoded characters) weren't correctly decoded when generating local folder names
Fixed
- Fixed an issue where downloaded files would ignore the save folder and start from the root directory if the URL was malformed and included a double slash after the domain, e.g. http://example.com//image1.png
- Fixed a crash that would occur when trying to process a data: URI greater than 65519 characters
- The Capture Form tool was incorrectly using the id attribute of form elements instead of the name attribute
- Setup was incorrectly downloading .NET Framework 4.5.2 setup if .NET Framework 4.6 was installed
- Speculative fix for loading date times from project files
- Speculative fix for odd crashes when opening the Capture Form dialog
Added
- Double clicking an entry in the Cookies list view of the Test URL dialog now displays the details of the selected item
Fixed
- WebCopy wasn't scanning the contents of style elements correctly
- @import CSS rules were not being remapped if they did not use url() notation
- Fixed a crash which could occur when a request made via the Test URL dialog failed, and no response was available
- Fixed an issue where the Capture Form dialog sometimes did not list forms for a page when it should have
Added
- Added support for the srcset attribute
- You can now specify custom attributes to include in link scanning
- When logging an exception, diagnosis actions such as new version downloads or links to workarounds are now displayed, if applicable
- Now supports finding links via the 300 "Multiple Choices" HTTP status code
- Slight improvements to scan performance
Fixed
- Fixed a crash that occurred if you entered an invalid path into the Save Folder field then attempted to copy a website
- Fixed a problem where projects using a sub path and the Crawl above root URI option could save duplicate URI's into the project, causing a crash when attempting to reload the project
- Fixed an issue where sitemaps belonging to projects using a sub path and the Crawl above root URI option were corrupt
- When changing settings via the main Options dialog, some settings would not be applied as the old versions were cached
- Fixed a start up crash that occurred if the externaltools.xml file was present, but invalid
- The XPath expression for <meta http-equiv='refresh'> support wasn't strict enough and was picking up more elements than it should
- HTML attribute scan rules that used regular expressions to transform only part of an attribute's value incorrectly merged the transformed value
- The link checker tools would not report URI's that weren't found if the URI was also external
- The samples default tool link was incorrect
- Demo project corrections
Added
- Added a new option to control whether or not new pre-release (beta) versions are included in update checks
- 64bit versions of WebCopy (GUI / CLI) and Link Checker (GUI / CLI) are now available
- You can now choose to display all errors, or only errors detected during the current scan in the Errors tab
- Activating list items in the different result tabs now opens the appropriate properties dialog
- Added useragent, prehead and no-prehead command line options to wcopy.exe
- Uses alpha version of new exception logging library
Removed
- Disabled glass effects unless using Windows Vista or Windows 7
Fixed
- Build was deploying the .NET 3.5 version of Luminitix
- If posting a form failed, the copy was automatically cancelled, but the reason why the post failed was not available
- Pressing enter in the sitemap tree view could cause the link properties dialog to be displayed twice
- The Link Checker GUI / CLI clients and the WebCopy CLI client no longer require the source URI to be qualified with the scheme, and will automatically add http if no scheme is present
- CLI tools now correctly report errors
- Default user agents of CLI tools were malformed
- In certain circumstances, command line arguments would not be parsed correctly
- 401 challenge dialogs were not displaying correctly, instead a "Cross-thread operation not valid" message would be displayed in the log
Fixed
- Fixed a crash that occurred when the Environment Variables sub menu of the External Tools dialog was clicked
- Fixed an issue where token menus (for example those in the External Tools dialog) containing environment variables could be excessively wide
- Fixed an issue where the exit code of the CLI tools could be incorrect
- If a view crashes when updating, it is now disabled for the remainder of the session without crashing the entire application
Added
- Added a new command line version of WebCopy, allowing you to download single files or entire websites via the command line - perfect for use in scripting and maintenance tasks!
- Added a new Dead Link Checker tool, which you can use to scan a website and detect dead links. This tool is available as both a GUI client and a command line interface
- (Experimental) When analysing a site, WebCopy will now attempt to keep content in memory where possible, and only write it to disk if the content is above the default capacity
- CSV exports of link maps now include an integer column for the HTTP status in addition to the textual description
- Added a new option to disable the automatic URI remapping of downloaded files
Changed
- WebCopy now requires Microsoft .NET Framework 4.5
- The Errors tab no longer lists redirects, instead you can use the Redirects report
Removed
- Windows XP is no longer supported
- Removed the prefix with the website name / prefix with the website url option. This setting was confusing, served no real purpose and the default value was wrong
Fixed
- Fixed a crash that could occur when loading a WebCopy project if the link map included a link to a resource without a default document, and a link to the same resource with the default document
- WebCopy was processing some failed URI's despite the fact they had failed (regression from previous version)
- WebCopy wasn't processing some response headers correctly if they weren't cased as expected (regression from previous version)
- WebCopy no longer remaps links in local content that has not changed during the current session
- Fixed an issue where WebCopy would send the if-modified-since or etag headers even though the local content was no longer available
- Fixed a rare crash that could occur when remapping document URI's at the end of a download
- Fixed a number of issues with <base> tag processing
- Fixed an issue where WebCopy could incorrectly lose custom port information when combining two URI's in certain circumstances
- WebCopy was incorrectly shortening file names with multiple periods, e.g. jquery.min.js to jquery.js (regression from previous version)
- Sometimes WebCopy would try to map a document URI to a file name that was actually a directory, causing a crash
- URI path segments which contain illegal characters are now sanitized when converting them into file paths
- In-line CSS is now correctly crawled
- Crawling will no longer follow redirect chains beyond the 5th consecutive redirect
- Fixed an issue where meta data could be read incorrectly based on encoding type
- Problems that occur reading meta data for a downloaded file no longer block the crawl with a modal error dialog, instead the error is presented in-line at the end of the crawl the same as other errors
- Time stamps displayed on completion dialogs are no longer displayed in UTC
- Some columns in the results list view were not updated correctly unless the action was successful
- Fixed several occurrences where link information wasn't being updated correctly
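The file name regression above (jquery.min.js being shortened to jquery.js) shows why only the final extension should be treated as the extension. A minimal Python sketch of shortening that preserves the last extension and trims the base name instead; the shorten_file_name helper and its max_length parameter are hypothetical and not WebCopy's code.

```python
import os.path


def shorten_file_name(name: str, max_length: int) -> str:
    """Trim a file name to max_length characters, preserving its extension.

    Hypothetical helper. os.path.splitext splits at the last period only,
    so "jquery.min.js" yields ("jquery.min", ".js") and ".min" stays part
    of the name rather than being treated as an extension.
    """
    if len(name) <= max_length:
        return name
    stem, ext = os.path.splitext(name)
    room = max_length - len(ext)
    if room < 1:
        raise ValueError("max_length is too small to keep the extension")
    # Trim the base name, never the extension.
    return stem[:room] + ext


print(shorten_file_name("jquery.min.js", 64))
print(shorten_file_name("a-very-long-library-name.min.js", 16))
```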
Fixed
- The Quick Scan dialog can now have the scan cancelled
- Fixed an issue where progress percentages for file downloads using gzip or deflate content encoding would be incorrect (regression from 1.0.9.1)
- Fixed a crash that occurred in the Test URI dialog when using the POST verb on a page without any forms (regression from 1.0.9.1)
- Fixed a crash that could occur when using non-HTTP/S URI's from the Test URI dialog
- Fixed a crash that could occur when using the Test URI dialog and the current request failed (regression from 1.0.9.1)
- Fixed a threading crash that could sometimes occur when trying to access the Quick Scan dialog
- Fixed a crash that could sometimes occur when closing the Quick Scan dialog while a scan was in progress
Added
- Reinstated digital signatures
- When posting a form, existing values will be automatically merged with the user defined custom values
- Added a new tool for capturing a form, making it much easier to extract the basic tokens for posting a form
- Cookies are now supported by the Test URL dialog when making multiple requests from the same domain, including their own tab for viewing
- All standard HTTP verbs are now supported by the Test URL dialog
Changed
- The Test URL dialog has been split in two, so that the result content is always visible
- The Rule Editor, Form Editor and Test URL dialogs are now all resizeable
Fixed
- Fixed an issue where some form values would not be encoded correctly
- GZip and deflate compressed data is now decompressed during the download, rather than after the entire content has been downloaded
- The HTML view in the Test URL dialog now correctly updates each time a new request is made
- WebCopy would often give file names a numeric suffix even if there was no reason to
- If WebCopy tried to shrink a file name to fit within path limits, it incorrectly started by trimming the extension, instead of the name
- WebCopy failed to shrink file names where the base path was above 248 characters and promptly crashed
- Some files were missing from the setup that prevented exception reports from being submitted (regression from previous version)
- Fixed a duplicated shortcut between Rules and Test URI
- Exiting WebCopy while the RSS extension was updating caused a crash
- Fixed an issue where files could be loaded with the wrong encoding when remapping documents, causing subtle corruption with the final output
- The Scan Project repair tool crashed on start up (regression from previous version)
- Opening a project always marked it as changed, causing the UI to prompt to save changes unnecessarily
Changes and new features
- Temporarily removed digital signatures, these will be reinstated shortly
- Added Windows 10 to application manifests
- Added Requests Per Minute limit mode
- Added a new Enforce Limit Checks option. When set, limit requests will be enforced for all URI's that involved a HTTP request. If not set (default) limit requests will be enforced only for URI's that were successfully processed
Fixed
- Fixed an issue where a WebCopy project could become corrupt
- Limit checks are no longer applied to URI's that were skipped due to being external or by a rule
- Changing the window font is now correctly applied to the main window when the settings are applied, rather than requiring the application to be restarted
- Fixed a crash that could occur when attempting to obtain the display string for an enum value
- Fixed a crash that could occur if there was a connection error when trying to post a form
- Fixed an issue where the RSS feed wouldn't update when the Update Now option was used, unless a daily update was already pending
- Fixed a crash that could occur displaying the rules editor
Changes and new features
- (Deprecated) The prefix with the website url / prefix with the website domain name option of a crawl project has been deprecated and will be removed in a future update.
- (Experimental) Added the ability to specify additional hosts. This allows you to include multiple domains per project, for example a CDN
- (Experimental) Added proxy server support
- Activating an item in either the Request Headers or Response Headers tabs of the Test URI dialog now displays the header information in a dialog for easy viewing/copying
- The contents of the Select Mime Types dialog are now sorted
- Items in the Title Replacements and Forms editors can now be reordered via drag and drop
- Added a helper tool for backing up and restoring settings, or for resetting settings to default values
- Added a stand-alone update check tool
Fixed
- The Status Code column in the Results list is now no longer cleared when an action is performed that didn't involve an HTTP request, such as remapping the local file
- The value of the Play Sounds setting wasn't being honoured by the Crawl Complete dialog
- The prefix with the website url / prefix with the website domain name option of a crawl project now defaults to prefix with the website domain name for new projects
- Pressing enter in the Post Values field of the Test URI dialog no longer activates the default button on the dialog
- Fixed an issue where only the end of a host was inspected when checking if a given URI was a sub domain of another. For example, it would incorrectly return that static.oneexample.com was a subdomain of example.com
- Fixed an issue where it was possible that WebCopy wouldn't prompt to save changes when exiting
- An error is no longer displayed if you open a project saved using a newer version of WebCopy. The project will now be opened where possible, but a warning will now be displayed
- Repeatedly clicking column headers in sortable lists now correctly cycles between Ascending, Descending and None, instead of only Ascending and Descending.
- Fixed a problem where clicking the Add button in the Form Editor would clone the active form, including the internal ID of the form which should be unique, leading to crashes
- Fixed an issue where settings were both loaded and saved using thread specific culture data, which could cause a crash if the computer culture information was subsequently changed. All settings are now saved and loaded using an invariant culture.
- A crash no longer occurs if font information cannot be read correctly from stored settings
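The subdomain fix above comes down to comparing whole DNS labels rather than raw string endings. A minimal Python sketch of such a check; is_subdomain_of is a hypothetical helper, and it deliberately ignores details such as trailing dots and internationalized domain names.

```python
def is_subdomain_of(host: str, parent: str) -> bool:
    """Return True if host equals parent or is a subdomain of it.

    Hypothetical helper. Requiring a "." before the parent name means only
    whole labels match, so "static.oneexample.com" is not treated as a
    subdomain of "example.com". Trailing dots and IDN forms are not handled.
    """
    host = host.lower()
    parent = parent.lower()
    return host == parent or host.endswith("." + parent)


print(is_subdomain_of("static.example.com", "example.com"))
print(is_subdomain_of("static.oneexample.com", "example.com"))
```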
Changes and new features
- Simplified the highlighting and displaying of matches in the Regular Expression editor
- Added support for ETag's and the If-None-Match header when reading headers to determine if a resource should be downloaded
- The duration of each URI crawled is now recorded
- The duration of the entire crawl is now recorded
- Added the ability to limit crawling to a number of requests per second. Options to configure crawl limits can be found beneath the Advanced node in the Project Properties dialog
- Added Slow Pages report
- Reports are now loaded from disk (and on demand) instead of being pre-defined; user reports are now supported
- Setup now allows you to customize which components are installed
- Added product RSS notifications add-in
Fixed
- The Regular Expression editor now correctly displays line breaks in the Replace tab
- WebCopy now tries to be more intelligent when generating paths and file names for local files that reach maximum path lengths, by shortening file names and sub folders to try and make files fit. This should reduce occurrences of PathTooLongException exceptions being raised
- CSS @import directives are now correctly processed if the url keyword was missing. Previously only @import url('<file>'); would work, now @import '<file>'; also works
- Fixed an issue where the update check could cause the main window to be unresponsive
- The Quick Scan dialog no longer crawls either sub-domains or above the root URI
- Fixed an issue where the Quick Scan dialog wasn't cleaning up correctly when closed
- A URI that returns an error status code is no longer flagged as "skipped" with the reason "Invalid Content Type".
- If a crawl was cancelled due to a HTTP status code, the results list no longer flags any such URI as "skipped", but retains the original, correct, value
- After editing a rule or form, the enabled state of the item could no longer be correctly toggled via the check boxes in their respective lists
- In certain circumstances, creating a backup of a file could take a substantial amount of time
- Fixed an occasional "The path is not of a legal form" exception when using the External Tools dialog
- Fixed an issue where colour settings were sometimes not loaded correctly
- Fixed an issue where font settings were sometimes not saved correctly
- Font sizes are now displayed as whole numbers
- Fixed an occasional crash resizing the application window with a collapsed panel
- Corrected baseline positioning of editors and labels in dynamic user interfaces
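The @import fix above means both the url() form and the bare string form must be recognised. One way to express that is a single regular expression with an optional url( prefix. The pattern below is a hypothetical Python sketch, not the one WebCopy uses, and it deliberately ignores CSS comments, escapes, and URLs containing quotes, spaces or parentheses:

```python
import re
from typing import List

# Hypothetical pattern accepting @import url('x'), @import url(x), @import 'x' and @import "x".
# Group 1 captures the optional quote so the closing quote (\1) must match it.
IMPORT_RE = re.compile(
    r"""@import\s+(?:url\(\s*)?(['"]?)([^'")\s;]+)\1\s*\)?""",
    re.IGNORECASE,
)

def find_imports(css: str) -> List[str]:
    """Return the target of each @import directive, with or without the url() keyword."""
    return [match.group(2) for match in IMPORT_RE.finditer(css)]

print(find_imports("@import url('a.css');\n@import 'b.css';"))  # ['a.css', 'b.css']
```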
Changes and new features
- Panels in Option dialogs now load on demand
- Option pages are now only initialized when requested by the appropriate dialog
- Removed status code 520 (origin error) from the list of supported codes for automated error reporting during a crawl
- The Image Viewer window no longer defaults to Fit when displaying an image, but now defaults to Actual size
- Added additional themes for configuring the appearance of the GUI client window
Fixed
- Fixed an issue cloning LinkInfo objects, which is hopefully the cause of a rare "Cannot access a disposed object" crash when using the Quick Scan dialog
- Fixed an issue where the sitemap tree view could be populated up to 3 times rather than the expected once when opening a project
- Dynamic options in the Options dialog are now positioned more sensibly in relation to the options label and editor, and other options in the same group
- Fixed a problem where tool tips did not display under certain conditions, or could display the wrong (or blank) text
- Extension mapping for dropped files was case sensitive
- Reworked tool bar layout code to prevent overflowed buttons
- Removed a number of integration hacks
Changes and new features
- Token menus now include environment variables
- Setup now offers to install the Microsoft .NET Framework 3.5 if not already present
Fixed
- The maximized or minimized state of a window was no longer being restored when reopening the window
- The Find and Replace dialogs in text editing windows now correctly default to the selected text as appropriate
- The token button displayed when prompting the user for arguments for external tool execution now displays a menu with available tokens.
- Attempting to open a folder whose full path contains a period no longer displays an Invalid Path message.
- When restoring window position and size, the restored bounds are automatically recalculated to fit the monitor, for example when connecting via Remote Desktop with a smaller display resolution, or after removing or repositioning a monitor in a multi-monitor setup.
- The main application window could no longer be sized smaller than its original startup size.
- Fixed a problem introduced in the last update which caused the crash reporter to no longer submit crash reports
Fixed
- Fixed a crash introduced in 1.0.7.0 when running rules if the URI being processed was shorter than the base project URI
- Fixed a crash that would occur when clearing the link map and the project did not have a valid URI
- Fixed an issue introduced in 1.0.7.0 when deleting items with the popup Rules and Forms editor, where either the wrong item would be removed visually, or the software would crash.
- Fixed an occasional crash introduced in 1.0.7.0 moving items with the popup Rules and Forms editor. Note that the moving of rules and forms has no functional use and will be removed in a future version of the product. Also note that moving is performed on the underlying collection, not the visual display sort
- Fixed a problem with the Rules, Forms and Password editors where it was often quite difficult to add new items via the popup editors as they kept trying to update a previous selection
- Fixed an occasional crash after using the Quick Scan dialog
Fixed
- Added manifest so that when running under Windows 8.1 / Server 2012 R2 the OS version is correctly reported.
Fixed
- Speculative fix for a "Parameter is invalid" exception that randomly occurs when painting windows
- Fixed a crash that occurred when modifying a rule, and the project URI had been cleared or was invalid
- Fixed a crash that could occur when inserting a new rule or a new form
- Fixed a crash that could occur when attempting to process root level URI's in the sitemap
- Fixed the status bar not updating correctly during a crawl action
Changes and new features
- Experimental: Added a new option to simplify the sitemap treeview. When this option is set, folder containers are no longer displayed if the folder only has a single page
- Experimental: Modifying a rule now reapplies rules to the sitemap, allowing easier sitemap manipulation without having to rescan the site. Note this feature only works on the current contents of the link map, if the linkmap is incomplete due to existing rules a rescan will be required regardless.
- Sorting of the sitemap now uses natural sorting, so names appear in a logical order, e.g. 1, 2, 10 rather than 1, 10, 2
- The Rule and Forms lists now default to sorted
- List views that support sorted columns now use natural sorting
- When building a sitemap, folders are no longer generated for URI's that match except for differing query strings
- New API to allow plugin authors to add additional functionality to application windows when they are created
- Sitemap treeview now displays URI's relative to the base URI
- Rules that do not use the Use Full Uri flag now also strip out the leading path of the base URI. For example, if the base URI of the project is http://demo.cyotek.com/staticwebsite/ and the current URI being crawled is http://demo.cyotek.com/staticwebsite/blog/page1.html, the text used by the rule engine will be /blog/page1.html
- The Differences tab now lists all URI's which are new to the last scan, in addition to existing checks of modification dates. Due to the introduction of this setting, all URI's will be marked as new for existing projects, until that project is rescanned and saved.
- Removed the Use Modified Uri rule flag
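Two of the changes above are easy to illustrate. Natural sorting compares runs of digits by numeric value rather than character by character, and the rule text is the crawled URI's path with the base URI's leading path removed. The Python sketch below reproduces the changelog's own examples; the function names are hypothetical, and it assumes the rule text is built from the path only (how query strings are treated is not stated here) and that the Use Full Uri flag means the whole URI is used unchanged:

```python
import re
from typing import List, Union
from urllib.parse import urlsplit

def natural_key(text: str) -> List[Union[int, str]]:
    """Sort key that orders '1, 2, 10' numerically instead of '1, 10, 2'."""
    # re.split with a capturing group alternates text / digit runs, so the
    # odd-indexed parts are always digit runs and can be compared as integers.
    return [int(part) if i % 2 else part.lower()
            for i, part in enumerate(re.split(r"(\d+)", text))]

def rule_text(base_uri: str, uri: str, use_full_uri: bool = False) -> str:
    """Text a rule is matched against (hypothetical helper)."""
    if use_full_uri:
        return uri  # assumption: the flag means the whole URI is used unchanged
    base_path = urlsplit(base_uri).path.rstrip("/")  # e.g. "/staticwebsite"
    path = urlsplit(uri).path
    # Only strip the base path when it is a whole leading segment of the URI path
    if base_path and (path == base_path or path.startswith(base_path + "/")):
        path = path[len(base_path):] or "/"
    return path

print(sorted(["10", "2", "1"], key=natural_key))  # ['1', '2', '10']
print(rule_text("http://demo.cyotek.com/staticwebsite/",
                "http://demo.cyotek.com/staticwebsite/blog/page1.html"))  # /blog/page1.html
```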
Fixed
- Fixed a problem where clicking OK on the Edit Rule dialog saved changes even if there was a validation error and the user subsequently clicked Cancel
- Fixed a problem where the Quick Scan dialog failed without finding any URL's if the Inclusion / Exclusions options were set
- Fixed a problem where page titles and descriptions containing HTML entities were not decoded
- Fixed a problem where the sitemap could include URI's containing query strings, even if the strip query string segments option was set
- Disabled Glass effects on dialogs when running under terminal services connections
- Fixed a problem where source redirect URI's were not excluded, and appeared in the sitemap
- Outgoing links for an existing link are no longer cleared if the link is excluded for any reason
- Fixed an issue where the skipped status of a URI wasn't reset correctly
Changes and new features
- The Additional URL's section of the Project Properties dialog now allows the entering of relative URI's.
- The sitemap tree view now only loads children on demand, improving performance for large projects
- The different results tabs are also now load on demand, again improving performance for large projects
- Added a new setting which determines if the Sitemap tab is activated when opening existing projects
- Added new options to the Website Size dialog for either using total size or link count for content types, and for limiting the number of slices displayed
- Various minor UI tweaks
Fixed
- Fixed a problem where duplicate URI's could be present in the linkmap in rare circumstances, causing a crash when trying to reopen the project. A repair tool is also available for projects affected by this bug.
- Fixed a potential crash that could occur attempting to retrieve shell icons.
- Fixed a problem where commands linked to URI's that contained spaces in their respective query strings caused the command to fail with an Invalid URI message.
- Fixed a rare problem where it was possible for WebCopy to place the same URI twice in the processing queue, and immediately cancel the copy as soon as the second occurrence was hit.
- Fixed a problem where WebCopy did not check the internal document version to ensure it was supported
- Fixed an issue where toolbars were initialized before the window was resized to whatever the user had defined, meaning some toolbars were unnecessarily placed on new rows
- Corrected some invalid message window and dialog titles
- Fixed a crash which occurred when clicking pie slices in the Website Size dialog and filtering was enabled
Changes and new features
- Added a new Ignore SSL Errors option. If this option is set, attempting to scan a website that contains an invalid SSL certificate will be allowed. The default for this setting is false, meaning that WebCopy will not scan websites with invalid certificates.
- Double clicking (or pressing enter) on a file node in the sitemap tree view now displays the appropriate properties dialog
- The Print and Print Preview options are now correctly enabled in the Image Viewer.
Fixed
- Fixed a problem where CSS was remapped incorrectly if the website scanned wasn't the domain root
- Fixed a crash that occurred calculating disk space estimates if a website was being copied to a UNC path
- Fixed a problem where a website that was supposed to be copied to a UNC path was instead copied to the drive where WebCopy was installed using the UNC server name as a sub folder
- Speculative fix for a CSS remapping crash possibly due to malformed CSS
- Fixed a problem where attempting to open an Explorer window to a UNC path displayed an invalid folder message
- The Page Setup option in the Print Preview dialog didn't do anything when activated
Changes and new features
- Simplified the User Agent editor
Fixed
- Fixed a problem where CSS files did not have their URL attributes remapped. This was only evident when using the flatten website folder option on a website where CSS files referenced images from other folders.
Fixed
- If a problem occurs decompressing data compressed using the deflate or gzip algorithms, the download will be automatically retried with these options disabled.
- WebCopy no longer attempts to decompress files that returned "none" as the content encoding.
- Fixed a crash that occurred if the Use Shell Icons option was enabled but the user did not have access to parts of the Registry when searching for mime types.
- Fixed a crash that occurred when shutting down multiple instances of WebCopy at the same time
- Fixed a crash that occurred when trying to copy a website if the path entered to download the website into contained invalid characters
- Fixed a crash that occurred trying to browse to a path that contained invalid characters
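The two content-encoding fixes above come down to dispatching on the Content-Encoding value. The Python sketch below illustrates that dispatch under stated assumptions: it uses the standard gzip and zlib modules, treats the non-standard value "none" like identity, and tries zlib-wrapped deflate before raw deflate, since servers are known to send either. It covers decoding only; WebCopy's retry-the-download behaviour is not shown, and the function name is hypothetical:

```python
import gzip
import zlib
from typing import Optional

def decode_body(data: bytes, content_encoding: Optional[str]) -> bytes:
    """Decode an HTTP response body according to its Content-Encoding header."""
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("", "identity", "none"):
        # No compression applied: return the bytes unchanged
        return data
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(data)
    if encoding == "deflate":
        try:
            # Standard "deflate" content is zlib-wrapped (RFC 1950)...
            return zlib.decompress(data)
        except zlib.error:
            # ...but some servers send raw DEFLATE data (RFC 1951) with no header
            return zlib.decompress(data, -zlib.MAX_WBITS)
    raise ValueError(f"unsupported content encoding: {content_encoding!r}")
```

A caller could catch zlib.error or OSError from this function and re-request the resource without an Accept-Encoding header, which is roughly the fallback the changelog describes.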
Changes and new features
- Added a new Use Alternate Directory Character option. By default, when WebCopy remaps URI's to local files, the / character is replaced with \. Setting this option reverses this default behaviour, so that the \ character is replaced with /. This option can be found in the Copy Settings tab of a project's properties.
Fixed
- Fixed a problem where remapped anchors had fragment information stripped
- Fixed a problem where attempting to open a folder where the drive letter was in lower case caused WebCopy to display an "invalid path" message.
Changes and new features
- Added a new Origin Report setting. This setting allows the generation of either single or multiple origin files which are saved alongside downloaded content and include the source URI. This new setting can be found in the Advanced section of a project's properties.
- Request headers are now stored with each URI the same as response headers and will be saved in the project for later retrieval if the Save Headers option is set.
- The Headers tab in the Link Properties dialog now displays request headers
- Added new options for setting the Accept and Accept-Language request headers.
- Removed status code 406 (not acceptable) from the list of supported codes for automated error reporting during a crawl
- Temporary files are now created in the folder where the website is being downloaded to, speeding up the final moving of files after a successful download, and avoiding potential problems if the disk where the temp folder is located doesn't have sufficient space to store the downloaded file.
- Quick Scan dialog now displays progress state while performing the scan
Fixed
- Referring URI's were being incorrectly set since the last update
- Total download size is now incremented correctly even if the content length was reported as zero by the server
- Filtering a grouped list didn't preserve groups when previously filtered items were restored
- URI's that have a status code of 406 (not acceptable) now have the correct skip reason associated with them
- The Content tab of the Test Link dialog didn't always correctly display returned content
- Fixed a crash that could occur when attempting to sort a list
- Fixed a crash if the Content-Type response header contained a space before the encoding name, for example text/css; charset= UTF-8.
- Fixed a crash if the Content-Type response header specified utf8 instead of utf-8.
- Fixed a crash that could occur if the source URI couldn't be decoded correctly
- Deflate encoding now once again correctly works after being broken in a previous build
- Fixed an occasional crash attempting to get the short form URI pattern when creating rules from an existing URI
- Fixed a problem where tool bars didn't wrap correctly if a new tool bar had to be placed on a new row
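The Content-Type fixes above are about parsing the charset parameter leniently. A minimal Python sketch of such a parser follows; it is an illustration rather than WebCopy's implementation, the function name is hypothetical, and it relies on Python's codecs.lookup to normalise aliases such as utf8 and to reject unknown names without raising:

```python
import codecs
from typing import Optional

def charset_from_content_type(value: str) -> Optional[str]:
    """Return a normalised charset name from a Content-Type value, or None.

    Tolerates whitespace around the value, quoted values, a trailing ';'
    and aliases such as 'utf8'; unknown charsets yield None instead of raising.
    """
    # Skip the media type itself, then inspect each ';'-separated parameter
    for param in value.split(";")[1:]:
        name, sep, raw = param.partition("=")
        if not sep or name.strip().lower() != "charset":
            continue
        candidate = raw.strip().strip("\"'")
        if not candidate:
            return None
        try:
            # codecs.lookup normalises aliases, e.g. 'utf8' and 'UTF-8' -> 'utf-8'
            return codecs.lookup(candidate).name
        except LookupError:
            return None
    return None

print(charset_from_content_type("text/css; charset= UTF-8"))  # utf-8
```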
Changes and new features
- Experimental: Added the basis of a "quick scan" feature. This scans the top level of the website for unique absolute URI's (removing bookmarks and query strings) and is useful for getting a quick overview of the top level structure of the website, making it easier to detect and exclude pages that have no benefit to copy (such as new thread / reply thread pages in a forum). As with other experimental features, this will be expanded over future updates.
- By default, new projects will now remap local file extensions based on their file type if no existing extension is present
- Removed status code 502 (bad gateway), 503 (service unavailable) and 504 (gateway timeout) from the list of supported codes for automated error reporting during a crawl
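The quick scan described above reduces each link to a unique absolute URI with bookmarks (fragments) and query strings removed. A rough Python sketch of that normalisation step, assuming links are resolved against a base URI and, as a simplifying assumption of this sketch only, that links to other hosts or non-HTTP schemes are dropped:

```python
from typing import Dict, Iterable, List
from urllib.parse import urljoin, urlsplit, urlunsplit

def unique_page_uris(base: str, hrefs: Iterable[str]) -> List[str]:
    """Resolve links against base and keep unique URIs without query or fragment."""
    host = urlsplit(base).netloc.lower()
    seen: Dict[str, None] = {}  # dict keeps first-seen order and removes duplicates
    for href in hrefs:
        parts = urlsplit(urljoin(base, href))
        if parts.scheme not in ("http", "https") or parts.netloc.lower() != host:
            continue  # skip mailto:, other hosts, and so on
        # Rebuild the URI with empty query and fragment components
        clean = urlunsplit((parts.scheme, parts.netloc, parts.path or "/", "", ""))
        seen.setdefault(clean, None)
    return list(seen)
```

Collapsing query strings this way is what makes, for example, every reply-thread link in a forum show up as a single entry that can then be excluded with one rule.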
Fixed
- Fixed a problem where when using the Excluded and Add Rule commands, the generated URI was invalid if there was a mix of www prefixed and non prefixed URI's
- Fixed a crash that occurred when clicking the Test URI button in the Form Editor and the URI of the project is invalid
- Fixed a problem where occasionally it was possible to execute two crawls at once, causing the second crawl to crash
- Fixed a crash that occurred when WebCopy tried to map the folder aspect of a URI and a file already existed with the same name
- Fixed a crash that occurred when submitting the remove missing links dialog for a project without a valid URI
Changes and new features
- Status bar now shows pending crawl requests.
- The progress bar now attempts to show current progress based on total requests. It's not hugely accurate as it doesn't take into account the size of each request, but is better than a marquee! Windows 7 and 8 users will see the same behaviour on the taskbar progress.
- Added support for the data attribute of the object tag.
- Removed status code 500 (internal server error) from the list of supported codes for automated error reporting during a crawl
- Removed downloaded file hash calculation, as the hashes currently aren't used by WebCopy
Fixed
- Fixed a problem where GZIP compressed content was downloaded incorrectly if the response headers didn't include a content length
- Fixed a problem where some users experienced a startup crash when initializing fonts
- Fixed a build problem that meant some exception reports were missing information
- Fixed a problem where buffers were incorrectly being processed when downloading which could lead to a potential crash or corrupt file if the response header didn't include a content length, and otherwise just did extra repeated work if a length was available
- Fixed a crash that could occur when crawling websites that had many nested branches of links
- Temporary files generated during the analysis of a website are now deleted as soon as they are no longer required, rather than only once the crawl has completed
- The "is missing" check was ignoring HTTP status codes and only going from the scan index
Changes and new features
- Minor improvements to crawl performance with websites that have a lot of cross page linking
- Sitemap tree view now highlights missing URI's with a configurable color
- Sitemap tree view now highlights folder nodes when all children match the same status
- Added a new Remove Missing Links command. This allows you to selectively remove missing URI's from the sitemap, without having to clear the entire map.
- Website Links dialog lists are now highlighted according to the status of the link
- Added additional filter options to the Website Links dialog
Fixed
- Fixed a crash that occurred when attempting to crawl a CSS file that contained unclosed comments
- URI's which are skipped due to a 403 response code are now correctly flagged as Forbidden as the skip reason
- Exception reports now include details of type load exception data
- A crash no longer occurs if restoring a window's previous state fails
- Fixed a crash which occurred when attempting to combine a URI with a partial URI that contained one of the reserved characters from RFC3986.
- Fixed a problem where URI's were not combined correctly if the relative URI consisted solely of a query string
- Start up errors when loading extensions can now be reported, and no longer prevent the application from starting
- Removed invalid Set as active URL item that appeared on several context menus
- Obsolete outgoing links were not being removed when crawling a source URI
- Filter options in the Website Links dialog are now correctly available no matter which tag is active
- Fixed a massive performance issue when populating lists under certain conditions
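For reference, RFC 3986 resolves a query-only relative reference against the base URI by keeping the base path and replacing only the query. Python's standard urljoin (3.5 and later) follows those rules, so it can serve as a quick check of the expected result; the first pair is taken from the RFC's own examples, the second is a hypothetical forum URL:

```python
from urllib.parse import urljoin

# A query-only reference keeps the base path and replaces the query (RFC 3986, section 5.4.1)
print(urljoin("http://a/b/c/d;p?q", "?y"))                       # http://a/b/c/d;p?y
print(urljoin("http://example.com/forum/list.html", "?page=2"))  # http://example.com/forum/list.html?page=2
```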
Changes and new features
- A new Test URI feature is available. This allows you to test a given URL, choose verbs, post information, or experiment with different user agents. Any returned output is viewable, allowing you to easily check if user agents have an impact on returned content, and methods such as HEAD are supported for crawling.
Fixed
- When entering a URL without a scheme, a default scheme of http:// will be applied
- Old version notices are no longer displayed after opening a project in the old XML format and then creating a new project
- Fixed a problem where the XML generated by an exception was invalid
- Fixed an issue where the content encoding of a page wasn't picked up if the value ended with ;
- Fixed a crash when trying to open or copy the URI of an invalid tree view node
Changes and new features
- Experimental: Added a new option to display excluded URI's in the sitemap tree view
- Experimental: Added new "quick exclude" option to the context menu of the sitemap tree view. This new feature allows you to quickly include or exclude a given URI from being crawled.
- Replaced 3rd party PDF library, this should allow for better compatibility with different types of PDF documents
- Added a disk space check during the crawl. If WebCopy doesn't think there is sufficient space to download a file, it now automatically aborts the crawl.
- Added a new option for automatically opening the last project when starting WebCopy without a command line
- Content types prefixed with x- automatically fall back to the non-prefixed version where the prefixed version cannot be found
- If the crawl of a given page fails, for example with a 500 error, WebCopy now attempts to get any response data which can then be viewed from the new Content tab of the Link Properties dialog. This response data is not saved with the project and is only available for the duration of the session from which it was populated.
- The Addin Manager dialog now lists loaded meta providers
- Improved the completion message for when a crawl was cancelled due to a failure trying to crawl the primary site URI, or any user defined additional crawl URI's
- Completion dialog now includes statistics on the crawl, such as files downloaded and total size
- Added Knowledge Base link to the Help menu
- Help file updated
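The x- fallback above can be sketched as a lookup that retries without the prefix when the prefixed media type is unknown. In this Python sketch the function name and the extension table are hypothetical stand-ins for WebCopy's mime database:

```python
from typing import Dict, Optional

def lookup_content_type(content_type: str, table: Dict[str, str]) -> Optional[str]:
    """Look up a media type, falling back from 'major/x-minor' to 'major/minor'."""
    # Ignore parameters such as "; charset=utf-8" and compare case-insensitively
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in table:
        return table[media_type]
    major, sep, minor = media_type.partition("/")
    if sep and minor.startswith("x-"):
        # e.g. application/x-zip -> application/zip
        return table.get(f"{major}/{minor[2:]}")
    return None

EXTENSIONS = {"application/zip": ".zip", "text/html": ".html"}  # hypothetical table
print(lookup_content_type("application/x-zip", EXTENSIONS))  # .zip
print(lookup_content_type("application/x-foo", EXTENSIONS))  # None
```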
Fixed
- The sitemap tree view is no longer sorted by an odd combination of last modified+page count and name, but now only by name
- Failure to create the save folder no longer crashes the application
- Improved folder validation for projects using the create folder for domain option.
- Fixed a problem where the crawl map wasn't correctly generated if the base URI of the project included a document name
- Fixed a problem where the Add Rule command didn't create a sensible default if the main project URI included a starting document
- Fixed a problem where the selected entry in some list controls was invisible when the control did not have focus
- The Errors tab is now once again correctly made the active tab after a crawl has completed with errors
- Fixed a problem where the pre-crawl validation didn't correctly handle trying to download to a disk that didn't exist
- If a given URI is detected as "not modified", WebCopy will now still scan the links of the local file rather than stopping processing for that URI. This now allows you to correctly download a website once, then only download changed files during future crawls.
Changes and new features
- Crawl exceptions can now be reported to help improve WebCopy
Fixed
- Validation errors that occur during HTTP parsing are now ignored. This allows crawling of websites such as snapfiles.com which do not follow the HTTP specification
- Exceptions that occur when trying to read website headers (such as the previously mentioned protocol violation) are now correctly reported instead of being silently ignored
- If an error is encountered during a crawl, the message now states that the crawl was cancelled rather than completed successfully
- When saving a WebCopy project, it is correctly written to a temporary file first and then moved to the save file if successful
- Fixed a rare crash where the exception handler crashed trying to generate the XML report
- Only the first occurrence of each unique mime type was given a shell icon in the sitemap tree view and other supported views
- The sitemap tree view context menu was enabled even if the tree view was empty, causing crashes if any of the menu items were clicked
- Fixed a crash if you attempted to browse for a folder and had manually entered a path containing invalid characters
Changes and new features
- WebCopy projects are now saved as binary files, and it is no longer possible to save in the old XML format (however you can continue to open them). As a consequence of this change, project files are now smaller (for example a 20MB XML project is now 4MB) and are quicker to load and save. When saving an XML-based project, a one-time backup will be created of the original XML before the new binary file is written. Note that versions of WebCopy prior to 1.0.2.0 cannot open these binary files.
- Predefined default documents list now includes index.cfm and index.jsp
- Default documents editor now displays the predefined default documents used if the field is left blank
- UI access to "hide" URI's in the link map has been disabled. This functionality will be removed in a future update
- Experimental: Sitemap treeview now highlights new or changed links. The color can be configured via the Appearance tab in the Options dialog
- Experimental: The Missing tab now also includes new or changed links in addition to links present in a previous scan but now not found
- The last downloaded attribute of URI's are no longer updated if the download was the result of an analyse operation rather than a full copy
- The default save folder option now has an appropriate default
Fixed
- When creating a new project, any existing sitemap wasn't removed from the treeview
- Fixed a crash which occurred when switching back to the Sitemap tab and the current project's URI was blank or invalid
- Fixed a crash which occurred when attempting to view the website diagram and although a crawl map was available, the current project's URI was blank or invalid
- Failure to save a project no longer results in a crash
- Exceptions that occur when opening a project can now be reported
- Pressing return in the form editor data field and default documents editor no longer attempts to submit the editing dialog
- Fixed the client component name for metrics
- If there is a problem setting the progress bar overlay on the taskbar, a crash no longer occurs
- Fixed the wait cursor not always appearing when a blocking action was occurring
- Fixed a rare clean up crash when removing temporary files
- Fixed UI controls being enabled in the URI properties dialog when they should have been disabled
- Pressing enter on a focused link label now correctly activates the link, instead of attempting to activate the default button for the window
- Default images are now correctly used if shell icons are enabled and no appropriate icon is available
Changes and new features
- The Last-Modified HTTP header is now supported. Last modified values specified elsewhere (for example in the meta tag of a HTML document) will override this value.
- Added new option to set the timestamps of local files to mirror the Last-Modified timestamp where available
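Mirroring the Last-Modified timestamp on a local file amounts to parsing the HTTP date and setting the file's modification time. A minimal Python sketch using the standard email.utils.parsedate_to_datetime and os.utime; the helper name is hypothetical and this is not WebCopy's code:

```python
import os
from email.utils import parsedate_to_datetime

def apply_last_modified(path: str, header_value: str) -> bool:
    """Set a file's access and modification times from a Last-Modified header.

    Returns False, leaving the file untouched, if the header cannot be parsed.
    """
    try:
        modified = parsedate_to_datetime(header_value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError for unparseable dates, 3.10+ raises ValueError
        return False
    # HTTP dates use GMT, so the result is timezone-aware and timestamp() is unambiguous
    timestamp = modified.timestamp()
    os.utime(path, (timestamp, timestamp))
    return True
```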
Fixed
- The "Missing" tab was displaying the wrong content for the Title and Description columns
- Fixed a problem where UI settings sometimes were positioned incorrectly in the Options dialog
- Fixed a problem where drop down based UI settings didn't render the selected item correctly when displaying the drop down component
- Fixed a problem where file dialogs pre-populated with a full path didn't set the initial folder of the dialog
- Addins which had additional dependencies located in the addins or views folders failed to work
- Fixed issues with toolbar initialization
Changes and new features
- Backup settings for project files are now available on the Options dialog
- Update check is now enabled by default
- You can now drag and drop projects from Windows Explorer onto the main application window to open them
- Added an option to disable shell icons in the sitemap tree view
Fixed
- Added an additional path validation check so the application no longer crashes if you set the download folder to be something invalid, such as ftp://.
- Fixed a problem where byte order marks were not being saved into data files, resulting in corrupt text data, such as page titles in the link map. This was only apparent when downloading text documents containing non-ANSI characters. Binary files were not affected.
- Fixed a problem where response encoding was not properly processed
- Fixed a problem where URI processing exceptions that didn't involve an error-state HTTP code weren't reported correctly in the UI
- Fixed a problem where URI's which returned quoted character sets (for example text/html; charset="utf-8") caused processing of that URI to fail. When combined with the above bug, this led to URI's being skipped with no way to tell why they were skipped or what any problems were.
- Fixed an issue where the default user agent was always being used when crawling a website.
- Fixed a crash saving untitled projects introduced in the last build
- Fixed a crash when creating a new tool introduced in the last build
- Fixed a problem where commands with hot keys could be activated while the user interface was disabled, leading to either a crash or an invalid application state
- Fixed a crash if the user attempted to copy a URI and the clipboard was in use
Changes and new features
- External Tools dialog now includes a preview of the command line and allows tools to be executed from within the editing dialog
- External tools now support using environment variables
- User Agent editor now shows the default user agent
- Added a status bar indicator showing how long the current operation is taking
- Added new External URI's and Images reports
- Added automatic update check which can be enabled/disabled in the Options dialog. When enabled, once a day a check is made, and if an update is found a notification is displayed in the status bar.
- The remap extensions mode is no longer a simple on/off switch, but now allows you to select if extensions should always be remapped, never be remapped, or remapped only if no existing extension is present.
- Added the ability to create content viewers
- Added content preview support to report viewer
- The always download latest version option is now enabled by default for all new WebCopy projects
Fixed
- Fixed an issue where an unexpected exception that occurs during URI preprocessing crashed the application rather than just aborting the active URI
- Exception reports were missing data information
- Fixed the settings dialog always reapplying all settings regardless of whether the settings page had tracked any changes
- Fixed a crash which occurred if the About dialog was displayed after viewing a report.
- Fixed a problem where a settings page that did not save changes correctly crashed the entire application
- Fixed a crash that occurred if the root Theme menu was clicked
- Fixed an issue where duplicate entries with the same URI and internal ID could be added to the link map when opening a project
- Default user agent was missing platform name
- Single tabbed options dialogs are no longer quite as narrow
- Fixed a number of duplicate accelerators in command menus
- Fixed a crash that occurred if the Courier New font was not available even if an alternate font had been specified in the Options dialog.
- Malformed reports no longer crash the application
- Application should no longer crash if there is a problem rendering button components. Could not reproduce original bug and error report did not include contact details so it's possible issues still exist. This is a workaround, not a true fix.
- Fixed some button tooltips incorrectly including the ampersand and ellipsis characters
- Fixed a problem where modifying a value in the Options dialog partially applied the value even if the dialog was subsequently cancelled
- Fixed a problem where the "show folder paths" option was read using one name but written using another, preventing the value from being usable
- Fixed a problem where the initial state of the Rules Editor was incorrect when displaying the editor in a project with no rules
Changes and new features
- Sitemap tree view now displays file icons rather than a generic document icon
- Added experimental Reports feature for performing dynamic querying of data. At present, three read-only reports (Redirects, Not Found and Empty Meta Data) are provided; future updates will expand upon these and add the ability to create your own.
- Added the ability to edit link titles and descriptions. This allows you to customize the title and description of a link and prevent future updates from resetting the values. Note: If the option to clear the link map when analyzing a project is set, customizations will be automatically lost.
- Added the ability to specify inclusion/exclusion criteria for mime types. This allows you to exclude certain file types from being downloaded, for example you may wish to ignore all EXE files.
- Added new feature to cancel a crawl if a given HTTP status code is encountered. These options can be configured on the Advanced page of a project's properties.
- Added new Export Site URIs option. This exports all URIs and their statuses to a CSV file for external processing
- The link properties dialog is now resizable
- It is now possible to install additional readers for document meta information
- Added the ability to view a diagram of a website's structure via the Website Diagram addin.
- Double clicking an item in the Forms list now automatically opens the form editor
- Double clicking an item in the Rules list now automatically opens the rule editor
- Browse folder dialogs now allow entering the path on Windows Vista and above
- Font settings can now be set through the Options dialog
- Substantial API changes to make it easier to use.
- User interface should now remain usable whilst analyzing a website or building sitemaps
- Added the ability to extract the titles from PDF files during crawling
- Added the ability to load meta data from RSS files during crawling
- Added Reverse option to rules. When this option is set, the rule is processed if the regular expression is not matched.
- The link properties dialog now displays incoming and outgoing links for the source URI
- Removed the single instance limit
- HTTP redirect responses are no longer classed as errors
- URLs in the link map dialog's selection combo box are now ordered
- A new indicator has been added showing the total size of content downloaded during the active crawl operation
- Includes a customer experience improvement program
- Various minor user interface enhancements
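The Reverse rule option can be illustrated with a short Python sketch. It is not WebCopy's code (WebCopy is a .NET application); `rule_applies` is a hypothetical helper, and it assumes a rule's regular expression is tested with search semantics, i.e. it may match anywhere in the URL:

```python
import re


def rule_applies(pattern: str, url: str, reverse: bool = False) -> bool:
    """Decide whether a rule should be processed for a URL.

    Hypothetical helper: a normal rule applies when its regular expression
    is found in the URL; with reverse=True it applies when it is NOT found.
    """
    matched = re.search(pattern, url) is not None
    # Comparing with != acts as XOR: reverse flips the outcome of the match test.
    return matched != reverse


if __name__ == "__main__":
    exe_rule = r"\.exe$"
    print(rule_applies(exe_rule, "http://example.com/setup.exe"))                 # True
    print(rule_applies(exe_rule, "http://example.com/setup.exe", reverse=True))   # False
    print(rule_applies(exe_rule, "http://example.com/index.html", reverse=True))  # True
```

A reversed rule is therefore a convenient way to express "everything except" without writing a negative lookahead.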
Fixed
- Fixed an issue where analyzing a website would incorrectly download content files that could not be crawled
- Errors list no longer displays -1 instead of the appropriate error code
- Fixed an issue where it was not possible to open certain project files
- Fixed a problem where it was possible that the wrong item could be edited from the Forms list
- Fixed a problem where it was possible that the wrong item could be edited from the Rules list
- Fixed a problem where an addin that could not be initialized left the application in an unstable state
- Fixed a problem where some application settings were not immediately applied when changed by the user
- Default user agent now follows RFC 2616
- Fixed an issue where URI preprocessing wasn't immediately applied to URIs detected for crawling, which could cause additional unwanted entries to appear in the link and crawl maps.
- Fixed a problem where modifying custom project settings exposed via an addin didn't mark the project as changed
- Fixed a problem where the Disable Links rule condition wasn't working as expected
- Fixed a number of issues with the error reporting tool
- Fixed an issue where creating a new document when changes had been made to the current document did not prompt to save said changes, causing them to be lost
- Fixed an issue where clicking an empty MRU could prompt to remove a blank filename
- Fixed a crash which occurred when hovering over an overflow toolbar button
- Items in the toolbars menu now appear in the correct order
- Regular Expression editor now correctly updates when modifying the Replacement field.
- If a link was modified, it did not mark the project as changed
- The error list now includes the description of HTTP response code errors
Changes and new features
- Added a Replace section to the Regular Expression dialog to make it easier to test replacement expressions
- Various performance enhancements
- The Errors tab no longer lists "Unknown Response" for non-200 HTTP codes, but instead includes the code description
- Added the ability to run user defined custom tools from within the application
- Attempting to open a recent file which no longer exists now prompts to remove the missing file from the recent files list
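The replacement testing added to the Regular Expression dialog can be approximated in Python with `re.sub`. The pattern, replacement and URLs below are hypothetical. Note that Python writes group references as `\1`, whereas .NET replacement strings, which WebCopy's editor presumably uses since it runs on the .NET Framework, write them as `$1`:

```python
import re

# Hypothetical rule: move pages under /old/ to /new/, keeping host and file name.
pattern = r"^(https?://[^/]+)/old/(.*)$"
# Group 1 is the scheme and host, group 2 is everything after /old/.
replacement = r"\1/new/\2"

moved = re.sub(pattern, replacement, "http://example.com/old/page.html")
untouched = re.sub(pattern, replacement, "http://example.com/about.html")

print(moved)      # http://example.com/new/page.html
print(untouched)  # http://example.com/about.html (no match, so no change)
```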
Fixed
- Fixed a crash when crawling if a rule was created with an invalid regular expression
- Reworked application mutex to avoid silent startup and shutdown exceptions
- Fixed regular expression cache not being thread safe
- Status bar wasn't correctly cleared if there was a problem populating a view which required a valid crawlmap
- Fixed status bar messages occasionally not appearing
Changes and new features
- Product help is now available and the product is now out of beta
- Added the ability to enable the "multi line" option in the Regular Expression editor to make it easier to test patterns using ^ and $ against lists of URLs
- Added a Test URL option for Forms, allowing you to test that your forms can be successfully POSTed prior to running a full crawl
- Changed settings dialogs to use a tabbed interface
- Holding down Shift when clicking the Copy Website or Analyze buttons forces the download of all resources, skipping last modified checks
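The effect of the multi line option on `^` and `$` can be sketched with Python's `re.MULTILINE` flag, which treats these anchors in the same way as .NET's multiline option; the URL list is made up for illustration:

```python
import re

urls = "\n".join([
    "http://example.com/",
    "http://example.com/about.html",
    "http://example.com/files/setup.exe",
])
pattern = r"^http://example\.com/files/.*$"

# Without MULTILINE, ^ only matches at the very start of the text, and "."
# cannot cross a newline, so this pattern finds nothing in the list.
single_line = re.findall(pattern, urls)

# With MULTILINE, ^ and $ also match at line boundaries,
# so each URL in the list is tested on its own line.
multi_line = re.findall(pattern, urls, flags=re.MULTILINE)

print(single_line)  # []
print(multi_line)   # ['http://example.com/files/setup.exe']
```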
Fixed
- Fixed a large number of issues with the application services libraries and components
- Fixed an issue where attributes of posted URLs were not correctly loaded if encountered at a later point during the crawl
- Fixed a crash which could occur when using the title replacement options and a page had a null title
- Fixed a crash which could occur when scanning an HTML tag containing a malformed URL
- Fixed an issue where email addresses were stripped if they contained the # character and the "strip fragments" option was enabled
Changes and new features
- The Link Map window now remembers its size and position
- The URI control for selecting the website to analyze is now tied to the system URI history
- Removed the confirmation prompt when rebuilding a crawlmap from saved history information
- The link scanner now supports the use of the base tag. If present, the URI value will be combined with links on the page.
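How a `<base>` tag changes link resolution can be sketched with Python's standard library. This is an illustration, not the link scanner itself: `BaseHrefFinder` and `resolve_links` are hypothetical names, and the links to resolve are passed in directly rather than extracted from the page:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseHrefFinder(HTMLParser):
    """Records the href of the first <base> tag that has one."""

    def __init__(self):
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "base" and self.base_href is None:
            href = dict(attrs).get("href")
            if href:
                self.base_href = href


def resolve_links(page_url, html, hrefs):
    finder = BaseHrefFinder()
    finder.feed(html)
    # The base href is itself resolved against the page URL,
    # then used in place of the page URL for every relative link.
    base = urljoin(page_url, finder.base_href) if finder.base_href else page_url
    return [urljoin(base, href) for href in hrefs]


page = "http://example.com/blog/post.html"
with_base = '<html><head><base href="/static/"></head><body></body></html>'

print(resolve_links(page, with_base, ["images/a.png"]))
# ['http://example.com/static/images/a.png']
print(resolve_links(page, "<html></html>", ["images/a.png"]))
# ['http://example.com/blog/images/a.png']
```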
Fixed
- Fixed various problems which could occur when trying to crawl a site with malformed links containing double slashes after the domain
- If the copy process crashes, the application will continue to run after dismissal of the exception reporting dialog
- Fixed a crash which would occur if a generated file name was the same as an existing directory name
- Fixed several crashes which occurred if a valid content type was downloaded as an empty file
- The list of incoming URIs for any given URI was being incorrectly populated
- Fixed an issue where, if a URI was referred to in multiple locations, its outgoing and incoming URI links were not updated correctly on any encounter after the first
- When reloading a project, the link map is no longer crawled looking for pages directly matching the root element; instead, all non-excluded internal URIs are formed into the map. This resolves a problem where the crawl map generated from reloading a project might not match the crawl map generated from analyzing a website
- Fixed the & character not appearing correctly in the status bar
- Fixed issue with application window being sent behind other top level windows when cancelling a crawl
- Fixed tab order on main window
- Fixed one occurrence where links were not combined correctly causing an infinite cascade (or at least until you hit the path limit for your OS). Additional causes of this bug may still be present, investigations are continuing.
Changes and new features
- A new rule option has been added that can be used to prevent a rule from matching a child URI
- The If-Modified-Since header and the NotModified (304) HTTP status code are now supported
- Added a new option to always download the latest version of a file, skipping the If-Modified-Since checks
- A new "Missing" tab has been added that shows URLs which were matched in a previous scan but not in the latest scan
- Redirect processing now honors 303 and 307 response codes
- Report lists now display tooltips
- If a link redirects to another, the destination is now stored with the original link
- Content length is now stored with link information, regardless of whether headers are stored
- Link properties dialog now shows redirect information and content length
- Added the ability to view the size of a website by content type
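A conditional request using If-Modified-Since might look like the following Python sketch. It assumes the last download time is kept as a timezone-aware UTC datetime, and the `always_download` flag is a hypothetical stand-in for the option that skips the If-Modified-Since check; neither function is part of WebCopy:

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime


def conditional_headers(last_downloaded_utc, always_download=False):
    """Build request headers for a conditional GET.

    last_downloaded_utc must be a timezone-aware UTC datetime (or None).
    """
    if always_download or last_downloaded_utc is None:
        return {}
    # HTTP dates use the form "Tue, 02 Jan 2024 03:04:05 GMT".
    return {"If-Modified-Since": format_datetime(last_downloaded_utc, usegmt=True)}


def fetch_if_modified(url, last_downloaded_utc, always_download=False):
    """Return the body, or None if the server answered 304 Not Modified."""
    request = urllib.request.Request(
        url, headers=conditional_headers(last_downloaded_utc, always_download)
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError; treat it as "keep the local copy".
        if err.code == 304:
            return None
        raise


if __name__ == "__main__":
    stamp = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
    print(conditional_headers(stamp))
    # {'If-Modified-Since': 'Tue, 02 Jan 2024 03:04:05 GMT'}
    print(conditional_headers(stamp, always_download=True))  # {}
```

Keeping the timestamp in UTC avoids converting local time when formatting the header.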
Fixed
- Exception reports were using the file version instead of the product version
- Fixed a rare XML crash when saving a project
- Fixed a crash which would sometimes occur when editing a rule or a form
- When downloading a file, the Last Downloaded timestamp is now stored as UTC
- Fixed an error where the content type was not set correctly if HEAD checking was disabled
- Fixed a problem where the local file for a URL would be continuously regenerated if the "Empty Save Folder" option was not set
- Fixed a problem where it was possible for a URL to be crawled even though pre-processing had rejected the URL
- Empty directories are no longer generated for URLs which fail pre-processing, such as redirects or unsupported content types
- Fixed a crash which would occur if the referring URL was not available
- Fixed a crash which would occur if the "content-type" header wasn't present when pre-processing a URL
- URIs which end with / but point to a valid text/html document no longer have their final segment stripped when generating the local filename when the flatten directories option is disabled
- Link properties dialog now correctly includes the time when a file was last downloaded
- Buttons in the main window now correctly follow the colors of the main theme
Changes and Updates
- Meta refresh redirects are now crawled and remapped
- Changed how redirects are handled, these will now appear in the main report lists
- Files list now displays the content type of entries
- Skipped list now displays the content type of entries
- Added new Not Found and Redirect exclusion reasons; redirects and missing files will no longer appear as "None" in skip lists.
Fixed
- Two URLs whose hosts differ only by the www prefix (e.g. http://cyotek.com/ and http://www.cyotek.com/) are now treated the same when determining if a URL is external.
- URI's were not correctly combined on pages being crawled as a result of a redirect.
- Reloading a sitemap which contained redirects did not display a map for any content discovered after the redirect
- No longer attempts to download content for redirected responses
- Projects weren't always being correctly marked as changed
- Application wouldn't start on 64-bit Windows (regression from 1.0.0.3).
- Lists are correctly cleared before an analyze or copy action (regression from 1.0.0.3).
- When creating or opening a project, the contents of the Files tab were not being cleared (regression from 1.0.0.3).
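The www-prefix comparison can be sketched as follows. This is a simplified, hypothetical check (`comparable_host` and `is_external` are illustrative names) that ignores subdomain scanning and any other rules WebCopy applies when classifying a URL as external:

```python
from urllib.parse import urlsplit


def comparable_host(url):
    """Host name with a single leading "www." removed.

    urlsplit(...).hostname is already lower-case, so case differences vanish too.
    """
    host = urlsplit(url).hostname or ""
    return host[len("www."):] if host.startswith("www.") else host


def is_external(url, root_url):
    """Hypothetical check: a URL is external when its comparable host differs."""
    return comparable_host(url) != comparable_host(root_url)


print(is_external("http://www.cyotek.com/blog/", "http://cyotek.com/"))  # False
print(is_external("http://WWW.Cyotek.com/", "http://cyotek.com/"))       # False
print(is_external("http://example.com/", "http://cyotek.com/"))          # True
```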
Changes and Updates
- Substantial performance improvements have been made when loading large projects containing many links.
- Updated to use Html Agility Pack 1.4
- A new option to control if headers should be saved in the project file has been added. This option is disabled by default.
- Cut, copy and paste commands are now available from the main window. However, lists and trees currently only support copy.
Fixed
- The application attempted to obtain titles and descriptions from all files, causing a rare crash.
- The Accept GZip Compression option was never correctly read from the project file.
- Toolbar visibility was not preserved between sessions
Changes and Updates
- Add-ins can now be enabled and disabled.
- Appearance themes are now enabled.
- The views Skipped and Files now have a context menu.
- The Speed, Time Elapsed and Time Remaining columns have been removed as they aren't working.
Fixed
- Relative paths weren't being saved in project files correctly
- The application wasn't correctly attached to the error handling system
- Command line arguments are now correctly processed.
- Filenames were not being regenerated when opening a project.
- Completion messages now correctly warn when errors were detected during copying.
- Fixed a problem where running on XP either didn't display disabled images or crashed.
Changes and Updates
- A new options page for controlling the local copy options has been added.
- The project properties dialog now displays several of the common editors to provide access to properties which could not be changed in the alpha build.
- The context menu for various lists now has an Edit Local File option.
- Added a new option to control if extensions are remapped based on their content type.
- Results list now shows elapsed time and estimated time of downloads.
- 401 authentication requests are now supported, either via predefined credentials or during the crawl via a password dialog.
- The default buffer size has been increased to a larger value, allowing for faster downloads. In addition, the buffer size is now configurable.
- Gzip compression is now supported.
- Deflate compression is now supported.
- Crawling is now performed on a separate thread, resolving sluggish behaviour with the user interface. (Disabled for this build.)
- The Link Map Viewer now has a tab for displaying all links found. All lists in this dialog have had new columns added with more details on the links.
- The project properties dialog now provides access to properties which could not be changed in previous builds.
- Object model simplified, some confusing class inheritance has been removed.
- Added the ability for additional content type handlers to be used.
- Added the ability to specify multiple seed URIs.
- A new configuration section has been added allowing you to store authentication credentials in a project file and to disable the password dialog when crawling.
- Added a new viewer extensibility option allowing new tabs to be added to the interface.
- Major refactoring of the base IApplication implementation.
- Response headers are now stored in the link map. The Link Properties dialog now displays these headers.
- The Link Properties dialog now displays local path information and the ability to open, open the containing folder, or edit the local file.
- Scanning of subdomains is now supported.
- You can now select from a common list of user agents.
- Crawling will no longer occur above the root level by default. A new option has been added to toggle this behaviour.
- Exclusions have been renamed to Rules to reflect their changing nature in this build and future planned enhancements.
- When using the Add Rule context menu item from a result list, the editing dialog is now displayed, allowing the entire rule to be configured.
- The Add Rule command now includes any applicable query string in the URL for the rule.
- A basic Regular Expression Editor is available and can be accessed via the Function button displayed next to supported fields.
- Error text associated with a page error is now stored in the link map.
- The page errors list will now be regenerated on loading a project with a saved link map.
- The Link Map Viewer now displays link titles and error text.
Fixed
- Redirects were not followed for 301 or 307 status codes.
- The error list wasn't properly recording all errors which occurred during a crawl.
- The failure to download a file due to a non-HTTP related error should no longer crash the application.
- The prompt to create a missing save folder now includes the folder name instead of a formatting placeholder.
- Fixed an issue where local file names contained escaped HTML entities.
- Fixed an issue where it was possible for local file names to contain illegal characters.
- Analyzing a website now only downloads files supported for crawling.
- CSS contained within comment blocks is no longer crawled.
- Page links found in an IFrame or Frameset were not scanned.
- Cancelling a crawl now also correctly aborts the current transfer instead of waiting for it to complete.
- If a list was scrolled horizontally, the context menu displayed from the filter bar wasn't positioned correctly.
- Fixed a bug where response headers were not available if the response did not have an expected status code.
- The result expression editor no longer displays results for a blank expression.
- Duplicate keyboard accelerators have been fixed.
- The Sorted property of a crawl map now correctly defaults to false.
- Fixed a problem where it was possible for the CommandManager to try and load classes it had no business loading, causing error messages to be displayed on startup.
- Fixed a problem where command interface elements were not always given a name, leading to a problem where items could not be accessed unless the full text was known.
- The failure to load an image resource for a command interface element will no longer cause the application to fail to initialize.
- The Add Rule and Add Form dialogs caused a crash when being used to create rather than modify items.
- If a link to a child of a page which has been matched to a rule with the DisableCrawl option is detected, the entire link will now be excluded.
- Fixed some selection inconsistencies in rules and forms editors.
- The Add Rule command now automatically escapes regular expression elements within the URL, such as the ? of a query string.
- Fixed some layout problems in Windows XP.
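Why the Add Rule command escapes URLs can be shown with Python's `re.escape`, used here as a stand-in for whatever escaping WebCopy performs (the .NET equivalent would be `Regex.Escape`). Without escaping, `.` matches any character and `?` makes the preceding character optional, so a pattern built from a URL may fail to match that very URL:

```python
import re

url = "http://example.com/page?id=1"

# Used raw, "e?" means "optional e", so the literal "?" in the URL is never matched.
unescaped = url
# re.escape turns special characters such as "." and "?" into literals.
escaped = re.escape(url)

print(re.search(unescaped, url))                            # None
print(re.search(escaped, url) is not None)                  # True
print(re.search(escaped, "http://exampleXcom/page?id=1"))   # None
```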
Minimum Requirements
- Windows 10, 8.1, 8, 7, Vista SP2
- Microsoft .NET Framework 4.6
- 20MB of available hard disk space
Donate
This software may be used free of charge, but as with all free software there are costs involved to develop and maintain.
If this site or its services have saved you time, please consider a donation to help with running costs and timely updates.