The end result? A new Use Web Browser option can be found in the Project Properties dialog. When set, WebCopy will do its own downloading and remapping of content, but it will use an embedded Internet Explorer session to do the crawling.
The screenshot above shows a scan of the WebCopy demonstration site. The page uses scripts to build a list of links. As seen above, previous versions of WebCopy are completely oblivious to these extra links.
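To illustrate why a crawler that only reads the raw HTML misses such links, here is a minimal Python sketch. It is not WebCopy's actual code: the page markup, the `LinkCollector` class and the `extract_static_links` helper are all made up for this example. The page contains one ordinary link in its markup and two more that only exist once the script has run in a browser.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags present in the raw markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_static_links(html):
    """Return the links a parser sees without executing any script."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page: one link in the markup, two more added by script.
PAGE = """
<html><body>
<a href="/about.html">About</a>
<ul id="menu"></ul>
<script>
  var pages = ["/products.html", "/contact.html"];
  var menu = document.getElementById("menu");
  pages.forEach(function (url) {
    var item = document.createElement("li");
    var link = document.createElement("a");
    link.href = url;
    link.textContent = url;
    item.appendChild(link);
    menu.appendChild(item);
  });
</script>
</body></html>
"""

# Only the static link is found; the script-generated ones are invisible.
print(extract_static_links(PAGE))  # ['/about.html']
```

Only a real browser engine, which executes the script and then exposes the resulting document, would see `/products.html` and `/contact.html`. That is the gap an embedded browser session is meant to fill.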
Listing the cons
Although I'm pleased to be able to finally offer this functionality, there are a few caveats.
This functionality is very new, and very experimental. It is by no means certain that I have ironed out all the potential issues. Caveat Emptor!
- Crawling may be substantially slower. HTML documents will be downloaded twice, and the headless web browsing will also add significant overhead.
- This functionality currently uses the latest version of Internet Explorer that is installed on your computer. Not all websites play nicely with IE.
- Keeping with the Internet Explorer theme, it will share and use global cookies.
- Some options won't apply - for example the user agent. If a website is particularly unfriendly, it may serve different content to WebCopy than it does to the hosted Internet Explorer session
- It occurs to me as I write this post that I have no idea what will happen if the scripts try to open a popup window. Probably nothing good!
- Potentially more issues. Experimental code!
I don't want to use Internet Explorer, can't I use Chrome or Firefox?
Neither do I. Microsoft have dropped the ball so many times with web browsers I'm amazed they are still in the game. Although I wish they'd just decoupled Edge from the OS and updated it more frequently, rather than giving in to Google and adopting Chromium. But I've probably stated this before and, as usual, I digress.
To get back to the point, I expect future versions of WebCopy will support both Firefox and Chromium. However, as these browsers are several times larger than WebCopy, they won't be included by default. So I also need to have a nice system so that you can easily add extra browser engines to WebCopy from within the application and without needing to install anything.
I'm also considering supporting Edge, as Microsoft appear to be adding support for this to .NET, as long as you're on the latest Windows 10. However, given that it's probably "old" Edge, this may not happen. Supporting two obsolete browsers, one of which is only available to a fraction of users, would be a waste of time I simply don't have.
I'll have more to write about this in future I'm sure!
- 2019-06-29 - First published
- 2020-11-23 - Updated formatting