It's now been more than a month since my slightly glum post about the state of WebCopy (and other product) updates, so I thought I'd post a brief update.
The last month has been spent developing and testing a new system for collating exception details, and work is progressing well. It has been a really fun project to work on, as I took the opportunity to use things I wouldn't normally use, such as a full REST API for managing pretty much everything, proper constructor-based dependency injection (the last time I "used" DI I basically stuck the container on a static and treated it as a service locator... not the best of ideas) and IoC practices, a component-based UI using React.js and vanilla-flux, Azure websites, Azure SQL Database and more. I'll probably write more about this in technical blog posts in the future.
One of the really cool things about the project is that I've been dog-fooding it with live errors submitted through the existing exception handler - when an exception is logged into Cyotek.com using the old API, it now also dumps the raw context into an Azure Service Bus queue. Every so often the prototype API polls this queue, pulls out the messages, converts as much information as possible into the new format and then logs the data into the new system. Because the messages are left on the queue, I can also have all that information repopulated if I wipe the database. This has helped a great deal in getting varied information for testing - and it also highlights problem areas in a way I just couldn't see previously.
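As a rough illustration of that replay idea, here is a minimal Python sketch. It is not the real importer: a plain list stands in for the service bus queue, and the `Importer` class, the `convert` helper and the `ExceptionType` / `Message` field names are all hypothetical.

```python
import json


def convert(raw):
    """Map a raw legacy payload to the new shape (field names are assumed)."""
    old = json.loads(raw)
    return {
        "type": old.get("ExceptionType", "unknown"),
        "message": old.get("Message", ""),
    }


class Importer:
    def __init__(self, queue):
        self.queue = queue    # stand-in for the real queue; never modified here
        self.database = []    # stand-in for the new system's storage
        self.position = 0     # how far into the queue we have read

    def poll(self):
        """Import any messages not yet seen; return how many were imported."""
        pending = self.queue[self.position:]
        self.database.extend(convert(raw) for raw in pending)
        self.position = len(self.queue)
        return len(pending)

    def wipe(self):
        """Clear the database; the next poll replays the whole queue."""
        self.database.clear()
        self.position = 0
```

Because nothing is ever removed from the queue, calling `wipe()` followed by `poll()` rebuilds the database from scratch.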
Ok, you're having fun - but is it going to help?
In terms of managing the exceptions, some of the key improvements are:
Previously all exceptions were logged as isolated units with no concept of repetition. Each unique exception is now a single "event" with multiple "occurrences". A notification is sent for each new event, but (and this is the key bit for me) repeat notifications will also be sent if the exception continues to be raised. Currently, this is every 10 repeats, but I'll probably add some way of configuring this based on frequency or age. This stops you from being overwhelmed by dozens of emails telling you about the same exception (which is what happens now!), but also (and this is why I stopped using Sentry in the end) means that if something Really Bad is happening, you will continue to get periodic emails so you know something is amiss, instead of getting a single notification for a critical issue which you could easily miss.
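The notification rule can be sketched in a few lines of Python. This is only an illustration under assumptions: `Event`, `EventLog` and `REPEAT_INTERVAL` are made-up names, and I've read "every 10 repeats" as "the first occurrence, then every tenth repeat after it".

```python
from dataclasses import dataclass

REPEAT_INTERVAL = 10  # assumption: taken from "every 10 repeats" above


@dataclass
class Event:
    key: str
    occurrences: int = 0


class EventLog:
    def __init__(self):
        self.events = {}

    def record(self, key):
        """Record one occurrence; return True if a notification is due."""
        event = self.events.setdefault(key, Event(key))
        event.occurrences += 1
        repeats = event.occurrences - 1
        # repeats == 0 is a brand new event; 10, 20, ... are periodic reminders
        return repeats % REPEAT_INTERVAL == 0
```

With this rule, 21 occurrences of the same exception produce three notifications (occurrences 1, 11 and 21) rather than 21 emails or just one.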
This is actually another part of the aggregation, but as it was another pain point in the original system I am happy it's resolved. Exception data is provided in localized form, so I frequently get the same exception in English from one user, Chinese from another, and French from a third. Or any other language! Again, with each exception being isolated, it means a lot of time spent manually translating the text. This sort of information is not used to categorize exceptions in the new system (only the underlying type, call stack, inner exceptions and basic product details are used), so now I can happily see (using the dog-fooded data!) exceptions grouped regardless of language - and there's no need to translate as long as at least one English occurrence is present. However, I'm also now including the language code, so at some point I can build in automated translation if required.
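To show why the language stops mattering, here is a hedged Python sketch of a grouping key built only from the fields listed above. The `fingerprint` function, the report dictionary layout and the choice of SHA-256 are assumptions for illustration, not how the real system does it.

```python
import hashlib


def fingerprint(report):
    """Build a grouping key that ignores the localized message and language."""
    parts = [
        report["type"],            # underlying exception type
        *report["stack"],          # call stack frames
        *report["inner_types"],    # types of any inner exceptions
        report["product"],         # basic product details (assumed: name only)
    ]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


english = {
    "type": "System.IO.FileNotFoundException",
    "stack": ["Crawler.Load", "Program.Main"],
    "inner_types": [],
    "product": "WebCopy",
    "language": "en",
    "message": "Could not find file.",
}
# Same failure reported from a French system: only language and message differ
french = dict(english, language="fr", message="Impossible de trouver le fichier.")

same_group = fingerprint(english) == fingerprint(french)
```

Since `message` and `language` never feed into the hash, both reports land in the same group.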
Cyotek.com has no idea if an exception has been resolved, and so will continue to blindly send notification emails even if a bug is long fixed should users not update. Events can now be resolved, and notifications will no longer be sent once this has happened. Of course, if the same exception reoccurs in a newer version of the product, it will automatically be re-raised.
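A minimal Python sketch of the resolve and re-raise behaviour follows. It assumes that "newer" means newer than the version in which the event was marked resolved, and that versions are simple dotted numbers; `Event`, `observe` and `parse_version` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def parse_version(text):
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


@dataclass
class Event:
    resolved_in: Optional[Tuple[int, ...]] = None

    def resolve(self, version):
        self.resolved_in = parse_version(version)

    def observe(self, occurrence_version):
        """Handle a new occurrence; return True if a notification is due."""
        if self.resolved_in is None:
            return True  # still open: notify as usual
        if parse_version(occurrence_version) > self.resolved_in:
            self.resolved_in = None  # seen in a newer version: reopen the event
            return True
        return False  # old, unpatched install: stay quiet
```

So after `resolve("1.2.0")`, an occurrence from 1.1.5 is ignored, while one from 1.3.0 reopens the event and notifies again.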
This probably won't make it into the first round of client updates, but one of the things I liked about vbCodeShield (going back 15 years here!) was that it was able to provide contextual responses for an exception. For example, it could report that a newer version was available, or provide a link to an article which described work-arounds. This is something that would be useful when Cyotek's desktop applications crash too (even if it's just to let you know we're aware of the issue and working on it!) and so will be added to the new system.
I mentioned in the previous post there was no way of viewing the exceptions except from the notification emails. There is now a nice(ish) looking front end for viewing exception groups and all their data, including rudimentary searching and commenting. Plus the all-important Resolve button! It needs more work, but as it will only be viewable by us for the time being, that doesn't matter. I do plan to stick the source on GitHub at some point though.
I don't care about any of this, where are the updates?
Currently all development time has been spent on the new system, but as the logging functionality is now mostly complete, and the front end work doesn't matter quite as much as it is internal only, work is resuming on our core product range - commits with bug fixes have already started.
I have continued to receive numerous support requests for WebCopy in the past month and I'm now starting to consider scrapping it completely - for a freeware product it is taking an unacceptable amount of resource just answering support tickets, let alone developing it, and all of this is impacting too much on other projects. Still, "don't be hasty" would seem to be a good idea at this point and I'd rather avoid this route if possible - so identifying why so many support requests are needed (bad code, bad documentation, etc.) will be a good start.
I'll also be using the dog-fooded information from the new exception system to start looking at the most reported bugs (the most frequently occurring bug is System.Net.WebException: (411) Length Required); getting those fixed in the next update will be the first priority. Updating the documentation to state exactly what WebCopy can (and, more importantly, can't) do would probably be a good next step. I've also added 64-bit builds of WebCopy for the first time, which may help with the memory issues that some users experience (although fixing this properly is also on the list!).