Extracting email addresses from Outlook

The cyotek.com domain receives an awful lot of spam, and a lot of it is sent to email addresses that don't exist. However, as we currently have catch-alls enabled, we receive it regardless. This is compounded by the fact that I tend to create a unique email address for each website or service I interact with. And it's impossible to remember them all!

As a first step to deleting the catch-alls, I wanted to see how many unique @cyotek.com addresses were in use. The simplest way of picking these up would be to scan PST files - we have email going back to 2002 in these files, and there's the odd backup elsewhere going back even further. The last time I used OLE Automation with Outlook was back in the days of VB6, and I well recall being plagued with permission dialogs each time I so much as dreamed of accessing the API. Still, I thought I'd take a look.

A console application merrily extracting my personal email addresses from my Outlook store

Setting up

Note: I tested this project on an Outlook profile which has loaded a primary PST, an archive PST, and a Gmail account. I haven't tested this with any other type of account (for example Exchange) or with accounts using non-SMTP email addresses. Caveat emptor!

The first thing to do is add a reference to the Outlook COM objects. I have VS2010 and VS2012 installed on this machine, and one of them has installed a bunch of prepared Office Interop DLLs into the GAC. Handy, I won't have to create my own! Adding a reference to the Microsoft Outlook 14.0 Object Library added three references to my project: Microsoft.Office.Interop.Outlook.dll, Office.dll and stdole.

Note: Depending on your version of VS / .NET Framework, the references may have a property named Embed Interop Types which defaults to true. When left at this value, you may have problems debugging as you won't be able to access the objects properly through the Immediate window, instead getting an error similar to:

"Member 'To' on embedded interop type 'Microsoft.Office.Interop.Outlook.MailItem' cannot be evaluated while debugging since it is never referenced in the program. Consider casting the source object to type 'dynamic' first or building with the 'Embed Interop Types' property set to false when debugging"

Probably a good idea to set this to false before debugging your code!

Connecting to Outlook

All the code below assumes that you have a using Microsoft.Office.Interop.Outlook; statement at the top of your code file.

Connecting to Outlook is easy enough: just create a new instance of the Application interface. We'll use it as the root for everything else.

Application application;

application = new Application();

Remember I mentioned permission dialogs? Older versions of Outlook used to prompt for permissions. Outlook 2010 just seems to quietly get on with things. The only thing I've noticed is that if you try to create a new Application when Outlook isn't currently running, it will be silently started, and the system tray will show a slightly different icon with a tooltip informing you that some other program is using Outlook. Much nicer than previous behaviours!

Getting Account Folders

The Session property of the Application interface returns a NameSpace that details your Outlook setup, and allows access to accounts, profile details etc. However, for this project, the only thing I care about is the Folders property which returns a collection of MAPIFolder objects. In my case, it was the three top level folders for my profile - I was somewhat surprised that the Gmail account was loaded actually.

Now that we have a folder, we can scan it by enumerating the Items property. As Outlook folders can contain items of various types, you need to check the item type - I'm looking for MailItem objects in order to extract those addresses.
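As a rough sketch of those two steps, and assuming Outlook is installed and the interop assembly is referenced as described earlier, the top-level folders and their immediate children could be listed like this. The FolderLister class is a hypothetical name used for illustration only and is not part of the example project.

```csharp
using System;
using Microsoft.Office.Interop.Outlook;

// Hypothetical illustration; not taken from the example project.
// Requires a machine with Outlook installed and a configured profile.
internal static class FolderLister
{
  private static void Main()
  {
    Application application;

    application = new Application();

    // Session is the NameSpace for the current profile. Its Folders
    // collection holds one MAPIFolder per top-level store.
    foreach (MAPIFolder rootFolder in application.Session.Folders)
    {
      Console.WriteLine(rootFolder.Name);

      // Each MAPIFolder exposes its own Folders collection for children.
      // Items.Count includes every item type, not only MailItem objects.
      foreach (MAPIFolder childFolder in rootFolder.Folders)
        Console.WriteLine("  {0} ({1} items)", childFolder.Name, childFolder.Items.Count);
    }
  }
}
```

This only goes one level deep; walking the whole tree needs a recursive call for each child folder.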

Pulling out email addresses

Each MailItem has Sender, To and Recipients properties. To seems to be just a string version of Recipients and so shall be completely ignored - why bother parsing it manually when Recipients already does it for you? The Sender property returns an AddressEntry, and each item in the Recipients collection (a Recipient) offers an AddressEntry property. So we're all set!

The following code snippet is from the example project, and basically shows how I scan a source MAPIFolder looking for MailItem objects.

protected virtual void ScanFolder(MAPIFolder folder)
{
  this.CurrentFolderIndex++;
  this.OnFolderScanning(new MAPIFolderEventArgs(folder, this.FolderCount, this.CurrentFolderIndex));

  // items
  foreach (object item in folder.Items)
  {
    if (item is MailItem)
    {
      MailItem email;

      email = (MailItem)item;

      // add the sender of the email
      if (this.Options.HasFlag(Options.Sender))
        this.ProcessAddress(email.Sender);

      // add the recipients of the email
      if (this.Options.HasFlag(Options.Recipient))
      {
        foreach (Recipient recipient in email.Recipients)
          this.ProcessAddress(recipient.AddressEntry);
      }
    }
  }

  // sub folders
  if (this.Options.HasFlag(Options.SubFolders))
  {
    foreach (MAPIFolder childFolder in folder.Folders)
      this.ScanFolder(childFolder);
  }
}

When I find an AddressEntry to process, I call the following functions:

protected virtual void ProcessAddress(AddressEntry addressEntry)
{
  if (addressEntry != null && (addressEntry.AddressEntryUserType == OlAddressEntryUserType.olSmtpAddressEntry || addressEntry.AddressEntryUserType == OlAddressEntryUserType.olOutlookContactAddressEntry))
    this.ProcessAddress(addressEntry.Address);
  else if (addressEntry != null)
    Debug.Print("Unknown address type: {0} ({1})", addressEntry.AddressEntryUserType, addressEntry.Address);
}

protected virtual void ProcessAddress(string emailAddress)
{
  int domainStartPosition;

  // check for null or empty before calling IndexOf, otherwise a null address throws
  domainStartPosition = string.IsNullOrEmpty(emailAddress) ? -1 : emailAddress.IndexOf('@');

  if (domainStartPosition != -1)
  {
    bool canAdd;

    if (this.Options.HasFlag(Options.FilterByDomain))
      canAdd = this.IncludedDomains.Contains(emailAddress.Substring(domainStartPosition + 1));
    else
      canAdd = true;

    if (canAdd)
      this.EmailAddresses.Add(emailAddress);
  }
}

Although I'm scanning my entire PST, I don't want every single email address in there - I ran it once and it brought back just over 5000 addresses. What I want is the addresses tied to the domains I own, so I added some filtering for this. With this filtering enabled it returned a more manageable 497 unique addresses. Although I'm not creating 497 aliases on the email server!
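The snippets above don't show what kind of collections IncludedDomains and EmailAddresses are. One possible approach, sketched here as an assumption rather than as what the example project actually does, is to use case-insensitive hash sets so that differently cased copies of the same address are only counted once. The AddressCollector class and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not taken from the example project.
public sealed class AddressCollector
{
  // Case-insensitive sets, so "Shop@cyotek.com" and "shop@CYOTEK.com"
  // are treated as the same address and the same domain.
  private readonly HashSet<string> _includedDomains;
  private readonly HashSet<string> _emailAddresses;

  public AddressCollector(IEnumerable<string> includedDomains)
  {
    _includedDomains = new HashSet<string>(includedDomains, StringComparer.OrdinalIgnoreCase);
    _emailAddresses = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
  }

  // Number of unique addresses collected so far.
  public int Count
  {
    get { return _emailAddresses.Count; }
  }

  // Returns true only when the address belongs to an included domain
  // and has not been seen before.
  public bool Add(string emailAddress)
  {
    int domainStartPosition;
    string domain;

    if (string.IsNullOrEmpty(emailAddress))
      return false;

    domainStartPosition = emailAddress.IndexOf('@');

    if (domainStartPosition == -1)
      return false;

    domain = emailAddress.Substring(domainStartPosition + 1);

    return _includedDomains.Contains(domain) && _emailAddresses.Add(emailAddress);
  }
}
```

With a collector created for the single domain cyotek.com, adding Shop@cyotek.com succeeds, adding shop@CYOTEK.com afterwards is rejected as a duplicate, and adding someone@example.com is rejected because the domain isn't in the included list.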

Wrapping up

This was a lot easier than I was expecting, and in fact it's probably the smoothest piece of COM interop I've done with .NET yet. No strange errors, no being forced to compile in 32-bit mode, It Just Works.

You can find the example project in the link below.

Update History

  • 2012-09-26 - First published
  • 2020-11-21 - Updated formatting

Downloads

OutlookEmailAddressExtract.zip
  • Description: Outlook Email Address Extractor, a sample C# application that scans an Outlook profile and pulls out email addresses
  • Version: 1.0.0.0
  • Release date: 26/09/2012
  • md5: 8ff6f5d3e6a03b2efeb65db5b2735ea5

About The Author

The founder of Cyotek, Richard enjoys creating new blog content for the site. Much more though, he likes to develop programs, and can often be found writing reams of code. A long-term gamer, he has aspirations of one day creating an epic video game. Until that time, he is mostly content with adding new bugs to WebCopy and the other Cyotek products.
