FlickrMetadataSynchr

Yesterday I worked on a new version of my FlickrMetadataSynchr tool and published the 1.3.0.0 version on CodePlex. I wasn’t really planning on creating a new version, but I was annoyed by the old version in a new usage scenario. When you have an itch you have to scratch it! And it is always good to catch up on some programming if recent assignments at work don’t include any coding. So what caused this itch?

About two weeks ago I got back from my holiday in China with about 1,500 pictures on my 16 GB memory card. I always make a first selection immediately after taking a picture, so initially there were lots more. After selection and stitching panoramic photos, I managed to get this down to about 1,200 pictures. Still a lot of pictures. But storage is cheap, tagging makes search easy, so why throw away any more? One of the perks of a Pro account on Flickr is that I have unlimited storage, so I uploaded 1,173 pictures (5.54 GB). This took over 12 hours because Flickr has limited uploading bandwidth.

Adding metadata doesn’t stop at tagging pictures. You can add a title, description and geolocation to a picture. Sometimes this is easier to do on your local pictures, and sometimes I prefer to do it on Flickr. The FlickrMetadataSynchr tool that I wrote is a solution to keeping this metadata in sync. You should always try to stay in control of your data, so I keep backups of my e-mail stored in the “cloud” and I store all metadata in the original picture files on my hard drive. Of course I backup those files too. Even offsite by storing an external hard drive outside my house.

Back to the problem. Syncing the metadata for 1,173 pictures took an annoyingly long time. The Flickr API has some batch operations, but for my tool I have to fetch metadata and update metadata for pictures one-by-one. So each fetch and each update uses one HTTP call. Each operation is not unreasonably slow, but when you add latency to the mix it adds up to slow performance if you do it sequentially.

Imperative programming languages like C# promote a sequential way of doing things. It is really hard to exploit multiple processor cores by splitting up work so that it can run in parallel. You run into things like data concurrency for shared memory, coordinating results and exceptions, making operations cancellable, etc. Even with a single processor core, my app would benefit from exploiting parallelism because the processor spends most of its time waiting on the result of the HTTP call. This time can be utilized by creating additional calls or processing results of other calls. Microsoft has realized that this is hard work for a programmer and great new additions are coming in .NET Framework 4.0 and Visual Studio 2010. Things like the Task Parallel Library and making debugging parallel applications easier.

However, these improvements are still in the beta stage and not usable yet for production software like my tool. I am not the only user of my application and “xcopy deployability” remains a very important goal to me. For example, the tool does not use .NET 3.5 features and only depends on .NET 3.0. This is because Windows Vista comes with .NET 3.0 out of the box and .NET 3.5 requires an additional hefty install. I might make the transition to .NET 3.5 SP1 soon, because it is now pushed out to all users of .NET 2.0 and higher through Windows Update.

So I added parallelism the old-fashioned way, by manually spinning up threads, locking shared data structures appropriately, propagating exception information through callbacks, making asynchronous processes cancellable, waiting on all worker threads to finish using WaitHandles, etc. I don’t use the standard .NET threadpool for queuing work because it is tuned for CPU bound operations. I want to have fine grained control over the number of HTTP connections that I open to Flickr. A reasonable number is a maximum of 10 concurrent connections. This gives me almost ten times the original speed for the Flickr fetch and update steps in the sync process. Going any higher puts me at risk of being seen as launching a denial-of-service attack against the Flickr web services.
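To make the approach concrete, here is a minimal sketch of that kind of hand-rolled worker pool. It is not the actual code from the tool: the BoundedWorkers class, the ProcessAll method and its parameters are names made up for this illustration, processItem stands in for a single Flickr HTTP call, and cancellation is left out to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper, not part of FlickrMetadataSynchr.
public static class BoundedWorkers
{
    // Runs processItem for every input on at most maxWorkers dedicated threads
    // and returns the results in input order.
    public static IList<TOut> ProcessAll<TIn, TOut>(
        IList<TIn> items, Func<TIn, TOut> processItem, int maxWorkers)
    {
        if (maxWorkers < 1) throw new ArgumentOutOfRangeException("maxWorkers");

        TOut[] results = new TOut[items.Count];
        int nextIndex = 0;              // shared cursor, guarded by sync
        Exception firstError = null;    // first failure, guarded by sync
        object sync = new object();

        int workerCount = Math.Min(maxWorkers, items.Count);
        ManualResetEvent[] done = new ManualResetEvent[workerCount];

        for (int w = 0; w < workerCount; w++)
        {
            ManualResetEvent finished = new ManualResetEvent(false);
            done[w] = finished;
            Thread worker = new Thread(delegate()
            {
                try
                {
                    while (true)
                    {
                        int index;
                        lock (sync)
                        {
                            // Stop when all work is handed out or another worker failed.
                            if (firstError != null || nextIndex >= items.Count) return;
                            index = nextIndex++;
                        }
                        // The slow call runs outside the lock so workers overlap.
                        results[index] = processItem(items[index]);
                    }
                }
                catch (Exception ex)
                {
                    lock (sync) { if (firstError == null) firstError = ex; }
                }
                finally
                {
                    finished.Set();     // signal that this worker is done
                }
            });
            worker.IsBackground = true;
            worker.Start();
        }

        // Wait for every worker to signal its handle.
        foreach (ManualResetEvent handle in done)
        {
            handle.WaitOne();
            handle.Close();
        }

        if (firstError != null)
            throw new InvalidOperationException("A worker failed.", firstError);
        return results;
    }
}
```

Calling BoundedWorkers.ProcessAll(photos, fetchOne, 10) would then keep at most ten calls in flight. Each worker's handle is waited on separately with WaitOne, because WaitHandle.WaitAll is limited to 64 handles.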

If you want to take a look at my source code, you can find it at CodePlex. The app was already nicely factored, so I didn’t have to rearchitect it to add parallelism. The sync process was already done on a background thread (albeit sequentially) in a helper class, because you should never block the UI thread in WinForms or WPF applications. The app already contained quite a bit of thread synchronization stuff. The new machinery is contained in the abstract generic class AsyncFlickrWorker<TIn, TOut>. Its signature is:

/// <summary>
/// Abstract class that implements the machinery to asynchronously process metadata on Flickr. This can either be fetching metadata
/// or updating metadata.
/// </summary>
/// <typeparam name="TIn">The type of metadata that is processed.</typeparam>
/// <typeparam name="TOut">The type of metadata that is the result of the processing.</typeparam>
internal abstract class AsyncFlickrWorker<TIn, TOut>

It has the following public method

/// <summary>
/// Starts the async process. This method should not be called when the asynchronous process is already in progress.
/// </summary>
/// <param name="metadataList">The list with <typeparamref name="TIn"/> instances of metadata that should
/// be processed on Flickr.</param>
/// <param name="resultCallback">A callback that receives the result. Is not allowed to be null.</param>
/// <typeparam name="TIn">The type of metadata that is processed.</typeparam>
/// <typeparam name="TOut">The type of metadata that is the result of the processing.</typeparam>
/// <returns>Returns a <see cref="WaitHandle"/> that can be used for synchronization purposes. It will be signaled when
/// the async process is done.</returns>
public WaitHandle BeginWork(IList<TIn> metadataList, EventHandler<AsyncFlickrWorkerEventArgs<TOut>> resultCallback)

It uses the generic class AsyncFlickrWorkerEventArgs<TOut> to report the results:

/// <summary>
/// Class with event arguments for reporting the results of asynchronously processing metadata on Flickr.
/// </summary>
/// <typeparam name="TOut">The "out" metadata type that is the result of the asynchronous processing.</typeparam>
public class AsyncFlickrWorkerEventArgs<TOut> : EventArgs

The subclass AsyncPhotoInfoFetcher is one of its implementations.

/// <summary>
/// Class that asynchronously fetches photo information from Flickr.
/// </summary>
internal sealed class AsyncPhotoInfoFetcher: AsyncFlickrWorker<Photo, PhotoInfo>

These async workers are used by the FlickrHelper class (BTW: this class has grown a bit too big, so it is a likely candidate for future refactoring). Its method that calls async workers is generic and has this signature:

/// <summary>
/// Processes a list of photos with multiple async workers and returns the result.
/// </summary>
/// <param name="metadataInList">The list with metadata of photos that should be processed.</param>
/// <param name="progressCallback">A callback to receive progress information.</param>
/// <param name="workerFactoryMethod">A factory method that can be used to create a worker instance.</param>
/// <typeparam name="TIn">The "in" metadata type for the worker.</typeparam>
/// <typeparam name="TOut">The "out" metadata type for the worker.</typeparam>
/// <returns>A list with the metadata result of processing <paramref name="metadataInList"/>.</returns>
private IList<TOut> ProcessMetadataWithMultipleWorkers<TIn, TOut>(
    IList<TIn> metadataInList,
    EventHandler<PictureProgressEventArgs> progressCallback,
    CreateAsyncFlickrWorker<TIn, TOut> workerFactoryMethod)

This method contains an anonymous delegate that acts as the result callback for the async workers. Generics and anonymous delegates make multithreaded life bearable in C# 2.0. Anonymous delegates allow you to use local variables and fields of the containing method and class in the callback method and thus easily access and change those to store the result of the worker thread. Of course, make sure you lock access to shared data appropriately because multiple threads might call back simultaneously to report their results.
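The pattern looks roughly like the sketch below. This is an illustration only, not the real ProcessMetadataWithMultipleWorkers method: ResultEventArgs<TOut>, CallbackDemo and CollectTitles are hypothetical names, and the "work" is just building a string.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the tool's event args; only the shape matters here.
public class ResultEventArgs<TOut> : EventArgs
{
    private readonly TOut result;
    public ResultEventArgs(TOut result) { this.result = result; }
    public TOut Result { get { return result; } }
}

public static class CallbackDemo
{
    public static IList<string> CollectTitles(IList<int> photoIds)
    {
        List<string> titles = new List<string>();   // local captured by the delegate
        object sync = new object();

        // The anonymous delegate closes over 'titles' and 'sync'.
        EventHandler<ResultEventArgs<string>> resultCallback =
            delegate(object sender, ResultEventArgs<string> e)
            {
                // Several threads may call back at once, so guard the shared list.
                lock (sync) { titles.Add(e.Result); }
            };

        List<Thread> workers = new List<Thread>();
        foreach (int id in photoIds)
        {
            int capturedId = id;   // copy so each thread sees its own value
            Thread worker = new Thread(delegate()
            {
                resultCallback(null, new ResultEventArgs<string>("Photo " + capturedId));
            });
            workers.Add(worker);
            worker.Start();
        }

        foreach (Thread started in workers) started.Join();
        return titles;   // completion order, not input order
    }
}
```

Without the lock, concurrent calls to List<T>.Add could corrupt the list, because List<T> is not safe for concurrent writes.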

And somewhere in 2010 when .NET 4.0 is released, I could potentially remove all this manual threading stuff and just exploit Parallel.For 😉
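A sketch of what that might look like, using Parallel.For together with ParallelOptions.MaxDegreeOfParallelism to cap the number of simultaneous calls. The ParallelSketch class and ProcessAll method are hypothetical names for illustration, not code from the tool.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper, not part of FlickrMetadataSynchr.
public static class ParallelSketch
{
    // Processes every item with at most maxConcurrency iterations running at once
    // and returns the results in input order.
    public static TOut[] ProcessAll<TIn, TOut>(
        IList<TIn> items, Func<TIn, TOut> processItem, int maxConcurrency)
    {
        TOut[] results = new TOut[items.Count];

        ParallelOptions options = new ParallelOptions();
        options.MaxDegreeOfParallelism = maxConcurrency;

        // Each iteration writes to its own slot, so no lock is needed here.
        Parallel.For(0, items.Count, options, i =>
        {
            results[i] = processItem(items[i]);
        });

        return results;
    }
}
```

Exceptions thrown inside the loop body surface to the caller wrapped in an AggregateException, which would replace the manual exception propagation through callbacks.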

A couple of months ago I received a license for NDepend to evaluate its usefulness. I was already convinced that NDepend is a very useful tool. But up to now, I hadn’t put NDepend to good use in a way that I could blog about it.

Today I decided to bite the bullet and put my own pet project FlickrMetadataSynchr up for analysis. Its source code is available on CodePlex.

NDepend analyses managed code for several quality aspects, like cyclomatic complexity, coupling and unused code. In a way it resembles FxCop, but it also does a lot more in terms of reporting. NDepend also is a lot more flexible in letting you query your code base. For this it uses its own SQL variant called Code Query Language (CQL). For example, you could enter this query into the tool

SELECT METHODS WHERE NbLinesOfCode > 30 AND IsPublic

and NDepend will show you all public methods whose number of lines of code exceeds 30.

Just by using the standard settings, NDepend gives you truckloads of information that point to areas with potential code smell. The report has inline comments that explain why it selects stuff and points out possible false positives for which it is okay to ignore the warning.

You can find my NDepend results here if you want to see what such a report looks like.

Working through those results from top to bottom, I started refactoring my code to improve the quality. For example, splitting up methods to:

  • Reduce cyclomatic complexity
  • Reduce the number of IL instructions in a method
  • Reduce the number of local variables in a method
  • Increase the comment to code ratio

This should increase maintainability of the code.

Go check out this tool if you are interested in improving the quality of your .NET code or if you are tasked with reviewing somebody else’s code.


After a long day and night of coding, I released version 0.8.0.0 of my Flickr Metadata Synchr tool on CodePlex this morning. I finally solved the long-standing problem I was having with the Windows Imaging Component (WIC) to update metadata. So this is the first fully-functional release of my application.

Functionality of FlickrMetadataSynchr v0.8.0.0

This is what the app does:

  • It allows you to select a set of your photos on Flickr and a folder on your hard drive with images.
  • It reads the metadata for both Flickr images and the local images. The metadata that is read is:
    • Title
    • Description
    • Author
    • Tags
    • Geo-info (GPS coordinates)
    • Date and time taken
    • Last update date and time
  • It matches images on Flickr with local images based on the date and time taken.
  • It determines on a per picture basis in what direction the metadata should be synced, i.e., which side should be updated, if any. Currently the most recently updated side wins. I am getting help from Timo Proescholdt for a better algorithm that will allow for a merge of metadata, i.e., a two-way synch.
  • It updates the metadata on Flickr and in the local images.
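The matching and direction steps can be sketched as follows. This is a simplified illustration and not the tool's actual code: the PictureMetadata, SyncDirection and SyncPlanner types are hypothetical, and matching on exactly equal "date taken" values is an assumption made for brevity.

```csharp
using System;
using System.Collections.Generic;

public enum SyncDirection { None, UpdateFlickr, UpdateLocal }

// Hypothetical, trimmed-down metadata record.
public class PictureMetadata
{
    public string Title;
    public DateTime DateTaken;
    public DateTime LastUpdated;
}

public static class SyncPlanner
{
    // "Most recently updated side wins": the older side gets overwritten.
    public static SyncDirection Decide(PictureMetadata flickr, PictureMetadata local)
    {
        int order = flickr.LastUpdated.CompareTo(local.LastUpdated);
        if (order > 0) return SyncDirection.UpdateLocal;   // Flickr is newer
        if (order < 0) return SyncDirection.UpdateFlickr;  // local file is newer
        return SyncDirection.None;                         // same timestamp, nothing to do
    }

    // Pairs Flickr pictures with local pictures that share the same date taken.
    public static List<KeyValuePair<PictureMetadata, PictureMetadata>> Match(
        IEnumerable<PictureMetadata> flickrPictures,
        IEnumerable<PictureMetadata> localPictures)
    {
        Dictionary<DateTime, PictureMetadata> localByDateTaken =
            new Dictionary<DateTime, PictureMetadata>();
        foreach (PictureMetadata local in localPictures)
        {
            // If two local files share a timestamp, the last one is kept.
            localByDateTaken[local.DateTaken] = local;
        }

        List<KeyValuePair<PictureMetadata, PictureMetadata>> pairs =
            new List<KeyValuePair<PictureMetadata, PictureMetadata>>();
        foreach (PictureMetadata flickr in flickrPictures)
        {
            PictureMetadata localMatch;
            if (localByDateTaken.TryGetValue(flickr.DateTaken, out localMatch))
            {
                pairs.Add(new KeyValuePair<PictureMetadata, PictureMetadata>(flickr, localMatch));
            }
        }
        return pairs;
    }
}
```

Pictures without a counterpart on the other side are simply left out of the result.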

This is a screenshot of the app:

 


Workaround for WIC problems

During my holiday in France in July I received e-mail from Robert A. Wlodarczyk who works at Microsoft. He pinged me to say that he had released new sample code to update metadata using WIC. Yesterday, I tried to incorporate similar code into my application and ran into the same type of problems as before.

Because his sample was working, I wasn't ready to give up again. I finally tracked the problem down to a threading issue. WIC is throwing strange InvalidOperationException and InvalidFormatException exceptions with messages like "Cannot write to the stream" when it is called from a background thread. My app is multi-threaded so that the UI doesn't hang when it is busy syncing.

After I got confirmation from Robert that WIC indeed suffers from a threading issue, I solved the problem with a workaround. I now marshal the call to the code that uses WIC to update metadata to the UI thread using the WPF Dispatcher object. This causes the app to become non-responsive for small amounts of time during the update of local metadata. But that is better than a non-fully-functional app.

Fully functional, give it a try

So all is well that ends well. After finally getting WIC to work, I could do away with the C++ code that was causing me headaches 😉 And my app now works on Windows XP again. You just need to have the .NET Framework 3.0 installed.

If you have images on Flickr and you have been busy tagging them, give my app a spin! You can always find the latest release on CodePlex. The source code is also available under a GPL license on CodePlex.

Installing is easy. You just need to unzip the ZIP-file, which contains three files, to a folder. Start the FlickrMetadataSynchr.exe file and you are done. The app remembers the last settings.

If you find any issues, please report them using the Issue Tracker for my app on CodePlex.

The Future

Even though the app is now able to sync metadata in both Flickr images and local images, there is always room for improvement. Here are my ideas, some of which are based on suggestions by people on CodePlex:

  • Improve the synchronization to also allow two-way synchronization for a picture pair. I.e., one side doesn't have to win. For example, if the Flickr image has just the title set and the local image the description, the metadata should be merged.
  • Add UI to see the match that is made by the tool and how it proposes to sync the metadata.
  • Allow you to exclude images if the match isn't good.
  • Allow you to overrule the sync proposal and sync the metadata in a different direction (on a per property basis).
  • Add UI to store multiple mappings between Flickr sets and local folders. Currently the app only remembers the last folder and Flickr set that was used.
  • Add click-once deployment. That way the app can automatically check for new versions and update itself.

If you have any other ideas please post them at the discussions page for my app on CodePlex.


Today, I wanted to continue working on my FlickrMetadataSynchr tool after a break of a month or so. This project uses SaaS in the form of a hosted Team Foundation Server by Microsoft for source control and work item tracking. This SaaS is called CodePlex.

Team Foundation Server is known to be a very robust source control system that is based on SQL Server 2005. You can cluster the database tier, you can have hot standby for the application tier, etc.

Yet, Microsoft managed to corrupt the source control database without having a proper backup scheme in place. I.e., they thought they were making backups of the database, yet they weren't.

That will teach me not to trust a third party with my precious data. So based on my current experiences I don't trust Software as a Service (SaaS).

Even worse. Three weeks after the fact, Microsoft still cannot tell if the source control data will ever be restored. At some point you just have to admit you screwed up and say that nothing can be done about it anymore.

Luckily, I still have the latest version of my sources stored locally. But it is the nature of an integrated source control and work item tracking system that you can't keep a full local backup of the state of the system. If Microsoft (or another vendor) screws up you lose a lot of historic data.

Another SaaS that I have come to depend on quite heavily is Gmail. Considering the perpetual beta status of Google Mail, I have never fully trusted them to keep my data safe from disaster. I am very diligent in backing up my mail locally in Outlook PST files using the POP3 access that Gmail provides.

Do you trust SaaS?


 


I am still looking for a good solution for my digital workflow for pictures on Windows Vista. I want to be able to easily tag my pictures locally on my hard-drive and on Flickr and keep the metadata in sync.


Let me describe the state of affairs.


Metadata in images


By metadata I mean the info I add manually to a picture, like the title, author, description, tags and GPS info. A digital camera already adds lots of other data to your pictures like aperture, camera model, if the flash fired or not, etc. This info is usually stored in an EXIF section in the JPEG file. For other metadata there are several options: EXIF, IPTC or XMP. IPTC is an older standard. XMP is more flexible (e.g., it allows Unicode) and modern.


The Golden Rule


This page states the golden rule of metadata:



Store the metadata in your images


Adding metadata to pictures


While in Redmond two weeks ago, I uploaded a subset of the pictures that I took to Flickr. I added the metadata through the site. Flickr does not embed this metadata in the file. If you download the original image file, the title, description and tags are gone. So I basically violated the Golden Rule.


A better option is to add the metadata before you upload your images to Flickr. Locally I can embed the metadata as IPTC or XMP in the JPEGs using a variety of tools. This way your metadata flows with your image to Flickr. If you download it again, the metadata is still there.


The easiest option to add metadata to your images is to use the Windows Photo Gallery built into Windows Vista. This stores metadata in an XMP section in the image file.


Unfortunately, Flickr only imports metadata from EXIF and IPTC and not from the XMP section. So all the images that you tag in Vista show up untagged on Flickr.


Vista Flickr Uploader


My colleague Matthijs (who still won't reboot his blog, not even to market his own app 😉) has written a Vista Flickr Uploader tool to solve this problem.


His Vista Flickr Uploader app is written in C# and uses .NET 3.0, most notably Windows Presentation Foundation and the Windows Imaging Component. He open sourced it through CodePlex. This tool solves one direction of the problem. When uploading your pictures, it extracts XMP info from your pictures and adds it to Flickr as metadata.


Microsoft Photo Info


As an alternative to tagging through Windows Vista Photo Gallery, you might want to try out the Microsoft Photo Info tool. This is a free download from Microsoft that works on both XP and Vista. It integrates into the Windows Explorer. The great thing about this tool is that it can read and write both IPTC and XMP info. So your metadata is recognized by Flickr when you upload your images.


Embedding metadata from Flickr


So Vista Flickr Uploader and Microsoft Photo Info each solve one direction of the problem: getting metadata from your images onto Flickr. The other direction still remains: getting metadata from Flickr into your images. I found one tool (.NET 2.0 based) that is able to download images from Flickr and embed the Flickr metadata as IPTC. It was quite unstable and wasn't really able to find my images so I won't link to it.


Remaining problem


Another problem that this download tool caused was inconsistency between the IPTC and XMP info in the image files. Imagine the following scenario:



  • Tag your images with XMP info in Vista.

  • Upload them with the Vista Flickr Uploader.

  • Flickr now shows the metadata from the XMP.

  • Change the title, description or tags on the Flickr site.

  • Download the image from Flickr with a tool that embeds the Flickr metadata into IPTC.

You end up with an image file with both IPTC and XMP info. The XMP info is out-of-date, yet it takes precedence in Windows Vista ;(


So I am looking for a tool that downloads pictures from the Flickr site and embeds the Flickr metadata as both IPTC and XMP info or just as XMP. I wasn't able to find such a tool, so I will probably have to write my own app to do this.


[Update 2007-08-31: I have now written such a tool: FlickrMetadataSynchr.]


ExifTool


A possible alternative might be to use a really powerful metadata tool and library written in Perl that allows you to extract and embed metadata in almost all known formats. Phil Harvey's ExifTool can even be used to read ID3 info from MP3 files. I could use it to wipe XMP info from images so the IPTC info is used again by Vista.


The problem with this command line tool is that it works really low-level with the potential of damaging the metadata beyond repair. I.e., other programs refuse to load your images because they can't make sense of the metadata.