When I say “Thousands of Images” it is not hyperbole. My faithful reader will remember that around the first of the year I started going through all of my images since 2000. This undertaking has taken much longer than I expected. I have just reached the point of having enough duplicates removed that I can start reviewing the images. Over the next few days and weeks I am going to go through 16,427 images, while still adding more from recent shooting outings. The review will decide whether each image is worth keeping: is it in focus, is it exposed correctly, are people looking at the camera, and so on.

I am so paranoid about losing images and data that I back up quite often. However, my backup system was flawed: I had too many backups and did not know which one was current. That made it very difficult to tell which images I should be reviewing and to make sure that I am not losing any. Now I am using ChronoSync to address this issue, but that is another blog post.

I tried to use Lightroom to help with this process and it did help quite a bit, but I still had to go through many images on my own due to an improper choice when reimporting after updating to Snow Leopard. It was operator error. When I started on January 1, 2010, the statistics of the image catalog were:

  • 41,880 images, 250.34 GB, backed up on 46 DVDs
    • 27,740 JPG Files
    • 2,248 CR2 Files
    • 718 TIF Files (Scans)
    • 1,197 CRW Files
    • 9,964 DNG Files
    • 13 PSD Files
  • A date range from 2000 to 2009
  • Not all of these images were taken by me, so I want to pull them out of the catalog but still keep the files
  • Duplication was an issue
  • Duplication was an issue
  • My photography skills have improved over the past 10 years, so some of these images are just poor
  • Duplication was an issue
  • Some images are memories and it doesn’t matter the quality
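For the duplication problem, one possible first pass is to group files by a hash of their contents. This is only a sketch, not what Lightroom does: the function names `file_digest` and `find_exact_duplicates` are made up for illustration, and it only catches byte-for-byte copies. A resized or re-exported version of the same photo has different bytes and would not be matched.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Group files under root by content hash.

    Returns a list of groups, each a list of paths whose bytes are identical.
    Files with no identical twin are left out.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_exact_duplicates("/path/to/catalog")` (a placeholder path) would return the groups to review by hand; nothing is deleted automatically.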

Many of these images were available online through my SmugMug account. It has unlimited storage, so I was using it. If you don’t have to delete an image, why would you? However, the downside was that I had no organization of the galleries, categories, albums, and to some degree keywords. I also was not doing a good job of synchronizing the two collections. So I decided that while going through this exercise I would do the same thing for my SmugMug collections.

The question I had to decide was which collection would be the leader and which the follower. There really was no definitive answer to that one. Within SmugMug I had many images in JPG format at 80%, and some of them had been resized to meet the maximum sizes. Within the Lightroom catalog some images were RAW images with their original names straight out of the camera; some had been renamed, some had been resized, and some had been edited. So it was a variety of conditions. There was also the issue of which processing each image had been run through, and therefore how much EXIF information it still contained.

What I figured I would do is rename all the images again. Yes, ALL the images. The idea was to have Lightroom set the name to the capture time, formatted as “YYYYMMDDHHMMSS”, and I figured that would be pretty good. After renaming them all, I found that I had lots of duplicates, which I knew I would, but I had no easy way to identify them. I tried a new structure, “YYYYMMDDHHMMSS – W x H”, so I could quickly see which ones had been resized for various reasons. However, not all of these images still contained the capture time. For those I would check SmugMug to see if I could figure out the date and time from the EXIF information available there.
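The second naming pattern can be sketched in a few lines of Python. This is an illustration only: the renaming itself was done inside Lightroom, and the function name `capture_name` and its arguments are assumptions. The capture time and pixel dimensions are taken as already known, since reading them from EXIF is a separate problem.

```python
from datetime import datetime

# En dash with surrounding spaces, matching the "YYYYMMDDHHMMSS – W x H" pattern.
SEPARATOR = " \u2013 "


def capture_name(captured, width, height, extension):
    """Build a file name from the capture time and pixel dimensions.

    Returns None when there is no capture time, so those files can be
    set aside for manual review instead of getting a made-up name.
    """
    if captured is None:
        return None
    stamp = captured.strftime("%Y%m%d%H%M%S")
    ext = extension.lstrip(".")
    return f"{stamp}{SEPARATOR}{width} x {height}.{ext}"
```

Two files shot at the same second but exported at different sizes then sort next to each other, with the size visible in the name.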

I have not yet started to integrate the two collections. The main reason is that I am trying to keep the process moving in small steps, as it can very easily become overwhelming just based on the amount of data I have. There is also the matter of how the files are organized. Within Lightroom they are stored as follows:

  • Year (YYYY)
    • Month (MM)
      • Day (DD)
        • File Name (YYMMDD-hhmmss.ext)
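As a rough illustration of that layout, the destination of an image can be derived from its capture time alone. The function name `catalog_path` is hypothetical and the path is built relative to whatever root folder the catalog lives in; this is a sketch of the folder scheme, not Lightroom's own code.

```python
from datetime import datetime
from pathlib import PurePosixPath


def catalog_path(captured, extension):
    """Return the relative path Year/Month/Day/YYMMDD-hhmmss.ext for an image."""
    folder = PurePosixPath(
        captured.strftime("%Y"),  # four-digit year
        captured.strftime("%m"),  # two-digit month
        captured.strftime("%d"),  # two-digit day
    )
    filename = captured.strftime("%y%m%d-%H%M%S") + "." + extension.lstrip(".")
    return folder / filename
```

For example, an image captured on July 4, 2009 at 14:30:05 and saved as DNG would land at `2009/07/04/090704-143005.dng`.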

Within SmugMug it is not nearly as organized, really. The formatting is as follows:

  • Category
    • Subcategory (Optional so this might not be here)
      • Album

There are 206 albums; some have one image, and some are duplicates for different uses (such as Blog Links and Twitter Links). So it is not as simple as combining the two together. Now I could do the Lightroom trick again, but I am hoping I don’t need to, as I should have the originals and SmugMug should just hold copies. I know better than that, but I can dream, can’t I?

Of course, into each project something unforeseen must happen. Mine happened today, and it is just a huge time waster more than anything. My local mirror of my SmugMug account decided to fail; actually, the hard drive that contained it failed. It was “only” a 60GB USB drive that was 3 years old. The most amazing thing is that for the same price I now have a 500GB FW800/FW400/USB2 drive that is much faster and more reliable. I also back that up once a week to the desktop 500GB drive.

As I write this I have two Windows “machines” downloading the images again for me. One is a headless Windows XP machine that I am controlling via RDC from my MacBook Pro; the other is a virtual machine running on the MacBook Pro. Yes, it does get a little confusing to keep it all straight and have the files download in order, but I don’t mind that each machine is slower on its own, since I have two downloading at the same time. It also helps that if one stops while I am not watching it, the other one keeps going, so the overall throughput is better.

I will share the results of the process over the next few days. The part that has me the most interested is the way that different programs handle tasks and how to manage that.
