AdSense Mobile Ad

Sunday, December 30, 2012

Amazon S3 and Glacier: A Cheap Solution for Long Term Storage Needs

In the last few years, lots of cloud-based storage services began providing relatively cheap solutions to many classes of storage needs. Many of them, especially consumer-oriented ones such as Dropbox, Google Drive and Microsoft SkyDrive, try to appeal to users with free tiers and collaborative and social features. Google Drive is a clear example of this trend, having "absorbed" many features of the well-known Google Docs applications and seamlessly integrated them into easy-to-use applications for many platforms, both mobile and desktop.

I've been using these services for a long time now and, despite being really happy with them, I've been looking for alternative solutions for other kinds of storage needs. As an amateur photographer, for example, I generate a lot of files every month, and my long-term backup needs currently grow by tens of gigabytes per month. If I used Google Drive to satisfy them, supposing I'm already in the terabyte range, I'd pay almost $50 per month! Competitors don't offer significantly cheaper plans either. At that price, one could argue that a decent home-based storage setup would be a better solution.

The Backup Problem

The problem is that many consumer cloud storage services are not really meant for backup: you're paying for a service that keeps your files always online. Typical backup strategies, on the other hand, involve storing files on media that are kept offline, which usually reduces the total cost of the solution. At home, you could store your files on DVDs and keep hard disk space available for other tasks. Instead of DVDs, you could use hard drives as well. We're not considering management issues here (both DVDs and hard drives can fail over time, even if kept off and properly stored), but the important thing to grasp is that different storage needs can be satisfied by different storage classes, minimizing the long-term cost of storing assets whose size is most probably only going to grow over time.

Amazon addressed exactly this kind of issue when it recently rolled out a new service for low-cost, long-term storage needs: Amazon Glacier.

What Glacier Is Not

As soon as Glacier was announced, there was a lot of talk about it. At $0.01 per gigabyte per month, it clearly seemed an affordable solution for this kind of problem. One terabyte would cost $10 per month: 5 times cheaper than Google Drive and 10 times cheaper than Dropbox (at the time of writing).

But Glacier is a different kind of beast. For starters, Glacier requires you to keep track of a Glacier-generated archive identifier every time you upload a new file. Basically, it acts like a gigantic database where you store your files and retrieve them by key. No fancy user interface, no typical file system hierarchies such as folders to organize your content.

Glacier's design philosophy is great for system integrators and enterprise applications using the Glacier API to meet their storage needs, but it certainly keeps the average user away from it.

Glacier Can Be Used as a New Storage Class in S3

Even though Glacier was designed and rolled out with enterprise users in mind, at the time of release its documentation already stated that Glacier would be seamlessly integrated with S3 in the near future.

S3 is a web service which pioneered cloud storage offerings, and it's as easy to use as any consumer-oriented cloud storage service. In fact, if you're not willing to use the good S3 web interface, lots of S3 clients exist for almost every platform. Many of them even let you mount an S3 bucket as if it were a hard disk.

In the past, the downside of S3 for backup scenarios has always been its price, which was much higher than that of its competitors: 1 terabyte costs approximately $95 per month (for standard redundancy storage).

The great news is that now that Glacier has been integrated with S3, you can have the best of both worlds:
  • You can use S3 as your primary user interface to manage your storage. This means that you can keep on using your favourite S3 clients to manage the service.
  • You can configure S3 to transparently move content to Glacier using lifecycle policies.
  • You will pay Glacier's fees for content that's been moved to Glacier.
  • The integration is completely transparent and seamless: you won't need to perform any other operation; your content will be transitioned to Glacier according to your rules, and it will always be visible in your S3 bucket.
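The lifecycle rule described above can also be expressed programmatically. Here's a minimal sketch in Python of the rule structure the S3 lifecycle API accepts; the bucket name and rule ID are hypothetical examples, and the actual boto3 call is left commented out since it requires valid AWS credentials:

```python
import json

# Lifecycle configuration equivalent to the rule used later in this post:
# transition everything under images/ to Glacier one day after upload.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-images-to-glacier",  # hypothetical rule name
            "Filter": {"Prefix": "images/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 1, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

print(json.dumps(lifecycle_config, indent=2))

# With boto3 and valid AWS credentials, the rule could be applied with:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-backup-bucket",
#     LifecycleConfiguration=lifecycle_config,
# )
```

Once such a rule is in place, no further action is needed: S3 applies it to every object matching the prefix.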

The only important thing to keep in mind is that files hosted on Glacier are kept offline and can be downloaded only if you request a "restore" job. A restore job can take up to 5 hours to be executed, but that's certainly acceptable in a non-critical backup/restore scenario.

How To Configure S3 and Use the Glacier Storage Class

The Glacier storage class cannot be used directly when uploading files to S3. Instead, transitions to Glacier are managed by a bucket's lifecycle rules. If you select one of your S3 buckets, you can use the Lifecycle properties to configure seamless file transitions to Glacier:

S3 Bucket Lifecycle Properties

In the previous image you can see a lifecycle rule of one of my buckets, which moves content to Glacier according to the rules I defined. You can create as many rules as you need, and rules can contain both transitions and expirations. In this use case, we're interested in transitions:

S3 Lifecycle Rule - Transition to Glacier

As you can see in the previous image, the afore-mentioned S3 lifecycle rule instructs S3 to migrate all content in the images/ folder to Glacier after just 1 day (the minimum amount of time you can select). All files uploaded into the images directory will automatically be transitioned to Glacier by S3.

As previously stated, the integration is transparent and you'll keep on seeing your content in your S3 bucket even after it's been transitioned to Glacier:

S3 Bucket Showing Glacier Content

Requesting a Restore Job

The seamless integration between the two services doesn't end here. Glacier files are kept offline, and if you try to download one you'll get an error instructing you to initiate a restore job.

You can initiate a restore job from within the S3 user interface using a new Action menu item:

S3 Actions Menu - Initiate Restore

When you initiate a restore job for part of your content (of course, you can select only the files you need), you can specify the amount of time the content will be kept online before being automatically migrated back to Glacier:

S3 - Initiating a Restore Job on Glacier Content

This is great since you won't need to remember to transition the content back to Glacier: you simply ask S3 to bring it online for the specified amount of time.
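Programmatically, a restore request boils down to a bucket, a key and the number of days the restored copy should stay online. Here's a small Python sketch with hypothetical bucket and object names; the boto3 call is commented out because it requires valid AWS credentials:

```python
# Hypothetical bucket and object names, used for illustration only.
bucket = "my-backup-bucket"
key = "images/IMG_0001.CR2"

# Ask S3 to keep the restored copy online for a week; afterwards the
# object transparently reverts to Glacier-only storage.
restore_request = {"Days": 7}

print(f"restoring s3://{bucket}/{key} for {restore_request['Days']} days")

# With boto3 and valid AWS credentials:
# import boto3
# boto3.client("s3").restore_object(
#     Bucket=bucket, Key=key, RestoreRequest=restore_request)
```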

Conclusions

This post quickly outlines the benefits of storing a backup copy of your important content on Amazon Glacier, taking advantage of the ease of use and affordable price of this service. Glacier integration in S3 lets any kind of user take advantage of it without even changing an existing S3 workflow. And if you're new to S3, it's just as easy to use as any other cloud storage service out there. Maybe their applications are not as fancy as Google's, but their offer is unmatched today, and there are lots of easy-to-use S3 clients, either free or commercial (such as Cyberduck and Transmit if you're a Mac user), and even browser-based S3 clients such as plugins for Firefox and Google Chrome.

Everybody has files to back up, and many people are unfortunately unaware of the intrinsic fragility of typical home-based backup strategies, let alone the users who never perform any kind of backup. Hard disks fail, that's just a fact; you just don't know when it's going to happen. And besides hard disk failures, other problems may appear over time, such as undetected data corruption, which can only be addressed using dedicated storage technologies (such as the ZFS file system), all of which are usually out of reach for many users, either because of their cost or because of the skills required to set them up and manage them.

In the last 6 years I've been running a dedicated Solaris server for my storage needs, and I've bought at least 10 hard drives. When I projected the total cost of ownership of this solution, I realised how much money Glacier would allow me to save. And it did.

Of course I'm still keeping a local copy of everything, because I sometimes need quick access to it, but I've reduced the redundancy of my disk pools to the bare minimum, and I still sleep well at night because I know that, whatever happens, my data is safe at Amazon's premises. If a disk breaks (it happened a few days ago), array reconstruction is not an issue any longer, and I just use two-way mirrors instead of more costly solutions. I could even give up mirrors altogether, but I'm not willing to reconstruct the content from Glacier every time a disk fails (and that's going to happen at least once every 2-3 years, according to my personal statistics).

So far I've never needed to restore anything from Glacier, but I'm sure that day will eventually come. And I want to be prepared. You should want to be, too.

P.S.: Ted Forbes has cited this blog post in Episode 118 (Photo Storage with Amazon Glacier and S3) of The Art of Photography, his excellent podcast about photography. If you don't know it yet, you should check it out. Ted is an amazing guy and his podcast is awesome, with content that ranges from tips and techniques to interesting digressions on the art of photography. I've learnt a lot from him and I bet you will, too.

Wednesday, November 21, 2012

Creating a Compliant PDF for a Blurb Book with TeX

I've been an avid Blurb user since I received my first photo book. They're just awesome. But I'm not going to talk about photos this time.

Today I received my first text book: 440 pages, trade format (6x9 inches), printed in black and white, with hardcover and dust jacket. I opened it and it's just awesome. As awesome as a custom-made text book can be, at least at that price (approximately $33). The book is heavy, feels sturdy, doesn't look cheap; the text is finely printed and the paper is very good, with a nice creamy colour, pleasant to the touch.

I've only tried Blurb's text and photo books, and they really live up to expectations. If you're wondering whether you should try them: you definitely should. Furthermore, Blurb has recently expanded its offerings to include magazines, brochures, planners and other products. I haven't tried them yet (and don't know if I will), but should I need one of them, I wouldn't hesitate to purchase it from Blurb.

The Downside of Using TeX with Blurb

TeX (and TeX-derived processors) is wonderful, provided you know how to use it. And if you have a scientific academic background, chances are you do. I've been a faithful TeX user since the '90s, and I've never switched to other editors, not even for pure text books. TeX's advantages are manifold, but suffice it to say that the quality of the documents you can produce is high, much higher (from a typesetting standpoint) than what you can achieve with basic desktop publishing software (such as Microsoft Word). And TeX is really portable, too.

Blurb provides many tools to easily create books and upload them to their site. Most of them take care of producing a Blurb-compliant book, so that users don't have to worry about it. Adobe Lightroom 4, for example, has a very flexible Book module built upon Blurb's BookSmart. Here you can find the updated list of available tools for editors.

What's the problem with TeX? The problem is that the output of a TeX engine depends on the engine itself. For "generic" books, Blurb has a PDF To Book feature that allows users to upload a Blurb-compliant PDF for print. And what is "Blurb compliance"? Blurb obviously checks a lot of things in the PDF in order to make sure the book is printed as the client wants it to be. Basically, the pages in the PDF file must conform to the sizes specified by Blurb's PDF To Book specification, which in turn depends on your book's geometry. For example, the number of pages and the paper type determine the height of a book's spine, and you've got to take that into account when designing your book cover. The same reasoning applies to all the parameters of a book's geometry (such as page size, cropped size, bleed, margins and safe boundaries).

To complicate things even further, Blurb requires the PDF to be a compliant PDF/X-3 file. Needless to say, many PDF export tools (such as OS X's Save as PDF or pdfTeX) don't guarantee such compliance.

When it comes to TeX, then, apart from setting up your page geometry according to Blurb's specifications (a problem that affects every publishing tool you may use), you're required to produce a compliant PDF/X-3 file, otherwise Blurb's preflight checks will fail.

The purpose of this blog post is to describe how I solved this problem and to help TeX/LaTeX users easily produce a Blurb-compliant PDF file.

My workflow usually is:
  • Typeset the book in TeX with approximate book specifications: you don't really know them until the exact number of pages is known, so don't even bother nailing them down at the beginning.
  • Once the number of pages is known, fine-tune the book specs.
  • Once the book is ready for print, produce a compliant PDF/X-3 file.

Book Specifications

This is the easy part, and it's one of the things TeX is really great at. On the other hand, since TeX is so flexible, how you achieve it depends on parameters such as the TeX engine and the packages you're using.

When editing text books I usually rely on LaTeX and the memoir document class. Why? Because it encapsulates the common book-editing functionality provided by lots of other packages (without you having to deal with each of them), it's extremely well documented, and it's easy to use, especially when it comes to tuning the stock and page size. If you use other engines or classes, however, don't worry: rely on their documentation to discover how to tune the page size according to Blurb's specs.

Furthermore, as we'll see later, pdfTeX (and friends) doesn't produce PDF/X-3 files, and you've got to rely on some tricks to produce a proper file. You might want to check other TeX engines (such as the excellent ConTeXt), which now support PDF/X out of the box: getting a Blurb-compliant PDF with ConTeXt is a no-brainer. But you might lose the possibility of using other packages you're used to (such as the memoir class, which I really like). If you're just beginning your book, I suggest you take a look at ConTeXt and see whether it fits your needs.

Here's an example of a book setup in case you decide to use LaTeX and the memoir class. Let's suppose your book has 390 pages, hardcover with dust jacket, black-and-white printing. According to Blurb, the book specifications are (in points):
  • Final PDF should measure: 441x666
  • Page size / trim line: 432x648
  • Bleed: 9
  • Inset for margins (top, bottom, outer): 18
  • Inset for margins (binding): 45 

This is a screenshot of the Blurb calculator:

Blurb Calculator - Specifications for a book

To tune your memoir document, you can use this code (I prefer to use inches in the book):

\setstocksize{9.25in}{6.125in}          % stock = final PDF size (666x441pt)
\settrimmedsize{9in}{6in}{*}            % trimmed page size (648x432pt)
\settrims{0.125in}{0.0625in}            % top and fore-edge trims
\settypeblocksize{*}{\lxvchars}{1.618}  % golden-ratio typeblock (optional)
\setlrmargins{*}{*}{0.618}
\setulmargins{*}{*}{1}
\setbinding{0.625in}                    % binding inset (45pt)
\checkandfixthelayout

Please note that \settypeblocksize is not required to get a compliant book; it's just a suggestion to get a beautifully proportioned typeblock.

Change the dimensions accordingly (remember that 1 inch corresponds to 72 points) to match your book's dimensions from the Blurb calculator. If you need more insight into the features provided by the memoir class, check its documentation.
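Since the Blurb calculator speaks points while the memoir setup above uses inches, a quick sanity check of the conversion can save a failed preflight. Here's a small Python sketch using the figures from this example:

```python
POINTS_PER_INCH = 72

# Figures from the Blurb calculator for this example book (in points).
final_pdf_pt = (441, 666)  # width, height of the final PDF (the stock)
trim_pt = (432, 648)       # width, height of the trimmed page
bleed_pt = 9

stock_in = tuple(v / POINTS_PER_INCH for v in final_pdf_pt)
trim_in = tuple(v / POINTS_PER_INCH for v in trim_pt)

print(stock_in)                    # (6.125, 9.25) -> \setstocksize{9.25in}{6.125in}
print(trim_in)                     # (6.0, 9.0)    -> \settrimmedsize{9in}{6in}{*}
print(bleed_pt / POINTS_PER_INCH)  # 0.125         -> top trim in \settrims
```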

Producing a PDF/X-3 Compliant File

This is the tricky part, and I was lucky enough to find a proven solution on the Internet (in a discussion thread on TeX StackExchange). In fact, I only worried about this once the book was typeset, and I started wondering whether I should switch to ConTeXt (losing part of the typesetting work) or purchase Adobe Distiller. Fortunately, there was a simpler solution.

To have pdfTeX produce a compliant file, you should add this fragment just before the \begin{document} statement and tune it to your needs:

\pdfinfo{
/Title (Your Book Title)    % set your title here
/Author (The Author)        % set author name
/Subject (The Subject)      % set subject
/Keywords (The Keyword Set) % set keywords
/Trapped (False)
/GTS_PDFXVersion (PDF/X-3:2002)
}
% I think Blurb ignores both the MediaBox and the TrimBox, but I put it anyway
\pdfpageattr{
/MediaBox [0 0 441.00000 666.00000]
/TrimBox [0.00000 9.00000 432.00000 657.00000]}
\pdfminorversion=3
\pdfcatalog{
/OutputIntents [ <<
/Info (none)
/Type /OutputIntent
/S /GTS_PDFX
/OutputConditionIdentifier (Blurb.com)
/RegistryName (http://www.color.org/)
>> ]
}

The MediaBox and the TrimBox are the important parts, which establish the page geometry. As you can see, the MediaBox is set to the Blurb final PDF specification (beware that the box coordinates are swapped compared to the output of the Blurb calculator).

The TrimBox is a little trickier for "casual users". Since the bleed is 9, I set the first corner of the trim box to (0pt, 9pt) (that's why you'd better get point measures from Blurb) and the second corner to (432pt, 657pt). Why? 432pt is the trimmed width, and you can leave it as is. Since the bleed affects the bottom and the top of the page, you subtract it from the media box height: 666 - 9 = 657. This assumes your text doesn't run too close to the trim lines.
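The TrimBox arithmetic can be sketched as follows, using the point values from this example:

```python
# MediaBox: the final PDF size, per Blurb's specs (points).
media_w, media_h = 441.0, 666.0
trim_w = 432.0  # trimmed page width
bleed = 9.0     # trimmed off both the top and the bottom of the page

# Lower-left corner at (0, bleed); upper-right at (trim_w, media_h - bleed).
trim_box = (0.0, bleed, trim_w, media_h - bleed)

print(trim_box)  # (0.0, 9.0, 432.0, 657.0)
```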

Run your input file through pdfTeX and you should get a compliant PDF/X-3 file. Compliance can be checked with the free Adobe Reader application. First of all, you should see the yellow trim box you specified on your PDF pages and, most importantly, you should see the following two entries in the document properties (both required by PDF/X-3):
  • PDF Version: 1.3 (Acrobat 4.x).
  • GTS_PDFXVersion: PDF/X-3:2002

The former is found in the main document properties tab (called Description), the latter in the Custom tab, as shown in the following picture:

Adobe Reader - Document Properties - Custom

What About the Cover?

The same reasoning applies. However, even though I consider myself a hardcore TeX user, I just rely on Adobe InDesign and Blurb's InDesign plugin to get a good cover. You could certainly argue that the cover can be designed with TeX, and that's true, but when I'm dealing with images on a single-page cover I just prefer InDesign, especially since Adobe lets us rent it for approximately $30 a month. I try to accumulate some jobs and process them all in a single month.

Sometimes that's not possible, and it's certainly a deal breaker for many, I do realise it. Anyway, as I already said, there are many other options; just check Blurb's website.

Conclusion

As you can see, producing a TeX book for printing with Blurb should be easy enough for any TeX user. The results are wonderful, both because TeX is a great engine and because Blurb is delivering awesome quality products at reasonable prices.

I hope this helps a lot of TeX users out there.

Monday, September 3, 2012

Architexa Review: Understand and Document Java Code Bases within Eclipse

The people at Architexa have just released their Architexa tool for free to individuals and teams of up to 3 members. Using their own slogan, Architexa is a tool "to understand and document Java code bases within Eclipse". At first it seems yet another tool that generates UML diagrams from an existing code base (and it certainly is), but it has other interesting features that differentiate it from its competitors. In fact, such tools have been around for a long time, and some of them have even enjoyed good adoption rates (think of Rational Rose).

Architexa, however, seems to focus on a niche: doing things fast and collaboratively, and I must acknowledge they're pretty close to achieving that goal. In fact, Architexa is not a fully fledged UML diagramming tool, as Rational Rose was or other plugins still are. Rather, it offers interesting features and possibilities using UML as its main UI. In my opinion, that's an important point to take into account when reviewing it.

The only catch is that Architexa is an Eclipse plugin. Sure, there are lots of Eclipse users out there, and giving priority to Eclipse is a sound choice on their part. However, there are also lots of another-IDE kinds of guys out there (I'm mainly a NetBeans kind of guy), and developers working at corporations may not have the freedom to install their IDE of choice.

Installing Architexa

Installing Architexa is as straightforward as installing an Eclipse plugin. Just add the software repository URL you get during signup to Eclipse and install the plugin. The installation procedure is simple and almost unattended: the only question you'll be asked is trusting a certificate. Once the plugin is installed, just restart your Eclipse instance and you're ready to go.

Using Architexa

Architexa user interface is pretty straightforward:
  • A couple of menu items (both in the main menu and in contextual menus) to open one of the available diagrams.
  • Some menu items to manage Architexa indexes.
  • Some menu items to access Architexa documentation right inside your Eclipse IDE.

Three types of diagrams are currently available:
  • Layered Diagram
  • Class Diagram
  • Sequence Diagram

Depending on the context in which you invoke a particular item, Architexa will take you to the chosen element in the corresponding diagram.

Since I suspect this kind of tool is most useful to developers working on big projects whose code base they don't completely know, I decided to review it by importing a subset of an EJB module of an application I worked on into an empty Eclipse project and starting from there.

Layered Diagram

The Layered Diagram looks like a package diagram at first, but there's much more to it than that. It's a very useful representation of your project's architecture which, like every Architexa diagram, can easily be configured to contain just the level of detail you need. This way you can get a quick overview of the module dependencies in your project, and you can progressively "drill down" to discover more detailed relationships between packages, and between package contents (classes and interfaces) and other elements.

Here's what a layered diagram looks like:

Layered Diagram


As you can see, the diagram is divided into layers, and when you hover over a component with your mouse its dependencies are shown as arrows of different sizes. By default, objects in an upper layer depend on objects in a lower layer.

In the previous diagram the interceptors package was collapsed; you can expand it just by double-clicking on it. Furthermore, in the following picture you can see the dependencies of an element (AuditableEntityListener), shown just by hovering the mouse pointer over it:

Layered Diagram - Discover the Dependencies of an Item

Architexa's UI is great because it relies on intuitive concepts to convey information to the user. The size of the elements in this diagram is proportional to the "quantity" of information present in the project. This way, you can get important information with a quick glance, such as:
  • Which packages and classes are big.
  • Where a great number of dependencies is concentrated.
  • Whether your architecture contains cycles.

By default the level of detail of the diagram is minimal, but you can expand it "on demand". In the following picture you can see how to add additional information about an element just by hovering over it and using the dependency arrows to add additional levels to the type hierarchy:

Layered Diagram - Add Levels to the Type Hierarchy

Layered Diagram - Add Levels to the Type Hierarchy

This diagram offers a pretty restricted palette you can use to add insightful detail to your diagram in order to use it as a good documentation tool. The palette currently includes elements such as actors, databases and user comments.

Class Diagram

At first sight, a class diagram behaves more or less like one generated by any other UML diagramming tool you may have tried. However, it's based on the Architexa design philosophy we saw in the previous section: reduce the clutter and get the job done. When you generate a class diagram from a class, you're just presented with something like this:

Class Diagram

No doubt that's just the bare minimum you need to know about a class. Once again, you can have Architexa add the information you're interested in using its user interface. First of all, at the class level, you can add referencing types and methods. Then you can add other class information (such as methods, interfaces, etc.) using the menu that appears when hovering over the class, selecting the items you want to show:

Class Diagram - Select Class Information

As you can see, you can filter methods by visibility and items by type (interface, class, methods and fields). If you want to add them all, just use the Add All button:

Class Diagram - Method Information

You can further refine the information of a class item using its contextual menu shown in the next picture:

Class Diagram - Class Item Contextual Menu

As you can see, you can add a wealth of information, such as called and calling methods, referenced and referencing methods and declaring class.

Class Diagram - Types Referenced by a Method

Finally, there's also a quick way to add called method information. When you hover a method, an arrow control is shown:

Class Diagram - Add Called Methods

and if you click it you'll be presented a dialog where you can choose the called methods you want to add:

Class Diagram - Called Methods Selection Dialog

The dialog can be used to filter methods by visibility, and there are two buttons that allow you to add either all the called methods or the callee hierarchy. The resulting diagram looks like this:

Class Diagram - Called Methods

Obviously, this process can be applied to any element added to the diagram until you've added all the information you want to show.

Sequence Diagram

The sequence diagram provided by Architexa is very similar to diagrams generated by similar tools. Once again, however, Architexa philosophy is reducing the clutter and letting the user decide what he wants to be shown in it.

An initial sequence diagram looks like this:

Sequence Diagram

Hovering over the class lets you choose the members to be shown in the diagram, using the method selection window we've already described in the previous sections:

Sequence Diagram - Method Selection Window

Once a method is chosen, the diagram is updated:

Sequence Diagram - Selected Method

Hovering over other diagram elements lets you add depth to the call hierarchy, selecting more and more levels to be shown. Adding the methods called by the persist method results in the following diagram:

Sequence Diagram - Methods Called by the persist Method

Collaborative Features


Architexa provides basic collaborative features and lets you share diagrams with other people. The Architexa main menu contains the following items:

Collaborative Features

As you can see, you can get diagrams from a server or share your own. When you share a diagram, you're given two choices: sharing using a server (which acts as a central repository) or sharing by email (which simply attaches the diagram to a newly created email message):

Sharing a Diagram

Another way you can share a diagram is presented when you save a newly created diagram:

Saving a Diagram

Architexa lets you save a diagram as either:
  • A file in the local disk.
  • A shared diagram in a private server.
  • A shared diagram in a community server.

Conclusions

Architexa is not a typical UML diagramming tool, in that it's built with a different design philosophy. Instead of "just" producing diagrams out of an existing code base, it lets users customise the diagrams and decide which details must be included, in an easy and intuitive way. This fulfils the promise of Architexa's slogan: it's a great tool to create diagrams that "make sense", according to each user's needs.

If you haven't tried it, this may seem just a nuance, but it's a great usability leap in the right direction for such a tool. I've been a user of UML modelling tools for years, and I grew more and more dubious about the alleged improvements in developers' productivity. Most of the time, if not always, I ended up relying on my textual IDE to navigate through the code base, jumping from method to method as needed. This time, however, I feel that Architexa can fill a gap and really be useful to a developer, not only in the documentation stage, at least in a handful of use cases. The Architexa UI is very efficient and pretty intuitive, and during the tests I performed I felt very "proficient" at jumping from one dependency to another, or from one method call to another.

But all that glitters is not gold, and Architexa has its own shortcomings. First of all, it's an Eclipse plugin and it's not available for other IDEs. This is a deal breaker for many users, like me, who are not willing to switch IDEs.

Then, some important Java language features are missing, such as generics. Generics aren't the new kids on the block (two major Java releases have seen the light since support for generics was added to the language), and they can't be dismissed as something of little importance either. I don't know why no information about generic types and signatures appears in Architexa diagrams, but I hope this gap will be filled soon.

Also, I'd really like to see "awareness" of more Java technologies built into the tool. When I started reviewing it, I decided to use a fragment of an EJB module to see if there were more bells and whistles than what I was reading in other reviews, which used simple Java projects. Given Architexa's design philosophy, I'd really like to see more information about classes, at least in the Layered Diagram. Furthermore, since many recent Java EE technologies, such as EJB and JPA, rely heavily on runtime-available annotations, I added entities and EJBs to the project to see whether annotations were discoverable information: unfortunately, they were not.

Architexa is a good tool, and I think it's on the right track to catch on with developers and raise adoption from the bottom up. Developers can take good advantage of it, and I believe that's a critical aspect for such a tool to gain adopters. Instead of being a tool imposed on them for methodology's sake, it's a tool that adds real value and helps them get their job done more easily and more effectively. Furthermore, Architexa is now free for individuals and small teams (up to 3 members), so everyone can sign up, download it and start using it on their own, real-life projects.

Friday, August 17, 2012

Night Photography: A Tip to Photograph Stars (and Other Point Light Sources)

Mastering night photography is not that difficult; nonetheless, it has its own peculiarities you should be aware of. In this blog post we will see how one of the basic rules we learn about exposure is no longer valid when shooting stars.

One of the first things you certainly learnt when you started learning photography was how exposure is determined by three parameters: aperture, shutter speed and ISO sensitivity. Each one has multiple effects on the final result (most notably depth of field, motion blur and noise), but each one can be used to determine how much light enters your camera and reaches the sensor. Aperture, though, has a peculiarity: it's not an absolute measure, but a relative one. In fact, the f-number is not a measure strictly speaking: it's a pure number. The f-number N is the ratio between the focal length f of the lens and the diameter r of the entrance pupil:

N = f / r

Basically, the luminance (the "brightness" of the resulting image) depends only on the relative aperture and not on the absolute value of either lens parameters alone. In fact, when evaluating exposure, you just use the f-number: no matter the focal length or, more generally, no matter which lens you're using, if the f-number is the same, exposure is going to be the same. If you stop your aperture up or down, exposure will stop up or down accordingly.
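
As a quick illustration of why the f-number, and not the absolute pupil size, drives exposure settings, here is a minimal Python sketch (the lens figures are made up for the example, not taken from any real lens):

```python
# f-number: the ratio between the focal length and the diameter of the
# entrance pupil. The figures below are illustrative only.
def f_number(focal_length_mm, pupil_diameter_mm):
    return focal_length_mm / pupil_diameter_mm

# Two very different lenses sharing the same relative aperture:
print(f_number(50, 25))    # 2.0 -> an f/2 lens
print(f_number(200, 100))  # 2.0 -> also f/2, despite a 4x larger pupil
```

Despite having entrance pupils of very different sizes, both lenses expose identically at the same shutter speed and ISO.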

This is true most of the time and is a consequence of the physical model of an optical system such as a single-aperture camera (or the human eye). We've seen many times the equation

EV = log2(N² / t)

which summarizes this basic rule: luminance (in f-stops, hence the logarithm) is proportional to the square of the aperture N and inversely proportional to the time t the shutter remains open.
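
To make the rule concrete, here is a small Python sketch of that exposure-value relation (the specific aperture/time pairs are just illustrative):

```python
import math

# EV = log2(N^2 / t): luminance in f-stops as a function of the
# f-number N and the time t (in seconds) the shutter remains open.
def exposure_value(n, t):
    return math.log2(n ** 2 / t)

# Equivalent exposures yield (almost) the same EV; the small difference
# comes from f-stop numbers being rounded (2.8 stands for 2*sqrt(2)).
print(exposure_value(2.8, 1 / 30))  # ~7.88
print(exposure_value(4.0, 1 / 15))  # ~7.91
```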

What Happens When Shooting Point Light Sources?

When shooting stars or, more generally, point light sources, however, the model changes and this result is no longer valid. A point light source, in this context, will be defined as a source of light whose size in the resulting image is smaller than or equal to one pixel. Even perfectly in focus, and depending on your sensor's resolution, some stars and planets may in fact appear bigger than one pixel, but not by much. Hence, this approximation can be considered good enough.

The reason why this happens is not complicated, but it requires some knowledge of mathematics and physics; since a photographer is usually only concerned with results and the rules to apply, I'll just provide a very summarized and intuitive explanation.

Let's start with a couple of analogies, although pretty "rough" ones. It's absolutely intuitive that the farther you are from a sound source, the fainter the sound you perceive. It's also intuitive that when shooting with a flash, the farther you are from the subject, the fainter the light that reaches it and, hence, the fainter the light reflected back to your camera sensor. Now: why doesn't a similar effect exist when shooting any subject? A picture is produced by the light reflected off the subject: why isn't exposure affected by the distance from it?

It turns out it's a consequence of two competing phenomena which, under certain circumstances, "balance" each other and cancel out the contribution of the distance. It also turns out that the result is the well-known general law we were talking about at the beginning of this article, hence the importance of the relative aperture, the f-number, in the field of photography.

On the other hand, when shooting point light sources (as the majority of stars in the night sky can be considered) the two competing phenomena no longer balance each other. In fact, one of the two practically disappears and the focal length f of the lens, by itself, doesn't affect exposure any more. In this case, the result is similar to what we described in the analogies above: luminance is inversely proportional to the distance from the subject but, much more importantly, it is proportional to the diameter of the entrance pupil. With f gone from the equation, the result depends solely on r² (a quantity proportional to the area of the entrance pupil) and not on the relative aperture. This fact is somewhat intuitive, if you think about it: the larger the area of the entrance pupil, the more light it can gather. Seen from this point of view, in fact, the usual rule is probably less intuitive: lens configurations with the same aperture N may have entrance pupils of different sizes. Why, then, do they give the same effect? That's because of the two components we were talking about, but we won't go into mathematical details.

Since

r = f / N

it turns out that focal length does affect the final exposure, given N.

How? This model predicts that increasing the focal length of the lens while keeping the aperture N fixed increases exposure, since it increases the area of the entrance pupil. Although you won't usually be shooting skies with long lenses, you could take advantage of this fact to reduce shutter speeds, especially taking into account that the detected luminance varies with r² and, hence, with f², the square of the focal length.

A quick estimate: if you increase the focal length from, let's say, 18 mm to 35 mm (using the same aperture), you'll increase the quantity of light reaching the sensor by a factor of

(35 / 18)² ≈ 3.8

that is, about 2 stops.
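
The same arithmetic can be checked with a couple of lines of Python; the function below simply applies the f² proportionality discussed above:

```python
import math

# For point light sources at a fixed f-number, gathered light grows with
# the entrance pupil area and, hence, with the square of the focal length.
def stops_gained(f_from_mm, f_to_mm):
    return math.log2((f_to_mm / f_from_mm) ** 2)

print(round((35 / 18) ** 2, 1))        # 3.8: the light-gathering factor
print(round(stops_gained(18, 35), 1))  # 1.9: that is, about 2 stops
```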

It's important to realize that this effect applies only to point light sources, that is, small stars whose size in the picture is comparable to, or smaller than, the size of a pixel. It doesn't apply to the moon, to bigger stars and planets, nor even to the sky itself. Nevertheless, it's a good trick to know if you want to maximize the number of visible stars in your picture.

Sometimes you may be tempted to stop down the aperture to get a better focus at infinity, especially when the lens you're using hasn't got a hard stop at infinity (many cheaper lenses, such as most Nikkor DX lenses, haven't). In this case, instead of indiscriminately or heuristically stopping down the aperture, use the hyperfocal distance (which we talked about in a previous post) to get a good focus lock at infinity and determine exactly the depth of field you need. If you can, open your lens as much as you can.
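
If you want to estimate the hyperfocal distance numerically, a common approximation is H = f²/(N·c) + f, where c is the circle of confusion; the sketch below uses c = 0.02 mm, a typical assumption for APS-C sensors (both the sample lens figures and the chosen c are mine, not from the original post):

```python
# Hyperfocal distance approximation: H = f^2 / (N * c) + f.
# c (circle of confusion) = 0.02 mm is an assumed value for APS-C.
def hyperfocal_mm(focal_length_mm, f_number, coc_mm=0.02):
    return focal_length_mm ** 2 / (f_number * coc_mm) + focal_length_mm

# An 18 mm lens wide open at f/3.5: focusing at ~4.6 m keeps everything
# from about half that distance to infinity acceptably sharp.
print(round(hyperfocal_mm(18, 3.5) / 1000, 2))  # ~4.65 (metres)
```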

Sunday, August 12, 2012

Adobe Photoshop Lightroom Tutorial - Part XXIV - Organising Your Photo Catalog Using Metadata and Keywords

Part I - Index and Introduction

Metadata, in one of its simplest forms, is defined as "data about data". In the case of photography, for example, you may think about the EXIF data attached to your images: it provides technical information (camera settings, geolocation information, etc.) about the picture. Depending on the tool you use, you can go beyond what's provided by standards (such as EXIF or IPTC) and provide your own metadata.

What's the point of using metadata? The basic idea is organizing your images so that you can search them based on some criteria. For example, you may want to search for images shot with a specific camera or with a specific lens; you may be looking for pictures taken at a certain shutter speed, aperture or geospatial coordinates; or you may want to search for pictures using non-technical criteria, such as a portrait shot at a wedding and processed in black and white. Can you imagine what your Internet experience would be if search engines didn't exist? You couldn't find your way to the information you're looking for, and the very concept of "Internet" as you know it would be defeated. The same thing happens with your photo catalogs. How could you possibly find something if you couldn't search using the criteria you need? Amateur photographers with small catalogs may be able to find the pictures they're looking for by manually scanning the catalog, or by trying to remember which folder or collection a picture is in. But as your catalogs grow larger and larger, things get worse and the problem becomes insurmountable. That's why products exist which provide the tools you need to overcome this problem. In fact, there's a dedicated category of such products, image management databases, and Adobe Photoshop Lightroom is one of them.

If you're using Lightroom, you already know your images are stored into a catalog which acts as a "proxy" between you and the images managed by Lightroom. The catalog is basically a database which stores additional information (metadata) alongside your images. Such metadata makes the database searchable, so that you can look for images using search criteria. Lightroom, in this respect, is extremely helpful and powerful in that:
  • It comes with out-of-the-box support for an extensive set of well known or frequently used metadata (such as ratings, EXIF and IPTC).
  • It lets you extend the metadata model using your own keywords.
  • It lets you easily build search criteria mixing and matching any type of searchable field.
  • It lets you define smart collections, that is, collections of pictures whose content is defined by a search filter and automatically kept up to date.

These are just the most important features provided by Lightroom, and we'll discover more of them in the following sections.

Flags, Ratings and Labels

The simplest forms of metadata you can catalog your images with are flags, ratings and labels:
  • Flags are used to pick or reject an image.
  • Ratings are used to rate images on a scale from 0 stars to 5.
  • A label is one of 5 color codes (Red, Yellow, Green, Blue and Purple) that can be assigned to an image.

While the meaning of flags and ratings is pretty well defined, the meaning of labels can be customized by the user. By default, labels are just "colours", but their names, and thus their meanings, can be customized to be meaningful to the user. Lightroom, for example, provides an additional naming scheme for labels, inherited from Adobe Bridge, that uses the following convention:
  • Red: Select
  • Yellow: Second
  • Green: Approved
  • Blue: Review
  • Purple: To Do

You're free, however, to assign your own meanings to colour labels. In my workflow, for example, I just use the three common traffic light colours (red, yellow and green) to transition images through the undeveloped, partially developed and done states.

In the following picture, you can see a screenshot of some pictures in my catalog. Three of them (the first, the second and the third) are flagged because I picked them, rated (with 4, 3 and 4 stars respectively) and labelled green (because I finished processing them). The remaining image is unflagged (it's neither rejected nor picked), unrated (0 stars) and partially developed (yellow label).

Flags, ratings and labels

The basic rating and labeling metadata are flexible and easy to use and can be adapted to almost any development workflow. In my workflow, for example, flags are used before I start developing images, in order to pick only the pictures eligible for development. Images to be deleted are marked as rejected and deleted pretty soon. Images I'm not sure about are left unflagged even if, eventually, they'll either be picked or rejected (and thus deleted). Ratings are usually applied at the end of the development process and are usually immutable, while colour labels are just a quick visual aid to identify pictures I should be working on. Eventually, when I finish developing a folder (or collection), all pictures will be labeled green.

Metadata

Images can be assigned metadata of many kinds. Lightroom supports many kinds of metadata, including:
  • EXIF
  • IPTC
  • DNG
  • Location
  • Metadata defined in a custom plugin
Lightroom can also read proprietary metadata (such as proprietary EXIF extensions) found in an image, but in this case it usually gives no way to modify it. In fact, Lightroom won't even read all proprietary metadata: if you're interested in reading a field not visible in Lightroom, you should look at the excellent ExifTool by Phil Harvey (a command line tool I will probably write a post about in the future).

Metadata can be inspected and modified using the Metadata panel in the Library module:

Metadata Panel

As you can see in the previous image, the Metadata panel shows information about the chosen type of metadata (in this case EXIF and IPTC) and lets you modify the writeable fields. At the topmost part of the panel, Lightroom provides some commonly used fields (Rating, Label, Title, Caption, etc.) as a convenience to speed up metadata editing.

To change the currently displayed metadata category, you just need to select the desired one in the list box in the upper left corner of the panel. In my Lightroom setup, these are the available choices:

Metadata Categories

If you're a developer, you could also extend available metadata writing a custom Lightroom plugin using the Lightroom SDK. Most users (even professional ones), however, will be just satisfied with what Lightroom offers out of the box.

Location Metadata

Location metadata is the perfect way to record where a shot was taken geographically. Nowadays, many cameras populate these fields using data gathered by a GPS device, such as modern smartphones or GPS-equipped cameras. Many DSLRs, however, still lack this functionality and their pictures require the user to introduce location metadata manually.

Up to Lightroom 3, location metadata was just made up of text fields, but with the latest Lightroom release (v. 4 at the time of writing) you can use the Map module to populate these fields by dragging and dropping images over a map:

Map Module

Once an image is dropped over the map, Lightroom will automatically update its location metadata, as you can see in the following picture:

Location Metadata of a Photo

The Map module also provides the possibility of saving a location, a functionality that can greatly speed up your workflow. To save or load a location, just use the controls found in the Saved Locations panel of the Map module.

Since location metadata may contain sensitive information you want to protect, you may want to ensure that information about certain locations is never exported. That might be the case for information about your home location, for example. To have Lightroom protect a location, you can add it to the list of saved locations and mark it as private:

New Private Location

In the New Location dialog box, you can specify a radius which will determine the area of the (circular) location you're saving and a checkbox that can be used to mark it as private. If an image is tagged into a private location, the corresponding metadata will never be exported, no matter which export mechanism or publish service is used.

Applying Metadata Changes to Multiple Images

Very often you find yourself applying the same metadata to multiple images. For example, it may often be the case for metadata in the location, copyright, contact and workflow categories. Lightroom offers two ways to perform "bulk" metadata changes:
  • Metadata synchronization.
  • Metadata presets.

Metadata synchronization is very similar to develop settings synchronization: you apply the modifications you need to a picture and then sync other pictures with it. To sync metadata with a reference image, just select all the images to be synced, making sure the reference image is the first image in the selection set. Once the images are selected, press the Sync Metadata button in the bottom left corner of the right module panel:

Sync Buttons

Lightroom will present a form in which the fields to be synced can be chosen and copied to the metadata set of the other images.

Metadata presets are a very similar concept, with the difference that metadata values are saved in a preset (instead of being copied from a reference image) and applied to a set of images. To create a preset, select the Edit Metadata Preset item in the Metadata menu or in the Preset listbox in the topmost section of the Metadata panel. A form will be presented in which the fields to be saved in the preset can be chosen. A saved preset can be applied to one or multiple images by simply selecting it in the Preset listbox.

Metadata presets are handy when a set of metadata values is frequently applied to many photos. To speed up my workflow, for example, I created a preset for each of the fixed sets of metadata I commonly use, such as contact information, copyright information and the locations where I usually shoot. Metadata synchronization, on the other hand, is more suitable when many images share a common set of characteristics (same job, same event, same model, etc.) which, however, isn't worth a preset that would most likely be scarcely reusable.

Keywords

Lightroom lets you apply keywords to an image. Keywords, or tags (an alternate name used in other contexts such as Flickr or Google+), are just text labels that can be searched for: hence, they provide a means for a user to freely organize the catalog using user-defined "keys". In this sense, keywords are the building blocks you use to organize your catalog "your own way". The metadata described so far, in fact, related to technical aspects of a picture or to some standard attribute set (camera settings, location, etc.). Keywords, on the other hand, are "the words you use to describe a picture" and, hence, a way to keep your catalog organized using words meaningful to you.

You may want to define, for example, keywords for each style of photography you produce (portraits, landscape, etc.), for treatments you apply (duotone, sepia, black & white, cropped, etc.), names of persons appearing in a photo, etc.

Adding and Removing Keywords

To assign keywords to a picture, just use the Keywording or Keyword List panels of the Library module. The Keywording panel, shown in the following picture, is made up of several distinct controls:
  • The Keyword Tags list box, used to change what's shown in the keywords box.
  • The keyword box, where you can see a list of keywords applied to your image, whose content depends on the current Keyword Tags selection.
  • A text box that lets you add keywords.
  • Two grids, Keywords Suggestions and Keyword Set, which provide a visual shortcut to two sets of keywords.

Keywording Panel

By default, the keyword box shows the keywords currently assigned to the selected picture(s). In the case of a multiple selection, a keyword that's only assigned to a subset of the images is postfixed with an asterisk (*). To add a keyword, you just need to type it in the keyword text box and press Enter: the keyword will be added to the currently selected image(s) and created (with the default options) if it's the first time you use it. If you want to tweak the behaviour of a keyword, as described in the following section, you may want to create it manually before entering it, or change its options manually later. I usually prefer creating them manually so as not to forget to change their options afterwards.

To speed up your workflow, Lightroom lets you quickly select keywords from two 3x3 grids: Keywords Suggestions and Keyword Set. The former contains suggestions based on recently used keywords and on the keywords currently applied to an image; you'll see that adding or removing keywords on the current image triggers a change in the suggestions as well. The latter, on the other hand, is made up of a static list of keywords you can create, save and use. Using the listbox on the right side of the keyword set grid (showing Outdoor Photography in the previous image) you can select, create, edit and remove your own keyword sets. Lightroom ships with some example sets, but you should create your own, reflecting your "keywording habits" for each kind of photography you're interested in.

To remove a keyword, just deselect it from one of the suggestion grids or manually delete it from the keyword list.

Keyword List

The Keyword List panel shows a graphical representation of the current keyword tree. In fact, keywords aren't just a flat set of tags: Lightroom lets you organize keywords into a hierarchy, as shown in the following picture:

Keyword List

The quickest way to build a hierarchy out of already existing keywords is rearranging them with your mouse. If you drag a keyword over another, the former will become a child of the latter.

But what's the point of building and maintaining a hierarchy? It's not only about constraining the size of a keyword list that could potentially grow to a considerable size. The keyword hierarchy allows you to organize concepts in a tree, establishing an "is a" relationship with the containing keywords. In the picture above, for example, the keyword cat is the leaf of the subtree animal/mammal/cat. That is: in the hierarchy I'm using, a cat is a mammal and is an animal. This way, you don't have to tag an image three times (cat, mammal and animal) but just once: cat.
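
The containment mechanism can be sketched in a few lines of Python; the tiny parent map below just mirrors the animal/mammal/cat example (this is an illustration of the concept, not how Lightroom stores its data):

```python
# Each keyword points to its containing keyword (None for the root).
PARENT = {"cat": "mammal", "mammal": "animal", "animal": None}

def effective_keywords(tag):
    """Return the tag plus all of its containing keywords."""
    keywords = []
    while tag is not None:
        keywords.append(tag)
        tag = PARENT.get(tag)
    return keywords

print(effective_keywords("cat"))  # ['cat', 'mammal', 'animal']
```

Tagging an image with cat alone is enough: searches for mammal or animal will match it through the hierarchy.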

You can tweak how keywords behave in the hierarchy. In the example above, we wanted cat to be a mammal and an animal. There may be cases where you just use the hierarchy for organizational purposes and don't want a picture to automatically acquire all the containing keywords as well. Similarly, you may want to organize your catalog using certain keywords (such as names) while preventing those keywords from appearing elsewhere, such as in exported or published images. In this case, you can just edit a keyword and specify the behaviour you need (right-clicking on it and selecting Edit Keyword Tag):

Edit Keyword Tag

In the Edit Keyword Tag window you can see how the behaviour of a keyword can be tweaked:
  • A keyword can be included in an export (or publish) operation if the Include on Export checkbox is selected.
  • A picture will inherit containing keywords if Export Containing Keywords is selected.
  • A picture will inherit a keyword's synonyms (more on this in the following sections) if Export Synonyms is selected.
If you want to prevent a keyword from being exported or published, just deselect Include on Export: no matter how you export or publish images containing this keyword, Lightroom will remove it.

Effective Keywords

We've just seen how the set of keywords applied to an image depends not only on the ones you explicitly added, but also on each keyword's behaviour and on the operation the keywords are considered for. If you want to inspect the list of keywords effectively applied to an image, you can use the Keywording panel and choose the list you're interested in.

In the Keywords section we've seen that the Keywording panel features a Keyword Tags list box. Depending on your choice, the behaviour of the panel will change:
  • Enter Keywords: the default choice, whose functionality has been described in the Keywords section.
  • Keywords & Containing Keywords: if you choose this option, the panel will turn read-only and will show the list of all keywords inherited by the image (as described in the Keyword List section).
  • Will Export: if you choose this option, the panel will turn read-only and will show the list of all keywords that will be exported with the selected image.

Synonyms

In the Edit Keyword Tag window you may have noticed a Synonyms text box whose functionality we haven't described yet. Lightroom lets you associate a set of synonyms with a keyword: synonyms can be thought of as an additional list of keywords associated with a keyword, whose primary purpose is search. A synonym, in fact, won't even appear in the keyword list and can only be consulted by checking the configuration of each specific keyword.

Many users wonder when a synonym rather than a keyword should be used. That really depends on how you build your own keyword hierarchy, but here are some guidelines. You should try to keep your keyword hierarchy simple, clear and intuitive so that your workflow is smooth. Also, keywords represent concepts and, literally speaking, a pure synonym offers nothing new, just an alternate spelling. This is a clear case in which a synonym should be created instead of yet another keyword.

Other times, some words just don't fit well into your keyword hierarchy. Let's take my animal hierarchy. I defined a cat as a mammal and an animal. What about pet? Or clawed? Or furry? You cannot feasibly build a hierarchy containing every nuance that can possibly come to your mind. Also, a cat can certainly be considered a "pet", but so can a bird. Yet in my hierarchy a cat is a mammal while a bird is not. What should I do? Create a pet keyword for each of them? You can easily see how the hierarchy would become more and more cluttered if we tried to introduce these concepts. In this case, you'd better use a synonym: "pet" or "clawed" may be synonyms of cat, just as they may be of bird.

If you then want a synonym to be exported along with its keyword, you can select the Export Synonyms option of the affected keyword so that the synonym will be explicitly listed in the keyword list of an exported or published photo.

Searching and Filtering

Any kind of metadata can be used to search and filter your catalog images. In Part V of this tutorial we already described the basic search and filtering facilities of Lightroom, so we will only summarize them here.

A filter by flags, ratings or labels can easily be built using the Attribute filter bar. As you can see in the next image, you can just select the values you're interested in on the graphical user interface and Lightroom will filter the contents of the currently selected folder (or collection) according to your choices.

Attribute filter

If you want to filter using keywords, you can use multiple techniques. The first is using a Metadata filter. You can configure a Keyword column for such a filter and select the keywords you want to filter with. In the following image you can see how Lightroom intelligently makes things easy by including only the keywords used in the currently selected folder (or collection).

Metadata Filter (stacked with a Text Filter)

Another way to search using a keyword is a Text search. You can either search freely across every searchable field or narrow the choice by specifying the field you want to search in (as seen in the following images).

Text Filter

Text Filter - Searchable Field

The last method you can use to filter by a specific keyword is using the Keyword List panel to build a quick filter. When you hover over a keyword with your mouse, a small arrow appears on the right of the keyword record, as highlighted in red in the following image. Pressing the arrow makes Lightroom create a Metadata filter using the selected keyword. If you need to filter by just one keyword, this method is probably the quickest one.

Keyword List

Filters can also be stacked together to build even more complex search queries, mixing and matching criteria built on any metadata field that Lightroom manages:

Multiple Filters Stacked Together

Conclusions

As we've seen, Lightroom provides excellent management capabilities and you can keep your catalog perfectly organized with very little effort. The point of an image management database is just this: making your ever growing catalog manageable. There would be no point in storing thousands of images if you couldn't effectively use the tool to quickly retrieve what you're looking for.

Some photographers start to use this kind of tool without realising its real potential, nor the issues they'll start experiencing when their catalogs outgrow a few hundred images. Lightroom, furthermore, is an excellent tool to develop your RAW files, and it's easy to forget that it is a catalog manager as well.

That's why it's very important to learn about these features early and start applying them to your regular workflow. The sooner, the better.

Protect Your Privacy

I want to stress once more how Lightroom can help you protect sensitive data (keywords and location information) so that you don't accidentally publish them.

Protecting location information is probably easier because, even though geolocation data is often added automatically and you may forget about it, Lightroom gives you a simple solution to this problem: just create a private location specifying its centre and its radius, and forget about it.

In the case of keywords, marking them as not exportable is up to you and you may easily forget to, especially if you get used to creating them automatically from the Keywording panel. Lightroom, furthermore, does not manage subject names separately as other tools do, and this fact induces users to define a keyword hierarchy for them. If you forget to configure each and every keyword according to your needs, private data may unexpectedly leak.

If you want to help me keep on writing this blog, buy your Adobe Photoshop licenses at the best price on Amazon using the links below.

Saturday, August 4, 2012

Lightroom Users Upgrading to Mountain Lion: Back Up Your Adobe Camera RAW Cache Directory

To improve Lightroom's performance, Camera RAW maintains a cache which speeds up some stages of image processing. By default, the cache size is set to 1 GB, but you should increase it (as Adobe suggests) to store more image data and speed up preview generation for cached images. Since hard disk space is hardly an issue nowadays, I've set mine to 32 GB and disk usage is currently around 8 GB (with some thousands of images in my catalog). The beneficial effects of the cache are easily seen; that's why the cache directory is now included in my standard Lightroom backup.

Some days ago, I updated my Macs to Mountain Lion hoping to benefit from its performance improvements. The update process was easy and flawless, but as soon as I opened my catalog I started suspecting something was wrong. A quick search confirmed my suspicions: Mountain Lion's update process had completely wiped away the ~/Library/Caches directory which, by default, contains the Adobe Camera RAW cache directory. 8 GB worth of data swept away without even asking: not good and not fair. If Apple wanted to clean its own programs' caches, the installer could have limited itself to cleaning just those.

Fortunately, I could restore it from my latest backup and I was soon back to work. The bottom line is: if you're upgrading to Mountain Lion, you'd better backup your Adobe Camera RAW cache directory, unless you don't mind Lightroom recreating it from scratch.

If you still don't know your cache size (chances are it's still the default 1 GB) or your cache location, you can check them in the Lightroom preferences (File Handling pane):

Lightroom Preferences - File Handling

You should increase the cache size to increase the amount of image data that can be stored: how much depends on the number of RAW images in your catalog. On my system, I'm observing an average of 1 MB worth of data per image. You can also change the cache location in case you prefer storing it closer to your catalogs for backup's and "visibility's" sake (by default, the ~/Library folder is hidden in the Finder).