Using command line tools to retrieve original images from Picasa Web albums

Today I tried to share a set of photos from my best friend’s wedding via various internet services. Since Facebook for Android cannot upload multiple images at once, I decided to upload them to Google Picasa Web Albums first. Afterwards I restricted their visibility to a circle of users and shared them on Google+. Unfortunately, not many wedding guests use G+; most favor Facebook. Lazy as I am, I didn’t want to transfer the photos from my mobile to the desktop and upload them again (the MTP protocol pushed by ICS/JB/Nexus doesn’t work well on Linux anyway), so I looked for a way to download all of the Picasa Web Album shots using the Picasa API.

Here you’ll get an overview of available endpoints: https://developers.google.com/picasa-web/docs/2.0/developers_guide

Note that the Picasa Web Albums Data API works mainly with XML, media and ATOM formats, so it should be rather easy to parse the results. I decided to start with the album list:

https://picasaweb.google.com/data/feed/api/user/<yourUserId>

You can use the long integer that is shown in your Google+ profile’s URL as <yourUserId>. That URL yields a document containing all albums. Note that you have to get a valid OAuth access token if you want to fetch the contents from within your own software. For this simple use case it’s sufficient to use a browser window that holds your Google authentication cookies. To keep the results, simply save the page locally.

The result is an ATOM document containing the authenticated user’s albums. Find the album that you want to retrieve pictures from and copy the href link from the <link rel=’http://schemas.google.com/g/2005#feed’ … > element. Note that by default that document won’t contain links to the full-size images. To make the API return those, add another parameter to the URL, right after the auth key: &imgmax=d. The URL should look like this:

https://picasaweb.google.com/data/feed/api/user/<yourUserId>/albumid/<albumId>?authkey=#####&imgmax=d
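The two manual steps above (picking the album’s feed link, then adding imgmax=d) could also be scripted. What follows is only a minimal Python sketch using the standard library. The function names, the sample album list and the IDs in it are made up for illustration, and a real album list contains many more elements. The optional max_results argument corresponds to the max-results parameter mentioned further down.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

ATOM_NS = "http://www.w3.org/2005/Atom"
FEED_REL = "http://schemas.google.com/g/2005#feed"


def album_feed_links(album_list_xml):
    """Return (title, href) pairs for every album entry in a saved album list."""
    root = ET.fromstring(album_list_xml)
    albums = []
    for entry in root.iter("{%s}entry" % ATOM_NS):
        title = entry.findtext("{%s}title" % ATOM_NS, default="")
        for link in entry.findall("{%s}link" % ATOM_NS):
            if link.get("rel") == FEED_REL:
                albums.append((title, link.get("href")))
    return albums


def with_original_size(href, max_results=None):
    """Add imgmax=d (and optionally max-results), keeping parameters such as authkey."""
    parts = urlsplit(href)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Drop earlier values of the parameters we are about to set.
    query = [(k, v) for k, v in query if k not in ("imgmax", "max-results")]
    query.append(("imgmax", "d"))
    if max_results is not None:
        query.append(("max-results", str(max_results)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical, heavily shortened album list for demonstration only.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Wedding</title>
    <link rel="http://schemas.google.com/g/2005#feed"
          href="https://picasaweb.google.com/data/feed/api/user/123/albumid/456?authkey=abc"/>
    <link rel="alternate" href="https://picasaweb.google.com/123/Wedding"/>
  </entry>
</feed>"""

if __name__ == "__main__":
    for title, href in album_feed_links(SAMPLE):
        print(title, with_original_size(href, max_results=1000))
```

Rebuilding the query string instead of gluing "&imgmax=d" onto the end keeps the URL valid even if the copied link carries no parameters at all.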

Open that one in your browser window again and save its contents. Note that the API restricts you to 30 images by default. If you need more, add yet another parameter: &max-results=1000. Now you can parse the XML result using your favorite tools. Since I’m on Linux, I chose xmlstarlet, which can easily be installed using apt:

sudo apt-get install xmlstarlet

This little XML suite contains tools to execute XPath expressions on documents from the command line. It can also apply XSLT transformations. To get the plain image media sources from the last document retrieved, issue this command:

xmlstarlet sel -N media=http://search.yahoo.com/mrss/ -t -v "//media:content/@url" your_document.xml > origs.txt
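If xmlstarlet is not at hand, roughly the same extraction can be done with Python’s standard library. This is a sketch under assumptions, not the method used in this post: it binds the media prefix to http://search.yahoo.com/mrss/ explicitly, the function name is invented, and the sample document is a made-up, shortened stand-in for a real album feed.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping for the path expression below.
NAMESPACES = {"media": "http://search.yahoo.com/mrss/"}


def original_image_urls(feed_xml):
    """Return the url attribute of every media:content element, in document order."""
    root = ET.fromstring(feed_xml)
    return [
        element.get("url")
        for element in root.findall(".//media:content", NAMESPACES)
        if element.get("url")
    ]


# Hypothetical, shortened album feed for demonstration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:media="http://search.yahoo.com/mrss/">
  <entry>
    <media:group>
      <media:content url="https://example.com/d/IMG_0001.jpg" type="image/jpeg"/>
      <media:thumbnail url="https://example.com/s72/IMG_0001.jpg"/>
    </media:group>
  </entry>
  <entry>
    <media:group>
      <media:content url="https://example.com/d/IMG_0002.jpg" type="image/jpeg"/>
    </media:group>
  </entry>
</feed>"""

if __name__ == "__main__":
    print("\n".join(original_image_urls(SAMPLE_FEED)))
```

To use it on a saved feed, read the file’s contents and pass them to original_image_urls instead of SAMPLE_FEED. Redirecting the printed output into origs.txt gives the same kind of list as the xmlstarlet command.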

origs.txt will now contain a newline-separated list of the original image URLs. To finally retrieve them you can use wget (which should come with your standard installation; if not: sudo apt-get install wget):

wget -i origs.txt
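Where wget is not available, the download step could be approximated in Python. The following is a rough sketch only: the function names are invented here, it does no retries or resuming as wget would, and files with identical names overwrite each other.

```python
import os
from urllib.parse import urlsplit, unquote
from urllib.request import urlopen


def parse_url_list(text):
    """Split the contents of a URL list into URLs, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def local_name(url, index):
    """Derive a file name from the URL path; fall back to a numbered name."""
    name = os.path.basename(unquote(urlsplit(url).path))
    return name or "image_%04d" % index


def download_all(list_path, target_dir="."):
    """Fetch every URL listed in list_path into target_dir (needs network access)."""
    os.makedirs(target_dir, exist_ok=True)
    with open(list_path, encoding="utf-8") as fh:
        urls = parse_url_list(fh.read())
    for index, url in enumerate(urls, start=1):
        destination = os.path.join(target_dir, local_name(url, index))
        with urlopen(url) as response, open(destination, "wb") as out:
            out.write(response.read())
        print("saved", destination)


# Example call, assuming origs.txt sits in the current directory:
# download_all("origs.txt", "wedding")
```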

and voila: there are all your images 🙂


4 thoughts on “Using command line tools to retrieve original images from Picasa Web albums”

  1. Error happens when I run the xmlstarlet command, I have no idea what to do. 😦
    XPath error : Undefined namespace prefix
    xmlXPathCompiledEval: evaluation failed
    runtime error: element value-of
    XPath evaluation returned no result.

  2. Hi Zach. This error definitely arises from the underlying xml2 library. What it says is that the document you’re trying to parse uses a namespace which is not known to the parser or template engine. Did you try to parse something “unusual”? The expression I used in the above example was

    "//media:content/@url"

    which selects the url from elements like

    <media:content url="…" … />

    The media:… prefix denotes a namespace, so the parser knows which “media” element is meant; it is defined by an external namespace definition / schema file. In our case the root element defines it (plus 5 others):

    xmlns:media="http://search.yahoo.com/mrss/"

    What I have experienced in the past is that sometimes proxies are denying access to certain sources that have to be readable by the parser. Try to access http://search.yahoo.com/mrss/ with your browser directly and see if your network connection can access this resource.

    Also your parser lib (libxml) could be outdated or of a different version. My xmlstarlet needs libxml2 and libxslt1.1 (which should be installed automatically if not already on your system). To determine the currently installed version of those libs use

    dpkg -l libxml2 libxslt1.1

    this is what’s installed on my box:

    ii libxml2 2.7.8.dfsg-5.1 GNOME XML library
    ii libxslt1.1 1.1.26-8ubuntu XSLT 1.0 processing library

    Did that help?

    Cheers,
    Stefan

    • Hi Stefan. Hmmm, I did exactly what you said, xmlstarlet sel -t -v "//media:content/@url", but it doesn’t work.
      Here are the libxml2 and libxslt1.1 versions on my Debian Squeeze VPS:
      dpkg -l libxml2 libxslt1.1
      ii libxml2 2.7.8.dfsg-2+s GNOME XML library
      ii libxslt1.1 1.1.26-6 XSLT 1.0 processing library – runtime librar
      Actually, I downloaded the ATOM document with Chrome on Windows 7 and uploaded it to the Debian VPS to process it.
      Here is my ATOM document:
      https://docs.google.com/open?id=0B7r4qfdPc2V1ejNVak9jd2pPSWc
      Because of the error I used awk to handle it instead. That works, but the error confuses me, and if xmlstarlet sel -t -v "//media:content/@url" your_document.xml > origs.txt worked, that would be the best solution, I think. 🙂
      cat | xmlstarlet fo | grep "media:content" | awk 'BEGIN {FS="\""} {print $2}' > origs
      wget -i origs

  3. hmm. WordPress garbles xml tags in comments 😦 look at the sourcecode of the page or ask if the tag examples seem unclear…
