Posts for June 2011

Maintaining an Open Relationship with Google

Google's mission is to "Organize the World's Information," and they do a rather smashing job of it as long as they alone are doing it. Even though they do a better job than most at staying open, there is still significant risk in putting all your eggs in the Google basket, and there are few options for backing out. Particularly with the rise of Google+, the Google Music beta and other such services, going all in on Google could prove a big liability for individuals and companies, both in shifting to new or better services as they emerge and in simply re-establishing ownership over their own content. Every business would do well to act with caution when opting for the convenience of any service, as that convenience can too easily turn into surrendering ownership of your content. I would suggest that losing control of your content in a world where ideas and content are a commodity is the same as losing control of your life or business.

With a little forethought, however, there are convenient ways both to leverage the services Google offers and to remain the managers of your own information. Google's Data Liberation Front launched its "Google Takeout" service, which allows the harvesting and export of your information from various Google services into open formats. Open formats are the key to keeping your content flexible and mobile, and in a world where 5 years is an entire era of information management practices, this is critical to surviving and thriving in the modern world. The group is starting with export features for Google services but plans to expand to other services as well. I assume (hope) this means Facebook and Yahoo! based services, but only time will tell.

Even with groups like ...

Game of Thrones Redux

Even knowing what was coming in the series, the end left me as pleased and stunned as I was when I first started reading the books. The characters were visualized and brought to life in amazing ways, and I was blown away at how lean and mean the storytelling was. The series highlights just what a powerful contribution a great set of writers and directors can make. I'm not one of those fans who feels unhappy because their favorite character wasn't featured enough. The breadth of the story itself meant there was just no way every character could be touched on, and they obviously had to make some hard decisions about who to feature and how. Given the realities of what they had to film, I think those obviously painful choices were good ones.

Even given the lean storytelling, there is already a long list of standout characters for me in the series. Dany and Drogo, Tyrion, Jon and Jaime were just brilliant, and it's only in Game of Thrones that that would seem like a short list of standout characters. Pulling that off in any series would be amazing; here it's just a miracle.

Something else the series really highlighted for me was just how much telling the story on screen drove different decisions than telling it in the book. Most obvious to me is that in print there is no background; everything is foreground in writing. You don't accidentally see a sentence in a novel, so every detail, character and event is right there at the front of your mind. The series instead was unafraid to let many things play out in the background, which gave readers of the book a special treat ...

New Rise of the Apes European Trailer

There's a new trailer out for the Planet of the Apes reboot called "Rise of the Apes," and it really looks fantastic. It's amazing how much story they can convey in a simple trailer, and I have hopes the movie has a lot of depth to it. There seems to be a lot of potential for commentary about the nature of intelligence and self, respect for life, and compassion, or what a lack of it brings. Seeing the "acting" of Caesar with John Lithgow's character, who seems to have Alzheimer's, brought a tear to my eye as the chimp expresses compassion for a human who is obviously struggling. This could be the start of a great movie franchise; at least I hope it is.

Django Libraries for XML and eXist DB

We often use XML at Academic Libraries and decided to build a central set of libraries to ease the work of connecting our XML and repository-based projects to the Django framework. We'll be continuing to build these libraries out, and we recently released the code as open source projects on GitHub.

EULxml provides XPath parsing features in Python, mappings from XML documents to Pythonic objects, and Django forms for simple XML objects. The code is available on GitHub, with some documentation and examples up on Read the Docs.
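EULxml builds on lxml and has its own field types and API, which this does not reproduce. As a rough, stdlib-only illustration of what "mapping XML documents to Pythonic objects" means, here is a minimal sketch using xml.etree.ElementTree. The class names PathField, MappedXml and Book are hypothetical, and ElementTree supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET


class PathField:
    """Descriptor exposing the text of the first node matching an
    ElementTree path (a limited XPath subset) as a plain attribute."""

    def __init__(self, path):
        self.path = path

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class itself
        node = obj.node.find(self.path)
        return node.text if node is not None else None

    def __set__(self, obj, value):
        node = obj.node.find(self.path)
        if node is None:
            raise AttributeError("no node matches %r" % self.path)
        node.text = value  # writes through to the underlying XML


class MappedXml:
    """Hypothetical base class wrapping a parsed XML element."""

    def __init__(self, node):
        self.node = node

    @classmethod
    def from_string(cls, text):
        return cls(ET.fromstring(text))

    def serialize(self):
        return ET.tostring(self.node, encoding="unicode")


class Book(MappedXml):
    # Hypothetical mapping for a simple <book> document.
    title = PathField("title")
    author = PathField("author/name")


book = Book.from_string(
    "<book><title>Old Title</title><author><name>Anonymous</name></author></book>"
)
book.title = "New Title"
print(book.serialize())
```

The descriptor approach means the XML stays the single source of truth: reading or assigning an attribute goes straight to the document, so serializing afterwards reflects every change.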

EULexistdb provides connections and XQuery capability for eXist DB, plus Django QuerySet-like objects for rich interaction between Django and XML data stored in eXist DB. Combined with the XML Django forms from EULxml (on which it depends), it has enabled us to do a lot with our library collection. This library is also available on GitHub, with some documentation and examples up on Read the Docs.

We're excited at the possibilities of leveraging the power of Django with our XML databases and repositories. We're open sourcing the code in the hope that others may find it useful and may want to contribute to the libraries as well.

They Shoot URLs Don't They?

I've had a rather lengthy and interesting blogging life these last nine years, though I stopped over the last few for a number of reasons. A backlog of 2400+ posts, however, gives me a rather interesting dataset for testing URL persistence, and going over old posts I found an example of URL persistence that seems very backwards to me.

I used to run a funny little site for Gamespy called Paragon City Hall, just a community site for an as-yet-unreleased game called City of Heroes, and I posted about that site back in 2002.

In another post around the same time I referenced my depression over a news story from Reuters that made me want to kill myself. Overdramatic, yes, but hey, it was 9 years ago, so give me a break.

The amazing thing to me is that the Reuters story results in a dead link: no forward, no search suggestion, nothing. The Paragon City Hall link, however, STILL WORKS, even though I shut the site down 8 years ago and it's no longer linked from the network.

What kind of world do we live in when RPGPlanet has better URL persistence than Reuters?

Although I came to this realization later in my career than I should have, content on the web is a social contract. Tim Berners-Lee made this case quite eloquently in an article I cite quite frequently, and while I might not expect Gamespy to understand it, Reuters of all agencies should get it. They should have gotten it perhaps even before TBL wrote about it in 1998.

A particular fear creeps over me when an agency like Reuters lets links expire like that, and it offends me as an adopted digital librarian. (They found me ...

Django AuthenticationForm For User Login

Django already makes it insanely easy to log a user in and out via its generic views. Engineers will often want to create their own login view for some flexibility, say an Ajax login or some other spin on the standard login. Django gives a number of examples for that as well, and as with most of the framework, it's a snap. A convenient feature that doesn't make it into many of the examples I've seen is the AuthenticationForm, a ready-made Django form with the logic to render a login form, validate input, raise errors if the user does something like forget to supply a password, and do the basic authentication check.

The form does all of that for you; all you really need to do in your view is read the user-submitted data, validate the form and take the final step of logging the user in.

This is just one of a group of 7 or so forms that provide all kinds of convenience features, like password changes and user registration. Not only do they give developers very easy access to common functions, but they can also be extended or subclassed like any other Python class to add or override functionality.

Here's an example of a simple view method using the AuthenticationForm. Something of a 'gotcha' for developers used to other Django forms is that the POST data is passed as the second argument to the form (or as the data keyword argument). The first argument is the request object, which is normally only passed so the form can check that the session test cookie worked. See the source for more info on the form.

from django.contrib.auth import login
from django.contrib.auth.forms import AuthenticationForm
from django.shortcuts import render
from django.http import HttpResponseRedirect
from django.core.urlresolvers ...
Django Template Tag for Gravatar Images

Gravatar images seem to be growing in popularity across a number of sites, and the service already makes it incredibly simple to grab a profile picture via URL. The Gravatar site itself has a number of examples of how to grab an image from the service, as well as more detailed examples of retrieving more profile information.

They provide examples for grabbing an image via Python, and even a Django example that renders the image through a template node. For displays like this I generally prefer an inclusion tag, since the image markup then lives in a template rather than my having to build it each time on my own.

The template tag itself is just:

from django import template
import hashlib

try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

# Use Django's settings object rather than importing a project module directly.
from django.conf import settings

# Provide Default settings so users only need to provide them in settings.py if they want to override.
GRAVATAR_BASEURL = getattr(settings, "GRAVATAR_BASEURL", "http://www.gravatar.com/avatar/")
GRAVATAR_DEFAULT_IMAGE = getattr(settings, "GRAVATAR_DEFAULT_IMAGE", "")
GRAVATAR_SIZE = getattr(settings, "GRAVATAR_SIZE", 40)

register = template.Library()

def gravatar_url(email, size):
    """
    Builds the template context for a Gravatar image URL based on the provided email.

    :param email: Email address to query for a gravatar image.
    :param size:  Size to request and render the image in pixels.
    """

    attrs = {
        'd': GRAVATAR_DEFAULT_IMAGE,
        's': size
    }

    # Gravatar hashes the trimmed, lower-cased email address with MD5.
    email_hash = hashlib.md5(email.strip().lower().encode('utf-8')).hexdigest()
    url = "%s%s?%s" % (GRAVATAR_BASEURL, email_hash, urlencode(attrs))

    return {'gravatar': {'url': url, 'size': size}}

@register.inclusion_tag('account/snippets/gravitar.xhtml')
def gravatar_for_email(email, size=GRAVATAR_SIZE):
    """
    Renders a gravatar image for user with the specified email via a template.

    {% gravatar_for_email "user@email.com" 40 %}

    :param email:  String representing the users email.
    :param size: Size of gravatar to use in pixels.  OPTIONAL
    
    """
    email = "%s" % email
    size = int(size)
    return gravatar_url(email, size)

This approach also has the advantage of being extensible, and it's easy enough to build additional ...

Gaming Wiki Back Online

I brought my Gaming Wiki back online here on the site after several months of it being down. I apologize for that, and I don't have any better excuse than not really taking the time to do it. I had some difficulties with my previous web host, and all I was ever able to get was a *.tar.gz download of the wiki database; the service would time out every time I tried to download a gzipped directory of the MediaWiki install itself. I kept hoping I'd find a backup copy on a CD somewhere, but no joy.

So it languished in 404 hell for a bit while I came to terms with the fact that I'd have to take the DB dump from an unknown older version of MediaWiki and try to upgrade it to work with a modern download.  

All in all I have to give it to the MediaWiki folks: I was essentially able to just run the update scripts and only had to make a few settings changes that took a bit of looking up. More or less, the whole thing came back up.

Because I couldn't get a backup of the files stored in my wiki, though, some of the file links and thumbnails won't work until I come up with a plan to rebuild them.

Thanks to everyone for your patience.

SemTech 2011 Redux

The SemTech 2011 Conference delivered a lot to the attendees and I thought I'd jot down a few of my thoughts and note some highlights as the conference draws to a close.

By and large I have to say that the technology has definitely arrived, and we're capable of some exciting advances in linking data and having the web begin to fulfill some of its promise as a real knowledge base. I just hope Skynet appreciates all the work we're doing on its behalf when it finally becomes self-aware.

What had the structured data crowd buzzing the most was last week's announcement of Microdata format support by Google, Microsoft and Yahoo at schema.org. Annoyance at Microsoft trying to put up schema.org as if it were some small independent standards board aside, the Microdata format seems just fine to me. Essentially a competitor to RDFa, it targets easy markup of information in a web page and is a bit leaner and easier than the current RDFa 1.0 standard. The crowd here being a bit biased toward RDFa, there wasn't a lot of positive talk about Schema.org, but I find I can't really care too much one way or the other. What we need to develop is a community of practice, and the technology should be secondary to that as long as it's not a barrier. To me Microdata and RDFa are both fine standards, and the only logical argument I would make for preferring one over the other is that Schema.org's aim is to mark up information for better searching, while RDFa is aimed at marking up knowledge. It may seem a subtle difference, but misaligned motivations like this can be the cause of ...

San Fran Pic-So

The conference ended by mid-day here, so I decided to take an open-top bus tour around San Francisco. Great experience, and you could get on and off all day, so I got to see more of the city in 4 hours than I have on most of my previous trips. I posted pics to Picasa Web, linked below.

Dead Simple Python Calls to Open Calais API

I was amazed at how easy Open Calais makes it for anyone to call its API via REST and get back suggested tags and entity recognition for any text. The native Python libraries urllib(2) and httplib provide some effective methods for connecting and making simple REST calls to the Calais Web Services API, but the httplib2 library makes it easier still.

Start off by installing httplib2 via pip

pip install httplib2

From there you just need to get an API key at the Calais site, set some headers, define a bit of text you want to pass to the API for tagging and entity recognition and then reap the benefit.

You can see this in the simple code snippet below...

import httplib2
import json

# Some local values needed for the call
LOCAL_API_KEY = 'PUT_YOUR_KEY_HERE' # Acquire this by registering at the Calais site
CALAIS_TAG_API = 'http://api.opencalais.com/tag/rs/enrich'

# Some sample text from a news story to pass to Calais for analysis
test_body = """
Some huge announcements were made at Apple's Worldwide Developer's Conference Monday, including the new mobile operating system iOS 5, PC software OS X Lion, and the unveiling of the iCloud.
"""

# Header information needed by Calais.
# For more info see http://www.opencalais.com/documentation/calais-web-service-api/api-invocation/rest
headers = {
    'x-calais-licenseID': LOCAL_API_KEY,
    'content-type': 'text/raw',
    'accept': 'application/json',
}

# Create your http object
http = httplib2.Http()
# Make the http post request, passing the body and headers as needed.
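response, content = http.request(CALAIS_TAG_API, 'POST', headers=headers, body=test_body)
# Fail loudly rather than trying to parse an error page as JSON.
if response.status != 200:
    raise RuntimeError("Calais request failed with HTTP status %s" % response.status)

jcontent = json.loads(content) # Parse the json return into a python dict
print(json.dumps(jcontent, indent=4)) # Pretty print the resulting dictionary returned.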

The server itself parses the body sent as part of the HTTP request and returns a JSON string with the results in this example because ...
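Once you have the parsed dictionary, you'll usually want to pull out just the recognized entities. The sketch below groups entity names by type. It rests on an assumption about the response layout that you should check against a real response: that each top-level value is a dictionary, and that entities are marked with "_typeGroup": "entities" alongside "_type" and "name" keys. The function name entities_by_type is hypothetical.

```python
import json


def entities_by_type(calais_response):
    """Group entity names by entity type.

    ASSUMPTION: entities appear as top-level values carrying
    "_typeGroup": "entities", "_type" and "name" keys.
    Accepts either the raw JSON text or an already-parsed dict.
    """
    if isinstance(calais_response, (str, bytes)):
        calais_response = json.loads(calais_response)
    grouped = {}
    for item in calais_response.values():
        # Skip document info, topics, social tags and anything else that is not an entity.
        if isinstance(item, dict) and item.get("_typeGroup") == "entities":
            grouped.setdefault(item.get("_type", "Unknown"), set()).add(item.get("name"))
    return grouped
```

Called on the parsed result from the snippet above, as entities_by_type(jcontent), this gives a dictionary keyed by entity type with sets of names as values; using sets means an entity mentioned several times is only listed once.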

SemTech 2011 - O'Reilly on RDF in eBooks

Instead of a flood of tweets, I thought I'd go a bit old school and do some live blogging from this morning's SemTech 2011 session Discovering and Using RDF for Books at O'Reilly Media. My own interest in this session is how we might apply this to texts coming from our local repository, in particular our Yellowbacks Project, which we hope to enhance soon. We also have a body of texts sitting on our servers in TEI format, and we haven't landed on a way to comfortably leverage that in our infrastructure. My own comments here appear in parentheses (like so).

O'Reilly took their first stab at modeling information about their books in straight XML, in a bit of a "tag soup" approach. This proved way too heavyweight for them, and they ended up being late in delivering products because of the time it took to modify and extend their XML approach. They then moved on to ONIX as an internal format, but it was old, and writing XPath was a bit nightmarish because of the standards drift involved, among other reasons. In the end it was just not extensible and not friendly toward being agile. That led them to take a stab at creating their own schema, which also proved too heavyweight and slow. At last they washed up on the shores of Dublin Core, specifically DC Terms, and this introduced them to the world of RDF.

The extensibility of RDF, starting with DC, seemed pretty cool and useful to them, and they kept adding FOAF, BIBLIO and more. This was more useful for the company, but the problem at the end of the day was that they were still thinking in XML terms. (Implying they should have been thinking in RDF and triples terms instead ...
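As a rough illustration of the extensibility point, and not a reconstruction of O'Reilly's actual model: in RDF, pulling in another vocabulary is just a matter of using its predicates, with no schema change. The stdlib sketch below serializes a few triples that mix DC Terms and FOAF as N-Triples. The example.org URIs, the literal values and the helper names (IRI, term, to_ntriples) are hypothetical.

```python
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"


class IRI(str):
    """Marks a string as an IRI rather than a plain literal."""


def term(value):
    # IRIs go in angle brackets; everything else becomes an escaped string literal.
    if isinstance(value, IRI):
        return "<%s>" % value
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return '"%s"' % escaped


def to_ntriples(triples):
    # One "subject predicate object ." statement per line.
    return "".join("%s %s %s .\n" % (term(s), term(p), term(o)) for s, p, o in triples)


book = IRI("http://example.org/book/1")
author = IRI("http://example.org/person/1")
triples = [
    (book, IRI(DCTERMS + "title"), "An Example Book"),
    (book, IRI(DCTERMS + "creator"), author),
    # A second vocabulary (FOAF) slots in without touching the DC statements.
    (author, IRI(FOAF + "name"), "A. Writer"),
]
print(to_ntriples(triples), end="")
```

Adding, say, a BIBLIO property later would just be one more triple, which is the kind of flexibility that thinking in XML document terms tends to obscure.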

Some Antics for the Week

I'm off this week to the SemTech 2011 conference in San Francisco, so content may be a bit light. I hope to have some interesting things to say when I come back, once I clear away the fog of depression from being unhappy with my information architecture, my service architecture and, 'no doubt', feeling like my content is worthless because it isn't backed by OWL.

Sigh.

On a side note, I could use a bit of a break from the "I've invented the intelligent web" quotes from just about every CEO or manager I meet here. I wish them luck, of course, and I love to see competition in the field, but this is a tech conference (mostly), not really a marketing one, so people are going to need more information than that.

Meanwhile, meeting up with some folks from Yale, the Mayo Clinic and the Library of Congress has proven very interesting. They're doing some great stuff that I think has some application for us. I'm looking forward to getting back.

Setting up LAMP on Ubuntu 11.04 (Natty) Desktop Edition

The default download for Ubuntu 11.04 (Natty) is the Desktop edition, which doesn't come with the LAMP server stack installed. Fortunately, setting up LAMP is almost as easy as installing Ubuntu itself these days. You could install each package separately via apt-get, but the most convenient method is to use the Tasksel package to install and configure LAMP all at once.

Install it from the terminal by typing:

$ sudo apt-get install tasksel

and when it completes, launch it from the terminal by typing:

$ sudo tasksel

The basic terminal GUI lets you select from a number of package installs; use the arrow keys to move through the list and find the LAMP Server entry. Hit the spacebar to select it, then Tab to highlight [ok], and hit Return. The installer should walk you through setup options as needed.

There are a few things you may want to look at after the install; for example, the Apache install defaults to allowing directory indexing, and many people turn this off.

MySQL enthusiasts may also wish to install phpMyAdmin, which is not installed as part of the LAMP package. For that, just type...

$ sudo apt-get install phpmyadmin

And there you go!

X-Men: First Class Review

Overall I thought X-Men: First Class was just kind of okay: several fun moments separated by some overstylized and heavy-handed storytelling. I thought I'd share a few spoiler-free thoughts.

I really have to blame the directors and producers for this, as the cast is pretty top notch, but most of the performances were over the top. Even so, each of them managed to squeeze in a couple of real or fun moments for their characters. The stars by far had to be the dorm room X-Men, who were fun throughout most of the movie. McAvoy was consistently good as Xavier, but Magneto was just as consistently cardboard and dry, while Sebastian Shaw came off as a massive parody of an already over-the-top character from the comics.

Overall it felt like the movie was trying to conjure up a Bond movie feel, but more often than not it seemed a little too Austin Powers to me.

What was consistently good were the special effects, and at times they were downright fantastic. Banshee's flying scene was really great; it felt more like watching someone snap back up from a bungee jump and didn't feel CGI at all. Jennifer Lawrence is great, but the slavish devotion to tying her character into the later movies made Mystique seem a bit cartoonish next to the other younger X-Men. I could have used more Mystique/Beast throughout and less Xavier/Magneto bromance.

Overall I have to reiterate that the movie had a few good moments but was a miss. After movies like Thor, Iron Man 1 and 2, and the latest Hulk, it seemed Marvel had wised up and put movies like Wolverine behind them, so this movie just seemed a step backward for the studio ...

WordPress to Django: Designing Compatible URLs in urls.py

As I mentioned in my previous post, there are a few fairly easy strategies for maintaining stable URLs for your content when migrating from WordPress to a local Django-driven blog.

Django allows you a high level of control over URL formats, so it's fairly simple to design them to be compatible with WordPress URLs. Additionally, WordPress has been around long enough that its standard URL rewrite formats follow suggested best practices for content, so bringing your Django URLs into alignment with them is not only useful for migrating content but good practice overall.

That said, there are two common formats for WordPress URLs. The most common is:

http://<domain>/<4 digit year>/<1 or 2 digit month>/<1 or 2 digit day>/<slug>/

so for example the URL for the previous post linked above is...

http://www.flagonwiththedragon.com/2011/06/01/wordpress-to-django-strategies-dealing-with-WordPress-querystring-urls/

The next most common format for URLs is similar and differs mostly in how months are abbreviated:

http://<domain>/<4 digit year>/<3 char month>/<1 or 2 digit day>/<slug>/

So an example of the same URL above in this format would be...

http://www.flagonwiththedragon.com/2011/jun/01/wordpress-to-django-strategies-dealing-with-WordPress-querystring-urls/

Designing urls.py in Django to accommodate this is simply:

    # URL format where the month is an abbreviated, three-character name.
    url(r'^(?P<year>\d{4})/(?P<month>\w{3})/(?P<day>\d{1,2})/(?P<slug>[0-9A-Za-z-]+)/$', 'post_detail_alt'),
    url(r'^(?P<year>\d{4})/(?P<month>\w{3})/(?P<day>\d{1,2})/$', 'post_day_alt'),
    url(r'^(?P<year>\d{4})/(?P<month>\w{3})/$', 'post_month_alt'),
    # URL format where the month is one or two digits.
    url(r'^(?P<year>\d{4})/(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<slug>[0-9A-Za-z-]+)/$', 'post_detail', name='post-detail'),
    url(r'^(?P<year>\d{4})/(?P<month>\d{1,2})/(?P<day>\d{1,2})/$', 'post_day', name='list-day'),
    url(r'^(?P<year>\d ...
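To sanity-check patterns like these outside Django, here is a small stdlib sketch that applies both URL styles to the example paths and normalizes the month to a number. It assumes the capture groups are named year, month, day and slug, and parse_post_path is a hypothetical helper, not part of Django or the original urls.py. Django matches URL patterns against the path without its leading slash, so the sketch does too.

```python
import calendar
import re

# Patterns mirroring the urls.py entries; the group names are assumptions.
NUMERIC_MONTH = re.compile(
    r'^(?P<year>\d{4})/(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<slug>[0-9A-Za-z-]+)/$')
ABBREVIATED_MONTH = re.compile(
    r'^(?P<year>\d{4})/(?P<month>\w{3})/(?P<day>\d{1,2})/(?P<slug>[0-9A-Za-z-]+)/$')

# Maps "jan" -> 1 ... "dec" -> 12 (English names under the default C locale).
MONTHS = {name.lower(): number for number, name in enumerate(calendar.month_abbr) if name}


def parse_post_path(path):
    """Return (year, month, day, slug) for either WordPress URL style, else None.

    ``path`` is the URL path without its leading slash.
    """
    for pattern in (NUMERIC_MONTH, ABBREVIATED_MONTH):
        match = pattern.match(path)
        if match is None:
            continue
        month = match.group('month')
        month_number = int(month) if month.isdigit() else MONTHS.get(month.lower())
        if month_number is None:
            return None  # e.g. "xyz" is not a month abbreviation
        return (int(match.group('year')), month_number,
                int(match.group('day')), match.group('slug'))
    return None
```

Normalizing the month this way would let the numeric and abbreviated views share a single post lookup, though that consolidation is a design choice rather than something the original code does.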
Using the Django-Pagination app in Django 1.3

Like many Djangonauts, I use django-pagination as my primary means of paging results on list pages. Between the original Google project page (1.0.5 is the last downloadable version), what appears to be the same project migrated to GitHub, and the project's PyPI page (1.0.7 is the pip install version), things can get confusing.

I'm probably the only person confused by this, but it appears that the GitHub site is the most up to date and is in sync with the pip install version. Life signs overall are dubious on the project, though, with no updates since early 2010 and several (what seem to be) reasonable pull requests sitting in the project queue.

One gotcha I wanted to point out in the PyPI readme file is the directions for TEMPLATE_CONTEXT_PROCESSORS:

According to the project documentation, in settings.py you should set:

TEMPLATE_CONTEXT_PROCESSORS= (
   "django.core.context_processors.auth",
   "django.core.context_processors.debug",
   "django.core.context_processors.i18n",
   "django.core.context_processors.media",
   "django.core.context_processors.request"
)

This can cause some problems in Django 1.3, however, since the default TEMPLATE_CONTEXT_PROCESSORS have changed, in particular to support the new features for serving static media.

So to include the request context processor that pagination needs while keeping the default template context processors, you should instead set:

TEMPLATE_CONTEXT_PROCESSORS = (
    "django.contrib.auth.context_processors.auth",
    "django.core.context_processors.debug",
    "django.core.context_processors.i18n",
    "django.core.context_processors.media",
    "django.core.context_processors.static",
    "django.contrib.messages.context_processors.messages",
    "django.core.context_processors.request",
)

Alternatively, if you just want to extend the default template context processors with the one django-pagination needs, you could simply:

from django.conf.global_settings import TEMPLATE_CONTEXT_PROCESSORS

TEMPLATE_CONTEXT_PROCESSORS = TEMPLATE_CONTEXT_PROCESSORS + ("django.core.context_processors.request",)

Random Library Art

One of the patrons in our library left this drawing up on one of the writable wall spaces.


It's a great piece, really, and the proportions of the figure are particularly good; it manages to convey a sense of perspective for such a simple drawing.

The saddest part of the whole thing though is the red line across the chest.  Seems like this was just a simple sketch of a girl sitting by a tree but someone was so threatened by breasts that they had to erase them and draw that red line across the space.  Perhaps that was part of the original art and a statement but I kind of doubt it. Sad really.

WordPress to Django: Strategies Dealing with WordPress Querystring URLs

Stable URLs are the foundation of valuable information on the web. As Tim Berners-Lee eloquently put it, "Cool URIs Don't Change," and I thought I'd address a few strategies and code I'm using in MetaRho for maintaining stable URLs for content migrating from WordPress.

WordPress Querystring URLs

By default WordPress uses querystrings for accessing content, passing the internal ID number of the post through the 'p' parameter, like so...

http://<domain name>/index.php?p=<post id>

So, for example, calling post ID 1 on mydomain.com would look like:

http://www.mydomain.com/index.php?p=1

Since best practice for Cool URIs, and in Django, is to use real URLs instead of querystrings, this presents a small problem. The easiest solution I found is to implement a decorator in Django that I put on the default index view to watch for incoming WordPress querystring URLs and query an extra field on the blog post model that holds the original WordPress ID number. Note that I do NOT try to preserve ID numbers across the migration, as the better practice is to keep these opaque to the user.

I use the common strategy of keeping a one-to-many model related to my posts to hold key-value pairs of extra data. In my implementation I call that model PostMeta, and when I import content I just store the original WordPress ID number under the key 'wp_post_id'.
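The core of that decorator is a lookup from the 'p' querystring value to the post's new URL. Here is a minimal, framework-agnostic sketch of just that step using urllib.parse; the lookup dictionary is a hypothetical stand-in for the PostMeta query on 'wp_post_id', and wp_redirect_target is an illustrative name rather than the MetaRho code.

```python
from urllib.parse import parse_qs, urlsplit


def wp_redirect_target(url, lookup):
    """Return the new URL for a WordPress "?p=<id>" request, or None.

    ``lookup`` is a hypothetical mapping from original WordPress post IDs
    (as strings, the way they arrive in the querystring) to new post URLs.
    In the real decorator this would be a PostMeta query on 'wp_post_id'.
    """
    post_ids = parse_qs(urlsplit(url).query).get("p")
    if not post_ids:
        return None  # not a WordPress querystring request
    return lookup.get(post_ids[0])
```

Inside a Django decorator, a non-None result would typically be returned as an HttpResponsePermanentRedirect (a 301) so search engines update their links, while None falls through to the wrapped view.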

(view code on github)

def wp_post_redirect(view_fn):
    '''
    Checks a request for a querystring item matching a WordPress 
    post request.
    
    This enables URL redirects for blog migrations from WordPress.
    To use, just decorate the view method for your default blog location.
    
    '''
    def decorator(request, *args, **kwargs):
        wp_query = request.GET.get('p', None)
        if wp_query:
            try:
                post = Post.objects ...