Django Libraries for XML and eXist DB

We use XML heavily at Academic Libraries, so we decided to build a central set of libraries that ease connecting our XML and repository-based work to the Django framework.  We'll be continuing to build these libraries out and recently released the code as open source projects on GitHub.

EULxml provides XPath parsing in Python, mappings from XML documents to Pythonic objects, and support for generating Django Forms from simple XML objects.  The code is available on GitHub, with some documentation and examples up on Read the Docs.

EULexistdb provides connections and XQuery capability for eXist DB, along with Django QuerySet-like objects for rich interaction between Django and XML data stored in eXist DB.  Combined with the XML Django Forms from EULxml (on which it depends), it has enabled us to do a lot with our Library collection.  This library is also available on GitHub, with some documentation and examples up on Read the Docs.

We're excited at the possibilities of leveraging the power of Django with our XML databases and repositories.   We're open sourcing it in hopes others may find it useful and may want to contribute to the libraries as well.

Django AuthenticationForm For User Login

Django already makes it insanely easy to log a user in and out via its generic views.  Engineers will often want to create their own login view to provide some flexibility, say an Ajax login or another spin on the standard login.  A number of examples are given in Django for that as well, and as with most of the framework this is a snap too.  A convenient feature of Django that doesn't make it into many of the examples I've seen is the AuthenticationForm, a ready-made Django form with the associated logic to render a login form, validate input, raise errors if the user does something like forget to supply a password, and do the basic authentication check.

The form provides that all for you and all you really need to do in your view is read the user submitted data, validate the form and take the final step of logging the user in.

This is just one form in a group of 7 or so that provide all kinds of convenience features like Password Changes and User Registration.  Not only do they give a developer very easy access to common functions, but they can be extended or subclassed like any other Python class to add or override functionality.

Here's an example of a simple view method using the AuthenticationForm.  Something of a 'gotcha' for developers who normally use Django forms is that the POST values are passed as the second argument to the form, not the first.  The request object can be passed as the first argument, but that is normally only done to check for authentication cookies.  See the source for more info on the form.

from django.contrib.auth import login
from django.contrib.auth.forms import AuthenticationForm
from django.shortcuts import render
from django.http import HttpResponseRedirect
from django.core.urlresolvers import reverse

def authenticate_user(request):
    """Logs a user into the application."""

    if request.user.is_authenticated():
        return HttpResponseRedirect(reverse('account:index'))

    # Initialize the form either fresh or with the appropriate POST data as the instance
    auth_form = AuthenticationForm(None, request.POST or None)

    # Ye Olde next param so common in login.
    # I send them to their default profile view.
    nextpage = request.GET.get('next', reverse('account:index'))

    # The form itself handles authentication and checks that the password and such are supplied.
    if auth_form.is_valid():
        login(request, auth_form.get_user())
        return HttpResponseRedirect(nextpage)

    return render(request, 'account/login.xhtml', {
        'auth_form': auth_form,
        'title': 'User Login',
        'next': nextpage,
    })
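
To see why the argument order matters, here is a framework-free sketch.  SketchAuthForm is a hypothetical stand-in that only mimics the shape of the constructor (request first, then data); it is not Django's class and does no real authentication.

```python
class SketchAuthForm(object):
    """Hypothetical stand-in mimicking a (request, data) constructor order."""

    def __init__(self, request=None, data=None):
        self.request = request
        self.data = data
        # A form only validates input once it has been given data.
        self.is_bound = data is not None


post_data = {'username': 'alice', 'password': 'secret'}

# Habit from ordinary forms: data as the first positional argument.
wrong = SketchAuthForm(post_data)
# What the login form expects: request (or None) first, data second.
right = SketchAuthForm(None, post_data)

print(wrong.is_bound)  # the data landed in the request slot
print(right.is_bound)
```

With the data in the wrong slot the form never sees any input, so it is never validated and the login quietly does nothing.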

The associated template code for this renders the form and any errors if needed.  Note that the form may have individual field errors in the case of a blank username or password, and raises a ValidationError if the credentials provided were invalid; that error is displayed to the user via the 'non_field_errors' attribute if present.

{% if auth_form.non_field_errors %}
    {% block message %}
        <div id="error_msg">
        {{ auth_form.non_field_errors }}
        </div>
    {% endblock %}
{% endif %}

{% block content-body %}
    <form action="{% url account:login-form %}" method="POST">
    {% csrf_token %}
        {% for field in auth_form %}
            <div class="fieldWrapper">
                {{ field.errors }}
                {{ field.label_tag }} {{ field }}
            </div>
        {% endfor %}
    <input type="submit" value="Login" />
    <input type="hidden" name="next" value="{{ next }}" />
    </form>
    {{ ct }}
{% endblock %}

Dead Simple Python Calls to Open Calais API

I was amazed at how easy Open Calais makes it for anyone to call its API via REST and get back suggested tags and entity recognition for any text.  The native Python libraries urllib(2) and httplib provide some effective methods for connecting and making simple REST calls to the Calais Web Services API, but the httplib2 library makes it easier still.

Start off by installing httplib2 via pip

pip install httplib2

From there you just need to get an API key at the Calais site, set some headers, define a bit of text you want to pass to the API for tagging and entity recognition and then reap the benefit.

You can see this in the simple code snippet below…

import httplib2
import json

# Some local values needed for the call
LOCAL_API_KEY = 'PUT_YOUR_KEY_HERE' # Acquire this by registering at the Calais site
CALAIS_TAG_API = 'PUT_CALAIS_API_URL_HERE' # Placeholder: the endpoint URL is given in the Calais API documentation

# Some sample text from a news story to pass to Calais for analysis
test_body = """
Some huge announcements were made at Apple's Worldwide Developer's Conference Monday, including the new mobile operating system iOS 5, PC software OS X Lion, and the unveiling of the iCloud.
"""

# Header information needed by Calais.
# For more info see the Calais API documentation.
headers = {
    'x-calais-licenseID': LOCAL_API_KEY,
    'content-type': 'text/raw',
    'accept': 'application/json',
}

# Create your http object
http = httplib2.Http()
# Make the http post request, passing the body and headers as needed.
response, content = http.request(CALAIS_TAG_API, 'POST', headers=headers, body=test_body)

jcontent = json.loads(content) # Parse the json return into a python dict
print(json.dumps(jcontent, indent=4)) # Pretty print the resulting dictionary returned.

The server parses the body sent as part of the HTTP request and returns a JSON string with the results; it is JSON in this example because that is the format I requested in the 'accept' header.  The API accepts a number of input formats and returns a number as well; see the API documentation for more information.

The takeaway here is how simple it is to make a call to the Calais API, and it wouldn't take much more to expand this into something useful in your own Python application.
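
As one way to take that next step, the sketch below groups the items of a parsed response by their '_typeGroup' value.  The sample dictionary is a trimmed, hand-written stand-in shaped like a Calais response, and the keys 'item-1' and 'item-2' are hypothetical placeholders for the identifiers Calais returns.

```python
import json
from collections import defaultdict

# Hand-written stand-in for a parsed response; the item keys are hypothetical.
sample = json.loads("""
{
    "doc": {"info": {"docTitle": ""}, "meta": {"language": "English"}},
    "item-1": {"_typeGroup": "entities", "_type": "OperatingSystem",
               "name": "Mac OS X", "relevance": 0.714},
    "item-2": {"_typeGroup": "topics", "categoryName": "Technology_Internet",
               "score": 1}
}
""")


def group_by_type(parsed):
    """Collect response items by '_typeGroup', skipping the 'doc' metadata."""
    groups = defaultdict(list)
    for key, item in parsed.items():
        if key == 'doc' or '_typeGroup' not in item:
            continue
        groups[item['_typeGroup']].append(item)
    return groups


groups = group_by_type(sample)
entity_names = [item['name'] for item in groups['entities']]
topic_names = [item['categoryName'] for item in groups['topics']]
print(entity_names)
print(topic_names)
```

From here the entity names could be stored as suggested tags for the text that was submitted.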

For reference and completeness, here is the output of the code above.  Enjoy.

{
    "doc": {
        "info": {
            "docId": "",
            "docDate": "2011-06-08 14:14:46.594",
            "docTitle": "",
            "document": "Some huge announcements were made at Apple's Worldwide Developer's Conference Monday, including the new mobile operating system  iOS 5 , PC software OS X Lion, and the unveiling of the iCloud.",
            "calaisRequestID": "4306b4a8-cc7f-7c04-1307-076b7d5f8d35",
            "id": ""
        },
        "meta": {
            "submitterCode": "8fba6b3e-fef5-76ec-d7dc-ec60686110a4",
            "contentType": "text/html",
            "language": "English",
            "emVer": "7.1.1103.5",
            "messages": [],
            "processingVer": "CalaisJob01",
            "submitionDate": "2011-06-08 14:14:46.485",
            "signature": "digestalg-1|N0M3Ia9fmkexMBwN7kSL4thKM4g=|f8uTykbIPicGbu6y0962n658qv1PwewuM5jh5Gs0hJ79dC+vpurpmA==",
            "langIdVer": "DefaultLangId"
        }
    },
    "": {
        "_typeReference": "",
        "_type": "OperatingSystem",
        "name": "Mac OS X",
        "_typeGroup": "entities",
        "instances": [
            {
                "suffix": " Lion, and the unveiling of the",
                "prefix": " new mobile operating system  iOS 5 , PC software ",
                "detection": "[ new mobile operating system  iOS 5 , PC software ]OS X[ Lion, and the unveiling of the]",
                "length": 4,
                "offset": 159,
                "exact": "OS X"
            }
        ],
        "relevance": 0.714
    },
    "": {
        "category": "",
        "score": 1,
        "classifierName": "Calais",
        "categoryName": "Technology_Internet",
        "_typeGroup": "topics"
    },
    "": {
        "_typeReference": "",
        "_type": "Technology",
        "name": "operating system",
        "_typeGroup": "entities",
        "instances": [
            {
                "suffix": "  iOS 5 , PC software OS X Lion, and the",
                "prefix": "Conference Monday, including the new mobile ",
                "detection": "[Conference Monday, including the new mobile ]operating system[  iOS 5 , PC software OS X Lion, and the]",
                "length": 16,
                "offset": 121,
                "exact": "operating system"
            }
        ],
        "relevance": 0.714
    }
}

WordPress to Django: Designing Compatible URLs

As I mentioned in my previous post, there are a few fairly easy strategies for maintaining the stable URLs for your content when migrating from WordPress to a local Django driven blog.

Django allows you a high level of control over URL formats so it's fairly simple to design them to be compatible with WordPress URLs.  Additionally WordPress has been around long enough that the standard URL re-write formats follow suggested best practices for content, so bringing your Django URLs in alignment with that is not only useful for migrating content but good practice overall.

That said the two most common formats for URLs in WordPress are:

http://<domain>/<4 digit year>/<1 or 2 digit month>/<1 or 2 digit day>/<slug>/

so for example the URL for the previous post linked above is…

The next most common format for URLs is similar and differs mostly in how months are abbreviated:

http://<domain>/<4 digit year>/<3 char month>/<1 or 2 digit day>/<slug>/

So an example of the same URL above in this format would be…

Designing URL patterns in Django to accommodate this is simple:

    # URL format where the month is a three character abbreviation.
    url(r'^(?P<year>\d{4})/(?P<month>\w{3})/(?P<day>\d{1,2})/(?P<slug>[0-9A-Za-z-]+)/$', 'post_detail_alt'),
    url(r'^(?P<year>\d{4})/(?P<month>\w{3})/(?P<day>\d{1,2})/$', 'post_day_alt'),
    url(r'^(?P<year>\d{4})/(?P<month>\w{3})/$', 'post_month_alt'),
    # URL format where month is either one or two digits.
    url(r'^(?P<year>\d{4})/(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<slug>[0-9A-Za-z-]+)/$', 'post_detail', name='post-detail'),
    url(r'^(?P<year>\d{4})/(?P<month>\d{1,2})/(?P<day>\d{1,2})/$', 'post_day', name='list-day'),
    url(r'^(?P<year>\d{4})/(?P<month>\d{1,2})/$', 'post_month', name='list-month'),
    url(r'^(?P<year>\d{4})/$', 'post_year', name='list-year'),

I provide a bit more here than needed, as I also include post lists for when only the date portion of the URL is used, but the pattern should be obvious.
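
A quick way to convince yourself that the two detail formats do not collide is to run the regular expressions outside Django.  This is only a sketch: it assumes the named groups year, month, day and slug that the views take as arguments, and the sample paths are made up.

```python
import re

# The two detail expressions, compiled directly with the re module.
DETAIL_ALT = re.compile(
    r'^(?P<year>\d{4})/(?P<month>\w{3})/(?P<day>\d{1,2})/(?P<slug>[0-9A-Za-z-]+)/$')
DETAIL = re.compile(
    r'^(?P<year>\d{4})/(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<slug>[0-9A-Za-z-]+)/$')

# Hypothetical paths, without the leading slash (as Django matches them).
digit_match = DETAIL.match('2011/06/08/my-first-post/')
abbr_match = DETAIL_ALT.match('2011/jun/8/my-first-post/')
# A two-digit month is too short for \w{3}, so the alt pattern rejects it.
collision = DETAIL_ALT.match('2011/06/08/my-first-post/')

print(digit_match.groupdict())
print(abbr_match.groupdict())
print(collision)
```

Note that the captured values are always strings, which is why the views convert them before building a date.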

My unscientific opinion is that using digits for all date parts in a URL is better practice, and it is the format I see most often, so I suggest using that in your own site.  This format is also advantageous for your Django views, because casting the date pieces with int() means values like '01' and '1' are evaluated in exactly the same way.  The post detail view based on all digits then would simply be:

# Test conversion to number based dates
def post_detail(request, year, month, day, slug):
    """Returns an individual post."""
    # int() treats '01' and '1' identically, so both URL spellings work.
    date = datetime.date(int(year), int(month), int(day))
    try:
        post = Post.objects.published().get(slug=slug, pub_date__year=date.year,
            pub_date__month=date.month, pub_date__day=date.day)
    except Post.DoesNotExist:
        raise Http404

    return render(request, 'blog/post_detail.xhtml', {
            'post': post,
            'title': post.title,
    })

The view to handle URLs where months use character abbreviations is very similar and just needs a bit more to parse the date format like so…

# Detail Views
def post_detail_alt(request, year, month, day, slug):
    """
    Returns an individual post. Alternate arguments for compatibility with temporary URL pattern
    """
    # %b parses the abbreviated month name, e.g. 'Jun'.
    tt = time.strptime('-'.join([year, month, day]), '%Y-%b-%d')
    date = datetime.date(*tt[:3])
    try:
        post = Post.objects.published().get(slug=slug, pub_date__year=date.year,
            pub_date__month=date.month, pub_date__day=date.day)
    except Post.DoesNotExist:
        raise Http404
    return render(request, 'blog/post_detail.xhtml', {
            'post': post,
            'title': post.title,
    })

The above alt method is rather sloppy on my part really and violates DRY principles so at some future date I could refactor the alt method to be more lean like so…

def post_detail_alt(request, year, month, day, slug):
    """
    Returns an individual post. Alternate arguments for compatibility with temporary URL pattern
    """
    tt = time.strptime('-'.join([year, month, day]), '%Y-%b-%d')
    date = datetime.date(*tt[:3])
    # Delegate to the digit-based view so the lookup logic lives in one place.
    return post_detail(request, date.year, date.month, date.day, slug)

That's really all you need.  If you're migrating content from WordPress and used those URL formats all your content should map.  If you're just starting out with your blog the concepts here are sound for starting right from the beginning.
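
The date handling in both views can be tried on its own with nothing but the standard library.  In this sketch parse_wp_date is a hypothetical helper name, and the '%b' parsing assumes an English (or C) locale for the month abbreviations.

```python
import datetime
import time


def parse_wp_date(year, month, day):
    """Turn URL pieces like ('2011', 'Jun', '8') into a datetime.date."""
    tt = time.strptime('-'.join([year, month, day]), '%Y-%b-%d')
    return datetime.date(*tt[:3])


# Abbreviated-month URL pieces and all-digit URL pieces for the same day.
abbr = parse_wp_date('2011', 'Jun', '8')
digits = datetime.date(int('2011'), int('06'), int('08'))

print(abbr)
print(abbr == digits)
```

Both URL styles end up as the same date object, which is what lets the alternate view hand off to the digit-based one.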

Using the Django-Pagination app in Django 1.3

Like many Djangonauts, I use django-pagination as my primary means to page results on list pages.  Between the original Google project page (1.0.5 is the last downloadable version), what appears to be the same project migrated to GitHub, and the PyPI site for the project (1.0.7 is the pip install version), things can get confusing.

I'm probably the only person confused by this, but it appears that the GitHub site is the most up to date and is in sync with the pip install version.  Life signs overall are dubious on the project though, with no updates since early 2010 and several (what seem to be) reasonable pull requests sitting in the project queue.

One gotcha I wanted to point out in the PyPI readme file is the directions for TEMPLATE_CONTEXT_PROCESSORS:

According to the project documentation, in your settings file you should set:


This can cause some problems in Django 1.3 however since the default TEMPLATE_CONTEXT_PROCESSORS have changed, in particular to support the new features for serving static media. 

So to include "django.core.context_processors.request" for pagination to work while keeping the default template context processors, you should instead set:


Alternatively, if you want to extend the default template context processors with just the one you need for django-pagination to work, you could simply use:

from django.conf.global_settings import TEMPLATE_CONTEXT_PROCESSORS
TEMPLATE_CONTEXT_PROCESSORS = TEMPLATE_CONTEXT_PROCESSORS + ("django.core.context_processors.request",)
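
For anyone who prefers the full list spelled out, here is a sketch of the same result written explicitly.  The default tuple below is my understanding of the Django 1.3 defaults and is an assumption on my part, so verify it against the settings reference for the version you actually run.

```python
# Defaults as I understand them for Django 1.3 (an assumption; check your version).
DEFAULT_CONTEXT_PROCESSORS = (
    "django.contrib.auth.context_processors.auth",
    "django.core.context_processors.debug",
    "django.core.context_processors.i18n",
    "django.core.context_processors.media",
    "django.core.context_processors.static",
    "django.contrib.messages.context_processors.messages",
)

# Append the one processor django-pagination needs.  The trailing comma
# makes the right-hand side a one-element tuple rather than a plain string.
TEMPLATE_CONTEXT_PROCESSORS = DEFAULT_CONTEXT_PROCESSORS + (
    "django.core.context_processors.request",
)

print(len(TEMPLATE_CONTEXT_PROCESSORS))
```

Either way, the static and messages processors that arrived with the newer defaults stay in place.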

WordPress to Django: Strategies Dealing with WordPress Querystring URLs

Stable URLs are the foundation of valuable information on the web.  As Tim Berners-Lee eloquently described it "Cool URIs Don't Change" and I thought I'd address a few strategies and code I'm using in MetaRho for maintaining stable URLs for content migrating from WordPress.

WordPress Querystring URLs

By default WordPress uses querystrings for accessing content, passing the internal ID number of the post through the 'p' attribute like so…

http://<domain name>/index.php?p=<post id>

So for example calling post id 1 on would look like.

Since best practice for Cool URIs and in Django is to use real URLs instead of Querystrings this presents a small problem.  The easiest solution I found is to simply implement a decorator in Django that I put on the default index view to watch for incoming WordPress querystring URLs and query some extra field on the model for blog posts that holds the original WordPress ID number.  Note I do NOT try to maintain ID numbers between posts as the better practice is to keep these opaque from the user.

I use the common strategy of keeping a one to many Model related to my posts to contain key value pairs for extra data.  In my implementation I call that Model PostMeta and when I import content I just store the original WordPress ID number under a key 'wp_post_id'.

(view code on github)

def wp_post_redirect(view_fn):
    """
    Checks a request for a querystring item matching a WordPress
    post request.
    This enables URL redirects for blog migrations from WordPress.
    To use just decorate the view method for your default blog location.
    """
    def decorator(request, *args, **kwargs):
        wp_query = request.GET.get('p', None)
        if wp_query:
            try:
                # 'postmeta__value' assumes PostMeta stores each pair in 'key' and 'value' fields.
                post = Post.objects.published().get(postmeta__key='wp_post_id',
                    postmeta__value=wp_query)
                ar = post.pub_date.strftime("%Y/%m/%d").split('/')
                ar.append(post.slug) # The post-detail URL also needs the slug.
                htr = HttpResponseRedirect(reverse('blog:post-detail', args=ar))
                htr.status_code = 301 # This should reflect a 'Moved Permanently' code.
                return htr
            except Post.DoesNotExist:
                raise Http404
        return view_fn(request, *args, **kwargs)

    return decorator

The decorator is fairly simple and just queries for posts with that meta key and redirects the user to the real URL with the proper status code or issues a 404 if no post is found.
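
The control flow is easier to see with Django stripped away.  The sketch below is not the code from the blog: WP_ID_TO_PATH is a hypothetical lookup table standing in for the PostMeta query, the "view" takes a raw query string, and responses are plain (status, value) tuples.

```python
from functools import wraps
from urllib.parse import parse_qs

# Hypothetical lookup table: original WordPress ID -> new permanent path.
WP_ID_TO_PATH = {'1': '/2011/06/08/hello-world/'}


def wp_redirect(view_fn):
    """Intercept '?p=<id>' requests before the wrapped view runs."""
    @wraps(view_fn)
    def wrapper(query_string, *args, **kwargs):
        wp_ids = parse_qs(query_string).get('p')
        if wp_ids:
            path = WP_ID_TO_PATH.get(wp_ids[0])
            if path is None:
                return (404, None)   # no migrated post with that ID
            return (301, path)       # Moved Permanently
        # No WordPress querystring, so fall through to the normal view.
        return view_fn(query_string, *args, **kwargs)
    return wrapper


@wp_redirect
def index(query_string):
    return (200, 'index page')


print(index('p=1'))
print(index('p=999'))
print(index(''))
```

Requests without a 'p' parameter never touch the lookup, so the decorator costs the index view almost nothing.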

It's important to note that this decorator needs to be put on the view used at the root of the blog app, as that is the equivalent behavior from WordPress.

In my next update I'll discuss a bit about designing Django URLs to deal with typical WordPress URL rewrite configurations.

Single simple view for Django form processing

I always feel a bit dissatisfied with the amount of code I have to put in to process forms in Django views. Like most Python developers, I feel like I've gotten too complex if it takes me more than 5 or 6 lines of code to do something.   Previously I had coded separate create and update views for form processing; partially this was to better control permissions, but also because of the differences in dealing with bound and unbound forms as well as model instances.

This is my first stab at implementing that simple logic based on several examples I've seen out there in other blogs. 

def post_edit(request, id=None):
    """
    Handles creating or updating of individual blog posts.

    :param request: request object being sent to the view.
    :param id: post id, defaults to None if new post.
    """
    instance = None
    if id:
        instance = get_object_or_404(Post, id=id)

    title = "Create New Post"
    if instance:
        title = "Editing Post"

    # Create the form as needed.
    form = PostForm(request.POST or None, instance=instance) # Didn't work for me unless I passed k,v pair in instance.

    # Save the edited form if needed
    if request.method == 'POST' and form.is_valid(): # Validate and correct fields if needed.
        tmp_form = form.save(commit=False) # Unsaved Post instance built from the form.
        # Set author to current user if none set.
        # (Assumes Post has a nullable 'author' field.)
        if not tmp_form.author:
            tmp_form.author = request.user
        # Set pub_date if none exist and post is published.
        if tmp_form.status == PUBLISHED_STATUS and not tmp_form.pub_date:
            tmp_form.pub_date = datetime.datetime.now()
        tmp_form.save()
        return HttpResponseRedirect(reverse('blog:post-edit', args=[tmp_form.id]))
    return render(request, 'blog/post_edit.xhtml', {
        'title': title,
        'form': form,
    })

This allows me to call the same view for both cases: passing a post ID tells it to edit an existing post, and leaving it out creates a new one.  The entry in the URLconf is simply:

    url(r'^edit/(?P<id>[0-9]+)/', 'post_edit', name='post-edit'),
    url(r'^delete/(?P<id>[0-9]+)/', 'post_delete', name='post-delete'),

In the long run this won't be sufficient because I need to do more with the permissions and redesign my approach to producing pub dates (right now they matter because the slug needs to be unique for pub-date, not creation date).  But I feel like the approach is sound and going lean on forms is always the way to go.
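
The piece that lets one view cover both create and update is the 'request.POST or None' idiom.  Here is a framework-free sketch of why it works; SketchForm and build_form are hypothetical stand-ins that only model bound versus unbound, not real Django forms.

```python
class SketchForm(object):
    """Hypothetical stand-in for a form that is bound only when given data."""

    def __init__(self, data=None, instance=None):
        self.is_bound = data is not None
        self.data = data or {}
        self.instance = instance


def build_form(post_data, instance=None):
    # An empty POST mapping is falsy, so 'or None' leaves the form unbound
    # on a plain GET and bound once the user submits something.
    return SketchForm(post_data or None, instance=instance)


# Creating a post, first visit: nothing submitted, no instance.
new_get = build_form({})
# Updating a post, form submitted: data plus the existing instance.
edit_post = build_form({'title': 'Hi'}, instance='existing post')

print(new_get.is_bound, new_get.instance)
print(edit_post.is_bound, edit_post.instance)
```

An unbound form renders without validation errors, which is what you want the first time the page is shown.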

Reading Site Domain into Django Templates

There are a number of great new features in Django 1.3 for template developers, not the least of which is the addition of the STATIC_URL setting to help with referencing static media.

I found when trying to integrate social media linking in blog posts, however, that there wasn't a good way to pull the entire site domain into a URL without hard coding it into the template, something that makes any decent Django developer shiver.

The easiest way around this is to use the .get_current() method of the Site model and access that via a custom template tag in your templates.  

The code itself is very simple:

from django import template

from django.contrib.sites.models import Site

register = template.Library()

@register.simple_tag
def sitedomain():
    '''Returns the URL of the default site.'''
    try:
        return Site.objects.get_current().domain
    except Site.DoesNotExist:
        # Render nothing rather than the string 'None'.
        return ''

This makes it pretty easy to call in your template. I was able to use it to forward complete URLs on to some JavaScript functions that provide social linking in posts.
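
As a rough illustration of what "complete URL" means here, the sketch below joins a bare domain with a site-relative path.  absolute_url is a hypothetical helper, and 'example.com' stands in for whatever domain the Site model returns.

```python
def absolute_url(domain, path, scheme='http'):
    """Join a bare domain such as 'example.com' with a site-relative path."""
    return '%s://%s/%s' % (scheme, domain.strip('/'), path.lstrip('/'))


# 'example.com' stands in for the value the template tag would return.
full_url = absolute_url('example.com', '/2011/06/08/my-first-post/')
print(full_url)
```

The same joining can of course be done directly in the template by placing the tag output in front of the post's URL.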