Introducing connect-conneg

Content Negotiation is the practice of using metadata already included via the HTTP specification1 to customize what your web server returns based on client capabilities or settings. It has been oddly absent in a lot of major sites, with the Twitter API2 requiring you specify the format of the return as part of the URI, instead of using the HTTP Accept header, and http://google.fr/ returning the French representation, regardless of the Accept-Language header (to be fair, http://google.com/ does localize).

While there are benefits (largely security benefits, unfortunately) to owning your domain in every country code, it is cost-prohibitive to many organizations, and your customers are already telling you what language they want your content in. Admittedly, many sites may wish to have a way to override the browser’s language settings, but this should be handled via user configuration, not URI.

Where I find content negotiation to be most useful is in the space of language customization. My elder sister is getting married soon, and guests are coming from the US, Italy, and Mexico, which has required all the web-based material to be made available in all three languages. For her main wedding site, the top-level sidebar looks like this:

heidiandfer-sidebar.png

Here, we have three links that are functionally identical, each taking the user to a localized version of the page content, but hiding said content behind at least one link, and exposing the user to the fact that all three languages are available, something they probably do not care about. Now, with the CMS that they are using, this is the best solution that I can see. Fact is, most CMSes do a terrible job of allowing for multiple language content, but that is an issue for another post.

However, for their RSVP system that I am building on NodeJS3 using ExpressJS4, I didn’t view this as an acceptable solution.

Express does make one nod to Content-Negotiation, in the form of it’s ‘request.accepts’5 method, which enables the following:

if (request.accepts('application/json')) {
    response.send(jsonObject);
} else (request.accepts('text/html')) {
    response.render(templateName, data);
}

However, this implementation, in many ways, misses the point. The MIME types in the Accept header (or the language codes in the Accept-Language header) can provide was are called ‘q-values’, or numbers between 0 and 1 to indicate preference order. Consider the two header options.

  1. Accept: application/json, text/html
  2. Accept: application/json;q=0.8, text/html

What this tells the server is that a response either in JSON or HTML is acceptable, but in the first case, JSON is preferred, while HTML is preferred in the second. However, for the above code, this preference is ignored. Using Express’ accepts method, I’ve decided that if they want JSON at all that’s what I’m sending, even if they might prefer a different representation I offer.

For Acceptable Types, this is less relevant, but for languages, it’s very important. Most every user will have ‘English’ as one of their accepted languages, even though for many it won’t be their preferred. Which is why q-value sorting is so important.

Connect-conneg6, which is available on Github right now, is pretty simple right now, but I have plans to add helper methods for common activities. Basic usage for languages is as such when using the Connect or Express frameworks:


In the above example, language, acceptableTypes, and charsets are statically exported functions, built using the same method exposed as conneg.custom. For each method, this will pull the HTTP Header, and sort the values per the rules in RFC 2616. These lists will be mapped to the following properties on the request object.

  1. Accept-Language -> languages
  2. Accept -> acceptableTypes
  3. Accept-Charset -> charsets

These are exposed as separate methods, so that you can 'mix and match', for my use, I'm only caring about languages for right now. Frankly, I can't imagine a circumstance right now where you'd want to use any charset instead of UTF-8, but it's there for completeness.

What I haven't implemented in connect-conneg just yet are the helper methods to determine what the 'right' thing to do is. For languages, I'm using the following method right now:

function getPreferredLanguage(acceptedLanguages, providedLanguages, defaultLanguage) {
    defaultLanguage || (defaultLanguage = providedLanguages[0]);

    var index = 999;
    acceptedLanguages.forEach(function(lang) {
        var found = providedLanguages.indexOf(lang);
        if (found !== -1 && found < index) { index = found; }
        found = providedLanguages.indexOf(lang.split('-')[0]));
        if (found !== -1 && found < index) { index = found; }
    });

    if (index === 999) { return defaultLanguage; }
    return providedLanguages[index];
}

At the moment, I’m still thinking through how this will be implemented as library code. The above certainly works, but I’m not sure I understand the structure of Connect as well just yet to build this in the most efficient way. For languages, provided and default could (and probably should) potentially be defined on the object created by connect, at which point, should I present the list, or just the language they want? How do I deal with different endpoints having different content-negotiation requirements?

I will be continuing to hack on this, and I’m going to try to get it on NPM soon, though the git repo is installable via NPM if you clone it. So, please, look, file bugs, make comments, changes, pull requests, whatever. I think this is a useful tool that helps provide a richer usage of HTTP using the excellent connect middleware.

Links: 1. http://www.ietf.org/rfc/rfc2616.txt 2. https://developer.twitter.com/doc/get/statuses/public_timeline 3. http://nodejs.org/ 4. http://expressjs.com/ 5. http://expressjs.com/guide.html#req.accepts%28%29 6. https://github.com/foxxtrot/connect-conneg