April 2011 Archives

Thoughts on Void Safety in JavaScript

A thread recently arose on the es-discuss mailing list[1] regarding the idea of adding an ‘existential operator’[2] to JavaScript. This was in line with some thinking I’d been doing lately, but which I was unable to formulate fully enough to bring into that discussion. Now that the thread has been dead for a while, I’m using this forum to put down my thoughts before I decide whether to start a new thread or revive the old one.

The argument for an ‘existential’ operator is an interesting one, with the initial poster, Dmitry Soshnikov proposing the following:

street = user.address?.street

// equivalent to:

street = (typeof user.address != "undefined" && user.address != null)
     ? user.address.street
     : undefined;

An interesting proposal, and functionally similar to what I had been considering: proposing void-safety for the language. Let’s first define what I mean by ‘void-safety’, a term I first read in the interview with the creator of Eiffel[3] in Masterminds of Programming[4]. To be void-safe, a simple rule would be added to the language: any attempt to access a property on an undefined value would itself evaluate to undefined. In other words, it would behave like the above example, but without the ‘?’ character, and it would apply to every property access on any value.
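
To make the rule concrete, here is a minimal sketch of what a void-safe lookup would do, written as an ordinary helper function because the language itself does not behave this way. The name voidSafeGet is hypothetical and exists only for illustration.

```javascript
// Hypothetical helper: walks a chain of property names and yields
// undefined as soon as any link in the chain is undefined or null,
// instead of throwing a TypeError.
function voidSafeGet(root) {
    var value = root;
    for (var i = 1; i < arguments.length; i++) {
        if (value === undefined || value === null) {
            return undefined;
        }
        value = value[arguments[i]];
    }
    return value;
}

var user = { name: "Ada" };          // note: no address property
var street = voidSafeGet(user, "address", "street");
// street is undefined here, where user.address.street would throw
```

A void-safe language would give the same result for the plain expression user.address.street, with no helper and no extra syntax.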

I oppose Dmitry’s suggestion to address this issue through syntax, as I think such an addition would have far more value as a fundamental change to the language than as an optional change requiring a potentially obscure bit of syntax. Plus, this proposal is derived directly from CoffeeScript[5], which is a cool language, but one designed to translate directly into JavaScript, meaning that its solutions need to work within JavaScript’s limitations.

Either of these solutions helps to break a common pattern of data access, especially prevalent with configuration objects. However, there is at least one question that I’ve raised in my own head that has led to a bit of reluctance to post this to es-discuss. Imagine the following, fairly common pattern:

function action(config) {
    // The parentheses are required: `config || config = {}` is a syntax error.
    config || (config = {});
    config.option || (config.option = "default");
    // ... Do Something ...
}

With void-safe JavaScript, you could check for the existence of config.option without raising a TypeError. However, the assignment would still raise a TypeError, because an undefined config is not a value that can have a property added to it. In essence, the first line is still required, so void-safety provides no win here. But then, the existential operator isn’t really useful in this case either.

It is an interesting idea to be able to say ‘if config doesn’t exist, create it and then add the option property to it’, but that has the potential to create brittle code by accidentally initializing values as objects that should have been created as functions (and then had properties set), or through other imaginable bad side-effects. And while it may be nice to consider functionality like the Namespace function within YUI, such a thing should always be opt-in.
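
As a rough illustration of what ‘opt-in’ could look like, here is a small sketch of a helper that creates missing objects along a path only when explicitly asked to. The name ensurePath is hypothetical; this is not YUI’s implementation, just the general shape of the idea.

```javascript
// Hypothetical opt-in helper: creates any missing objects along a
// dotted path and returns the innermost one. Values that already
// exist are left untouched.
function ensurePath(root, path) {
    var current = root;
    path.split(".").forEach(function (name) {
        if (current[name] === undefined || current[name] === null) {
            current[name] = {};
        }
        current = current[name];
    });
    return current;
}

var app = {};
ensurePath(app, "settings.display").theme = "dark";
// app.settings.display.theme is now "dark"
```

Because the caller has to ask for the objects to be created, a typo in an ordinary property chain still fails loudly.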

There is one place where the existential operator requests functionality that I’m not sure I’d want to see in void-safe JavaScript, and that is around functions. When applied to a function call, the existential operator will not call the function if it is undefined, and will instead proceed past it in the property chain (potentially raising a TypeError later).

At first, I felt that applying void-safety to functions was a bad idea, one likely to cause brittleness in programs. To some extent, I still feel that way, as JavaScript functions are allowed to have side effects. The question then becomes: is it better to raise a TypeError by trying to execute undefined, stopping script execution, or to continue on having essentially completed a no-op? Plus, with JavaScript’s variable inference, where a typo can result in a new variable, there are times you’d want that TypeError raised, especially during debugging.

Of course, variable creation by inference is disabled in strict mode, and several of these potential threats are caught by tools such as JSLint[6], which can be integrated into your work process more easily today than ever before.

The concern for me, therefore, is that the behaviour I want in development will not necessarily be the behaviour I want in production. Where void-safety comes in the most useful is likely in the processing of a JS object pulled in from an external source, be that a web service, iframe, or WebWorker, where the simplified data access with increased safety is potentially very useful.

I seem to remember seeing Brendan Eich and the Mozilla team (I’m not sure how involved the rest of the ECMAScript community is yet) discussing a new ‘debugging’ namespace for JavaScript, though I’m having trouble finding the source. I think void-safety could be a good flag in this environment. By default, turn void-safety on. It makes scripts safer, as the browser won’t abort script execution as frequently. But developers could turn it off in their own browser, giving them more powerful debugging.

I’m still on the fence about this proposal. It can make data lookups simpler and safer, without adding new syntax, which is a win. But there are definitely circumstances where it can potentially hide bugs, thus making a developer’s life more difficult if it can’t be disabled. I do think I will raise the issue on es-discuss, as I think it at least warrants discussion by that community, and it may be that there are good historical reasons to not change this behaviour that others who have been buried in these problems longer than I will be familiar with.

References:

  1. https://mail.mozilla.org/listinfo/es-discuss
  2. https://mail.mozilla.org/pipermail/es-discuss/2011-April/013697.html
  3. http://eiffel.com/
  4. http://oreilly.com/catalog/9780596515171
  5. http://jashkenas.github.com/coffee-script/
  6. http://jslint.com/

Introducing connect-conneg

Content Negotiation is the practice of using metadata already defined by the HTTP specification[1] to customize what your web server returns based on client capabilities or settings. It has been oddly absent from a lot of major sites: the Twitter API[2] requires you to specify the format of the response as part of the URI instead of using the HTTP Accept header, and http://google.fr/ returns the French representation regardless of the Accept-Language header (to be fair, http://google.com/ does localize).

While there are benefits (largely security benefits, unfortunately) to owning your domain in every country code, it is cost-prohibitive to many organizations, and your customers are already telling you what language they want your content in. Admittedly, many sites may wish to have a way to override the browser’s language settings, but this should be handled via user configuration, not URI.

Where I find content negotiation to be most useful is in the space of language customization. My elder sister is getting married soon, and guests are coming from the US, Italy, and Mexico, which has required all the web-based material to be made available in all three languages. For her main wedding site, the top-level sidebar looks like this:

[Image: heidiandfer-sidebar.png]

Here, we have three links that are functionally identical, each taking the user to a localized version of the page content, but hiding said content behind at least one link, and exposing the user to the fact that all three languages are available, something they probably do not care about. Now, with the CMS that they are using, this is the best solution that I can see. Fact is, most CMSes do a terrible job of allowing for multiple language content, but that is an issue for another post.

However, for their RSVP system, which I am building on NodeJS[3] using ExpressJS[4], I didn’t view this as an acceptable solution.

Express does make one nod to content negotiation, in the form of its ‘request.accepts’[5] method, which enables the following:

if (request.accepts('application/json')) {
    response.send(jsonObject);
} else if (request.accepts('text/html')) {
    response.render(templateName, data);
}

However, this implementation, in many ways, misses the point. The MIME types in the Accept header (or the language codes in the Accept-Language header) can carry what are called ‘q-values’, numbers between 0 and 1 that indicate preference order. Consider these two header options.

  1. Accept: application/json, text/html
  2. Accept: application/json;q=0.8, text/html

What this tells the server is that a response in either JSON or HTML is acceptable, but in the first case JSON is preferred (by convention, since it is listed first and both types default to a q-value of 1), while HTML is preferred in the second. However, for the above code, this preference is ignored. Using Express’ accepts method, I’ve decided that if they want JSON at all, that’s what I’m sending, even if they might prefer a different representation I offer.

For Acceptable Types, this is less relevant, but for languages, it’s very important. Most every user will have ‘English’ as one of their accepted languages, even though for many it won’t be their preferred. Which is why q-value sorting is so important.
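
To show what q-value sorting involves, here is a minimal sketch of a parser for these headers. It is an illustration under simplifying assumptions (no wildcard handling, no specificity rules), not the code from connect-conneg, and the function name sortByQValue is my own for this example.

```javascript
// Sketch: parse a header such as "en-US,en;q=0.8,it;q=0.9" into a list
// of values ordered from most to least preferred.
function sortByQValue(header) {
    if (!header) { return []; }
    return header.split(",").map(function (part, position) {
        var pieces = part.trim().split(";");
        var q = 1;                      // a missing q-value means 1
        pieces.slice(1).forEach(function (param) {
            var match = /^\s*q\s*=\s*([0-9.]+)\s*$/.exec(param);
            if (match) { q = parseFloat(match[1]); }
        });
        return { value: pieces[0].trim(), q: q, position: position };
    }).filter(function (entry) {
        // q=0 marks a value as not acceptable, so drop it.
        return entry.value !== "" && entry.q > 0;
    }).sort(function (a, b) {
        // Higher q first; ties keep the order the client sent them in.
        return (b.q - a.q) || (a.position - b.position);
    }).map(function (entry) {
        return entry.value;
    });
}

sortByQValue("application/json;q=0.8, text/html");
// text/html sorts ahead of application/json
```

With a sorted list in hand, the server can walk it from the front and stop at the first representation it is able to produce.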

Connect-conneg[6], which is available on Github, is pretty simple right now, but I have plans to add helper methods for common activities. Basic usage for languages when using the Connect or Express frameworks looks like this:


In the above example, language, acceptableTypes, and charsets are statically exported functions, built using the same method exposed as conneg.custom. For each method, this will pull the HTTP Header, and sort the values per the rules in RFC 2616. These lists will be mapped to the following properties on the request object.

  1. Accept-Language -> languages
  2. Accept -> acceptableTypes
  3. Accept-Charset -> charsets

These are exposed as separate methods so that you can 'mix and match'. For my use, I'm only caring about languages right now. Frankly, I can't imagine a circumstance where you'd want to use any charset other than UTF-8, but it's there for completeness.

What I haven't implemented in connect-conneg just yet are the helper methods to determine what the 'right' thing to do is. For languages, I'm using the following method right now:

function getPreferredLanguage(acceptedLanguages, providedLanguages, defaultLanguage) {
    defaultLanguage || (defaultLanguage = providedLanguages[0]);

    // Track the lowest index in providedLanguages matched by any accepted language.
    var index = 999;
    acceptedLanguages.forEach(function(lang) {
        // Try the full tag first (e.g. "en-US")...
        var found = providedLanguages.indexOf(lang);
        if (found !== -1 && found < index) { index = found; }
        // ...then the bare language (e.g. "en").
        found = providedLanguages.indexOf(lang.split('-')[0]);
        if (found !== -1 && found < index) { index = found; }
    });

    if (index === 999) { return defaultLanguage; }
    return providedLanguages[index];
}
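
One thing worth noting about the function above is that it favours the order of providedLanguages rather than the order of the client’s preferences. As a possible alternative, here is a sketch that walks the accepted languages in preference order instead. The name pickLanguage is hypothetical, and the sketch assumes acceptedLanguages has already been sorted by q-value and that tags are compared case-sensitively.

```javascript
// Variant sketch: walk the client's languages in preference order and
// return the first one the site provides, trying the exact tag first
// and then the bare language ("en" for "en-US").
function pickLanguage(acceptedLanguages, providedLanguages, defaultLanguage) {
    for (var i = 0; i < acceptedLanguages.length; i++) {
        var lang = acceptedLanguages[i];
        if (providedLanguages.indexOf(lang) !== -1) { return lang; }
        var base = lang.split("-")[0];
        if (providedLanguages.indexOf(base) !== -1) { return base; }
    }
    return defaultLanguage || providedLanguages[0];
}

pickLanguage(["it-IT", "en"], ["en", "it", "es"]);
// returns "it"; the index-based approach above would pick "en"
```

Which behaviour is right depends on whether the site or the visitor should win when both have an opinion; for language selection I would lean toward the visitor.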

At the moment, I’m still thinking through how this will be implemented as library code. The above certainly works, but I’m not sure I understand the structure of Connect well enough yet to build this in the most efficient way. For languages, the provided and default values could (and probably should) be defined on the object created by connect, at which point, should I present the list, or just the language they want? How do I deal with different endpoints having different content-negotiation requirements?

I will be continuing to hack on this, and I’m going to try to get it on NPM soon, though the git repo is installable via NPM if you clone it. So, please, look, file bugs, make comments, changes, pull requests, whatever. I think this is a useful tool that helps provide a richer usage of HTTP using the excellent connect middleware.

Links:

  1. http://www.ietf.org/rfc/rfc2616.txt
  2. https://developer.twitter.com/doc/get/statuses/public_timeline
  3. http://nodejs.org/
  4. http://expressjs.com/
  5. http://expressjs.com/guide.html#req.accepts%28%29
  6. https://github.com/foxxtrot/connect-conneg

Why I Use YUI3

The JavaScript Community is an interesting one. It grew up around a language which is unique in that, as Douglas Crockford[1] says, no one bothers to learn it before using it. Its success is indicative of how good a language it is, once you are able to get past the DOM and a few of its less well-considered features. And that flexibility has been amazing in terms of innovation. Look at the plethora of modules available for the barely year-old NodeJS[2], the dozens of script loaders and feature shims, and the many libraries for DOM abstraction like YUI[3] and jQuery[4].

I therefore find it interesting that when Crockford was on Channel 9[5] Live for MIX 2011[6] yesterday and suggested YUI, the suggestion was met with so much surprise and nascent criticism from the many, many jQuery proponents inside the Microsoft developer community. The comment itself is hardly a surprise, and not because the Crock-en[7] works for Yahoo! He’s not on the YUI project, and while I’m sure he participates in code reviews, his name does not appear in the commit history of either YUI2 or YUI3. He has, however, been critical of jQuery and its creator, John Resig[8], in the past, often making snide remarks about ‘ninjas’[9].

I am not defending Crockford for his criticisms, or even seriously claiming that words from the mouth of Douglas should be taken as gospel. Admittedly, Douglas is a mythic figure these days, and he is very smart and has done great work creating and promoting best practices that have led directly to today’s Golden Age of JavaScript.

I am also not trying to say “don’t use jQuery”, though I tend to think you shouldn’t. My concern is the apparent bifurcation of the JavaScript community into ‘people who use jQuery’ and ‘everyone else’. Now, part of the reason I am a bit anti-jQuery is that most people I know who are heavy users of the library don’t actually write much JavaScript. They mostly perform copy-paste programming of other people’s code, and often don’t develop much of an understanding of the language or its abilities. Incidentally, they like it that way.

I started JavaScript doing pure DOM work, and it was everything that makes people hate JavaScript (when really, they usually hate the DOM and its inconsistent implementations). My needs, however, were very basic; I wasn’t even doing XHR at that point, so it worked. The JavaScript I wrote at that time also wasn’t very good, looking back. As Crockford describes, I didn’t really bother to learn the language. I had done plenty of Java and C++ in my university work, and so JavaScript’s visual familiarity led me to a lot of assumptions that were simply untrue.

Eventually, I needed a bit more. I required a date selection widget for a new project, and had also been reading a lot of the performance tips shared by Yahoo, which ended up leading me to YUI2. YUI2 felt quite a bit like pure DOM, so it was familiar, and it provided a good set of high-quality widgets that did everything I needed, and quite a bit more. Around the time I started using YUI2, I also read JavaScript: The Good Parts[10]. YUI2 definitely had some major weaknesses, which led to the negative attitudes many people seem to have toward YUI to this day. The library was verbose, and deciding what components you needed could be difficult, even when using the Configurator tool. And good luck writing your own widgets: it was time-consuming and immensely repetitive due to the lack of any sort of standard framework.

But these weaknesses were all identified, and by the time I’d started using YUI2, YUI3 was already in its design phase. When its first previews were released, I knew it was something special. It brought Loader, a tool I was intimately familiar with from YUI2, to the forefront, making it simple to use. It defined a set of building blocks that promised to make widget creation, perhaps not trivial, but dramatically easier. It integrated CSS Selectors, the killer feature that everyone was so excited about in the jQuery world. It provided a plugin and custom event architecture that allows for easy composition and customization in a way that I hadn’t seen in any other library.

To this day, many of the widgets in YUI2 haven’t been released in the core of YUI3 (though many have Gallery[11] counterparts of varying levels of functionality and quality), which many people see as a weakness. However, this is similar to how other projects operate, where the UI widgets are a different project from the internal core, and that’s great. The fact is that there are more people building cool things for YUI3 than there ever were for YUI2, and those who have worked in other libraries almost all say that they find it easier and faster to build their code in YUI3 than in the other options available.

It is sometimes frustrating that the tools I want don’t always just exist, or perhaps aren’t quite right, but I have found very few problems that I haven’t been able to prototype in at most a few hours of work using YUI3, including the first run of my attempt to re-think multiselect. Of course it takes longer than a few hours to polish the idea and make it shine, but rapid prototyping is immensely useful. Plus, for a well-polished widget that does, say, 75% of what I require, it is easy using the framework to extend the behavior I require without needing to directly modify the code for the core widget. There are exceptions to this flexibility, but they are definitely the minority in my experience.

I don’t anticipate that this will directly buy any converts. I have shown no code. I have made comments that will likely offend someone. This post is more a collection of my thoughts on how I ended up using this particular library, and why, when I leave my current position, I’ll continue to advocate for YUI wherever I end up. I am not so inflexible as to refuse other options, but I like to use tools that I know are a good idea, and not just ones that look like it[12].

  1. http://crockford.com/
  2. http://nodejs.org/
  3. http://yuilibrary.com/
  4. http://jquery.com/
  5. http://channel9.msdn.com/
  6. http://live.visitmix.com/
  7. https://mail.mozilla.org/pipermail/es-discuss/2011-March/013415.html
  8. http://ejohn.org/
  9. Resig is working on a book, Secrets of the JavaScript Ninja
  10. http://oreilly.com/catalog/9780596517748/
  11. http://yuilibrary.com/gallery/
  12. http://boagworld.com/technology/dustin-diaz/

Building a YUI3 File Uploader: A Case Study

Off and on for the last few weeks, I’ve been trying to build a file uploader taking advantage of the new File API[1] in modern browsers (Firefox 4, newer versions of WebKit). It’s up on my Github[2], and unfortunately, it doesn’t quite work.

The first revision attempted to complete the upload by Base64-encoding the file and custom-building a multipart MIME message that included the Base64-encoded file representation, labeled with the Content-Transfer-Encoding header. This resulted in the NodeJS[3] server, which uses Formidable[4] for form processing, saving the file out still Base64-encoded. At first, I considered this a bug, but per the HTTP/1.1 RFC (2616)[5]:

19.4.5 No Content-Transfer-Encoding

HTTP does not use the Content-Transfer-Encoding (CTE) field of RFC 2045. Proxies and gateways from MIME-compliant protocols to HTTP MUST remove any non-identity CTE (“quoted-printable” or “base64”) encoding prior to delivering the response message to an HTTP client.

Proxies and gateways from HTTP to MIME-compliant protocols are responsible for ensuring that the message is in the correct format and encoding for safe transport on that protocol, where “safe transport” is defined by the limitations of the protocol being used. Such a proxy or gateway SHOULD label the data with an appropriate Content-Transfer-Encoding if doing so will improve the likelihood of safe transport over the destination protocol.

The reason for this seems to stem from the fact that HTTP is a fully 8-bit protocol, while MIME was designed to be more flexible than that. One of the CTE options is ‘7-bit’, which would complicate an HTTP server more than most would like. Why 7-bit? ASCII[6]. ASCII is a 7-bit encoding for the English alphabet. Eventually it was extended to 8-bit with the ‘expanded’ character set, but in the early days of networking, a lot of text was sent in 7-bit mode. That made sense, in that it amounts to a 12.5% reduction in data size. These days, when best practice is to encode our HTTP traffic as UTF-8 instead of ASCII (or other regional character sets), the problem seems to be largely gone.

I still take issue with the exclusion of Base64 encoding. Base64 is 8-bit safe, and while it makes the files larger, it had seemed a safe way to build my submission content using JavaScript, which stores its strings in Unicode.

And I wasn’t wrong. My next attempt, based on a blog post about the Firefox 3.6 version of the File API[7], read the file as a binary string and appended that into my message. This also failed, but more subtly. The message ended up with a few extra bytes, which some hexdump analysis seems to suggest came from some bytes being expanded from 1 byte to 2 based on UTF-16 rules. Regardless, the image saved by the server was unreadable, though I could see bits of data reminiscent of the standard JPEG headers.
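
The byte expansion can be reproduced in isolation. The sketch below is an assumption about the mechanism rather than a diagnosis of my exact upload: when a string is sent as text it gets encoded as UTF-8, and under UTF-8 any character code above 0x7F takes two or more bytes. It uses the standard TextEncoder API, which was not available at the time but makes the effect easy to see today.

```javascript
// A "binary string" holds one byte per character, with codes 0-255.
// These are the first four bytes of a typical JPEG file.
var binaryString = String.fromCharCode(0xFF, 0xD8, 0xFF, 0xE0);

// Encoding the string as UTF-8 text turns every code above 0x7F
// into a two-byte sequence, so the payload grows and is corrupted.
var encoded = new TextEncoder().encode(binaryString);

console.log(binaryString.length); // 4 characters
console.log(encoded.length);      // 8 bytes
```

That would explain both symptoms: the file gets larger, and the JPEG header is still vaguely recognizable because the 7-bit-clean bytes pass through unchanged.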

A bit more looking brought me to the new XMLHttpRequest Level 2[8] additions, supported again in Firefox 4 and Chromium. Of particular interest was the FormData object introduced in that interface. It’s a simple interface, working essentially as follows:

var fd = new FormData(formElem);
fd.append("key", "value");

It’s simple. Pass the constructor an optional DOM FORM element, and it will automatically append all of its inputs. You can call the ‘append’ method with a key and value (the value can be a string or any Blob/File), and then pass the FormData object to your XHR object’s send method. It will automatically be converted into a multipart/form-data message and uploaded to the server using the browser’s existing mechanism for serializing and uploading a form. If I have a complaint, it’s that in Chrome at least, even if you’re not uploading a file, it will encode the message as multipart instead of a normal POST message, which seems a bit wasteful to me, and hints that the form data isn’t being passed through the same code path as a normal form submission.
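
Putting the pieces together, a complete request might look something like the sketch below. The function name sendForm, the "/upload" URL, and the fileInput element are placeholders I made up for this example; only open, onload, send, and FormData’s append are real browser APIs.

```javascript
// Sketch of sending a FormData object with XMLHttpRequest. The URL and
// the callback are placeholders; xhr is any XMLHttpRequest instance.
function sendForm(xhr, url, formData, onDone) {
    xhr.open("POST", url);
    xhr.onload = function () {
        onDone(xhr.status);
    };
    // No Content-Type header is set by hand: the browser supplies
    // multipart/form-data along with the boundary it generated.
    xhr.send(formData);
}

// Browser usage (fileInput is an assumed <input type="file"> element):
//   var fd = new FormData();
//   fd.append("upload", fileInput.files[0]);
//   sendForm(new XMLHttpRequest(), "/upload", fd, function (status) {
//       console.log("server answered", status);
//   });
```

Passing the XHR object in, rather than creating it inside the function, is just a convenience that keeps the sketch easy to exercise outside a browser.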

It is at this point that YUI3’s io module fails me. Let me start by saying that io is great for probably 99% of what people want to use it for. It can do Cross-Domain requests, passing off to a Flash shim if necessary. It can do form serialization automatically. It can do file uploads using an iframe shim. While it was designed to be reasonably modular and only loads these additional features at your request, this apparent ‘modularity’ from a user perspective is actually hard coded into the method call. For instance, for the form handling, we currently have this:

if (c.form) {
    if (c.form.upload) {
        // This is a file upload transaction, calling
        // upload() in io-upload-iframe.
        return Y.io.upload(o, uri, c);
    }
    else {
        // Serialize HTML form data into a key-value string.
        f = Y.io._serialize(c.form, c.data);
        if (m === 'POST' || m === 'PUT') {
            c.data = f;
        }
        else if (m === 'GET') {
            uri = _concat(uri, f);
        }
    }
}

This code example is used purely to suggest that io is currently designed in a way that is a bit inflexible. In fact, 90% of the logic used in io occurs in a single method, and while there are events you can respond to, including a few that occur before data is sent down the wire, you’re unable to modify any data used in this method in your event handlers. So, if this method does anything that is counter to what you’re trying to do, you’re forced to reimplement all of it. And, of course, the method does something counter to my goals.

c.data = (Y.Lang.isObject(c.data) && Y.QueryString) ? Y.QueryString.stringify(c.data) : c.data;

io-base optionally includes querystring-stringify-simple, so there is a very high likelihood that it will be present. And having my FormData object trying to be serialized in this method will result in all of my data magically disappearing. It is unacceptable to me to tell users of my file-upload module that they must turn off optional includes (though for production, you probably should be anyway, but that’s another discussion).
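
The disappearing data is easy to reproduce with stand-ins. In the sketch below, simpleStringify and OpaqueForm are hypothetical names of my own: the first mimics a serializer that only looks at an object’s own enumerable properties, and the second mimics an object like FormData that keeps its entries internally. Neither is the actual YUI or browser code.

```javascript
// Stand-in for a simple query-string serializer: it only sees an
// object's own enumerable properties. Not the YUI implementation.
function simpleStringify(obj) {
    return Object.keys(obj).map(function (key) {
        return encodeURIComponent(key) + "=" + encodeURIComponent(obj[key]);
    }).join("&");
}

// Stand-in for FormData: entries are stored internally and are not
// exposed as properties on the object itself.
class OpaqueForm {
    #entries = [];
    append(key, value) { this.#entries.push([key, value]); }
}

var form = new OpaqueForm();
form.append("title", "Holiday photo");

console.log(simpleStringify({ title: "Holiday photo" })); // title=Holiday%20photo
console.log(simpleStringify(form));                        // empty string
```

The object passes an ‘is this an object?’ check, gets handed to the serializer, and comes out the other side as an empty string.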

IO being so inflexible makes sense, in some ways. It’s a static method, not a module, so configuration can be difficult, since the only way to add extension points would be to send them in via the configuration object, which complicates things in other ways. The io module, it seems, requires a reimagining.

And we’ve got something. Luke Smith has put together a code sketch of a potential future for IO[9], which breaks things out in an exciting fashion. For my file upload, I can declare a Y.Resource for my endpoint, set some basic options when declaring the resource, and post multiple messages to the resource. It actually shortens my code quite a bit, though I still need to look at a shim of some sort for those browsers which lack an implementation of the File API and XHR Level 2 before I push it into the gallery, since I would want it to work across all A-Grade browsers.

Unfortunately, the code there is just a proposal; it doesn’t actually work. I’m excited about the proposal, and I’m going to try to get it at least partially functional. I haven’t worked on it just yet, because I wanted to touch base with Luke to see what kind of expectations there were about the API, and there are a few important ones (though I don’t think they’ll impact me getting things sort of working). Hopefully I’ll have this working in Firefox 4 and Chrome very soon, and then I can start working on the shims necessary to support less-capable browsers.

References:

  1. http://www.w3.org/TR/FileAPI/
  2. https://github.com/foxxtrot/html5file-yui3uploader
  3. http://nodejs.org/
  4. https://github.com/felixge/node-formidable
  5. http://tools.ietf.org/html/rfc2616#section-19.4.5
  6. https://secure.wikimedia.org/wikipedia/en/wiki/ASCII
  7. https://developer.mozilla.org/en/usingfilesfromwebapplications
  8. http://www.w3.org/TR/XMLHttpRequest2
  9. https://github.com/lsmith/yui3/tree/master/sandbox/io