A while back, Sanjiva Weerawarana proposed (via email) a way to decentralize media types. I thought the proposal was excellent, and Dan Diephouse’s latest blog post reminded me of it. Here’s a brief introduction to this possible solution for “decentralizing media types”.
The Problem
In a plain HTTP interaction, the Content-type and Accept headers carry information about the type of the data being transmitted and accepted, respectively. You’ve seen these media types in numerous examples, e.g. a typical request or response might have a Content-type header with the value application/xml.
The problem with this approach is that media types have to be registered centrally with IANA. This means that while you can invent your own media types, nobody will know about them — unless you go through the time-consuming process of actually having your media type registered.
What’s wrong with application/xml? Nothing, really, except that it tells you only that what is being sent is XML. You have no way to tell what kind of XML it is unless you actually parse it and, for example, look at the outer element’s XML namespace.
The Solution
What Sanjiva (and his collaborators, Paul Fremantle, Jonathan Marsh and James Clark) propose is this: Define a single new media type, application/data-format, with a required parameter uri. This uri points to a definition of the data format, like this:
application/data-format;uri=http://mediatypes.example.com/foo/bar
The uri is an HTTP URI that points to an RDDL document. In other words, you can do an HTTP GET on it and retrieve documentation of the data format that’s both human-readable and machine-processable.
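To make the mechanics concrete, here is a minimal Python sketch of how a receiver might pull the uri parameter out of such a header value. The function name data_format_uri is made up for illustration, and the parsing leans on the standard library's legacy email.message.Message class rather than anything the proposal prescribes. One caveat: characters such as ":" and "/" are not allowed in an unquoted parameter value under the HTTP grammar, so a strict sender would probably have to quote the URI; the sketch accepts both forms.

```python
from email.message import Message

# Media type name taken from the proposal.
DATA_FORMAT = "application/data-format"


def data_format_uri(content_type):
    """Return the uri parameter of an application/data-format value, else None.

    Hypothetical helper: the name and the None-on-mismatch behavior are
    assumptions made for this sketch, not part of the proposal.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != DATA_FORMAT:
        return None
    # get_param() strips surrounding quotes from the value, if present,
    # and returns None when the parameter is missing.
    return msg.get_param("uri")


if __name__ == "__main__":
    header = "application/data-format;uri=http://mediatypes.example.com/foo/bar"
    print(data_format_uri(header))
```

A client that gets None back would simply treat the payload as it would any other media type it does not specifically understand.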
My Opinion
I think this is an excellent proposal, specifically because it does not rely on a centralized authority, and re-uses the namespacing concepts of the Web. It’s also fully agnostic towards any specific data format — you can use your own binary or text format, something like JSON or YAML, and if you pick XML, you’re free to use DTDs, RELAX NG schemas, Schematron or even XML Schema to document it. It’s also great in that it allows for clients with different knowledge about any particular format to do their best to handle it. One client might be hard-coded against the complete string; another might retrieve the RDDL, look for an XSD, and dynamically render some fancy visual representation.
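The simplest kind of client mentioned above, one hard-coded against the complete string, could look roughly like the following Python sketch. The handler functions, the HANDLERS table and the whitespace normalization are all assumptions made for illustration; the example URI is the one used earlier.

```python
def handle_foo_bar(body: bytes) -> str:
    # Hypothetical handler for the one format this client was built against.
    return f"foo/bar document, {len(body)} bytes"


def handle_unknown(body: bytes) -> str:
    # Fallback: the client does its best and keeps the payload as opaque data.
    return f"unrecognized format, {len(body)} bytes kept as opaque data"


# Handlers keyed by the complete header value, with whitespace removed.
HANDLERS = {
    "application/data-format;uri=http://mediatypes.example.com/foo/bar": handle_foo_bar,
}


def dispatch(content_type: str, body: bytes) -> str:
    # Drop whitespace so "a; uri=..." and "a;uri=..." compare equal.
    key = "".join(content_type.split())
    return HANDLERS.get(key, handle_unknown)(body)
```

A smarter client could replace the fallback with logic that fetches the RDDL document and looks for a schema, without the simple client having to change.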
I think the concept could even be extended to allow for querying of supported media types: You could just do a GET on the resource with an Accept header of application/data-format and get back the link to the RDDL (if there is any).
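The request side of that query could be sketched in Python as follows. This only builds the request; how a server would actually answer is not specified anywhere, so nothing here assumes a response format. The function name and the resource URL in the usage lines are invented for the example.

```python
from urllib.request import Request


def build_discovery_request(resource_url):
    """Build (but do not send) a GET asking which data format a resource uses.

    Hypothetical helper for illustration only.
    """
    return Request(
        resource_url,
        headers={"Accept": "application/data-format"},
        method="GET",
    )


if __name__ == "__main__":
    req = build_discovery_request("http://example.com/orders/42")
    print(req.get_method(), req.full_url, req.get_header("Accept"))
```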
Maybe there’s something immediately, obviously wrong with this idea — but if so, I can’t see it. It will be interesting to see what others say …
I think this sounds like an interesting concept. It might work out great. However, there is something counter-intuitive here, where “application/xml” actually gives you more information about the format than “application/data-format” unless the URI given in the parameter is known to you already. Would it be possible to add another parameter called ‘base-type’ or similar to specify what “raw” MIME type the format is based on? E.g.:
Another nit, at least with both of these parameters: it doesn’t look very pretty. Perhaps the Content-Type doesn’t need to be overloaded? Perhaps an entirely new header could work instead? E.g.: