There was a lot of buzz a few years ago around the real-time web, and it has been bubbling along ever since. I have a financial/enterprise background, so real-time has a very different meaning to me; time is measured in microseconds. Web real-time seems to be measured as under one second. My issue with the real-time web to date is that only parts of the web are web real-time. While data can be delivered to the browser using push technologies such as Comet and WebSockets, the vast majority of REST and SOAP APIs that provide access to application data still use the HTTP request-response model.
That’s starting to change, with more public streaming APIs appearing. A streaming API (aka HTTP Push) works by the client opening a socket and providing some criteria for the data it wants to receive; the server then delivers new data over the open socket as it arrives. For those familiar with publish-subscribe models of delivering data, this will all sound familiar.
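To make the model concrete, here is a minimal in-process sketch of the idea: a subscriber registers some criteria, and the "server" delivers each new event that matches. There is no real socket or HTTP here; `StreamHub`, `subscribe`, `publish` and the event fields are all hypothetical names invented for illustration, not any particular vendor's API.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, str]


class StreamHub:
    """Toy stand-in for a streaming API server (hypothetical, in-process only)."""

    def __init__(self) -> None:
        # Each subscription pairs a filter with a delivery callback.
        self._subscriptions: List[
            Tuple[Callable[[Event], bool], Callable[[Event], None]]
        ] = []

    def subscribe(
        self,
        criteria: Callable[[Event], bool],
        deliver: Callable[[Event], None],
    ) -> None:
        # In a real streaming API this is where the client opens the
        # connection and states what data it wants to receive.
        self._subscriptions.append((criteria, deliver))

    def publish(self, event: Event) -> None:
        # New data is pushed to every subscriber whose criteria match it.
        for criteria, deliver in self._subscriptions:
            if criteria(event):
                deliver(event)


def demo_stream() -> List[Event]:
    hub = StreamHub()
    received: List[Event] = []
    # Only events on the "orders" topic should reach this subscriber.
    hub.subscribe(lambda e: e["topic"] == "orders", received.append)
    hub.publish({"topic": "orders", "body": "order 1 created"})
    hub.publish({"topic": "users", "body": "user 7 signed up"})
    hub.publish({"topic": "orders", "body": "order 2 created"})
    return received
```

Calling `demo_stream()` returns only the two "orders" events; the subscriber never asks for them after the initial subscription, which is the key difference from request-response.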
HTTP Push evolution
Most data streaming initiatives on the web focused initially on content feeds, such as Ping-o-matic, Spinn3r or RSS Cloud. This is great for web content, but new SaaS applications and services are not using Atom feeds to deliver application data.
One interesting HTTP push technology is PubSubHubbub (PSHB). It defines a protocol for doing publish and subscribe over HTTP using Atom as the message format. PSHB also provides a server (a hub) that can be used to serve content to subscribers. It is used in things like WordPress, Tumblr, LiveJournal and more. If you have a WordPress blog you actually have your own PSHB hub.
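As a rough sketch of what subscribing looks like on the wire: a PSHB subscriber sends a form-encoded POST to the hub naming the topic (the feed URL) and a callback URL where updates should be delivered. The example below only builds that request and does not send it. The helper name `build_subscribe_request` and all URLs are made up for illustration, and depending on the spec version a real hub may expect additional fields, so treat this as a starting point rather than a complete client.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_subscribe_request(hub_url: str, topic_url: str, callback_url: str) -> Request:
    """Build (but do not send) a PubSubHubbub subscription request."""
    form = {
        "hub.mode": "subscribe",       # ask the hub to start a subscription
        "hub.topic": topic_url,        # the feed we want updates for
        "hub.callback": callback_url,  # where the hub should POST new content
    }
    body = urlencode(form).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical URLs, for illustration only.
request = build_subscribe_request(
    "http://hub.example/",
    "http://blog.example/feed.atom",
    "http://subscriber.example/callback",
)
```

Passing `request` to `urllib.request.urlopen` would send it; after that, the hub pushes new entries to the callback URL instead of the subscriber polling the feed.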
Streaming APIs
Streaming APIs are provided by SaaS applications, social media platforms and other services to deliver data to clients in web real-time. The streaming API model is usually implemented for reading data: it is used to deliver data to consumers, not to make writes or deletes. In theory you could perform writes over a streamed connection, but it's very inefficient, and the request-response model offers a better interaction because when you perform a write you want a response to that action.
Who Streams?
Streaming APIs are relatively new. Here is a sample of the ones I know offhand:
- Salesforce – Just announced their new streaming API. We already support it in Mule; more on that in my next post.
- Twitter – Getting real-time updates.
- Facebook – Subscribe to real-time data changes in your social graph
- SuperFeedr – push all sorts of feeds in one API (PSHB or XMPP)
- Digg – stream submissions and comments.
- Instagram – Real-time photo updates
Building Streaming APIs
There is currently no ‘standard way’ to build streaming APIs. There are a few different approaches out there:
- HTTP using long poll – Long poll is a method used by some HTTP servers to hold a connection for a client until data becomes available on the server. If data is immediately available the connection is not held.
- Comet over HTTP – Comet is an umbrella term for server push over HTTP; implementations such as CometD provide server-side pub-sub built around the Bayeux protocol. This is often used to enable AJAX capabilities on Java servers. Comet is stateful, which means the client has to pass along information (such as a client ID) that is retained by the server.
- XMPP – Originally designed for instant messaging, it supports publish-subscribe through an extension but is not HTTP-based. XMPP has a lot of functionality beyond what is needed to build a streaming API.
- HTML5 EventSource – Seems to be an eventing protocol similar to Comet, but I have not dug into it yet.
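Of the approaches above, long poll is the easiest to show in a few lines. The sketch below models just the holding behaviour, in-process and without any HTTP: `poll` returns at once if data is waiting, otherwise it holds the caller until data arrives or a timeout passes. `LongPollChannel` and its methods are hypothetical names, and a real server would do this per client connection.

```python
import queue
import threading
from typing import List, Optional


class LongPollChannel:
    """Toy model of a long-poll endpoint (hypothetical, no real HTTP)."""

    def __init__(self) -> None:
        self._pending: "queue.Queue[str]" = queue.Queue()

    def publish(self, message: str) -> None:
        # Server side: new data becomes available.
        self._pending.put(message)

    def poll(self, timeout_seconds: float) -> Optional[str]:
        # If data is already waiting this returns immediately; otherwise
        # the "connection" is held until data arrives or the timeout hits.
        try:
            return self._pending.get(timeout=timeout_seconds)
        except queue.Empty:
            return None  # nothing arrived; the client would simply re-poll


def demo_long_poll() -> List[Optional[str]]:
    channel = LongPollChannel()

    channel.publish("already waiting")
    first = channel.poll(timeout_seconds=1.0)   # data available: not held

    # Publish from another thread shortly after the next poll starts.
    threading.Timer(0.1, channel.publish, args=["arrived later"]).start()
    second = channel.poll(timeout_seconds=2.0)  # held until the data arrives

    third = channel.poll(timeout_seconds=0.1)   # no data: times out
    return [first, second, third]
```

A client built on this simply calls `poll` in a loop, which is why long poll feels like streaming to the consumer even though each delivery is still an ordinary request and response.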
If you know of other streaming APIs, or other ways people are building them, I’d love to hear about it.
Follow: @rossmason, @mulejockey