A proposal for Actions
... in which we are combining the discovery of schema.org entities with the affordance of a broad set of actions.
I'm really excited to see things like web/android intents/activities/appurl popping up! It is certainly a cool new paradigm that could enable a lot of different interactions between decoupled applications.
I've been working on a related idea that is still in formation, but well baked enough to be worth sharing.
This is just my personal recollection of the history and of the challenges we have faced while designing this protocol specification (sort of the backstage of the protocol design). This is a collaborative process between Google, Microsoft, Yahoo and Yandex, and you can be part of it by participating here.
(edit: added more related efforts as I learned about them, corrected a few obvious mistakes)
The basic problem we were facing was very much like the one that web intents set out to solve: de-coupling service providers from service requestors by providing an intent brokering platform.
As we looked into specific use cases, a few things became clear:
- We needed to deal with a wide variety of platforms (Web, POP/SMTP, APIs, Android, iOS, Windows, Feeds, etc)
- We needed a common way to invoke these abilities.
- Declaring a service's abilities via a registry of (verb, data type) wasn't going to be sufficient. You had to be more specific.
The first wasn't that huge of a problem, but needed to be dealt with. The second is tough, but tractable. The third, however, is quite a challenge and we call it "The Inventory Problem".
The Inventory Problem
The inventory problem refers to the fact that it is not sufficient for a service to describe its ability to "act" (verb) on "types" (nouns). You actually need to go one level of granularity further down and enumerate the individual instances your service "acts" on.
Take the existing intent model as an example:
That certainly works well for verbs like "share" that apply to any image/*, but does it work for verbs like "watch"?
For example, is it sufficient to say that "Netflix can stream movies"? Not really. There are very specific instances of movies that Netflix can play, aka their inventory (e.g. the latest movies still in theatres cannot be watched on Netflix).
So, one way or another, services need to declare more specifically what resources they can act on.
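To make the gap concrete, here is a minimal Python sketch. The registry layout, the service names and the movie identifiers are all made up for illustration; none of it comes from an actual intent API.

```python
from fnmatch import fnmatch

# Hypothetical (verb, data type) registry, in the spirit of intent filters.
TYPE_REGISTRY = [
    ("share", "image/*", "photo-sharer"),
    ("watch", "video/*", "movie-streamer"),
]

# Hypothetical per-service inventory: the specific resources a service can act on.
INVENTORY = {
    "movie-streamer": {"movie:pursuit-of-happyness"},
}


def handlers_by_type(verb, mime_type):
    """Return services whose declared (verb, type) pair matches the request."""
    return [
        service
        for declared_verb, pattern, service in TYPE_REGISTRY
        if declared_verb == verb and fnmatch(mime_type, pattern)
    ]


def handlers_by_resource(verb, mime_type, resource_id):
    """Narrow the type-level match to services that actually hold the resource."""
    return [
        service
        for service in handlers_by_type(verb, mime_type)
        if resource_id in INVENTORY.get(service, set())
    ]
```

The type-level lookup reports the streamer for any video at all, while the resource-level lookup comes back empty for a title outside its catalog. That difference is the information a (verb, data type) registry cannot carry.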
This problem comes up in a variety of different use cases.
We've explored a few key use cases that we wanted to support. Here are some of them:
- Restaurants that allow reservations and orders (e.g. food delivery or for pickup)
- Movies that can be watched, songs that can be listened to
- Hotels with rooms that can be booked
- Taxis that can be reserved
- Airlines whose flights can be searched
- Flights that can be reserved or checked-in
- Cars that can be rented
- Local Businesses providing appointments
- Organizations that allow you to search for Stores
- Things that can be reviewed
- Package deliveries that can be tracked
- Events that can be RSVPed
- Products/Movies that can be reviewed
- Expense approvals that can be confirmed
- Offers that can be saved
Here is a presentation I made that goes over modelling them.
All of these have in some shape or form the "Inventory Problem".
For example, opentable/urbanspoon/grubhub can't reserve any arbitrary restaurant; they represent specific ones. Netflix/Amazon/iTunes can't stream any arbitrary movie; there is a specific set of movies available. Taxis have their own coverage/service area. AA.com can't check in to UA flights. UPS can't track USPS packages, etc.
That basic premise led us to take a different approach: to annotate individual resources with the operations that are available, rather than annotate services with their general abilities.
Verbs ... They Are Kind Of Weird
We first asked ourselves: how do we model verbs? That rat-holed us into a really long discussion around things like:
- Do verbs have arguments?
- How do we deal with synonyms, antonyms and reciprocals?
- Do verbs follow a hierarchy like nouns?
I went over these in more detail here.
With a hierarchy of verbs, we started to look into how they would connect with resources.
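As a rough illustration of what a verb hierarchy buys you, the Python sketch below stores verbs as a child-to-parent map and walks it upward. The parent links follow how schema.org organizes these Action types as published; the hierarchy under discussion at the time may have differed, and the helper names are mine.

```python
# A tiny slice of a verb hierarchy, stored as child -> parent.
# Assumption: the links mirror schema.org's published Action types;
# the proposal-era hierarchy may have differed.
VERB_PARENT = {
    "Action": None,
    "ConsumeAction": "Action",
    "WatchAction": "ConsumeAction",
    "ListenAction": "ConsumeAction",
    "AssessAction": "Action",
    "ReviewAction": "AssessAction",
}


def ancestors(verb):
    """Return the chain from a verb up to the root, most specific first."""
    chain = []
    while verb is not None:
        chain.append(verb)
        verb = VERB_PARENT[verb]
    return chain


def is_a(verb, general_verb):
    """True if `verb` is `general_verb` or one of its specializations."""
    return general_verb in ancestors(verb)
```

With this shape, a requestor asking for anything it can "consume" would match both watching and listening, in the same way a query for a general noun type matches its subtypes.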
Resources And Actions
Thanks to the good work of the semantic web folks, finding and exposing resources is quite simple.
Take a movie on Netflix, for example. This is what it looks like:
Roughly, with schema.org markup added to that resource, this is represented as a graph:
name: "The Pursuit of Happyness"
Now, there are plenty of actions that you can take on a movie: you can do things like watching, buying, renting and reviewing it.
Netflix allows you to watch movies, so let's add nodes to this graph to express that:
name: "The Pursuit of Happyness",
Via the http://schema.org/operation property, you can attach an operation that can be performed on this resource. In this case, the fact that you can http://schema.org/WatchAction it (with well defined semantics).
Taking a step further, if you wanted to say that your application can handle this resource on the web as well as on mobile, you'd have something like this:
name: "The Pursuit of Happyness",
That gives a movie streamer the language to express:
- The individual movies in their catalog/inventory (resources)
- What can be done with each individual movie (actions)
- How to invoke the action (handlers)
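Putting the three pieces together, here is a minimal Python sketch of such a resource as a JSON-LD-style dictionary. The `operation` property and the `WatchAction` type come from the text above; the `handler` property, both handler type names, the example.com URL and the Android package name are hypothetical placeholders, not part of any spec.

```python
import json

# A movie resource annotated with an operation and its handlers.
# From the text: the "operation" property and the WatchAction type.
# Hypothetical: the "handler" property, both handler type names,
# the example.com URL and the Android package name.
movie = {
    "@context": "http://schema.org",
    "@type": "Movie",
    "name": "The Pursuit of Happyness",
    "operation": {
        "@type": "WatchAction",
        "handler": [
            {"@type": "WebPageHandler", "url": "http://example.com/watch/123"},
            {"@type": "AndroidHandler", "package": "com.example.player"},
        ],
    },
}


def handler_types(resource, action_type):
    """List the handler types a resource declares for a given action."""
    operation = resource.get("operation", {})
    if operation.get("@type") != action_type:
        return []
    return [handler["@type"] for handler in operation.get("handler", [])]


def to_jsonld(resource):
    """Serialize the resource the way it might be embedded in a page or feed."""
    return json.dumps(resource, indent=2)
```

Calling `handler_types(movie, "WatchAction")` lists the web and mobile handlers, and `to_jsonld(movie)` gives the text a provider could embed in a page or feed.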
Brokers, Requestors And Providers
Netflix exposes these resources as well as these operations via a variety of transport mechanisms (e.g. markup on webpages, feeds, POP/SMTP messages, etc). We call these entities the providers.
Crawlers/browsers/registries discover these resources by following links, and index these abilities to build a global registry. We call these entities the brokers.
When an application needs to solve a specific problem (e.g. watching movie X), it queries the brokers. We call these applications the requestors.
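The three roles can be sketched in a few lines of Python. This is a toy in-memory model under my own assumptions: the `Broker` class, its method names and the `streamer.example` provider are invented for illustration, and a real broker would crawl markup or feeds rather than receive dictionaries directly.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory broker: indexes (action, resource) pairs from providers."""

    def __init__(self):
        # (action type, resource name) -> list of provider names
        self._index = defaultdict(list)

    def index(self, provider, resource):
        """Record that `provider` exposes `resource` with its declared operation."""
        action = resource["operation"]["@type"]
        self._index[(action, resource["name"])].append(provider)

    def query(self, action, resource_name):
        """Answer a requestor: who can perform `action` on this resource?"""
        return list(self._index.get((action, resource_name), []))


# A hypothetical provider exposes one annotated resource to the broker.
broker = Broker()
broker.index(
    "streamer.example",
    {
        "@type": "Movie",
        "name": "The Pursuit of Happyness",
        "operation": {"@type": "WatchAction"},
    },
)
```

A requestor then asks `broker.query("WatchAction", "The Pursuit of Happyness")` and gets back the providers that declared that operation on that specific movie; a title nobody annotated yields an empty list.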
Think of the actions as the things that you can do with a resource. So, on top of things like GET, POST, PUT and DELETE, you'd now have things like Watch, Listen, Buy, Order and Review to describe what they do.
The same mental model of REST applies though: you have a resource, and you apply operations on that resource.
- You'd WatchAction a Movie
- You'd ListenAction a MusicRecording
- You'd ListenAction a MusicGroup
- You'd ReviewAction a Restaurant
- You'd RsvpAction an Event
- You'd TrackAction a ParcelDelivery
- You'd CheckInAction a FlightReservation
- You'd CancelAction a LodgingReservation
- You'd ConfirmAction a FoodEstablishmentOrder
- You'd ReserveAction a Flight
As a parallel to REST collections, you'd have similar operations like:
- A Movie would have an ItemList of review which you'd use to CreateAction
- A Restaurant would have an ItemList of reservation which you'd use to SearchAction
- A Restaurant would have an ItemList of order which you'd use to CreateAction
- An Airline would have an ItemList of flights which you'd use to SearchAction
- A movie streaming Organization would have an ItemList of catalog which you'd use to ListAction
There are plenty of challenges ahead of us. Here are a few things I am actively working on:
- More and more implementation
- Adding more action handlers, understanding how invoking these operations on multiple platforms should work
- Standardize/Document more interactions and use cases we expect to see exposed on the web
- A communication protocol between Requestors and Brokers, so these can be further de-coupled. Currently, the spec only covers the protocol between Brokers and Providers.
Here are some efforts that are related but not quite the same. I'd love to learn more about related efforts and learn from their experience, so feel free to drop me a line to let me know if I'm forgetting something.
- Web Intents, Web Introducers
- Android Intents
- WSDL / SOAP
- Windows & IE8 Apps
- Mozilla Web Activities