sgo.to

The Problem With Papers

The scientific paper is one of those key conventions that defined modern science: the authors, the abstract, the problem statement, prior art research, the scientific method, the data collected, all the way to the references. In addition to the common structure, papers have a solid set of features:

  • papers are archivable (e.g. librarians can preserve that information, it is easy to send them around)
  • papers work offline (e.g. they don't depend on a network to be consumed)
  • papers are secure (e.g. reading a paper doesn't expose who you are nor do any harm to your computer)

However, because of the static nature of printed media, even when distributed over computer networks (e.g. PDF), papers can't do a lot of the things that we expect from computers:

  • papers are hard to visualize (e.g. no rich media like videos or interactivity)
  • papers are hard to reproduce (e.g. changing the data/input conditions requires reimplementing from scratch, often without the original data to do so)
  • papers are hard to compose (e.g. reusing the result from multiple papers requires reproducing all of them)

This all results from the fact that papers can't compute. Because there isn't a programmable engine, papers can't respond to user feedback, run simulations or expose a programmable API that would enable them to be used in composition.
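To make "a programmable API used in composition" a bit more concrete, here is a minimal sketch in TypeScript. Everything in it is an assumption made up for illustration: the `ComputablePaper` interface, the `compose` helper and the two toy papers are hypothetical, not an existing format or library.

```typescript
// Hypothetical interface: a "computable paper" exposes its result as a function
// of its inputs, so readers can re-run it and other papers can build on it.
interface ComputablePaper<In, Out> {
  title: string;
  run(input: In): Out;
}

// Compose two papers: the output of the first becomes the input of the second.
function compose<A, B, C>(
  first: ComputablePaper<A, B>,
  second: ComputablePaper<B, C>,
): ComputablePaper<A, C> {
  return {
    title: `${second.title} (using ${first.title})`,
    run: (input: A): C => second.run(first.run(input)),
  };
}

// Toy papers, invented for illustration only.
const meanPaper: ComputablePaper<number[], number> = {
  title: "Estimating the mean",
  run: (samples) => samples.reduce((sum, x) => sum + x, 0) / samples.length,
};

const thresholdPaper: ComputablePaper<number, boolean> = {
  title: "Is the mean above 10?",
  run: (mean) => mean > 10,
};

// Reusing both results requires no reimplementation, only composition.
const combined = compose(meanPaper, thresholdPaper);
console.log(combined.title, combined.run([8, 12, 16])); // mean is 12, so true
```

The point of the sketch is that changing the input data means calling `run` again rather than reproducing the paper from scratch.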

Interestingly, unlike how modern assets are published (e.g. news articles), papers aren't hyperlinked (e.g. passed by reference rather than by value), nor are their discussions centralized (e.g. a comment stream aggregating readers from all over the world).

Coming up with anything different, e.g. a computable paper, is challenging for a variety of reasons:

  1. backwards compatibility: can a computable paper be submitted to a normal conference? does it degrade gracefully?
  2. archivable: can a computable paper be sent by email in a single file and interpreted correctly (e.g. how to interpret the paper is based on archivable and reproducible formats)?
  3. offline: can a computable paper be read offline, like paper or PDFs can?
  4. secure: can a computable paper be as secure as PDFs and paper (e.g. maybe with no outbound network access)?

While these are huge challenges, I think they aren't insurmountable.

From an incentive-design perspective, we are starting to see demand in the industry for a greater ability to create visualizations for algorithms (e.g. Artificial Intelligence: A Modern Approach, Bret Victor's work, Google Brain's approach to Research, Distill and http://visxai.io/) as well as a need to address reproducibility (e.g. Scientific Paper is Dead, irreproducibility report and papers with code).

From a technology perspective, a lot of great building blocks have been built over the years. Specifically, HTML/CSS/JS have a well-established set of affordances, tooling and a thriving ecosystem of frameworks. Web Package is starting to form as a great packaging format for offline access. Videos and images are well established on the web. Cross-iframe communication (e.g. postMessage) is also a well-established mechanism for inter-process communication.
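As a rough illustration of the postMessage building block, here is a minimal TypeScript sketch of an embedded figure that answers requests from the paper hosting it. The message shapes (`FigureRequest`, `FigureResponse`), the `TRUSTED_ORIGIN` value and the squaring "simulation" are all hypothetical placeholders. Only `addEventListener("message", ...)` and `postMessage(message, targetOrigin)` are the real browser APIs.

```typescript
// Hypothetical message shapes exchanged between a paper and an embedded figure.
type FigureRequest = { type: "set-parameter"; name: string; value: number };
type FigureResponse = { type: "result"; name: string; output: number };

// Assumed origin of the hosting paper (placeholder, not a real site).
const TRUSTED_ORIGIN = "https://paper.example";

// Type guard: any frame can post anything, so validate the shape first.
function isFigureRequest(data: unknown): data is FigureRequest {
  if (typeof data !== "object" || data === null) return false;
  const d = data as Record<string, unknown>;
  return d.type === "set-parameter" && typeof d.name === "string" && typeof d.value === "number";
}

// Pure handler: the "simulation" here is a stand-in that squares the value.
function handleRequest(req: FigureRequest): FigureResponse {
  return { type: "result", name: req.name, output: req.value * req.value };
}

// Browser wiring, skipped when there is no window object (e.g. under Node).
const host = globalThis as any;
if (host.window) {
  host.window.addEventListener("message", (event: any) => {
    if (event.origin !== TRUSTED_ORIGIN) return; // ignore unknown senders
    if (!isFigureRequest(event.data)) return; // ignore unrelated messages
    // Reply to the frame that sent the request, restricted to its origin.
    event.source?.postMessage(handleRequest(event.data), TRUSTED_ORIGIN);
  });
}
```

On the paper side, the counterpart would call `iframe.contentWindow.postMessage(request, figureOrigin)` and listen for the `result` message in the same way. Keeping the handler a pure function separate from the wiring makes it easy to test without a browser.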

So, while we haven't put all of the parts together quite yet, I think most of them are quite well developed.