History

I’m often curious to know how other companies came into being and how people got the idea for particular software products. This is the story of A.nnotate for anyone who is similarly curious.

It all started in the late 90s when Fred and I were doing research, more or less independently, in computational neuroscience. A big problem in the field is that, although there is a huge amount of experimental work going on, very little of the data that is gathered is accessible to people on the theoretical side who want to build computational models. Some of this could be an unwillingness to share, but mostly it is simply too difficult to prepare and publish the information with sufficient metadata for it to be useful to someone else.

At the time, the Human Brain Project was investing heavily in setting up databases for neuroscience data, many of which remained almost entirely empty, to the bemusement of their designers. There was a notion around that if you just set up your database with a few input forms for this or that type of data, people would come and put stuff into it. No one seemed to ask themselves the question “would I put my hard-won data into someone else’s database?” The answer is no, for so many reasons it is almost embarrassing. First off, you have to know an awful lot of fine details about datasets and how they were collected before you can use them intelligently. Anything else is worse than useless. And inevitably, the database designers haven’t thought of a fraction of the things that are needed, so if you just use the forms provided, the result won’t be any use to anyone. Secondly, why should someone do all this work just to make someone else’s database look good? This is their research output and they need to build on it to justify the next grant. And so on…

At about the same time, a number of people came to the same solution. We did; Erik de Schutter did, with a notion of “Grass roots databases”; and Gwen Jacobs started the Neurosys project. The idea was simple: you don’t actually need a traditional database. Neuroscience data is enormously diverse, with different metadata requirements for almost every experiment. We don’t much care about fast searching, sorting or querying. We just want access, complete metadata, stuff to appear in Google, and eventually perhaps the ability to harvest and reindex it. Anything you use will need manual examination anyway, so the key step is just getting it on the web in a more or less structured form.

It was a neat idea, and led to a few papers and some demonstration software, but not much more happened from our side until I also got a job in the same department as Fred at Edinburgh University in 2002. We had almost the same viewpoint on the problem and what form the solution should take. The idea was a flexible cataloging and publishing system that would let an investigator create structured metadata records (schemas and corresponding documents) without realizing that that was what they were doing. A priori schema editing was too abstract a concept, so people would start creating actual records and add fields or values as they found they needed them. The system would somehow keep track of the implied schema changes for other records of the same type.

The software got built and found some enthusiastic users, but a lot of people found it too complicated. The great lesson for us was that, try as hard as you might to make something usable, the class/object, schema/instance model, which is so familiar to developers, just doesn’t do it for the vast majority of experimental researchers. And it probably doesn’t do it for the vast majority of normal people either. But what researchers like, trust and are good at is writing text - papers, reports, notebooks, whatever. What’s more, for data to be reusable, the metadata doesn’t actually need to be all that well structured. It just needs the information to exist in a comprehensive form somewhere. Text is fine for most of it, with some of it extracted to XML or equivalent for automated cataloging, harvesting and searching.

With this observation in hand, and a bit of government money for a proof of concept project, we founded Textensor Limited, the company behind A.nnotate, at the end of 2005. The name is based on the idea of “extension” of “text”. And what’s more, the .com domain was available. The proof of concept project funded a rather subtle system, Notate, which we described in a white paper: Enhancing documents with annotations and machine-readable structured information using Notate.

The market response was interested, but cool. Notate did some fancy things in terms of letting you mark up semantic structures in text, pulling out entities and linking them together, but it only worked on HTML. Most people had PDF or Word documents, and converting them to HTML produced something rather ugly. There were also plenty of HTML annotation tools out there, and most of them weren’t being used a lot. The thing Notate was adding was the fancy semantic stuff, and although it was fun, it really wasn’t clear who wanted it yet.

But we heard, repeatedly, that what people did want was to tag and annotate PDF in such a way that it still looks like PDF, and without installing any software. So the next step was to work out how to annotate PDF files in a browser without any plugins (as soon as you are dependent on plugins you are restricted to an elite core of sophisticated, technically inclined users, and that just isn’t our target market). And the step after that was to ruthlessly throw out everything we possibly could from Notate and rewrite the rest of it. No more page editing, no more new entities, no more ontologies, database references, verbs or links between objects. If you can’t just scrap them entirely, hide all the non-essential buttons behind menus or dialogs and squeeze the controls into as compact a space as possible. Try it on people repeatedly, fix what they don’t understand, make it work on Safari and Internet Explorer (yuk), and there is A.nnotate.

Robert Cannon, Textensor Limited, June 2008
