End of OpenTrials Phase 1

TL;DR – OpenTrials has finished Phase 1 funding – this means that the team is having a break, with the project on hold until we secure Phase 2 funding (we’re currently working on it). In the meantime the OpenTrials explorer, our API, and database downloads will remain freely accessible. Thanks to everyone who’s contributed their time and ideas to this phase of OpenTrials, and to our funder, the Laura and John Arnold Foundation, for their generous and insightful support of the project.

It’s been a very interesting journey over the past couple of years and we’re really proud of what we’ve achieved with a small team in a short amount of time. This project couldn’t have got as far as it has without the many of you who’ve helped: by pledging assistance, taking part in user testing, giving us feedback at conferences, meeting us to discuss aspects of the project, and of course offering or donating data. So, thank you – we’re making OpenTrials for you, and want to make it as useful as possible, so all your contributions have been invaluable.

What follows is a summary of what we’ve built, events we’ve been involved in, some challenges and successes of the project so far, how to get involved in the meantime, and lastly some interesting articles we’ve been reading recently.

What have we built?


Based on Ben Goldacre and Jonathan Gray’s 2016 paper outlining the need for a linked database of clinical trials (and related documents and data), we’ve built an early (beta) version of OpenTrials which brings together data from multiple clinical trial registries, deduplicates those trials, automatically matches them to publications, and integrates third-party datasets. We’ve also implemented a contribution feature, meaning that users can submit a wide range of documents and data relating to trials, further enhancing the potential information available.
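This post doesn’t describe how trials are matched to publications. As a rough sketch of one plausible approach, the snippet below looks for registry identifiers in the text of a publication. The function name and the two patterns are our own assumptions for illustration (ClinicalTrials.gov IDs as "NCT" plus eight digits, ISRCTN IDs as "ISRCTN" plus eight digits), not the project’s actual matching code.

```python
import re

# Assumed identifier formats, for illustration only:
# ClinicalTrials.gov uses "NCT" + 8 digits, ISRCTN uses "ISRCTN" + 8 digits.
REGISTRY_ID_PATTERNS = {
    "nct": re.compile(r"\bNCT\d{8}\b"),
    "isrctn": re.compile(r"\bISRCTN\d{8}\b"),
}


def find_registry_ids(text):
    """Return a dict of registry name -> sorted list of unique IDs found in text."""
    found = {}
    upper_text = text.upper()  # identifiers are sometimes typed in lower case
    for registry, pattern in REGISTRY_ID_PATTERNS.items():
        ids = sorted(set(pattern.findall(upper_text)))
        if ids:
            found[registry] = ids
    return found


# Made-up identifiers in a made-up abstract.
abstract = "Trial registration: ClinicalTrials.gov NCT01234567; also isrctn12345678."
matches = find_registry_ids(abstract)
```

An identifier found this way is only a candidate link: as we mention elsewhere on this blog, some publications carry an incorrect registry ID, so a match still needs checking.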

Here’s a summary of what data we currently have in OpenTrials:

351,851 trials from five clinical trial registries:

  • ClinicalTrials.gov
  • ISRCTN (new)
  • GSK (new)

Data integrations/linkages:

  • PubMed – journal articles (example)
  • Cochrane Schizophrenia Group – Risk of Bias data (example)
  • Food and Drug Administration (FDA) – drug approval documents (example)
  • Health Research Authority – lay summaries (example)

From a technical perspective, we’ve created an Application Programming Interface (API) which allows programmers, data scientists, and others who want to use the data in their own tools, research, or analyses to make live queries against the site using code. Alternatively, for those who want to play with the entire database, we’ve just made our database dumps available for download.
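To give a flavour of what a live query might look like, here’s a minimal sketch in Python. The base URL, the endpoint path, and the parameter names (q, page, per_page) are placeholders we’ve made up for illustration, so please check the API documentation for the real ones.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URL and parameter names, for illustration only.
API_BASE = "https://api.example.org/v1"


def build_search_url(query, page=1, per_page=10):
    """Build the URL for a free-text search request."""
    params = urlencode({"q": query, "page": page, "per_page": per_page})
    return f"{API_BASE}/search?{params}"


def search_trials(query, **kwargs):
    """Run the query and return the decoded JSON response (needs network access)."""
    with urlopen(build_search_url(query, **kwargs)) as response:
        return json.load(response)


url = build_search_url("depression", per_page=5)
```

Keeping the URL building separate from the network call makes the query logic easy to test without hitting the server.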



Working with Drs. Erick Turner and Ben Goldacre, we were selected as one of six finalists of the Open Science Prize (ultimately being placed as a runner-up) to build a prototype solution to make the information contained in the FDA’s Drug Approval Packages more easily accessible and searchable. The result can be found at fda.opentrials.net and enables full text searching of over 55,000 FDA drug approval documents – something that was not possible before. For more information on OpenTrialsFDA here’s a more detailed blog post + 3min video.


OpenTrials walkthrough

If you’ve not seen it yet, here’s a 10min walkthrough of the site from Jan 2017 – you’ll notice we’ve added some more features since then, but it’s a helpful overview:


Over the past year the OpenTrials team has presented at Evidence Live in Oxford (June 2016), the International Open Data Conference in Madrid (October 2016), the OpenTrials hack day in Berlin (October 2016), the World Health Summit in Berlin (October 2016), the Cochrane Colloquium in Seoul (October 2016), Bioinformatics Meetup in London (December 2016), Clinical Innovation & Partnering World in London (March 2017), and the Cochrane UK & Ireland Symposium in Oxford (March 2017). It’s been great to meet so many people who are as excited as we are about what OpenTrials is trying to achieve; we’ve had some great conversations with users and potential data partners, leading to some interesting ideas for collaborations and development in the next phase of the project.

We’ve also run a number of user testing sessions with domain experts, including medical librarians, which has been invaluable. Thank you again to all those who volunteered their time to help us understand what works, what doesn’t, what’s not obvious, and what features would be useful. We’ve added everything you’ve pointed out to our GitHub issues – have a look, and feel free to get involved in the conversations (or contribute code if you’re more technical).


Challenges and successes

Website layout/content changing, causing data scrapers to fail

The majority of the data we currently import is collected using scrapers, which grab the data directly from a source’s website. This relies on two things:

  • the order/location of the data staying the same (i.e. the layout/design of the website does not change)
  • the structure of the data itself staying the same (e.g. one data source changed the text it used to represent both sexes from ‘both’ to ‘all’)

If either or both of these change, our scraper can no longer retrieve data until it is rewritten to accommodate the changes. An example which has impacted us is the Drugs@FDA website, which we’ve used as the data source for our OpenTrialsFDA project. The site was redesigned after our initial scrape, meaning that to keep the documents on OpenTrialsFDA up-to-date our scraper needed rewriting.
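One way to limit the damage from this kind of change is to have the scraper fail loudly when it meets a value it wasn’t written for, rather than quietly storing bad data. The sketch below uses the ‘both’/‘all’ example above; the names are hypothetical and this isn’t a copy of our scraper code.

```python
# Values the scraper knows about; 'both' and 'all' mean the same thing.
SEX_VALUES = {"both": "both", "all": "both", "male": "male", "female": "female"}


class UnexpectedSourceValue(ValueError):
    """Raised when a scraped value is outside what the scraper was written for."""


def parse_sex(raw):
    """Map a scraped 'sex' value to a canonical one, or raise if it is unknown."""
    key = raw.strip().lower()
    if key not in SEX_VALUES:
        raise UnexpectedSourceValue(f"unrecognised sex value: {raw!r}")
    return SEX_VALUES[key]
```

With this in place, a source switching to a new label shows up as an error in the scraper logs instead of as a silent gap in the data.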

Suggestion: we encourage those offering searchable databases on their website to also provide the option to retrieve the data via either an API or bulk download (preferred).


Data heterogeneity

When combining or grouping data from multiple sources, we’ve encountered issues where the same elements are represented in different ways. This is due to a combination of sources allowing free-text input and not using standards.

A good example of this is geographical location – for instance ‘United Kingdom’ may be entered as ‘United Kingdom’, ‘Great Britain’, ‘UK’, or ‘GB’.

The effect of this on the project is that we’ve spent a lot of time processing and normalising data to make it more usable, and the task is ongoing and potentially unending.

Suggestion: trial registries should use known standards for metadata – in the case of countries, ISO country codes, and for fields such as condition names, a controlled vocabulary such as MeSH.
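As a small illustration of the normalisation involved, the sketch below maps free-text country names to ISO 3166-1 alpha-2 codes. The alias table is a tiny hand-made sample and the function name is our own invention; a real table would be far larger.

```python
# Hand-made sample of aliases -> ISO 3166-1 alpha-2 codes (illustrative only).
COUNTRY_ALIASES = {
    "united kingdom": "GB",
    "great britain": "GB",
    "uk": "GB",
    "gb": "GB",
    "united states of america": "US",
    "united states": "US",
    "usa": "US",
}


def normalise_country(raw):
    """Return the ISO code for a free-text country name, or None if unknown."""
    # Drop full stops, lower-case, and collapse runs of whitespace.
    key = " ".join(raw.replace(".", "").lower().split())
    return COUNTRY_ALIASES.get(key)
```

Returning None for unknown names, rather than guessing, leaves unmatched values visible so they can be reviewed and added to the table.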

N.B. In Phase 2 we plan to deploy a controlled vocabulary/ontology such as MeSH or SNOMED CT.



In order for us to use a dataset in OpenTrials it must be offered with a suitably permissive licence. Ideally, a dataset would be licensed as open data, meaning that the data can be “freely used, modified, and shared by anyone for any purpose”, for example under a Creative Commons licence such as Attribution 4.0 International (CC BY 4.0) or even better as a Public Domain Dedication (CC0).

Currently, the majority of datasets we see are not usable due to restrictive terms and conditions on their websites. This may be due to organisations using boilerplate terms and conditions with built-in restrictions, erring on the side of perceived risk, or wanting to protect information they perceive as having value as intellectual property to the organisation.

Over the past six months, with the help of an intellectual property lawyer, we’ve been in discussions with a number of organisations which have restrictive data licences. We’ve explained how we’d like to use their data on OpenTrials, how their current licence prevents that use, and how different parts of the licence (e.g. non-commercial, personal use only, no redistribution) are problematic/ambiguous, and have suggested more open, permissive alternatives.

Suggestion: if you’re a data provider, we’d encourage you to follow the example of one of the organisations below and make your data licences more open – we’re happy to talk to you about the issues: [email protected]

N.B. We’re planning a detailed blog post about licensing – watch this space!


Licensing successes

We’re pleased to announce that two organisations (ISRCTN and GSK) have already changed their terms and conditions to allow far greater use of their data.

In the case of ISRCTN, this covers their trial metadata (e.g. condition, intervention, trial title, phase etc) under a CC BY 4.0 licence. In the case of GSK, this covers both their trial metadata and their collection of documents relating to trials (Protocol Summaries, Scientific Results Summaries, Protocols, and Clinical Study Reports).

We’ve just added these new sources to OpenTrials, meaning that even more trials and documents are now listed – many thanks to both organisations for showing great leadership on this issue!

Commenting on this progress, Andrew Freeman, Head of Medical Policy at GSK said:

We can and do publicly disclose information about our clinical trials on our register. Disclosure is important but it’s not enough. The value of disclosing information can be significantly limited if the information is not readily accessible and usable. To that end, we recently clarified that the use of information on our register is unrestricted provided that it may not be used in applications by others for regulatory approval of a product.


How to get involved in the meantime

If you’re keen to get involved in discussions with other OpenTrials users and open data fans, there are a number of ways you can do that while our core team is having a rest.

If you’re interested in looking at the bugs and feature requests we have (or want to file new ones!), take a look at our GitHub issues and feel free to comment on any with your insights. We also have a forum where you can discuss issues relating to OpenTrials, for instance if you know of a data source we might be interested in using, or a way of improving matches or cleaning the data. For the more technical amongst you, feel free to also contribute code and get involved in our Gitter chat room.

We’d also like to hear about any problems you have with the OpenTrials explorer and OpenTrialsFDA – use the ‘Flag an error’ link at the bottom of any page.

And lastly, use our API and/or database downloads to create tools, visualisations, and analyses + let us know what you’ve made!


What we’ve been reading

And lastly, as it’s going to be a while until we’re in touch again, here’s a bumper crop of articles from the last few months:

Trial data transparency


Reporting bias



As always, for the latest updates make sure you’re subscribed to our newsletter and follow us on Twitter @opentrials


Signing out for now from the OpenTrials team at Open Knowledge International – see you on the other side!

Open Knowledge International’s OpenTrials team, March 2017

AllTrials interview: Ben Meghreblian, OpenTrials community manager

Last month our OpenTrials community manager Ben Meghreblian sat down to talk to AllTrials campaign manager Dr Till Bruckner about the public beta launch of OpenTrials. AllTrials (http://www.alltrials.net) campaigns for all clinical trials – past, present and future – to be registered, and their methods and results to be fully reported, and thus its work is closely related to the work of OpenTrials.
In the interview, Ben explains the background and functionality of the current OpenTrials beta version and talks about what other players can do to make their data more accessible and useful for researchers.


Who do you expect to use your data, and for what ends?
Ben Meghreblian: We expect a range of users to use OpenTrials, including researchers, doctors, and patients. A researcher could find out more about a range of trials on a drug, searching by various different features such as inclusion and exclusion criteria to match a specific population. A doctor interested in critical appraisal of research papers could see if sources of bias for specific trials have already been assessed by experts. A patient interested in participating in a trial for their condition could identify trials in their geographical area which are enrolling. We’re also interested to see how policy makers and regulators may use OpenTrials to inform their work, and how data journalists and developers will use the data to write interesting stories.

How easy will the platform be to use?
OpenTrials works like a search engine, with advanced search for filtering results by criteria such as drug and disease area. From our user testing so far we haven’t come across any major usability issues, but we’re taking all feedback into account – we want to make it easy to use for a wide range of people. No special software or IT skills will be needed.

Looking at the data, what discoveries have surprised you most so far?
So far we’ve found two interesting things. Firstly, we found a number of publications on PubMed which have an incorrect trial registry ID associated with them. We have used PubMed Commons to comment on these trials, and have already received positive responses from some authors. Secondly, we have found interesting discrepancies and problems with the data on registries, and elsewhere. Some of the errors we’ve found are widespread, and concerning. Keep an eye on the OpenTrials blog for details.

What have been the biggest challenges in developing OpenTrials?
Getting the data that we need and cleaning it. We currently extract data from several different sources, each with its own structure. For example, one source might use the location name “United States of America”, another might use “USA”, and so on. We have to keep the names consistent so the user can easily find trials. This problem gets more complicated when we consider things like drug names, condition names, company names, etc.

Is there an automated function to detect inconsistencies and poor research practices such as primary outcome switching?
For a given trial, we automatically list discrepancies across different registries – for example trial status and number of trial participants. For poor research practices such as primary outcome switching, this is currently too difficult to do automatically, but the COMPare Trials project is doing a great job of manually assessing outcome switching in trials published in academic journals. RobotReviewer are doing interesting work on assessing these flaws using software alone.
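As a rough idea of what listing discrepancies involves, here is a minimal sketch that compares the same fields across registry records for a single trial. The field names, registry names, and function are hypothetical, chosen only to mirror the two examples above.

```python
def find_discrepancies(records, fields=("status", "target_sample_size")):
    """Return the fields whose values differ between registries.

    records maps a registry name to that registry's record for one trial.
    Registries with no value for a field are ignored for that field.
    """
    discrepancies = {}
    for field in fields:
        values = {
            registry: record[field]
            for registry, record in records.items()
            if record.get(field) is not None
        }
        if len(set(values.values())) > 1:
            discrepancies[field] = values
    return discrepancies


# Made-up records for one trial as listed on two registries.
records = {
    "registry_a": {"status": "completed", "target_sample_size": 120},
    "registry_b": {"status": "ongoing", "target_sample_size": 120},
}
result = find_discrepancies(records)
```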

You plan to score the methodological rigour of trials. How will that work?
We have been given risk of bias data from the Cochrane Schizophrenia group, which grades trials on issues such as blinding and selective reporting. For those trials, we will display this information on the corresponding OpenTrials page along with other trial information (see an example here). We hope that showing this data integrated will encourage other groups and companies to share their datasets with OpenTrials.

How will your data set differ from similar ones such as the Good Pharma Scorecard or that generated by the 2015 STAT investigation?
The Good Pharma Scorecard is an excellent window onto a small number of trials where results have been manually searched for. Charles Piller’s 2015 STAT investigation was a valuable static snapshot of registry data showing which institutions are best and worst for overdue trials. OpenTrials advanced search will allow users to conduct similar searches, across all clinical trials conducted. As the population of the database becomes more complete, it will facilitate similar audits, but where the results update live as trials are published (or not).

What data will be included?
We currently extract and display data from ClinicalTrials.gov, EU CTR, HRA, WHO ICTRP, and PubMed, and include risk of bias assessments from the Cochrane Schizophrenia group. After the launch, we plan to integrate systematic review data from Epistemonikos and other sources. There are seven additional sources of data that we’ve extracted, but can’t display because of licensing issues – we’re working with them to get permission to publish. We’ll let users know when they become available via the OpenTrials blog.

You plan to populate the database manually, but only for a small number of trials. What value will that add?
It will showcase what a perfect database would look like, and the value it can give to patients, researchers, and doctors. Additionally it will allow us to establish how much manual effort would be needed to perfect the entire database.

You plan to allow third parties to submit data online. Is there a danger of players with vested financial interests submitting partial or tailored data to influence perceptions of drugs’ effectiveness?
Any submitted data will be manually approved by a researcher in the first instance. While it will be impossible to stop those with vested interests submitting altered data (something we don’t think will happen often), we hope the community of OpenTrials users will help flag any anomalies in the contributions we will host, which can then be reviewed by our team. As more information is contributed and data sets donated, we would expect outliers to become easier to spot by having many eyes on the data and triangulating information.

Can you explain what your transparency scoreboard will do?
We aren’t releasing the transparency scoreboard as part of our beta launch. We will release this at a later date, but meanwhile all our data will be open and accessible, along with an interface: we encourage others to run their own analyses, build applications, and find interesting patterns and stories in the data.

Does it make sense for several groups to build separate platforms? Wouldn’t it be better to agree on a universal data sharing standard first and then develop one definitive global platform?
Getting widespread agreement on standards may be ideal, but is notoriously hard to achieve. It is very unlikely that they would be imposed on all of the hundreds of thousands of trials already conducted. We can start linking, matching, and building, right now, and there is no sense in further delay. We would rather take the initiative and build a functional platform supported by the community. We’ve spoken to a number of groups in related fields and we are always keen to discuss shared interests and ways of collaborating in order to not reinvent the wheel!

Which is your favourite clinical trials registry, and why?
ClinicalTrials.gov is a leading registry both technically and in terms of the number of trials it contains. Their data is well-structured and easily accessible. Currently there are quite a few registries with poor support for data collection, so this stands out amongst them.

What can existing clinical trials registries do to make their data more accessible and useful for researchers?
Registries could provide API access to their data, along with making sure that the trial metadata is well-structured and follows recommended standards (e.g. the WHO Trial Registration Data Set). Ideally, the metadata would follow known standards, for example using ISO codes for countries instead of their names and using a database such as MeSH to define terms like condition names. This would make the different registries’ databases comparable. While API access is very useful, the best way a registry can offer its entire database is as a regular download, similar to what the FDA does with its OpenFDA website. This makes it much simpler for researchers who need a local copy of the database.

What can other players do to make their data more accessible and useful for researchers?
Beyond adopting standards and guidelines, we’d encourage a range of players to embrace being open by default. We’re keen to talk to any organisations or companies who want to make their data more accessible via OpenTrials.

Beyond OpenTrials, ten years from now, what information do you expect researchers to be able to access? And what information do you think will remain elusive?
There is a clear need for a step change. We need structured data about all trials, describing the methods and results, instead of only free text reports. That’s the big horizon. Basic knowledge management has been sorely lacking on clinical trials, and that makes no sense. We spend millions on individual trials. We need to spend a little on making sure the information is discoverable, machine readable, and impactful.

This interview was conducted via email by AllTrials campaign manager Dr Till Bruckner and has been reposted from the AllTrials blog.

Ben Goldacre on the beta launch of OpenTrials

Dr. Ben Goldacre shares news of the OpenTrials beta launch and provides further insights into why the team are releasing such an early prototype of the database. 

OpenTrials is a vast, ambitious project aiming to match all the publicly accessible documents and data, on all trials, side-by-side, in one place. You can read more about what we’re doing – and how we’re doing it – here, here, here, and here.

Today we have exciting news: the first publicly accessible demo of our service is now up and running, live on the internet, right here.

This is an early “beta”. Software geeks will know what that means. For the benefit of the Evidence Based Medicine geeks: “beta” means we’re sharing an early prototype, and that’s extremely important.

Because we want your feedback, early on. We want you to play with what we’ve built, and tell us what’s not working, what’s missing, what features you want, where the matches are going wrong, where the structures are confusing, and more. We are not building a monolith behind closed doors. We are following in the noble tradition of “release early, release often”, because we want you to help us keep on track, building something that meets your needs.

We also hope that you will want to get in touch. We want to bring software geeks and medical geeks together, pooling knowledge and techniques to build useful things together. So if you’d like to share some data with OpenTrials, join our user group, help us build some code modules, request features, or otherwise work with our team, then do please get in touch.

And we hope you enjoy playing with what we’ve built so far!

For a walkthrough of how to use the OpenTrials platform, Ben Meghreblian of the OpenTrials team has put together a brief video explainer below:

New Community Manager

Hi everyone, I’m Ben Meghreblian and I’m the new Community Manager for OpenTrials. I come from a background in IT and psychology, having moved to science communications and campaigns work in the last few years. Most recently I worked on the AllTrials campaign, also led by Ben Goldacre, which advocates for greater transparency in clinical trials. During my time at AllTrials I was involved in building networks of supporters, writing blogs relating to clinical trials and transparency, and supporting European lobbying efforts. My work in psychology was mostly in the clinical psychology field, researching and supporting therapeutic work in adult mental health.

As Community Manager I’ll be wearing quite a few hats, but in essence I want as many people as possible to know about OpenTrials, get them or their organisation involved with the project, listen to their feedback, and help create an amazing tool which will be used around the world! On a daily basis I’ll be talking to people online, meeting people face to face, running workshops, and attending events.

Moving forward, we’ll use this blog, Twitter, and our newsletter to deliver regular updates to keep you informed about the progress of the OpenTrials platform. These will include perspectives from people like you on why we need OpenTrials, information on user testing/feedback, and upcoming events including hackathons and workshops.

Ultimately we are building something for a community of researchers, doctors, patient groups, medical professionals and everyone else who will use OpenTrials, so it’s important we get those people’s views throughout the project. We want to understand the difficulties they face, research questions they want to ask of a platform like OpenTrials, and what opportunities there may be to design OpenTrials to make possible new things that can’t currently be done.

If you’d like to find out more, read Ben Goldacre’s recent blog or paper explaining our current vision for OpenTrials. There’s also a video of Ben talking through OpenTrials. Whichever you read or watch, please give us feedback!

Open Science Prize

Exciting news! As you may have read in the recent blog post, OpenTrialsFDA, a project we’re collaborating on with Ben Goldacre and Erick Turner, has been selected as a finalist for the Open Science Prize! We’re really excited about working on a prototype and demoing it later this year. More details about the prize here, or watch Ben explain it in this 2min video we submitted as part of our successful bid.

Get involved!

Here are five ways you can get involved now:

  1. Subscribe to our newsletter – takes 10 seconds and you’ll get all the latest OpenTrials news straight to your inbox.
  2. Follow us on Twitter: @OpenTrials and RT anything relevant to your followers.
  3. Have a look at our discussion forum and see if there are any topics you can help with, or questions you have about the project.
  4. Write a short (or long) blog piece for us – tell us why OpenTrials is important to you and how you’ll use it. Drop me a line and let’s discuss.
  5. Tell your colleagues, networks, and friends about OpenTrials – we want to build a diverse network across multiple domains and around the world.

We’re currently looking for people who want to get involved by attending a hackathon (location tbc, probably in Europe) – if you’re interested in having early access to test the platform and help shape its development, please get in touch.

OpenTrialsFDA selected as finalist in Open Science Prize

Open Knowledge is thrilled to announce that the team ‘OpenTrialsFDA’ has been selected to advance our product idea into a prototype to compete for a $230,000 prize in the Open Science Prize, a global science competition to make both the outputs from science and the research process broadly accessible to the public.

OpenTrialsFDA is a collaboration between Erick Turner (a psychiatrist-researcher and transparency advocate), Dr. Ben Goldacre (Senior Clinical Research Fellow in the Centre for Evidence Based Medicine at the University of Oxford) and the team behind OpenTrials at Open Knowledge.  

OpenTrialsFDA will increase access, discoverability and opportunities for re-use of a large volume of high quality information currently hidden in user-unfriendly Food and Drug Administration (FDA) drug approval packages. The prototype will enable academics, clinicians and researchers working with clinical trials to search and access the FDA information on clinical trials via a user-friendly web interface. The team will also produce application programming interfaces (APIs) allowing third party platforms to access, search, and present the information, thus maximising discoverability, impact, and interoperability.

The prototype will provide the academic research world with important information on clinical trials in general, improving the quality of research, and helping evidence-based treatment decisions to be properly informed, by an evidence base that is more complete and less vulnerable to “spin”.

The finalists, announced at the 7th Health Datapalooza Conference in Washington, D.C., were selected out of 96 multinational, interdisciplinary teams representing 450 innovators from 45 countries.   The Open Science Prize is a collaboration between the National Institutes of Health and the Wellcome Trust, with additional funding provided by the Howard Hughes Medical Institute of Chevy Chase, Maryland.

Final prototypes will be submitted on December 1, 2016, and will be demonstrated at an Open Science Prize Showcase to be held in early December 2016.  The public will also be invited to consider and vote online for their favourite prototype.  The ultimate Open Science Prize winner is expected to be announced in late February or early March 2017.

Contact: [email protected] 






OpenTrials paper published

We are pleased to announce the publication of a new paper on OpenTrials, by Ben Goldacre and Jonathan Gray in Trials.

The paper, ‘OpenTrials: towards a collaborative open database of all available information on all clinical trials’ outlines the ambitions of the OpenTrials project.

Read more about OpenTrials in Ben Goldacre’s guest blog on BioMed Central: OpenTrials: what, why and how?

OpenTrials is a collaborative and open linked database for all available structured data and documents on all clinical trials, threaded together by individual trial. Where other projects have set out to manually and perfectly curate a narrow range of information on a smaller number of trials, we aim to use a broader range of techniques and attempt to match a very large quantity of information on all trials. We are currently seeking feedback and additional sources of structured data.

The paper gives an overview of:

  • the data schema
  • the types of documents and data included
  • populating the database
  • presenting the data
  • some use cases
  • open data in medicine
  • intellectual property and privacy
  • practical issues

Follow the project on Twitter: @opentrials

Are you interested in being involved in the project as a:

  • data partner
  • funding partner
  • community partner
  • user?

Sign up via our website: https://opentrials.net/

OpenTrials Technical Roadmap

Hello fans of evidence-based medicine and open data! We’d like to officially announce that the technical work for OpenTrials is underway. Read on for an overview of what we are doing, how we are doing it, and the current roadmap for the first half of 2016. If you have any questions or comments, do not hesitate to ask on Twitter via @opentrials or by email at [email protected].

Technical team

Technical work on OpenTrials is conducted by Open Knowledge International. While different people may come into the project at different times, the current technical team is:

Feel free to reach out to any of us at OpenTrials on GitHub.


As with all Open Knowledge International projects, we welcome and encourage contribution. For the technical work on OpenTrials, contributions can mean any or all of code, documentation, testing, etc. See the OpenTrials issue tracker for interesting tasks to take up.


OpenTrials aims to provide a comprehensive picture of the data and documents on all clinical trials conducted on medicines and other treatments. The platform will present data aggregated from a wide variety of existing sources, starting with clinical trial registers and moving on to academic journals, systematic reviews and other data sources. See Ben Goldacre’s video for greater context. Here, we are focused on the technical aspects of the OpenTrials platform.


Let’s start with a look at the general architecture of the OpenTrials platform. This is a high-level overview, describing the general data flow, and the relation between different components.


OpenTrials will be implemented as a set of loosely coupled components, from data acquisition through to user-facing applications:

  • Extractors and transformers get data from 3rd party sources and process it in our data warehouse, performing tasks from general cleanup through to record matching.
  • Processed data is written to both a file system and a database. The file system holds documents associated with records in the database, as well as flat file representations of data for direct access. The database is a normalized SQL representation of the data warehouse, plus a search index, each powering an HTTP API.
  • A set of user-facing applications sit on top of the API to provide views on the data.
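To make the data flow in the list above a little more concrete, here is a toy sketch of the extract, transform, and load steps as loosely coupled functions. Everything in it is an assumption made for illustration: the function names, the made-up records, the single-table schema, and the use of an in-memory SQLite database as a stand-in for the real database.

```python
import sqlite3


def extract():
    """Stand-in for an extractor; a real one would read from a third-party source."""
    yield {"id": " abc-001 ", "title": "Example trial  "}
    yield {"id": "ABC-001", "title": "Example trial"}  # same trial, second source


def transform(raw_records):
    """General cleanup plus a naive form of record matching on the identifier."""
    seen = set()
    for raw in raw_records:
        trial_id = raw["id"].strip().upper()
        if trial_id in seen:
            continue  # already have this trial; skip the duplicate
        seen.add(trial_id)
        yield {"id": trial_id, "title": raw["title"].strip()}


def load(records, connection):
    """Write processed records to a SQL table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS trials (id TEXT PRIMARY KEY, title TEXT)"
    )
    connection.executemany(
        "INSERT INTO trials (id, title) VALUES (:id, :title)", records
    )
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract()), connection)
rows = connection.execute("SELECT id, title FROM trials").fetchall()
```

Because each stage only consumes the output of the previous one, an extractor for a new source can be added without touching the transformer or the database code.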

Of course, there are many details inside each component as described in the above architecture diagram. We plan to blog regularly as we develop the platform, and give deeper insight into each component and our strategies.

Data model

We are not setting out to design a perfectly formed vocabulary around trial data. Rather, we accept that the data itself is messy, inconsistent, divergent and non-standard, and we set out to increase the value of this data by threading it together based on a range of matching techniques, and by extracting a set of relations between various entities that are manifest in the data itself.


The above diagram centers around our “ideal” representation of a trial, which is derived from various sources of data on a trial, starting with the trial records published on clinical trial registers. This “ideal” representation has a minimal set of core, pre-defined fields based on the WHO Data Set, and a less structured set of associated data making up the graph of everything the OpenTrials platform knows on a given trial.

  • (left) A range of data sources with information about trials.
  • (middle) Our “ideal” representation of a trial, which is made from the core facts we can extract from the various data sources.
  • (right) Key information that will be extracted from trials and used for data exploration (Examples: Find all trials conducted in Location X; Filter trials by Organization Y).
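
To make the idea of a small core plus loosely structured associated data more concrete, here is a minimal Python sketch. The class and field names (Trial, SourceRecord, public_title, recruitment_status, and so on) are assumptions for illustration: they are a small subset loosely inspired by the WHO Data Set, not the actual OpenTrials schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SourceRecord:
    """A record as published by one data source, kept close to its original form."""
    source: str            # name of the register or other source
    data: Dict[str, Any]   # raw, source-specific fields


@dataclass
class Trial:
    """Hypothetical "ideal" trial: a few core fields plus loosely structured extras."""
    primary_id: str
    public_title: str
    recruitment_status: str = ""
    locations: List[str] = field(default_factory=list)
    organizations: List[str] = field(default_factory=list)
    records: List[SourceRecord] = field(default_factory=list)  # the source records it was built from
    extra: Dict[str, Any] = field(default_factory=dict)        # anything outside the core fields


def trials_in_location(trials: List[Trial], location: str) -> List[Trial]:
    """Example exploration query: find all trials conducted in a given location."""
    return [trial for trial in trials if location in trial.locations]
```

Keeping the original source records attached to each trial means the core fields can always be re-derived if the matching or cleanup logic changes.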

Technology stack

OpenTrials will be written in JavaScript and Python, being the core languages used at Open Knowledge International, and the most common languages used in the open data sector.

The majority of web-facing code will be in JavaScript, using Node.js for servers, and either React or Angular for clients. The OpenTrials API will be a Node.js server implementing an OpenAPI-compatible HTTP API exposing data in JSON.

Significant portions of the platform are not web facing, and are concerned with data acquisition and processing. The majority of this code will be in Python, leveraging the extensive ecosystem of data processing tooling it offers.

For databases, OpenTrials will use Elasticsearch and PostgreSQL. Both of these solutions have been chosen based on previous experience, and the flexibility that each offering brings to data storage (Elasticsearch is much more than “just” a search backend, and PostgreSQL is much more than “just” an SQL backend).

3rd party integrations

We are working towards a number of 3rd party integrations for OpenTrials, and we expect these types of integrations to increase over 2016. Some of the first integrations include:

  • PubMed: A range of data types to match onto clinical trials.
  • ContentMine: Matching academic journal publications to clinical trials.
  • Epistemonikos: Summaries and systematic reviews on clinical trials.
  • Document Cloud: OCR and text editing for scanned documents already in OpenTrials.


We use several terms in the roadmap to describe the various components of the OpenTrials platform. For ease of understanding, here’s a short glossary explaining these terms in this context.

  • Scrapers: A set of processes run at semi-regular intervals to acquire data in an automated fashion from 3rd parties, and cache the data in the Warehouse. Not all data is technically “scraped” – some is acquired via API or bulk dump access.
  • Warehouse: A database used as a staging area for data processing before creating publicly consumable data.
  • Datastore: A flat file system with OpenTrials data stored as Data Packages.
  • Database: The public database for OpenTrials with cleaned and reconciled data for reading, exposed via the API.
  • API: The HTTP API for data stored in the Database.
  • Apps: The publicly accessible applications exposing views on the data stored in the OpenTrials database. Currently, we aim to develop 3 apps:
    • OpenTrials: The main portal for discovery and exploration of clinical trial data.
    • Trial Finder: An app to assist patients and health carers in finding relevant clinical trials.
    • Schizophrenia: An app that uses the OpenTrials API to deep dive into clinical trial data around Schizophrenia.
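
For readers unfamiliar with Data Packages, the Datastore entry above can be sketched roughly as follows: a CSV file plus a datapackage.json descriptor that names the resource and its fields. This is a minimal sketch using only the Python standard library; the function name, file names, and example fields are assumptions for illustration, not the actual OpenTrials layout.

```python
import csv
import json
import os
import tempfile


def write_data_package(directory, name, rows, fieldnames):
    """Write rows to a CSV file and describe it in a datapackage.json descriptor."""
    csv_path = os.path.join(directory, "trials.csv")
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Minimal descriptor: one tabular resource, all fields typed as strings.
    descriptor = {
        "name": name,
        "resources": [
            {
                "name": "trials",
                "path": "trials.csv",
                "schema": {"fields": [{"name": f, "type": "string"} for f in fieldnames]},
            }
        ],
    }
    with open(os.path.join(directory, "datapackage.json"), "w", encoding="utf-8") as handle:
        json.dump(descriptor, handle, indent=2)
    return descriptor


if __name__ == "__main__":
    # Hypothetical usage with a made-up record, written to a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        rows = [{"id": "EXAMPLE-001", "title": "Example trial"}]
        print(write_data_package(tmp, "example-trials", rows, ["id", "title"]))
```

The descriptor travels with the flat file, so anyone downloading the data can discover its structure without access to the Database or the API.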


Here we’ll present a high-level view of the shape and development of the OpenTrials platform. We practice agile development at Open Knowledge International, so do not think of this roadmap as a strict plan of action, but rather, as a document that reflects our current thinking and estimates, and is subject to change.

February – March 2016


  • Seed the Warehouse with core data from clinical trial registers and existing dictionaries for interventions and conditions. See the scrapers overview for details and work status.
  • Create the initial Datastore (data as flat files) and Database for the processed data, implementing a loosely structured data model threading trial records from different registers, and related lookup tables for Intervention and Condition entities. The initial threading of trial records will be based on simple de-duplication and record matching strategies using unique identifiers and titles.
  • High-level requirements for robust de-duplication and record matching strategies across trials, interventions and conditions.
  • High-level requirements for the API design, accompanied by a working prototype based on current state of the Database.
  • High-level requirements and UI/X skeletons for the OpenTrials App, accompanied by a working prototype.
  • High-level requirements for matching sources such as journal articles, systematic reviews, and clinical study reports onto clinical trial records, accompanied by some prototype code. A range of target data sources and services are currently on our radar, from PubMed and OpenAccess APIs, through to ContentMine and Epistemonikos.
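
To give a flavour of what simple de-duplication and record matching using unique identifiers and titles might look like, here is a minimal Python sketch. It assumes each record is a dictionary with an "identifiers" list and a "title" string; those keys, the function names, and the single-pass grouping approach are illustrative assumptions rather than the strategy OpenTrials will necessarily adopt.

```python
import re
from typing import Dict, List


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace for comparison."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def thread_records(records: List[Dict]) -> List[List[Dict]]:
    """Group records that share an identifier or a normalized title (single greedy pass)."""
    groups: List[List[Dict]] = []
    key_to_group: Dict[str, int] = {}

    for record in records:
        # Identifier keys come first, so an identifier match is preferred over a title match.
        keys = ["id:" + identifier for identifier in record.get("identifiers", [])]
        title = normalize_title(record.get("title", ""))
        if title:
            keys.append("title:" + title)

        # Reuse the first existing group that shares a key with this record.
        index = next((key_to_group[k] for k in keys if k in key_to_group), None)
        if index is None:
            index = len(groups)
            groups.append([])
        groups[index].append(record)
        for key in keys:
            key_to_group.setdefault(key, index)
    return groups
```

This sketch never merges two groups that were created separately and are later bridged by a third record, and exact title matching will miss near-duplicates; handling cases like these is part of what more robust matching strategies would need to address.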

Contact [email protected] if you have a particular interest in any of the efforts highlighted above, and would like to contribute!


  • Early read-only access to the Warehouse for interested parties to explore this data through a standard SQL interface.
  • Community feedback on documentation and prototypes for technical design and user interfaces.

April – May 2016


  • Data acquisition from expanded data sources as identified in previous period. Flow from Warehouse to Datastore to Database to API.
  • Implementation of robust de-duplication and record matching strategies towards increasing the matching success rate over Trials, Sources, Conditions, and Interventions in our Database.
  • Diff implementation over clinical trial records, exposing record discrepancies as published on public registers, accessible as data via API and App.
  • Iteration on the OpenTrials Database and API as we integrate new data sources and record matching strategies.
  • Iteration on the OpenTrials App prototype, towards a beta version that exposes all data available from the OpenTrials API.
  • High-level requirements and UI/X skeletons for the Trial Finder App.
  • High-level requirements and UI/X skeletons for the Schizophrenia App.
  • High-level requirements and UI/X skeletons for de-duplication and record matching via crowdsourcing in the OpenTrials App.
  • Outline a range of potential problem sets and ideas for data apps that could be built on the OpenTrials API, for use in a hackathon.
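
As an illustration of the diff item above, one simple way to expose discrepancies between two versions of a trial record is a field-by-field comparison. The sketch below assumes records are flat dictionaries; the function name and the example field names are hypothetical, and a real implementation would likely need to handle nested and differently named fields.

```python
from typing import Any, Dict, Tuple


def diff_records(left: Dict[str, Any], right: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (left_value, right_value)} for every field whose values differ."""
    discrepancies = {}
    for key in sorted(set(left) | set(right)):
        # A field missing from one record is reported with None on that side.
        left_value, right_value = left.get(key), right.get(key)
        if left_value != right_value:
            discrepancies[key] = (left_value, right_value)
    return discrepancies
```

For example, comparing {"title": "Trial X", "target_sample_size": 100} with {"title": "Trial X", "target_sample_size": 120} reports only the target_sample_size field, with the pair (100, 120).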

Contact [email protected] if you have a particular interest in any of the efforts highlighted above, and would like to contribute!


  • Access for early adopters to a beta release of the OpenTrials App.
  • Expand data sources.
  • Increase data quality and quantity exposed via APIs.
  • Hackathon in May 2016: Desirable locations being London or Berlin.
  • Community feedback on documentation and prototypes for technical design and user interfaces.

June – July 2016


  • Continued iteration on data acquisition, record matching, and data modeling -> APIs.
  • User identity management for APIs and Apps.
  • Prototype implementation of user interface for de-duplication and reconciliation.
  • Prototype implementation of Trial Finder App.
  • Prototype implementation of Schizophrenia App.
  • Possible data acquisition targeted for use in the Schizophrenia App.

Contact [email protected] if you have a particular interest in any of the efforts highlighted above, and would like to contribute!


  • Prototype versions of Trial Finder and Schizophrenia apps for testing.
  • User accounts, which in turn will enable user-specific actions like saving searches.


Development of OpenTrials will continue throughout 2016 with the broad goals of expanding the database and exposing interfaces for crowdsourcing mechanisms to contribute new data and clean existing data. As the year progresses, we can solidify the roadmap for Q2 2016 based on actual development status, our improved understanding of user needs, and new opportunities around data.


  • Friendly interfaces for individuals and small groups to directly add data to the OpenTrials Warehouse. This data can then enter queues for processing by domain experts to match onto Trials (existing or previously unknown to OpenTrials).
  • Significant data contributions from new data partners.
  • Increased linkage to 3rd party data sources that are directly relevant to clinical trials, but beyond the scope of OpenTrials. An example would be providing links to external sources of individual participant data (IPD) for a given trial.

Contact [email protected] if you have a particular interest in any of the efforts highlighted above, and would like to contribute!


  • A stable release of each App by end of 2016.
  • A stable release of the API by the end of 2016.

Are you an expert on Clinical Trials? OpenTrials is seeking a Community Manager

Open Knowledge International is seeking a Community Manager to work on developing the community around the OpenTrials project: https://opentrials.net

The Community Manager for OpenTrials will have a working understanding of what clinical trials are, how they are run, and how they are reported; the different kinds of documents and data associated with clinical trials; and the current barriers to accessing them.

We are looking for someone self-driven and organised with expert community management and communications skills, who has the ability to engage with community members at all levels – from policy makers to developers working in the health sector.

This is a part-time position (2 days per week) available immediately. This is a fixed term contract of six months with the possibility of extension. Early application is encouraged, as we are looking to fill the position as soon as possible. The vacancy will close when we find a suitable candidate.


Rufus Pollock presents on OpenTrials at “Publishing Better Science through Better Data”

At a recent Nature Publishing event, “Publishing Better Science through Better Data”, held in October 2015, Rufus Pollock, the Founder of Open Knowledge International, gave an update on the OpenTrials project. In under 15 minutes, he outlined the importance of providing a comprehensive picture of the data and documents on the world’s medical trials, and the role of Open Knowledge International in the project. http://www.nature.com/openresearch/publishing-better-science-through-better-data-15-presentation-videos-and-slides

With thanks to the Nature Publishing Group.

Want to get involved in OpenTrials?
Sign up here: https://opentrials.net/
Follow us: @opentrials
Or email us directly at [email protected]

Open Knowledge Announce Plans for Open, Online Database of Clinical Trials

Open Knowledge today announced plans to develop Open Trials, an open, online database of information about the world’s clinical research trials. The project, which is funded by The Laura and John Arnold Foundation (LJAF) and designed to increase transparency and improve access to research, will be directed by Dr. Ben Goldacre, an internationally known leader on clinical transparency.

Open Trials will aggregate information from a wide variety of existing sources in order to provide a comprehensive picture of the data and documents related to all trials of medicines and other treatments around the world. Conducted in partnership with the Center for Open Science and supported by the Center’s Open Science Framework, the project will also track whether essential information about clinical trials is transparent and publicly accessible so as to improve understanding of whether specific treatments are effective and safe.

“There have been numerous positive statements about the need for greater transparency on information about clinical trials, over many years, but it has been almost impossible to track and audit exactly what is missing,” Dr. Goldacre, the project’s Chief Investigator and a Senior Clinical Research Fellow in the Centre for Evidence Based Medicine at the University of Oxford, explained. “This project aims to draw together everything that is known around each clinical trial. The end product will provide valuable information for patients, doctors, researchers, and policymakers—not just on individual trials, but also on how whole sectors, researchers, companies, and funders are performing. It will show who is failing to share information appropriately, who is doing well, and how standards can be improved.”

Patients, doctors, researchers, and policymakers use the evidence from clinical trials to make informed decisions about which treatments are best. But studies show that roughly half of all clinical trial results are not published, with positive results published twice as often as negative results. In addition, much of the important information about the methods and findings of clinical trials is only made available outside the normal indexes of academic journals.

“This project will help to shed light on both good and bad practices by the sponsors of clinical trials,” Stuart Buck, LJAF Vice President of Research Integrity, explained. “If those sponsors become more transparent about their successes and failures, medical science will advance more quickly, thus benefitting patients’ health.”

“We are thrilled to partner with Open Knowledge on the use of the Open Science Framework (OSF) for this project. Open Trials is a great example of how the free, open source OSF infrastructure can be utilized by the community in different ways to increase transparency in scientific research,” Andrew Sallans, Center for Open Science Partnerships Lead, explained.

Open Trials will help to automatically identify which trial results have not been disclosed by matching registry data on trials that have been conducted against documents containing trial results. This will facilitate routine public audit of undisclosed results. It will also improve discoverability of other documents around clinical trials, which will be indexed and, in some cases, hosted. Lastly, it will help improve recruitment for clinical trials by making information and commentary on ongoing trials more accessible.

“This is an incredible opportunity to identify which trial results are being withheld,” Rufus Pollock, President and Founder of Open Knowledge, explained. “It is the perfect example of a project where opening up data and presenting it in a usable form will have a direct impact—it can literally save lives. We’re absolutely delighted to partner with Ben Goldacre, a leading expert and advocate in this space, as well as with the Center for Open Science and LJAF to conduct this groundbreaking work.”

The first phase of the Open Trials project is scheduled for completion in March 2017. For project updates, please follow @opentrials on Twitter or get in touch with us at [email protected].