Background

See a full video about the project from Ben Goldacre here

The project has received Phase One funding from the Laura and John Arnold Foundation, awarded to Open Knowledge and the Centre for Open Science, with Ben Goldacre as principal investigator.  User engagement, database design, front-end design, and coding will be carried out by Open Knowledge, and the back-end database is provided by the Centre for Open Science. A small steering committee meets regularly to oversee the day-to-day running of the project, and a larger Advisory Board, drawn from a wide range of users and stakeholders, provides intermittent guidance on the build, strategic direction, and sustainability. In terms of outcome measures, we have targets for the quantity of data imported, the number of active users, and policy impacts, such as raised expectations for access to documents and structured open data on clinical trials.


What is the scope of what OpenTrials is collecting?

We are collecting all publicly available information about trials.

Which drug areas will the OpenTrials database include?

We aim to include all randomised trials on all interventions, drug or non-drug.

What type of linked data will be included in OpenTrials?

Our aim is to ‘thread’ together all documents about one trial in one place.

Which registers will you be scraping/acquiring data from?

We currently extract and display data from ClinicalTrials.gov, EU CTR, HRA, WHO ICTRP, and PubMed, and will include risk of bias assessments from the Cochrane Schizophrenia group in the next few weeks. After the launch, we plan to integrate systematic review data from Epistemonikos and other sources. There are 7 additional sources of data that we’ve extracted, but can’t display because of licensing issues – we’re working with them to get permission to publish. We’ll keep updating the OpenTrials blog as they become available.

Do you know of others we should look at?  Please feel free to post suggestions directly to the discussion forum and our team will look into them, or email us directly at [email protected].

Will you include patient information in the database?

We don’t plan to include information about individual patients (“individual patient data”, or IPD) in the OpenTrials database.  Others are doing this job, and it is beyond the scope of OpenTrials; instead, we will link out to existing IPD hosted externally.  We plan to host only information that is already publicly available, and to work to make more information publicly available. IPD often presents privacy risks that mean it cannot simply be posted online.

What will happen when you find studies that are later found to be fraudulent, or where people have raised concerns?

We will link to publicly available commentary where possible in these cases.

How will OpenTrials deal with duplicate records?

Detecting duplicate clinical trial records is an area of ongoing scientific research, and we aim to work with researchers to use state-of-the-art techniques to find such records.

What is your record linkage strategy?

Where possible, we will match on clinical trial register IDs, and otherwise use probabilistic record linkage techniques on various aspects of study design. This is an area of ongoing work and research.
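As an illustration only, a toy version of this two-step strategy might look like the sketch below. The field names, weights, and threshold are assumptions made for the example, not OpenTrials’ actual linkage model.

```python
from difflib import SequenceMatcher

# Hypothetical field weights and threshold -- illustrative values only.
WEIGHTS = {"public_title": 0.5, "condition": 0.3, "intervention": 0.2}
MATCH_THRESHOLD = 0.85

def same_trial(a: dict, b: dict) -> bool:
    """Decide whether two register records describe the same trial."""
    # Step 1, deterministic: a shared register ID (e.g. an NCT number)
    # is taken as a definite match.
    if set(a.get("registry_ids", [])) & set(b.get("registry_ids", [])):
        return True
    # Step 2, probabilistic: a weighted string-similarity score over
    # aspects of study design; fields missing on either side are skipped.
    score = 0.0
    for field, weight in WEIGHTS.items():
        left, right = a.get(field, ""), b.get(field, "")
        if left and right:
            score += weight * SequenceMatcher(
                None, left.lower(), right.lower()).ratio()
    return score >= MATCH_THRESHOLD
```

A real linkage model would be trained and validated rather than hand-weighted, but the shape — exact identifiers first, fuzzy study-design comparison as a fallback — is the idea described above.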

How will you present the data to the end user?

We have developed prototype presentations of the data for different audiences and are currently running a series of user engagement workshops to improve these.  Initial views are focused on: search; researchers’ needs for individual trials; patients’ needs for individual trials; and overviews of performance metrics, which include transparency metrics on how much information is available for various classes of trial by sponsor, site, etc.  These presentations of the data are in the video on OpenTrials.net and we are happy to receive feedback.

Will all of your content be machine-readable?

Yes, we aim to provide all data in open, machine-readable formats, supported by an API.  For original documents that we have collected that are not already machine-readable (e.g. PDF scans of trial documents), we aim to generate machine-readable versions where possible using an OCR tool like DocumentCloud.
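As a minimal sketch of what consuming that machine-readable output could look like, the snippet below parses a JSON trial record and lists the register sources threaded together for it. The record shape and field names here are illustrative assumptions, not the API’s actual schema.

```python
import json

# A hypothetical machine-readable trial record, as an API might return it.
sample = json.dumps({
    "id": "example-trial",
    "public_title": "A randomised trial of X vs placebo",
    "records": [
        {"source": "clinicaltrials.gov", "status": "complete"},
        {"source": "who_ictrp", "status": "ongoing"},
    ],
})

def sources_for(trial_json: str) -> list:
    """Return the register sources linked together for one trial."""
    trial = json.loads(trial_json)
    return [record["source"] for record in trial.get("records", [])]
```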

Are we able to see the technical roadmap?

Our current roadmap for the first half of 2016 is available from this page.

What is your methodology for discrepancies?

When a trial is registered in multiple registries, its data can get out of sync. For example, the same trial can be marked as complete on ClinicalTrials.gov, but ongoing on WHO ICTRP. We can’t know for sure which value is correct. We label these cases as discrepant to let our users know they have to be extra careful when using this information.

Do you consider all trial’s attributes when calculating discrepancies?
No. We want to keep the number of irrelevant discrepancies as low as possible, so we selected the trial attributes that are both comparable and relevant. For an attribute to be comparable, it needs to have consistent values across our sources, which isn’t the case for most textual attributes (e.g. titles, sponsor names, location names). Relevance is harder to gauge, as it depends on who you ask. For each attribute, we asked ourselves: “Would I want to dig deeper if a trial had a discrepancy on this specific field?”

Considering these criteria, we defined the following set of attributes:

  • Gender of participants
  • Planned number of participants
  • Status
  • Recruitment status
  • Whether it has published results

This allows us to reduce the number of irrelevant discrepancies while still providing important information for our users. The list will keep changing as we improve our database cleaning process and become able to add new attributes.

How do you calculate discrepancies?
For each trial, we check whether all of its sources have the same value for each of the attributes we consider when calculating discrepancies (see above). Sources that have no data on a specific attribute are ignored when calculating discrepancies on that attribute (but their other attributes are still considered).
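The rule above can be sketched as follows. The attribute names, source names, and record shape are illustrative placeholders, not our actual schema; the logic — compare only the selected attributes, skip sources with no value, and exclude EU CTR records as explained in the next answer — follows the description above.

```python
# Placeholder names for the attributes listed above.
COMPARED_ATTRIBUTES = ["gender", "target_sample_size", "status",
                       "recruitment_status", "has_published_results"]
EXCLUDED_SOURCES = {"euctr"}  # EU CTR is skipped (see the next answer)

def find_discrepancies(records):
    """records: list of (source_name, attributes_dict) pairs for one trial.
    Returns the compared attributes whose non-missing values disagree."""
    discrepant = []
    for attr in COMPARED_ATTRIBUTES:
        values = {attrs.get(attr) for source, attrs in records
                  if source not in EXCLUDED_SOURCES}
        values.discard(None)  # sources with no data are ignored for this attr
        if len(values) > 1:   # more than one distinct value => discrepancy
            discrepant.append(attr)
    return discrepant
```

For example, a trial marked “complete” on ClinicalTrials.gov but “ongoing” on WHO ICTRP would be flagged as discrepant on its status attribute only.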

Do you consider all sources when calculating discrepancies?
No. We don’t consider records from the EU CTR when looking for discrepancies, because it tracks trials recruiting in multiple countries as separate trials, each with its own planned number of participants, status, etc. This causes many false positives (non-discrepant trials marked as discrepant), so we’re ignoring the EU CTR for now. This might change in the future.


Who is on your Advisory Board?

Our Advisory Board members can be found here.