Open Paleo is an open project that anyone can contribute to on GitHub. All data sources, methods, code, and results are openly shared for collaboration and inspection as the project evolves.

We strongly encourage others to participate in the project, to propose their own ideas, and to contribute to or re-use any of the data or other information available here.

Project concept

The aim of this project is to perform a range of meta-analyses of the published palaeontology literature. This will include looking at factors such as:

Ultimately, this information might prove useful in developing standards, protocols, and best practices for palaeontological research and publishing.

Data sources

Google Scholar - COMPLETED

We selected the 20 most-cited Paleontology journals according to Google Scholar.


Metadata were extracted from Scopus journal by journal (as CSV files). The only filter applied was publication date, restricted to articles published from 2015 to 2016. The metadata include information such as:

  • Authors, titles, and year of publication.
  • Number of citations (according to Scopus).
  • Article Digital Object Identifier (DOI).
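The date constraint was applied in the Scopus query itself, but it can be useful to re-check it locally once the per-journal CSV files are downloaded. The sketch below assumes the exports use the column headers "Authors", "Title", "Year", "Cited by", and "DOI"; those names, and the function name load_scopus_export, are assumptions for illustration and should be checked against the real files.

```python
import csv
import io

# Assumed Scopus export headers; verify these against the actual CSV files.
KEEP_COLS = ["Authors", "Title", "Year", "Cited by", "DOI"]

def load_scopus_export(csv_text, start_year=2015, end_year=2016):
    """Parse one journal's CSV export (hypothetical helper) and keep only
    articles whose Year falls within [start_year, end_year], retaining KEEP_COLS."""
    # Exports may start with a UTF-8 byte-order mark; strip it so the first
    # header parses as "Authors" rather than "\ufeffAuthors".
    reader = csv.DictReader(io.StringIO(csv_text.lstrip("\ufeff")))
    kept = []
    for row in reader:
        year = (row.get("Year") or "").strip()
        if year.isdigit() and start_year <= int(year) <= end_year:
            # Missing fields come back as None from DictReader, so default to "".
            kept.append({col: (row.get(col) or "").strip() for col in KEEP_COLS})
    return kept
```

Any row that falls outside the window would signal that the query filter did not behave as expected for that journal.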

Clean data - COMPLETED

Using the visdat R package to inspect the data visually, we spotted misaligned rows and block-shifted columns. These formatting errors were fixed in MS Excel, and the files were re-saved as CSV with UTF-8 encoding. The headers were then reformatted for ease of use during analysis, and empty rows and columns were removed with the janitor R package.
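The header clean-up and empty row/column removal were done in R with janitor. For readers working outside R, here is a rough Python analogue of those two steps; it is a simplified approximation of janitor's clean_names() and remove_empty() behaviour, not a port, and the function names below are hypothetical.

```python
import re

def clean_name(header):
    # Lower-case the header, turn each run of non-alphanumeric characters into "_",
    # then trim leading/trailing underscores (a simplified take on clean_names()).
    name = re.sub(r"[^0-9a-z]+", "_", header.strip().lower())
    return name.strip("_")

def _cell(row, i):
    # Treat missing trailing cells as blank so short rows do not raise IndexError.
    return row[i] if i < len(row) else ""

def remove_empty(headers, rows):
    # Drop rows in which every cell is blank.
    rows = [r for r in rows if any(c.strip() for c in r)]
    # Keep only columns that have at least one non-blank cell among the remaining rows.
    keep = [i for i in range(len(headers)) if any(_cell(r, i).strip() for r in rows)]
    return [headers[i] for i in keep], [[_cell(r, i) for i in keep] for r in rows]
```

For example, [clean_name(h) for h in ["Authors", "Cited by", "Source title"]] gives ["authors", "cited_by", "source_title"].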


Data for PLOS ONE were obtained using the rplos package in R. The code, resulting data, and Unpaywall query results can all be found here. Note that some of these data differ from those obtained through the Scopus queries.

Unpaywall - COMPLETED

We then ran the DOI list for each journal through the Unpaywall DOI checker. This provides information such as:

  • Open Access status (true or false)
  • Publication date
  • Source of evidence for Open Access status

All of the results of these steps are available within this repository.
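A single DOI lookup can be sketched against the Unpaywall v2 REST API, which takes the DOI in the URL path and requires an email parameter. The fields pulled out below (is_oa, published_date, and the evidence and host_type of best_oa_location) correspond to the list above; the helper names and the email address are placeholders, and this is not necessarily how the project's own queries were run.

```python
import json
import urllib.parse
import urllib.request

UNPAYWALL_BASE = "https://api.unpaywall.org/v2/"

def unpaywall_url(doi, email):
    # Build the request URL for one DOI; "/" is kept unescaped in the path.
    return UNPAYWALL_BASE + urllib.parse.quote(doi) + "?" + urllib.parse.urlencode({"email": email})

def summarise(record):
    # Extract OA status, publication date, and evidence from one Unpaywall record.
    # Closed articles have best_oa_location set to null, hence the "or {}".
    best = record.get("best_oa_location") or {}
    return {
        "doi": record.get("doi"),
        "is_oa": record.get("is_oa"),
        "published_date": record.get("published_date"),
        "evidence": best.get("evidence"),
        "host_type": best.get("host_type"),
    }

def fetch(doi, email):
    # Network call: query Unpaywall for one DOI and summarise the JSON response.
    with urllib.request.urlopen(unpaywall_url(doi, email)) as resp:
        return summarise(json.load(resp))
```

Looping fetch() over a journal's DOI list, with a short pause between requests, would yield one summary row per article.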

Google Scholar - IN PREP

While Unpaywall checks whether legitimate versions of articles have been made OA (e.g., via green self-archiving routes), researchers also often share their articles in ways that do not comply with copyright. This includes platforms such as ResearchGate or

Therefore, the data will be cross-checked against Google Scholar, which records this information at the article level, to see:

  • Whether articles are freely available;
  • Which versions are available;
  • Which services or platforms are most used.
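To answer the last question, the links to free versions found for each article could be tallied by host. Google Scholar does not offer an official API, so this sketch assumes the URLs have already been collected into a list by some other means; the function name platform_counts is hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

def platform_counts(urls):
    # Count free-version links per host, treating "www.example.org" and
    # "example.org" as the same platform.
    hosts = []
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host:
            hosts.append(host)
    return Counter(hosts)
```

Calling most_common() on the result then ranks the platforms by how many article versions they host.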

Wikidata / WikiCite - IN PREP

WikiCite provides a large body of integrated data on the scholarly literature, linking research papers with authors, topics, species, and much more. All of the data are released under CC0 and integrate many online resources. Scholia gives an idea of what it can do for palaeontology.
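One way to link the existing DOI lists to WikiCite would be a SPARQL query against the Wikidata Query Service, matching on the DOI property (P356) and optionally pulling the title (P1476). Wikidata stores DOIs in upper case, so they are upper-cased before matching. This is a sketch of a possible approach, not the project's planned method; the helper names and User-Agent string are placeholders.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def sparql_literal(doi):
    # Upper-case the DOI to match Wikidata's convention and escape quotes/backslashes.
    d = doi.upper().replace("\\", "\\\\").replace('"', '\\"')
    return f'"{d}"'

def wikidata_doi_query(dois):
    # Build a query returning the Wikidata item (and title, if present) for each DOI.
    values = " ".join(sparql_literal(d) for d in dois)
    return (
        "SELECT ?item ?doi ?title WHERE {\n"
        f"  VALUES ?doi {{ {values} }}\n"
        "  ?item wdt:P356 ?doi .\n"
        "  OPTIONAL { ?item wdt:P1476 ?title . }\n"
        "}"
    )

def run_query(query, user_agent="open-paleo-sketch/0.1 (placeholder contact)"):
    # Network call: send the query and return the JSON result bindings.
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": user_agent,
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]["bindings"]
```

DOIs that return no binding are not yet in Wikidata, which is itself a useful measure of WikiCite coverage for these journals.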


This website is licensed under an MIT media license. The theme is flatly. Source code can be found at
Created by Jon Tennant.