Research Data Management

Selection, Preservation and Sharing: selection of data, preservation and repositories, data sharing and open data

There are several steps to preparing your data and sharing it. Deciding what to share, and preparing the files and metadata for deposit, may take time, resources and funding, so it is advisable to think about these considerations as early as possible and to include any costs in funding or grant proposals.

The steps below represent the main considerations. Just click on each one for further information and links. You can always speak to your Information Specialist about your options.


Research funders and sponsors may have their own requirements for what happens to the data throughout the project, what should be shared, how long data should be retained and which repository should be used.

You can check major UK funder requirements via the Sherpa Juliet tool.

You may want to check any data transfer agreements that you may have signed to see what you are allowed to do with the data after the end of the project, and discuss this with the relevant parties if you are unsure.

The default at the end of a project always used to be 'delete / destroy', particularly for personal information. However, the landscape is changing and researchers increasingly recognise the need to consider the cost of data collection, the potential for re-use and the value of preserving data.

Historically, researchers have shared data via personal requests, but keeping hold of all your research data can be difficult to manage. Data archives and sharing services have the benefit of managing your data, and the access rights to it, for you.

Sharing data can have a variety of benefits for the researcher. This video from University of Bath is a short overview of what some of those benefits may be.

 

However, this comes with additional responsibility to plan for the preservation and sharing of data. The University of Edinburgh has a GDPR video where (starting at 2:24) the researcher neatly discusses the changing landscape of research data in the social sciences and some of the challenges involved with making data more open.

The University of Plymouth's Research Data Policy stipulates that:

Research data that supports published research findings or is of long-term value is considered for open deposit

It acknowledges the UKRI Concordat on Open Research Data, which states:

 "not all research data can be open and [...] that access may need to be managed in order to maintain confidentiality, guard against unreasonable cost, protect individuals’ privacy, respect consent terms, as well as managing security or other risks". However, "any restrictions must be justified and justifiable".

In other words, research data should be made "As open as possible, as closed as necessary" (Horizon 2020).


Some considerations you may wish to take into account when thinking about what to preserve and share (adapted from Open University, 2016):

  • Are there any ethical issues to consider? Have you gained consent from your research participants to archive, share or reuse the data?
     
  • Do the data have special scientific or historical value? Does evidence of current research in your field suggest that your data will be important in the future?
     
  • Are the data unique? Would the information derived from your dataset be at risk if the dataset were lost?
     
  • Do the data have a high re-use potential? Are the data likely to be of broad interest? Has their reliability been assured?
     
  • Can the data be easily reproduced? Would it be feasible to replicate the data? Would it be financially viable?
     
  • Is there a strong economic case for preservation? Are the estimated costs associated with data curation justifiable when you consider the potential future benefit?
     
  • Are the data in support of a patent application? It is necessary to retain data in support of patent applications because we may be called upon to defend our patents in court and the original research data can be critical in this process.

Policies and requirements impacting on data selection

  • Data centre policy: If you are going to deposit your data into a subject-based data centre, subject-specific evaluation criteria may apply. Where this is the case you should follow the guidance provided by the data centre in question.

  • Academic publisher requirements: Increasingly, academic publishers also require data which underpins a publication to be retained and shared. Check with your publisher if this is the case, as they may have requirements for where it is stored and for how long.
     
  • Intellectual Property: Who owns the data from your research? Are there any agreements that mean that you are not the owner?

 

Ultimately, the decision of what data to retain and for how long is not always clear cut. By considering the above and documenting the justification for your decisions in your data management plan, you can ensure that the treatment of your data is as transparent as possible, should anyone wish to interrogate it.

For retention, unless your funder or publisher stipulates otherwise, the University Research Data Policy recommends that:

"Research data that supports published research findings or is of long-term value is retained for a minimum of 10 years from collection or creation of the data or publication of the research results (whichever is the latter)."

"Research data is retained for longer than 10 years where an increased retention period is required to meet legal, statutory, contractual or funder requirements."

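As a rough illustration of the 10-year rule quoted above, the sketch below works out the earliest date on which the minimum retention period would end, counting from the later of data creation and publication. The function names and example dates are hypothetical, and the handling of 29 February is an assumption; a funder, legal or contractual requirement may extend the period, as the policy notes.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later.

    Assumption: 29 February falls back to 28 February in non-leap years.
    """
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def minimum_retention_end(created: date,
                          published: Optional[date] = None,
                          min_years: int = 10) -> date:
    """Earliest end of the minimum retention period.

    The period is counted from whichever is later: creation/collection
    of the data, or publication of the research results.
    """
    start = created if published is None else max(created, published)
    return add_years(start, min_years)


# Hypothetical example: data collected in March 2019, results published June 2021.
print(minimum_retention_end(date(2019, 3, 1), date(2021, 6, 15)))  # 2031-06-15
```

This gives a minimum only; record the actual retention decision and its justification in your data management plan.
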
Personal Data

Considerations before sharing data

Personal data must be managed in accordance with EU and UK data protection legislation. You can only share personal data gathered for research purposes with other researchers if you have informed consent for data sharing. Your information sheets and consent form should tell study participants how their data will be shared, and with whom. The UK Data Service RDM guidance includes advice on drafting consent forms and information sheets.

Options for data sharing

  • you should fully anonymise your datasets, but bear in mind that this might not be as simple as removing names and addresses. The UK Data Service provides detailed guidance on anonymisation of qualitative and quantitative data.

  • you can restrict access to bona fide researchers, subject to terms and conditions of use.

  • if you restrict access to the data, this should be justified in your Data Access Statement.

Sensitive data

In addition to identifiable personal data, examples of when data might be too sensitive to share openly are: 

  • data on the location of endangered species or fragile ecosystems; 

  • data on organisations; 

  • data arising from experiments using animals; 

  • the location of GM crops.

Options for sharing data

  • You can redact or restrict parts of the datasets to make them less sensitive; 

  • You can restrict access to bona fide researchers subject to terms and conditions of use. 

  • If you restrict access to the data, this should be justified in your data access statement.

Intellectual Property

You must not share data if you do not have the right to do so, for example: 

  • the data are owned by a third party and your licence to use the data does not allow you to share them; 

  • the Intellectual Property (which includes data) is assigned to collaborators or the funder in your contract. 

Options for sharing

  • make sure that you understand the Intellectual Property Rights assigned within your funding contract or collaboration agreement and understand how this relates to data sharing in your project. You can contact the University of Plymouth's Specialist Advisor for Intellectual Property Support for advice on Intellectual Property Rights and your research contracts. 

  • if this is the reason that you are not able to share data from your study this needs to be stated in your data access statement.

Contractual Obligations

You must not share your data if this would be in breach of your funding contract or collaboration agreement. This situation is most likely to arise if you are working with an industrial collaborator. 

Options for sharing

  • Check your contracts carefully before sharing data; 

  • You may be able to negotiate data sharing once any Intellectual Property arising from the study has been commercialised. 

  • You may be able to negotiate data sharing with restricted access and subject to non-disclosure agreements. 

(Adapted from University of Bath, 2016)

There are various ways to share data and make it accessible for future use. These are outlined here:

University of Plymouth PEARL Repository

Our internal PEARL repository is a free service that you can use to deposit your research data. It can handle any amount of output, but currently has a file size limit of 2 GB. The data will be given a unique identifier and can be linked to other relevant works, such as articles. The record can also be harvested by other services and aggregators, to make your research more discoverable by other researchers.
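
If you want to check a folder of data against the file size limit before depositing, a short script can list any files that are too large. This is only a sketch: the function name is hypothetical, and it assumes the limit applies per file and that 2 GB means 2 × 1024³ bytes, which you should confirm with the repository team.

```python
from pathlib import Path

# Assumption: the 2 GB limit is per file and is read here as 2 * 1024**3 bytes.
LIMIT_BYTES = 2 * 1024**3


def oversized_files(folder, limit_bytes=LIMIT_BYTES):
    """Return a sorted list of (path, size_in_bytes) for files over the limit."""
    found = []
    for path in Path(folder).rglob("*"):
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                found.append((path, size))
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical usage: check the current folder and its subfolders.
    for path, size in oversized_files("."):
        print(f"{path}: {size / 1024**3:.2f} GB exceeds the limit")
```

Files that exceed the limit may need to be split, compressed or deposited elsewhere, with a record linking to their location.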

If you choose not to deposit into PEARL, it is still recommended to create a record for the item using Elements and link it to the location of your dataset so that your work can still be part of the university's collection. Further guidance on depositing data into Elements can be found below.

To find out more, please contact your Information Specialist, who will be able to advise further.

Funder Data Centre

Your funder, such as NERC or ESRC, may have a data centre that they wish you to deposit your work into. If you are funded by one of these, you should use the data centre recommended by them and follow their requirements for retention.

External Open Access Data Repository

There are many options for external repositories. You will need to ensure that the repository offers the services you require, particularly if you need long-term preservation of some data. This may incur costs that should be factored into grant proposals. Some services offer a mix of free and paid storage, or place limits on storage.

The University of Sheffield have compiled a helpful list of generic, national and subject-specific research data repositories. They have also included information on publisher-recommended repositories and funder recommendations.

Your department may also have its own recommendations. For example, the School of Psychology have produced their own framework for Open Science and created guidelines for pre-registering research and data via Zenodo.

You can also browse repositories using the Registry of Research Data Repositories (Re3data).

Sharing via Journal

Publishers may also request data to be shared for peer review, or ask for the data to be deposited along with a written publication. They may have their own data services or recommend repositories to use. If you are publishing from your data, it is worth checking the policy of the journals you are thinking of publishing in, to ensure that you are familiar with their data policy and the requirements of any repository they require you to deposit data into. This may need additional consideration if you are not the owner of your data, or have limited rights over the data.

The University of Sheffield (2019) have compiled this handy (not exhaustive) list of publisher-recommended repositories:

  • CUP recommends the use of Dryad
  • Elsevier list repositories supporting bidirectional linking with Elsevier publications
  • Oxford Journals – look for the data archiving directions of the individual journal
  • PLoS recommended repositories
  • Royal Society policy on data sharing and mining
  • Scientific Data (Nature.com) recommended data repositories
  • Springer policy on data availability and list of recommended repositories
  • Taylor & Francis Supplemental material
  • Ubiquity Press recommend their Dataverse repository or Dryad
  • Wiley Data Sharing Service

You may also wish to publish your data in a specialist data journal. These allow you to publish the data from your research as a distinct asset, within a journal structure. Like a repository, they will create a DOI for the work and provide information enabling the data to be cited.

Don't forget to add accurate and complete metadata to your dataset when depositing it into the repository. This metadata will enable your work to be more discoverable and searchable, potentially leading to greater recognition for your work.

You also need to remember to add appropriate documentation to allow your (future) self and others to reproduce, understand and re-use the dataset.

For further information on this, please refer to our pages on 'Metadata and Documentation'.

Once you have made the decision to share your data, it is a good idea to ensure that others are clear on what they can do with your data and encourage them to use it for these purposes. You may also need to place restrictions on the use of the data, either for your own reasons or as a requirement of other stakeholders. The most basic methods of controlling use of your data are via licenses and embargoes.

Open licenses 

These can be applied to your work to communicate the allowances and restrictions around use of the data. Creative Commons licenses are the most common, but depending on your discipline (for example software) it may be more appropriate to choose a different sort of Open License, such as an Apache, MIT, GNU or Mozilla license. Repositories often provide options of licenses you can use. Some of the most common ones can be found below.

Creative Commons

This is the most commonly used type of license applied to research. It has a legal code but also plainly states what people can and cannot do with a piece of work. It is built upon a few considerations, such as ability to adapt and allowing non-commercial use or derivatives to be made. Each license has a Legal Code so it is legally binding, a Commons Deed to make it easily understandable and a Machine Readable layer so that technology can identify the license. 

It is widely adopted, easy to understand and suitable for most types of work. However, it is not suitable for software.

Example: 

This work is licensed under a Creative Commons Attribution 4.0 International License.

Find out more: 
List of Creative Commons Licenses
Creative Commons License Builder
Full considerations of Creative Commons Licenses

Software/hardware 

Software and hardware have their own set of open source licenses that differ by levels of what is permissible and which are adopted widely by different communities. The Choose Your License tool is useful to help determine the best license for your work.

They also have a handy summary of licenses.

Or try this interactive license selection tool.

Other Resources

The UK's Digital Curation Centre (DCC) has produced a useful PDF explaining data licensing and different types of data license:
DCC How to License Research Data


Embargoes

You may need to keep your data private for a period of time, to allow for publications arising from the data, patent applications or commercial application of your findings. In order to allow for this, you can set an embargo upon the work, which will allow the data to be discovered but will close access to the actual dataset until the specified date. 

If you are funded, your funder may allow an embargo but specify a maximum period of time that access can be closed for. 

Certain publishers may also stipulate that data should be made open within a specific period of time, so it is worth checking your journal's policy on data and making sure that it aligns with your interests and funder policy.

Most repositories, including the University of Plymouth PEARL repository, will allow you to choose a period of embargo according to your needs.

Data Citation

Data citation allows others to attribute use of data to the authors/creators. This could be where data is referred to, or has been used to create something new. However, data citation also allows data to be linked to related outputs, such as articles or artefacts. Additionally, there are currently projects ongoing to increase the links and relationships between outputs and datasets, which rely on good data citation practices. Publishers may also have their own policy on how to cite data within their publication.

We recommend:

1. Add a data statement to your manuscript.

This will tell people where to locate the data relating to your outputs. The statement may take a number of forms depending on the availability of the data, accessibility and permissions. See the example Data Access Statements for wording you can use in a number of situations.

2. Create a record for your dataset in the repository, or deposit the data in the PEARL repository.

Making use of the PEARL repository or creating a record within PEARL for data deposited elsewhere enhances visibility of the dataset. 

3. Cite your data, and ask others to cite it, in the following format:

Creator (PublicationYear). Title. Publisher. Identifier

If you need or wish to include information about Version and Resource Type, the recommended form is as follows:

Creator (PublicationYear). Title. Version. Publisher. Resource Type. Identifier
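
If you are citing several datasets, it can help to build the citations consistently from their component parts. The sketch below assembles a citation string in the two formats shown above; the function name and all of the example values (creator, title, publisher and DOI) are hypothetical placeholders, not a real dataset.

```python
from typing import Optional


def format_data_citation(creator: str,
                         year: int,
                         title: str,
                         publisher: str,
                         identifier: str,
                         version: Optional[str] = None,
                         resource_type: Optional[str] = None) -> str:
    """Build: Creator (PublicationYear). Title. [Version.] Publisher. [Resource Type.] Identifier"""
    parts = [f"{creator} ({year})", title]
    if version:
        parts.append(version)
    parts.append(publisher)
    if resource_type:
        parts.append(resource_type)
    # The identifier (for example a DOI) comes last, without a trailing full stop.
    return ". ".join(parts) + ". " + identifier


# Hypothetical example values for illustration only.
print(format_data_citation(
    creator="Smith, J.",
    year=2021,
    title="Example survey dataset",
    publisher="Example Repository",
    identifier="https://doi.org/10.1234/example",
))
# Smith, J. (2021). Example survey dataset. Example Repository. https://doi.org/10.1234/example
```

Passing `version="Version 2"` and `resource_type="Dataset"` produces the longer form with those two elements in the positions shown above. Check your publisher's own data citation policy, as they may ask for a different style.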

Once your data is deposited in a repository and issued with a DOI, you can begin to promote your data. Here are a few tips for getting the most impact out of the hard work you've been putting into managing and sharing your data:

1. Consider publishing in a data journal.

Peer-reviewed data journals can help to promote your work, facilitating exploration of datasets for researchers.

The blogpost Introducing the Data Paper... (Moody, 2017) gives a neat introduction to Data Journals.
The University of Edinburgh's sources of dataset peer review resource lists Data Journals and their policies, which may be a good starting point.

2. Always cite your data!

Data citation, alongside allocation of DOI to a dataset, allows use of that data to be tracked. There are various projects aiming to improve discoverability and measurement of use of data sets. This depends on accurate and consistent data citation and signposting to other relevant publications via citation, so that anyone discovering the publication can also discover the dataset and vice versa. 

3. Check to see if your repository promotes your data.

Some repositories promote datasets as part of their services, so it is well worth just seeing whether your preferred repository will do this. For example, the UK CESSDA data archive state:

"The 4 most recently published data are listed on the UKDS homepage as ‘Latest data’ (life feed from the catalogue). We send weekly updates of newly released datasets to our user community. Selected individual ReShare datasets are promoted via Twitter."

4. Use your data in your teaching.

Consider using your data in your teaching and as a resource for teaching data manipulation/ analysis too. This can help to build understanding of open scholarship, data analysis and data management skills in your students.

The Foster Open Science resource has a module on how to use Open Data in teaching, which may be helpful.

5. Use social media and online dissemination to promote your research.

Many researchers are already exploiting online tools to disseminate and promote their research, as well as create greater social impact through engaging with academics and non-academics about their research. Why not try this with your data too?

You might wish to try:

  • Writing a blog post about the research and the process of collecting the data. Cite the data within this.
  • Tweeting or writing a Facebook post that refers to the dataset.
  • Talking to our Digital and Content Team and our Multimedia Team to see if they can help promote the research and dataset, through articles, press releases or creating multimedia to aid wider dissemination.

Don't forget to cite the dataset whenever you are talking about your research!

6. Track use of your data.

While this is a growing field, thanks to work on citations and DOIs by a variety of organisations, there are ways you can begin to track use of your data.

Altmetric: Measures views, downloads, citations, social media mentions, news/media outlet mentions, use in literature management tools.

Impact Story: Similar to above, tracks social media, news/media outlets, blogs etc.

PlumX: Similar to the above, integrated into Scopus. Automatic authentication on Campus, or access via DLE and library databases.

DataCite: Shows views and downloads of data from aggregated repositories.

(Adapted from CESSDA ERIC RDM Guide)

 

Depositing Data in Elements

See also the Don't Dump, Deposit training run in Summer 2021.

Example Data Access Statements

"All data created during this research is openly available from [insert repository name / The University of Plymouth PEARL repository] at [insert DOI here]."

"All data supporting this study is provided as supplementary information accompanying this paper."

"All data is provided in full in the results section of this paper."

"The research materials supporting this publication can be accessed at [insert URL here]."

"The research materials supporting this publication can be publicly accessed in [insert repository/data centre/archive here] via [insert Persistent Identifier here]. The research materials are available under a [insert licence here]."

You can cite multiple datasets in the references section of the paper, citing each dataset as you would a paper. You can also make a single archive record that has links to all of the datasets used for the publication (if they are archived in different locations).

 

"This publication is supported by multiple datasets, which are openly available at locations cited in the reference section."

If you are using secondary data or data owned by a third party you are unlikely to be able to archive or share the raw data. When using secondary data you should provide information on the sources of the data and access arrangements. If you are using numerous data sources you should consider archiving a document that summarises all of the data sources and access arrangements, along with information about your data processing methods.

"This study was a re-analysis of existing data, which is openly available at locations cited in the reference section. Further documentation about data processing is available at [the University of Plymouth PEARL repository/ insert repository name] at [insert DOI here]."

"This study brought together existing data obtained upon request and subject to license restrictions from a number of different sources. Full details of how these datasets were obtained are available in the documentation available at [insert DOI here]."

If there are legal or ethical justifications for not sharing your data openly, you should provide this justification in your Data Access Statement and give information about whether the data is completely restricted, or whether it can be accessed under certain conditions.

"The research materials supporting this publication have been deposited in [insert repository/data centre/archive here]. If you wish to access these research materials please contact [insert non personal contact details here]."

"Due to the confidential nature of some of the research materials supporting this publication not all of the data can be made accessible to other researchers. Please contact [insert non personal contact details here] for more information."

"Anonymised interview transcripts from participants who consented to data sharing, plus other supporting information, are available from [insert repository], subject to registration, at [insert DOI here]."

"Due to ethical concerns, supporting data cannot be made openly available. Further information about the data and conditions for access are available at [the University of Plymouth PEARL repository/ insert external repository name here]:  [insert DOI here]."

"Due to the (commercially, politically, ethically) sensitive nature of the research, no interviewees consented to their data being retained or shared. Additional details relating to other aspects of the data are available from [the University of Plymouth PEARL repository/ insert external repository name here]: [insert DOI here]."

"Supporting data are available to bona fide researchers, subject to registration, from the [institution/repository] at [insert DOI here]."

 If you have non-disclosure agreements, patents pending, or other contractual restrictions on sharing your data you should give this information in the Data Access Statement.

 

"Supporting data will be available from [the University of Plymouth PEARL repository/ insert external repository name here] at [insert DOI here] after a six-month embargo period from the date of publication to allow for commercialisation of the research findings."

"Due to confidentiality agreements with research collaborators, supporting data can only be made available to bona fide researchers subject to a non-disclosure agreement. Details of the data and how to request access are available at the University of Plymouth PEARL repository at [insert DOI here]."

Consider digitising non-digital data for preservation if possible. If this is not possible the location of the non-digital data should be included in the Data Access Statement. You can still request a repository record for your non-digital as well as your digital data to obtain a record with a unique ID that can be shared in your statement.

 

"Non-digital data supporting this study are stored by the corresponding author at the University of Plymouth. Details of how to access these data are provided in the documentation available at the University of Plymouth PEARL repository at [insert DOI here]."

If you have not used secondary data sources and have not created new data but need to provide a Data Access Statement:

"No new data were created during the study."

You can pre-register data with us and we will mint a DOI for that dataset. You can then upload files once the data is available. Other repositories may offer similar facilities.

"All data is currently available upon request and will be made available from [insert repository name and add DOI or other information as appropriate] [option to add a timeframe]."

"Data is deposited in [insert repository] with the following DOI [insert DOI] and will be openly available after an embargo period of X. If you require access earlier, please contact the authors to request this."