Documentation refers to any guidance that you are using to help control, manage or understand your collection of data.
This could be a set of guidelines, notes on equipment set-up, descriptions of the fields in a dataset, the specific mixes used to create a paint colour, or lab notebooks. These will not necessarily become part of your research output, but they need to be managed in order to ensure the integrity of your output.
Metadata is essentially 'data about data'. You give your data metadata to help others to discover it. For example, library records about an item in the collection are a form of metadata and they mean you can search through a library catalogue for titles, authors and other terms. If you deposit your data into a repository, you will be expected to provide metadata to enable discovery of your data.
You might follow a metadata standard relating to your chosen repository or another disciplinary standard. For example, you may need to use a controlled vocabulary to describe your work, so that others searching with those terms can find your data. Standards can also be useful tools to help you describe your work adequately.
This exercise from the Mozilla Science Lab may help you get used to describing your data appropriately for future re-use.
Any information that cannot be recorded in a structured way (i.e. as the values of fields in a data or metadata file) can be recorded as free text within a readme file.
There are many commercial options here, but Jupyter Notebook is a good free option for researchers who are writing code, and GitHub may be used to handle the project, its versions and its documentation. Evernote is a flexible tool: you can attach different types of files to notes and organise them with a tagging system. MS OneNote is another free solution that integrates fully with Office 365 applications.
Some file formats can record information in addition to the main data content. For example, the Observations and Measurements XML standard provides a way of recording sampling strategies and procedures as well as measurement values.
Some disciplines have developed special file formats or data structures for recording supporting information. You can find more information on this below.
In some cases, archives/repositories generate specialist metadata files from their submission forms. Find out the fields of the submission form of the archive to which you are planning to submit your data, copy these fields into your data documentation and fill these in as you go through your project.
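As a rough illustration of this approach, the sketch below keeps a record of submission fields and reports which ones are still empty. The field names are hypothetical placeholders, not those of any particular repository; replace them with the fields on the form of the archive you intend to use.

```python
import json

# Hypothetical field names: copy the real ones from your archive's submission form.
SUBMISSION_FIELDS = ["title", "creators", "description", "keywords", "licence", "funder"]


def new_record(fields):
    """Return a record with every field present but empty."""
    return {field: "" for field in fields}


def missing_fields(record):
    """List the fields that still need a value."""
    return [field for field, value in record.items() if not str(value).strip()]


record = new_record(SUBMISSION_FIELDS)
# Fill fields in as the project progresses (example values only).
record["title"] = "Example dataset title"
record["licence"] = "CC BY 4.0"

print(json.dumps(record, indent=2))
print("Still to complete:", missing_fields(record))
```

Saving the record alongside your data (for example as a JSON or plain-text file) means the information is ready to paste into the form when you deposit.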
Some of the information needed to understand data would normally be provided in a journal article reporting the research. To prevent duplication of effort, you can refer to an article for more information about a dataset, but before doing so you should be sure that (a) the article provides sufficient detail and (b) the article will be available as open access.
When documenting your data, the aim is to provide enough information so that a fellow researcher who is familiar with your field, but not necessarily with your work, should be able to understand the data, interpret it and use it in new research, without the need to contact you directly about the dataset.
An overview of the data should include:
Specifically, you may need to include some of the following information:
You may be recording some of this information in a lab notebook or research journal. If so, you may find it convenient to maintain an index file that links data files to the corresponding page numbers until you have an opportunity to transfer the information into a documentation file.
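One possible form for such an index file is a small CSV table with one row per data file. The sketch below is only an illustration; the file names, notebook label and page ranges are invented examples.

```python
import csv
import io

# Hypothetical entries: (data file, notebook, pages in that notebook).
entries = [
    ("run_001.csv", "Lab notebook 3", "12-14"),
    ("run_002.csv", "Lab notebook 3", "15"),
]


def write_index(rows, handle):
    """Write the index as CSV with a header row."""
    writer = csv.writer(handle)
    writer.writerow(["data_file", "notebook", "pages"])
    writer.writerows(rows)


def pages_for(data_file, rows):
    """Return the (notebook, pages) pairs recorded for one data file."""
    return [(notebook, pages) for name, notebook, pages in rows if name == data_file]


# An in-memory buffer stands in for a real file such as 'notebook_index.csv'.
buffer = io.StringIO()
write_index(entries, buffer)
index_text = buffer.getvalue()

print(index_text)
print(pages_for("run_002.csv", entries))
```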
A 'readme' file is a plain text file that is named 'readme' to encourage users to read it before looking at the remainder of the content. It can contain documentation directly or instruct the reader where to look to find more information. Even though it is free text, the file should be structured into sections as an aid to the reader. The following table summarises suggestions on what to include. There are some examples of readme files provided as links below the table.
|Section||What to include|
Information needed so that the reader can cite your dataset:
Describe how you collected the data:
|Third-party inputs||If you used third-party data, provide a data citation or a description of how you accessed the data.|
Provide details of the steps you took to process the data:
If your workflow generates auxiliary files as well as data files, explain which are which.
Relate the outputs of your workflow to the data files you have, or will be submitting, for archiving.
|Inventory of files||
Give the names of the files in the dataset, a short description of each, and how they interrelate.
Mention related data that was not selected for inclusion, such as auxiliary files generated by your workflow.
|File structure and conventions||
Provide details on how to interpret your data files:
Give a short statement about the terms under which others may use the dataset.
If necessary, the full text of the licence may be given in a separate plain-text file called 'licence.txt'.
|Relationships||If applicable, give links to related datasets, alternative records or publications.
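For the 'Inventory of files' section, a first draft of the file list can be generated rather than typed by hand. The following is a minimal sketch under stated assumptions: the function name `build_inventory` and the tab-separated layout are illustrative choices, and the temporary folder only stands in for a real dataset directory. The descriptions still have to be written by you.

```python
from pathlib import Path
import tempfile


def build_inventory(root):
    """Return one line per file under root: relative path, size, description stub."""
    root = Path(root)
    lines = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            size = path.stat().st_size
            lines.append(f"{relative}\t{size} bytes\t<add description>")
    return lines


# Demonstration on a throwaway folder with two example files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "raw").mkdir()
    (Path(tmp) / "raw" / "survey.csv").write_bytes(b"id,score\n1,4\n")
    (Path(tmp) / "readme.txt").write_bytes(b"placeholder")
    inventory = build_inventory(tmp)

print("\n".join(inventory))
```

The output can be pasted into the readme and each `<add description>` stub replaced with a short explanation of the file and how it relates to the others.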
As a researcher, the three main types of metadata you will be asked to provide are contextual metadata, discovery metadata, and metadata for reuse.
Contextual metadata describes the context within which the project was conducted. It helps to connect your data to your own research profile, and to your project, funding body and publications.
Discovery metadata helps other researchers to find your data, and as a result may help to increase the impact of your research. You will provide discovery metadata when you complete a record in Elements for the PEARL repository, or for another research data archive or repository.
Metadata for reuse
The metadata you provide for reuse will depend on the field of your research.
Check below for links to a number of subject-specific metadata standards and to catalogues of metadata standards.
Some subject areas have agreed on a common set of terminology to use when describing data. Metadata standards list the properties of the dataset that need to be known and vocabularies provide a standardised set of terms with which these properties can be recorded.
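To show how a vocabulary constrains the terms used, the sketch below checks a list of keywords against a set of permitted terms. The vocabulary here is invented for illustration; in practice you would use the term list published for your chosen standard.

```python
# Hypothetical controlled vocabulary: replace with the terms from your standard.
VOCABULARY = {"oceanography", "sea surface temperature", "salinity"}


def check_keywords(keywords, vocabulary):
    """Split keywords into those found in the vocabulary and those that are not.

    Keywords are trimmed and lower-cased before comparison.
    """
    normalised = [keyword.strip().lower() for keyword in keywords]
    accepted = [keyword for keyword in normalised if keyword in vocabulary]
    rejected = [keyword for keyword in normalised if keyword not in vocabulary]
    return accepted, rejected


accepted, rejected = check_keywords(["Salinity", "ocean temp "], VOCABULARY)
print("Accepted:", accepted)
print("Not in vocabulary:", rejected)
```

Terms reported as not in the vocabulary are candidates for replacement with the nearest standard term, so that others searching with that term can find the dataset.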