RDM Support @UM
Research Data Management Guide
1. About Research Data Management
Research data are the foundation of scientific knowledge. They are fundamental for researchers to answer their research question or to test a hypothesis. The type of research data collected during a project depends on the research method. There are four main types of research data:
- Observational data: captured in real time, typically cannot be reproduced exactly
- Experimental data: from labs and equipment, can often be reproduced but may be expensive to do so
- Simulation data: from models, can typically be reproduced if the input data is known
- Derived or compiled data: after data mining or statistical analysis has been done, can be reproduced if the analysis is documented.
Examples of data types are text, numbers, images, 3D models, software, audio files, video files, reports, surveys, etc.
Research data are valuable; therefore, proper data management is crucial.
Research Data Management concerns how you:
- Create data and plan for its use;
- Organise, structure, and name data;
- Keep data (make it secure, provide access, store and back it up);
- Find information resources;
- Share with collaborators;
- Publish and get cited.
“Research data management concerns the organization of data, from its entry to the research cycle through to the dissemination and archiving of valuable results.”
Source: Whyte, A., & Tedds, J. (2011). ‘Making the Case for Research Data Management’.
1.1. Motives for Research Data Management
As a researcher, you will benefit from managing your research data because:
- It increases efficiency;
- It increases the quality of the data;
- It makes research reproducible;
- Although it takes time up front, it saves you time in the end;
- It keeps your data safe and minimises the risk of data loss or of access by unauthorised persons;
- It is a step towards making your data FAIR.
The FAIR principles have become a well-known concept as the science community is moving towards transparent, reproducible and Open Science. The FAIR principles act as a guide for improving the Findability, Accessibility, Interoperability and Reuse of data.
Reusing and combining data makes research more effective and enables new research. In addition, sharing your data improves the visibility of your work, which can lead to more citations, greater impact and recognition, and a wider network.
Institutions, research funders and publishers are important drivers of data sharing and data management. They require a long-term strategy for the provision of data resources and encourage researchers to share their data. By organising and managing your research data, you will comply with funder requirements.
Research data have to be stored for at least ten years (this period may vary by discipline). Data are made available to other academic practitioners upon request, unless legal provisions dictate otherwise.
Regulations and guidelines for research data management at Maastricht University are set out in the Research Data Management Code of Conduct [PDF].
1.2. Steps in the Research Data Life Cycle
The research data life cycle contains several steps that have to be taken at different stages of the research cycle in order to ensure successful data curation and preservation.
Proposal Planning & Writing
- Determine funding opportunities
- Write (funder) proposal
- Conduct a review of existing data sets
- Determine if the project will create a new dataset (or combine existing)
- Investigate archiving challenges, consent, confidentiality and licenses
- Estimate the data management costs
- Identify potential users of your data
- Contact your faculty data steward or support at the library for advice on a suitable archive
- Create a data management plan
- Make decisions about the form of documentation (protocol, naming convention, version control, logging) and the content of the project
- Pre-test content, and test materials and methods
- Follow best practices
- Organize files, backups & storage
- Think about access control and security
- Manage file versions
- Document analysis and file manipulations
- ‘FAIRification’ of your data
- Determine file formats
- Contact archive for advice
- Document (more) and clean up data
End of Project
- Write a paper
- Submit report findings
- Deposit data in data archive (repository)
2. Starting your research
Once your research idea has been elaborated, you will have to find funding for your project. Nowadays, most funders initially require a data management paragraph as part of the research proposal. After you have obtained a grant, funders require a data management plan within a set period of time.
Contact your faculty data steward for support.
Alternatively, specialists from the University Library, MEMIC and DataHub are happy to help you answer the questions in your proposal and write your Research Data Management Plan.
Investigating potential funding sources and preparing postgraduate research applications is a lengthy process, so you should allow plenty of time for this.
The most common funders for research projects at Maastricht University are NWO, ZonMw and Horizon 2020.
The Contract Research Centre (CRC) offers professional support in acquiring national and international grants and funds for research projects, scholarships and fellowships. For (almost) every project proposal, there are funding possibilities!
Further information on grant opportunities and funding is available on the Maastricht University intranet, including the intranet page of the Contract Research Centre.
2.2. Research Data Management Paragraph
You will want your research proposal to stand out in order to improve your chances of being selected and acquiring funding for your research project. Usually, funders require a data management paragraph as part of the research proposal. While writing the paragraph on how you will manage your data, keep in mind that most funders favour open access and reuse of research data.
2.3. Research Data Management Plan
The goal of a Research Data Management Plan is to consider the many aspects of research data management, including storage and preservation, access and restrictions, and the description of the data collection with metadata. Complying with a good Research Data Management Plan ensures that data are well managed in the present and prepared for preservation in the future. The Research Data Management Plan is a dynamic document that should be updated regularly during the project.
A Research Data Management Plan typically describes what kind of data will be created during the research project and how. It should outline the plans for storage, sharing and preservation of your data during and after the project. Additionally, it should describe the security measures that will be taken and the restrictions that will apply given the nature of the data. (Sensitive) personal data or patentable data need extra protective measures, and restrictions on sharing and preservation may apply.
Most funders require a data management plan. However, Maastricht University does not oblige researchers to provide a Research Data Management Plan for projects that are not funded by the major funding bodies. Nevertheless, writing one is highly recommended, and it may be a requirement of your faculty. Contact the data steward or information manager of your faculty for more information.
For additional information and data management plan templates from funders, see the websites of the most common funders:
- Horizon2020 portal
- Horizon2020 participant portal
- Evaluation of proposals
- Guidelines on FAIR Data Management in Horizon 2020
Additional useful links:
2.4. Personal data: Privacy & Security
Data security and privacy are very important. If your research project involves personal data, you need to understand how this affects your research and your data collection. Therefore, tackle possible privacy and security issues at a very early stage of your project.
Even if your research does not involve personal data, you still have to consider security measures that protect the integrity and confidentiality of your research data. These matters must be addressed in your research data management plan.
Personal data means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person. This also includes sensitive personal data which must be treated carefully.
Sensitive data are data revealing:
- Racial or ethnic origin
- Political opinions
- Religious or philosophical beliefs
- Membership of trade unions
- Genetic data
- Biometric data for the purpose of uniquely identifying a natural person
- Physical or mental health condition
- Sexual life or orientation
- Convictions, proceedings and criminal acts
- National Identification Number (BSN in the Netherlands)
If your research project involves personal data, UM requires you to take the following steps. These may vary per faculty or department.
Registration of the project:
- Contact the information manager, data manager or data steward of your faculty to register your project and help you with the risk classification of your research project;
- The outcome of the classification will be a score (low, middle, high) on:
- Based on the score you will have to take appropriate measures
- A Data Protection Impact Assessment (DPIA) may be mandatory in case the Research project involves sensitive personal information. Please contact your information manager for this. Based on the outcome of the DPIA, you are required to take the necessary measures in collaboration with the information manager of your faculty.
- Find information on ICT security, including policies, on the UM Security pages.
- Visit the Privacy Regulations website for detailed information on this topic.
- If you have specific questions about personal data processed in your research project, please contact your Information Manager, listed under Privacy: GDPR. If necessary, your Information Manager can contact the central UM privacy team.
Legal obligations when processing personal data:
- Collect the minimal amount of personal data (i.e. only the data that are directly necessary for your research project)
- Obtain the consent of participants for collecting personal data and inform participants in a clear and transparent manner about your processing of their personal data
- If consent cannot be obtained, be able to provide an alternative legal ground for processing
- Store contact details (e.g. name, date of birth, address, BSN) and research data separately (pseudonymisation)
- Do not use cloud services with which UM has no contract, and in particular do not use public cloud services, because:
- Storage of data might not take place in Europe and others might have access to your files;
- The continuity of the service is not guaranteed;
- Providers might obtain undesired rights of ownership of the stored files.
- Anonymise/pseudonymise data as soon as possible
- Use strong passwords and change them regularly
- In case of a (presumed) data leak, immediately contact the ServiceDesk ICTS via Servicedesk-ICTS@maastrichtuniversity.nl or call (043) 388 5555 (on working days, 8:00-17:00) and inform your Information Manager, even if only one person’s data are involved and even if you do not expect any harm to result.
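As an illustration of the pseudonymisation step listed above (storing contact details separately from research data), the following minimal Python sketch splits participant records into a key table holding the direct identifiers and a research table that carries only a random code. The field names in `DIRECT_IDENTIFIERS` and the function `pseudonymise` are hypothetical; adapt them to your own data and agree the actual procedure with your information manager. Pseudonymised data still count as personal data under the GDPR as long as the key table exists.

```python
import secrets

# Hypothetical list of direct identifiers; adapt to the fields in your own data.
DIRECT_IDENTIFIERS = {"name", "date_of_birth", "address", "bsn"}


def pseudonymise(records):
    """Split participant records into (key_table, research_table).

    The key table links each random pseudonym to the direct identifiers and
    must be stored separately from the research table, with stricter access.
    """
    key_table, research_table = [], []
    used = set()
    for record in records:
        # Draw random codes until an unused one is found.
        pseudonym = "P" + secrets.token_hex(4)
        while pseudonym in used:
            pseudonym = "P" + secrets.token_hex(4)
        used.add(pseudonym)

        identifiers = {k: v for k, v in record.items() if k in DIRECT_IDENTIFIERS}
        research = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        key_table.append({"pseudonym": pseudonym, **identifiers})
        research_table.append({"pseudonym": pseudonym, **research})
    return key_table, research_table


# Example with made-up records.
participants = [
    {"name": "A. Example", "date_of_birth": "1990-01-01", "score": 12},
    {"name": "B. Example", "date_of_birth": "1985-06-15", "score": 17},
]
keys, data = pseudonymise(participants)
# `data` now holds only pseudonyms and scores; store `keys` in a separate location.
```

The pseudonyms are drawn with `secrets` rather than derived from a hash of the name or BSN, because a plain hash of a known identifier can be recomputed by anyone who holds that identifier.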
Ethical review committees assess scientific research involving human participants or personally identifiable data. At Maastricht University, the following three ethics committees review research for ethical issues.
Ethical Review Committee Inner City Faculties (ERCIC)
ERCIC provides support on ethical issues specifically for the Inner City Faculties. ERCIC encourages researchers to submit their study protocols involving human participants or personally identifiable data for ethical review before the start of research activities. Review by ERCIC currently takes place on a voluntary basis.
Visit the website of ERCIC for more information.
Ethics Review Committee Psychology and Neuroscience (ERCPN)
The ERCPN is the Ethics Review Committee of the Faculty of Psychology and Neuroscience. It requires researchers to submit their study protocols involving human participants or personally identifiable data for ethical review before the start of research activities. If studies fall under the Medical Research Involving Human Subjects Act (WMO), review by the accredited review committee METC is mandatory.
Visit the website of ERCPN for more information.
Medisch Ethische Toetsingscommissie (METC)
The medical ethics committee of MUMC+ is an independent, government-accredited ethical review committee. METC assesses medical-scientific research involving human subjects in accordance with the Medical Research Involving Human Subjects Act (Wet Medisch Wetenschappelijk Onderzoek, WMO). For studies that fall under the WMO, review by the METC is mandatory.
Additional information can be found on the Ethics Review website.
Informed consent is the process by which a researcher discloses appropriate information about the research after which a participant can make a voluntary, informed decision to accept or refuse to cooperate.
Normally informed consent is given before the start of the research. Gaining informed consent is crucial to meet your legal and ethical obligations towards participants whilst simultaneously enhancing the value of your research data.
To obtain informed consent, researchers should:
- Inform participants about the purpose of the research;
- Discuss what will happen to their contribution (including the future sharing and archiving of their data);
- Indicate the steps that will be taken to safeguard their anonymity and confidentiality;
- Outline their right to withdraw from the research at any time.
Consent needs to be freely given, informed, unambiguous and specific, and expressed by a clear affirmative action that signifies agreement to the processing of personal data.
Source: Informed Consent – CESSDA ERIC
Article 4(11) of the GDPR defines consent as: “any freely given, specific, informed and unambiguous indication of the data subject’s wishes by which he or she, by a statement or by a clear affirmative action, signifies agreement to the processing of personal data relating to him or her.”
Consent is one of the lawful grounds on which the processing of personal data can be based.
Source: Guidelines on consent: https://autoriteitpersoonsgegevens.nl/sites/default/files/atoms/files/guidelines_on_consent.pdf
2.7. Costs of data management planning
Research projects differ greatly in nature, but all of them need to put effort into organising, documenting and storing research data. Some projects will need to pay more attention to detailed data documentation, organisation and formatting than others.
Depending on the type of research project, you might need to invest in additional products, support and services to manage your research data. This might lead to additional costs.
To avoid unpleasant surprises and running out of financial resources, it is recommended to define a budget for research data management costs.
There is no hard and fast approach to costing research data management. However, there are guides and tools available to support you in costing data management.
Landelijk Coördinatiepunt Research Data Management (LCRDM, National Coordination Point Research Data Management) developed a data management costs guide, a practical overview of possible costs per activity within each phase of the data life cycle.
The UK Data Service has developed an activity-based costing tool for costing Data management in the social sciences.
3. Collecting, Processing & Analysing
3.1. Discover and reuse existing data
Funders, publishers and universities increasingly require researchers to make their data Findable, Accessible, Interoperable and Reusable (FAIR). The community is clearly moving towards transparent, reproducible and Open Science.
With more and more datasets becoming available, the chance increases that a dataset suitable for your research project already exists. Therefore, before starting your project, it makes sense to find out whether an available dataset is sufficient to conduct your research or whether your data can be appropriately combined with an existing dataset. Existing data can save you a lot of time, either by helping you optimise your study design or by being reused directly.
There are plenty of resources for potentially relevant data. A few of them are listed below:
3.2. Access existing data
Keep in mind that when reusing existing data, it is important to know the legal status of the data. You will have to take into account that consent of the author or creator of the data might be required. In certain situations, you might have to deal with copyrights, licenses, fees and charges.
For more details and information on this matter, please refer to the downloadable SURF report by the Centre for Intellectual Property Law (CIER): The legal status of raw data: a guide for research practice.
3.3. Research in progress (dynamic phase)
When you start collecting your data, it is important to consider which file formats you will use to store your data (i.e. the way in which the information is encoded). Where possible, choose open (non-proprietary) file formats, so that you can still use or display the file contents reliably in the future. Lists of preferred and acceptable formats can be found at DANS and 4TU.ResearchData [PDF].
In some cases, one format is chosen for data collection and analysis and another for archiving the data. After converting the file format, check your data for errors or changes caused by the export process, e.g. loss of content, metadata, layout or quality.
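One way to carry out this check, sketched here under the assumption that your data are held as a list of rows and exported to CSV, is to write the export, read it back and compare every value. The function name `csv_export_differences` is hypothetical, and the in-memory buffer stands in for the converted file; for a real conversion (for example from a proprietary statistics format) you would read both the original and the exported file with suitable tools and compare them in the same way.

```python
import csv
import io


def csv_export_differences(rows, fieldnames):
    """Export `rows` (a list of dicts) to CSV text, read it back and list differences.

    Values are compared as text, because CSV does not store data types.
    """
    differences = []

    # Columns present in the data but not in the export are lost content.
    all_columns = {key for row in rows for key in row}
    for column in sorted(all_columns - set(fieldnames)):
        differences.append(f"column {column!r} not exported")

    # Write the export to an in-memory file (stands in for the converted file).
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)

    # Read the export back and compare it cell by cell with the original.
    buffer.seek(0)
    reread = list(csv.DictReader(buffer))
    if len(reread) != len(rows):
        differences.append(f"row count changed: {len(rows)} -> {len(reread)}")
    for i, (original, copy) in enumerate(zip(rows, reread)):
        for field in fieldnames:
            value = original.get(field)
            before = "" if value is None else str(value)
            after = copy.get(field) or ""
            if before != after:
                differences.append(f"row {i}, {field!r}: {before!r} -> {after!r}")
    return differences


rows = [{"id": 1, "score": 0.5, "notes": "retest"}]
print(csv_export_differences(rows, ["id", "score"]))  # ["column 'notes' not exported"]
```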
Storage and back-up
While collecting, processing and analysing data (dynamic phase), you will need solutions to store your data and you will have to back them up regularly to avoid accidental or malicious data loss.
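A simple way to confirm that a backup is complete and unaltered is to compare checksums of the original files with those of their copies. The minimal Python sketch below uses SHA-256 from the standard library `hashlib` module; the function names and folder paths are hypothetical. It does not make the backup itself, it only verifies one.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Compare every file under source_dir with its copy under backup_dir.

    Returns a list of problems; an empty list means every file is present
    in the backup with identical content.
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for source_file in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source_file.relative_to(source_dir)
        copy = backup_dir / relative
        if not copy.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source_file) != sha256_of(copy):
            problems.append(f"differs: {relative.as_posix()}")
    return problems


# Hypothetical usage:
# for problem in verify_backup("project/data", "/mnt/backup/project/data"):
#     print(problem)
```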
It is important to keep your data safe and secure, so you should carefully consider your storage options. When choosing a storage option it is crucial to take into account the type of data you will be generating. Personal or sensitive data need extra measures to keep them secure and are best stored on network drives of the institution.
In its efforts to support researchers in their use of research data, UM is offering free additional storage space. This enables every researcher to save, edit and analyse data in a secure way. It also allows them to share data directly with fellow researchers within UM. By using this storage space, you will have less to worry about when it comes to GDPR (General Data Protection Regulation) compliance. In addition, a Data Management Plan will be drawn up collectively, so that you can make better use of your research data. If you are interested in using this extra storage space, please contact the information manager of your unit.
In some cases, you might need solutions to collaborate with colleagues inside and outside the institution. Exchanging information should always happen within a secure environment and be based on clear instructions and/or agreements.
Maastricht University offers a range of solutions for secure collaboration, such as SURFdrive, SURFfilesender and Virtual Research Environments (VREs), which in most cases will meet the requirements of your research project.
We advise against using (‘free’) online storage alternatives like Dropbox, Google Drive, Box, Hotmail, OneDrive, WeTransfer, Evernote and many others, because it is unclear how safe your data are when stored on these services. Some services even require you to hand over intellectual property rights to the provider.
We feel intellectual property should never be handed over to a third party. Moreover, we (UM) have a legal obligation to protect (especially) sensitive data.
Think carefully about how and with whom you share files, and make sure you comply with UM’s policies. Record in writing any agreements you have made with other parties about sharing and use of data, to protect your intellectual property. Act in line with legislation and, where applicable, with the informed consent given by your data subjects. If your project involves personal data, you need a Data Processing Agreement whenever a third party processes your data. For information on Data Processing Agreements, contact the information manager of your faculty.
To keep your data usable and understandable to yourself and future users, you should document your data. Data documentation explains how the data were created, what the data mean, what they contain and which manipulations may have been applied to them.
Data documentation should contain information on the context of data collection, the collection methods, changes made to the data and the different versions, among other things.
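Documentation stays useful only if it keeps pace with the data. As a minimal sketch, assuming the variables of a dataset are described in a simple codebook (the `CODEBOOK` entries and the function `documentation_gaps` are hypothetical), the following Python code lists columns that are not yet documented and codebook entries that no longer match any column.

```python
import csv
import io

# Hypothetical codebook: one entry per variable, with its meaning and unit/format.
CODEBOOK = {
    "pseudonym": "Random participant code (key file stored separately)",
    "visit_date": "Date of measurement, YYYYMMDD",
    "score": "Total questionnaire score, range 0-40",
}


def documentation_gaps(csv_text, codebook):
    """Return (undocumented, stale) for the header row of a CSV file.

    undocumented: columns that have no codebook entry;
    stale: codebook entries that no longer match any column.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    undocumented = [column for column in header if column not in codebook]
    stale = [name for name in codebook if name not in header]
    return undocumented, stale


print(documentation_gaps("pseudonym,visit_date,score,bmi\n", CODEBOOK))
# (['bmi'], [])
```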
3.4. Organising your research data
Once you have started collecting, generating and analysing data, you might quickly lose the overview. You will save a lot of time and avoid annoying errors by structuring your files and folders from the very beginning of your research project.
- Determine a folder structure beforehand
- Define logical categories
- Use a naming convention (document the naming scheme in a README file)
- Keep file names clear and short
- Avoid using spaces, dots and special characters in file names
- File names should be consistent, meaningful and easy to find
- Keep raw data separate, leave them untouched and use a working copy
- Separate in-progress data from completed data
- Avoid ambiguous file names like Final_1, Final_2
- For dates in file names use e.g. YYYYMMDD, and use this notation consistently
- Use major and minor versions like:
- Major versions: v01, v02, v03
- Minor versions: v01_01, v01_02, v01_03
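Conventions like these can also be checked automatically. The following Python sketch assumes one possible scheme, `<project>_<description>_<YYYYMMDD>_v<major>[_<minor>].<ext>`, with hyphens allowed inside the project and description parts; this pattern and the helper names `make_name` and `is_valid_name` are illustrative assumptions, not a UM standard, so record whatever scheme you choose in your README file.

```python
import re
from datetime import date

# Illustrative convention (an assumption, not a UM standard):
#   <project>_<description>_<YYYYMMDD>_v<major>[_<minor>].<ext>
# Underscores separate the parts; hyphens may be used inside a part.
NAME_PATTERN = re.compile(
    r"(?P<project>[A-Za-z0-9-]+)_(?P<description>[A-Za-z0-9-]+)"
    r"_(?P<date>\d{8})_v(?P<major>\d{2})(?:_(?P<minor>\d{2}))?"
    r"\.(?P<ext>[a-z0-9]+)"
)


def is_valid_name(name):
    """True if `name` has no spaces or extra dots, a real YYYYMMDD date and a vNN(_NN) version."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        return False
    stamp = match["date"]
    try:
        date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:]))
    except ValueError:  # e.g. 20240230 is not a calendar date
        return False
    return True


def make_name(project, description, day, major, minor=None, ext="csv"):
    """Build a file name that follows the convention, or raise ValueError."""
    version = f"v{major:02d}" if minor is None else f"v{major:02d}_{minor:02d}"
    name = f"{project}_{description}_{day:%Y%m%d}_{version}.{ext}"
    if not is_valid_name(name):
        raise ValueError(f"name does not follow the convention: {name!r}")
    return name


print(make_name("diabetes", "survey-raw", date(2024, 3, 5), 1))     # diabetes_survey-raw_20240305_v01.csv
print(make_name("diabetes", "survey-raw", date(2024, 3, 5), 1, 2))  # diabetes_survey-raw_20240305_v01_02.csv
print(is_valid_name("Final_1.docx"))                                 # False
```

Validating the date with `datetime.date` rather than the regular expression alone also rejects stamps that have eight digits but are not real calendar dates.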
4. Publishing & Archiving
After finishing your project, you will want to publish your article. Publishers increasingly require that the data underlying your research be made publicly available as well. In this section, you can find reasons for publishing your data, which data to publish, what to keep in mind when publishing data and how to publish your data.
Remember, though, that not all data can be made publicly available. Data may be confidential because of patents or because they contain (sensitive) personal data, in which case privacy issues may prevent you from making them publicly accessible.
4.1. Why archive data
“Scientific integrity is a specific standard of conduct associated with the societal position of the researcher. It is about acting in accordance with the values of science, such as truthfulness, honesty and open reporting, even when no one is looking over the researcher’s shoulder.” – KNAW (Royal Netherlands Academy of Arts and Sciences)
Ensuring that a study can be validated and reproduced is therefore an important consequence of scientific integrity. Archiving research data and making them available is not only about preventing sloppy science and fraud. Reusing and combining data makes research more effective, makes new research possible and earns the researcher credit through citations.
Ten simple rules for the care and feeding of scientific data:
- Love your data, and help others love it too
- Share your data online, with a permanent identifier
- Conduct science with a particular level of reuse in mind
- Publish workflow as context
- Link your data to your publications as often as possible
- Publish your code (even the small bits)
- Say how you want to get credit
- Foster and use data repositories
- Reward colleagues who share their data properly
- Be a booster for data science
Source and full article: https://arxiv.org/pdf/1401.2134v1.pdf
4.2. Selection of data
The two main reasons for data preservation are (re-)use and validation. Archiving data can be very costly; sometimes the costs of replication may even be lower than the costs of preservation. Additionally, there might not be enough space to preserve everything. Therefore, researchers should consider which data to archive.
Data that should be preserved are data that:
- Are likely to be (re-)used
- Are unique
- Enrich an open access publication
- Have to be archived because of funder’s or institution’s requirements
- Are difficult to reproduce
Pre-conditions for data preservation are:
- Usable file format
- Sufficient data documentation and metadata
- Consideration of legal and ethical limitations
- Financial considerations
The scheme on RDNL can be helpful when selecting data for preservation.
During your Research project, you will in most cases use the storage facilities offered by your faculty.
After the project, or following a publication, it is recommended to store the data in a midterm repository. Maastricht University Library provides DataverseNL as the midterm storage facility for our institution.
DataverseNL offers storage up to the prescribed ten years after the last publication based on the data or the completion of the research project. This is in accordance with the Maastricht University Code of Conduct, which requires you to store the data for at least ten years after the publication based on the data. (Depending on the discipline, this period may be fifteen years or more.)
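To find the earliest date on which archived data may be removed, add the retention period to the date of the last publication based on the data (or of project completion). The hypothetical helper below assumes the ten-year default mentioned above and lets you pass a longer period where your discipline requires it; it is date arithmetic only and does not replace checking the applicable Code of Conduct.

```python
from datetime import date


def retention_end(reference_date: date, years: int = 10) -> date:
    """Return the date `years` after `reference_date` (e.g. the last publication).

    A 29 February start is moved to 1 March in non-leap target years, so the
    retention period is never shorter than the full number of years.
    """
    try:
        return reference_date.replace(year=reference_date.year + years)
    except ValueError:  # 29 February does not exist in the target year
        return date(reference_date.year + years, 3, 1)


print(retention_end(date(2024, 5, 17)))            # 2034-05-17
print(retention_end(date(2024, 2, 29)))            # 2034-03-01
print(retention_end(date(2024, 5, 17), years=15))  # 2039-05-17
```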
DataverseNL enables online storage, sharing and registration of research data, during the research period and up to the prescribed term of ten years after its completion. DataverseNL is a shared service provided by participating institutions and DANS.
DataverseNL uses the Dataverse software developed by Harvard University, which is used worldwide. A factsheet about DataverseNL can be found here.
Start by creating a Dataverse account using your institutional login. To be able to use DataverseNL, your institutional account has to be linked to an existing Dataverse. To link your account to a Dataverse, and for further support on using DataverseNL, please contact the data steward, information manager or data manager of your faculty. Alternatively, contact Research Data Management support of the University Library.
Dataverse provides your dataset with a persistent identifier. Mentioning this persistent identifier (in e.g. a publication) is an easy way to refer to the relevant dataset. It is highly recommended to provide the publisher with the persistent identifier rather than handing over the dataset, which could have an impact on the ownership of the data.
Metadata are data about data. Assigning metadata means describing a dataset in such a way that it is readable and findable by computers. DataverseNL enables you to provide your data with a standardised set of metadata (Dublin Core and DDI). A sufficient set of metadata (elements that describe the data) will enhance the findability, interoperability and reusability of your data. The quality of the descriptive information has an impact on the reusability of the data. At a minimum, there should be enough metadata to make the data findable, and preferably also understandable and reusable by other researchers.
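To give an impression of what a standardised metadata record looks like, the following Python sketch builds a simple Dublin Core record in XML using only the standard library. It is illustrative only: DataverseNL collects its metadata through its own deposit forms, the wrapping `<metadata>` element and the function name `dublin_core_record` are assumptions, and the example values are placeholders.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}


def dublin_core_record(fields):
    """Build a Dublin Core XML string from {element: value or list of values}."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for element, values in fields.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        for value in ([values] if isinstance(values, str) else values):
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


print(dublin_core_record({
    "title": "Example survey data",
    "creator": ["Doe, J.", "Roe, R."],
    "date": "2024",
    "rights": "CC BY 4.0",
}))
```

Rejecting unknown element names keeps the record within the standard vocabulary, which is what makes it readable by other systems.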
Licenses such as Creative Commons (CC) replace ‘all rights reserved’ copyright with ‘some rights reserved’. There are seven standard CC options: six licenses plus the CC0 public domain dedication. CC BY is the most commonly used license; it makes attribution mandatory when the data are used. Creative Commons offers an easy-to-use online License Chooser to determine which license is right for you.
4.4. Long-term storage
In case you wish to store your data for the long term (“eternity”) and make them available permanently, there are two Dutch repositories to consider:
- EASY offers sustainable archiving of research data and access to thousands of datasets.
EASY is CoreTrustSeal certified and hosted by Data Archiving and Networked Services (DANS), an institute of KNAW and NWO.
- 4TU.ResearchData is a Data Seal of Approval certified data repository focusing on technical data, geospatial data and engineering data.
Although 4TU.ResearchData is provided by four Dutch technical universities, it is available as a service for all Dutch universities.
Contact & Support
This RDM Support @UM portal is maintained by the Research Support team of the UM Library. The content is provided and monitored by the CDDI partners.
For questions or support regarding specific services, please use the contact information provided by the supplier of the service. Use the support form above for all general questions and remarks, or if you do not know whom to turn to for support.