1. Can you briefly describe what your research is about?
Creating a FAIR data infrastructure for storing and analyzing content from Dutch and EU legal documents (e.g. court decisions) that is semantically interoperable with open-domain knowledge and existing legal data sources.
2. How did you do your research?
Using data engineering, integration and extraction processes, we built software to collect information about Dutch and EU court decisions from heterogeneous online sources.
We used Semantic Web technologies, ontologies and digital knowledge management techniques to store, represent and harmonize the meaning (semantics) of the information so that machines can process this data.
We then conducted pilot studies to analyze the court decisions from the perspective of their legal topics and of how they relate to each other through the citation network (judges cite similar and relevant court decisions to support their decisions in a case).
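To make the idea of a citation network concrete, here is a minimal Python sketch. It is not the project's actual analysis code: the case identifiers and the function names `count_incoming_citations` and `most_cited` are hypothetical, chosen only for illustration. It treats each citation as a (citing, cited) pair and counts how often each decision is cited by others.

```python
# Hypothetical citation pairs: (citing decision, cited decision).
# The identifiers are made up for illustration only.
CITATIONS = [
    ("case-A", "case-C"),
    ("case-B", "case-C"),
    ("case-B", "case-D"),
    ("case-D", "case-C"),
]


def count_incoming_citations(citations):
    """Return a dict mapping each decision to the number of times it is cited."""
    counts = {}
    for citing, cited in citations:
        # Make sure decisions that only cite others still appear, with a count of 0.
        counts.setdefault(citing, 0)
        counts[cited] = counts.get(cited, 0) + 1
    return counts


def most_cited(citations):
    """Return the decision with the highest number of incoming citations."""
    counts = count_incoming_citations(citations)
    return max(counts, key=counts.get)


if __name__ == "__main__":
    print(count_incoming_citations(CITATIONS))
    print(most_cited(CITATIONS))
```

Counting incoming citations is only one simple way to look at such a network. A real study would likely use richer measures and real identifiers for the decisions.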
3. What tools did you use to make your research FAIR?
Zenodo and Open Science Framework (OSF) for Findability & Accessibility:
- We archived our data in these repositories because they provide persistent storage (they have policies in place for long-term data storage to prevent common problems such as “dead links” and missing data).
- These repositories also automatically assign a unique Digital Object Identifier (DOI) to your data archives, which can be used to uniquely identify and access descriptions and downloads related to the data over its full lifetime.
Community standards used for software, metadata and data formats (to enhance Reusability):
- We published the software that performs our data processing and experimental analyses on GitHub. It is written in Python, an open-source, widely used and community-supported programming language.
- The data input and build requirements for the software have been provided using community standards for data formats and specifications, e.g. comma-separated values files (CSV). We also explain what the variables of the input and output data files mean using community standard vocabularies and metadata.
- Jupyter notebooks are used for sharing our data analyses with other researchers who want to plug in and test our methodologies on their own data or reproduce and validate our results.
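As an illustration of publishing variable descriptions alongside CSV data, the following sketch checks that every column in a CSV file has an entry in a data dictionary. The column names, the sample data and the function `undocumented_columns` are assumptions made for this example. They are not taken from the project's actual files.

```python
import csv
import io

# Hypothetical data dictionary: column name -> human-readable description.
DATA_DICTIONARY = {
    "case_id": "Identifier of the court decision",
    "decision_date": "Date on which the decision was issued (YYYY-MM-DD)",
    "topic": "Legal topic assigned to the decision",
}

# Hypothetical CSV content; the "notes" column is deliberately undocumented.
SAMPLE_CSV = """case_id,decision_date,topic,notes
case-A,2019-01-15,company law,first sample row
"""


def undocumented_columns(csv_text, data_dictionary):
    """Return the CSV column names that have no entry in the data dictionary."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    return [name for name in columns if name not in data_dictionary]


if __name__ == "__main__":
    print(undocumented_columns(SAMPLE_CSV, DATA_DICTIONARY))
```

A check like this could be run before archiving a dataset, so that anyone reusing the file can look up what each variable means.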
4. What UM-services did you use?
UM provided the computers and laptops used by members of our team.
5. To what extent were you able to make your research FAIR?
We tried to publish our data and our experiment results in archives that have persistent storage and provide Digital Object Identifiers (DOIs) for the data. We also use community standards for the terms we use to describe our input data, software analysis and output data from our experiments. We publish the descriptions of all variables used in our studies alongside the research data and actual results. If you are a researcher who wants to reuse our work, we are open to feedback on how we can make it quicker and easier for you to do so!
6. Is your data machine-readable?
To an extent. We have converted court decision metadata from multiple jurisdictions, which use different vocabularies, to a common data model using ontologies and other Semantic Web technologies maintained by the law and technology research community. The variable names and what they represent are also captured and published alongside the data. We have developed a documented Application Programming Interface (API), which provides a standard way for software engineers to extract our data and integrate it into their own data analysis workflows and those of their research colleagues.
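The idea of converting metadata from different vocabularies to a common data model can be sketched in a few lines of Python. Note that this sketch uses plain dictionaries rather than the ontologies and Semantic Web technologies used in the project, and that all field names, source labels and the function `to_common_model` are hypothetical.

```python
# Hypothetical mappings from source-specific field names to a shared model.
# Neither the source field names nor the common field names come from the project.
FIELD_MAPPINGS = {
    "nl": {"uitspraakdatum": "decision_date", "instantie": "court", "ecli": "identifier"},
    "eu": {"date_of_document": "decision_date", "author": "court", "celex": "identifier"},
}


def to_common_model(record, source):
    """Rename the fields of a source record to the common field names.

    Fields without a mapping are dropped in this sketch.
    """
    mapping = FIELD_MAPPINGS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


if __name__ == "__main__":
    dutch_record = {"ecli": "example-nl-1", "uitspraakdatum": "2019-01-15"}
    eu_record = {"celex": "example-eu-1", "date_of_document": "2018-06-30"}
    print(to_common_model(dutch_record, "nl"))
    print(to_common_model(eu_record, "eu"))
```

Once records from both sources share the same field names, the same analysis code can process them without knowing where each record came from.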
7. What lessons have you learnt from the experience?
- FAIR is not a concrete set of tasks. It is a set of guidelines.
- We encountered challenges and limitations in making our data and software FAIR, but we learned that it is a continuous process. There is always room for improvement in making our digital resources FAIR.
- The FAIRness of your data is not binary. It is not either FAIR or not FAIR. Rather, it is a spectrum. The FAIRness of your data can be increased or decreased by decisions that you make in how to describe and publish it.
- At the core of FAIR is deciding how to make your data maximally understandable and reusable by other researchers with the least amount of time and effort required on both parts.
- How to implement FAIR is particular and unique to the community in which you are doing your research.
8. How do you think we can benefit from FAIR research?
Time is precious. One of the main benefits is the time saved when reusing data and software generated by other researchers and when reproducing their results. If this is done on a larger scale, we will have less duplication of others’ work and more reuse and improvement of existing data to test new hypotheses. Less time will have to be spent on extracting, compiling and processing new data, which will accelerate scientific discoveries and lead to swifter contributions to the body of human knowledge.
9. Are your metadata shared in a repository?
Open access paper describing the overall project goals: https://doi.org/10.1017/err.2019.51
Example of a study conducted as part of the project: EU court decisions analysis
- Input data: https://doi.org/10.5281/zenodo.3926736
- Software performing the analysis: https://github.com/MaastrichtU-IDS/docona
- Results and output data: https://doi.org/10.5281/zenodo.4228652
- Technical Resources for the study: https://eu-corporate-mobility.org/
About Kody Moodley
Kody Moodley is a postdoctoral researcher who joined the Institute for Data Science at Maastricht University in March 2017. Kody completed his PhD in Computer Science through tenures at the University of Manchester and the University of KwaZulu-Natal.
- More about Kody Moodley (UM profile page)
- Publications overview (Pure)