SWFsEUROPE 

The SWFsEUROPE (SH.5) project focuses on the study of sovereign wealth funds in Europe. The researchers study how certain nation-states invest and become part of global capitalism, and how FAIR principles can be implemented in this kind of research. SWFsEUROPE is funded by the European Research Council. We interviewed Imogen Liu about the project.

1. Can you briefly describe what your research is about?
I am a political economist currently researching the actors and processes involved in realising Chinese state-owned foreign investment in Europe. Fundamentally, I am interested in questions of political and economic change, the inter-relatedness of states and markets, and their political construction. I am trained in both qualitative and quantitative methods (R, Stata, Python). Before embarking on an academic career, I worked as a book editor for six years.
2. How did you do your research?
The research I’m doing is primarily qualitative, and my primary source is semi-structured interviews, specifically elite interviews. That can sometimes present its own challenges, and in terms of FAIR it means that, essentially, I make my metadata available.
3. What tools did you use to make your research FAIR?
I apply FAIR in my own research first from the faculty’s perspective, where we have chosen to adopt the F and the A of the principles, findable and accessible. I work with archival data and semi-structured interviews, and that means ensuring that I have catalogued my data correctly. To practise sound data management, I first make my data findable to the members of my research team and then store it in a data repository, in our case DataverseNL. The next step in the project is to make the data interoperable for the members of our team, and that involved coming up with a standardised way of describing the data we collected as a team in a variety of contexts: for instance, the nature of the data, its context, and which kind of people were interviewed.
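[Ed. A minimal sketch, in Python, of what such a standardised metadata record could look like. The field names and values are illustrative assumptions, not the project’s actual cataloguing scheme.]

```python
# Minimal sketch of a standardised metadata record for a semi-structured
# interview. Field names and values are illustrative assumptions only,
# not the SWFsEUROPE scheme.
from dataclasses import dataclass, asdict
import json

@dataclass
class InterviewMetadata:
    interview_id: str      # catalogue identifier used within the team
    data_type: str         # nature of the data, e.g. "semi-structured interview"
    context: str           # research context in which the data was collected
    interviewee_role: str  # kind of person interviewed (no identifying details)
    date: str              # ISO date of the interview
    repository: str        # where the data is stored, e.g. "DataverseNL"

record = InterviewMetadata(
    interview_id="SWF-2020-014",
    data_type="semi-structured elite interview",
    context="Chinese state-owned foreign investment in Europe",
    interviewee_role="investment fund official",
    date="2020-07-15",
    repository="DataverseNL",
)

# A shared, machine-readable description that every team member reads the same way.
print(json.dumps(asdict(record), indent=2))
```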
4. What UM-services did you use?
Data Stewards are essential people to work with when making data FAIR. What they really do is help you bridge the gap between a set of abstract principles and the question of how to proceed step by step, whether sitting in front of my computer or in front of an interviewee, to work in a FAIR way [Ed. Imogen and her team used the Data Steward services available at the University Library].
5. To what extent were you able to make your research FAIR?
Since September 2020, when we started working together, we have uploaded draft versions of the metadata to DataverseNL. We have agreed to publish it once the final data is collected, publications are accessible, and the project is finished. This decision was taken because of the data’s sensitive nature and to protect the participants’ interest in the study. Our last interview took place in July 2020.
6. Is your data machine-readable?
The data is essentially qualitative data physically gathered by the researcher. This data is, in essence, not machine-readable.
7. What lessons have you learnt from the experience?
There are many advantages to FAIR. The challenge is finding out how to proceed, especially how the principles translate to different forms of data. The challenge with qualitative data is that I am doing interviews with elite interviewees, and the data can be sensitive. It touches on issues like confidentiality, trade secrets and data privacy, and all of these are things I need to think about to ensure that making the data FAIR does not go against anything the respondents have consented to.
8. How do you think we can benefit from FAIR research?
I think that with any form of systematic accountability, there is additional work involved. The challenge now is that we are just trying to get it off the ground, so perhaps the setup costs are higher. In the end, if everything is more standardised and researchers are thinking more about FAIR, it pays off in the long run.
9. Are your metadata shared in a repository?
As explained before, a draft version of the project’s datasets has been uploaded. We have agreed to publish at the end of the PhD projects. Consent in qualitative research is often a continuous process, which means that until the end of the PhD the researchers will not be sure which metadata is safe to share for both the researchers and the participants.

About Imogen Liu

PhD

Imogen is a political economist whose research interests cover subjects including state capitalism, foreign investment, sovereign wealth funds, and China’s political economy.


D3M 

We interviewed Adam Jassem to find out more about the project and its relation to FAIR.

1. Can you briefly describe what your research is about?
In our research, we try to identify tax news: signals about US presidents’ future tax changes. We try to identify this tax news in their speeches.
2. How did you do your research?
We use text-analytic methods to try to extract this information. Our primary data is actually publicly available; it is just the speeches made by the presidents.
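[Ed. A rough illustration of this kind of keyword-based text analysis. The keyword list and the example speech are illustrative assumptions, not the D3M methodology or data.]

```python
# Minimal sketch: flagging possible tax news in a speech by counting
# tax-related terms. Illustrative only; not the D3M methodology,
# and the keyword list is an assumption.
import re
from collections import Counter

TAX_TERMS = {"tax", "taxes", "taxation", "rate", "deduction", "revenue"}

def tax_signal(speech: str) -> Counter:
    """Return counts of tax-related terms found in one speech."""
    tokens = re.findall(r"[a-z']+", speech.lower())
    return Counter(t for t in tokens if t in TAX_TERMS)

speech = ("We will cut taxes for working families and lower the corporate tax rate, "
          "while protecting revenue for essential programs.")
print(tax_signal(speech))  # counts of the tax-related terms found
```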
3. What tools did you use to make your research FAIR?
The team used DataverseNL, in particular for my research.
4. What UM-services did you use?
We used the Data Stewardship services of Maastricht University Library. Maria Vivas-Romero became the person in charge of following the project, creating the DataverseNL environment and verifying copyright issues with the data providers in the United States.
5. To what extent were you able to make your research FAIR?
We managed to make a draft in DataverseNL. Together, we looked into the most appropriate vocabulary to draft the metadata for the project: the vocabulary used in the field of economics by the American Economic Journal (see the journal’s data and code policy at https://www.aeaweb.org/journals/data/data-code-policy). For us, the main challenge was how to make the data interoperable and reusable: trying to think about what other researchers might want to do with it, and what kind of format is most useful for them. Putting yourself in other people’s shoes like this takes a lot of expertise.
6. Is your data machine-readable?
Yes, the data in the project are script files created by the researchers. These files are all machine-readable. The final version should be uploaded to DataverseNL at the end of the PhD project.
7. What lessons have you learnt from the experience?
Making sure that other people can understand your data! How to create metadata and what terminology to use. We had to understand that some ontologies have certain vocabularies or terms that are agreed to be understandable in the field. This was the main challenge, but hopefully we managed to overcome it.
8. How do you think we can benefit from FAIR research?
By sharing all of it, we hope to encourage other people to look into our methods and replicate our research: making sure that other people can do it, that they can access our data and understand it. When you have done all this work, you do not want it to fit only your own project; you can share it with others so that they do not have to start from square one, and then it is useful for everyone each time. I hope that in the end the thinking that wins is: “Today, I did a lot of work, but maybe tomorrow I am going to benefit from somebody else sharing their work with me”. Creating this community of people who create data that benefits all of us is the main idea.
9. Are your metadata shared in a repository?
As explained before, the metadata has been put in DataverseNL. I would like to wait until the end of my PhD project, when my papers are sent to a publisher for review, to publish the data.


About Adam Jassem


Lawgex 

We interviewed project member and Postdoctoral researcher Kody Moodley to find out more about the project and its relation with FAIR.

1. Can you briefly describe what your research is about?
Creating a FAIR data infrastructure for storing and analyzing content from Dutch and EU legal documents (e.g. court decisions) that is semantically interoperable with open-domain knowledge and existing legal data sources.
2. How did you do your research?
Using data engineering, integration and extraction processes, we built software to collect information about Dutch and EU court decisions from heterogeneous online sources. We used Semantic Web technologies, ontologies and digital knowledge management techniques to store, represent and harmonize the meaning (semantics) of the information so that machines can process this data. We then conducted pilot studies to analyze the court decisions from the perspective of their legal topics and how they relate to each other through the citation network (judges cite similar and relevant court decisions to support their decisions in a case).
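[Ed. A minimal sketch of a citation-network analysis of court decisions using the networkx library. The decision identifiers are ECLI-style placeholders, not the project’s data.]

```python
# Minimal sketch of a court-decision citation network, assuming a list of
# (citing, cited) pairs with ECLI-style placeholder identifiers (not real data).
import networkx as nx

citations = [
    ("ECLI:NL:HR:2019:0001", "ECLI:EU:C:2015:0100"),
    ("ECLI:NL:HR:2019:0002", "ECLI:EU:C:2015:0100"),
    ("ECLI:NL:HR:2019:0002", "ECLI:NL:HR:2019:0001"),
]

g = nx.DiGraph()
g.add_edges_from(citations)

# Decisions that are cited most often are candidates for influential cases.
most_cited = sorted(g.in_degree(), key=lambda pair: pair[1], reverse=True)
print(most_cited[:3])
```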
3. What tools did you use to make your research FAIR?
Zenodo and Open Science Framework (OSF) for Findability & Accessibility:
  • We archived our data in these repositories because they provided persistent storage (these repositories have policies in place for long term data storage to prevent common problems such as “dead links” and missing data).
  • These repositories also automatically assign a unique Digital Object Identifier (DOI) to your data archives, which can be used to uniquely identify and access descriptions and downloads related to the data over its full lifetime (see the sketch after this list).

Community standards used for software, metadata and data formats (to enhance Reusability):

  • We published the software that performs our data processing and experimental analyses on Github using the open-source, widely used and community-supported Python programming language.
  • The data input and build requirements for the software have been provided using community standards for data formats and specifications, e.g. comma-separated values files (CSV). We also explain what the variables of the input and output data files mean using community standard vocabularies and metadata.
  • Jupyter notebooks are used for sharing our data analyses with other researchers who want to plug in and test our methodologies on their own data or reproduce and validate our results.
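[Ed. A small sketch of how a DOI-backed archive can be queried programmatically through Zenodo’s public REST API. The record ID below is a placeholder, not one of the project’s archives.]

```python
# Sketch: retrieving the metadata of an archived dataset via Zenodo's
# public REST API. The record ID is a placeholder, not a project record.
import requests

RECORD_ID = "1234567"  # placeholder Zenodo record ID
resp = requests.get(f"https://zenodo.org/api/records/{RECORD_ID}", timeout=30)
resp.raise_for_status()
record = resp.json()

# Typical fields include the DOI and descriptive metadata such as the title.
print(record.get("doi"))
print(record.get("metadata", {}).get("title"))
```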
4. What UM-services did you use?
UM provided the computers and laptops used by members of our team.
5. To what extent were you able to make your research FAIR?
We tried to publish our data and our experiment results in archives that have persistent storage and provide Digital Object Identifiers (DOIs) for the data. We also use community standards for the terms we use to describe our input data, software analysis and output data from our experiments. We publish the descriptions of all variables used in our studies alongside the research data and actual results. If you are a researcher who wants to reuse our work, we are open to feedback on how we can make it quicker and easier for you to do so!
6. Is your data machine-readable?
To an extent. We have converted court decision metadata that comes from multiple jurisdictions and uses different vocabularies into a common data model, using ontologies and other semantic web technologies maintained by the law and technology research community. The variable names and what they represent are also captured and published alongside the data. We have developed a documented Application Programming Interface (API), which provides a standard way for software engineers to extract our data and integrate it into their own data analysis workflows and those of their research colleagues.
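[Ed. A sketch of how such an API might be consumed in a data-analysis workflow. The endpoint, parameters and field names are hypothetical placeholders, not the project’s actual API.]

```python
# Hypothetical example of pulling court-decision records from a documented
# REST API into a pandas DataFrame. The endpoint, parameters and field names
# are placeholders, not the project's actual API.
import pandas as pd
import requests

API_URL = "https://example.org/api/decisions"  # placeholder endpoint
params = {"jurisdiction": "NL", "year": 2019, "format": "json"}

resp = requests.get(API_URL, params=params, timeout=30)
resp.raise_for_status()
decisions = pd.DataFrame(resp.json())

# With variable descriptions published alongside the data, each column
# can be interpreted consistently across research groups.
print(decisions.head())
```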
7. What lessons have you learnt from the experience?
  • FAIR is not a concrete set of tasks. It is a set of guidelines.
  • We encountered challenges and limitations in making our data and software FAIR, but we learned that it is a continuous process. There is always room for improvement in making our digital resources FAIR.
  • The FAIRness of your data is not binary. It is not either FAIR or not FAIR. Rather, it is a spectrum. The FAIRness of your data can be increased or decreased by decisions that you make in how to describe and publish it.
  • At the core of FAIR is deciding how to make your data maximally understandable and reusable by other researchers with the least amount of time and effort required on both parts.
  • How to implement FAIR is particular and unique to the community in which you are doing your research.
8. How do you think we can benefit from FAIR research?
Time is precious. Saving time in reusing data and software generated by other researchers, and reproducing their results are some of the main benefits. If this is done on a larger scale, we will have less duplication of others’ work and more reuse and improvement of existing data to test new hypotheses. Less time will have to be spent on extracting, compiling and processing new data, which will lead to the acceleration of scientific discoveries and swifter contributions being made to the body of human knowledge.
9. Are your metadata shared in a repository?
Open access paper describing the overall project goals: https://doi.org/10.1017/err.2019.51
Example of a study conducted as part of the project: EU court decisions analysis

About Kody Moodley

Postdoctoral Researcher

Kody Moodley is a postdoctoral researcher who joined the Institute for Data Science at Maastricht University in March 2017. Kody completed his PhD in Computer Science through tenures at the University of Manchester and the University of KwaZulu-Natal.


FAIRHealth

PhD Candidate Chang Sun tells us about her project and her experience with making research FAIR.

1. Can you briefly describe what your research is about?
The secure analysis of health data on institutional infrastructure. We have made use of distributed machine learning and privacy-preserving techniques to analyze partitioned health data. The FAIRHealth project has developed a solution to make this data available for processing and analysis with open metadata, so that the data does not have to leave the hospital (see: “Analyzing partitioned FAIR health data responsibly”). In the FAIRHealth project, we established a scalable technical and governance framework that can combine access-restricted data from the Maastricht Study and Statistics Netherlands (CBS) in a privacy-preserving manner. We first made the data FAIR at the source and then coupled the FAIR data to a federated learning framework based on the “Personal Health Train” architecture. The project also invested in developing a governance framework, including the legal and ethical basis for processing personal data from individuals.
2. How did you do your research?
By using privacy-preserving distributed machine learning. We applied privacy-preserving distributed data mining methods and the Personal Health Train architecture to establish a secure infrastructure for analyzing distributed data. To be brief, this infrastructure enables researchers to send data analysis models to the data sources rather than transferring the original data to researchers.
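[Ed. A toy sketch of this “send the analysis to the data” idea: each data station runs the analysis locally and returns only aggregates, never individual records. This is illustrative only, not the Personal Health Train implementation.]

```python
# Toy sketch of the "send the analysis to the data" idea: each data station
# runs the analysis locally and returns only aggregates, never raw records.
# Illustrative only; not the Personal Health Train implementation.
from typing import List, Tuple

def local_analysis(records: List[float]) -> Tuple[float, int]:
    """Runs at the data source; only a sum and a count leave the station."""
    return sum(records), len(records)

# Data that stays inside each institution (made-up numbers).
station_a = [5.4, 6.1, 5.9]        # values held by one data source
station_b = [6.3, 5.7, 6.0, 5.8]   # values held by another data source

# The researcher combines only the aggregates received from the stations.
aggregates = [local_analysis(station_a), local_analysis(station_b)]
total = sum(s for s, _ in aggregates)
count = sum(n for _, n in aggregates)
print(f"Global mean computed without moving individual records: {total / count:.2f}")
```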
3. What tools did you use to make your research FAIR?
  • Docker containers, Conda and Jupyter notebooks for making the project reproducible (FAIR software)
  • Zenodo for findability (PID and file storage)
  • API for accessibility

We used Docker, Conda (Python), GitLab, GraphDB, BioPortal, and data standards and ontologies (e.g., SNOMED Clinical Terms, LOINC).

4. What UM-services did you use?
We used Disqover to obtain metadata for the data variables, and a Linux machine in the UM network.
5. To what extent were you able to make your research FAIR?
We tried to make all data, analysis models and the development process FAIR. Before we did experiments with real personal data, we generated simulation data and published it with a valid license. The real data (variables) we applied in this project were also documented at both the UM and CBS sides. To preserve individuals’ privacy, information about data instances (whose data has been used) is not available. The analysis models and tools used to develop the infrastructure are documented publicly in our repository with detailed information, including parameters and specific versions.
6. Is your data machine-readable?
Yes. As described above, the FAIRHealth project enables data to be processed and analyzed at their sources. In this case, the data was made interoperable and machine-readable so that the analysis model can be executed on the data without human interaction in between. We converted the data from CSV to RDF (Resource Description Framework) format and stored it in a graph database.
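[Ed. A minimal sketch of this kind of CSV-to-RDF conversion using the rdflib library. The namespace, predicate names, columns and values are illustrative assumptions, not the FAIRHealth data model.]

```python
# Minimal sketch of converting a CSV row into RDF triples with rdflib.
# The namespace, predicate names, columns and values are illustrative
# assumptions, not the FAIRHealth data model.
import csv
import io
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

EX = Namespace("https://example.org/fairhealth/")  # placeholder namespace
g = Graph()
g.bind("ex", EX)

csv_data = io.StringIO("patient_id,age,hba1c\nP001,63,6.1\n")
for row in csv.DictReader(csv_data):
    subject = EX[f"patient/{row['patient_id']}"]
    g.add((subject, RDF.type, EX.Patient))
    g.add((subject, EX.age, Literal(int(row["age"]), datatype=XSD.integer)))
    g.add((subject, EX.hba1c, Literal(float(row["hba1c"]), datatype=XSD.decimal)))

# The serialized triples can then be loaded into a graph database such as GraphDB.
print(g.serialize(format="turtle"))
```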
7. What lessons have you learnt from the experience?
  • Good FAIR data should be described with well-recognized terminology, so that others know where it is stored and whom they can ask for access.
  • It is essential to keep your work reproducible for yourself and, more importantly, for others.
  • Interoperability remains a challenge.
  • FAIR data is not the same as open data.

For the reproducibility of our scientific work, FAIR is a concept that we should always keep in mind when we conduct research. We should make the data, publications, models, tools, and development steps in our research FAIR. Based on your domain and specific topic, you can emphasize one or multiple parts of FAIR.

8. How do you think we can benefit from FAIR research?
Making data FAIR and scientific workflows reproducible are drivers for better science.

About Chang Sun

PhD

Chang Sun is a PhD student who started at the Institute of Data Science at Maastricht University in October 2017. Her research interests cover privacy-preserving data mining, federated/distributed machine learning, and personal health data sharing and analysis.
