Enhancing Privacy for Genomic Data

When it comes to data privacy, genomic data may not be the first thing that springs to mind. But with the vast increase in services offering genome analysis, it’s important to make sure that your data is in the right hands. At the University of Luxembourg’s Interdisciplinary Centre for Security, Reliability and Trust (SnT), our researchers have been working on improved methods of securing genomic data. Túlio Pascoal, a doctoral researcher within SnT’s Critical and Extreme Security and Dependability (CritiX) research group, has been working on developing novel privacy-preserving mechanisms to enable secure and privacy-aware federated processing and release of genomic data. We spoke with him to find out more about his research.

What is your Ph.D. thesis about?

Genome data (i.e., DNA) of individuals is the key component for conducting Genome-Wide Association Studies (GWAS). GWAS are common observational studies that allow the identification of genetic variants associated with a particular trait (e.g., a disease). As a result, findings from GWAS allow scientists to identify an individual’s predisposition to diseases early on, so that these diseases can be treated as soon as possible. Take, for example, the case of Angelina Jolie’s decision to undergo a preventive mastectomy due to a very high risk of developing breast cancer. The breast cancer risk variant she carries in her DNA has been identified in a GWAS.
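
To make the idea of a variant being "associated with a trait" concrete, here is a minimal Python sketch of one common GWAS statistic: a Pearson chi-square test on allele counts for cases versus controls at a single variant. The function name and the counts are hypothetical and chosen for illustration only; they are not taken from the studies discussed in this interview.

```python
import math


def allelic_association(case_minor, case_major, control_minor, control_major):
    """Return (chi-square statistic, p-value) for a 2x2 allele-count table.

    Illustrative sketch only: rows are cases/controls, columns are
    minor/major allele counts observed at one genetic variant.
    """
    table = [[case_minor, case_major], [control_minor, control_major]]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if the allele were independent of the trait.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-square tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: the minor allele is more frequent among cases.
chi2, p_value = allelic_association(30, 70, 10, 90)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")  # chi2 = 12.50
```

A real study repeats a test of this kind across a very large number of variants and corrects for multiple testing, which is one reason larger pooled datasets give more reliable findings.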

To achieve better statistical confidence, and therefore more precise findings, large-scale (a.k.a. federated) GWAS are performed by leveraging data from multiple genome data centres around the globe. In such a collaborative setting, both the private data of individual participants and the inputs shared by the institutions (e.g., biocentres that sequenced individuals’ DNA) for federated processing should be protected. Usually, federated systems leverage cryptographic solutions such as Homomorphic Encryption (HE), Secure Multiparty Computation (SMC) and Trusted Execution Environments (TEEs) to allow secure and privacy-preserving processing of the data for computing GWAS statistics. Regrettably, enforcing privacy-preserving processing is not enough. If not treated with proper care, published GWAS results might be used by adversaries to launch genomic privacy attacks. Furthermore, existing solutions for privacy-preserving releases only work in a static fashion (i.e., they do not assume that GWAS statistics can be updated as soon as genome data of new individuals become available, which is becoming a reality since DNA sequencing prices have been decreasing exponentially).

In my Ph.D., I proposed mechanisms that combine privacy-preserving processing and releasing of GWAS, while considering new practical properties and stronger adversarial models. For example, in my first paper (DyPS [1]), we showed how GWAS statistics can be safely updated as new genomes are sequenced and added to the federation and when individuals ask to be removed from studies (to comply with data-privacy regulations, such as GDPR). In addition, we showed how safe releases can take place even when some federation members collude to mount genomic privacy attacks against other honest members’ data.

Next, we identified new privacy conditions to allow private releases of dynamic GWAS considering interdependent privacy, i.e., assuming that a GWAS federation conducts multiple studies at the same time that potentially share genomes. We showed that overlapping regions of releases (e.g., individuals participating in more than one study simultaneously) can be used by adversaries to circumvent the privacy-protection mechanisms in place. Therefore, in I-GWAS [2], we analysed and provided the conditions under which interdependent GWAS releases can occur safely.

Last but not least, we proposed GenDPR [3], an approach that supports the aforementioned properties in a fully distributed environment where federation members no longer need to exchange actual genome data for collaborative studies.

Why is this topic crucial to be addressing now?

Protecting individuals’ private data is key to developing a better world, especially when most people are not fully aware of how a leak of their genome information could compromise their lives. For example, some insurance companies might decide not to cover a particular person if their predisposition to a rare disease is discovered. In addition, genome data cannot be revoked like credit cards and credentials, meaning that once genomic privacy is compromised, it’s out there forever. Therefore, paving the way for the future of GWAS by enabling practical properties of the 21st century (e.g., allowing dynamic updates of GWAS statistics and enabling individuals to withdraw consent at any time) is a substantial contribution.

How do you think your research results could be applied in the real world?

In my research, I focused on offering practical and usable mechanisms so that their adoption by industry or other researchers can be facilitated. The results of my research show that the technologies used (distributed Trusted Execution Environments (TEEs) and statistical inference methods) can enable dynamic and interdependent privacy-preserving GWAS in a practical and scalable manner.

Why did you choose this subject?

During my Bachelor’s and Master’s studies, I developed expertise in network security and machine learning algorithms. I published several papers in peer-reviewed venues on mitigating denial-of-service attacks and hardening software-defined networking (SDN) architectures.

In my Ph.D., on the other hand, apart from addressing some security issues to enable secure federated processing of genomic data, my research dealt with data privacy protection aspects, which was new to me at the time. Merging two computer science fields (security and privacy) that I am fond of was a perfect match. Additionally, I believe that genomics-related privacy issues are still at an early stage (society is not fully aware of these problems yet) and will have a significant impact in the future.

What do you think comes next for this research subject?

I strongly believe that the next step is to create new privacy-preserving release mechanisms that achieve better utility while still protecting the released data.

Existing approaches either limit the size of the data that can be used to generate a private release or utilise perturbation mechanisms (e.g., differential privacy) to create releases that protect participants’ information. However, for more precise results and therefore better findings, we need to rely on the largest possible datasets, ideally without any accuracy loss. Therefore, in my opinion, designing hybrid mechanisms that can leverage the best of both approaches is the way to go in the future.
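
As an illustration of the perturbation approach mentioned above, the following minimal Python sketch applies the standard Laplace mechanism from differential privacy to a list of counts before release. It is a generic textbook construction, not the mechanism used in DyPS, I-GWAS or GenDPR; the function names, the counts and the sensitivity value are assumptions made for the example, and the caller is assumed to supply the correct L1 sensitivity for the statistics being released.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with
    rate 1/scale follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_release_counts(counts, epsilon, sensitivity, rng=None):
    """Return noisy counts under the Laplace mechanism.

    Illustrative sketch: `sensitivity` is assumed to be the L1
    sensitivity of the whole count vector, i.e. the most the vector
    can change when one individual is added or removed.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon  # smaller epsilon means more noise
    return [count + laplace_noise(scale, rng) for count in counts]


# Hypothetical allele counts for three variants.
rng = random.Random(42)
true_counts = [120, 87, 301]
for epsilon in (0.1, 1.0, 10.0):
    noisy = dp_release_counts(true_counts, epsilon, sensitivity=2.0, rng=rng)
    print(epsilon, [round(value, 1) for value in noisy])
```

The loop shows the trade-off described in the answer: a small epsilon gives stronger privacy but noisier, less useful statistics, while a large epsilon stays close to the true counts.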

Now that your Ph.D. is complete, what does your future hold?

I am leaving academia for the time being to join industry next month as a cybersecurity consultant at a Luxembourg-based company. However, I will undoubtedly keep conducting research and writing research papers in my spare time as a pretext to continue doing science, which I love and find motivating.

[1] Pascoal, T., Decouchant, J., Boutet, A., Verissimo, P. “DyPS: Dynamic, Private and Secure GWAS”. Published in PoPETs 2021.

[2] Pascoal, T., Decouchant, J., Boutet, A., Voelp, M. “I-GWAS: Privacy-Preserving Interdependent Genome-Wide Association Studies”. Accepted (to appear) in PoPETs 2023.

[3] Pascoal, T., Decouchant, J., Voelp, M. “Distributed and Secure Assessment of Privacy-Preserving Releases of GWAS”. Accepted (to appear) in Middleware 2022.
