Case Western Reserve computer and data sciences researcher improving privacy for global genomic data sharing network, supported by $1.2 million NIH grant

A Case Western Reserve University computer and data sciences researcher is working to shore up privacy protections for people whose genomic information is stored in a vast global collection of vital personal data.


Erman Ayday, assistant professor of computer and data sciences at the Case School of Engineering, was recently awarded a four-year, $1.2 million grant from the National Institutes of Health’s National Library of Medicine to pursue novel methods for identifying and analyzing privacy vulnerabilities in the genomic data-sharing network commonly known as “the Beacons.”

Personal genomic data refers to each person’s unique genome, or genetic makeup: information that can be gleaned from DNA analysis of a blood or saliva sample.

Genomics is sometimes confused or conflated with genetics, but the terms refer to related but different fields of study:

  • Genetics refers to the study of genes and the way that certain traits or conditions are passed down from one generation to another. It involves scientific studies of genes and their effects. Genes (units of heredity) carry the instructions for making proteins, which direct the activities of cells and functions of the body. 
  • Genomics is a more recent term that describes the study of all of a person’s genes (the genome), including interactions of those genes with each other and with the person’s environment. Genomics includes the scientific study of complex diseases such as heart disease, asthma, diabetes, and cancer because these diseases are typically caused more by a combination of genetic and environmental factors than by individual genes. 

Ayday plans to identify weaknesses in the beacons’ infrastructure and to develop more sophisticated algorithms that protect against people or organizations who, as he has done, could uncover a person’s identity or sensitive genomic information from publicly available information.

That work will also protect members of the public who voluntarily shared their genomic information with the hospitals where they were treated, on the understanding that their identity and sensitive information would not be revealed.

“While the shared use of genomics data is valuable to research, it is also potentially dangerous to the individual if their identity is revealed,” Ayday said. “Someone else knowing your genome is power—power over you. And, generally, people aren’t really aware of this, but we’re starting to see how genomic data can be shared, abused.”

Other research has shown that “if someone had access to your genome sequence—either directly from your saliva or other tissues, or from a popular genomic information service—they could check to see if you appear in a database of people with certain medical conditions, such as heart disease, lung cancer, or autism.”

Human genomic research

The cache of genomic information has grown steadily since the Human Genome Project concluded in 2003. That 13-year endeavor set out to “discover all the estimated 20,000-25,000 human genes and make them accessible for further biological study” and to complete the sequencing of the roughly 3 billion subunits of human DNA for research.

Popular genealogy sites such as Ancestry.com and 23andMe rely on this information, comparing it against their own accumulated genetic data and analyzing it with proprietary algorithms, to discern a person’s ancestry, for example.

Ayday said companies, government organizations and others are also tapping into genomic data. “The military can check the genome of recruits, insurance companies can check whether someone has a predisposition to a certain disease,” he said. “There are plenty of real life examples already.”

Genomics researchers rely on shared DNA data, which is considered critical to advancing biomedical research. To access that data, they send digital requests (“queries”) to individual beacons, each specializing in different genetic mutations.

What are ‘the Beacons’?

The Beacon Network is an array of about 100 repositories of human genomic data, coordinated by the Global Alliance for Genomics & Health in collaboration with a Europe-based system called ELIXIR.

Although, according to the network’s site, “queries do not return information about single individuals,” a 2015 study showed that someone could infer whether a particular person belongs to a beacon by sending that beacon a large number of queries.

“And then we used a more sophisticated algorithm and showed that you don’t need thousands of queries,” Ayday said. “We did it by sending less than 10.”
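The article does not spell out Ayday’s algorithm. As a rough illustration only, the Python sketch below shows why so few queries can suffice. It assumes a simplified beacon that answers just “yes” or “no” to whether any genome it holds carries a given allele at a given position, and an attacker who already knows the target’s genome and how common each variant is. The attacker asks only about the target’s rarest variants, since a rare allele is unlikely to turn up in a beacon by chance. `ToyBeacon`, `likely_member`, the variant tuples and the frequencies are all hypothetical. They are not the Beacon Network’s interface or the research team’s published method.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representation: a variant is (chromosome, position, alternate allele).
Variant = Tuple[str, int, str]


class ToyBeacon:
    """Hypothetical in-memory stand-in for a beacon that only answers yes or no."""

    def __init__(self, genomes: Iterable[Set[Variant]]) -> None:
        # The beacon keeps the union of every member's variants; it never
        # says which member carries which allele.
        self._variants: Set[Variant] = set()
        for genome in genomes:
            self._variants |= genome
        self.queries_answered = 0

    def query(self, variant: Variant) -> bool:
        """Answer: does any genome in this beacon carry this allele?"""
        self.queries_answered += 1
        return variant in self._variants


def likely_member(beacon: ToyBeacon, target_genome: Set[Variant],
                  allele_freq: Dict[Variant, float], max_queries: int = 10) -> bool:
    """Hypothetical simplified attack: ask only about the target's rarest alleles.

    A "yes" to every rare allele queried is strong evidence that the target
    (or a close relative) is in the beacon. Alleles missing from allele_freq
    are treated as the rarest (frequency 0.0).
    """
    rarest_first = sorted(target_genome, key=lambda v: allele_freq.get(v, 0.0))
    for variant in rarest_first[:max_queries]:
        if not beacon.query(variant):
            return False  # in this error-free toy model, one "no" rules the target out
    return True


if __name__ == "__main__":
    alice = {("1", 100, "A"), ("2", 200, "T"), ("3", 300, "G")}
    bob = {("1", 100, "A"), ("4", 400, "C")}
    carol = {("1", 100, "A"), ("5", 500, "T")}
    freq = {("1", 100, "A"): 0.6, ("2", 200, "T"): 0.001, ("3", 300, "G"): 0.002,
            ("4", 400, "C"): 0.003, ("5", 500, "T"): 0.001}

    beacon = ToyBeacon([alice, bob])
    print(likely_member(beacon, alice, freq, max_queries=2))  # True
    print(likely_member(beacon, carol, freq, max_queries=2))  # False
    print(beacon.queries_answered)  # 3
```

The sketch leaves out sequencing errors, relatives and the statistical tests that real attacks use. Its only purpose is to show that a few well-chosen yes/no answers can reveal whether someone is in a dataset, even though no single answer names an individual.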

Then, in a follow-up study, Ayday and his team showed that someone could also reconstruct an individual’s entire genome from the answers to only a handful of queries.

“That’s a big problem,” Ayday said. “Information about one, single individual should not be that easily found out.”


For more information, contact Mike Scott at mike.scott@case.edu