Scientists routinely promise anonymity to the volunteers who donate their DNA for research into new treatments and the causes of disease.
But hackers, or almost anyone equipped with a computer and an Internet connection, have the potential to reveal the identities of those volunteers, a group of researchers has found.
Whitehead Institute geneticist Yaniv Erlich detailed how, last year, he and his team discovered the identities of anonymous male Utah residents who had participated in genome sequencing projects.
“Surnames can be recovered from personal genomes and we can use that to breach the privacy of participants in the database,” said Erlich at the American Association for the Advancement of Science summit in Chicago.
For Erlich’s test, the team began by analyzing unique genetic markers on Y chromosomes, since fathers typically pass both their surnames and their Y chromosomes to their sons. Next, they used free online databases to discover the surnames of the volunteers, then followed up with searches in record databases to determine the individuals’ full names.
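The two steps described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the team's actual method: the function names, the mismatch threshold, and every surname, marker value and record below are hypothetical, made up for the example.

```python
from typing import Dict, List, Optional, Tuple

# A Y-chromosome profile: marker name -> repeat count.
Profile = Dict[str, int]


def infer_surname(query: Profile,
                  genealogy_db: List[Tuple[str, Profile]],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the surname whose profile differs least from the query.

    Only markers present in both profiles are compared. The threshold
    max_mismatches is an assumption chosen for this sketch.
    """
    best: Optional[Tuple[str, int]] = None
    for surname, profile in genealogy_db:
        shared = query.keys() & profile.keys()
        if not shared:
            continue
        mismatches = sum(1 for m in shared if query[m] != profile[m])
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (surname, mismatches)
    return best[0] if best else None


def narrow_candidates(records: List[Dict[str, str]],
                      surname: str, state: str) -> List[str]:
    """Filter a record database down to people with this surname and state."""
    return [r["full_name"] for r in records
            if r["surname"] == surname and r["state"] == state]


# Hypothetical data, invented for illustration only.
genealogy_db = [
    ("Surname_A", {"DYS19": 14, "DYS390": 24, "DYS393": 13}),
    ("Surname_B", {"DYS19": 15, "DYS390": 22, "DYS393": 12}),
]
records = [
    {"full_name": "Alex Surname_A", "surname": "Surname_A", "state": "UT"},
    {"full_name": "Sam Surname_A", "surname": "Surname_A", "state": "NY"},
    {"full_name": "Lee Surname_B", "surname": "Surname_B", "state": "UT"},
]
anonymous_sample = {"DYS19": 14, "DYS390": 24, "DYS393": 13}

surname = infer_surname(anonymous_sample, genealogy_db)
candidates = narrow_candidates(records, surname, "UT") if surname else []
print(surname, candidates)
```

Even in this toy version, neither database identifies anyone on its own; the identification comes from combining them.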
The ability to link DNA and identity poses risks. The 2008 Genetic Information Nondiscrimination Act was enacted to protect against the discriminatory use of genetic information in health insurance and employment. Other forms of discrimination may exist, though, including barring people from obtaining life insurance based on their DNA.
Erlich’s research has spurred others to analyze how genetic information can be protected, from technical mechanisms to new privacy policies.
One possible solution to the problem is the use of homomorphic encryption, said Kristin Lauter, principal researcher for the cryptography group at Microsoft Research. The method is novel in that it allows web servers to process data without decrypting it.
“The science of cryptography is that when some other observer looks at these encrypted bits, they shouldn’t be able to tell anything about that information,” she said.
Lauter acknowledged, though, that the technology is limited because it is still in the early stages of development.
“The real issues that we’re facing are issues of scalability and efficiency and things like that,” she said.
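The idea of processing data without decrypting it can be illustrated with a toy version of the Paillier cryptosystem, a well-known scheme that is homomorphic for addition only. This is a minimal sketch and not the system Lauter's group works on: the tiny primes are an assumption made so the example runs instantly, and they offer no real security.

```python
import math
import secrets


def keygen(p: int = 1009, q: int = 1013):
    """Build a toy Paillier key pair from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                # standard simple choice of generator
    mu = pow(lam, -1, n)     # modular inverse; valid because g = n + 1
    return (n, g), (lam, mu)


def encrypt(pub, m: int) -> int:
    """Encrypt integer m (0 <= m < n) using fresh randomness r."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c: int) -> int:
    """Recover the plaintext from ciphertext c with the private key."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(pub, c1: int, c2: int) -> int:
    """Add two plaintexts by multiplying their ciphertexts; no key needed."""
    n, _ = pub
    return (c1 * c2) % (n * n)


pub, priv = keygen()
# The data owner encrypts two values, for example counts from two samples.
c1, c2 = encrypt(pub, 3), encrypt(pub, 4)
# A server holding only the public key combines them without decrypting.
c_sum = add_encrypted(pub, c1, c2)
# Only the holder of the private key can read the result.
print(decrypt(pub, priv, c_sum))
```

The server in this sketch never sees 3, 4 or their sum. Schemes that support richer computations than addition are much more expensive, which is where the scalability and efficiency concerns come in.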
John Wilbanks, of Seattle-based nonprofit Sage Bionetworks, discussed the overriding need to change policies and strategies pertaining to genetic data made available online.
Part of the concern over risk may stem from volunteers being cut off from the useful findings their data produce.
In many cases, individuals’ genetic sequences are eventually used for research, but the individuals never have a researcher explain the results to them thoroughly, he said. One way to make people less concerned about the risk of their gene sequences being disclosed is to increase the personal value they get from volunteering.
“If we think about the return of this data back to the people who consent to the studies, the risk tolerance goes up,” he said.