Amazon Web Services announced that genomic information of 1,700 individuals has been placed in its public cloud and can be accessed by anyone in the world.
The 200 terabytes of data are part of the 1000 Genomes Project, sponsored by the National Institutes of Health in partnership with more than 75 companies and organisations. The goal is to eventually store genomic information from 2,662 individuals from around the world to advance scientific research. Specifically, researchers are looking for genetic variants that occur at a frequency above 1% across the sample set, in an effort to study diseases.
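To make the 1% threshold concrete, here is a minimal Python sketch of how such a filter might look. It is an illustration only, not the project's actual pipeline: the `Variant` record, the function names and the sample counts are all hypothetical, and it assumes each individual contributes two copies of every site (diploid, non-sex chromosomes).

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical record; real variant files carry far more fields.
    variant_id: str
    alt_allele_count: int  # copies of the variant allele seen in the sample set


def allele_frequency(alt_allele_count: int, n_individuals: int) -> float:
    """Fraction of all sampled chromosomes that carry the variant allele.

    Assumes two copies per individual (diploid, autosomal site).
    """
    total_alleles = 2 * n_individuals
    return alt_allele_count / total_alleles


def common_variants(variants, n_individuals, threshold=0.01):
    """Keep only variants whose frequency is strictly above the threshold."""
    return [
        v for v in variants
        if allele_frequency(v.alt_allele_count, n_individuals) > threshold
    ]


# Made-up counts for a sample set of 1,700 individuals (3,400 chromosomes).
example = [
    Variant("var_a", 35),   # just above 1%
    Variant("var_b", 10),   # well below 1%
    Variant("var_c", 340),  # 10%
]
kept = common_variants(example, n_individuals=1700)
print([v.variant_id for v in kept])
```

With 1,700 individuals, a variant would need to appear on more than 34 of the 3,400 sampled chromosomes to clear the 1% bar under these assumptions.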
Depositing the genomic information into AWS creates the largest collection of human genetic data available worldwide, the company said. AWS hosts the data for free, but charges users for the compute power required to analyse it. AWS says users can, for example, run Hadoop on its Elastic Compute Cloud (EC2) or use its Elastic MapReduce service to analyse the data stored in its Simple Storage Service (S3).
Most of the 1,700 genomic datasets are from anonymous individuals, and the 1000 Genomes Project has an ethics standard that requires informed consent from participants. The project has already collected samples from populations around the world, including Utah residents with Northern and Western European ancestry, people with Chinese heritage in Denver, people with Mexican heritage in Los Angeles and people with African heritage in the Southwestern United States.
The announcement was made as part of the Big Data Summit being held at the White House, where US government officials and researchers are discussing the challenges and opportunities Big Data creates.