Genomics Data Lake eBook
Scalability in Genomics starts with a performant, secure, and collaborative space to store your data.
Learn how Azure Data Lake can enhance your Genomics Pipelines
Centralizing your data in a data lake has the potential to help scale and automate bioinformatics pipelines (including secondary and tertiary analyses and machine learning) in cloud.
Azure Data Lake is touted as a limitless service place for storing your data. It provides the ability to store and organize petabyte-size files and connect to distributed computing resources (like Azure Databricks) with ease. In addition, Data Lake offers enterprise-grade security and role-based access controls.
In our Building a Genomics Data Lake in Azure eBook, the BlueGranite team has outlined our tips for organizing, securing, and using your genomics data in Azure.
Features of Azure Data Lake
- Azure Data Lake helps to eliminate data silos with a single storage platform. Organize your data by data type, maturity, or purpose to give users greater flexibility to collaborate and perform impactful analyses.
- Set access to resources in the data lake by a user's role, giving you total control over who sees what data. Data Lake also keeps your data encrypted at-rest and in-flight - Perfect for scenarios requiring HIPAA-compliance.
- Connect to Azure Data Factory to orchestrate your bioinformatics pipelines at scale using compute services such as Azure Databricks and Azure Machine Learning.