
In the healthcare industry, protecting the privacy and security of patient data is essential. This often requires de-identifying the data and storing it separately from any identifiable information. AWS Glue gives healthcare providers a way to de-identify their data and ingest it into Amazon S3 (Simple Storage Service). This article explores how AWS Glue can be used for this task and which S3 features help keep the resulting data secure.

  • To start the process, the healthcare provider creates a Glue job. This job reads the patient records from the source database through a Glue connection. The same connection is used by a Glue crawler, which populates the Glue data catalog with the schema of the source database. Once the data catalog is populated, the provider can de-identify the data. AWS Glue offers various transformations to strip personally identifiable attributes from the data, so that only non-identifiable information remains.
    After de-identification, the data is stored in an S3 bucket separate from the one containing identifiable information. This separation allows the healthcare provider to enforce additional restrictions, through S3 bucket policies, on the bucket that contains sensitive information.
  • AWS is committed to helping healthcare providers address their security challenges. To date, AWS has released 51 HIPAA-eligible services, with more services in the process of becoming HIPAA-eligible. These services enable customers to build solutions that comply with HIPAA (Health Insurance Portability and Accountability Act) security and auditing requirements. With AWS Glue, healthcare providers can leverage a HIPAA-eligible service to de-identify and ingest their healthcare data into S3 securely.
  • AWS offers a range of security measures to protect the healthcare data stored in S3.
    Encryption: S3 provides settings to enable default encryption on a bucket, ensuring that all objects stored in the bucket are encrypted. This helps to safeguard the data from unauthorized access.
    Logging: S3 offers object-level logging, which captures the API calls made against the objects. These logs are consolidated in CloudTrail, where they are easy to access and monitor. Additionally, S3 supports event notifications that proactively alert customers to operations on their objects.
    Access Control: S3 bucket policies and IAM (Identity and Access Management) policies can be used to restrict access to the bucket containing sensitive healthcare data. Multi-factor authentication can be enforced to provide an extra layer of security.
    By implementing these security measures, healthcare providers can protect the confidentiality, integrity, and availability of their data while complying with regulatory requirements.
  • Using Amazon S3 as the central storage layer offers various benefits for healthcare providers. S3 provides efficient data storage with high durability and scalability.
    By using the storage management features of Amazon S3, healthcare providers can access operational metrics on their data sets and transition them between storage classes. This enables cost optimization, as data can be moved to lower-cost storage classes based on usage patterns. Additionally, tagging objects in Amazon S3 allows healthcare providers to create a governance layer that grants role-based access to objects through AWS IAM policies and Amazon S3 bucket policies. This provides further control and security over the stored data.
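
The de-identification step described above can be illustrated with a minimal sketch in plain Python. It is not the Glue job itself: it only shows the idea of stripping identifying attributes from each record before the record is written to the de-identified bucket. The field names in `PII_FIELDS` and the helper names `deidentify` and `deidentify_all` are assumptions made for illustration. In an actual Glue PySpark job, a built-in transform such as `DropFields` would typically play this role.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical column names, for illustration only. A real list would be
# derived from the schema that the crawler writes to the Glue data catalog.
PII_FIELDS = frozenset({"patient_name", "ssn", "address", "phone", "email"})


def deidentify(record: Dict[str, Any],
               pii_fields: Iterable[str] = PII_FIELDS) -> Dict[str, Any]:
    """Return a copy of the record without the identifying fields."""
    blocked = set(pii_fields)
    return {key: value for key, value in record.items() if key not in blocked}


def deidentify_all(records: Iterable[Dict[str, Any]],
                   pii_fields: Iterable[str] = PII_FIELDS) -> List[Dict[str, Any]]:
    """Apply deidentify to every record; the input records are not modified."""
    blocked = set(pii_fields)
    return [deidentify(record, blocked) for record in records]


sample = {
    "patient_name": "Jane Doe",
    "ssn": "000-00-0000",
    "diagnosis_code": "E11.9",
    "visit_year": 2018,
}
print(deidentify(sample))  # {'diagnosis_code': 'E11.9', 'visit_year': 2018}
```

Returning a new dictionary rather than mutating the input keeps the identifiable record intact, which matches the article's setup of two separate buckets for the two versions of the data.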

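The encryption and access-control measures listed above are expressed as JSON documents attached to a bucket. The sketch below builds two such documents as Python dictionaries: a default-encryption configuration and a bucket policy that denies requests made without multi-factor authentication. The function names and the bucket name are hypothetical, and the policy is only one possible shape. The dictionaries are intended to match what boto3's `put_bucket_encryption` and `put_bucket_policy` calls accept, but nothing here is sent to AWS.

```python
import json
from typing import Any, Dict


def default_encryption_config(kms_key_id: str = "") -> Dict[str, Any]:
    """Build a default-encryption configuration for a bucket.

    Without a key ID, S3-managed keys (AES256) are used; with one, SSE-KMS.
    """
    rule: Dict[str, Any] = {"SSEAlgorithm": "aws:kms" if kms_key_id else "AES256"}
    if kms_key_id:
        rule["KMSMasterKeyID"] = kms_key_id
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": rule}]}


def require_mfa_policy(bucket: str) -> Dict[str, Any]:
    """Build a bucket policy that denies any request made without MFA.

    Assumption: a blanket deny like this also blocks service roles, which
    cannot present MFA, so a real policy would need narrower scoping.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyRequestsWithoutMFA",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # The MFA age key is absent when the caller did not use MFA.
                "Condition": {"Null": {"aws:MultiFactorAuthAge": "true"}},
            }
        ],
    }


# "example-identifiable-records" is a placeholder bucket name.
print(json.dumps(default_encryption_config(), indent=2))
print(json.dumps(require_mfa_policy("example-identifiable-records"), indent=2))
```

Keeping these documents as code makes it straightforward to apply a stricter policy to the bucket with identifiable data and a more permissive one to the de-identified bucket.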

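The storage-class transitions mentioned above are configured through a bucket lifecycle configuration. As a rough sketch, the function below builds one rule that moves objects under a prefix to STANDARD_IA and later to GLACIER. The function name, the rule ID, the prefix, and the day counts are illustrative assumptions, not recommendations. The resulting dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` expects, and no AWS call is made here.

```python
from typing import Any, Dict


def lifecycle_config(prefix: str,
                     ia_after_days: int = 30,
                     glacier_after_days: int = 365) -> Dict[str, Any]:
    """Build a lifecycle configuration with two storage-class transitions."""
    # Assumption: S3 requires objects to be at least 30 days old before
    # they can transition to STANDARD_IA, so smaller values are rejected.
    if ia_after_days < 30:
        raise ValueError("ia_after_days must be at least 30")
    if glacier_after_days <= ia_after_days:
        raise ValueError("glacier_after_days must be greater than ia_after_days")
    return {
        "Rules": [
            {
                "ID": "tier-down-deidentified-data",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


# "deidentified/" is a placeholder prefix.
config = lifecycle_config("deidentified/", ia_after_days=60, glacier_after_days=730)
print(config["Rules"][0]["Transitions"])
```

The day thresholds would normally come from the operational metrics that S3 reports on how often each data set is accessed.
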
AWS Glue provides healthcare providers with a secure and efficient way to de-identify their healthcare data and ingest it into S3. By following the steps outlined in this article, providers can protect patient privacy while taking advantage of AWS services. As AWS continues to expand its range of HIPAA-eligible services, healthcare providers can build solutions that comply with regulatory requirements and address their security challenges. Combining AWS Glue with the security measures offered by S3 lets providers store and manage their healthcare data securely while optimizing costs, so they can focus on delivering high-quality care.
