AWS S3 Operations Best Practices

Amazon Simple Storage Service (S3) is an object storage service offered by Amazon Web Services.

Amazon S3 has a simple web services interface that you can use to store and retrieve any amount of data, at any time, from anywhere on the web. It gives us access to highly scalable, reliable, fast, and inexpensive data storage infrastructure.

This article explains operational best practices for S3 that help you get the maximum benefit from object storage.

1. High Availability

Cross-Region Replication: A single point of failure in S3 can be addressed by enabling Cross-Region Replication. This feature copies objects from a bucket in one region to a bucket in another region as bucket operations occur.

Here are a few things to keep in mind as you start to think about how to make use of Cross-Region Replication in your own operating environment.

Versioning – Enable S3 versioning for both the source and destination buckets (see the sketch after this list).

Lifecycle Rules – You can choose to use Lifecycle Rules on the destination bucket to manage older versions by deleting them or migrating them to Amazon Glacier.

Determining Replication Status – You (or your code) can use the HEAD operation on a source object to determine its replication status. You can also view this status in the Console.

Region-to-Region – Replication always takes place between a pair of AWS regions. You cannot use this feature to replicate content to two buckets that are in the same region.
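
As a rough sketch of setting this up programmatically with boto3 (the bucket names, region pairing, and IAM role ARN below are placeholders, and the replication role must already exist with the appropriate permissions):

import boto3

s3 = boto3.client('s3')

# Versioning must be enabled on both the source and destination buckets.
for bucket in ('source-bucket-us-east-1', 'destination-bucket-eu-west-1'):
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={'Status': 'Enabled'},
    )

# Replicate every new object from the source bucket to the destination bucket.
s3.put_bucket_replication(
    Bucket='source-bucket-us-east-1',
    ReplicationConfiguration={
        'Role': 'arn:aws:iam::123456789012:role/s3-crr-role',  # placeholder role ARN
        'Rules': [{
            'ID': 'ReplicateAll',
            'Prefix': '',
            'Status': 'Enabled',
            'Destination': {'Bucket': 'arn:aws:s3:::destination-bucket-eu-west-1'},
        }],
    },
)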

2. Performance Optimization

Choosing the right region

Choosing a region is very important for the performance of S3 buckets. Co-locate your compute and other AWS resources with the S3 bucket to get maximum performance by reducing network latency.

Object key naming convention

Pay Attention to Your Naming Scheme If:

  • You want consistent performance from a bucket
  • You want a bucket capable of routinely exceeding 100 TPS (Transactions Per Second)

Distributing the Key names
Don’t start your object key names with a date or another common sequential prefix (Diagram 1); this adds load to the S3 index and will reduce performance, because objects whose keys share a prefix are saved in a single index partition (Diagram 2). Amazon S3 maintains keys lexicographically in its internal indices.

[Diagram 1: key names starting with dates; Diagram 2: those objects concentrated in a single index partition]

Add randomness to the beginning of the key name, and add additional prefixes to help with sorting.


Other techniques for distributing key names (both sketched below):
  • Store objects under a hash of their name and add the original name as metadata
  • Add the reversed epoch time value as a prefix to object keys
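
A minimal sketch of these two techniques in Python (the key names are placeholders, and a 4-character hash prefix is just one possible choice):

import hashlib
import time

def hashed_key(original_name):
    # Prefix the key with a short hash so objects spread across index partitions;
    # keep the original name so it can also be stored as object metadata.
    prefix = hashlib.md5(original_name.encode()).hexdigest()[:4]
    return '{}/{}'.format(prefix, original_name)

def reverse_epoch_key(original_name):
    # Prefix the key with the reversed epoch time, whose leading characters
    # change rapidly and therefore distribute well across partitions.
    return '{}-{}'.format(str(int(time.time()))[::-1], original_name)

print(hashed_key('2018/09/18/server-logs.gz'))
print(reverse_epoch_key('server-logs.gz'))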

Transfer Acceleration

This feature accelerates Amazon S3 data transfers by making use of optimized network protocols and the AWS edge infrastructure. Improvements are typically in the range of 50% to 500% for cross-country transfer of larger objects, but can go even higher under certain conditions.
We can use Transfer Acceleration when considerably higher data transfer rates are required, for example during cloud migrations.
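
A minimal sketch of enabling and using Transfer Acceleration with boto3 (the bucket and file names are placeholders):

import boto3
from botocore.config import Config

s3 = boto3.client('s3')

# One-time configuration: turn on Transfer Acceleration for the bucket.
s3.put_bucket_accelerate_configuration(
    Bucket='testbucket-sep18',
    AccelerateConfiguration={'Status': 'Enabled'},
)

# Route subsequent transfers through the accelerate endpoint.
s3_accel = boto3.client('s3', config=Config(s3={'use_accelerate_endpoint': True}))
s3_accel.upload_file('large-backup.tar.gz', 'testbucket-sep18', 'backups/large-backup.tar.gz')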

Amazon CloudFront

  • Amazon CloudFront can be used for performance optimization and can help by distributing content with low latency and high data transfer rates.
  • It caches content, reducing the number of direct requests to Amazon S3.
  • It provides multiple endpoints (edge locations) for data availability.
  • It is available in two flavors: Web distribution and RTMP distribution.

Enabling enhanced networking on the compute instances and issuing parallel GET and LIST requests will also improve S3 performance.
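
One possible sketch of parallel GET requests using boto3 and a thread pool (the bucket name and download location are placeholders):

from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client('s3')
BUCKET = 'testbucket-sep18'

def download(key):
    # Each worker issues its own GET request; S3 handles parallel requests well.
    s3.download_file(BUCKET, key, '/tmp/' + key.replace('/', '_'))
    return key

keys = [obj['Key'] for obj in s3.list_objects_v2(Bucket=BUCKET).get('Contents', [])]
with ThreadPoolExecutor(max_workers=10) as pool:
    for finished in pool.map(download, keys):
        print('downloaded', finished)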

3. Security

A high level of security in S3 can be achieved using the guidelines below:

  • ACL, S3 bucket policy
  • Signed URLs for sharing objects
  • S3 Encryption
  • VPC End points for S3
  • Versioning

ACL & Bucket policy

When you create a bucket or an object, Amazon S3 creates a default ACL that grants the resource owner full control over the resource. You can create customized ACLs to define access control for the allowed actions and resources. In a bucket policy we can define granular permissions on objects, and we can also define IP address restrictions if required.
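
For example, a sketch of applying a bucket policy with an IP address restriction via boto3 (the bucket name and CIDR range are placeholders):

import json
import boto3

s3 = boto3.client('s3')

# Allow GetObject only from a trusted IP range (placeholder CIDR).
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowGetFromTrustedRange",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::testbucket-sep18/*",
        "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}},
    }],
}

s3.put_bucket_policy(Bucket='testbucket-sep18', Policy=json.dumps(policy))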

Signed URLs for sharing objects

A signed S3 URL provides limited public access to an object with an expiry time; it can be generated using the APIs.


A small example using the Python API (boto):

import boto

# Generate a pre-signed GET URL valid for 3600 seconds.
conn = boto.connect_s3()
url = conn.generate_url(3600, 'GET', bucket='testbucket-sep18', key='windows.pem')
print(url)

Output:

'https://testbucket-sep18.s3.amazonaws.com/windows.pem?Signature=hEBUPczy8DXCyqTz1JHgEaihvMo%3D&Expires=1431697820&AWSAccessKeyId=AKIAI65L23YDGKGQTRFA'

This output URL has an expiry of 3600 seconds for the object windows.pem in the bucket testbucket-sep18.

S3 Encryption

You have the following options for protecting data at rest in Amazon S3; the encryption model should be selected based on your operational compliance requirements.

  • Server-Side Encryption

Server-side encryption is data encryption at rest: Amazon S3 encrypts your data at the object level as it writes it to disks in its data centers and decrypts it for you when you access it. As long as you authenticate your request and have access permissions, there is no difference in the way you access encrypted or unencrypted objects. Amazon S3 server-side encryption uses one of the strongest block ciphers available, 256-bit Advanced Encryption Standard (AES-256), to encrypt your data.
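
A minimal sketch of requesting SSE-S3 (AES-256) on upload with boto3 (the bucket, key, and body are placeholders):

import boto3

s3 = boto3.client('s3')

# Ask S3 to encrypt this object at rest with S3-managed keys (AES-256).
s3.put_object(
    Bucket='testbucket-sep18',
    Key='reports/summary.csv',
    Body=b'col1,col2\n1,2\n',
    ServerSideEncryption='AES256',
)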

  • Client-Side Encryption

You can encrypt data client-side and upload the encrypted data to Amazon S3. In this case, you manage the encryption process, the encryption keys, and related tools.

VPC Endpoints for S3

VPC endpoints are easy to configure, highly reliable, and provide a secure connection to S3 that does not require a gateway or NAT instances. EC2 instances running in private subnets of a VPC can have controlled access to S3 buckets, objects, and API functions that are in the same region as the VPC.

We can use an S3 bucket policy to indicate which VPCs and VPC endpoints have access to your S3 buckets. Using this feature, we can perform regular data transfers to S3 buckets from instances in private subnets.
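
As a sketch, the same put_bucket_policy call shown earlier can carry a VPC endpoint condition instead (the bucket name and VPC endpoint ID are placeholders):

import json
import boto3

s3 = boto3.client('s3')

# Deny any access to the bucket that does not arrive via the named VPC endpoint.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyAccessOutsideVPCEndpoint",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            "arn:aws:s3:::testbucket-sep18",
            "arn:aws:s3:::testbucket-sep18/*",
        ],
        "Condition": {"StringNotEquals": {"aws:sourceVpce": "vpce-0123456789abcdef0"}},
    }],
}

s3.put_bucket_policy(Bucket='testbucket-sep18', Policy=json.dumps(policy))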

Versioning

Versioning provides an additional layer of protection for your S3 objects. You can easily recover from unintended user errors or application failures. You can also use Versioning for data retention and archive. Once you have enabled Versioning for a particular S3 bucket, any operation that would have overwritten an S3 object (PUT, POST, COPY, and DELETE) retains the old version of the object. Here’s a simple diagram of Versioning in action

[Diagram: Versioning in action]

The actual version ids are long strings. You can retrieve the most recent version of an object by making a default GET request or you can retrieve any version (current or former) by making a version-aware request and including a version id. In effect, the complete key for an S3 object in a versioned bucket now consists of the bucket name, the object name, and the version id.
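
A small sketch of a version-aware request with boto3 (the bucket, key, and version id are placeholders; in practice the version id comes from the listing):

import boto3

s3 = boto3.client('s3')

# List all versions of a key, then fetch one specific (possibly older) version.
listing = s3.list_object_versions(Bucket='testbucket-sep18', Prefix='windows.pem')
for version in listing.get('Versions', []):
    print(version['Key'], version['VersionId'], version['IsLatest'])

older = s3.get_object(
    Bucket='testbucket-sep18',
    Key='windows.pem',
    VersionId='3HL4kqtJvjVBH40Nrjfkd',  # placeholder version id taken from the listing
)
print(older['Body'].read()[:80])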

S3’s DELETE operation works in a new way when applied to a versioned object. Once an object has been deleted, subsequent default requests will no longer retrieve it. However, the previous version of the object will be preserved and can be retrieved by using the version id. Only the owner of an S3 bucket can permanently delete a version.

Normal S3 pricing applies to each version of an object. You can store any number of versions of the same object, so you may want to implement some expiration and deletion logic if you plan to make use of this feature.

Additional security can be ensured by enabling MFA (Multi-Factor Authentication) Delete to prevent accidental object deletion.

4. Cost Optimization

Region Selection

The S3 pricing model is region specific. US Standard, the most established AWS region, is the cheapest, but price is not the only factor in choosing a region: we also need to co-locate EC2 and other AWS resources in the same region, because there are no data transfer charges for object requests made within the region.

Lifecycle Policy

We can optimize S3 storage cost by selecting an appropriate storage class for objects. You can reduce your costs by setting up S3 lifecycle policies that transition your data to other S3 storage tiers or expire data that is no longer needed. Once a policy is set, your data will automatically migrate to the most appropriate storage class without any changes to your application. Amazon S3 offers a range of storage classes designed for different use cases:

S3 Standard for general-purpose storage of frequently accessed data

S3 Standard - Infrequent Access for long-lived, but less frequently accessed data

Amazon Glacier for long-term archive.

In a lifecycle policy we can also define object expiry. When an object reaches the end of its lifetime, Amazon S3 queues it for removal and removes it asynchronously. There are additional cost considerations if your lifecycle policy expires objects that have been in STANDARD_IA for less than 30 days or in GLACIER for less than 90 days. We can also enable an incomplete multipart upload rule, which removes the parts left behind by incomplete uploads.
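
A sketch of such a lifecycle policy with boto3 (the bucket name, prefix, and day counts are placeholders chosen to respect the 30/90-day minimums mentioned above):

import boto3

s3 = boto3.client('s3')

lifecycle = {
    'Rules': [{
        'ID': 'archive-and-expire-logs',
        'Filter': {'Prefix': 'logs/'},
        'Status': 'Enabled',
        'Transitions': [
            {'Days': 30, 'StorageClass': 'STANDARD_IA'},  # after 30 days in Standard
            {'Days': 120, 'StorageClass': 'GLACIER'},     # 90 days after the IA transition
        ],
        'Expiration': {'Days': 365},                      # expire after a year
        'AbortIncompleteMultipartUpload': {'DaysAfterInitiation': 7},
    }],
}

s3.put_bucket_lifecycle_configuration(
    Bucket='testbucket-sep18',
    LifecycleConfiguration=lifecycle,
)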

Manage Cost Allocation using S3 Tags:

A tag is a key-value pair that represents a label that you assign to a bucket. In your AWS bill, costs are organized by the tags that you define. We have to create a standard way to categorize buckets based on usage, project, and so on; an example tag set is sketched below.


By following a standard naming convention for tag keys and values, we can categorize buckets and objects in the AWS bill by tag value; for example, we can filter and identify how much we are spending on the Operations team's sandbox environment.
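
A sketch of applying such a tag set with boto3 (the tag keys and values are placeholders illustrating one possible convention):

import boto3

s3 = boto3.client('s3')

# Cost-allocation tags following a team/environment/project convention.
s3.put_bucket_tagging(
    Bucket='testbucket-sep18',
    Tagging={'TagSet': [
        {'Key': 'Team', 'Value': 'Operations'},
        {'Key': 'Environment', 'Value': 'Sandbox'},
        {'Key': 'Project', 'Value': 'S3-BestPractices'},
    ]},
)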

5. Monitoring S3

Enable event notifications

S3 provides event notification features. The bucket owner (or others, as permitted by an IAM policy) can arrange for notifications to be issued to Amazon Simple Queue Service (SQS) or Amazon Simple Notification Service (SNS) when a new object is added to the bucket or an existing object is overwritten. Notifications can also be delivered to AWS Lambda for processing by a Lambda function. Event notifications give better control over bucket-level operations.
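
A minimal sketch of wiring object-created events to an SQS queue with boto3 (the bucket name and queue ARN are placeholders, and the queue policy must already allow S3 to send messages):

import boto3

s3 = boto3.client('s3')

# Send a message to the queue whenever a new object is created in the bucket.
s3.put_bucket_notification_configuration(
    Bucket='testbucket-sep18',
    NotificationConfiguration={
        'QueueConfigurations': [{
            'Id': 'NewObjectToSQS',
            'QueueArn': 'arn:aws:sqs:us-east-1:123456789012:s3-events',
            'Events': ['s3:ObjectCreated:*'],
        }],
    },
)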


Enable audit logs & CloudWatch integration

Amazon S3 is integrated with CloudTrail, a service that captures specific API calls made to Amazon S3 from your AWS account and delivers the log files to an Amazon S3 bucket that you specify. CloudTrail captures API calls made from the Amazon S3 console or from the Amazon S3 API.

Using the information collected by CloudTrail, you can determine:

  • what request was made to Amazon S3
  • the source IP address from which the request was made
  • who made the request
  • when it was made

This information helps you to track changes made to your AWS resources and to troubleshoot operational issues. CloudTrail makes it easier to ensure compliance with internal policies and regulatory standards.
We can also configure a CloudWatch alarm for specific change events, for example bucket deletion or object deletion.

Monitor objects with CloudWatch

You can use Amazon CloudWatch to monitor your Amazon S3 buckets, tracking metrics such as object counts and bytes stored. You can receive notifications or take automated actions by setting Amazon CloudWatch alarms on any of the Amazon S3 metrics.

For example, we can create alarms that send out notifications for the criteria below (a sketch follows the list):

  • when a specific Amazon S3 bucket crosses a threshold, such as holding more than 1,000 objects
  • when the bytes stored in the bucket cross a threshold of 1,024 bytes
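
A sketch of the first alarm with boto3 (the bucket name, SNS topic ARN, and threshold are placeholders; S3 storage metrics such as NumberOfObjects are reported once per day):

import boto3

cloudwatch = boto3.client('cloudwatch')

# Alarm when the bucket holds more than 1000 objects; notify an SNS topic.
cloudwatch.put_metric_alarm(
    AlarmName='testbucket-sep18-object-count',
    Namespace='AWS/S3',
    MetricName='NumberOfObjects',
    Dimensions=[
        {'Name': 'BucketName', 'Value': 'testbucket-sep18'},
        {'Name': 'StorageType', 'Value': 'AllStorageTypes'},
    ],
    Statistic='Average',
    Period=86400,              # daily, matching the metric's reporting interval
    EvaluationPeriods=1,
    Threshold=1000,
    ComparisonOperator='GreaterThanThreshold',
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:ops-alerts'],
)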
