Cost-effective S3 Buckets

A few cost-saving tips on configuring S3 buckets

Even though AWS Simple Storage Service (S3) is an extremely reliable, durable, and cost-effective Cloud Storage solution, it can sometimes result in unexpectedly high bills. Here are a few tips to help you minimize your Cloud costs.

AWS S3

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. The best features of S3 are that it offers a nearly unlimited amount of storage at a very low cost, and that it is designed for 99.999999999% (11 9's) of durability; in other words, AWS is extremely unlikely to lose your data. This is a huge step up compared to storing data on a hard drive at home, where a single drive failure can be devastating!

There are a few interesting features of S3 that are worth noting, and that I configure on almost all my buckets.

S3 Intelligent-Tiering

S3 Intelligent-Tiering is the only cloud storage class that delivers automatic storage cost savings when data access patterns change, without performance impact or operational overhead. The Amazon S3 Intelligent-Tiering storage class is designed to optimize storage costs by automatically moving data to the most cost-effective access tier when access patterns change. For a small monthly object monitoring and automation charge, S3 Intelligent-Tiering monitors access patterns and automatically moves objects that have not been accessed to lower-cost access tiers.

S3 Intelligent-Tiering is the ideal storage class for data with unknown, changing, or unpredictable access patterns, independent of object size or retention period. You can use S3 Intelligent-Tiering as the default storage class for virtually any workload, especially data lakes, data analytics, new applications, and user-generated content.

Sounds great, right? Be aware, however, that for every object moved into S3 Intelligent-Tiering you will incur a small monthly monitoring charge, so it may not be a good fit for every use case, especially if you have a large number of small files.

For additional details please see https://aws.amazon.com/blogs/aws/new-automatic-cost-optimization-for-amazon-s3-via-intelligent-tiering/
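
If you already know that an object's access pattern is unpredictable, you can also send it straight to Intelligent-Tiering at upload time instead of waiting for a lifecycle rule; a small sketch with the AWS CLI, where the bucket and file names are placeholders:

aws s3 cp ./photos-archive.tar.gz s3://my-bucket-name/photos-archive.tar.gz --storage-class INTELLIGENT_TIERING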

Avoiding Hidden Costs from Unfinished Uploads

I, unfortunately, learned this one the hard way. AWS will keep the parts of files left behind by failed multipart uploads... forever. You are not easily able to see these parts, but you will definitely be charged for them.

But what are multipart uploads, you may ask? Surely I can't be using them if I don't even know what they are!

What are multipart uploads?

S3’s multipart upload feature accelerates the uploading of large (>5 MB) objects by letting you split them into logical parts that can be uploaded in parallel. If you upload a large file via the AWS web console, it may well be using multipart upload behind the scenes. It is more efficient, allows for resumable uploads, and if one of the parts fails to upload, only that part is re-uploaded without interrupting the rest of the upload.
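
To make the mechanics more concrete, here is a rough sketch of driving a multipart upload by hand with the low-level s3api sub-commands; the bucket, key, and file names and the <UploadId> below are placeholders, and normally the CLI or an SDK does all of this for you:

# 1. Start the upload; the response contains an UploadId
aws s3api create-multipart-upload --bucket my-bucket-name --key big-file.bin

# 2. Upload each chunk as a numbered part (repeat for part numbers 2, 3, ...)
aws s3api upload-part --bucket my-bucket-name --key big-file.bin \
  --part-number 1 --body big-file.part1 --upload-id <UploadId>

# 3. Ask S3 to assemble the parts; parts.json lists each PartNumber and its ETag
aws s3api complete-multipart-upload --bucket my-bucket-name --key big-file.bin \
  --upload-id <UploadId> --multipart-upload file://parts.json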

The AWS CLI's s3 cp command, as well as other aws s3 commands that upload objects into an S3 bucket (for example, aws s3 sync or aws s3 mv), also automatically performs a multipart upload when uploading large files.
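
The size at which the CLI switches to multipart uploads (8 MB by default) and the part size are both configurable; for example, for the default profile:

# Raise the threshold at which "aws s3" commands switch to multipart uploads
aws configure set default.s3.multipart_threshold 64MB

# Optionally adjust the size of each uploaded part
aws configure set default.s3.multipart_chunksize 16MB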

If you initiate a multipart upload but never finish it, or your upload is interrupted, the multipart upload will not get marked as complete, and the already-uploaded parts will linger in the bucket, occupy storage space, and incur storage charges.

You will not see these in-progress parts using the high-level aws s3 commands or in the web console. You have to use the s3api sub-commands to identify them and clean them up. Depending on the S3 storage class used, the size of the parts, and the number of failed attempts and retries, the cost of these leftover parts could be anywhere from a few dollars to several hundred dollars per month.

Finding Incomplete Multipart Uploads

Step 1:

See if your bucket has any in-progress multipart uploads. In the following example, we are logged into AWS using a profile called saml-pub for our keys and tokens, and we are looking at a bucket called s3-test-bucket:

aws s3api list-multipart-uploads --bucket s3-test-bucket

If the command returns output, there are multipart objects in the bucket. Take note of the Initiated date and time. If that date is recent, someone could be in the middle of uploading a large object to the bucket. But if the date is from several days (or weeks, or months) ago, then you are probably looking at one or more multipart uploads that never completed cleanly.
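
To pull out just those fields for a quick review, you can use the CLI's --query option; a small sketch that assumes the response contains an Uploads list with Key, Initiated, and UploadId fields:

aws s3api list-multipart-uploads --bucket s3-test-bucket \
  --query 'Uploads[].{Key: Key, Initiated: Initiated, UploadId: UploadId}'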

Step 2:

For each UploadId, you can list the parts and get their size. In this example, the first upload ID has 27 parts of 8.3 MB each. The second one (not shown here) has 32 parts of 8.3 MB each. That is roughly 224 MB + 265 MB = 489 MB of unused objects in a bucket that appears to be empty.

Here is the list-parts command for the first upload ID; its lengthy output has been omitted here, as this is just an example:


aws s3api list-parts --bucket s3-test-bucket --key 'tmpfile.1048576.c8pmuT' \
  --upload-id 9.SZ6YhcDeJmLV9KKTnYkk3zHW4jM2cgsBCUezjx5IyriVeZH6GGmzcV4KX4IVtF.6gKZrQG5hJS1Ebe9RILgYtmC_myEsljmT2afcJGkCex6L55lPgfOwYGjR5lWJNdmEkiUIk_e00U9b0oYIYZWhNVxj6SFTjQ3qKoiapM5zw-
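
Once you are confident an upload is stale, it can be aborted by hand, which deletes its parts and stops the storage charges; a sketch using the same bucket and key, with the long upload ID shortened to a placeholder:

aws s3api abort-multipart-upload --bucket s3-test-bucket \
  --key 'tmpfile.1048576.c8pmuT' --upload-id <UploadId>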

Automatic Cleanup of Incomplete Uploads

As a best practice, you can enable this setting even if you are not sure that you are actually making use of multipart uploads. Some applications default to multipart uploads when uploading files above a particular, application-dependent size.

One way to remove old in-progress multipart upload parts is to set up a rule for incomplete multipart uploads.

This can be done in the AWS console by:

  1. On the "Management" tab of your S3 bucket, click "Create lifecycle rule".
  2. Under "Lifecycle rule actions", select "Delete expired delete markers or incomplete multipart uploads".
  3. Enter the number of days after which incomplete multipart uploads should be cleaned up.

As you set up the rule, decide on an appropriate expiration period. For example, if you set it to 1 day and you have a slow, large upload that may not finish within a day, you could disrupt a legitimate multipart upload that is still in progress. Perhaps 7 days is a better value: in-progress multipart uploads that started 7 days ago and have not finished will then get deleted.

You can configure this rule in the CloudFormation or Terraform templates that you use to create your S3 bucket.
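
If the bucket already exists, the same cleanup rule can also be applied directly with the AWS CLI; a minimal sketch with a placeholder bucket name, using the 7-day window discussed above:

aws s3api put-bucket-lifecycle-configuration --bucket my-bucket-name \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "abort-incomplete-multipart-uploads",
        "Status": "Enabled",
        "Filter": { "Prefix": "" },
        "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
      }
    ]
  }'

Note that this call replaces the bucket's existing lifecycle configuration, so include any rules you already have.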

Here is an example CloudFormation template that creates an S3 bucket with encryption, a public access block, and lifecycle rules that clean up incomplete multipart uploads and transition objects to Intelligent-Tiering:

Parameters:
  bucketname:
    Type: String
    Default: my-bucket-name
    Description: Domain Name
Resources:
  s3bucket1F310132:
    Type: AWS::S3::Bucket
    Properties:
      AccessControl: Private
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
      BucketName:
        Ref: bucketname
      LifecycleConfiguration:
        Rules:
          - AbortIncompleteMultipartUpload:
              DaysAfterInitiation: 1
            ExpiredObjectDeleteMarker: true
            Id: multipart
            Status: Enabled
          - Id: IA
            Status: Enabled
            Transitions:
              - StorageClass: INTELLIGENT_TIERING
                TransitionInDays: 0
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
      Tags:
        - Key: Project
          Value: Website Images
        - Key: stackName
          Value: S3CloudfrontStack
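
If you want to try the template above, it can be deployed with the AWS CLI; the template file name and stack name below are placeholders:

aws cloudformation deploy \
  --template-file s3-bucket.yaml \
  --stack-name cost-effective-s3 \
  --parameter-overrides bucketname=my-bucket-name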

Here is the AWS CDK (Python) code that can also be used for this purpose:

# Assumes AWS CDK v2, where Duration and RemovalPolicy live in the top-level aws_cdk module
from aws_cdk import Duration, RemovalPolicy, aws_s3

myBucket = aws_s3.Bucket(
    self,
    's3_bucket',
    bucket_name='my-bucket-name',
    encryption=aws_s3.BucketEncryption.S3_MANAGED,
    access_control=aws_s3.BucketAccessControl.PRIVATE,
    public_read_access=False,
    block_public_access=aws_s3.BlockPublicAccess.BLOCK_ALL,
    # removal_policy=RemovalPolicy.DESTROY,
    removal_policy=RemovalPolicy.RETAIN,
    auto_delete_objects=False,
    lifecycle_rules=[
        # Clean up parts of multipart uploads that never completed
        aws_s3.LifecycleRule(
            id="multipart",
            enabled=True,
            abort_incomplete_multipart_upload_after=Duration.days(1),
            expired_object_delete_marker=True,
        ),
        # Transition objects to Intelligent-Tiering immediately
        aws_s3.LifecycleRule(
            id="IA",
            enabled=True,
            transitions=[
                aws_s3.Transition(
                    storage_class=aws_s3.StorageClass.INTELLIGENT_TIERING,
                    transition_after=Duration.days(0),
                )
            ],
        ),
    ],
)