Amazon Simple Storage Service (Amazon S3) is one of the earliest and most popular AWS services. It is a web service that provides storage for the internet and, as the name suggests, is designed to make web-scale computing simpler for everyone who uses it. Content stored in Amazon S3 can be accessed over standard HTTP.
Amazon S3 is designed for 99.99% availability and 99.999999999% durability of the objects stored in it.
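To put those two percentages in perspective, here is a small back-of-the-envelope sketch. It assumes both figures are annual design targets (the text above does not state the period), and the function names and the object count of 10 million are purely illustrative.

```python
from fractions import Fraction

# Design targets quoted above, kept as exact fractions to avoid rounding noise.
AVAILABILITY = Fraction("0.9999")         # 99.99%
DURABILITY = Fraction("0.99999999999")    # 99.999999999% ("eleven nines")

MINUTES_PER_YEAR = 365 * 24 * 60          # assumption: a 365-day year


def expected_downtime_minutes(availability):
    """Minutes per year the service may be unavailable at this target."""
    return float((1 - availability) * MINUTES_PER_YEAR)


def expected_annual_losses(object_count, durability):
    """Average number of objects lost per year, assuming an annual target."""
    return float(object_count * (1 - durability))


print(expected_downtime_minutes(AVAILABILITY))          # 52.56
print(expected_annual_losses(10_000_000, DURABILITY))   # 0.0001
```

Under these assumptions, 99.99% availability leaves room for a little under an hour of downtime per year, while a store of 10 million objects would on average lose one object roughly every 10,000 years.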
Amazon S3 has many use cases, including but not limited to file storage, backup and archiving, and even static website hosting.
While Amazon S3 can be used for both content storage and distribution, a better way to distribute content is to use another popular AWS service, Amazon CloudFront.
Amazon CloudFront is a content delivery network (CDN) web service from AWS. It speeds up the distribution of static and dynamic web content using a worldwide network of edge locations.
When you create a distribution to deliver your content through CloudFront, your objects are cached at edge locations across the world. When an end user requests this content, the request is routed to the edge location that provides the lowest latency, so the content is delivered with the best possible performance. CloudFront works seamlessly not just with other AWS services but also with custom origin servers.
Use cases for CloudFront include delivering streaming media, software distribution, and even complete websites with static and dynamic content.
Both of these services are clearly popular. As of April 2013, Amazon S3 stored more than 2 trillion objects and handled 1.1 million requests per second, and it is used as the primary backbone for popular services such as Dropbox, Instagram, and Pinterest, among others. Similarly, usage of CloudFront is increasing at a fast pace.
Many customers using these services want to know how and when their content was accessed from S3 or CloudFront.
For example, if, as a company, I am storing and sharing internal documents on Amazon S3, I need to be sure that after setting up the security policies and access control, my data is not being accessed inadvertently by anyone outside my network.
If I am distributing a media file or software download using CloudFront, knowing which country accesses my content the most will help in building a targeted marketing strategy for that country.
Is it possible for AWS, or for customers using these services, to keep track of all the requests that these services receive?
Yes, it is possible. AWS allows users of Amazon S3 and CloudFront to track requests by enabling an optional logging feature. The logs generated by this option are stored in an Amazon S3 bucket that you specify. Users can then retrieve the log files from the bucket for analysis.
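To give a feel for what "analysis" of these files might involve, here is a minimal Python sketch that parses S3 server access log lines and counts requests per client IP address. It assumes the space-delimited S3 access log layout and reads only the leading fields (up to bytes sent), ignoring the rest. The sample lines, bucket name, IDs, and helper names are all made up for illustration, and CloudFront logs use a different, tab-separated layout that this sketch does not cover.

```python
import re
from collections import Counter

# Leading fields of an S3 server access log line (assumed layout).
# Anything after the "bytes sent" field is ignored by this sketch.
LOG_PATTERN = re.compile(
    r'(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] '
    r'(?P<remote_ip>\S+) (?P<requester>\S+) (?P<request_id>\S+) '
    r'(?P<operation>\S+) (?P<key>\S+) '
    r'(?:"(?P<request_uri>[^"]*)"|-) '
    r'(?P<status>\S+) (?P<error_code>\S+) (?P<bytes_sent>\S+)'
)


def parse_line(line):
    """Return the leading fields of one log line as a dict, or None."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def requests_by_ip(lines):
    """Count parsed requests per remote IP, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[record["remote_ip"]] += 1
    return counts


# Hypothetical sample data, not real log output.
SAMPLE_LINES = [
    'ownerid examplebucket [06/Apr/2013:00:00:38 +0000] 192.0.2.3 requesterid '
    'REQ0001 REST.GET.OBJECT docs/report.pdf "GET /docs/report.pdf HTTP/1.1" '
    '200 - 2048 2048 12 10 "-" "curl/7.0" -',
    'ownerid examplebucket [06/Apr/2013:00:01:02 +0000] 198.51.100.7 - '
    'REQ0002 REST.GET.OBJECT docs/report.pdf "GET /docs/report.pdf HTTP/1.1" '
    '403 AccessDenied 243 - 5 - "-" "curl/7.0" -',
    'ownerid examplebucket [06/Apr/2013:00:02:15 +0000] 192.0.2.3 requesterid '
    'REQ0003 REST.GET.OBJECT docs/plan.pdf "GET /docs/plan.pdf HTTP/1.1" '
    '200 - 1024 1024 9 8 "-" "curl/7.0" -',
    'this line is not a log record',
]

if __name__ == "__main__":
    for ip, count in requests_by_ip(SAMPLE_LINES).most_common():
        print(ip, count)
```

A record with a 403 status from an unexpected address, like the second sample line, is the kind of entry the internal-documents scenario above would want to surface.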
The only challenge here is that AWS does not provide any tools to process or analyze this data.
So how do we solve this proble