If you use CloudFront to serve static assets, you can setup CloudFront to send logs to an S3 bucket. Once the logs are in an S3 bucket, you can use Athena to query the logs. This gives you greater visibility into what requests are being made to your CloudFront distributions.
Assuming that I have already setup a
cloudfront_logs table in Athena, I can query for logs from the last week, ordered by date, with a query like this:
SELECT * FROM cloudfront_logs WHERE "date" >= CURRENT_DATE - interval '7' day ORDER BY "date" desc, time desc LIMIT 100;
If I wanted to know how many requests were made to my CloudFront distribution each day last week, I could use a query like this:
SELECT "date", COUNT(*) AS count FROM cloudfront_logs GROUP BY "date" ORDER BY "date" desc LIMIT 100
Maybe I want to know which IP addresses have most frequently made requests to a CloudFront distribution. In that case I could use a query like:
SELECT "request_ip", COUNT(*) AS count FROM cloudfront_logs GROUP BY "request_ip" ORDER BY count desc LIMIT 100
What if we wanted to see how many cache hits we had compared to cache misses. A query like the following would provide that information:
SELECT SUM(CASE WHEN cl.result_type = 'Miss' THEN 1 ELSE 0 END) AS MissCount, SUM(CASE WHEN cl.result_type='Hit' THEN 1 ELSE 0 END) AS HitCount FROM cloudfront_logs cl
These are relatively simple queries, but they do show how powerful Athena queries can be for CloudFront log analysis.