EC2 DevOps Tips
Install the CloudWatch agent on your VMs to collect critical metrics like available memory, which EC2 does not report by default.
Host your app servers in the same availability zone as your database, for lower latency and lower cost (traffic between AZs is charged, even within the same region). This also increases availability: if the app servers and the database are in two different zones, a failure of either zone takes down your service.
Use AWS Compute Optimizer to get recommendations regarding over- and under-utilisation.
Don’t use burstable instances, for multiple reasons [1].
If you’re using an earlier generation instance, consider upgrading for greater performance and/or lower cost.
Enable termination protection to prevent someone from accidentally terminating your VM.
To SSH into an EC2 instance, you need the .pem file for its key pair, and its security group must allow inbound traffic from your IP address on port 22.
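As a rough sketch of the security-group half of this tip, the snippet below builds an inbound rule for port 22 restricted to a single address and applies it with boto3’s `authorize_security_group_ingress`. The helper names, the group ID and the IP address are placeholders invented for illustration, and the second function assumes boto3 is installed and AWS credentials are configured.

```python
import ipaddress


def ssh_ingress_rule(my_ip: str) -> dict:
    """Build an IpPermissions entry allowing SSH (TCP 22) from one IPv4 address."""
    # Validate the address and express it as a /32 CIDR so only this host is allowed.
    cidr = f"{ipaddress.IPv4Address(my_ip)}/32"
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": cidr, "Description": "SSH from my IP"}],
    }


def allow_ssh(group_id: str, my_ip: str) -> None:
    """Add the rule to a security group (assumes boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id, IpPermissions=[ssh_ingress_rule(my_ip)]
    )
```

After something like `allow_ssh("sg-0123456789abcdef0", "203.0.113.5")` with your own values, connecting would look roughly like `ssh -i my-key.pem ec2-user@<public-dns>`. `ec2-user` is the default user on Amazon Linux; other images use different user names.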
Use the Amazon Linux 2 distro, since it’s better integrated with AWS.
If you’re running multiple instances of your app server, use a spread placement group, which places each instance on distinct hardware, to reduce correlated failures.
Pay attention to your EC2 instances’ CPU utilisation in peak hours. If it’s high, like 80%, open the load balancer metrics and check if the latency has spiked beyond your target latency. If so, you’ve found a bottleneck that needs to be addressed to ensure continued scalability. Peak latency is the first metric that increases when a system is coming close to its scalability limits. And you want to address it when time is still on your side.
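The check described above can be sketched as a small function. This is only an illustration: the function name, the 80% default and the sample numbers are assumptions, and in practice the samples would come from CloudWatch (for example the EC2 `CPUUtilization` metric and the load balancer’s response-time metric).

```python
from statistics import mean


def peak_hour_bottleneck(cpu_pct, latency_ms, target_latency_ms, cpu_threshold=80.0):
    """Return True when peak-hour CPU is high AND latency exceeded the target.

    cpu_pct and latency_ms are lists of samples taken during peak hours.
    """
    # Average CPU over the peak window, compared against the "high" threshold.
    cpu_high = mean(cpu_pct) >= cpu_threshold
    # Latency counts as spiked if any sample went beyond the target.
    latency_breached = max(latency_ms) > target_latency_ms
    return cpu_high and latency_breached


# Hypothetical peak-hour samples: CPU averages about 84% and latency peaks at 410 ms.
print(peak_hour_bottleneck([78, 85, 90], [120, 340, 410], target_latency_ms=300))
```

Requiring both conditions is deliberate: high CPU alone may be healthy utilisation, and a latency spike with idle CPU points to a bottleneck somewhere else.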
If your VM is underpowered, find out where the bottleneck is (CPU, memory, or maybe something other than your EC2 instance) and upgrade that. For example, if you have plenty of CPU but insufficient memory, switch to a high-memory instance that has proportionally more memory for the CPU. Conversely, if your CPU is the bottleneck, switch to a high-CPU instance that has proportionally more CPU for the memory [2].
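The decision above can be written down as a minimal sketch. The function name and the 80% threshold are assumptions. The memory-per-vCPU figures are the ratios typical of recent compute-optimised (C), general-purpose (M) and memory-optimised (R) families, so check the figures for the exact instance types you are comparing.

```python
# Approximate GiB of memory per vCPU for common instance families (assumed typical values).
GIB_PER_VCPU = {"compute-optimised (C)": 2, "general-purpose (M)": 4, "memory-optimised (R)": 8}


def suggest_change(cpu_pct: float, mem_pct: float, high: float = 80.0) -> str:
    """Suggest which way to move based on which resource is saturated."""
    if cpu_pct >= high and mem_pct >= high:
        # Both are saturated: the shape is right, the size is too small.
        return "larger size in the same family"
    if mem_pct >= high:
        return "high-memory instance (more memory per vCPU)"
    if cpu_pct >= high:
        return "high-CPU instance (more vCPU per GiB of memory)"
    # Neither is saturated: the bottleneck is probably not this instance.
    return "no change: look for the bottleneck elsewhere"


print(suggest_change(cpu_pct=35, mem_pct=92))
```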
Make sure you’re running supported software, whether the OS image or third-party software. You don’t need to run the latest version, just not one that has reached its end of life.
[1] Reasons not to use burstable instances:
Their performance suddenly drops when you exhaust the CPU credit balance.
Monitoring becomes harder: if you look at a graph showing a CPU utilisation of 10%, that means the CPU is not a bottleneck, right? Not if you’re entitled to only 10% baseline utilisation, in which case 10% means 100%.
You can receive an unpredictable bill at the end of the month.
A non-burstable instance costs less for the same CPU utilisation.
If you didn’t understand all these points, that’s the point — if you don’t use burstable instances, you don’t need to.
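For readers who do want the arithmetic behind the monitoring point above, here is a minimal sketch. It relies on the definition that one CPU credit equals one vCPU running at 100% for one minute. The function names and the example numbers are illustrative assumptions, not figures for any particular instance type.

```python
def baseline_relative_utilisation(observed_pct: float, baseline_pct: float) -> float:
    """Express observed CPU utilisation as a percentage of the baseline entitlement."""
    if baseline_pct <= 0:
        raise ValueError("baseline_pct must be positive")
    return observed_pct / baseline_pct * 100


def hours_until_credits_exhausted(balance, observed_pct, baseline_pct, vcpus):
    """Hours until the credit balance hits zero at steady utilisation, or None if it never does."""
    # Per hour, each vCPU spends observed_pct/100 * 60 credits and earns
    # baseline_pct/100 * 60, so the net drain depends on the difference.
    net_spend_per_hour = (observed_pct - baseline_pct) / 100 * 60 * vcpus
    if net_spend_per_hour <= 0:
        return None  # at or below baseline, the balance does not shrink
    return balance / net_spend_per_hour


# A graph showing 10% on an instance with a 10% baseline is already at its entitlement.
print(baseline_relative_utilisation(10, 10))
# Hypothetical case: 144 credits banked, 2 vCPUs, steady 40% against a 10% baseline.
print(hours_until_credits_exhausted(144, 40, 10, 2))
```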
[2] The alternative is to optimise your software to make full use of the resources given, but that’s harder than optimising the hardware to match software needs, since the cloud provider has already done the work.