
AWS Service Quotas: Discovering Where you Stand

We’ve written a few posts in the last week about AWS Service Quotas.  These are restrictions on services that are set by AWS (but can often be increased).

In our first post, we looked at new Actions in unSkript that can be used to determine quota values and request a quota increase.  In this post, we’ll take the Actions a step further and build an Action that compares the AWS quota to actual usage, generating an alert when a threshold is met.


Getting Started

To begin, we will need to consider how the Action will work.  For any given service, we’ll need to query AWS at least twice:

  1. Get the Quota Limit.
  2. Determine the usage of a service.

Step 1 always takes exactly one call.  Step 2, however, can take many calls to work out the usage.  In the simplest case, one query is enough:

Example: Client VPN Endpoints per Region.  If we query AWS for the list of endpoints in a region, we can simply get the length of the response to know how many endpoints exist.
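A minimal sketch of this single-query case, assuming a boto3 EC2 client is passed in. The function name count_client_vpn_endpoints is hypothetical and not part of unSkript; describe_client_vpn_endpoints and the ClientVpnEndpoints response key are the boto3 names for this operation.

```python
def count_client_vpn_endpoints(ec2_client):
    """Count the Client VPN endpoints visible to the client (hypothetical helper)."""
    paginator = ec2_client.get_paginator("describe_client_vpn_endpoints")
    count = 0
    for page in paginator.paginate():
        # Each page holds a slice of the endpoint list; usage is the total length.
        count += len(page.get("ClientVpnEndpoints", []))
    return count
```

With real credentials this could be called as count_client_vpn_endpoints(boto3.client("ec2", region_name="us-west-2")), where the region is only an example.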

In other cases, multiple queries are needed:

Example: Routes per Client VPN Endpoint. The first query gets the list of VPN endpoints.  We must then query every VPN endpoint to get its count of Routes.  If there are 4 VPN endpoints, a total of 5 calls will be made (one call to get the list of 4 VPN endpoints, and then one call for each of the four endpoints).
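The multi-query pattern can be sketched roughly as follows, again assuming a boto3 EC2 client. The function name count_routes_per_endpoint is hypothetical, and pagination is left out to keep the call count easy to follow.

```python
def count_routes_per_endpoint(ec2_client):
    """Return {endpoint_id: route_count}: one list call plus one call per endpoint."""
    # First call: list every Client VPN endpoint.
    endpoints = ec2_client.describe_client_vpn_endpoints()["ClientVpnEndpoints"]
    usage = {}
    for endpoint in endpoints:
        endpoint_id = endpoint["ClientVpnEndpointId"]
        # One further call per endpoint to fetch its routes.
        response = ec2_client.describe_client_vpn_routes(ClientVpnEndpointId=endpoint_id)
        usage[endpoint_id] = len(response["Routes"])
    return usage
```

Each value in the returned dictionary is then compared against the same per-endpoint quota.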

To account for these two options, we create an input Dictionary.

The Simple, one pass Dictionary

For the Describe AMIs call (only one Usage query is required), the Dict looks like this:

{'QuotaName':'AMIs','ServiceCode':'ec2','QuotaCode': 'L-B665C33B', 'ApiName': 'describe_images', 'ApiFilter' : '[]','ApiParam': 'Images', 'initialQuery': ''},

To get the Quota, we need the ServiceCode and the QuotaCode (If you need to obtain these variables, you can use the unSkript Action, or you can refer to the table in the unSkript Docs).  The one usage API call will be made to the describe_images endpoint, and retrieve a list of Images.  Counting this length gives us our usage.
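As a rough illustration of how such an entry might drive both calls, the sketch below reads the quota with the Service Quotas get_service_quota call and counts the usage with the API named in the entry. The helper names get_quota_value and one_pass_usage are hypothetical, the clients are assumed to be boto3 clients, and pagination is ignored for brevity.

```python
import json

ami_entry = {'QuotaName': 'AMIs', 'ServiceCode': 'ec2', 'QuotaCode': 'L-B665C33B',
             'ApiName': 'describe_images', 'ApiFilter': '[]', 'ApiParam': 'Images',
             'initialQuery': ''}

def get_quota_value(sq_client, entry):
    """Look up the quota limit from the ServiceCode and QuotaCode."""
    response = sq_client.get_service_quota(ServiceCode=entry['ServiceCode'],
                                           QuotaCode=entry['QuotaCode'])
    return response['Quota']['Value']

def one_pass_usage(ec2_client, entry):
    """Call the API named in the entry once and count the returned items."""
    filters = json.loads(entry['ApiFilter'])  # '[]' parses to an empty list
    kwargs = {'Filters': filters} if filters else {}
    response = getattr(ec2_client, entry['ApiName'])(**kwargs)
    return len(response[entry['ApiParam']])
```

Dividing one_pass_usage by get_quota_value gives the utilization for the entry.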

The Two Pass Dictionary


To determine the Attachments per transit gateway, we must again get the quota from the Service Code and Quota Code.  To get the count of attachments per transit gateway, we use the initialQuery array to make a first query.

The first query probes the describe_transit_gateways endpoint to get a list of TransitGateways.  In the second set of calls, we call the describe_transit_gateway_attachments endpoint once for each transit gateway. The filter contains the string VARIABLE, which is replaced with the TransitGatewayId of each gateway, ensuring that each call targets a different transit gateway.  We can then count the length of each response to find out how many attachments are in each transit gateway.  If we have 12 transit gateways, we will have 12 usage reports.

{'QuotaName':'Attachments per transit gateway','ServiceCode':'ec2','QuotaCode': 'L-E0233F82', 'ApiName': 'describe_transit_gateway_attachments', 'ApiFilter' : '[{"Name": "transit-gateway-id","Values": ["VARIABLE"]}]', 'ApiParam': 'TransitGatewayAttachments', 'initialQuery': '["describe_transit_gateways","TransitGateways", "TransitGatewayId"]'},
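The VARIABLE substitution might look something like the sketch below. The helper names build_filter and parse_initial_query are hypothetical, and the gateway ID is made up; only the dictionary entry itself comes from the Action.

```python
import json

tgw_entry = {'QuotaName': 'Attachments per transit gateway', 'ServiceCode': 'ec2',
             'QuotaCode': 'L-E0233F82', 'ApiName': 'describe_transit_gateway_attachments',
             'ApiFilter': '[{"Name": "transit-gateway-id","Values": ["VARIABLE"]}]',
             'ApiParam': 'TransitGatewayAttachments',
             'initialQuery': '["describe_transit_gateways","TransitGateways", "TransitGatewayId"]'}

def parse_initial_query(entry):
    """Split initialQuery into (api_name, response_key, id_field); None if it is empty."""
    raw = entry['initialQuery']
    return tuple(json.loads(raw)) if raw else None

def build_filter(entry, resource_id):
    """Swap the VARIABLE placeholder for a real ID, then parse the filter JSON."""
    return json.loads(entry['ApiFilter'].replace('VARIABLE', resource_id))

# Example with a made-up transit gateway ID.
example_filter = build_filter(tgw_entry, 'tgw-0123456789abcdef0')
```

The resulting list can be passed as the Filters argument of the second-pass call for that gateway.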


For most of our quota measurements, these two approaches work well.  However, with over 2600 different quotas inside AWS, not all of them fit neatly into these two buckets. For example, Multicast Network Interfaces per transit gateway requires 3 calls: Transit gateways -> Multicast Domains -> Domain attachments.

Other quotas need custom code to count their usage.  Each of these requires an extra if statement in the Action to properly account for its usage.

Action Format

We can differentiate between the two types of query by looking at the ‘initialQuery’ parameter. If it is empty, we can do the Simple query, otherwise, do the double query (with a for loop that queries each initial result).  For outliers, we can add specific code inside the if/else:

(this is simplified a bit from what actually runs):

for i in table:
    # get quota
    sq = sqClient.get_service_quota(ServiceCode=i.get('ServiceCode'), QuotaCode=i.get('QuotaCode'))
    quotaValue = sq['Quota']['Value']

    # get usage
    if i.get('initialQuery') == '':
        # simple case: a single usage query
        res = aws_get_paginator(ec2Client, i.get('ApiName'), i.get('ApiParam'), Filters=filterList)
        count = len(res)
        percentage = count/quotaValue
        combinedData = {'Quota Name': i.get('QuotaName'), 'Limit': quotaValue, 'used': count, 'percentage': percentage}
        result.append(combinedData)
        print(combinedData)
    else:
        # two-pass case: list the parent objects first
        # (the full Action takes this first API name from initialQuery)
        res = aws_get_paginator(ec2Client, i.get('ApiName'), i.get('ApiParam'), Filters=filterList)
        for j in res:
            # build the filter query with some simple substitutions
            res2 = aws_get_paginator(ec2Client, i.get('ApiName'), i.get('ApiParam'), Filters=filterList)
            count = len(res2)
            percentage = count/quotaValue
            objectResult = {j[initialQueryFilter]: count}

            quotaName = f"{i.get('QuotaName')} for {j[initialQueryFilter]}"
            combinedData = {'Quota Name': quotaName, 'Limit': quotaValue, 'used': count, 'percentage': percentage}
            result.append(combinedData)


Action Output

Once all of the values have been collected, the percentage utilized is compared to the warning percentage input. If the utilization reaches the requested percentage, the Service data is added to the output of the Action. With this information, the SRE responsible can decide the correct action to take: either prune away some usage, or request an increase from AWS.

For example, testing all VPC Service quotas with a warning threshold of 50% utilization gives the following data:

{'Instances': [{'Limit': 20.0,
                'Quota Name': 'VPCs Per Region',
                'percentage': 0.65,
                'used': 13},
               {'Limit': 20.0,
                'Quota Name': 'Internet gateways per Region',
                'percentage': 0.6,
                'used': 12},
               {'Limit': 5.0,
                'Quota Name': 'NAT gateways per Availability Zone',
                'percentage': 0.8,
                'used': 4},
               {'Limit': 50.0,
                'Quota Name': 'Routes per route table',
                'percentage': 0.5,
                'used': 25},
               {'Limit': 20.0,
                'Quota Name': 'Rules per network ACL',
                'percentage': 0.65,
                'used': 13}]}
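The threshold comparison could be sketched as below. The function name over_threshold is hypothetical, and the use of >= is an assumption based on the sample output above, which includes an entry at exactly 0.5 for a 50% warning level.

```python
def over_threshold(results, warning_percentage):
    """Keep the entries whose utilization is at or above the warning level (in percent)."""
    threshold = warning_percentage / 100
    return [entry for entry in results if entry['percentage'] >= threshold]

# Illustrative input: two entries mirror the sample output, one is made up to fall below.
sample = [
    {'Quota Name': 'VPCs Per Region', 'Limit': 20.0, 'used': 13, 'percentage': 0.65},
    {'Quota Name': 'Routes per route table', 'Limit': 50.0, 'used': 25, 'percentage': 0.5},
    {'Quota Name': 'Hypothetical low-usage quota', 'Limit': 20.0, 'used': 2, 'percentage': 0.1},
]
warnings = over_threshold(sample, 50)
```

Here warnings keeps the first two entries and drops the low-usage one.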

Availability Today

As we publish this article, we have 2 Actions heading into unSkript:

  1. A general AWS_ServiceQuota Compare Action that has the basic framework described above. This will likely require customization for each Quota you wish to test against.
  2. AWS VPC Service Quota Warning. This Action takes all of the VPC service quotas (as of February 2023) and tests them against your infrastructure.

Coming Soon:

  1. AWS EC2 Service Quota Warning. This Action will test your infrastructure against all EC2 Service Quotas, and warn you if you are approaching the quota threshold.

We’re really excited to see how people use these Service Quota alerts in their infrastructure.  If you have questions – feel free to reach out in our Slack Community.  If you haven’t tried unSkript – try our OSS Docker Container, or use our free trial online!
