<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://onecloudplease.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://onecloudplease.com/" rel="alternate" type="text/html" /><updated>2026-03-13T09:47:14+00:00</updated><id>https://onecloudplease.com/feed.xml</id><title type="html">One Cloud Please</title><subtitle>The ramblings of Ian Mckay, a DevOps dude from Australia</subtitle><author><name>Ian Mckay</name></author><entry><title type="html">Bucketsquatting is (Finally) Dead</title><link href="https://onecloudplease.com/blog/bucketsquatting-is-finally-dead" rel="alternate" type="text/html" title="Bucketsquatting is (Finally) Dead" /><published>2026-03-13T00:00:00+00:00</published><updated>2026-03-13T00:00:00+00:00</updated><id>https://onecloudplease.com/blog/bucketsquatting-is-finally-dead</id><content type="html" xml:base="https://onecloudplease.com/blog/bucketsquatting-is-finally-dead"><![CDATA[<p><img src="/images/posts/bucket.jpg" alt="" /></p>

<p>For a decade, I have been working with AWS and third-party security teams to resolve bucketsquatting / bucketsniping issues in AWS S3. Finally, I am happy to say AWS now has a solution to the problem, and it changes the way you should name your buckets.</p>

<h2 id="what-is-bucketsquatting">What is Bucketsquatting?</h2>

<p>Bucketsquatting (sometimes called bucketsniping) is an issue I first wrote about in 2019, and it has been a recurring issue in AWS S3 ever since. If you’re interested in the specifics of the problem, I recommend you check out my original post on the topic: <a href="https://onecloudplease.com/blog/s3-bucket-namesquatting">S3 Bucket Namesquatting - Abusing predictable S3 bucket names</a>. In short, the problem is that S3 bucket names are globally unique, and if the owner of a bucket deletes it, that name becomes available for anyone else to register. An attacker can then register a bucket with the same name as a previously deleted bucket and potentially gain access to sensitive data or disrupt services that still rely on that bucket.</p>

<p>Additionally, it is a common practice for organizations to use predictable naming conventions for their buckets, such as appending the AWS region name to the end of the bucket name (e.g. <code class="language-plaintext highlighter-rouge">myapp-us-east-1</code>), which can make it easier for attackers to guess and register buckets that may have been previously used. This latter practice is one that AWS’ internal teams commonly fall victim to, and it is one that I have been working with the AWS Security Outreach team to address for almost a decade now across dozens of individual communications.</p>

<h2 id="a-new-namespace">A new namespace</h2>

<p>To address this issue, AWS has introduced a new protection that works effectively as a “namespace” for S3 buckets. The namespace syntax is as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;yourprefix&gt;-&lt;accountid&gt;-&lt;region&gt;-an
</code></pre></div></div>

<p>For example, if your account ID is <code class="language-plaintext highlighter-rouge">123456789012</code>, your prefix is <code class="language-plaintext highlighter-rouge">myapp</code>, and you want to create a bucket in the <code class="language-plaintext highlighter-rouge">us-west-2</code> region, you would name your bucket as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>myapp-123456789012-us-west-2-an
</code></pre></div></div>

<p>Though not explicitly mentioned, the <code class="language-plaintext highlighter-rouge">-an</code> here refers to the “account namespace”. This new syntax ensures that only the account that owns the namespace can create buckets with that name, effectively preventing bucketsquatting attacks. If another account tries to create a bucket with the same name, they will receive an <code class="language-plaintext highlighter-rouge">InvalidBucketNamespace</code> error message indicating that the bucket name is already in use. Account owners will also receive an <code class="language-plaintext highlighter-rouge">InvalidBucketNamespace</code> error if they try to create a bucket where the bucket region does not match the region specified in the bucket name.</p>
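<p>As a rough sketch of the naming pattern above, the following Python builds a namespaced bucket name and checks whether an existing name follows the pattern. The function names are hypothetical and are not part of any AWS SDK, and the 63 character check assumes the usual general-purpose bucket name limit also applies to namespaced names.</p>

```python
import re

# Suffix taken from the syntax shown above; helper names are hypothetical.
NAMESPACE_SUFFIX = "-an"


def namespaced_bucket_name(prefix: str, account_id: str, region: str) -> str:
    """Build a bucket name of the form prefix-accountid-region-an."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account ID must be exactly 12 digits")
    name = f"{prefix}-{account_id}-{region}{NAMESPACE_SUFFIX}"
    # Assumption: the usual 63 character bucket name limit still applies.
    if len(name) > 63:
        raise ValueError("bucket name exceeds 63 characters")
    return name


def uses_account_namespace(name: str, account_id: str, region: str) -> bool:
    """Check that a name ends with -accountid-region-an for this account and region."""
    return name.endswith(f"-{account_id}-{region}{NAMESPACE_SUFFIX}")


print(namespaced_bucket_name("myapp", "123456789012", "us-west-2"))
# prints: myapp-123456789012-us-west-2-an
```

<p>A check like <code class="language-plaintext highlighter-rouge">uses_account_namespace</code> could be run over template output or an inventory of bucket names to find those that still use a bare region suffix.</p>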

<p>Interestingly, the <a href="https://aws.amazon.com/blogs/aws/introducing-account-regional-namespaces-for-amazon-s3-general-purpose-buckets/">guidance</a> from AWS is that this namespace is <b><u>recommended to be used by default</u></b>. Namespaces <a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucketnamingrules.html#general-purpose-bucket-names">aren’t new</a> to S3, with suffixes like <code class="language-plaintext highlighter-rouge">.mrap</code>, <code class="language-plaintext highlighter-rouge">--x-s3</code>, and <code class="language-plaintext highlighter-rouge">-s3alias</code> all being examples of existing namespaces that AWS previously used for new features; however, this is the first time AWS has introduced a namespace that is recommended for general use by customers to protect against a specific security issue.</p>

<p>It is AWS’ stance that all buckets should use this namespace pattern, unless you have a compelling reason not to (hint: there aren’t many). To this end, AWS is allowing security administrators to require the use of this namespace through a new condition key <code class="language-plaintext highlighter-rouge">s3:x-amz-bucket-namespace</code>, which can be applied within an Organization’s SCPs to enforce the use of this protection across an organization.</p>

<p>This doesn’t retroactively protect any existing buckets (or published templates that use a region prefix/suffix pattern without the namespace), but it does provide a strong protection for new buckets going forward (okay, so it’s <em>dying</em>, not dead). If you wish to protect your existing buckets, you’ll need to create new buckets with the namespace pattern and migrate your data to those buckets.</p>

<h2 id="what-about-the-other-cloud-providers">What about the other cloud providers?</h2>

<p>While AWS has introduced this new namespace protection for S3 buckets, the other major cloud providers handle things slightly differently.</p>

<p>Google Cloud Storage already has a namespace concept in place for its buckets, which is based on <a href="https://docs.cloud.google.com/storage/docs/domain-name-verification">domain name verification</a>. This means that only the owner of a domain can create buckets with names that are of a domain name format (e.g. <code class="language-plaintext highlighter-rouge">myapp.com</code>), and they must verify ownership of the domain before they can create buckets with that name. Bucketsquatting is still possible with non-domain name formatted buckets, but the use of domain name formatted buckets is Google’s solution to the issue.</p>

<p>For Azure Blob Storage, <a href="https://learn.microsoft.com/en-us/rest/api/storageservices/naming-and-referencing-containers--blobs--and-metadata#resource-uri-syntax">storage accounts</a> are scoped with a configurable account name and container name, so the same issue does apply. This is further exacerbated by the fact that Azure’s storage account names have a maximum of 24 characters, leaving a fairly small namespace for organizations to work with. <em>(h/t <a href="https://news.ycombinator.com/user?id=vhab">vhab</a> for pointing this out)</em></p>

<h2 id="tldr">tl;dr</h2>

<p>There is a new namespace for S3 buckets. The namespace protects you from bucketsquatting attacks, and you should use it for any S3 buckets you create.</p>

<p>If you liked what I’ve written, or want to hear more on this topic, reach out to me on <a href="https://www.linkedin.com/in/iann0036/">LinkedIn</a> or <a href="https://twitter.com/iann0036">𝕏</a>.</p>]]></content><author><name>Ian Mckay</name></author><summary type="html"><![CDATA[For a decade, I have been working with AWS and third-party security teams to resolve bucketsquatting / bucketsniping issues in AWS S3. Finally, I am happy to say AWS now has a solution to the problem, and it changes the way you should name your buckets.]]></summary></entry><entry><title type="html">MistakenVMtity: Another cloud image confusion attack</title><link href="https://onecloudplease.com/blog/mistakenvmtity-another-cloud-image-confusion-attack" rel="alternate" type="text/html" title="MistakenVMtity: Another cloud image confusion attack" /><published>2025-03-10T00:00:00+00:00</published><updated>2025-03-10T00:00:00+00:00</updated><id>https://onecloudplease.com/blog/mistakenvmtity-another-cloud-image-confusion-attack</id><content type="html" xml:base="https://onecloudplease.com/blog/mistakenvmtity-another-cloud-image-confusion-attack"><![CDATA[<p><img src="/images/posts/cloud-confusion.jpg" alt="" /></p>

<p>Last month, Seth Art from Datadog Security Labs published an excellent post on AWS cloud image confusion attacks. In this post, I’ll explain how Azure has a similar issue with its CLI.</p>

<p>If you haven’t seen <a href="https://securitylabs.datadoghq.com/articles/whoami-a-cloud-image-name-confusion-attack/">the Datadog Security Labs post</a>, I highly recommend you check it out. It’s a great read and provides a lot of context for the issue I’ll be discussing here. They do have the better title pun though.</p>

<h2 id="image-confusion-attacks">Image confusion attacks</h2>

<p>When provisioning virtual machines within the cloud, users typically specify an image to use as the base for the VM. This image is often referred to by a name or ID. In the case of AWS, the image is referred to as an Amazon Machine Image (AMI) and is identified by an AMI ID. In Azure, the image is referred to as a Virtual Machine Image and is identified by a URN composed of a publisher name, an offer name, a SKU, and a version, joined by colons (e.g. <code class="language-plaintext highlighter-rouge">Canonical:ubuntu-24_04-lts:server:24.04.202502210</code>).</p>
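<p>To make the URN structure concrete, here is a minimal Python sketch that splits a URN into its four fields. The <code class="language-plaintext highlighter-rouge">ImageUrn</code> type and <code class="language-plaintext highlighter-rouge">parse_urn</code> function are hypothetical names for illustration and are not part of the Azure CLI or SDK.</p>

```python
from typing import NamedTuple


class ImageUrn(NamedTuple):
    """The four colon-separated fields of an Azure VM image URN."""
    publisher: str
    offer: str
    sku: str
    version: str


def parse_urn(urn: str) -> ImageUrn:
    """Split publisher:offer:sku:version into named fields."""
    parts = urn.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected publisher:offer:sku:version, got {urn!r}")
    return ImageUrn(*parts)


urn = parse_urn("Canonical:ubuntu-24_04-lts:server:24.04.202502210")
print(urn.publisher, urn.offer, urn.sku, urn.version)
```

<p>Having the publisher as its own field matters for the rest of this post, as it is the only part of the URN an attacker cannot freely reuse.</p>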

<p>An image confusion attack occurs when an attacker is able to create an image with a name that matches the search or filter criteria that a user is using to select their intended image. This can lead to the attacker’s image being selected instead of the legitimate image. An attacker will generally create an image that acts just like the legitimate image, but with some additional functionality that can be used to compromise the user’s environment with remote code execution, data exfiltration, or other malicious activities. In the AWS example, this was done using the AWS CLI command <code class="language-plaintext highlighter-rouge">aws ec2 describe-images</code> and Terraform data providers which performed a search for images based on the name or partial name of the image, which could include the attacker’s image.</p>

<h2 id="the-github-example">The GitHub example</h2>

<p>In 2023, I was looking at how GitHub advised deploying its GitHub Enterprise Server offering on Azure. The <a href="https://web.archive.org/web/20230521182616/https://docs.github.com/en/enterprise-server@3.8/admin/installation/setting-up-a-github-enterprise-server-instance/installing-github-enterprise-server-on-azure#creating-the-github-enterprise-server-virtual-machine">documentation at the time</a> advised using the Azure CLI to determine the latest version of the GitHub Enterprise Server image as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ az vm image list --all -f GitHub-Enterprise | grep '"urn":' | sort -V
</code></pre></div></div>

<p>This command would list all the images available in Azure that had an offer name of “GitHub-Enterprise” and then sort them by version number. The user could then select the latest version of the image to use for their deployment. Notably, the command did not filter by publisher name or SKU, only by offer name. This meant that an attacker could create an image with the offer name “GitHub-Enterprise” under their separate publisher identifier and have it appear in the list of images returned by the command. Publisher identifiers are unique in Azure, but offer names, SKUs and versions are not.</p>

<p>In Azure, to register an image which has a public URN, you list your offering on the Azure Marketplace via the Azure Partner Center. After some KYC checks, you can register any arbitrary publisher identifier. In my case, I registered “ghes” for GitHub Enterprise Server.</p>

<p><img src="/images/posts/partner-center-1.png" alt="" /></p>

<p>I then created an offer with the version number of “99.99.99” to ensure my image would appear as the latest image in the list.</p>

<p><img src="/images/posts/partner-center-2.png" alt="" /></p>

<p>I also selected the option to hide the plan from the Azure Marketplace UI, which would prevent users from more clearly identifying the difference.</p>

<p><img src="/images/posts/partner-center-3.png" alt="" /></p>

<p>This specific offer was not fully published to the Azure Marketplace to avoid direct customer impact to GitHub customers and was instead reported to GitHub. Though GitHub stated that these findings “do not present a significant security risk”, they have since updated their documentation to use a specific filter for the GitHub Enterprise Server image, as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>az vm image list --all -f GitHub-Enterprise | grep '"urn": "GitHub:' | sort -V
</code></pre></div></div>

<p>This change specifically filters the images by the publisher name “GitHub” and the offer name “GitHub-Enterprise”. If you are a provider looking to avoid this issue, I would recommend you follow this pattern in your documentation, or alternatively provide a full list of URNs for your users to select from.</p>
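<p>The difference between the two commands can be sketched in Python. This is only an illustration of the selection logic, not how the Azure CLI works internally, and the URNs in the list (including the version numbers) are made-up examples. It assumes versions are dot-separated integers.</p>

```python
def version_key(urn: str) -> tuple:
    """Sort key: the numeric components of the URN's version field."""
    return tuple(int(part) for part in urn.split(":")[3].split("."))


def latest_from_publisher(urns: list[str], publisher: str, offer: str) -> str:
    """Return the newest URN whose publisher and offer both match exactly."""
    matches = [
        u for u in urns
        if u.split(":")[0] == publisher and u.split(":")[1] == offer
    ]
    if not matches:
        raise LookupError("no image from the expected publisher")
    return max(matches, key=version_key)


# Illustrative data only; the last entry mimics an attacker-registered publisher.
urns = [
    "GitHub:GitHub-Enterprise:GitHub-Enterprise:3.8.0",
    "GitHub:GitHub-Enterprise:GitHub-Enterprise:3.8.3",
    "ghes:GitHub-Enterprise:GitHub-Enterprise:99.99.99",
]

# Selecting on offer name alone picks the attacker's entry.
print(max(urns, key=version_key))
# prints: ghes:GitHub-Enterprise:GitHub-Enterprise:99.99.99

# Pinning the exact publisher picks the intended image.
print(latest_from_publisher(urns, "GitHub", "GitHub-Enterprise"))
# prints: GitHub:GitHub-Enterprise:GitHub-Enterprise:3.8.3
```

<p>Raising an error when nothing matches is a deliberate choice here: failing loudly is safer than silently falling back to a broader search.</p>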

<h2 id="an-extra-step-needed">An extra step needed</h2>

<p>In my testing of Marketplace publication, I found that when executing a deployment of my free marketplace VM image using <code class="language-plaintext highlighter-rouge">az vm create</code>, Azure would initially reject my request to deploy the image. This was because the terms of the Marketplace image were not yet “accepted”.</p>

<p><img src="/images/posts/azurevmreject.png" alt="" /></p>

<p>The user would be required to execute <code class="language-plaintext highlighter-rouge">az vm image accept-terms</code> or <code class="language-plaintext highlighter-rouge">az vm image terms accept</code> to accept the terms of the image before the deployment could proceed. I found this to be initially confusing as images like the base Ubuntu image or the GitHub Enterprise Server image did not require this step. After some investigation and a support ticket, Microsoft confirmed this was an undocumented trait of certain images in the Azure Marketplace. Microsoft stated:</p>

<blockquote style="max-width: 70%;">
  <p style="font-size: 25px;">The GitHub Enterprise Server offering <i>[sic]</i> is a 1PP product (Core Virtual Machine) and not an Azure Virtual Machine(3PP) which are created by 3PP Publishers in-fact Marketplace Partners. Not all the partners in marketplace are allowed to create the 1PP offer and only few approved Marketplace Partners are allowed to create 1PP VM offers. And in the 1PP marketplace offers will be auto accepted the terms and conditions.</p>
</blockquote>

<p>This limits the attack surface of this image confusion attack on Azure, as users would need to accept the terms of the image before deploying it. However, many Marketplace images do require terms to be accepted before deployment, so the extra step is not unusual in itself.</p>

<h2 id="the-partial-search-issue">The partial search issue</h2>

<p>Those of you with keen eyes will notice that the updated image search command for the GitHub example uses grep to filter the publisher of the image and not the <code class="language-plaintext highlighter-rouge">--publisher -p</code> argument that exists for the <a href="https://learn.microsoft.com/en-us/cli/azure/vm/image?view=azure-cli-latest#az-vm-image-list">az vm image list</a> command. In fact, the <code class="language-plaintext highlighter-rouge">--publisher</code> flag is what many publishers such as <a href="https://my.f5.com/s/article/K000141027">F5</a>, <a href="https://wiki.almalinux.org/cloud/Azure.html#azure-cli">AlmaLinux</a> and even at one point <a href="https://discourse.ubuntu.com/t/find-ubuntu-images-on-microsoft-azure/18918">Canonical</a> advise their users to use when finding the latest images for their offerings.</p>

<p>Using only the CLI-provided flags, however, still leaves the results susceptible to the above attack, as the <code class="language-plaintext highlighter-rouge">--publisher</code> flag, as well as the <code class="language-plaintext highlighter-rouge">--offer</code> and <code class="language-plaintext highlighter-rouge">--sku</code> flags, are wildcarded by default. This means that if you were to register a publisher with a name that starts with the intended target publisher name, you could still have your image appear in the list of images returned by the command.</p>

<p><img src="/images/posts/azurecli-partial.png" alt="" /></p>

<p>This is the reason why the updated GitHub command uses grep to filter the publisher name.</p>
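<p>The following Python sketch simulates the behaviour described above by modelling the wildcarded flag as a simple prefix match. That model is my assumption for illustration, not the CLI’s actual implementation, and the <code class="language-plaintext highlighter-rouge">GitHubImages</code> publisher is a made-up lookalike.</p>

```python
def wildcard_publisher_filter(urns: list[str], publisher: str) -> list[str]:
    """Simulated wildcard behaviour: keep URNs whose publisher starts with the value."""
    return [u for u in urns if u.split(":")[0].startswith(publisher)]


def exact_publisher_filter(urns: list[str], publisher: str) -> list[str]:
    """Keep only URNs whose publisher field equals the value exactly."""
    return [u for u in urns if u.split(":")[0] == publisher]


# Illustrative data only; "GitHubImages" is a hypothetical lookalike publisher.
urns = [
    "GitHub:GitHub-Enterprise:GitHub-Enterprise:3.8.3",
    "GitHubImages:GitHub-Enterprise:GitHub-Enterprise:99.99.99",
]

print(len(wildcard_publisher_filter(urns, "GitHub")))  # prints: 2
print(len(exact_publisher_filter(urns, "GitHub")))     # prints: 1
```

<p>The grep on <code class="language-plaintext highlighter-rouge">"urn": "GitHub:</code> behaves like the exact filter because the trailing colon anchors the end of the publisher field.</p>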

<p>The partial search seems to be an issue only with the <code class="language-plaintext highlighter-rouge">az vm image list</code> command. Other commands such as <code class="language-plaintext highlighter-rouge">az vm create</code> or <code class="language-plaintext highlighter-rouge">az vm image accept-terms</code> do not have this issue and instead seem to directly concatenate the provided publisher, offer and SKU to form the URN. The same seems to be the case for most Terraform plans as the term <code class="language-plaintext highlighter-rouge">latest</code> can be used in lieu of a version number to deploy the latest image, negating the need for a search data provider.</p>

<h2 id="working-as-intended">Working as intended?</h2>

<p>Similar to the official response from AWS, I believe most providers will consider this to be working as intended. The Azure CLI is a tool designed for administrators and developers who are expected to have a certain level of knowledge about the resources they are working with, so the burden of ensuring the publisher is correct would generally fall on the user.</p>

<p>However, as we have seen with the GitHub example, this can lead to confusion and potential security risks. Azure removing the partial wildcard nature within the <code class="language-plaintext highlighter-rouge">az vm image list</code> command would mitigate this risk but this would likely be too much of a breaking change to be considered by the Azure team.</p>

<p>If you liked what I’ve written, or want to hear more on this topic, reach out to me on 𝕏 at <a href="https://twitter.com/iann0036">@iann0036</a>.</p>]]></content><author><name>Ian Mckay</name></author><summary type="html"><![CDATA[Last month, Seth Art from Datadog Security Labs published an excellent post on AWS cloud image confusion attacks. In this post, I'll explain how Azure has a similar issue with its CLI.]]></summary></entry><entry><title type="html">Resource Control Policies: Closing the data perimeter gap</title><link href="https://onecloudplease.com/blog/resource-control-policies-closing-the-data-perimeter-gap" rel="alternate" type="text/html" title="Resource Control Policies: Closing the data perimeter gap" /><published>2024-11-17T00:00:00+00:00</published><updated>2024-11-17T00:00:00+00:00</updated><id>https://onecloudplease.com/blog/resource-control-policies-closing-the-data-perimeter-gap</id><content type="html" xml:base="https://onecloudplease.com/blog/resource-control-policies-closing-the-data-perimeter-gap"><![CDATA[<p><img src="/images/posts/noexit.jpg" alt="" /></p>

<p>It’s pre:Invent season, and one of the most consequential identity and access management features was just released by the identity team at AWS. Resource Control Policies, a strong tool for establishing data perimeters, is now available for organization administrators.</p>

<p>This post explores this new feature, how it helps, what its limits are, and what we might see in the future.</p>

<h2 id="intro-to-rcps">Intro to RCPs</h2>

<p><a href="https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_rcps.html">Resource Control Policies</a>, or RCPs, are a feature of AWS Organizations that allows you to control the maximum permissions allowable to certain resources or resource types for accounts within your organization.</p>

<p>Like <a href="https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_scps.html">Service Control Policies</a> (SCPs), RCPs are permission policies which represent a <em>boundary</em> of maximum permissions that can be applied within an account. This means that RCPs are policies which cannot grant authority for a certain action and can only deny actions from taking place. This makes it a tool that is likely to be used by organizational administrators who wish to establish strong controls for a data perimeter around sensitive resources within their organization.</p>

<p>To put it in other words, whilst an SCP statement could be described as:</p>

<blockquote style="max-width: 70%;">
  <p style="font-size: 25px;">despite what the policy on the <strong>identity</strong> says, the following action is not permitted</p>
</blockquote>

<p>An RCP statement could similarly be described as:</p>

<blockquote style="max-width: 70%;">
  <p style="font-size: 25px;">despite what the policy on the <strong>resource</strong> says, the following action is not permitted</p>
</blockquote>

<h2 id="building-an-effective-perimeter">Building an effective perimeter</h2>

<p>In order to build an effective data perimeter, administrators need to enforce the use of trusted identities, expected networks, and known resources. RCPs help enforce, across the whole organization, that resources can only be accessed by trusted identities, and only via expected networks. The data perimeter adds an additional coarse-grained layer of protection to the existing practices of fine-grained protections, applied via least privilege role-based access control, network firewalls and resource policies.</p>

<p><img src="/images/posts/DataPerimetersTable.png" alt="" /></p>

<h3 id="trusted-identity-enforcement">Trusted identity enforcement</h3>

<p>Let’s take a look at how to apply an RCP to ensure only identities within your organization may access the sensitive resources or data that lies within your accounts. The following policy can be used to ensure that sensitive material from S3, SQS, KMS and Secrets Manager cannot be accessed by identities outside of your organization:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "NoAccessOutsideOrg",
            "Effect": "Deny",
            "Principal": "*",
            "Action": [
                "s3:*",
                "sqs:*",
                "kms:*",
                "secretsmanager:*",
                "sts:*"
            ],
            "Resource": "*",
            "Condition": {
                "StringNotEqualsIfExists": {
                    "aws:PrincipalOrgID": "&lt;YOURORGID&gt;"
                },
                "BoolIfExists": {
                    "aws:PrincipalIsAWSService": "false"
                }
            }
        }
    ]
}
</code></pre></div></div>

<p>The effect of the policy is that any API call to these services must originate from an identity within your organization, or be on behalf of an AWS service. Additionally, outside principals cannot use STS to assume an identity within the organization to bypass the block. If a user within the organization attempts to, for example, allow <code class="language-plaintext highlighter-rouge">s3:GetObject</code> to an external account via an S3 Bucket Policy, the external account would still be forbidden from accessing objects within the bucket as the RCP will override the allow with its explicit deny.</p>
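<p>To reason about when this statement applies, it can help to model its condition block in Python. The sketch below is a simplified model of IAM condition evaluation and not AWS’ implementation: it assumes only that all condition operators in a statement must match for the deny to apply, and that an <code class="language-plaintext highlighter-rouge">IfExists</code> operator matches when its key is absent from the request. The organization ID is a placeholder, and <code class="language-plaintext highlighter-rouge">None</code> stands for a key that is absent.</p>

```python
from typing import Optional

ORG_ID = "o-exampleorgid"  # placeholder for your organization ID


def denied_by_org_rcp(
    principal_org_id: Optional[str],
    principal_is_aws_service: Optional[bool],
) -> bool:
    """Return True when the NoAccessOutsideOrg deny statement would apply."""
    # StringNotEqualsIfExists: matches if the key is absent or the value differs.
    outside_org = principal_org_id is None or principal_org_id != ORG_ID
    # BoolIfExists "false": matches if the key is absent or the value is false.
    not_a_service = principal_is_aws_service is None or principal_is_aws_service is False
    # All condition operators must match for the deny to take effect.
    return outside_org and not_a_service


print(denied_by_org_rcp("o-exampleorgid", False))  # prints: False
print(denied_by_org_rcp("o-someoneelse", False))   # prints: True
print(denied_by_org_rcp(None, True))               # prints: False
```

<p>The last case is the carve out for AWS services: a service principal carries no organization ID, but is still let through.</p>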

<p>Those with a keen sense of potential exploits may see the carve out for AWS services and recall the <a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-deputy.html">confused deputy problem</a> as a potential risk. Thankfully, RCPs also have an answer to this in the form of enforceable confused deputy protections. We can add the following statement to our RCP to guard against this potential:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "EnforceConfusedDeputyProtection",
            "Effect": "Deny",
            "Principal": "*",
            "Action": [
                "s3:*",
                "sqs:*",
                "kms:*",
                "secretsmanager:*",
                "sts:*"
            ],
            "Resource": "*",
            "Condition": {
                "StringNotEqualsIfExists": {
                    "aws:SourceOrgID": "&lt;YOURORGID&gt;"
                },
                "Null": {
                    "aws:SourceAccount": "false"
                },
                "Bool": {
                    "aws:PrincipalIsAWSService": "true"
                }
            }
        }
    ]
}
</code></pre></div></div>

<p>The above statement applies specifically when the calling principal is an AWS service, and enforces that the <code class="language-plaintext highlighter-rouge">aws:SourceOrgID</code> must be equal to your organization ID (that is, the AWS service is using a principal to access the resource on behalf of another resource that belongs to your organization). The <code class="language-plaintext highlighter-rouge">aws:SourceAccount</code> key is checked with the <code class="language-plaintext highlighter-rouge">Null</code> condition operator so that the control applies only when the request has the context of an originating account (i.e. is susceptible to the cross-service confused deputy problem).</p>
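<p>The same kind of simplified Python model can be applied to this statement. Again, this is a sketch of the condition logic under the assumption that all operators must match, not a reproduction of how IAM evaluates requests. <code class="language-plaintext highlighter-rouge">None</code> stands for a key that is absent, and the IDs are placeholders.</p>

```python
from typing import Optional

ORG_ID = "o-exampleorgid"  # placeholder for your organization ID


def denied_by_confused_deputy_rcp(
    source_org_id: Optional[str],
    source_account: Optional[str],
    principal_is_aws_service: Optional[bool],
) -> bool:
    """Return True when the EnforceConfusedDeputyProtection deny would apply."""
    # StringNotEqualsIfExists: matches if the key is absent or the value differs.
    org_mismatch = source_org_id is None or source_org_id != ORG_ID
    # Null "false": matches only if aws:SourceAccount is present in the request.
    has_source_account = source_account is not None
    # Bool "true" without IfExists: an absent key does not match.
    is_service = principal_is_aws_service is True
    return org_mismatch and has_source_account and is_service


# A service acting for a resource in another organization is denied.
print(denied_by_confused_deputy_rcp("o-someoneelse", "111122223333", True))   # prints: True
# A service acting for a resource in your own organization is allowed through.
print(denied_by_confused_deputy_rcp("o-exampleorgid", "111122223333", True))  # prints: False
# A service call with no originating account is not covered by this statement.
print(denied_by_confused_deputy_rcp(None, None, True))                        # prints: False
```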

<h3 id="expected-network-enforcement">Expected network enforcement</h3>

<p>We can also use RCPs to ensure that access is only granted from expected networks and that data doesn’t traverse through an unexpected network path. The following policy can be used to ensure data from S3, SQS, KMS and Secrets Manager can only be accessed if the caller is within the corporate network:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "EnforceNetworkPerimeter",
            "Effect": "Deny",
            "Principal": "*",
            "Action": [
                "s3:*",
                "sqs:*",
                "kms:*",
                "secretsmanager:*",
                "sts:*"
            ],
            "Resource": "*",
            "Condition": {
                "NotIpAddressIfExists": {
                    "aws:SourceIp": "&lt;YOURIPRANGE&gt;"
                },
                "StringNotEqualsIfExists": {
                    "aws:SourceVpc": "&lt;YOURVPCID&gt;"
                },
                "BoolIfExists": {
                    "aws:PrincipalIsAWSService": "false",
                    "aws:ViaAWSService": "false"
                },
                "ArnNotLikeIfExists": {
                    "aws:PrincipalArn": [
                        "arn:aws:iam::*:role/aws:ec2-infrastructure"
                    ]
                }
            }
        }
    ]
}
</code></pre></div></div>

<p>The effect of the policy is that any attempt to access the resources within these services (or use STS to assume a role to do so) is blocked where the caller’s IP address falls outside the expected CIDR range or originates from a VPC ID that isn’t the expected one. Again, we specifically carve out an exception for AWS services, including those which use <a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/access_forward_access_sessions.html">forward access sessions</a>. We also have an additional carve out for EBS volume decryption, which uses a known IAM role to call KMS for decryption of the data key for volumes it manages.</p>
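<p>Because this statement combines four condition operators, a small Python model makes the interaction easier to follow. This is a simplified sketch, not AWS’ evaluation logic: it assumes all operators must match for the deny to apply, that <code class="language-plaintext highlighter-rouge">IfExists</code> operators match when the key is absent (represented by <code class="language-plaintext highlighter-rouge">None</code>), and it approximates ARN matching with <code class="language-plaintext highlighter-rouge">fnmatch</code>. The CIDR range and VPC ID are placeholders.</p>

```python
import fnmatch
import ipaddress
from typing import Optional

ALLOWED_CIDR = ipaddress.ip_network("203.0.113.0/24")  # placeholder range
ALLOWED_VPC = "vpc-0123456789abcdef0"                  # placeholder VPC ID
EBS_ROLE_PATTERN = "arn:aws:iam::*:role/aws:ec2-infrastructure"


def denied_by_network_rcp(
    source_ip: Optional[str],
    source_vpc: Optional[str],
    is_aws_service: bool = False,
    via_aws_service: bool = False,
    principal_arn: Optional[str] = None,
) -> bool:
    """Return True when the EnforceNetworkPerimeter deny would apply."""
    # NotIpAddressIfExists: matches if the key is absent or the IP is outside the range.
    ip_outside = source_ip is None or ipaddress.ip_address(source_ip) not in ALLOWED_CIDR
    # StringNotEqualsIfExists: matches if the key is absent or the VPC differs.
    vpc_unexpected = source_vpc is None or source_vpc != ALLOWED_VPC
    # BoolIfExists: both keys must be "false" for the operator to match.
    not_service = not is_aws_service and not via_aws_service
    # ArnNotLikeIfExists: matches if the key is absent or the ARN is not the EBS role.
    not_ebs_role = principal_arn is None or not fnmatch.fnmatchcase(principal_arn, EBS_ROLE_PATTERN)
    return ip_outside and vpc_unexpected and not_service and not_ebs_role


print(denied_by_network_rcp("198.51.100.7", None))           # prints: True
print(denied_by_network_rcp("203.0.113.10", None))           # prints: False
print(denied_by_network_rcp(None, "vpc-0123456789abcdef0"))  # prints: False
```

<p>Note that a request only needs to satisfy one of the two network checks to avoid the deny, which is why a call from inside the expected VPC is allowed even though it carries no source IP in this model.</p>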

<p>A small note that all of the above examples don’t consider OIDC-based identities for readability purposes. Check out the <a href="https://github.com/aws-samples/data-perimeter-policy-examples/tree/4bc433ff6c4721049fc2eb542c89246343b5fb8a/resource_control_policies">aws-samples</a> repository for a more detailed version which allows for those scenarios.</p>

<h2 id="iam-access-analyzer">IAM Access Analyzer</h2>

<p>With the introduction of RCPs come additions to IAM Access Analyzer’s External access finding details. Because RCPs have the ability to affect the effective permissions of a call, some of the automated findings may also be rendered invalid. To combat this without outright exposing potentially sensitive details of the RCP itself, the External access finding now has a field which indicates whether or not an RCP <em>may</em> affect a specific finding.</p>

<p><img src="/images/posts/accessanalyzerfindingrcp.png" alt="" /></p>

<h2 id="limitations-of-rcps">Limitations of RCPs</h2>

<p>At launch, RCPs only support actions for S3, SQS, KMS, Secrets Manager and STS. This is a short list of likely the most impactful services for organization administrators to establish a data perimeter for. I’m confident this list will quickly expand based on customer demand.</p>

<p>Unfortunately, RCPs do not allow the use of the <code class="language-plaintext highlighter-rouge">*</code> wildcard by itself in the Action field, but instead enforce that all actions need to be scoped to a service namespace. This disallows a kind of automatic opt-in to protections as they become available via RCPs. RCPs also do not support the <code class="language-plaintext highlighter-rouge">NotPrincipal</code> element or the <code class="language-plaintext highlighter-rouge">NotAction</code> element.</p>
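<p>These syntax restrictions are easy to check before attempting to attach a policy. The Python sketch below is a hypothetical pre-flight check covering only the restrictions mentioned above. It is not an official validator and does not check which services are supported.</p>

```python
def rcp_syntax_problems(policy: dict) -> list[str]:
    """List violations of the RCP restrictions described above."""
    problems = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for index, statement in enumerate(statements):
        # NotPrincipal and NotAction are not supported in RCPs.
        for element in ("NotPrincipal", "NotAction"):
            if element in statement:
                problems.append(f"statement {index}: {element} is not supported")
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            # Every action must be scoped to a service, e.g. "s3:*" rather than "*".
            service, separator, _ = action.partition(":")
            if not separator or not service or service == "*":
                problems.append(f"statement {index}: action {action!r} is not scoped to a service")
    return problems


print(rcp_syntax_problems({"Statement": [{"Effect": "Deny", "Action": "*"}]}))
print(rcp_syntax_problems({"Statement": [{"Effect": "Deny", "Action": ["s3:*", "sts:*"]}]}))
# second call prints: []
```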

<p>Like SCPs, RCPs also do not apply to the organization management account. Administrators should ensure extra security is applied to this account to compensate. RCPs do however apply to delegated administrator accounts.</p>

<p>RCPs do not apply to services which use service-linked roles, as this would break specific requirements in order for some services to operate correctly. These roles do however fall directly in the AWS side of the <a href="https://aws.amazon.com/compliance/shared-responsibility-model/">Shared Responsibility Model</a>.</p>

<p>Finally, RCPs do have limits and quotas which are very similar to SCPs, including a 5kb policy size limit and a limit of 5 policies attached at each root, OU or account level.</p>

<h2 id="time-to-start-building">Time to start building</h2>

<p>RCPs close a gap in the quest to better protect an organization’s sensitive data through the use of effective data perimeters by giving administrators a new tool to apply these guardrails. This does however introduce another layer of complexity which, if mismanaged, could lead to unexpected consequences such as outages. Administrators should carefully evaluate all the effects of these policies before applying them and in particular investigate specific nuances with how the various AWS services may use differing access mechanisms to reach resources.</p>

<p>Though service support is still limited at launch, I’d encourage administrators to explore the use of RCPs and to start using specific, limited policies to protect resources with known access patterns within their organization.</p>

<p>If you liked what I’ve written, or want to hear more on this topic, reach out to me on 𝕏 at <a href="https://twitter.com/iann0036">@iann0036</a>.</p>]]></content><author><name>Ian Mckay</name></author><summary type="html"><![CDATA[It's pre:Invent season, and one of the most consequential identity and access management features was just released by the identity team at AWS. Resource Control Policies, a strong tool for establishing data perimeters, is now available for organization administrators.]]></summary></entry><entry><title type="html">Poor mans MFA for AWS Client VPN</title><link href="https://onecloudplease.com/blog/poor-mans-mfa-for-aws-client-vpn" rel="alternate" type="text/html" title="Poor mans MFA for AWS Client VPN" /><published>2024-07-13T00:00:00+00:00</published><updated>2024-07-13T00:00:00+00:00</updated><id>https://onecloudplease.com/blog/poor-mans-mfa-for-aws-client-vpn</id><content type="html" xml:base="https://onecloudplease.com/blog/poor-mans-mfa-for-aws-client-vpn"><![CDATA[<p><img src="/images/posts/client-vpn-slack.jpg" alt="" /></p>

<p>The <a href="https://docs.aws.amazon.com/vpn/latest/clientvpn-admin/what-is.html">AWS Client VPN</a> service is a common way to seamlessly connect users into internal networks. However, administrators often need a heightened level of security given the attack surface this creates. In this post, I describe a low-tech, low-cost solution to better authenticate users using a second factor.</p>

<h2 id="client-vpn-authentication-methods">Client VPN authentication methods</h2>

<p>AWS Client VPN supports connection to federated providers, either via a dedicated Active Directory integration (via AWS Directory Service) or via a SAML provider. These options are good, but a VPN is often needed either in an environment where federation isn’t already established or on mobile devices, which don’t have a supported way to perform the browser-based flow. In these cases, the mutual authentication option is an easy and convenient way to get going quickly and at a low cost.</p>

<p>The Active Directory integration does have the ability to integrate MFA natively, using a RADIUS server, however this typically is a complex setup.</p>

<h2 id="as-easy-as-a-thumbs-up">As easy as a thumbs up</h2>

<p>The AWS Client VPN service does have the option to provide a <a href="https://docs.aws.amazon.com/vpn/latest/clientvpn-admin/connection-authorization.html">client connect handler</a> for the VPN endpoint. This handler is a custom Lambda function you can write to authorize or reject each new connection attempt. Typically, the intent would be to use device posture checks or username lookups from a datastore to evaluate the outcome of the attempt, however we do have a somewhat generous 30 second limit to work with. Notably, this check is in addition to the already established mutual certificate presentation, which takes place before this check is attempted.</p>

<p>A creative alternative solution is to make use of the Slack Bot API to prompt the user to confirm new connections. As users initiate a connection, the Lambda function is invoked and takes the Slack user identifier embedded in the common name of the issued mutual certificate, and uses the Slack Bot API to send a direct message in Slack to the user. The user doesn’t directly respond to the message however, and is instead prompted to give it a thumbs up 👍 reaction. Once the Lambda function sends the initial message, it then short polls the Slack endpoint to retrieve the reactions on its sent message. If it detects the correct reaction before the attempt times out, it responds with a successful authentication attempt.</p>

<p>Here’s what that looks like in practice:</p>

<p><img src="/images/posts/slack-vpn-mfa.png" alt="" /></p>

<h2 id="setting-it-up">Setting it up</h2>

<p>The following assumes you have already set up a Client VPN endpoint using mutual authentication. The <a href="https://docs.aws.amazon.com/vpn/latest/clientvpn-admin/mutual.html">AWS docs</a> do a pretty good job of walking you through this. You’ll also need appropriate permissions to install a new bot to your Slack workspace (this is typically allowed for non-administrators).</p>

<p>One modification to the process is to ensure you include the Slack ID of the user in the common name of the issued certificate to clients, like the following:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./easyrsa build-client-full &lt;fullnameofuser&gt;-&lt;slackmemberid&gt;.mydomain.com nopass
</code></pre></div></div>

<p>The Slack ID for a user can be found by clicking on the user’s Slack profile and selecting the “Copy member ID” option in the expanded menu.</p>
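<p>To sanity-check a common name before issuing the certificate, here’s a minimal sketch of the same parsing the Lambda function later in this post performs. The function name and the sample common name are made up for illustration.</p>

```python
def slack_member_id(common_name):
    """Extract the Slack member ID the same way the handler in this post does."""
    # Take everything after the last hyphen, then drop the domain suffix.
    return common_name.split("-").pop().split(".")[0]


# Hypothetical common name, for illustration only
print(slack_member_id("janecitizen-U012ABCDEF.mydomain.com"))
```

<p>Note that this parsing takes the text after the <em>last</em> hyphen in the whole common name, so a domain that itself contains a hyphen would not parse correctly.</p>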

<p>Next, we’ll set up the Slack Bot itself. To do this, visit <a href="https://api.slack.com/apps">https://api.slack.com/apps</a> and click on the “Create New App” button. Use the “From Scratch” option, give your bot a new friendly name, and select the workspace to authorize your bot into.</p>

<p><img src="/images/posts/slack-bot-setup.png" alt="" /></p>

<p>I highly recommend scrolling down on the initial page and adding an App Icon for your bot to help distinguish it more.</p>

<p>Navigate to the “OAuth &amp; Permissions” page for the bot and scroll to the “Scopes” section. Add the scopes <code class="language-plaintext highlighter-rouge">chat:write</code> and <code class="language-plaintext highlighter-rouge">reactions:read</code>.</p>

<p><img src="/images/posts/slack-bot-scopes.png" alt="" /></p>

<p>Once done, scroll up and click the “Install to Workspace” button. Authorize the request, navigate back to the “OAuth &amp; Permissions” page and you should have a “Bot User OAuth Token” generated for you, starting with <code class="language-plaintext highlighter-rouge">xoxb-</code>.</p>

<p><img src="/images/posts/slack-bot-tokengen.png" alt="" /></p>

<p>Take the “Bot User OAuth Token” and save it to the “token” field of a new Secrets Manager secret within your AWS account. I’ve called my secret “myslackbot” here but you can use anything you wish and modify the upcoming script as needed.</p>

<p><img src="/images/posts/slack-bot-secret.png" alt="" /></p>

<p>The final change is to create the authorization Lambda for the client connection handler. One particularly confusing limitation is that the name of the Lambda function must be prefixed with <code class="language-plaintext highlighter-rouge">AWSClientVPN-</code>. Below is the full Python source code for that - no external libraries needed!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json
import time
from urllib.request import Request, urlopen

import boto3

# Stop polling a little before the 30 second connection handler limit
POLL_DEADLINE_SECONDS = 25


def respond(allow, message=''):
    # Build the response returned to the Client VPN service
    return {
        'allow': allow,
        'error-msg-on-denied-connection': message,
        'posture-compliance-statuses': [],
        'schema-version': 'v2'
    }


def handler(event, context):
    started = time.monotonic()

    client = boto3.client('secretsmanager')
    secret = json.loads(client.get_secret_value(SecretId='myslackbot')['SecretString'])
    headers = {
        'Content-Type': 'application/json; charset=utf-8',
        'Authorization': 'Bearer ' + secret['token']
    }

    # The Slack member ID follows the last hyphen in the certificate common name
    channel = event['common-name'].split("-").pop().split(".")[0]
    if len(channel) &lt; 2 or len(channel) &gt; 12:
        return respond(False, 'No Slack member ID found in the client certificate.')

    # Send the approval prompt to the user as a direct message
    body = {
        'channel': channel,
        'text': 'React with a :thumbsup: to this message to approve the current login attempt from ' + event['public-ip'] + ' (' + event['platform'] + ').\n\nYou must complete this action within 30 seconds.'
    }
    req = Request(
        'https://slack.com/api/chat.postMessage',
        json.dumps(body).encode('utf-8'),
        headers=headers
    )
    msg = json.loads(urlopen(req).read())
    if not msg.get('ok'):
        return respond(False, 'Unable to send the approval request.')

    # Poll the reactions on the sent message until approved or out of time
    while time.monotonic() - started &lt; POLL_DEADLINE_SECONDS:
        time.sleep(2)
        req = Request(
            'https://slack.com/api/reactions.get?channel=' + msg['channel'] + "&amp;timestamp=" + msg['ts'],
            headers=headers
        )
        reactions = json.loads(urlopen(req).read())

        for reaction in reactions.get('message', {}).get('reactions', []):
            # Substring match, so variants of the +1 reaction name are also accepted
            if '+1' in reaction['name']:
                return respond(True)

    # Deny explicitly rather than waiting for the function to time out
    return respond(False, 'The login attempt was not approved in time.')
</code></pre></div></div>

<p>Once you’ve configured your client connection handler in the VPN endpoint, you have completed your setup and can test your new MFA solution for yourself.</p>

<h2 id="finishing-up">Finishing up</h2>

<p>The above solution was the result of running into a bunch of limitations, then looking around and considering alternatives that may seem unusual at first but turn out to be quite effective. I’m reminded that this is a good skill to have and can lead to some new experiences that might benefit you in future circumstances.</p>

<p>If you liked what I’ve written, or want to hear more on this topic, reach out to me on 𝕏 at <a href="https://twitter.com/iann0036">@iann0036</a>.</p>]]></content><author><name>Ian Mckay</name></author><summary type="html"><![CDATA[The AWS Client VPN service is a common way to seamlessly connect users into internal networks. In this post, I describe a low-tech, low-cost solution to better authenticate users using a second factor.]]></summary></entry><entry><title type="html">HTTPS Endpoints and more tricks with AWS Step Functions</title><link href="https://onecloudplease.com/blog/https-endpoints-and-more-tricks-with-aws-step-functions" rel="alternate" type="text/html" title="HTTPS Endpoints and more tricks with AWS Step Functions" /><published>2024-01-13T00:00:00+00:00</published><updated>2024-01-13T00:00:00+00:00</updated><id>https://onecloudplease.com/blog/https-endpoints-and-more-tricks-with-aws-step-functions</id><content type="html" xml:base="https://onecloudplease.com/blog/https-endpoints-and-more-tricks-with-aws-step-functions"><![CDATA[<p><img src="/images/posts/https-endpoints-step-functions.jpg" alt="" /></p>

<p>AWS re:Invent 2023 is now behind us and one of my favourite announcements was the introduction of <a href="https://docs.aws.amazon.com/step-functions/latest/dg/connect-third-party-apis.html">HTTPS Endpoints</a> to AWS Step Functions. In this post, I explain the feature, test its limits and also show off some other tricks for data manipulation within your state machines.</p>

<p>For the impatient, <a href="https://github.com/iann0036/chess-dot-com-state-machine-sample/blob/main/template.yml">here</a> is the final result.</p>

<h2 id="https-endpoints-feature">HTTPS Endpoints feature</h2>

<p>HTTPS endpoints use <a href="https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-api-destinations.html#eb-api-destination-connection">Amazon EventBridge API destination connections</a> to determine the authentication mechanism used. This service subsequently uses Secrets Manager to store the credentials that will be included to authenticate requests.</p>

<p>Then within the state machine, you reference this connection and specify your own URL and HTTP method. You can also optionally include your own query parameters, headers and/or request body.</p>

<p>There are some limitations though. Firstly, there is a hard 60 second timeout for the entire request. Secondly, Step Functions sets some mandatory headers which you cannot override. These are:</p>

<ul>
  <li>Host (value: <em>hostname of the URL</em>)</li>
  <li>User-Agent (value: <code class="language-plaintext highlighter-rouge">Amazon|StepFunctions|HttpInvoke|us-east-1</code>, where <code class="language-plaintext highlighter-rouge">us-east-1</code> is replaced by your region)</li>
  <li>Range (value: <code class="language-plaintext highlighter-rouge">bytes=0-262144</code>)</li>
</ul>

<p>Note that the request will still fail if the response exceeds 256kb even though the Range header is set. The presence of the header can also cause confusion as some servers will respond with a <code class="language-plaintext highlighter-rouge">206 Partial Content</code> status code even if all data is returned, so be aware of that.</p>

<p>The client IP address is different for each request and appears to lie within the standard EC2 public IP range <a href="https://docs.aws.amazon.com/vpc/latest/userguide/aws-ip-ranges.html">published by AWS</a>. There is no capability to use Elastic IPs or other networking constructs within your account.</p>

<p>Your state machine IAM role will need to include actions that allow access to the connection and its associated secret, as well as the <code class="language-plaintext highlighter-rouge">states:InvokeHTTPEndpoint</code> action, which has the optional condition keys <code class="language-plaintext highlighter-rouge">states:HTTPEndpoint</code> and <code class="language-plaintext highlighter-rouge">states:HTTPMethod</code> to help scope down which endpoints and HTTP methods the state machine can call. I have included an example of a granular policy in the CloudFormation template at the end of this post.</p>
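<p>As a rough sketch of the shape such a policy could take (the CloudFormation template remains the reference), the following builds a policy document as a Python dictionary. The function name and every ARN shown are placeholders, and the connection and secret actions listed are my understanding of what an HTTPS endpoint step needs, so verify them against the current Step Functions documentation.</p>

```python
import json


def build_http_task_policy(state_machine_arn, connection_arn, secret_arn,
                           endpoint_pattern, method):
    """Return an IAM policy document scoped to one endpoint pattern and method."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Limit which URLs and HTTP methods the state machine may call
                "Effect": "Allow",
                "Action": "states:InvokeHTTPEndpoint",
                "Resource": state_machine_arn,
                "Condition": {
                    "StringEquals": {"states:HTTPMethod": method},
                    "StringLike": {"states:HTTPEndpoint": endpoint_pattern},
                },
            },
            {
                # Access to the EventBridge connection (assumed action name)
                "Effect": "Allow",
                "Action": "events:RetrieveConnectionCredentials",
                "Resource": connection_arn,
            },
            {
                # Access to the secret that backs the connection (assumed actions)
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue",
                           "secretsmanager:DescribeSecret"],
                "Resource": secret_arn,
            },
        ],
    }


# Placeholder ARNs, for illustration only
policy = build_http_task_policy(
    "arn:aws:states:us-east-1:123456789012:stateMachine:example",
    "arn:aws:events:us-east-1:123456789012:connection/example/*",
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:events!connection/example/*",
    "https://api.chess.com/pub/*",
    "GET",
)
print(json.dumps(policy, indent=2))
```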

<h2 id="gathering-the-data">Gathering the data</h2>

<p><img src="/images/posts/sfunc4.png" alt="" /></p>

<p>In order to demonstrate the capabilities of the new feature, I’ve chosen to consume the <a href="https://www.chess.com/news/view/published-data-api">Chess.com API</a>. This is a free and anonymous API which retrieves metadata about games and players on their platform.</p>

<p>I will retrieve a list of all <a href="https://www.chess.com/terms/grandmaster-chess">grandmasters</a>, their country of origin, and aggregate these details by country.</p>

<p>Because this is a public endpoint, there is no need for an Authorization or similar header when accessing the endpoint, however EventBridge API destinations <em>require</em> the use of Basic Authorization, OAuth or API Key header. One creative way of avoiding sending an unnecessary header is to create your connection using the API Key type but set the header to one of the immutable headers, such as <code class="language-plaintext highlighter-rouge">User-Agent</code>.</p>

<p><img src="/images/posts/sfunc2.png" alt="" /></p>

<p>I created the step to gather the list of grandmasters by hitting the URL <code class="language-plaintext highlighter-rouge">https://api.chess.com/pub/titled/GM</code>. Because I am only interested in the content of the response body, I apply an OutputPath filter of <code class="language-plaintext highlighter-rouge">$.ResponseBody</code>. This provides me with the list of grandmaster usernames, but not their origin country or actual name. For that, we need to retrieve their details using additional individual HTTPS calls.</p>

<p>To do this efficiently, we use the <a href="https://docs.aws.amazon.com/step-functions/latest/dg/use-dist-map-orchestrate-large-scale-parallel-workloads.html">Distributed Map</a> type within Step Functions. To ensure we do not overload the Chess.com API, we limit the concurrency to 40. We also use a standard exponential backoff for the inner HTTPS call to allow for retries in the event of an occasional error.</p>

<p>This brings us to a state where we have an array of the individual grandmaster details.</p>

<h2 id="aggregating-the-data">Aggregating the data</h2>

<p><img src="/images/posts/sfunc3.png" alt="" /></p>

<p>Aggregating data (using map-reduce style methods) within a state machine is not a native function, however with some clever usage it is possible.</p>

<p>To do this, we first need to ensure all fields are present in the individual grandmaster details. Unfortunately, the <code class="language-plaintext highlighter-rouge">name</code> field isn’t always present on these responses so to fix that we add the following <code class="language-plaintext highlighter-rouge">ResultSelector</code> to the HTTPS endpoint step within the distributed map:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "output.$": "States.JsonMerge(States.StringToJson('{\"name\":\"Unknown Player\"}'), $.ResponseBody, false)"
}
</code></pre></div></div>

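<p>This takes the resulting detail from the HTTP response, and performs a JSON merge with the static object we defined with a default name. Because the response body is the second argument, its values take precedence, so the default name is only used when the response has no name of its own.</p>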
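<p>If it helps to picture the merge, here’s a small Python equivalent of that shallow merge, where keys from the second object win. The function name and the sample player records are invented for illustration.</p>

```python
def shallow_merge(json1, json2):
    """Mimic a shallow JSON merge where keys in json2 override keys in json1."""
    return {**json1, **json2}


default = {"name": "Unknown Player"}

# Hypothetical response bodies, for illustration only
with_name = {"username": "examplegm", "name": "Example Player"}
without_name = {"username": "anonymousgm"}

print(shallow_merge(default, with_name))
print(shallow_merge(default, without_name))
```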

<p>Next, we format the resulting name in the way we would like it, as well as extract the 2-letter country code from the URL which looks like <code class="language-plaintext highlighter-rouge">https://api.chess.com/pub/country/US</code>. To do this, we use a Pass state. The Parameters of the Pass state are as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "displayName.$": "States.Format('{} ({})', $.output.name, $.output.username)",
    "country.$": "States.ArrayGetItem(States.StringSplit($.output.country, '/'), 4)"
}
</code></pre></div></div>

<p>Note that the array index used is 4 and not 5. This is because empty segments (the one in between <code class="language-plaintext highlighter-rouge">https:/</code> and the next <code class="language-plaintext highlighter-rouge">/</code>) get discarded during the <code class="language-plaintext highlighter-rouge">States.StringSplit</code> operation.</p>
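<p>To see why the index shifts, here’s a quick Python sketch that imitates the split behaviour described above by dropping empty segments. The helper name is my own and this is only an approximation of the intrinsic for a single-character delimiter.</p>

```python
def split_dropping_empty(value, delimiter):
    """Split a string and discard empty segments, as described for States.StringSplit."""
    return [part for part in value.split(delimiter) if part]


url = "https://api.chess.com/pub/country/US"

plain = url.split("/")                      # keeps the empty segment after "https:"
compact = split_dropping_empty(url, "/")    # drops it

print(plain[5], compact[4])
```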

<p>Using the output of the distributed map, we apply a new Pass state with the following parameters:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "original.$": "$",
    "countries.$": "States.ArrayUnique($[*].country)",
    "countriesCount.$": "States.ArrayLength(States.ArrayUnique($[*].country))",
    "iterator": 0,
    "output": {}
}
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">original</code> key contains the distributed map output, the <code class="language-plaintext highlighter-rouge">countries</code> key uses JSONPath and <code class="language-plaintext highlighter-rouge">States.ArrayUnique</code> to select the unique list of countries, the <code class="language-plaintext highlighter-rouge">countriesCount</code> key is the length of the countries, the <code class="language-plaintext highlighter-rouge">iterator</code> key is initialised at 0, and the <code class="language-plaintext highlighter-rouge">output</code> key is initialised with an empty map.</p>

<p>Then we enter a loop. The loop will continue whilst the iterator is less than the length of countries. We then use a Pass state to set the <code class="language-plaintext highlighter-rouge">country</code> key to the country at the <code class="language-plaintext highlighter-rouge">iterator</code> index of the <code class="language-plaintext highlighter-rouge">countries</code> list. We then use one more Pass state to increase the iterator with:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>States.MathAdd($.iterator, 1)
</code></pre></div></div>

<p>We also set the <code class="language-plaintext highlighter-rouge">output</code> key to the following (spaced for visibility):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>States.JsonMerge(
    States.StringToJson(
        States.Format(
            '\{"{}":{}\}',
            $.country,
            States.JsonToString(
                $.original[?(@.country == $.country)]['displayName']
            )
        )
    ),
    $.output
, false)
</code></pre></div></div>

<p>The above performs the following transformations:</p>

<ol>
  <li>Retrieve the list of all <code class="language-plaintext highlighter-rouge">displayName</code> strings from the entries within the <code class="language-plaintext highlighter-rouge">original</code> key, using a JSONPath filter to select only the entries where the <code class="language-plaintext highlighter-rouge">country</code> key is equal to the current country</li>
  <li>Convert that list to a JSON string</li>
  <li>Create a new JSON-compatible string where the key is the <code class="language-plaintext highlighter-rouge">country</code> and the value is the above string-encoded array of names</li>
  <li>Convert the string to a JSON object</li>
  <li>Merge that object with the <code class="language-plaintext highlighter-rouge">output</code> variable</li>
</ol>

<p>We’re basically adding the country code as a key of the <code class="language-plaintext highlighter-rouge">output</code> JSON object one at a time, then increasing the iterator to reference the next country in the list.</p>
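<p>For those who find it easier to read as code, here’s a Python sketch of the same loop, including the slightly odd format-then-parse step. The function name and sample data are invented, and the ordering of the unique country list is an assumption on my part.</p>

```python
import json


def aggregate_by_country(original):
    """Group display names by country using the same steps as the state machine loop."""
    countries = []
    for entry in original:  # stand-in for States.ArrayUnique($[*].country)
        if entry["country"] not in countries:
            countries.append(entry["country"])

    iterator = 0
    output = {}
    while iterator < len(countries):
        country = countries[iterator]
        # JSONPath filter: displayName of every entry matching this country
        names = [e["displayName"] for e in original if e["country"] == country]
        # Format a JSON string, then parse it back into an object
        fragment = json.loads('{{"{}":{}}}'.format(country, json.dumps(names)))
        # Shallow merge the new key into the accumulated output
        output = {**fragment, **output}
        iterator += 1
    return output


# Hypothetical distributed map output, for illustration only
sample = [
    {"displayName": "Player One (p1)", "country": "US"},
    {"displayName": "Player Two (p2)", "country": "AU"},
    {"displayName": "Player Three (p3)", "country": "US"},
]
print(aggregate_by_country(sample))
```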

<p>Once it has completed the loop, we are left with our final output.</p>

<p><img src="/images/posts/sfunc5.png" alt="" /></p>

<h2 id="finishing-up">Finishing up</h2>

<p>I have provided a CloudFormation template that contains the full state machine and associated connection <a href="https://github.com/iann0036/chess-dot-com-state-machine-sample/blob/main/template.yml">here</a>. Feel free to deploy this into your own AWS account and try it yourself.</p>

<p>The HTTPS Endpoints feature is a very useful addition to the Step Functions service that I believe will have huge uptake. I personally want to do more with the Step Functions service as I believe more architectures can be more than serverless, they can be “functionless” (i.e. no Lambda functions). I would however like to see more useful intrinsics become available in the service. As you can see from this post, developers are often pushing the limits of what is available. Consider this my <a href="https://twitter.com/search?q=%23awswishlist">#awswishlist</a> item.</p>

<p>A big thank you to <a href="https://twitter.com/__steele">Aidan Steele</a> for helping review this post. If you liked what I’ve written, or want to hear more on this topic, reach out to me on 𝕏 at <a href="https://twitter.com/iann0036">@iann0036</a>.</p>]]></content><author><name>Ian Mckay</name></author><summary type="html"><![CDATA[HTTPS Endpoints was one of my favourite re:Invent 2023 announcements. I talk about it and other interesting things you can achieve within your state machines in this post.]]></summary></entry><entry><title type="html">Swiping right on the AWS WAF CAPTCHA challenge</title><link href="https://onecloudplease.com/blog/swiping-right-on-the-aws-waf-captcha-challenge" rel="alternate" type="text/html" title="Swiping right on the AWS WAF CAPTCHA challenge" /><published>2023-07-25T00:00:00+00:00</published><updated>2023-07-25T00:00:00+00:00</updated><id>https://onecloudplease.com/blog/swiping-right-on-the-aws-waf-captcha-challenge</id><content type="html" xml:base="https://onecloudplease.com/blog/swiping-right-on-the-aws-waf-captcha-challenge"><![CDATA[<p><img src="/images/posts/captcha.jpg" alt="" /></p>

<p>In 2021, AWS WAF <a href="https://aws.amazon.com/about-aws/whats-new/2021/11/aws-waf-captcha-support/">introduced</a> a new CAPTCHA feature to help protect sites against bot traffic. The release had some <a href="https://twitter.com/iann0036/status/1457908922925256704">mixed</a> <a href="https://twitter.com/iann0036/status/1457911175094538248">reviews</a> but the idea was that it was an effective protection against programmatic solvers or “bots”.</p>

<p>In this post, I walk through my methodology for programmatically beating one of the CAPTCHA challenges presented. If you’d like to follow along, you can try the CAPTCHA challenges yourself <a href="https://efw47fpad9.execute-api.us-east-1.amazonaws.com/latest">here</a>.</p>

<h2 id="the-aws-waf-captcha-system">The AWS WAF CAPTCHA system</h2>

<p>The CAPTCHA <a href="https://docs.aws.amazon.com/waf/latest/developerguide/waf-captcha-and-challenge.html">feature</a> in AWS WAF is an optional action as a result of a match against customer-defined rules. It is intended to be an option to help bridge the difficult decision of a hard deny or hard allow when client heuristics may appear suspicious but not outright bot-like.</p>

<p>When triggered, the action prompts viewers of a website with interactive challenges designed to test that a human viewer is real and block bots seeking to crawl or disrupt human traffic. At launch, and to this day, there are two challenges available which I will call the “car maze” and “shape match” challenges.</p>

<p>I created a Twitter (𝕏?) thread about beating the car maze challenge when it was originally released which you can read here:</p>

<div style="max-width: 60%; padding-left: 0; padding-right: 0; margin-left: auto; margin-right: auto; margin-top: 15px;"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Had a bit of fun today with the WAF CAPTCHA thing. The car maze turned into a fun programming challenge! 1/ <a href="https://t.co/D6Rf4SZGy4">pic.twitter.com/D6Rf4SZGy4</a></p>&mdash; Ian Mckay (@iann0036) <a href="https://twitter.com/iann0036/status/1459770171581550593?ref_src=twsrc%5Etfw">November 14, 2021</a></blockquote><script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></div>

<p>I will note that there have been some changes since writing the thread and discussing my findings with the AWS WAF service team that make the car maze challenge slightly more complex, though the same concepts still broadly apply.</p>

<p>Let’s go through the same process with the shape match challenge!</p>

<h2 id="shape-matching">Shape matching</h2>

<p>The shape match challenge features an image of 5 random 3D shapes lined up horizontally which has been split across the vertical axis and reordered. The interface gives you a slider which you can move to match usually only one shape at a time and gives you instructions as to which shape to match up and submit. The bottom section wraps as you drag the slider.</p>

<p><img src="/images/posts/captcha1.png" alt="" /></p>

<p>The available shapes are: <code class="language-plaintext highlighter-rouge">ball</code>, <code class="language-plaintext highlighter-rouge">cone</code>, <code class="language-plaintext highlighter-rouge">cube</code>, <code class="language-plaintext highlighter-rouge">cylinder</code>, <code class="language-plaintext highlighter-rouge">donut</code>, <code class="language-plaintext highlighter-rouge">knot</code> and <code class="language-plaintext highlighter-rouge">pyramid</code>.</p>

<p>The challenge presents both halves of the shapes as a single JPEG image, always at a 320x160 resolution. Taking a similar approach to the car maze solve, I’m using HTML canvas to inspect the image, extract pixel data and draw for my own visualization. For my first step, I sample the top-left pixel colour and eliminate these pixels from consideration. Because the challenge is a JPEG, some colour blending and artifacts are present so in most of the below steps I check for colour closeness by ensuring the RGB channels are within a small boundary (in this case, no more than 7 away). The top and bottom 80 pixels of the Y-axis represent the top and bottom sections, respectively.</p>
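<p>The closeness check is simple enough to sketch. My actual solve ran in the browser against canvas pixel data, so treat the Python below as an illustration of the idea only, with made-up function names and colours.</p>

```python
TOLERANCE = 7  # maximum per-channel difference still treated as the same colour


def is_close(colour_a, colour_b, tolerance=TOLERANCE):
    """Return True when every RGB channel differs by no more than the tolerance."""
    return all(abs(a - b) <= tolerance for a, b in zip(colour_a, colour_b))


def mask_background(pixels, background):
    """Mark each pixel in a row as shape (True) or background (False)."""
    return [not is_close(pixel, background) for pixel in pixels]


# Hypothetical colours, for illustration only
background = (240, 240, 240)
row = [(240, 240, 240), (236, 243, 239), (90, 20, 20), (241, 238, 247)]
print(mask_background(row, background))
```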

<p><img src="/images/posts/captcha2.png" alt="" /></p>

<p>I now want to identify the location and width of the shapes at the midline for the top and bottom sections. The shapes in the challenge always have a clear separation between them, so in order to do this I move left-to-right at just above and below the midline (skipping the exact pixels on the midline, as JPEG artifacting can sometimes merge the pixels at y=79 and y=80). When I hit a non-background pixel, I mark the starting point of the shape. Once I hit a background pixel again, I can presume the start and stop points on the X-axis.</p>
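<p>That left-to-right scan could look something like the following sketch. It takes a row of booleans (true where the pixel is not background) and returns the start position and width of each run. The function name and sample row are my own for illustration.</p>

```python
def find_runs(row_is_shape):
    """Return (start, width) for each consecutive run of non-background pixels."""
    runs = []
    start = None
    for x, filled in enumerate(row_is_shape):
        if filled and start is None:
            start = x  # entering a shape
        elif not filled and start is not None:
            runs.append((start, x - start))  # leaving a shape
            start = None
    if start is not None:
        # The run touches the right edge of the image
        runs.append((start, len(row_is_shape) - start))
    return runs


# Hypothetical row just above the midline, for illustration only
sample_row = [False, True, True, False, False, True, False, True, True]
print(find_runs(sample_row))
```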

<p>This gives me a set of values which intersect at the midline, however there are typically more values than the 5 shapes that are present. This is because shapes like the donut and knot intersect the midline at multiple points. To overcome this, we need to find any space in between where the shapes hit the midline where there isn’t a clear path to the relative extremes of the axis (i.e. where it is presumed to be in the center of the donut / knot). We take the middle of each of the clear spaces and start drawing a line towards the extreme of the axis, allowing a deviation to the left or right if clear space is present. Any line that does not reach the axis extreme is considered to be within the shapes, so these points are aggregated with regard to the shape boundary at the midline. This finally provides us with 5 positions and widths for both the top and bottom sections.</p>

<p><img src="/images/posts/captcha3.png" alt="" /></p>

<p>Because the donut always has two midline points which are of roughly equal width, we can mark this as a high probability match straight away. Additionally, if we see a single shape with more than 2 midline point intersections we can safely assume it is the knot as this is the only shape that does this. At this point, I can start drawing the resulting shapes on individual canvases and mark those which are assumed during development.</p>

<p><img src="/images/posts/captcha4.png" alt="" /></p>

<p>We can then use the widths of the top and bottom shape midline intersections and find roughly matching widths. This gives us strong candidates for matching top and bottom section shapes, allowing us to calculate the relative X-axis offset needed to create the shapes. Under good circumstances, we now have 5 completed shapes but no way of identifying at least 3 of them.</p>

<p>In order to discover more information about the potential shapes, we calculate more landmark points to gain additional heuristics on the shape type. These points are calculated by the following:</p>

<ul>
  <li><strong>Point 1:</strong> From the extreme left side at the midline, move towards the Y-axis extreme</li>
  <li><strong>Point 2:</strong> From the extreme right side at the midline, move towards the Y-axis extreme</li>
  <li><strong>Point 3:</strong> From the X-axis center at the midline, move towards the Y-axis extreme - if blocked, deviate left if able</li>
  <li><strong>Point 4:</strong> From the X-axis center at the midline, move towards the Y-axis extreme - if blocked, deviate right if able</li>
</ul>

<p>Here are the paths that discovery takes to find the landmark points:</p>

<p><img src="/images/posts/captcha5.png" alt="" /></p>

<p>A ball shape always has a short Y-axis travel for points 1 and 2 for both sections, as well as a short X-axis travel from the center of the midline for points 3 and 4. The Y-axis travel for points 3 and 4 is generally identical and has roughly the same value as the X-axis travel for points 1 and 2.</p>

<p>A cone or pyramid shape typically also has a short Y-axis travel for points 1 and 2 in the top section, but a large Y-axis travel for all points in the bottom section.</p>

<p>A cube or cylinder generally has a roughly matching X-axis and Y-axis for the diametrically opposing points (point 1 in the top and point 2 in the bottom, and vice-versa).</p>

<p>Although it is challenging to decide between a cone/pyramid and cube/cylinder due to their shape similarities, there is one more trick we can use. Taking a path across the X-axis just below the midline, track the colours during movement. If the colour always gradually changes slightly, we can assume there is a gradient and the shape is a cone or cylinder. If there are only one or two colours, these represent the visible faces of a pyramid or cube.</p>
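<p>One way to approximate that check is to count how many distinct colour bands appear along the path, reusing the same tolerance as the background check. This is a simplified sketch rather than my exact implementation, and the band-counting approach, function names and sample colours are assumptions for illustration.</p>

```python
def count_colour_bands(pixels, tolerance=7):
    """Count colours along a path that are not within tolerance of an earlier band."""
    bands = []
    for pixel in pixels:
        matches_existing = any(
            all(abs(a - b) <= tolerance for a, b in zip(pixel, band))
            for band in bands
        )
        if not matches_existing:
            bands.append(pixel)
    return len(bands)


def has_flat_faces(pixels):
    """One or two bands suggests flat faces (pyramid or cube) rather than a gradient."""
    return count_colour_bands(pixels) <= 2


# Hypothetical pixel paths, for illustration only
flat = [(200, 50, 50)] * 10 + [(120, 30, 30)] * 10
gradient = [(100 + 2 * i, 100 + 2 * i, 100 + 2 * i) for i in range(50)]
print(has_flat_faces(flat), has_flat_faces(gradient))
```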

<p>We’ve now successfully identified each shape and their offsets.</p>

<p><img src="/images/posts/captcha6.png" alt="" /></p>

<h2 id="solving-the-challenge">Solving the challenge</h2>

<p>The challenge generally accepts an offset value as its answer and so without any UI interference we could simply respond with a network request programmatically. However, I wanted to see the actual solution occur so I looked into actually performing the sliding action.</p>

<p>I had never programmatically moved a slider before, and it turns out to be a fairly uncommon thing to automate, but it is possible. I came across <a href="https://stackoverflow.com/a/61547444/546911">this StackOverflow answer</a> which showed that I could create custom <code class="language-plaintext highlighter-rouge">mousedown</code>, <code class="language-plaintext highlighter-rouge">mousemove</code> and <code class="language-plaintext highlighter-rouge">mouseup</code> Mouse Events to drag the slider. Notably, there was some math required to slide to the correct position, as the image width was 320 pixels, the slider would drag a maximum of 274 pixels, and the challenge solution endpoint accepted an answer between 0 and 255.</p>
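
<p>As a rough illustration of that math, the Python sketch below converts an X offset within the image to both an answer value and a drag distance. It assumes a simple linear scaling between the three ranges, which the figures above suggest but do not confirm, and the function names are hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>IMAGE_WIDTH = 320    # pixels, width of the challenge image
SLIDER_TRAVEL = 274  # maximum pixels the slider handle can be dragged
ANSWER_MAX = 255     # largest value the solution endpoint accepts


def offset_to_answer(image_x):
    """Scale an X offset within the image to the 0-255 answer range."""
    return round(image_x * ANSWER_MAX / IMAGE_WIDTH)


def offset_to_drag(image_x):
    """Scale an X offset within the image to a slider drag distance."""
    return round(image_x * SLIDER_TRAVEL / IMAGE_WIDTH)


print(offset_to_answer(64), offset_to_drag(64))  # 51 55
</code></pre></div></div>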

<p>Occasionally, identification would fail due to an edge case or similar, however this simply meant that a new challenge would load and the automation could try again immediately. There seems to be no lockout or escalation of difficulty.</p>

<h2 id="the-road-not-travelled">The road not travelled</h2>

<p>There were a few approaches I could have taken during the development of this solution, however I took what I thought was the simplest and easiest to understand solution. I did look into using the JavaScript version of OpenCV, which I could pretty easily use to find the contours of the shapes and I could have used this to assist with some edge case resolution.</p>

<p><img src="/images/posts/captcha7.png" alt="" /></p>

<p>Additionally, the audio-based accessibility CAPTCHA alternative still remains for those in the speech recognition space looking for a fun challenge.</p>

<h2 id="final-thoughts">Final thoughts</h2>

<p>The AWS WAF CAPTCHA remains an effective deterrent for all but the most determined of bot authors. I don’t envy the position the AWS WAF service team members are in. They are charged with creating a novel, interactive CAPTCHA challenge that has little cognitive load for users but remains challenging enough that it isn’t easily toppled by bots. I believe that if there were a constantly evolving rotation of new WAF challenge types we would have an effective protection purely based on the bot authors’ ability to adapt. Sadly this hasn’t yet happened. Features like <a href="https://aws.amazon.com/waf/features/bot-control/">Bot Control</a> seem to be a far more effective way of dealing with bot traffic without generally affecting users, so I’d recommend that instead.</p>

<p>If you liked what I’ve written, or want to hear more on this topic, reach out to me on Twitter (or whatever it’s called now) at <a href="https://twitter.com/iann0036">@iann0036</a>.</p>]]></content><author><name>Ian Mckay</name></author><summary type="html"><![CDATA[In 2021, AWS WAF introduced a new CAPTCHA feature to help protect sites against bot traffic. In this post, I walk through my methodology for beating the CAPTCHA challenges programmatically.]]></summary></entry><entry><title type="html">Cedar: Avoiding the cracks</title><link href="https://onecloudplease.com/blog/cedar-avoiding-the-cracks" rel="alternate" type="text/html" title="Cedar: Avoiding the cracks" /><published>2023-07-06T00:00:00+00:00</published><updated>2023-07-06T00:00:00+00:00</updated><id>https://onecloudplease.com/blog/cedar-avoiding-the-cracks</id><content type="html" xml:base="https://onecloudplease.com/blog/cedar-avoiding-the-cracks"><![CDATA[<p><img src="/images/posts/cedar1.jpg" alt="" /></p>

<p>With the <a href="https://aws.amazon.com/about-aws/whats-new/2023/05/cedar-open-source-language-access-control/">open-source release</a> of the Cedar engine and the <a href="https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-verified-permissions-generally-available/">general availability release</a> of Amazon Verified Permissions, more and more engineers are considering integrating Cedar into their own systems for authorization, but what do policy authors need to consider to avoid unexpected outcomes?</p>

<p>In this post, I’ll walk through my experiences in where policy authoring can go wrong and the steps you can take to overcome these issues. This post will walk through some advanced evaluation scenarios, so if you’re new to the Cedar language I highly recommend you first read my introductory post on the topic, <a href="https://onecloudplease.com/blog/cedar-a-new-policy-language">Cedar: A new policy language</a>.</p>

<h2 id="non-unique-entity-identifiers">Non-unique entity identifiers</h2>

<p>Though I mentioned it in my previous post, it’s important to always use unique identifiers for entities to ensure they do not get re-used in the future. Re-use becomes a problem when policies come to rely on an entity, that entity goes away at some point in time, and a new entity of the same name comes into existence at a later point.</p>

<p>For example, consider the following statement:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit(
    principal == User::"John",
    action,
    resource == Account::"Corporate"
);
</code></pre></div></div>

<p>If the user named John leaves the company, and then another John joins the company and happens to take the same entity identifier, it’s possible for the new John to inherit some privileges he should not be entitled to. The <a href="https://cedarland.blog/design/why-no-entity-wildcards/content.html">Cedarland blog</a> has some more detail on the reasoning behind this.</p>

<h3 id="solutions">Solutions</h3>

<p>Always use unique identifiers, such as the identifiers your identity provider (IdP) uses, to uniquely identify principals. Additionally, use resource identifiers which are also unique for the context provided. Comments and annotations can help you keep track of identifiers where necessary.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit(
    principal == User::"9a6afab1-5a37-4c90-aa40-24277b93ca28", // John Smith
    action,
    resource == Account::"710f18bc-b8ab-4313-b362-8e6264cfcf91" // Corporate Account
);
</code></pre></div></div>
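
<p>If your identity provider does not already hand you a stable identifier, one option is to mint a random one when the entity is created and store it alongside the display name. The Python sketch below shows the idea; the helper names are hypothetical and it only formats the entity reference as a string.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import uuid


def new_entity_id():
    """Return a fresh random identifier that is never derived from a name."""
    return str(uuid.uuid4())


def user_entity(entity_id):
    """Format a Cedar-style entity reference for a User."""
    return 'User::"{}"'.format(entity_id)


# Two people called John receive unrelated identifiers.
first_john = new_entity_id()
second_john = new_entity_id()
print(user_entity(first_john))
print(first_john == second_john)  # False
</code></pre></div></div>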

<h2 id="invalid-statements">Invalid statements</h2>

<p>Invalid statements not being evaluated is in my opinion one of the easiest ways to get an unexpected result from your policy evaluations. Consider the following policy:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit(
    principal,
    action == Action::"Connect",
    resource
);

forbid(
    principal,
    action == Action::"Connect",
    resource == Endpoint::"AdminEndpoint"
) unless {
    context.viaAdminNetwork == true
};
</code></pre></div></div>

<p>The intention behind the policy is to allow connections to all endpoints except the admin endpoint unless the context object has the <code class="language-plaintext highlighter-rouge">viaAdminNetwork</code> key set to true. Unfortunately, the implementation of the context object in this example is that the <code class="language-plaintext highlighter-rouge">viaAdminNetwork</code> key is omitted, not <code class="language-plaintext highlighter-rouge">false</code>, if the call does not come from the admin network.</p>

<p>The result of this is that the forbid statement is not processed as there is an evaluation error due to the missing key. However, as the permit statement has been evaluated, and there are no other valid forbid statements, the result is an allow of the call. Even though the evaluated result is allow, there will be errors in the diagnostic return, as you can see from this Cedar playground screenshot:</p>

<p><img src="/images/posts/cedar2.png" alt="" /></p>

<p>There is more discussion on the reasoning for this behaviour over at the <a href="https://cedarland.blog/design/why-ignore-errors/content.html#what-about-forbid-statements">Cedarland blog</a>.</p>
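
<p>To make the behaviour concrete, here is a toy Python model of the decision rule described above. It is not the Cedar engine: each policy is reduced to an effect plus a Python function of the context, and a missing key is modelled as a <code class="language-plaintext highlighter-rouge">KeyError</code>. The <code class="language-plaintext highlighter-rouge">authorize</code> name is an assumption for the example.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def authorize(policies, context):
    """Toy model: errored policies are skipped, then forbid overrides permit."""
    satisfied = []
    errors = []
    for effect, condition in policies:
        try:
            if condition(context):
                satisfied.append(effect)
        except KeyError as exc:
            errors.append("{} policy skipped: missing {}".format(effect, exc))
    if "permit" in satisfied and "forbid" not in satisfied:
        return "Allow", errors
    return "Deny", errors


policies = [
    ("permit", lambda ctx: True),
    # forbid ... unless { context.viaAdminNetwork == true }
    ("forbid", lambda ctx: not ctx["viaAdminNetwork"]),
]

print(authorize(policies, {"viaAdminNetwork": False})[0])  # Deny
print(authorize(policies, {})[0])                          # Allow
</code></pre></div></div>

<p>With the key omitted, the forbid policy errors and is skipped, leaving only the satisfied permit.</p>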

<h3 id="solutions-1">Solutions</h3>

<p>Cedar has a validation engine that uses a schema to define the properties of entities within your system. This allows Cedar to warn you during the authoring phase when policies may not be valid. It is a best practice that you always construct a schema for your system.</p>

<p>The following schema would allow a developer to catch the unsafe usage of the attribute:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "": {
        "entityTypes": {
            "Endpoint": {
                "shape": {
                    "type": "Record",
                    "attributes": {}
                }
            }
        },
        "actions": {
            "Connect": {
                "appliesTo": {
                    "resourceTypes": ["Endpoint"],
                    "context": {
                        "type": "Record",
                        "attributes": {
                            "viaAdminNetwork": { "type": "Boolean", "required": false }
                        }
                    }
                }
            }
        }
    }
}
</code></pre></div></div>

<p>Where possible, the inputs provided by the context object should be predictable. The developer may consider always setting the <code class="language-plaintext highlighter-rouge">viaAdminNetwork</code> key to simplify.</p>

<p>Alternatively, we can also modify the policy to test for the presence of the key itself, as shown:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit(
    principal,
    action == Action::"Connect",
    resource
);

forbid(
    principal,
    action == Action::"Connect",
    resource == Endpoint::"AdminEndpoint"
) unless {
    context has "viaAdminNetwork" &amp;&amp; context.viaAdminNetwork == true
};
</code></pre></div></div>

<p>Developers might also consider overriding an allow result if any evaluation errors are present in the evaluation response, if that outcome is more desirable.</p>
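
<p>A minimal Python sketch of such an override follows. The decision strings and the shape of the error list are assumptions; adapt them to whatever your engine binding or authorization API actually returns.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def fail_closed(decision, errors):
    """Downgrade an allow to a deny when any evaluation error was reported."""
    if decision == "ALLOW" and errors:
        return "DENY"
    return decision


print(fail_closed("ALLOW", []))                       # ALLOW
print(fail_closed("ALLOW", ["attribute not found"]))  # DENY
print(fail_closed("DENY", ["attribute not found"]))   # DENY
</code></pre></div></div>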

<h2 id="dangers-of-short-circuiting">Dangers of short-circuiting</h2>

<p>Short-circuiting is a performance feature of the Cedar language which allows it to skip evaluation of specific expressions that should not affect the result of the policy evaluation. It is present under the following conditions:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">expression1 &amp;&amp; expression2</code>: expression2 is not evaluated when expression1 is false</li>
  <li><code class="language-plaintext highlighter-rouge">expression1 || expression2</code>: expression2 is not evaluated when expression1 is true</li>
  <li><code class="language-plaintext highlighter-rouge">if expression1 then expression2 else expression3</code>: expression2 is not evaluated when expression1 is false and expression3 is not evaluated when expression1 is true</li>
</ul>

<p>This is typically a good thing, however it will <em>not</em> produce an error due to an invalid expression unless it actually evaluates that expression. For example, consider the below policy:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit (
    principal,
    action == Action::"login",
    resource
)
when { context.isPrimarySite == true || principal.isBreakGlasEntity == true };
</code></pre></div></div>

<p>Note that this policy has the typo <code class="language-plaintext highlighter-rouge">isBreakGlasEntity</code>, which is missing an ‘s’. The intention behind the policy is that the login action is permitted only when accessing from the primary site under normal conditions, or if the principal is a special “break glass” entity under any conditions. This policy works under normal conditions, but due to the typo will error and not permit the break glass entity when they are most needed.</p>
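
<p>The same trap is easy to reproduce in most languages with short-circuiting operators. The Python analogy below is only an illustration of the evaluation order, not of Cedar itself, and models the missing attribute as a dictionary lookup that fails.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def can_login(context, principal):
    # Mirrors: context.isPrimarySite == true || principal.isBreakGlasEntity == true
    return context["isPrimarySite"] or principal["isBreakGlasEntity"]


break_glass_user = {"isBreakGlassEntity": True}

# Normal conditions: the left side is true, so the typo is never evaluated.
print(can_login({"isPrimarySite": True}, break_glass_user))  # True

# Emergency: the right side is finally evaluated and the lookup fails.
try:
    can_login({"isPrimarySite": False}, break_glass_user)
except KeyError as exc:
    print("evaluation error:", exc)  # evaluation error: 'isBreakGlasEntity'
</code></pre></div></div>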

<h3 id="solutions-2">Solutions</h3>

<p>A Cedar schema should again be used to determine the valid entity attributes during the entity modelling process and warn of inconsistencies during the policy authoring phase.</p>

<p>The following Cedar schema can be used to help find the typo when authoring the policy:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "": {
        "entityTypes": {
            "User": {
                "shape": {
                    "type": "Record",
                    "attributes": {
                        "isBreakGlassEntity": { "type": "Boolean", "required": true }
                    }
                }
            }
        },
        "actions": {
            "login": {
                "appliesTo": {
                    "principalTypes": [ "User" ],
                    "context": {
                        "type": "Record",
                        "attributes": {
                            "isPrimarySite": { "type": "Boolean", "required": true }
                        }
                    }
                }
            }
        }
    }
}
</code></pre></div></div>

<p>In addition to schema validation, it is also important to perform positive and negative testing against your policies (in a local or non-production environment) to ensure the policies will act in the way you expect for critical paths.</p>

<h2 id="ambiguous-entity-type">Ambiguous entity type</h2>

<p>When writing condition statements which interact with an entity store, keep in mind that the entities a policy matches aren’t inherently restricted to a particular type. Consider the following entity store:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[
  {
    "uid": "User::\"alice\"",
    "attrs": {
      "active": true
    }
  },
  {
    "uid": "Action::\"redeemValidTicket\""
  },
  {
    "uid": "Ticket::\"someTicketID\"",
    "attrs": {
      "active": false
    }
  }
]
</code></pre></div></div>

<p>and the policy:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit (
    principal,
    action == Action::"redeemValidTicket",
    resource
)
when { resource.active == true };
</code></pre></div></div>

<p>The intention behind this is to allow ticketholders to redeem active tickets. The implementing developer allowed the full resource entity ID (<code class="language-plaintext highlighter-rouge">"Ticket::\"someTicketID\""</code>) to be passed in as the resource input. Alice can’t redeem the <code class="language-plaintext highlighter-rouge">"Ticket::\"someTicketID\""</code> resource as it is marked as not active, however Alice can perform a successful redemption with the resource entity ID <code class="language-plaintext highlighter-rouge">"User::\"alice\""</code>. Even though her user’s <code class="language-plaintext highlighter-rouge">active</code> attribute was never intended for that purpose, it nonetheless can lead to an unexpected allow.</p>

<h3 id="solutions-3">Solutions</h3>

<p>The developer could enforce that the “Ticket::” prefix is used (or perform the concatenation themselves).</p>
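
<p>Performing the concatenation yourself might look like the Python sketch below, where the caller only ever supplies the bare ticket ID. The <code class="language-plaintext highlighter-rouge">ticket_entity_uid</code> helper and its validation rules are assumptions for illustration.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def ticket_entity_uid(ticket_id):
    """Build the resource UID server-side from a bare ticket ID."""
    if "::" in ticket_id or '"' in ticket_id:
        raise ValueError("ticket ID must not contain a type prefix or quotes")
    return 'Ticket::"{}"'.format(ticket_id)


print(ticket_entity_uid("someTicketID"))  # Ticket::"someTicketID"

try:
    ticket_entity_uid('User::"alice"')
except ValueError as exc:
    print("rejected:", exc)
</code></pre></div></div>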

<p>The entity store could be modified to provide a unique attribute that the policy could match on using the <code class="language-plaintext highlighter-rouge">has</code> operator (<code class="language-plaintext highlighter-rouge">resource has "ticketIssueDate"</code>).</p>

<p>The entity store could also be modified to place tickets in a new entity type “TicketGroup” using the parents construct and enforce via policy that the resource is within this group (<code class="language-plaintext highlighter-rouge">resource in TicketGroup::"IssuedTickets"</code>).</p>

<p>Additionally, there is also a <a href="https://github.com/cedar-policy/rfcs/blob/feature/khieta/is-operator/text/0005-is-operator.md">pending RFC</a> that is discussing introducing an <code class="language-plaintext highlighter-rouge">is</code> operator to perform entity matching.</p>

<h2 id="unexpected-order-of-operations">Unexpected order of operations</h2>

<p>Like other languages, Cedar has a de-facto order of operations due to the way the <a href="https://docs.cedarpolicy.com/syntax-grammar.html">grammar</a> is constructed. This means that operations such as math work as you would expect:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit (
    principal,
    action,
    resource
)
when { 1 + 2 * 3 + 4 * 5 == 27 }; // always true
</code></pre></div></div>

<p>It’s important to read and understand the grammar before constructing complex and ambiguous policies to avoid unintended effects. Consider the below policy:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit (
    principal,
    action,
    resource
)
when {
    if resource.owner == principal then true else false &amp;&amp;
    resource.isRestricted == false
};
</code></pre></div></div>

<p>The intention behind the policy is to allow access when the principal is the resource owner and the resource is not restricted, however the effect of the policy is that a principal who is the resource owner is permitted access even when the resource is marked as restricted.</p>

<p>This is because the <code class="language-plaintext highlighter-rouge">else</code> branch of an <code class="language-plaintext highlighter-rouge">if-then-else</code> expression extends as far to the right as it can, so it binds less tightly than the <code class="language-plaintext highlighter-rouge">&amp;&amp;</code> operation and absorbs it. The above condition is effectively evaluated like so:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if (resource.owner == principal) then (true) else (false &amp;&amp; resource.isRestricted == false)
</code></pre></div></div>

<h3 id="solutions-4">Solutions</h3>

<p>Read the <a href="https://docs.cedarpolicy.com/syntax-grammar.html">grammar</a> when in doubt of the order of operations.</p>

<p>If you are ever in doubt, or simply want to be more explicit, use parentheses to explicitly show the intended grouping of operations:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit (
    principal,
    action,
    resource
)
when {
    (if resource.owner == principal then true else false) &amp;&amp;
    resource.isRestricted == false
};
</code></pre></div></div>
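
<p>Python’s conditional expression happens to group the same way, so the difference between the two forms can be checked with a small truth table. This is an analogy in Python rather than an evaluation by Cedar, and the function names are made up for the example.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def ungrouped(is_owner, is_restricted):
    # Parsed as: True if is_owner else (False and not is_restricted)
    return True if is_owner else False and not is_restricted


def grouped(is_owner, is_restricted):
    # Parentheses force the conditional to be evaluated first.
    return (True if is_owner else False) and not is_restricted


for is_owner in (True, False):
    for is_restricted in (True, False):
        print(is_owner, is_restricted,
              ungrouped(is_owner, is_restricted),
              grouped(is_owner, is_restricted))
</code></pre></div></div>

<p>The two functions only disagree for an owner of a restricted resource, which is exactly the case the original policy was meant to deny.</p>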

<h2 id="side-channels">Side channels</h2>

<p>Issues can often arise from the specific implementation that surrounds the use of Cedar, whether via Amazon Verified Permissions or a direct engine implementation. The engine can only evaluate against the inputs you have provided and if those inputs are unsanitized or invalid, it can lead to a compromise.</p>

<p>Late last year, the popular json5 library released a <a href="https://github.com/json5/json5/security/advisories/GHSA-9c47-m6qq-7p4h">security advisory</a> regarding the potential for prototype pollution. If you were to allow a user to specify their own context object, but override certain keys which were used in sensitive operations, an attacker could use this vulnerability to manipulate the inputs the Cedar engine receives.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// userInput = '{"foo": "bar", "__proto__": {"isAdmin": true}}'</span>

<span class="kd">const</span> <span class="nx">ctx</span> <span class="o">=</span> <span class="nx">JSON5</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="nx">userInput</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">secCheckKeysSet</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span> <span class="p">[</span><span class="dl">'</span><span class="s1">isAdmin</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">isMod</span><span class="dl">'</span><span class="p">]))</span> <span class="p">{</span>
  <span class="k">throw</span> <span class="k">new</span> <span class="nb">Error</span><span class="p">(</span><span class="dl">'</span><span class="s1">Forbidden...</span><span class="dl">'</span><span class="p">);</span>
<span class="p">}</span>

<span class="k">return</span> <span class="nx">avpclient</span><span class="p">.</span><span class="nx">isAuthorized</span><span class="p">({</span>
  <span class="dl">'</span><span class="s1">context</span><span class="dl">'</span><span class="p">:</span> <span class="nx">ctx</span><span class="p">,</span>
  <span class="p">...</span>
<span class="p">});</span>
</code></pre></div></div>

<h3 id="solutions-5">Solutions</h3>

<p>As always, a healthy supply-chain security program is recommended for organizations who make heavy use of external libraries. Input sanitization is also an important step to ensure that the engine can make appropriate authorization decisions.</p>
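
<p>One way to sanitize a caller-supplied context is to copy across only an allow-list of keys and then set the sensitive values yourself. The Python sketch below illustrates that idea; the key names and the <code class="language-plaintext highlighter-rouge">build_context</code> helper are hypothetical, and Python dictionaries are not themselves subject to prototype pollution.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ALLOWED_CONTEXT_KEYS = {"foo"}  # hypothetical keys a caller may supply


def build_context(user_supplied, trusted):
    """Copy only allow-listed keys, then apply server-derived values."""
    context = {}
    for key in ALLOWED_CONTEXT_KEYS:
        if key in user_supplied:
            context[key] = user_supplied[key]
    context.update(trusted)
    return context


user_input = {"foo": "bar", "__proto__": {"isAdmin": True}, "isAdmin": True}
print(build_context(user_input, {"isAdmin": False}))
# {'foo': 'bar', 'isAdmin': False}
</code></pre></div></div>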

<p>As more and more built-in integrations become available, take advantage of these to shift more of the burden outside of your responsibility and avoid side-channel issues.</p>

<h2 id="wrapping-up">Wrapping up</h2>

<p>As new language bindings, AWS integrations, external integrations, and even <a href="https://github.com/cedar-policy/rfcs/pulls">changes to the Cedar language itself</a> continue to be produced, the overall community and ecosystem is growing. The scenarios above highlight the importance of a solid understanding of the language, but also solutions to help you overcome these hurdles and scale your authorization logic faster than would otherwise be possible.</p>

<p>If you liked what I’ve written, or want to hear more on this topic, reach out to me on Twitter at <a href="https://twitter.com/iann0036">@iann0036</a>. You can also join the discussion over at the official <a href="https://communityinviter.com/apps/cedar-policy/cedar-policy-language">Cedar Slack workspace</a>.</p>]]></content><author><name>Ian Mckay</name></author><summary type="html"><![CDATA[More and more engineers are considering integrating Cedar into their own systems for authorization, but what do policy authors need to consider to avoid unexpected outcomes? In this post, I'll walk through my experiences in where policy authoring can go wrong and the steps you can take to overcome these issues.]]></summary></entry><entry><title type="html">Exploring Amazon VPC Lattice</title><link href="https://onecloudplease.com/blog/exploring-amazon-vpc-lattice" rel="alternate" type="text/html" title="Exploring Amazon VPC Lattice" /><published>2023-04-01T00:00:00+00:00</published><updated>2023-04-01T00:00:00+00:00</updated><id>https://onecloudplease.com/blog/exploring-amazon-vpc-lattice</id><content type="html" xml:base="https://onecloudplease.com/blog/exploring-amazon-vpc-lattice"><![CDATA[<p><img src="/images/posts/crispix.jpg" alt="" /></p>

<p><small><em>(yes, that is a picture of my <a href="https://www.kelloggs.com.au/en_AU/products/crispix-product.html">breakfast</a>)</em></small></p>

<p>Today, AWS has <a href="https://aws.amazon.com/blogs/aws/simplify-service-to-service-connectivity-security-and-monitoring-with-amazon-vpc-lattice-now-generally-available/">released</a> Amazon VPC Lattice to General Availability. This post walks through creating a simple VPC Lattice service using CloudFormation, and takes a look at the service overall.</p>

<p>VPC Lattice was my <a href="https://twitter.com/iann0036/status/1599318778709704711">#1 favourite announcement</a> of AWS re:Invent 2022, so I’m excited to see it released today. As of the time of writing, it’s available in US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), and Europe (Ireland).</p>

<h2 id="how-it-works">How it works</h2>

<p>VPC Lattice is a service that enables you to connect clients to services within a VPC. It is very similar to AWS PrivateLink (also known as private VPC Endpoints), but with a key difference.</p>

<p>Whilst PrivateLink works by placing Elastic Network Interfaces within your subnet, which your clients can hit to tunnel network traffic through to the destination service, VPC Lattice works by exposing endpoints as link-local addresses. <a href="https://en.wikipedia.org/wiki/Link-local_address">Link-local addresses</a> are (generally) only accessible by software that runs on the client instance itself.</p>

<p>AWS has carved out the range <code class="language-plaintext highlighter-rouge">169.254.171.0/24</code> for VPC Lattice’s use, typically routing directly to <code class="language-plaintext highlighter-rouge">169.254.171.0</code> (there’s also an IPv6 equivalent). This is not the first network that AWS exposes via link-local addresses. You may know of:</p>

<ul>
  <li>EC2’s Instance Metadata Service, which is located at <code class="language-plaintext highlighter-rouge">169.254.169.254</code></li>
  <li>Route 53’s DNS Resolver, which is located at <code class="language-plaintext highlighter-rouge">169.254.169.253</code></li>
  <li>ECS’s Task Metadata Endpoint, which is located at <code class="language-plaintext highlighter-rouge">169.254.170.2</code></li>
  <li>Amazon Time Sync Service (NTP), which is located at <code class="language-plaintext highlighter-rouge">169.254.169.123</code></li>
</ul>

<p>Generally, these endpoints are automatically available to clients within the VPC network without any special routing or security rules. VPC Lattice differs from this slightly, as it requires Security Groups and NACLs to allow traffic to and from the VPC Lattice data plane at <code class="language-plaintext highlighter-rouge">169.254.171.0/24</code> on whichever port the destination service exposes. I was pretty surprised by this requirement when I saw it as it’s the first link-local address to need this, but it does give network administrators some basic control. Generally, it’s advised to use a <a href="https://docs.aws.amazon.com/vpc-lattice/latest/ug/security-groups.html#managed-prefix-list">managed prefix list</a> instead of the exact range above, as it’s subject to change.</p>
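
<p>If you want to sanity-check these addresses, Python’s standard <code class="language-plaintext highlighter-rouge">ipaddress</code> module can confirm that each one is link-local and whether it falls inside the VPC Lattice range quoted above. The range is hard-coded here purely for illustration; as noted, the managed prefix list is the better source of truth.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import ipaddress

LATTICE_RANGE = ipaddress.ip_network("169.254.171.0/24")

endpoints = [
    ("VPC Lattice", "169.254.171.0"),
    ("Instance Metadata Service", "169.254.169.254"),
    ("ECS Task Metadata", "169.254.170.2"),
]

# Prints: name, whether it is link-local, whether it is in the Lattice range.
for label, addr in endpoints:
    ip = ipaddress.ip_address(addr)
    print(label, ip.is_link_local, ip in LATTICE_RANGE)
</code></pre></div></div>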

<p>Targets which VPC Lattice connects to closely match that of load balancing target groups, including EC2 instances, VPC IP addresses (both IPv4 and IPv6), Lambda functions, and ALBs. An EKS-specific target type is in private beta as of the time of writing.</p>

<h2 id="a-walkthrough">A walkthrough</h2>

<p><img src="/images/posts/vpclattice.drawio.png" alt="" /></p>

<p>For this walkthrough, we’ll discuss the various components needed for a VPC Lattice setup. For simplicity, we’ll be creating a Lambda function as a client (initiates an HTTPS request), and another Lambda function as a server (responds to the HTTPS request). If you want to skip ahead, here’s the <a href="https://github.com/iann0036/vpc-lattice-demo/blob/main/template.yaml">completed template</a>.</p>

<p>Let’s begin by creating a basic VPC. The VPC will have two private subnets, but we won’t add any direct routing between them. For simplicity, we’ll also skip adding Network ACLs.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Resources:

  # Basic VPC

  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsHostnames: true
      EnableDnsSupport: true

  PrivateSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      CidrBlock: 10.0.0.0/24
      MapPublicIpOnLaunch: false
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: Private Subnet (Source Subnet)
      AvailabilityZone: !Select
        - 0
        - Fn::GetAZs: !Ref AWS::Region

  PrivateSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      CidrBlock: 10.0.1.0/24
      MapPublicIpOnLaunch: false
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: Private Subnet (Destination Subnet)
      AvailabilityZone: !Select
        - 1
        - Fn::GetAZs: !Ref AWS::Region

  RouteTablePrivate1:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: Private Route Table (Source Subnet)

  RouteTablePrivate1Association1:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref RouteTablePrivate1
      SubnetId: !Ref PrivateSubnet1

  RouteTablePrivate2:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: Private Route Table (Destination Subnet)

  RouteTablePrivate2Association1:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref RouteTablePrivate2
      SubnetId: !Ref PrivateSubnet2
</code></pre></div></div>

<p>Next, we’ll create the service itself. The service will be a Lambda function which returns a basic successful response to any request, whilst including its own event payload in its response body. The function will be within the second private subnet within the VPC, and its security group will only have a single inbound rule from the VPC Lattice service on the port on which it serves.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  # Inbound Lambda (Service)

  InboundLambdaFunctionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
      Policies:
        - PolicyName: root
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                  - xray:PutTraceSegments
                  - xray:PutTelemetryRecords
                  - ec2:CreateNetworkInterface
                  - ec2:DescribeNetworkInterfaces
                  - ec2:DeleteNetworkInterface
                Resource: '*'

  InboundLambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
      Role: !GetAtt InboundLambdaFunctionRole.Arn
      TracingConfig:
        Mode: Active
      Runtime: python3.9
      Timeout: 10
      Code:
        ZipFile: |
          import os
          import json
          import http.client

          def handler(event, context):
            print(event)
            return {
              "statusCode": 200,
              "body": json.dumps({
                "success": "true",
                "capturedEvent": event
              }),
              "headers": {
                "Content-Type": "application/json"
              }
            }
      VpcConfig:
        SecurityGroupIds:
          - !Ref InboundLambdaFunctionSecurityGroup
        SubnetIds:
          - !Ref PrivateSubnet2

  InboundLambdaFunctionSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for InboundLambdaFunction
      VpcId: !Ref VPC
      SecurityGroupEgress: []
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 169.254.171.0/24 # should be the prefix list instead, this'll work though
      GroupName: demo-inboundsg
</code></pre></div></div>

<p>Next up, we’ll create the components of the VPC Lattice service itself. This includes:</p>
<ul>
  <li>The service network</li>
  <li>A security group which controls which clients may access the service network</li>
  <li>The service we are creating</li>
  <li>A listener for the service (HTTPS on port 443)</li>
  <li>A target group for the listener to point to, with an initial target of the previously created Lambda function</li>
</ul>

<p>To keep things simple, we’re not adding an auth policy for the service network or the service itself.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  # VPC Lattice

  VPCLatticeServiceNetwork:
    Type: AWS::VpcLattice::ServiceNetwork
    Properties:
      Name: demo-servicenetwork
      AuthType: NONE

  VPCLatticeServiceNetworkSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for service network access
      VpcId: !Ref VPC
      SecurityGroupEgress: []
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: !GetAtt VPC.CidrBlock
      GroupName: demo-servicenetworksg

  VPCLatticeServiceNetworkVPCAssociation:
    Type: AWS::VpcLattice::ServiceNetworkVpcAssociation
    Properties:
      SecurityGroupIds:
        - !Ref VPCLatticeServiceNetworkSecurityGroup
      ServiceNetworkIdentifier: !Ref VPCLatticeServiceNetwork
      VpcIdentifier: !Ref VPC

  VPCLatticeService:
    Type: AWS::VpcLattice::Service
    Properties:
      Name: demo-service
      AuthType: NONE

  VPCLatticeServiceNetworkServiceAssociation:
    Type: AWS::VpcLattice::ServiceNetworkServiceAssociation
    Properties:
      ServiceNetworkIdentifier: !Ref VPCLatticeServiceNetwork
      ServiceIdentifier: !Ref VPCLatticeService

  VPCLatticeListener:
    Type: AWS::VpcLattice::Listener
    Properties:
      Name: demo-listener
      Port: 443
      Protocol: HTTPS
      ServiceIdentifier: !Ref VPCLatticeService
      DefaultAction:
        Forward:
          TargetGroups:
            - TargetGroupIdentifier: !Ref VPCLatticeTargetGroup
              Weight: 100

  VPCLatticeTargetGroup:
    Type: AWS::VpcLattice::TargetGroup
    Properties:
      Name: demo-targetgroup
      Type: LAMBDA
      Targets:
        - Id: !GetAtt InboundLambdaFunction.Arn
</code></pre></div></div>

<p>It’s important to note that associating the service network with the VPC creates routes within the VPC’s route table that send traffic destined for <code class="language-plaintext highlighter-rouge">169.254.171.0/24</code> to the VPC Lattice service.</p>

<p><img src="/images/posts/vpclattice-1.png" alt="" /></p>

<p>The target group also automatically adds a resource-based policy statement to the Lambda function for you (some other services require you to explicitly add an <code class="language-plaintext highlighter-rouge">AWS::Lambda::Permission</code>).</p>

<p><img src="/images/posts/vpclattice-3.png" alt="" /></p>

<p>Finally, we’ll create the client which will send requests to the VPC Lattice service. Again, this will be driven via a basic Lambda function. Note that this time, the security group requires an outbound rule towards the VPC Lattice service.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  # Outbound Lambda (Client)

  OutboundLambdaFunctionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
      Policies:
        - PolicyName: root
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                  - xray:PutTraceSegments
                  - xray:PutTelemetryRecords
                  - ec2:CreateNetworkInterface
                  - ec2:DescribeNetworkInterfaces
                  - ec2:DeleteNetworkInterface
                Resource: '*'

  OutboundLambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
      Role: !GetAtt OutboundLambdaFunctionRole.Arn
      TracingConfig:
        Mode: Active
      Runtime: python3.9
      Environment:
        Variables:
          ENDPOINT: !GetAtt VPCLatticeServiceNetworkServiceAssociation.DnsEntry.DomainName
      Timeout: 10
      Code:
        ZipFile: |
          import os
          import json
          import http.client

          def handler(event, context):
            conn = http.client.HTTPSConnection(os.environ["ENDPOINT"])

            conn.request("POST", "/", json.dumps(event), {
              "Content-Type": 'application/json'
            })
            res = conn.getresponse()
            data = res.read()

            print(data.decode("utf-8"))
      VpcConfig:
        SecurityGroupIds:
          - !Ref OutboundLambdaFunctionSecurityGroup
        SubnetIds:
          - !Ref PrivateSubnet1

  OutboundLambdaFunctionSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for OutboundLambdaFunction
      VpcId: !Ref VPC
      SecurityGroupEgress:
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 169.254.171.0/24 # should be the prefix list instead, this'll work though
      SecurityGroupIngress: []
      GroupName: demo-outboundsg
</code></pre></div></div>

<p>Now that our template is done, we can deploy it via CloudFormation. If you got stuck anywhere, try the pre-made version <a href="https://github.com/iann0036/vpc-lattice-demo">here</a>.</p>

<p>Once deployed, navigate to the Lambda console and find the function named something similar to “OutboundLambdaFunction”. Create a test event using any JSON object and invoke it. You should see the results from the service come back to you by observing the logs.</p>

<p><img src="/images/posts/vpclattice-2.png" alt="" /></p>

<h2 id="a-note-on-pricing">A note on pricing</h2>

<p>It’s worth noting that the pricing model for VPC Lattice is different to that of PrivateLink and will probably end up costing you more overall. For N. Virginia, a PrivateLink service costs $0.01/hour <em>per availability zone</em>, plus $0.01/GB with volume discounts. For the same region, a VPC Lattice service costs $0.025/hour <em>regardless of AZs</em>, plus $0.025/GB with no volume discounts, plus $0.10 per million requests (with the first 300k requests per hour free).</p>
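
<p>To make the comparison concrete, here is a rough sketch of the arithmetic using only the N. Virginia rates quoted above. The 730-hour month, the constant hourly request rate and the function names are my own assumptions, and PrivateLink volume discounts are ignored, so treat the output as a ballpark rather than a bill.</p>

```python
HOURS_PER_MONTH = 730  # assumed average month length

def privatelink_monthly_cost(az_count, gb_processed):
    # $0.01/hour per availability zone, plus $0.01/GB (volume discounts ignored)
    return 0.01 * az_count * HOURS_PER_MONTH + 0.01 * gb_processed

def lattice_monthly_cost(gb_processed, requests_per_hour):
    # $0.025/hour regardless of AZs, plus $0.025/GB
    hourly = 0.025 * HOURS_PER_MONTH
    data = 0.025 * gb_processed
    # the first 300k requests in each hour are free, then $0.10 per million
    billable_per_hour = max(0, requests_per_hour - 300_000)
    requests = billable_per_hour * HOURS_PER_MONTH / 1_000_000 * 0.10
    return hourly + data + requests

# Hypothetical workload: 2 AZs, 1,000 GB per month, a steady 500k requests per hour
print(f"PrivateLink: ${privatelink_monthly_cost(2, 1000):.2f}")
print(f"VPC Lattice: ${lattice_monthly_cost(1000, 500_000):.2f}")
```

<p>For this hypothetical workload, VPC Lattice comes out at a little over double the PrivateLink figure, with the per-GB rate being the largest contributor.</p>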

<h2 id="wrapping-up">Wrapping up</h2>

<p>I’m interested to see how architectures will evolve with this new technology. Whilst PrivateLink remains more affordable and already widespread, I can see architects reaching for this new technology to improve their security posture and reduce the load on networking engineers.</p>

<p>If you liked what I’ve written, or want to hear more on this topic, reach out to me on Twitter at <a href="https://twitter.com/iann0036">@iann0036</a>.</p>]]></content><author><name>Ian Mckay</name></author><summary type="html"><![CDATA[Today, AWS has released Amazon VPC Lattice to General Availability. This post walks through creating a simple VPC Lattice service using CloudFormation, and takes a look at the service overall.]]></summary></entry><entry><title type="html">Cedar: A new policy language</title><link href="https://onecloudplease.com/blog/cedar-a-new-policy-language" rel="alternate" type="text/html" title="Cedar: A new policy language" /><published>2023-01-11T00:00:00+00:00</published><updated>2023-01-11T00:00:00+00:00</updated><id>https://onecloudplease.com/blog/cedar-a-new-policy-language</id><content type="html" xml:base="https://onecloudplease.com/blog/cedar-a-new-policy-language"><![CDATA[<p><img src="/images/posts/cedar-photo.png" alt="" /></p>

<p>Cedar is a new language created by AWS to define access permissions using policies, similar to the way IAM policies work today. In this post, we’ll look at why this language was created, how to author the policies, and some additional features of the language. The language was designed by the <a href="https://www.amazon.science/blog/a-gentle-introduction-to-automated-reasoning">Amazon automated reasoning team</a> for use in new services such as <a href="https://aws.amazon.com/verified-permissions/">Amazon Verified Permissions</a>, <a href="https://aws.amazon.com/verified-access/">AWS Verified Access</a> and likely other future services and integrations.</p>

<h2 id="why-write-a-new-language">Why write a new language?</h2>

<p>IAM policies, introduced <a href="https://aws.amazon.com/blogs/aws/iam-identity-access-management/">over 11 years ago</a>, have been integrated into the AWS ecosystem as the fundamental way to control both human and system access to AWS resources. IAM policies are highly optimized for AWS and have constructs (like ARNs) which make them unsuitable for use with principals and resources outside of AWS.</p>

<p>Cedar is a <em>generalist</em> language which has no implicit AWS constructs within it, and this allows it to be used as an authorization engine for non-AWS applications. This is why it’s used at the core of the Amazon Verified Permissions service, where AWS manages the policy dataset and allows systems to directly make authorization calls against the evaluation engine. Incidentally, the name “Cedar” was coined as a follow on from the internal policy language of IAM, “Balsa”.</p>

<p>Cedar is written in Rust, which helps it evaluate requests in milliseconds, and it was designed so that the effect of policies is simple to reason about. For example, it allows for the creation of tooling which takes two policies and determines whether they are exactly equivalent, or whether there are authorization requests whose result would differ when evaluated against each policy.</p>

<h2 id="how-it-works">How it works</h2>

<p>The policy evaluation engine for the Cedar language takes one or more policies, and evaluates whether a requested action is permitted or forbidden (allowed or denied). Cedar requires the principal making the request, the action being taken, the resource being accessed, and optionally additional request context at the time of the authorization call. Cedar also consumes the policies to be evaluated and may also use a list of entities (principals, actions and resources) that exist within your application, however these may be provided ahead of time or indirectly depending upon the service integration.</p>

<p>The request context object may be set by the requesting application or, in the case of AWS Verified Access, <a href="https://docs.aws.amazon.com/verified-access/latest/ug/trust-data-default-context.html">defined</a> by the service.</p>

<p>Cedar has a <a href="https://www.cedarpolicy.com/playground">playground</a> which allows you to play with the engine itself. It is also currently integrated into the Amazon Verified Permissions and AWS Verified Access services. As of the time of writing, Cedar is not available as an open-source or otherwise downloadable library.</p>

<h3 id="syntax">Syntax</h3>

<p>A typical Cedar policy statement looks like the following:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit(
    principal == User::"John",
    action == Action::"view",
    resource
)
when {
    resource in Folder::"John's Stuff" &amp;&amp;
    context.authenticated == true
};
</code></pre></div></div>

<p>A policy can contain a number of statements by simply appending them onto the policy document. The syntax is not whitespace dependent and may be compressed into a single line. Typically, principals and resources should use immutable identifiers and not names. The examples in this post use simple names for readability purposes only.</p>

<p>The policy contains the following parts:</p>

<ol>
  <li>The effect, which will always be either <code class="language-plaintext highlighter-rouge">permit</code> or <code class="language-plaintext highlighter-rouge">forbid</code></li>
  <li>The scope, which specifies the principals, actions, and resources to which the effect applies</li>
  <li>Optionally, condition clauses, which may either be a <code class="language-plaintext highlighter-rouge">when</code> or an <code class="language-plaintext highlighter-rouge">unless</code> condition</li>
</ol>

<p>Entities (principals, actions or resources) will always follow the format <code class="language-plaintext highlighter-rouge">TypeOfEntity::"UniqueIdentifier"</code>. The type of entity may be further namespaced, for example, <code class="language-plaintext highlighter-rouge">Company::Account::Department::Person::"John"</code>.</p>

<p>The role of an entity is ambiguous and is not determined by its type or namespace. This means a single entity can be either a principal, action or resource, depending upon the specific context. The only exception is that actions must have their rightmost namespace use the keyword <code class="language-plaintext highlighter-rouge">Action</code> (e.g. <code class="language-plaintext highlighter-rouge">Action::"MyAction"</code>, <code class="language-plaintext highlighter-rouge">CustomNamespace::Action::"MyAction"</code>).</p>
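
<p>As a rough illustration of the identifier format, the sketch below splits an entity identifier into its type path and unique identifier, then applies the action rule described above. This is my own approximation in Python, not Cedar’s parser: the function names are hypothetical and escaped quotes inside the identifier are not handled.</p>

```python
import re

# An optionally namespaced type followed by a quoted identifier,
# e.g. Company::Account::"John". Escaped quotes in the id are not handled.
ENTITY_UID = re.compile(r'((?:[A-Za-z_]\w*::)+)"(.*)"')

def parse_entity_uid(uid):
    match = ENTITY_UID.fullmatch(uid)
    if match is None:
        raise ValueError(f"not an entity identifier: {uid!r}")
    # "Company::Account::" becomes ["Company", "Account"]
    type_path = match.group(1).rstrip(":").split("::")
    return type_path, match.group(2)

def is_action(uid):
    # Actions must use Action as the rightmost part of their type
    type_path, _ = parse_entity_uid(uid)
    return type_path[-1] == "Action"

print(parse_entity_uid('Company::Account::Department::Person::"John"'))
print(is_action('CustomNamespace::Action::"MyAction"'))
print(is_action('User::"John"'))
```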

<h3 id="evaluation-logic">Evaluation logic</h3>

<p>When evaluating a request, Cedar will consider all statements within the policy, and in the case of Amazon Verified Permissions, all policies provided in a policy store (as if it were one big policy). If <em>any</em> <code class="language-plaintext highlighter-rouge">forbid</code> statement matches the request, the request will be denied, regardless of any <code class="language-plaintext highlighter-rouge">permit</code> statements. If <em>at least one</em> <code class="language-plaintext highlighter-rouge">permit</code> statement matches the request (and no <code class="language-plaintext highlighter-rouge">forbid</code> statements match), the request will be allowed. If no statements match, the request will be implicitly denied.</p>

<p>If you’ve worked with AWS IAM, you’ll recognize that Cedar’s policy evaluation logic is the same. This also means that the ordering of statements in a policy is irrelevant and has no effect on the outcome of an authorization request.</p>

<p>Because <code class="language-plaintext highlighter-rouge">forbid</code> statements are applied universally without the ability to override, they are commonly used to craft guardrails across the entire policy store.</p>
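
<p>The decision rules in this section can be sketched in a few lines of Python. This is a simplified model of the logic, not the Cedar engine: each statement is reduced to an effect plus a hypothetical predicate function that says whether it matches the request.</p>

```python
def decide(statements, request):
    """statements is a list of (effect, matches) pairs, where effect is
    "permit" or "forbid" and matches is a function of the request."""
    matched = {effect for effect, matches in statements if matches(request)}
    if "forbid" in matched:
        return "Deny"   # any matching forbid wins
    if "permit" in matched:
        return "Allow"  # at least one permit and no forbid
    return "Deny"       # nothing matched: implicit deny

# Hypothetical statements for illustration only
statements = [
    ("permit", lambda r: r["action"] == "view"),
    ("forbid", lambda r: not r["context"].get("authenticated", False)),
]

request = {"action": "view", "context": {"authenticated": False}}
print(decide(statements, request))                  # the forbid matches, so Deny
print(decide(list(reversed(statements)), request))  # same result in any order
```

<p>Because the matched effects are collected into a set before any decision is made, the order in which statements are supplied cannot change the outcome.</p>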

<h3 id="the-scope">The scope</h3>

<p>The scope is written in a way that almost looks like a set of arguments in a function. It always consists of the keywords <code class="language-plaintext highlighter-rouge">principal</code>, <code class="language-plaintext highlighter-rouge">action</code> and <code class="language-plaintext highlighter-rouge">resource</code>. Each of these keywords may optionally be followed by either an <code class="language-plaintext highlighter-rouge">== Some::"Entity"</code> or an <code class="language-plaintext highlighter-rouge">in Some::"Group"</code> to scope down the principals, actions or resources to which the statement applies. In addition, an inline set in the form <code class="language-plaintext highlighter-rouge">in [ Some::"Entity", SomeOther::"Entity", ... ]</code> can be used for the <code class="language-plaintext highlighter-rouge">action</code> keyword only. When no keywords have this suffix, the policy applies to all requests, so long as the conditions are met.</p>

<p>The scope is generally used for role-based access control, where you would like to apply policies scoped to a specific resource, action or principal, a defined set of them, or a combination thereof.</p>

<h3 id="condition-clauses">Condition clauses</h3>

<p>Condition clauses further limit whether a policy takes effect for the specific request. Typically policy statements will either have no condition clauses or one condition clause, however the syntax does allow for any number of condition clauses to form a statement.</p>

<p>Condition clauses are more flexible than the scope, featuring a basic set of <a href="https://docs.aws.amazon.com/verified-access/latest/ug/built-in-policy-operators.html">operators</a> that let you form a boolean result based on the principal, action, resource or context of the request, as well as the attributes or nested hierarchy of these entities where a list of entities has been defined. Logical operators such as <code class="language-plaintext highlighter-rouge">&amp;&amp;</code> and <code class="language-plaintext highlighter-rouge">||</code> allow you to form long, complex conditions to match your specific requirements. The <code class="language-plaintext highlighter-rouge">like</code> operator allows you to perform string matching with the use of a <code class="language-plaintext highlighter-rouge">*</code> wildcard character.</p>
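
<p>To show what the <code class="language-plaintext highlighter-rouge">*</code> wildcard does, here is a small Python approximation of <code class="language-plaintext highlighter-rouge">like</code>. The <code class="language-plaintext highlighter-rouge">cedar_like</code> name is made up, and the sketch assumes every <code class="language-plaintext highlighter-rouge">*</code> in the pattern is a wildcard, so it has no way to match a literal asterisk.</p>

```python
import re

def cedar_like(value, pattern):
    # Treat every other character literally and let each * match any run
    # of characters, including none at all.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

print(cedar_like("Sydney East", "Sydney*"))
print(cedar_like("Los Angeles", "Sydney*"))
print(cedar_like("blogpost.txt", "*.txt"))
```

<p>Python’s <code class="language-plaintext highlighter-rouge">fnmatch</code> module was deliberately avoided here, as it gives special meaning to characters other than <code class="language-plaintext highlighter-rouge">*</code>.</p>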

<p>Condition clauses are intended to perform attribute-based access control. Though it is possible to include scope conditions within a condition clause, exactly the way you would in the scope, it’s recommended that you retain those scope conditions in the scope for both readability and performance reasons.</p>

<h2 id="additional-language-features">Additional language features</h2>

<p>Using the above syntax is all you need to start writing basic statements to permit or forbid access to your application, however there are some more features of the language which we’ll go through. Some of these features may not be available or useful depending upon the service into which Cedar is integrated.</p>

<h3 id="comments">Comments</h3>

<p>Policies may contain the <code class="language-plaintext highlighter-rouge">//</code> operator to add comments, which are particularly useful for indicating an abstract identifier, for example:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// the following was added by the accounts team
// it was approved by Jane Doe
permit(
    principal == User::"9a6afab1-5a37-4c90-aa40-24277b93ca28", // John Smith
    action,
    resource == Account::"710f18bc-b8ab-4313-b362-8e6264cfcf91" // MyCorp Dev Account
);
</code></pre></div></div>

<h3 id="entities">Entities</h3>

<p>Cedar supports accepting a list of known entities (resources, actions or principals) within a system. This is helpful as you may author policies which interact with the hierarchy or attributes of the entities within condition clauses. When an authorization request is made, the principal, action and resource identifiers will correlate to the defined entity of the same identifier when present in the entity list.</p>

<p>The structure of the entity list differs from service to service. In the Cedar playground, the entity list looks like the following:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[
  {
    "uid": "User::\"john\"",
    "parents": [
      "UserGroup::\"Staff\""
    ],
    "attrs": {
      "department": "Hardware Engineering",
      "age": 30
    }
  },
  {
    "uid": "UserGroup::\"Staff\""
  }
]
</code></pre></div></div>

<p>In Amazon Verified Permissions (for an <code class="language-plaintext highlighter-rouge">IsAuthorized</code> call), the same entity list would look like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[
  {
    "EntityId": {
      "EntityType": "User",
      "EntityId": "john"
    },
    "Parents": [
      {
        "EntityType": "UserGroup",
        "EntityId": "Staff"
      }
    ],
    "Attributes": {
      "department": {
        "String": "Hardware Engineering"
      },
      "age": {
        "Long": 30
      }
    }
  },
  {
    "EntityId": {
      "EntityType": "UserGroup",
      "EntityId": "Staff"
    }
  }
]
</code></pre></div></div>
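
<p>Since the two shapes carry the same information, converting between them is mechanical. The following Python sketch maps the playground format onto the Amazon Verified Permissions format exactly as shown above. The function names are hypothetical, and only the <code class="language-plaintext highlighter-rouge">String</code> and <code class="language-plaintext highlighter-rouge">Long</code> wrappers that appear in the example are handled.</p>

```python
import json
import re

# Splits User::"john" into the type (User) and the identifier (john)
UID = re.compile(r'(.+)::"(.*)"')

def to_avp_id(uid):
    match = UID.fullmatch(uid)
    if match is None:
        raise ValueError(f"not an entity identifier: {uid!r}")
    return {"EntityType": match.group(1), "EntityId": match.group(2)}

def to_avp_value(value):
    # Only the two wrappers shown in the example above are handled
    if isinstance(value, str):
        return {"String": value}
    if isinstance(value, int) and not isinstance(value, bool):
        return {"Long": value}
    raise TypeError(f"unsupported attribute type: {type(value).__name__}")

def to_avp_entity(entity):
    converted = {"EntityId": to_avp_id(entity["uid"])}
    if "parents" in entity:
        converted["Parents"] = [to_avp_id(p) for p in entity["parents"]]
    if "attrs" in entity:
        converted["Attributes"] = {
            name: to_avp_value(value) for name, value in entity["attrs"].items()
        }
    return converted

playground_entities = [
    {
        "uid": 'User::"john"',
        "parents": ['UserGroup::"Staff"'],
        "attrs": {"department": "Hardware Engineering", "age": 30},
    },
    {"uid": 'UserGroup::"Staff"'},
]

converted = [to_avp_entity(entity) for entity in playground_entities]
print(json.dumps(converted, indent=2))
```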

<p>We can use the known attributes in the entity to construct policies that permit or forbid access. For example:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit(
    principal,
    action == Action::"Access",
    resource == Room::"Drinks Lounge"
) when {
    principal.age &gt;= 18
};
</code></pre></div></div>

<p>This policy allows access only when the principal has the attribute “age”, and its value is equal to or greater than the number 18. If the age attribute wasn’t set, or the principal wasn’t defined at all in the entities list, this statement wouldn’t permit access.</p>

<p>Entities can also form a hierarchy, at any nesting level, which policies can act upon. For example:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit(
    principal,
    action == Action::"Access",
    resource == Room::"Common Area"
) when {
    principal in UserGroup::"Staff"
};
</code></pre></div></div>

<p>This policy allows any entity which has a parent of the <code class="language-plaintext highlighter-rouge">UserGroup::"Staff"</code> entity access. Once again, if the entity isn’t defined or isn’t a child of <code class="language-plaintext highlighter-rouge">UserGroup::"Staff"</code>, this statement wouldn’t permit access. The <code class="language-plaintext highlighter-rouge">in</code> operator applies to both direct children, as well as all descendants of those children. Additionally, the <code class="language-plaintext highlighter-rouge">in</code> operator also applies to the referenced parent, i.e. if the principal was <code class="language-plaintext highlighter-rouge">UserGroup::"Staff"</code> in the above example the policy would permit access.</p>
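
<p>The behaviour of <code class="language-plaintext highlighter-rouge">in</code> described above amounts to a “self or any ancestor” check. Here is a minimal Python sketch of that check, assuming the hierarchy is available as a simple mapping from each entity to its direct parents. The <code class="language-plaintext highlighter-rouge">UserGroup::"Everyone"</code> entity is invented for the example.</p>

```python
def is_in(entity, ancestor, parents):
    """parents maps each entity identifier to a list of its direct parents."""
    seen = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True  # covers the entity itself as well as any ancestor
        if current in seen:
            continue     # already explored via another path
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False

# Hypothetical hierarchy: john is in Staff, and Staff is in Everyone
parents = {
    'User::"john"': ['UserGroup::"Staff"'],
    'UserGroup::"Staff"': ['UserGroup::"Everyone"'],
}

print(is_in('User::"john"', 'UserGroup::"Staff"', parents))        # direct child
print(is_in('User::"john"', 'UserGroup::"Everyone"', parents))     # deeper descendant
print(is_in('UserGroup::"Staff"', 'UserGroup::"Staff"', parents))  # the group itself
print(is_in('User::"guest"', 'UserGroup::"Staff"', parents))       # undefined entity
```

<p>The first three calls return true and the last returns false, matching the cases described above.</p>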

<h3 id="extensions">Extensions</h3>

<p>In addition to the base data types of strings, booleans, integers and sets/arrays, Cedar supports the additional data types of IP addresses and decimals. These two data types can only be declared using a function call-like syntax, and can only be operated on using their in-built methods. These data types are known as extensions.</p>

<p>In the case of IP addresses, the syntax looks like the following:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit(
    principal,
    action,
    resource
) when {
    ip(context.client_ip).isInRange(ip("10.0.0.0/8"))
};
</code></pre></div></div>

<p>The IP address type is created using the <code class="language-plaintext highlighter-rouge">ip(...)</code> syntax, and calls the <code class="language-plaintext highlighter-rouge">isInRange(...)</code> function to return a boolean. A similar effect is seen for the use of the decimal types:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>forbid(
    principal,
    action,
    resource
) when {
    decimal(context.risk_score).greaterThan(decimal("7.2"))
};
</code></pre></div></div>

<p>Because Cedar does not allow any floating point types to be passed in, inputs must be in the form of a string (i.e. “8.24”). Decimal supports up to 4 digits after the decimal point.</p>

<p>Both extensions have a number of other methods available, all of which currently return a boolean result.</p>
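
<p>If it helps to think about these in terms of a general purpose language, the two extensions behave much like Python’s <code class="language-plaintext highlighter-rouge">ipaddress</code> and <code class="language-plaintext highlighter-rouge">decimal</code> modules. The sketch below is an analogy rather than Cedar’s implementation. The helper names are mine, and I’m assuming a decimal string needs at least one digit on each side of the decimal point.</p>

```python
import ipaddress
import re
from decimal import Decimal

def ip_in_range(address, cidr):
    # Rough equivalent of ip(address).isInRange(ip(cidr))
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)

def ip_is_loopback(address):
    # Rough equivalent of ip(address).isLoopback()
    return ipaddress.ip_address(address).is_loopback

# Digits, a decimal point, then between one and four further digits
DECIMAL_FORMAT = re.compile(r"-?\d+\.\d{1,4}")

def parse_decimal(text):
    # Inputs must be strings, never floats, so no precision is lost
    if not isinstance(text, str) or DECIMAL_FORMAT.fullmatch(text) is None:
        raise ValueError(f"not a valid decimal string: {text!r}")
    return Decimal(text)

print(ip_in_range("10.0.1.54", "10.0.0.0/8"))
print(ip_is_loopback("127.0.0.1"))
print(parse_decimal("8.24"))
```

<p>Passing <code class="language-plaintext highlighter-rouge">8.24</code> as a float, or a string such as <code class="language-plaintext highlighter-rouge">"8.24001"</code> with more than 4 digits after the decimal point, raises an error in this sketch.</p>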

<h3 id="policy-templates">Policy templates</h3>

<p>Policy templates are a Cedar feature useful for applying a common policy to a large group of principals or resources. A policy template allows you to add a variable substitution to the <code class="language-plaintext highlighter-rouge">==</code> or <code class="language-plaintext highlighter-rouge">in</code> operators in the scope block for the <code class="language-plaintext highlighter-rouge">principal</code> and/or <code class="language-plaintext highlighter-rouge">resource</code> keywords. A policy template by itself is not effective, but allows policies to be created by simply providing the variable values instead of duplicating the full syntax. Policies generated from policy templates will automatically update if a policy template changes. A policy template may look like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit(
    principal == ?principal,
    action == Action::"download",
    resource in ?resource
) when {
    context.mfa == true
};
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">?principal</code> and <code class="language-plaintext highlighter-rouge">?resource</code> keywords represent the variables that may be substituted. A policy created from this template would allow the principal to download all children of the resource when accessing using MFA.</p>
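
<p>Conceptually, creating a policy from a template is just filling in the two placeholders. The Python sketch below shows the idea using plain text substitution, with <code class="language-plaintext highlighter-rouge">User::"Harry"</code> and <code class="language-plaintext highlighter-rouge">Folder::"Reports"</code> as made-up values. A real service presumably keeps a link between the policy and its template rather than copying text, which is what allows policies to update when the template changes.</p>

```python
TEMPLATE = '''permit(
    principal == ?principal,
    action == Action::"download",
    resource in ?resource
) when {
    context.mfa == true
};'''

def instantiate(template, principal, resource):
    # Plain text substitution of the two placeholders, for illustration only
    return template.replace("?principal", principal).replace("?resource", resource)

policy = instantiate(TEMPLATE, 'User::"Harry"', 'Folder::"Reports"')
print(policy)
```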

<h2 id="examples">Examples</h2>

<p>The following is a set of examples to help you get started and understand the language.</p>

<h3 id="allow-all">Allow all</h3>

<p><em>Policy:</em></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit(
    principal,
    action,
    resource
);
</code></pre></div></div>

<p>This statement permits all requests. It may be restricted by <code class="language-plaintext highlighter-rouge">forbid</code> statements elsewhere in the policy set.</p>

<h3 id="deny-all">Deny all</h3>

<p><em>Policy:</em></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>forbid(
    principal,
    action,
    resource
);
</code></pre></div></div>

<p>This statement forbids all requests. It cannot be overridden and renders all other statements in the policy set useless.</p>

<h3 id="specific-rbac-policy">Specific RBAC policy</h3>

<p><em>Policy:</em></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit(
    principal == Customer::"John",
    action == Action::"checkout",
    resource == CheckoutCounter::"12"
);
</code></pre></div></div>

<p>This statement allows customer “John” to check out at checkout counter 12.</p>

<h3 id="when-condition-clause">When condition clause</h3>

<p><em>Policy:</em></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit(
    principal,
    action == Action::"connectDatabase",
    resource == Database::"db1"
) when {
    context.port == 5432
};
</code></pre></div></div>

<p><em>Context:</em></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "port": 5432
}
</code></pre></div></div>

<p>This statement allows any principal to connect to database “db1”, so long as the “port” attribute in their request context is 5432.</p>

<h3 id="unless-condition-clause">Unless condition clause</h3>

<p><em>Policy:</em></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit(
    principal,
    action in [HTTPMethod::Action::"GET", HTTPMethod::Action::"POST", HTTPMethod::Action::"DELETE"],
    resource
) unless {
    [Viewer::"anonymous", Viewer::"unknown"].contains(principal) ||
    context.waf_risk_rating &gt;= 7
};
</code></pre></div></div>

<p><em>Context:</em></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "waf_risk_rating": 8
}
</code></pre></div></div>

<p>This statement allows any principal to perform a HTTP GET, POST or DELETE against any resource unless they are identified as an anonymous or unknown viewer or their WAF risk rating is greater than or equal to 7.</p>

<h3 id="ip-and-decimal-usage">IP and decimal usage</h3>

<p><em>Policy:</em></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit(
    principal,
    action == HTTPMethod::Action::"GET",
    resource
) when {
    (
        // local subnet or same machine
        ip(context.http_request.client_ip).isInRange(ip("10.0.0.0/8")) ||
        ip(context.http_request.client_ip).isLoopback()
    ) &amp;&amp;
    decimal(context.risk_score).lessThan(decimal("6.5"))
};
</code></pre></div></div>

<p><em>Context:</em></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "http_request": {
        "client_ip": "10.0.1.54"
    },
    "risk_score": "4.7"
}
</code></pre></div></div>

<p>This statement allows any principal to perform a HTTP GET against any resource when their IP address is within the 10.0.0.0/8 or loopback CIDR range and the value of the string-encoded risk score is less than 6.5.</p>

<h3 id="entity-attributes">Entity attributes</h3>

<p><em>Policy:</em></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit(
    principal,
    action == SecuritySystem::Action::"swipeCardAccess",
    resource == Room::"Sydney Boardroom"
) when {
    principal.location like "Sydney*" ||
    principal.training.contains("All Access")
};
</code></pre></div></div>

<p><em>Entities:</em></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[
    {
        "uid": "Employee::\"1453\"",
        "attrs": {
            "location": "Sydney East",
            "training": [
                "General"
            ]
        }
    },
    {
        "uid": "Employee::\"325\"",
        "attrs": {
            "location": "Los Angeles",
            "training": [
                "General",
                "All Access"
            ]
        }
    }
]
</code></pre></div></div>

<p>This statement allows any principal to swipe card access to the Sydney Boardroom if their location attribute starts with “Sydney” or their training attribute contains the “All Access” item. Both employees 1453 and 325 would be permitted under this statement.</p>

<h3 id="entity-attributes-relationship">Entity attributes relationship</h3>

<p><em>Policy:</em></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit(
    principal,
    action == HTTP::Action::"GET",
    resource
) when {
    resource.owner == principal.username
};
</code></pre></div></div>

<p><em>Entities:</em></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[
    {
        "uid": "User::\"Josh\"",
        "attrs": {
            "username": "josh1"
        }
    },
    {
        "uid": "File::\"blogpost.txt\"",
        "attrs": {
            "owner": "josh1"
        }
    }
]
</code></pre></div></div>

<p>This statement allows any principal to HTTP GET a file which they have ownership of. The entity <code class="language-plaintext highlighter-rouge">User::"Josh"</code> would be permitted to perform a <code class="language-plaintext highlighter-rouge">HTTP::Action::"GET"</code> on the <code class="language-plaintext highlighter-rouge">File::"blogpost.txt"</code> entity.</p>

<h3 id="entity-inheritance">Entity inheritance</h3>

<p><em>Policy:</em></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>forbid(
    principal,
    action,
    resource == Application::"oracle"
) unless {
    principal in Group::"Admins"
};
</code></pre></div></div>

<p><em>Entities:</em></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[
    {
        "uid": "User::\"Ian\"",
        "parents": [
            "Group::\"Admins\"",
            "Group::\"Users\""
        ]
    }
]
</code></pre></div></div>

<p>This statement forbids any principal from performing any action against the oracle application unless they are a part of the Admins group. The entity <code class="language-plaintext highlighter-rouge">User::"Ian"</code> would be exempt from this forbid statement.</p>

<h3 id="policy-template">Policy template</h3>

<p><em>Policy Template:</em></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>permit(
    principal == ?principal,
    action == Action::"Connect",
    resource == ?resource
);
</code></pre></div></div>

<p><em>Policy Variables:</em></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>principal: User::"Harry"
resource: VPN::"vpn1"
</code></pre></div></div>

<p>The policy created from the policy template allows the user Harry to connect to the VPN “vpn1”.</p>

<h2 id="wrapping-up">Wrapping up</h2>

<p>The Cedar language is both excitingly new and comfortingly familiar. It opens a new world of possible use cases and, of course, a new set of challenges and considerations. I look forward to seeing how the language gets used in real world scenarios and the ways people will architect their applications around the services Cedar supports.</p>

<p>A big thank you to members from the identity and automated reasoning teams for helping answer some questions I had during the creation of this post. If you liked what I’ve written, or want to hear more on this topic, reach out to me on Twitter at <a href="https://twitter.com/iann0036">@iann0036</a>.</p>]]></content><author><name>Ian Mckay</name></author><summary type="html"><![CDATA[Cedar is a new language created by AWS to define access permissions using policies, similar to the way IAM policies work today. In this post, we'll look at why this language was created, how to author the policies, and some additional features of the language.]]></summary></entry><entry><title type="html">Patching the AWS JavaScript SDK for Service Workers</title><link href="https://onecloudplease.com/blog/patching-the-aws-js-sdk" rel="alternate" type="text/html" title="Patching the AWS JavaScript SDK for Service Workers" /><published>2022-01-11T00:00:00+00:00</published><updated>2022-01-11T00:00:00+00:00</updated><id>https://onecloudplease.com/blog/patching-the-aws-js-sdk</id><content type="html" xml:base="https://onecloudplease.com/blog/patching-the-aws-js-sdk"><![CDATA[<p><img src="/images/posts/nodejssw.png" alt="" /></p>

<p>The AWS JavaScript SDK supports Node.js, React Native and web browsers, but what if you’re running in a <a href="https://developers.google.com/web/fundamentals/primers/service-workers">service worker</a>? In this post, I’ll explain how I modified version 2 of the AWS JavaScript SDK to run within a service worker context.</p>

<h2 id="background">Background</h2>

<p>For the <a href="https://onecloudplease.com/project/former2">Former2</a> project, I produce browser extensions for most major browsers in order to bypass the lack of CORS for the <a href="https://github.com/aws/aws-sdk-js/blob/master/SERVICES.md">majority</a> of AWS services. This means that I embed a copy of the AWS JavaScript SDK in order to make the calls needed via the browser extension, which has authority to ignore the lack of CORS.</p>

<p>The browser extensions use a “manifest”, which details the functionality of the extension and what actions are permitted. Google is <a href="https://developer.chrome.com/docs/extensions/mv3/mv2-sunset/">sunsetting</a> version 2 of the manifest for Google Chrome and requires all extensions to move to manifest version 3 by the end of 2022. Along with some <a href="https://developer.chrome.com/docs/extensions/mv3/intro/mv3-migration/">structural</a> differences, one of the major changes required is to move from background pages (logic that runs in the background of an extension) to service workers.</p>

<p>Service workers (which are a subset of JavaScript workers) have greater limitations than background pages, including the lack of access to the DOM and its features, as well as the replacement of <a href="https://developer.mozilla.org/en-US/docs/Web/API/Worker">XMLHttpRequest with fetch</a>. Service workers will also move to an inactive state if unused for a short period of time, meaning initialized variable data isn’t persisted, though I’ve skipped talking about my specific remediations to this in this article (hint: use IndexedDB).</p>

<h2 id="the-challenge">The Challenge</h2>

<p>Version 3 of the AWS JavaScript SDK is written in a way that supports running in a service worker context, but version 2 is not, for a variety of reasons. If you’re already using version 3 of the SDK, or are starting development on a service worker from scratch using version 3, you won’t have a problem.</p>

<p>As the Former2 project relies heavily on the syntax of version 2 of the SDK, and calls the majority of the services available in the SDK, I wanted to avoid a migration effort to version 3 of the SDK. Others with existing projects making heavy use of SDK version 2 that are seeking to move to service workers (or <a href="https://workers.cloudflare.com/">Cloudflare Workers</a>) might also benefit from this.</p>

<p>Note that this is not an official change, and these changes could break current or future functionality in unintended ways, so I don’t recommend you use this in a production context.</p>

<h2 id="attempting-to-import">Attempting to import</h2>

<p>After performing the changes to the browser extension manifest, my first issue was that the SDK script could no longer be directly loaded into the shared DOM model.</p>

<p><strong>Before:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"background":  {
  "scripts": [
    "aws-sdk-2.1046.0.js",
    "bg.js"
  ]
},
</code></pre></div></div>

<p><strong>After:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"background":  {
  "service_worker": "bg.js"
},
</code></pre></div></div>

<p>Service workers come with a way to load scripts using the <a href="https://developer.mozilla.org/en-US/docs/Web/API/WorkerGlobalScope/importScripts">importScripts()</a> function. So I added the following to the top of my <code class="language-plaintext highlighter-rouge">bg.js</code> script:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>importScripts("aws-sdk-2.1046.0.js");
</code></pre></div></div>

<p>With this addition, the AWS calls I asked the extension to make now failed silently, with little debugging information.</p>

<p>It’s at this point that I’d like to call out <a href="https://github.com/sk16">Saurav Kushwaha</a> for his <a href="https://github.com/aws/aws-sdk-js/issues/1902">prior work</a> in this area, which overrides the XHRClient class used in the AWS namespace with <a href="https://github.com/iann0036/aws-sdk-serviceworker/blob/master/lib/http/xhr.js#L48">fetch</a>. I did, however, need to make a couple of slight modifications to properly return correct error codes.</p>

<p>After replacing the XHRClient class, I was happy to see that some calls were successfully returning, but for some reason there were still some failures.</p>

<h2 id="xml-is-hard">XML is hard</h2>

<p>The failures I was seeing were coming from STS and S3, and I quickly realised that these were APIs that returned XML-based responses.</p>

<p>One immediate problem that actually showed error logs was that <code class="language-plaintext highlighter-rouge">window</code> was not defined, where parts of the SDK expected it to be available.</p>

<p><img src="/images/posts/sw1.png" alt="" /></p>

<p>I quickly added a one-liner to make that available during initialisation:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if(!window){var window = {}};
</code></pre></div></div>

<p>After that change, I was now receiving an error that it could not load the XML parser.</p>

<p><img src="/images/posts/sw2.png" alt="" /></p>

<p>Digging into the SDK, the logic looked like the following:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if (window.DOMParser) {
  // use the native DOM parser library
} else if (window.ActiveXObject) {
  // use the ActiveXObject to parse, a fallback for IE8 and lower
} else {
  throw new Error("Cannot load XML parser");
}
</code></pre></div></div>

<p>The SDK relies on the native DOM parser to interpret XML responses from those services, so in order to alleviate this I decided to find a polyfill to replace it. I came across the <a href="https://www.npmjs.com/package/@xmldom/xmldom">xmldom</a> module on npm and found it suitable for my needs. I did need to bundle this into a browser-compatible library, so I used <a href="https://github.com/browserify/browserify">browserify</a> to achieve this.</p>
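
<p>To illustrate why a polyfill resolves this, the sketch below mirrors the branching shown above in a standalone function and shows that attaching any <code class="language-plaintext highlighter-rouge">DOMParser</code>-compatible constructor to the <code class="language-plaintext highlighter-rouge">window</code> shim is enough to take the first branch. Both <code class="language-plaintext highlighter-rouge">selectXmlParser</code> and <code class="language-plaintext highlighter-rouge">FakeDOMParser</code> are hypothetical names used only for illustration; in the real build, the constructor would come from the bundled xmldom library.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical stand-in that mirrors the SDK's parser selection logic.
function selectXmlParser(scope) {
    if (scope.DOMParser) {
        return "DOMParser";
    } else if (scope.ActiveXObject) {
        return "ActiveXObject";
    }
    throw new Error("Cannot load XML parser");
}

// Placeholder for the constructor exported by the bundled polyfill.
function FakeDOMParser() {}

var shim = {};                      // the empty window shim from earlier
shim.DOMParser = FakeDOMParser;     // expose the polyfill where the SDK looks
var chosen = selectXmlParser(shim); // takes the first branch
</code></pre></div></div>

<p>With an empty shim and no polyfill attached, the same function throws the “Cannot load XML parser” error seen in the screenshot.</p>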

<p>After importing the new DOM parser library for use by the SDK, I re-tested the calls, which now produced valid responses end-to-end. All done, or so I thought.</p>

<h2 id="something-strange">Something strange</h2>

<p>Though my application now seemed to be working well, producing no errors and always returning valid responses, I noticed that many of my list calls (for example, <code class="language-plaintext highlighter-rouge">S3.ListBuckets</code>) weren’t returning the resources I expected to see within my account.</p>

<p>I suspected some issues with the XML parser and dumped both the raw response of the HTTP call and the object immediately after xmldom had parsed it. Both of these correctly showed the bucket names I was expecting, yet the final response returned by the SDK contained an empty array.</p>

<p><img src="/images/posts/sw3.png" alt="" /></p>

<p><img src="/images/posts/sw4.png" alt="" /></p>

<p>This one hurt my head. After debugging for probably a few hours, I found the issue. While constructing the response in a clean format, the SDK reads the properties <a href="https://developer.mozilla.org/en-US/docs/Web/API/Element/firstElementChild"><code class="language-plaintext highlighter-rouge">Element.firstElementChild</code></a> and <a href="https://developer.mozilla.org/en-US/docs/Web/API/Element/nextElementSibling"><code class="language-plaintext highlighter-rouge">Element.nextElementSibling</code></a> from the parsed object. However, xmldom <a href="https://github.com/xmldom/xmldom/issues/328">had not yet implemented</a> these properties, so the iterators were silently failing.</p>

<p>After having a look at the xmldom library to investigate whether it could be easily patched, I instead simply implemented these properties as standalone functions and replaced the SDK code which accesses these properties with calls to my implementation, as shown below:</p>

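<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Stand-in for Element.firstElementChild: returns the first child node
// that is an element (has its own tagName), skipping text nodes.
function getFirstElementChild(xml) {
  for (var i = 0; i &lt; xml.childNodes.length; i++) {
    if (xml.childNodes[i].hasOwnProperty('tagName')) {
      return xml.childNodes[i];
    }
  }
  return null;
}

// Stand-in for Element.nextElementSibling: returns the next sibling
// after `xml` that is an element, or null if there is none.
function getNextElementSibling(xml) {
  var foundSelf = false;
  for (var i = 0; i &lt; xml.parentNode.childNodes.length; i++) {
    if (xml.parentNode.childNodes[i] === xml) {
      // Only siblings after this point are candidates.
      foundSelf = true;
      continue;
    }
    if (foundSelf &amp;&amp; xml.parentNode.childNodes[i].hasOwnProperty('tagName')) {
      return xml.parentNode.childNodes[i];
    }
  }
  return null;
}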
</code></pre></div></div>
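
<p>As a usage illustration, the sketch below shows how sibling-walking helpers like these can be combined to visit every element child of a node. To stay self-contained it uses simplified variants (<code class="language-plaintext highlighter-rouge">firstElement</code> and <code class="language-plaintext highlighter-rouge">nextElement</code>, both hypothetical names) that test <code class="language-plaintext highlighter-rouge">nodeType === 1</code> rather than <code class="language-plaintext highlighter-rouge">hasOwnProperty('tagName')</code>, and it runs against plain mock objects rather than a real xmldom document.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>var ELEMENT_NODE = 1; // DOM nodeType value for elements
var TEXT_NODE = 3;    // DOM nodeType value for text nodes

// Returns the first element in a list of nodes, or null if there is none.
function firstElementIn(nodes) {
    for (var node of nodes) {
        if (node.nodeType === ELEMENT_NODE) {
            return node;
        }
    }
    return null;
}

// Simplified variant of getFirstElementChild.
function firstElement(xml) {
    return firstElementIn(Array.from(xml.childNodes));
}

// Simplified variant of getNextElementSibling.
function nextElement(xml) {
    var siblings = Array.from(xml.parentNode.childNodes);
    return firstElementIn(siblings.slice(siblings.indexOf(xml) + 1));
}

// Collects the tag names of all element children, skipping text nodes.
function childTagNames(xml) {
    var names = [];
    var child = firstElement(xml);
    while (child !== null) {
        names.push(child.tagName);
        child = nextElement(child);
    }
    return names;
}

// Mock helper: creates a node and attaches it to its parent.
function mockNode(parent, nodeType, tagName) {
    var node = { nodeType: nodeType, tagName: tagName, parentNode: parent, childNodes: [] };
    parent.childNodes.push(node);
    return node;
}

// Mock parsed document with whitespace text nodes between elements.
var root = { nodeType: ELEMENT_NODE, tagName: "Buckets", parentNode: null, childNodes: [] };
mockNode(root, TEXT_NODE, undefined);
mockNode(root, ELEMENT_NODE, "Bucket");
mockNode(root, TEXT_NODE, undefined);
mockNode(root, ELEMENT_NODE, "Owner");

var names = childTagNames(root); // ["Bucket", "Owner"]
</code></pre></div></div>

<p>If either helper returned <code class="language-plaintext highlighter-rouge">undefined</code> instead of a node, as a missing property would, a loop like this would end before visiting anything, which matches the empty arrays I was seeing.</p>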

<h2 id="wrapping-up">Wrapping up</h2>

<p>After all the above changes were made, I was able to produce a build of the version 2 SDK which, in all the tests I’ve run, seems to work as intended within a service worker context.</p>

<p>I’ve made a version of the service worker-compatible SDK available on <a href="https://github.com/iann0036/aws-sdk-serviceworker">GitHub</a>, should you want to compile your own. Refer to the <a href="https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/building-sdk-for-browsers.html">official docs</a> for specific compilation options, as they should work the same.</p>

<p>I got pretty close to abandoning this experiment, but I’m glad I persisted. I learned a lot about the internals of the SDK and got a working alternative in the end. If you liked what I’ve written, or want to tell me how terrible of an idea this was, reach out to me on Twitter at <a href="https://twitter.com/iann0036">@iann0036</a>.</p>]]></content><author><name>Ian Mckay</name></author><summary type="html"><![CDATA[The AWS JavaScript SDK supports Node.js, React Native and web browsers, but what if you're running in a service worker? In this post, I'll explain how I modified version 2 of the AWS JavaScript SDK to run within a service worker context.]]></summary></entry></feed>