
[Bug]: Permissions error uploading prepared XML records to S3 #878

Closed · TylerHendrickson opened this issue Jul 31, 2024 · 2 comments

Labels: bug (Something isn't working), Grant Finder

TylerHendrickson (Member) commented Jul 31, 2024

Why is this issue important?

We cannot save data from Grants.gov pertaining to new or modified grant opportunities. This issue currently only affects Staging.

Current State

As of the deployment of #848 to Staging, the SplitGrantsGovXMLDB Lambda function is unable to upload per-grant XML files to the S3 "prepared data" bucket. When the Lambda function attempts to do so, it receives an AccessDenied response from S3 and logs Error uploading prepared grant record to S3 (see example logs via the Datadog query service:grants-ingest "Error uploading prepared grant record to S3" functionname:grants_ingest-splitgrantsgovxmldb env:staging). This issue appears to be due to incorrect or missing permissions in the SplitGrantsGovXMLDB Lambda function's execution role.

Background: The changes in #848 introduced a new filename (object key) naming scheme for the S3 "prepared data" bucket. Previously, object keys for Grants.gov XML records always followed a pattern of <id prefix>/<full id>/grants.gov/v2.xml. Now, instead of /v2.xml, object keys may have either of the following suffixes:

  • /v2.OpportunitySynopsisDetail_1_0.xml
  • /v2.OpportunityForecastDetail_1_0.xml

However, the permissions for SplitGrantsGovXMLDB were never updated to match the new object key convention, so the Lambda function is still only allowed to inspect and upload S3 objects with the /v2.xml suffix.

Note: This issue is also causing errors later in the PublishGrantEvents step, as the missing Grants.gov data is causing validation failures. Resolving the SplitGrantsGovXMLDB permissions issue should also resolve this downstream PublishGrantEvents validation issue.

Expected State

The SplitGrantsGovXMLDB Lambda function is permitted to upload per-grant XML records to the S3 "prepared data" bucket using the new object key suffixes.

Implementation Plan

Update permissions for Lambda execution role

In terraform/modules/SplitGrantsGovXMLDB/main.tf, update the Lambda function's AllowInspectS3PreparedData and AllowS3UploadPreparedData execution policy statements so that the resources list in both statements includes the following patterns:

  • ${data.aws_s3_bucket.prepared_data.arn}/*/*/grants.gov/v2.OpportunitySynopsisDetail_1_0.xml
  • ${data.aws_s3_bucket.prepared_data.arn}/*/*/grants.gov/v2.OpportunityForecastDetail_1_0.xml
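
For illustration, the updated statements might look roughly like the following sketch. The statement layout and actions here are assumptions (only the ARN patterns come from this issue), and the legacy /v2.xml pattern is retained on the assumption that backwards compatibility with existing objects is desired (see the discussion below):

```hcl
# Hypothetical excerpt of terraform/modules/SplitGrantsGovXMLDB/main.tf;
# statement structure and actions are illustrative only.
statement {
  sid     = "AllowS3UploadPreparedData"
  effect  = "Allow"
  actions = ["s3:PutObject"]
  resources = [
    # Legacy suffix, kept for backwards compatibility with existing objects:
    "${data.aws_s3_bucket.prepared_data.arn}/*/*/grants.gov/v2.xml",
    # New suffixes introduced by #848:
    "${data.aws_s3_bucket.prepared_data.arn}/*/*/grants.gov/v2.OpportunitySynopsisDetail_1_0.xml",
    "${data.aws_s3_bucket.prepared_data.arn}/*/*/grants.gov/v2.OpportunityForecastDetail_1_0.xml",
  ]
}
```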

Refactor Last-Modified Determination in SplitGrantsGovXMLDB handler

(See Option 3 in the comment below.)

We want to start using the "Grants Prepared Data" DynamoDB Table to determine whether an encountered XML record is new, modified, or unmodified. This replaces the existing behavior of checking S3 for the following conditions:

  1. Object does not exist: XML record is new and should be saved to S3
  2. Object exists and its modification timestamp is older than the XML record's LastUpdatedDate value: XML record is newer than the currently-persisted record and should replace the currently-persisted record in S3
  3. Object exists and its modification timestamp is the same as (or more recent than) the XML record's LastUpdatedDate value: XML record has been seen before and is unchanged since the last time it was encountered by SplitGrantsGovXMLDB
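
Roughly, that existing check amounts to the following (a sketch assuming the AWS SDK for Go v2; the handler's actual helper names and error handling differ):

```go
// Sketch only: the current S3-based check that compares an object's
// Last-Modified timestamp against the XML record's LastUpdatedDate.
package main

import (
	"context"
	"errors"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/aws-sdk-go-v2/service/s3/types"
)

// shouldSaveToS3 reports whether the XML record is new or modified relative
// to what is currently persisted at the given bucket/key.
func shouldSaveToS3(ctx context.Context, client *s3.Client, bucket, key string, recordLastUpdated time.Time) (bool, error) {
	head, err := client.HeadObject(ctx, &s3.HeadObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	})
	if err != nil {
		var notFound *types.NotFound
		if errors.As(err, &notFound) {
			return true, nil // Condition 1: object does not exist; record is new.
		}
		return false, err
	}
	// Conditions 2 and 3: save only if the record is newer than the object.
	return recordLastUpdated.After(aws.ToTime(head.LastModified)), nil
}
```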

When using DynamoDB, the behavior is basically the same:

  1. DDB item does not exist: XML record is new (unrepresented in DynamoDB) and should be saved to S3
  2. DDB item exists and its LastUpdatedDate value is older than the XML record's LastUpdatedDate value: XML record is newer than the currently-persisted table item, so the XML record should replace the currently-persisted record in S3, which will eventually result in an update to the corresponding DynamoDB table item
  3. DDB item exists and its LastUpdatedDate value is the same as (or more recent than) the XML record's LastUpdatedDate value: XML record has been seen before and is unchanged since the last time it was persisted to DynamoDB.
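
A minimal sketch of the DynamoDB-based check (the table name, key schema, LastUpdatedDate attribute name, and timestamp format are all assumptions for illustration, not the handler's actual code):

```go
// Sketch only: the DynamoDB-based new/modified/unmodified determination.
// Table name, key schema, attribute name, and timestamp format are assumed.
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

// isNewOrModified reports whether an XML record should be (re)saved to S3,
// based on the LastUpdatedDate stored in the "Grants Prepared Data" table.
func isNewOrModified(ctx context.Context, ddb *dynamodb.Client, table, grantID string, recordLastUpdated time.Time) (bool, error) {
	out, err := ddb.GetItem(ctx, &dynamodb.GetItemInput{
		TableName:            aws.String(table),
		Key:                  map[string]types.AttributeValue{"grant_id": &types.AttributeValueMemberS{Value: grantID}},
		ProjectionExpression: aws.String("LastUpdatedDate"),
	})
	if err != nil {
		return false, err
	}

	// Case 1: no item exists, so the record is new and should be saved to S3.
	attr, ok := out.Item["LastUpdatedDate"]
	if !ok {
		return true, nil
	}
	s, ok := attr.(*types.AttributeValueMemberS)
	if !ok {
		return false, fmt.Errorf("unexpected type for LastUpdatedDate attribute")
	}
	stored, err := time.Parse(time.RFC3339, s.Value) // assumed storage format
	if err != nil {
		return false, err
	}

	// Case 2: record is newer than the persisted item; replace it in S3.
	// Case 3: same or older; the record is unchanged and can be skipped.
	return recordLastUpdated.After(stored), nil
}
```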


@TylerHendrickson TylerHendrickson added the bug Something isn't working label Jul 31, 2024
@TylerHendrickson TylerHendrickson self-assigned this Jul 31, 2024
TylerHendrickson (Member, Author) commented Aug 1, 2024

Adding some thoughts here related to the Additional considerations note in the issue description:

  • We need to ensure that this change is backwards-compatible with the old /v2.xml suffix. Otherwise, every Grants.gov XML record that currently exists in the prepared data bucket will be perceived as missing (and requiring an update) the next time SplitGrantsGovXMLDB runs. Details TBD

Options to consider:

  1. Do a one-time operation (in each deployment environment) to bulk-rename S3 object keys – every object with a key matching */*/grants.gov/v2.xml would be renamed to <equivalent>/<equivalent>/grants.gov/v2.OpportunitySynopsisDetail_1_0.xml. (A copy-then-delete sketch of this rename appears after this list.)
    • Pros
      • Generally straightforward
    • Cons:
      • Requires a more hands-on/coordinated roll-out to Production
      • Would affect around 80k S3 objects. I'm not particularly worried about cost, but it's hard to say offhand how long this operation would take, and it's probably one that should be scripted out (possibly to the grants-ingest CLI). It could end up being relatively high-effort for something so limited-use.
      • Related to the previous point: this would result in a lot of unnecessary downstream processing unless we disable the S3 event triggers for the new object key convention, and then re-enable them (which adds more coordination overhead for deployment).
      • This is a solvable problem, but we need to be careful about preserving object metadata and tags.
      • Resets object expiration for all Grants.gov records in S3, since copies are new objects; note that the vast majority of existing objects have already been transitioned to Glacier at this point.
  2. Do nothing, and allow every Grants.gov record to be recreated.
    • Pros:
      • Low effort
    • Cons:
      • Nearly doubles the bucket's object count. However, I'm not so worried about cost here (and we could go in after the fact and mass-delete all objects with keys like */*/grants.gov/v2.xml), so this is a pretty minimal concern.
      • Also has the effect of resetting object expiration for all Grants.gov records.
      • Also results in a lot of downstream processing.
  3. Add a DynamoDB table to serve as an index for mapping Grants.gov records to their respective S3 keys
    • Pros:
      • Doesn't necessarily preclude either of the prior two options (particularly "Do nothing").
      • Using DDB to index S3 object metadata and/or look up S3 keys is a fairly well-established pattern.
      • The particular value of an S3 object key becomes more arbitrary, so we can more easily change naming conventions again in the future if we need to.
      • We would be able to store and compare to the Grants.gov record's actual LastUpdatedDate rather than relying on S3 file modification timestamps as we do today, which is admittedly a bit kludgy.
      • Might enable more optimized/batched lookups (currently we run HEAD on every Grants.gov record on a daily basis)
      • If we wanted to (we might not), we could reuse the existing DynamoDB table by adding a new item attribute, although that could make lookups less optimized
    • Cons:
      • Level of effort
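
For reference, here is a minimal sketch of the per-object rename from Option 1, assuming the AWS SDK for Go v2. Bucket and key values are placeholders, and objects already transitioned to Glacier would need to be restored before they can be copied:

```go
// Sketch of the per-object rename in Option 1: copy to the new key while
// preserving metadata and tags, then delete the old key.
package main

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/aws-sdk-go-v2/service/s3/types"
)

func renameObject(ctx context.Context, client *s3.Client, bucket, oldKey, newKey string) error {
	// COPY directives carry the source object's user metadata and tags over
	// to the new key, addressing the metadata-preservation concern above.
	// Note: CopySource must be URL-encoded if keys contain special characters.
	if _, err := client.CopyObject(ctx, &s3.CopyObjectInput{
		Bucket:            aws.String(bucket),
		Key:               aws.String(newKey),
		CopySource:        aws.String(bucket + "/" + oldKey),
		MetadataDirective: types.MetadataDirectiveCopy,
		TaggingDirective:  types.TaggingDirectiveCopy,
	}); err != nil {
		return err
	}
	_, err := client.DeleteObject(ctx, &s3.DeleteObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(oldKey),
	})
	return err
}
```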

TylerHendrickson (Member, Author) commented

Completed with #892. Additional fixes: #921, #922.
