Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metadata Bindings #345

Open
wants to merge 11 commits into
base: page
Choose a base branch
from
Open

Metadata Bindings #345

wants to merge 11 commits into from

Conversation

xmoravec
Copy link

@xmoravec xmoravec commented Jun 16, 2023

This PR implements the metadata layer for sec-certs, including the pipeline processing the binding files, creating binding records in a new MongoDB collection, and the web presentation of the data.

@J08nY J08nY self-requested a review June 17, 2023 19:17
@J08nY J08nY added enhancement New feature or request web Stuff related to the web at seccerts.org labels Jun 17, 2023
@J08nY
Copy link
Member

J08nY commented Jun 21, 2023

@xmoravec Should I look at this and give it a review? Is it ready?

@xmoravec
Copy link
Author

@J08nY yes please, you can review the code :) it should be ready

Copy link
Member

@J08nY J08nY left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several things need to be handled.

Comment on lines +97 to +98
# Secret key to validate jwt tokens of downloaded binding files with.
BINDINGS_PUBLIC_KEY = ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a secret key or a public key? Comment says one thing, variable name says another. Some more documentation on using this would be helpful.


from .. import mongo

Document = Any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Just use Any directly if this is not actually binding to any contract.

Comment on lines +26 to +31
"""Binding object constructor

Args:
cert_id (str): certificate id
header_name (str): header name
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sec-certs main repo and also this repo uses a different docstring format.
Using :param cert_id: and :returns:. Please change the docs to this format. Thanks for adding docs though! And sorry that the original code had barely any.

def __repr__(self) -> str:
return f"Binding {self.cert_id} to {self.header_name}"

def __getitem__(self, key: str) -> Union[str, int, list[object]]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is attribute access not enough?
Also this can be just turned into a getattr(self, key) or this can be done at the callsite of the __getitem__.

str: Transformed URL
"""
parsed_url = urllib.parse.urlparse(url)
path = urllib.parse.unquote(parsed_url.path.strip("/"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also strips the trailing /. Should it?
Will this be used with world-provided/untrusted URLs?
What about directory traversal? I think we should be vary.

dir_path = url.split("/main")[1]
api_url = github_url_to_api(repo_url) + "/contents" + dir_path

response = requests.get(api_url)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really need to be authenticated to Github in order for this to work, the rate limit is quite steep. Some 60 requests per few hours or so. I think our batch processing, even if only done daily or weekly could go over that with enough bindings.

continue
for item in data["data"]:
timestamp = item["timestamp"]
if not verify_timestamp(timestamp):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary? Either the JWT verification passed, in which case the data is signed (by whoever, I don't know the full system) and we trust it and therefore do not need this secondary verification which is quite weak. Or is this some data sanitization to make sure the format is right?

Comment on lines +324 to +328
if not download_headers:
continue

for cert_id in item["certificate_ids"]:
bindings.append(Binding(cert_id, header_file_name))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines should be flipped right? the check should come after the items are added to the bindings list.

f"JWT verification failed for header file {file_name}")
continue
for data_obj in json_obj["data"]:
if not verify_timestamp(data_obj["timestamp"]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, not sure why we check this.

binding (Binding): An object representing the binding - cert_id and header_data
verbose (bool, optional): Whether to click.echo out extra information. Defaults to False.
"""
cert_id = binding["cert_id"] if "NIST" not in binding["cert_id"] else int(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nasty. The binding should just have another attribute that says whether it is CC or FIPS. Nothing like "NIST-" prefixed cert IDs exists anywhere.

@petrs
Copy link
Member

petrs commented Sep 21, 2023

@xmoravec Hi Eric, do you find the comments from @J08nY actionable and are you willing to incorporate before merging? It would be better to have things merged otherwise this PR may become incompatible soon due to other changes. Thank you for consideration

@xmoravec
Copy link
Author

Hi Petr and Ján,
first of all, I am sorry for the late reply - I was hoping to find time and the capacity to incorporate at least some of the changes.

However, I want to be fair and admit I currently completely lack the capacity to incorporate the changes. Moreover, many of the changes requested have the potential to break some of the functionality and would require further testing and discussions. Aside from the capacity I currently lack the technical means to test the new functionality or to make it compatible with newer versions of sec-certs.

I am sorry to leave my work unfinished. However, I believe getting accustomed to the code and picking it up from here should not be difficult for someone familiar with the sec-certs project, should it be deemed a priority.

Regards,
Erik

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request web Stuff related to the web at seccerts.org
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants