-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metadata Bindings #345
base: page
Are you sure you want to change the base?
Metadata Bindings #345
Conversation
@xmoravec Should I look at this and give it a review? Is it ready? |
@J08nY yes please, you can review the code :) it should be ready |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several things need to be handled.
# Secret key to validate jwt tokens of downloaded binding files with. | ||
BINDINGS_PUBLIC_KEY = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a secret key or a public key? Comment says one thing, variable name says another. Some more documentation on using this would be helpful.
|
||
from .. import mongo | ||
|
||
Document = Any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Just use Any directly if this is not actually binding to any contract.
"""Binding object constructor | ||
|
||
Args: | ||
cert_id (str): certificate id | ||
header_name (str): header name | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sec-certs main repo and also this repo uses a different docstring format.
Using :param cert_id:
and :returns:
. Please change the docs to this format. Thanks for adding docs though! And sorry that the original code had barely any.
def __repr__(self) -> str: | ||
return f"Binding {self.cert_id} to {self.header_name}" | ||
|
||
def __getitem__(self, key: str) -> Union[str, int, list[object]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is attribute access not enough?
Also this can be just turned into a getattr(self, key)
or this can be done at the callsite of the __getitem__
.
str: Transformed URL | ||
""" | ||
parsed_url = urllib.parse.urlparse(url) | ||
path = urllib.parse.unquote(parsed_url.path.strip("/")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also strips the trailing /
. Should it?
Will this be used with world-provided/untrusted URLs?
What about directory traversal? I think we should be vary.
dir_path = url.split("/main")[1] | ||
api_url = github_url_to_api(repo_url) + "/contents" + dir_path | ||
|
||
response = requests.get(api_url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really need to be authenticated to Github in order for this to work, the rate limit is quite steep. Some 60 requests per few hours or so. I think our batch processing, even if only done daily or weekly could go over that with enough bindings.
continue | ||
for item in data["data"]: | ||
timestamp = item["timestamp"] | ||
if not verify_timestamp(timestamp): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this necessary? Either the JWT verification passed, in which case the data is signed (by whoever, I don't know the full system) and we trust it and therefore do not need this secondary verification which is quite weak. Or is this some data sanitization to make sure the format is right?
if not download_headers: | ||
continue | ||
|
||
for cert_id in item["certificate_ids"]: | ||
bindings.append(Binding(cert_id, header_file_name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These lines should be flipped right? the check should come after the items are added to the bindings list.
f"JWT verification failed for header file {file_name}") | ||
continue | ||
for data_obj in json_obj["data"]: | ||
if not verify_timestamp(data_obj["timestamp"]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, not sure why we check this.
binding (Binding): An object representing the binding - cert_id and header_data | ||
verbose (bool, optional): Whether to click.echo out extra information. Defaults to False. | ||
""" | ||
cert_id = binding["cert_id"] if "NIST" not in binding["cert_id"] else int( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nasty. The binding should just have another attribute that says whether it is CC or FIPS. Nothing like "NIST-" prefixed cert IDs exists anywhere.
Hi Petr and Ján, However, I want to be fair and admit I currently completely lack the capacity to incorporate the changes. Moreover, many of the changes requested have the potential to break some of the functionality and would require further testing and discussions. Aside from the capacity I currently lack the technical means to test the new functionality or to make it compatible with newer versions of sec-certs. I am sorry to leave my work unfinished. However, I believe getting accustomed to the code and picking it up from here should not be difficult for someone familiar with the sec-certs project, should it be deemed a priority. Regards, |
This PR implements the metadata layer for sec-certs, including the pipeline processing the binding files, creating binding records in a new MongoDB collection, and the web presentation of the data.