Draft: Scrape FIPS algorithm data #276 #409
base: main
Conversation
Overall, great work (given that this is your first PR into the project 👍). I proposed just minor changes through individual comments. I additionally:
- Added the issue number to the PR description (so that we know what we're working on)
- Suggest that we write some minimalistic tests, e.g., downloading a single page and taking a look at what the obtained data from the page looks like. This would belong here
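A minimal smoke test along these lines could look as follows. Note that the fixture HTML, the field names, and the stand-in parser are purely illustrative; the real test would load a saved NIST page and call `FIPSAlgorithmDataset.parse_alg_data_from_html`:

```python
# Illustrative sketch only: a tiny stand-in parser and a made-up fixture page,
# not the real FIPSAlgorithmDataset.parse_alg_data_from_html or NIST markup.
import re

FIXTURE_PAGE = """
<dl>
  <dt>Algorithm Type</dt><dd>AES</dd>
  <dt>Vendor</dt><dd>Example Corp</dd>
</dl>
"""

def parse_alg_data_from_html(page: str) -> dict:
    # Extract <dt>/<dd> pairs into a plain dict of scraped fields.
    pairs = re.findall(r"<dt>(.*?)</dt><dd>(.*?)</dd>", page)
    return {key.strip(): value.strip() for key, value in pairs}

def test_parse_single_page():
    data = parse_alg_data_from_html(FIXTURE_PAGE)
    assert data["Algorithm Type"] == "AES"
    assert data["Vendor"] == "Example Corp"
```

Even a single-page test like this catches regressions in the scraping logic when the parser changes.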
Also, did you get the same algorithms with this approach as you did with the alternative approach that you proposed recently? If not, what is the source of this discrepancy?
By the way, how long does execution of this method take? What's the most time-demanding part? The call to FIPSAlgorithmDataset.parse_alg_data_from_html(page) could be parallelized if needed.
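Since each page is parsed independently, the calls can simply be mapped over a pool. A minimal sketch, with a stand-in parser in place of FIPSAlgorithmDataset.parse_alg_data_from_html and hypothetical page contents:

```python
from concurrent.futures import ThreadPoolExecutor

def parse_alg_data_from_html(page: str) -> dict:
    # Stand-in for FIPSAlgorithmDataset.parse_alg_data_from_html(page).
    return {"length": len(page)}

# Hypothetical list of already-downloaded pages.
pages = ["<html>alg 1</html>", "<html>algorithm 2</html>"]

with ThreadPoolExecutor(max_workers=8) as pool:
    # map() preserves input order, so results line up with pages.
    results = list(pool.map(parse_alg_data_from_html, pages))
```

If the parsing itself (rather than network I/O) dominates, a ProcessPoolExecutor would sidestep the GIL, at the cost of having to pickle the pages between processes.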
Thanks for the update 👍.
There are still two open questions from my earlier comment. Could you answer them, please?
> Also, did you get the same algorithms with this approach as you did with the alternative approach that you proposed recently? If not, what is the source of this discrepancy?
> By the way, how long does execution of this method take? What's the most time-demanding part? The call to FIPSAlgorithmDataset.parse_alg_data_from_html(page) could be parallelized if needed.
Next time, please contribute by creating a branch directly in the crocs-muni/sec-certs repository. This way, the automated tests are run against your commits. I manually ran all relevant parts of the pipeline:
- pre-commit: ❌ does not pass. You have to fix a MyPy error (use np.nan instead of np.NaN in some modules unrelated to your PR, likely due to a breaking change in numpy) and apply some Ruff formatting across files.
- tests: ❌
- docs: all ok. ✅
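For reference, the MyPy failure comes from NumPy 2.0 removing the uppercase np.NaN alias; the fix is a mechanical rename:

```python
import numpy as np

# np.NaN was removed in NumPy 2.0; only the lowercase np.nan alias remains.
value = np.nan  # instead of the old np.NaN

# NaN semantics are unchanged: it compares unequal even to itself.
assert value != value
```

A project-wide search-and-replace of np.NaN with np.nan in the affected modules should clear the error.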
The tests failed in the FIPS part with
ERROR tests/fips/test_fips_algorithm_dataset.py::test_alg_dset_lookup_dict - TypeError: Dict: {'alg_number': '5331', 'algorithm_type': 'AES', 'vendor': 'Juniper Networks, Inc.', 'implementatio...
ERROR tests/fips/test_fips_algorithm_dataset.py::test_to_pandas - TypeError: Dict: {'alg_number': '5331', 'algorithm_type': 'AES', 'vendor': 'Juniper Networks, Inc.', 'implementatio...
For more info on automated pipelines and how to set up your environment to comply with our standards, please see https://sec-certs.org/docs/contributing.html#quality-assurance.
Added scraping of additional data (type, description, version, capabilities) for each FIPS algorithm.
Edit by @adamjanovsky: This closes #276