Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement sarif format reporter #1047

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Conversation

Zxilly
Copy link

@Zxilly Zxilly commented Jun 28, 2024

Closes: #594

I asked earlier if anyone had already worked on this feature and got no reply #594 (comment) , so I added this implementation myself.

But as I declared, I'm not a professional rust developer, so this patch obviously contains a lot of problems caused by a lack of understanding of rust, such as .unwrap abuse, since I'm not sure how to properly convert Error to io::Error.

I enabled Allow edits by maintainers, so if you have any suggestions for syntax changes, please push them directly to the corresponding repository branch, thanks!

@@ -316,6 +317,8 @@ fn run_checks(args: &args::Args) -> proc_exit::ExitResult {
if status_reporter.errors_found() {
errors_found = true;
}

status_reporter.finalize().unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to find something better to do than unwrap

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just have no idea, ? can not cast error automatic, should i attach a map_err for everyone?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this marked as resolved when nothing has changed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgeted to update this one, will fixed in the new patch.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log::error!("Error finalizing: {}", err);

Is not an appropriate way of handling this

crates/typos-cli/Cargo.toml Outdated Show resolved Hide resolved
@Zxilly Zxilly force-pushed the sarif branch 3 times, most recently from c22c553 to 50550f1 Compare June 29, 2024 10:51
@Zxilly Zxilly requested a review from epage June 29, 2024 10:51
@Zxilly
Copy link
Author

Zxilly commented Jun 29, 2024

@epage Could you please approve the workflow run to see if anything broken by this?

@coveralls
Copy link

coveralls commented Jun 29, 2024

Pull Request Test Coverage Report for Build 9723606862

Details

  • 0 of 76 (0.0%) changed or added relevant lines in 4 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.7%) to 22.315%

Changes Missing Coverage Covered Lines Changed/Added Lines %
crates/typos-cli/src/bin/typos-cli/args.rs 0 1 0.0%
crates/typos-cli/src/bin/typos-cli/main.rs 0 1 0.0%
crates/typos-cli/src/report.rs 0 2 0.0%
crates/typos-cli/src/bin/typos-cli/report.rs 0 72 0.0%
Files with Coverage Reduction New Missed Lines %
crates/typos-cli/src/bin/typos-cli/main.rs 2 0.0%
Totals Coverage Status
Change from base Build 9627078583: -0.7%
Covered Lines: 534
Relevant Lines: 2393

💛 - Coveralls

Comment on lines 313 to 312
let column_start =
unicode_segmentation::UnicodeSegmentation::graphemes(start.as_ref(), true).count() + 1;
let column_end =
unicode_segmentation::UnicodeSegmentation::graphemes(msg.typo, true).count() + column_start;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Looking at the spec, we should be reporting code points, not graphemes
  2. We need to specify the column unit

If a SARIF producer processes text artifacts and theRun.results (§3.14.23) is non-empty, the run object SHALL contain a property named columnKind whose value is a string that specifies the unit in which the analysis tool measures columns. If a SARIF producer processes text artifacts and theRun.results is empty, columnKind MAY be present.

...

  • "unicodeCodePoints": Each Unicode code point (abstract character) is considered to occupy one column. This means that even a character that is represented in UTF-16 by a surrogate pair is considered to occupy one column.

Copy link
Author

@Zxilly Zxilly Jun 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I set the columnKind as unicodeCodePoints. So the graphemes().count() value should represents the code point correctly, since every call to next should get a new character.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't a char a code point while a grapheme can be made of multiple multiple chars?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries described, UnicodeSegmentation::graphemes should mean to be unicodeCodePoints in SARIF, because they all represent a user-readable character, even though the underlying layer consists of multiple code points.

SARIF: This means that even a character that is represented in UTF-16 by a surrogate pair is considered to occupy one column.

Unicode: For example, “G” + grave-accent is a user-perceived character: users think of it as a single character, yet is actually represented by two Unicode code points. These user-perceived characters are approximated by what is called a grapheme cluster, which can be determined programmatically.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SARIF: This means that even a character that is represented in UTF-16 by a surrogate pair is considered to occupy one column.

From my understanding, UTF-16 surrogate pairs are a way to represent code points that can't be represented otherwise. Grapheme's are more than just surrogate pairs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switch to utf16CodeUnits

@Zxilly
Copy link
Author

Zxilly commented Jul 21, 2024

@epage hello, any updates?

Cargo.lock Outdated Show resolved Hide resolved
@@ -1,13 +1,14 @@
use std::io::{BufRead as _, BufReader, Write as _};
use std::io::{BufRead as _, BufReader, Error, Write as _};

Check failure

Code scanning / clippy

unused import: `Error`

unused import: `Error`
@coveralls
Copy link

coveralls commented Jul 22, 2024

Pull Request Test Coverage Report for Build 10045532744

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 76 (0.0%) changed or added relevant lines in 4 files are covered.
  • 190 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.9%) to 22.13%

Changes Missing Coverage Covered Lines Changed/Added Lines %
crates/typos-cli/src/bin/typos-cli/args.rs 0 1 0.0%
crates/typos-cli/src/report.rs 0 2 0.0%
crates/typos-cli/src/bin/typos-cli/main.rs 0 3 0.0%
crates/typos-cli/src/bin/typos-cli/report.rs 0 70 0.0%
Files with Coverage Reduction New Missed Lines %
crates/typos-cli/src/bin/typos-cli/main.rs 2 0.0%
crates/typos/src/tokens.rs 188 2.36%
Totals Coverage Status
Change from base Build 9812709078: -0.9%
Covered Lines: 534
Relevant Lines: 2413

💛 - Coveralls

match status_reporter.finalize() {
Ok(_) => {}
Err(err) => {
log::error!("Error finalizing: {}", err);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error would be redundant with log showing ERROR. finalizing has no user-facing meaning in this context

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generate final result still doesn't have much meaning to the user and reads oddly within context of an error.

Likely we should directly print the error and have error_to_io_error change the error message to format!("failed to generate SARIF output: {err}")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not seeing this updated


use anstream::stdout;
use serde_sarif::sarif;
use serde_sarif::sarif::Invocation;

Check failure

Code scanning / clippy

unused import: `serde_sarif::sarif::Invocation`

unused import: `serde_sarif::sarif::Invocation`
Comment on lines 321 to 343
if let Err(err) = status_reporter.generate_final_result() {
log::error!("generate final result: {}", err);
return proc_exit::Code::FAILURE.ok();
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has us printing one SARIF json object per argument.

This should either be moved out of the loop or this is jsonlines and we shouldn't use pretty

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's too bad I didn't notice that typos created a reporter for each file. i need to keep elevating the status of the reporter to make sure that the sarif file contains information from all the files.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please take a look at 7a834d0

crates/typos-cli/src/bin/typos-cli/main.rs Outdated Show resolved Hide resolved
crates/typos-cli/src/bin/typos-cli/report.rs Outdated Show resolved Hide resolved
crates/typos-cli/src/bin/typos-cli/report.rs Outdated Show resolved Hide resolved
crates/typos-cli/Cargo.toml Show resolved Hide resolved
Cargo.lock Outdated Show resolved Hide resolved
crates/typos-cli/src/bin/typos-cli/report.rs Outdated Show resolved Hide resolved
crates/typos-cli/src/bin/typos-cli/report.rs Outdated Show resolved Hide resolved
match status_reporter.finalize() {
Ok(_) => {}
Err(err) => {
log::error!("Error finalizing: {}", err);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not seeing this updated

@Zxilly
Copy link
Author

Zxilly commented Jul 22, 2024

I squashed the commits again to clean up changes not related to this PR.

@Zxilly Zxilly force-pushed the sarif branch 2 times, most recently from f9be168 to 5fa2d29 Compare July 23, 2024 12:02
@Zxilly
Copy link
Author

Zxilly commented Aug 1, 2024

The conflict has been resolved.
Note that the winnow upgrade deprecated a certain API that causes linter to display a warning now, which is unrelated to the changes made by this PR.

@Zxilly
Copy link
Author

Zxilly commented Aug 8, 2024

@epage Can we move forward with this patch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Sarif reporting format
3 participants