Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: Table formatting spec #889

Merged
merged 6 commits into from
Oct 16, 2024

Conversation

mattrunyon
Copy link
Collaborator

@mattrunyon mattrunyon commented Sep 16, 2024

Fixes #792

Not sure who else might want to review this. This is a basic overview of how I see table formatting rules.

Things we might want to consider

  • Global rules for the table such as "format all columns as $x.xx"
  • Should we separate column and row formatting to separate props or just make separate dataclasses for them? One potential downside to multiple props is defining which takes precedence and do we still follow a stop_if_true if there's both a row and column color rule for example.

@mattrunyon mattrunyon self-assigned this Sep 16, 2024
@mattrunyon mattrunyon changed the title Table formatting spec docs: Table formatting spec Sep 16, 2024
Comment on lines 2442 to 2450
column_formatting=[
ui.table_formatting.COLOR("X", "RED"),
ui.table_formatting.COLOR("Y", "BLUE", where="Y % 2 == 0"),
ui.table_formatting.NUMBER_FORMAT("Y", format="0.00"),
ui.table_formatting.DATABAR("Z", value_col="Z", min=0, max=100, positive_color="GREEN", negative_color="RED")
],
row_formatting=[
ui.table_formatting.COLOR(where="X > 5", color="GREEN")
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One downside I see to splitting like this is the lack of control between interleaving the order of column and row formatting.

Doing something that is:
row rule
column rule
row rule

would not be possible

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I noted in the PR body we might want to just have formatting instead. Also makes it clear what the applied order is

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If separate, I think column beats row in terms of precedence.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that splitting this up causes precedence ambiguity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might think that row beats column. e.g. make the whole row red because something bad is happening.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If separate I'd lean row beats column. A row is more specific on an individual basis than a column (ignoring all the filters and groups in this case and just thinking about what would more reasonably win if a single row and single column were highlighted). Although the differing opinions here may be a sign they should be combined. Might be tricky and confusing for a user to remember.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would lean column beats row. You might make the whole row shaded yellow or whatever because something seems stale; but you don't want to eliminate the formatting that you would have otherwise applied to your "BUY/SELL" column.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm working on a revised spec that proposes combining into a single argument so there's no ambiguity or desire for column over row or row over column


##### ui.table Formatting Rule Types

There are 3 main types of formatting rules: those which affect basic visual aspects (color, font, etc.), those which affect the display value (number formatting, etc.), and those which affect the cell rendering mode (data bars, buttons, etc.). Multiple different visual rules can be applied at the same time, but only one display value rule and one cell rendering mode rule can be applied at a time. It doesn't make sense to have a cell that is both a data bar and a button, for example. If a rule is applied conditionally, multiple display or cell rendering rules should be allowed to be applied in a column.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would need clear error message states for violating these rules.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it would be possible for the user to violate these. If they did things would still work, but it would just apply the last (or stop after the first) rule that is true

@dsmmcken
Copy link
Contributor

dsmmcken commented Sep 18, 2024

Maybe charles/chip/raffi might have good insights on things to cover.

Comment on lines 2442 to 2450
column_formatting=[
ui.table_formatting.COLOR("X", "RED"),
ui.table_formatting.COLOR("Y", "BLUE", where="Y % 2 == 0"),
ui.table_formatting.NUMBER_FORMAT("Y", format="0.00"),
ui.table_formatting.DATABAR("Z", value_col="Z", min=0, max=100, positive_color="GREEN", negative_color="RED")
],
row_formatting=[
ui.table_formatting.COLOR(where="X > 5", color="GREEN")
]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If separate, I think column beats row in terms of precedence.


##### ui.table Row Formatting Rules

Row formatting rules apply to entire rows. Only visual rules can be applied to rows. Row formatting rules should have an `where` field which is a conditional expression that determines if the rule should be applied. This expression will be applied as a custom column (which is applied with `update_view`) to the table and may reference columns or use the query language if needed. The conditional is required for row formatting rules because otherwise they would apply to every row in the table.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does row have an update_view, but not a column?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Column rules would optionally have a where applied the same way. I think for rows we should require a where as otherwise the entire table would be formatted.

It might make sense to have certain format types that apply to the whole table though. Something like format all numeric columns as currency.

@dsmmcken
Copy link
Contributor

How much of the existing formatting is this using vs us generating our own column?

How does the multiple queries targeting the same column combine?

@mattrunyon
Copy link
Collaborator Author

mattrunyon commented Sep 23, 2024

How much of the existing formatting is this using vs us generating our own column?

How does the multiple queries targeting the same column combine?

This would use none of our existing formatting code in the engine.

Each where clause would be applied as a separate custom column. We could combine into a single custom column that returns an int representing which rule was true. But the benefit might be negligible until you get into hundreds or thousands of rules on a table.

@mattrunyon
Copy link
Collaborator Author

@chipkent @devonpallison @rbasralian You might be interested in looking at this

Copy link
Member

@mofojed mofojed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like to see an example with stop_if_true usage, and also would like to see the option of having it as just one formatting with data classes for column vs row formatting


Outside of these 2 properties, it is up to the Dataclass to identify what kind of formatting it is and up to the web to apply that formatting. We should likely use a field like `__format_type` that is part of each Dataclass as a key for identifying formatting rules.

On the client, formatting rules should be applied in the order they are defined in the list. This means that if a rule is defined later in the list, it will take precedence over a rule defined earlier in the list. There should be a `stop_if_true` or equivalent property which would not apply any further formatting rules of the same type (e.g. background color) once a matching rule has been applied. This should make it possible to still apply different formatting types even if `stop_if_true` is `True`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example of stop_if_true usage?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial thoughts are that these 2 would be equivalent. You could apply the stop_if_true flag to individual rules like this.

formatting=[
  ColumnColor('X', 'red', where='X % 3 == 0'),
  ColumnColor('X', 'yellow', where='X % 5 == 0'),
  ColumnColor('X', 'green', where='X % 15 == 0')
]

formatting=[
  ColumnColor('X', 'green', where='X % 15 == 0', stop_if_true=True)
  ColumnColor('X', 'red', where='X % 3 == 0'),
  ColumnColor('X', 'yellow', where='X % 5 == 0'),
]

One thing to note is I believe the Excel behavior is to apply all rules in order and they have a checkbox in the GUI when applying rules for "stop if true". Google Sheets stops at the first rule and is not configurable it seems.

Not sure if we have a preference on either behavior. I think the Google Sheets behavior can achieve the same things as Excel, but it is a little simpler for defining rules. First rule gets priority and that's it.

I think we would want to have the priority system per rule type which is a bit different from Sheets and Excel since you define the entire format (color, font color, font style, etc.) in 1 rule

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking through stop_if_true has pointed out some rough UI edges to consider.

Say you have a row that you want to format with a particular background color, font color, font, and whatever else. You end up with code that looks like:

formatting=[
ColumnColor('X', 'green', where='X % 15 == 0', stop_if_true=True),
ColumnFont('X', 'arial', where='X % 15 == 0', stop_if_true=True),
ColumnFontColor('X', 'blue', where='X % 15 == 0', stop_if_true=True)
...
]

This feels very repetitive because all of the related items for a cell repeat where and stop_if_true. Maybe what is here is good enough, but a few minutes should be spent considering this type of case.

@mattrunyon
Copy link
Collaborator Author

Something else I've thought about is how can we/should we support a format that includes multiple rules with 1 condition. Basically something like this Format(col="X", formats=[Color('red'), Value('0.00%')], where="X > 5"). Not sure what a better keyword than formats would be there


Outside of these 2 properties, it is up to the Dataclass to identify what kind of formatting it is and up to the web to apply that formatting. We should likely use a field like `__format_type` that is part of each Dataclass as a key for identifying formatting rules.

On the client, formatting rules should be applied in the order they are defined in the list. This means that if a rule is defined later in the list, it will take precedence over a rule defined earlier in the list. There should be a `stop_if_true` or equivalent property which would not apply any further formatting rules of the same type (e.g. background color) once a matching rule has been applied. This should make it possible to still apply different formatting types even if `stop_if_true` is `True`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking through stop_if_true has pointed out some rough UI edges to consider.

Say you have a row that you want to format with a particular background color, font color, font, and whatever else. You end up with code that looks like:

formatting=[
ColumnColor('X', 'green', where='X % 15 == 0', stop_if_true=True),
ColumnFont('X', 'arial', where='X % 15 == 0', stop_if_true=True),
ColumnFontColor('X', 'blue', where='X % 15 == 0', stop_if_true=True)
...
]

This feels very repetitive because all of the related items for a cell repeat where and stop_if_true. Maybe what is here is good enough, but a few minutes should be spent considering this type of case.

The formatting rules should have 2 main properties:

1. The formatting to apply.
2. An optional conditional expression that will determine if the formatting should be applied.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirming that the conditional can be on columns that are not the column being formatted.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the conditional will be applied as a custom column (uses update_view in the engine). So any expression works as long as it is a boolean result

Comment on lines 2442 to 2450
column_formatting=[
ui.table_formatting.COLOR("X", "RED"),
ui.table_formatting.COLOR("Y", "BLUE", where="Y % 2 == 0"),
ui.table_formatting.NUMBER_FORMAT("Y", format="0.00"),
ui.table_formatting.DATABAR("Z", value_col="Z", min=0, max=100, positive_color="GREEN", negative_color="RED")
],
row_formatting=[
ui.table_formatting.COLOR(where="X > 5", color="GREEN")
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that splitting this up causes precedence ambiguity.

Comment on lines 2442 to 2450
column_formatting=[
ui.table_formatting.COLOR("X", "RED"),
ui.table_formatting.COLOR("Y", "BLUE", where="Y % 2 == 0"),
ui.table_formatting.NUMBER_FORMAT("Y", format="0.00"),
ui.table_formatting.DATABAR("Z", value_col="Z", min=0, max=100, positive_color="GREEN", negative_color="RED")
],
row_formatting=[
ui.table_formatting.COLOR(where="X > 5", color="GREEN")
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might think that row beats column. e.g. make the whole row red because something bad is happening.

Comment on lines 2466 to 2470
2. Display Value (not applicable to rows)
- Number Formatting
- Date Formatting
- DateTime Formatting
- String Formatting
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A user may want to apply a particular display value format (DVF) to all columns of a type. Does it make sense to support this, or would that feature not be used enough?

Comment on lines 2460 to 2465
1. Visual
- Background Color
- Font
- Font Color
- Font Size
- Text Alignment
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted in the comment above, users may want to set multiple visual things at once.


##### ui.table Column Formatting Rules

Column formatting rules apply to specific columns. All column formatting rules should have a `column` field which is the column the rule applies to. We could optionally support `columns` (or just a `columns` prop which handles both cases) which would allow for a list of columns that the rule applies to.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The python table API supports one or more columns for most cases. It would feel natural to have that here as well. Variable naming should be consistent with what the table API is doing.

@mattrunyon
Copy link
Collaborator Author

mattrunyon commented Oct 2, 2024

Updated to include 3 new options for a single format prop. Of the options, I like 2 and 3 the best. I don't have a strong opinion between those 2

I also removed the stop_if_true. I think we could just say first matching rule for a type applies or last matching rule takes precedence. Unless there is some case I'm missing where stop_if_true is the only way instead of just ordering the rules by precedence.

@@ -2415,6 +2415,117 @@ ui_table.sort(
| `by` | `str \| Sequence[str]` | The column(s) to sort by. May be a single column name, or a list of column names. |
| `direction` | `TableSortDirection \| Sequence[TableSortDirection] \| None` | The sort direction(s) to use. If provided, that must match up with the columns provided. Defaults to "ASC". |

##### ui.table Formatting Rules

Tables can be formatted via the `format` prop and a variety of Dataclasses representing different formatting options. We choose to use Dataclasses as effectively a more strict dictionary. This allows for better type checking and easier to understand code.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the format prop?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a general name for the formatting arg on ui.table. Instead of having a column_formatting and row_formatting I'm proposing just a format or formatting arg instead


With any of the options we could support some method of applying value formatting to all columns that support it. We could have an argument such as `all_columns=True` that would apply the rule to all columns that support it. This would be useful for number formatting, date formatting, etc.

We could also support regex (or just wildcards) in the `cols` field to apply the rule to multiple columns at once. This could be useful if a user wanted to format all columns that end in percent as a percentage. Something like `FORMAT(cols=".*Percent", format="0.00%")`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We may want to support both regex and wildcards. Regex would hit the higher end devs that want specific control, and simple wildcards would hit the lower end users that have zero idea what a regex is but can handle simple wildcards.
  2. If we support wildcards, possibly you don't need all_columns=True and could just use cols="*". That raises the question of if regex/wildards apply to all columns or just columns with matching types. I'm not sure which is least confusing. If it applies to all columns, there could be an only_supported_types=True option.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need some way to differentiate regex from wildcards. Probably just have users use r"/.*/" for regex strings and if there's no leading / then it's not regex.

True we could just support cols="*". I think initially most formatting options can apply to all columns (color, font color, etc.). The main thing that can't is the value formatting. If we have that under a single keyword in the dataclass, then we could assume it applies to all supported column types if there's columns w/ regex/wildcard. Maybe we want to warn if there's explicit column names only and the format doesn't apply to one of the explicitly listed columns.

Comment on lines 2486 to 2491
ui.table_formatting.COLUMN_FORMAT(cols="X", color="RED"),
ui.table_formatting.COLUMN_FORMAT(cols="Y", color="BLUE", where="Y % 2 == 0"),
ui.table_formatting.COLUMN_FORMAT(cols="Y", format="0.00"),
ui.table_formatting.COLUMN_FORMAT(cols=["A", "B"], color="PURPLE", format="0.00%", where="A > 5"),
ui.table_formatting.COLUMN_FORMAT(cols="Z", format=ui.table_formatting.DATABAR(value_col="Z", min=0, max=100, positive_color="GREEN", negative_color="RED"),
ui.table_formatting.ROW_FORMAT(where="X > 5", color="GREEN")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really what you meant by Option 3, or did you have something like the following in mind?

Suggested change
ui.table_formatting.COLUMN_FORMAT(cols="X", color="RED"),
ui.table_formatting.COLUMN_FORMAT(cols="Y", color="BLUE", where="Y % 2 == 0"),
ui.table_formatting.COLUMN_FORMAT(cols="Y", format="0.00"),
ui.table_formatting.COLUMN_FORMAT(cols=["A", "B"], color="PURPLE", format="0.00%", where="A > 5"),
ui.table_formatting.COLUMN_FORMAT(cols="Z", format=ui.table_formatting.DATABAR(value_col="Z", min=0, max=100, positive_color="GREEN", negative_color="RED"),
ui.table_formatting.ROW_FORMAT(where="X > 5", color="GREEN")
ui.table_formatting.FORMAT(cols="X", color="RED"),
ui.table_formatting.FORMAT(cols="Y", color="BLUE", where="Y % 2 == 0"),
ui.table_formatting.FORMAT(cols="Y", format="0.00"),
ui.table_formatting.FORMAT(cols=["A", "B"], color="PURPLE", format="0.00%", where="A > 5"),
ui.table_formatting.FORMAT(cols="Z", format=ui.table_formatting.DATABAR(value_col="Z", min=0, max=100, positive_color="GREEN", negative_color="RED"),
ui.table_formatting.FORMAT(where="X > 5", color="GREEN")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops yes I copy/pasted and forgot to update that

Comment on lines 2440 to 2444
Option 1 ends up being more verbose (and causes code duplication) if you have multiple rules with the same condition. It also ends up with more potential imports for multiple rules if we recommend users import the dataclasses like `from deephaven.ui.table_formatting import COLOR, NUMBER_FORMAT, DATABAR`

Option 2 is more concise and allows for multiple rules with the same condition without repeating. Two dataclasses is easier for a user to distinguish between the 2 rule types. Only 1 or 2 imports needed.

Option 3 is more concise and allows for multiple rules with the same condition without repeating. One dataclass is easier for a user to understand that the rules are the same type. Only 1 import needed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we trying to decide between the 3 options and just implementing one, or are we planning to implement all 3 so that the user has options on which to use?

If I had to pick one, I'd go with option 2, but I'm not necessarily against supporting all 3.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think pick 1 and if there's enough reason to support other then we can implement that in the future. Just giving multiple syntax options to show what the syntax might look like

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I was designing this from scratch, I like option 3, because conceptually if I wanted to switch from (using existing terminology) a format_row_where to a format_column_where just adding on a column is much nicer that having to change the whole method when the only real difference is the column.
But, I recognize that 2 is closest to what we have now, and would be good to bridge the gap and allow a little more specification for those who want it.
So ideally we'd do both 2 and 3 in my opinion to start.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be fairly easy to support 2 and 3 without duplication on our end. If I'm reading correctly, dataclasses can use inheritance. So we'd basically implement 3 (1 dataclass w/ all fields) and then add thin dataclasses on top for column/row. The column dataclass would just mandate the cols field and inherit everything else from the base

Supporting all 3 is feasible I think if that's desired. We could do some mapping into the corresponding column/row/general structure.

My main gripe with the individual rules as dataclasses is it adds a lot of potential imports someone would have to add. I think we will want dataclasses for more complex value formatting rules like databars, but for something like color it might be a little annoying

t3 = ui.table(
t,
format=[
ui.table_formatting.FORMAT(cols="X", color="RED"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would explore a shorter import, ui.table.format would be nice if we can.

ui.table_formatting.FORMAT(cols="Y", color="BLUE", where="Y % 2 == 0"),
ui.table_formatting.FORMAT(cols="Y", format="0.00"),
ui.table_formatting.FORMAT(cols=["A", "B"], color="PURPLE", format="0.00%", where="A > 5"),
ui.table_formatting.FORMAT(cols="Z", format=ui.table_formatting.DATABAR(value_col="Z", min=0, max=100, positive_color="GREEN", negative_color="RED"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having both format="0.00%" and format=ui.table_formatting.DATABAR seems confusing and also prevents setting a format on the databar.

Would it make sense to accept databar=ui.table.format.databar or kind=databar... kind=button... or something?

@mattrunyon
Copy link
Collaborator Author

Cleaned up design doc to reflect the preferred choice for formatting (Option 3 - single format dataclass w/ kwargs)

@mattrunyon mattrunyon merged commit f79224a into deephaven:main Oct 16, 2024
16 checks passed
@mattrunyon mattrunyon deleted the table-formatting-spec branch October 16, 2024 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ui.table spec for formatting
6 participants