[DSIP-78][Data Quality] Remove data quality module #16794
base: dev
Conversation
-1. There is no conclusion yet, so please don't do this.
Based on the discussion in issue #16728, we can draw the following conclusion.
None of the current maintainers in the community is willing to refactor this module, and the data-quality module has seriously blocked the progress of #16098, which is very important for the next release. So I think the conclusion to remove it is obvious.
Apache emphasizes achieving consensus. If there are significant differences that cannot be resolved in the short term, we may choose to temporarily shelve the proposal, allowing more time to gather information before re-discussing. Considering this is a major decision, I suggest you send a vote email to the dev mailing list.
First of all, consensus does not require everyone to reach an agreement, only more than half. For more info, take a look at apache/comdev-site#189.
As I said, none of the current maintainers in the community is willing to refactor this module. How can we achieve this without anyone willing to take responsibility?
I think the issue and the dev mailing list serve the same purpose here, and all active PMC members and committers have been included in this issue.
A GitHub issue can't replace the mailing list, especially for a major decision. By the way, I don't disagree with removing this module; I'm just concerned about the significant impact it will have on users. What I see is that many users want to keep this module, while the maintainers are inclined to remove it to make refactoring easier. Therefore, this decision requires great caution. If a vote does not take place on the dev mailing list, then it should happen in private, not in an issue thread.
I don't think this module is a major matter. This feature was introduced in PR #4830 (Feb 21, 2021) without any mailing list or GitHub issue discussion. For more than three years, apart from the PR author, no contributor has contributed to this feature. There has been an endless stream of bug and improvement issues, yet no substantial code change. The author also doesn't want to maintain this feature and voted +1 for removal.
At that time, there was no DSIP mechanism in place, and much of the communication took place during community meetings. The feature I'm referring to took 8-9 months to develop and implement. Saying that this is not a major feature would, I believe, be inaccurate.
I don't agree with this opinion. DolphinScheduler focuses on being a modern data orchestration platform, while data quality focuses on the accuracy and consistency of data. The relationship between the two is like that of Flink and Flink-CDC. There are many mature examples at present.
We should add DDL to drop the DQ tables.
We can remove the CREATE TABLE DDL from the init SQL and provide the DROP TABLE DDL in the docs, so that users decide whether to execute it instead of having it run by default. WDYT?
How about providing an entry-point script that removes the existing tables and packaging it into the binary tarball, instead of documenting it? We only support two databases in production, and we should keep things easy to use.
I think it would be more transparent for users if we give them the DDL statements directly, since dropping tables is a dangerous operation...
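To make the two options above concrete, here is a minimal sketch of a helper that only prints the DROP TABLE statements and leaves execution to the user. It is an illustration under assumptions, not the script this PR ships: the function name `build_drop_statements` and the table names in `DQ_TABLES` are placeholders and should be checked against the actual schema.

```python
import re

# Hypothetical table names, used only for illustration; verify them against your schema.
DQ_TABLES = ["t_ds_dq_rule", "t_ds_dq_execute_result"]

# Accept plain SQL identifiers only, so a bad entry cannot inject extra SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_drop_statements(tables):
    """Return one DROP TABLE IF EXISTS statement per table name."""
    statements = []
    for name in tables:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe table name: {name!r}")
        # IF EXISTS keeps the statement harmless when the table is already gone.
        statements.append(f"DROP TABLE IF EXISTS {name};")
    return statements


if __name__ == "__main__":
    # Print only; the user reviews the output and runs it manually.
    print("\n".join(build_drop_statements(DQ_TABLES)))
```

Printing instead of executing keeps the destructive step an explicit user decision, which is the concern raised in this thread.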
Speaking of the proposal only, there seems to be an inconsistency: although some of the PMC members in #16728 already agreed to remove the module, David is right that he can challenge that and ask for a vote on the dev mailing list. So maybe we should vote in a dev mail thread, and I think the vote result should be the ONLY thing that decides whether this PR continues, not personal emotions. As for this feature, I think it would be better as a plugin than as a built-in function, especially since it has many bugs and CVEs and no team member wants to maintain it. So I personally would vote +1 for removing it. And BTW, the time spent developing a feature should not be the standard for measuring its importance.
I mean we should add a new bash script like https://github.com/apache/dolphinscheduler/blob/dev/dolphinscheduler-tools/src/main/bin/migrate-lineage.sh which is separated from
Ok.
I will raise a vote on the dev mailing list.
FYI, the vote mail is at https://lists.apache.org/thread/0tldm33skkbrfgbt01bvd610z5zmb725
I voted -1 on removing data quality before, but the author gave us another option: a plugin instead of a built-in function. I think that's a good way. Actually, DS's data quality is not an out-of-the-box module. The author built a framework, but it still needs a lot of work before it can be used easily. The author has another GitHub repo with a larger, more functional data quality project. If we enhance data quality in DS, we will be doing the same work twice. So I agree to remove the data quality module from DS and use a plugin to let users continue to use the better data quality check functions. About removing tables in DS, I agree with @zhongjiajie's new bash script approach.
Agree.
Agree. I voted +1.
About the Flink real-time stream processing task: we have rewritten it for our platform, which looks like a big data scheduling system. @davidzollo said he wants a quick meeting about the real-time part of DS. We have finished some docs introducing our Flink work. Do you have some time so we can discuss this part? CC @Gallardot
We removed the data-quality tables from the init SQL but did not add the corresponding change to the upgrade SQL. This will cause the schema check in CI to fail... @zhongjiajie @ruanwenjun
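The mismatch described above can be sketched as a set comparison. This assumes the CI check compares the tables of a fresh install (init SQL) with those of an upgraded database (previous schema plus upgrade SQL); the function name `schema_drift` and the sample table names are illustrative, not taken from the CI code.

```python
def schema_drift(fresh_tables, upgraded_tables):
    """Compare the table sets produced by the two installation paths."""
    fresh, upgraded = set(fresh_tables), set(upgraded_tables)
    return {
        # Tables the init SQL creates but the upgrade path never adds.
        "missing_after_upgrade": sorted(fresh - upgraded),
        # Tables the upgrade path keeps although the init SQL no longer creates them.
        "left_over_after_upgrade": sorted(upgraded - fresh),
    }


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    fresh = ["t_ds_task_instance"]
    upgraded = ["t_ds_task_instance", "t_ds_dq_rule"]
    print(schema_drift(fresh, upgraded))
```

Under that assumption, any table removed from the init SQL without a matching change in the upgrade SQL shows up as a leftover, and the two schemas no longer match.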
Quality Gate failed.
Purpose of the pull request
close #16728
Brief change log
Verify this pull request
This pull request is code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(or)
Pull Request Notice
If your pull request contains an incompatible change, you should also add it to
docs/docs/en/guide/upgrede/incompatible.md