Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to set user for test cluster to perform initial health checks #11641

Closed
wants to merge 5 commits into from

Conversation

jmazanec15
Copy link
Member

@jmazanec15 jmazanec15 commented Dec 19, 2023

Description

I am working on adding gradle task for spinning up k-NN+security plugin for integration testing and debugging: opensearch-project/k-NN#1307. Ive been able to get the cluster to come up and pass the health check for the integTest task, but for run had been failing for awhile with a 401 during the wait for yellow health check: https://github.com/opensearch-project/OpenSearch/blob/main/buildSrc/src/main/java/org/opensearch/gradle/testclusters/OpenSearchCluster.java#L559:

/gradlew run -Dsecurity.enabled=true --info
...
* Where:
Build file 'k-NN-1/build.gradle' line: 407

* What went wrong:
Execution failed for task ':run'.
> `cluster{::integTest}` failed to wait for cluster health yellow after 40 SECONDS
    IO error while waiting cluster
    401 Unauthorized

* Try:
> Run with --stacktrace option to get the stack trace.
> Run with --debug option to get more log output.
> Run with --scan to get full insights.
> Get more help at https://help.gradle.org.

The problem is that it was unable to get the username and password from the credentials. These are set during Node start here: https://github.com/opensearch-project/OpenSearch/blob/main/buildSrc/src/main/java/org/opensearch/gradle/testclusters/OpenSearchNode.java#L485-L497. These need to be set via system properties and cannot be passed via cluster.systemProperty call. So, I modified my task to run:

task setCorrectSystemProperties {
    System.setProperty("tests.opensearch.secure", "true")
    System.setProperty("tests.opensearch.username", "admin")
    System.setProperty("tests.opensearch.password", "admin")
}

run {
    useCluster project.testClusters.integTest
    dependsOn buildJniLib, setCorrectSystemProperties

    doFirst {
        // There seems to be an issue when running multi node run or integ tasks with unicast_hosts
        // not being written, the waitForAllConditions ensures it's written
        getClusters().forEach { cluster ->
            cluster.waitForAllConditions()
        }
    }
}

However, I get the following error:

=== Standard error of node `node{::integTest-0}` ===
»   ↓ last 40 non error or warning messages from k-NN-1/build/testclusters/integTest-0/logs/opensearch.stderr.log ↓
» uncaught exception in thread [main]
»  SettingsException[unknown setting [secure] please check that any required plugins are installed, or check the breaking changes documentation for removed settings]
»       at org.opensearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:606)
»       at org.opensearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:547)
»       at org.opensearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:517)
»       at org.opensearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:487)
»       at org.opensearch.common.settings.SettingsModule.<init>(SettingsModule.java:178)
»       at org.opensearch.node.Node.<init>(Node.java:580)
»       at org.opensearch.node.Node.<init>(Node.java:410)
»       at org.opensearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:242)
»       at org.opensearch.bootstrap.Bootstrap.setup(Bootstrap.java:242)
»       at org.opensearch.bootstrap.Bootstrap.init(Bootstrap.java:404)
»       at org.opensearch.bootstrap.OpenSearch.init(OpenSearch.java:180)
»       at org.opensearch.bootstrap.OpenSearch.execute(OpenSearch.java:171)
»       at org.opensearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:104)
»       at org.opensearch.cli.Command.mainWithoutErrorHandling(Command.java:138)
»       at org.opensearch.cli.Command.main(Command.java:101)
»       at org.opensearch.bootstrap.OpenSearch.main(OpenSearch.java:137)
»       at org.opensearch.bootstrap.OpenSearch.main(OpenSearch.java:103)
»  For complete error details, refer to the log at k-NN-1/build/testclusters/integTest-0/logs/integTest.log

This is because in the RunTask, it will pass any system properties with the prefix tests.opensearch as settings to the nodes: https://github.com/opensearch-project/OpenSearch/blob/main/buildSrc/src/main/java/org/opensearch/gradle/testclusters/RunTask.java#L143-L152. This is documented here: https://github.com/opensearch-project/OpenSearch/blob/main/DEVELOPER_GUIDE.md#run-opensearch.

EDIT

I updated the PR to implement the user option in OpenSearchNode. When a user passes in a UserSpec map with "username" and "password" keys, they will be set for the initial credentials. I validated this works for gradlew run for my change in k-NN.

I am not sure if this is the correct solution or not and wanted to get feedback. A few alternatives I could think of:

  1. Make these properties settable via cluster.systemProperty - not sure this information is required inside the node environment
  2. Create setters on cluster to directly set credentials - seems cluster.setCredentials would be misleading. Maybe something like cluster.setCredentialsForHealthCheck, but this may be limiting.
  3. Renaming system property to read the credentials from so they are not prefixed with "tests.opensearch"

@scrawfor99 @cwperks let me know if you have any thoughts.

Related Issues

Resolves #11219
Related change: #8900

Check List

  • New functionality includes testing.
  • All tests pass
  • New functionality has been documented.
  • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

❌ Gradle check result for 2458c4f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@@ -483,16 +483,16 @@ public Stream<String> logLines() throws IOException {
@Override
public synchronized void start() {
LOGGER.info("Starting `{}`", this);
if (System.getProperty("tests.opensearch.secure") != null
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you identified the issue, could we not pass tests.opensearch props to the node instead of modifying properties? I am not sure what the impact this change would make (renaming properties) since we are basically modifying publicly available Gradle plugin

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you identified the issue, could we not pass tests.opensearch props to the node instead of modifying properties?

Yes, we could do this. The main reason I didnt do it was that the credentials did not seem to belong in the system properties for the node itself - in other words, the running node does not need these credentials for anything.

That being said, I was thinking that maybe just adding an option to set the credentials directly via cluster.setCredentialsForHealthCheck or cluster.setCredentials might make the most sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or @reta , alternatively, I see that there is a user option in the TestClusterConfiguration: https://github.com/opensearch-project/OpenSearch/blob/main/buildSrc/src/main/java/org/opensearch/gradle/testclusters/TestClusterConfiguration.java#L125.

In the OpenSearchNode, its a no-op: https://github.com/opensearch-project/OpenSearch/blob/main/buildSrc/src/main/java/org/opensearch/gradle/testclusters/OpenSearchNode.java#L706, but Im wondering if we can just put the following (which is what we do in the start with the sys props):

    @Override
    public void user(Map<String, String> userSpec) {
            this.credentials.get(0).put("username", userSpec.get("username")))
            this.credentials.get(0).put("password",  userSpec.get("password")));
    }

Im not really sure what outside of this health check credentials/user would be used for aside from. @cwperks do you think this sounds good?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or @reta , alternatively, I see that there is a user option in the TestClusterConfiguration: https://github.com/opensearch-project/OpenSearch/blob/main/buildSrc/src/main/java/org/opensearch/gradle/testclusters/TestClusterConfiguration.java#L125.

You mean as an alternative mechanism to system properties?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I updated the PR and confirmed it works for my k-NN change locally.

@cwperks
Copy link
Member

cwperks commented Dec 19, 2023

@jmazanec15 The security plugin passes these params into the gradle task using -D like this:

./gradlew -p bwc-test
      bwcTestSuite
      -Dtests.security.manager=false
      -Dtests.opensearch.secure=true
      -Dtests.opensearch.username=${{ inputs.username }}
      -Dtests.opensearch.password=${{ inputs.password }}
      -Dbwc.version.previous=${{ steps.build-previous.outputs.built-version }}
      -Dbwc.version.next=${{ steps.build-next.outputs.built-version }} -i

Are you able to pass these values into the gradle run command?

@jmazanec15
Copy link
Member Author

jmazanec15 commented Dec 19, 2023

@cwperks I think bwc-test will most likely work because it does not have this logic in the task: https://github.com/opensearch-project/OpenSearch/blob/main/buildSrc/src/main/java/org/opensearch/gradle/testclusters/RunTask.java#L143-L152.

Passing via -D should have the same effect as:

    System.setProperty("tests.secure", "true")
    System.setProperty("tests.username", "admin")
    System.setProperty("tests.password", "admin")

I did try setting via gradle commands awhile ago with run but I didnt work.

@jmazanec15 jmazanec15 changed the title Avoid prefixing sys props with tests.opensearch that are not settings Add option to set user for test cluster to perform initial health checks Dec 19, 2023
Copy link
Contributor

❌ Gradle check result for b68d7ba: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

}

if (userSpec.containsKey("username") && userSpec.containsKey("password")) {
this.credentials.get(0).put("username", userSpec.get("username"));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The credentials would conflict with start method I believe

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For start, we would allow users to override if the sys properties are set. We could get rid of reading from sys properties during start and require user to be used, but I think this would break things for other plugins.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to trace where this method is called but the only place I found is OpenSearchCluster::user, could you please help me to figure out who actually populates the userSpec in the first place?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this would be from the consumer of the gradle plugin. So for instance in k-NN, we configure the test cluster here: https://github.com/opensearch-project/k-NN/blob/main/build.gradle#L259-L280.

For security, it would look something like:

testClusters.integTest {
    testDistribution = "ARCHIVE"

    // Optionally install security
    if (System.getProperty("security.enabled") != null) {
        configurations.zipArchive.asFileTree.each {
            plugin(provider(new Callable<RegularFile>() {
                @Override
                RegularFile call() throws Exception {
                    return new RegularFile() {
                        @Override
                        File getAsFile() {
                            return it
                        }
                    }
                }
            }))
        }

        user(Map.of("username", "admin", "password", "admin"))

        extraConfigFile("admin-cert.pem", new File("$rootDir/src/test/resources/security/admin-cert.pem"))
        extraConfigFile("admin-key.pem", new File("$rootDir/src/test/resources/security/admin-key.pem"))
        extraConfigFile("node-cert.pem", new File("$rootDir/src/test/resources/security/node-cert.pem"))
        extraConfigFile("node-key.pem", new File("$rootDir/src/test/resources/security/node-key.pem"))
        extraConfigFile("root-ca.pem", new File("$rootDir/src/test/resources/security/root-ca.pem"))

        setting("plugins.security.ssl.transport.pemcert_filepath", "node-cert.pem")
        setting("plugins.security.ssl.transport.pemkey_filepath", "node-key.pem")
        setting("plugins.security.ssl.transport.pemtrustedcas_filepath", "root-ca.pem")
        setting("plugins.security.ssl.transport.enforce_hostname_verification", "false")
        setting("plugins.security.ssl.http.enabled", "true")
        setting("plugins.security.ssl.http.pemcert_filepath", "node-cert.pem")
        setting("plugins.security.ssl.http.pemkey_filepath", "node-key.pem")
        setting("plugins.security.ssl.http.pemtrustedcas_filepath", "root-ca.pem")
        setting("plugins.security.allow_unsafe_democertificates", "true")
        setting("plugins.security.allow_default_init_securityindex", "true")
        setting("plugins.security.unsupported.inject_user.enabled", "true")

        setting("plugins.security.authcz.admin_dn", "\n- CN=kirk,OU=client,O=client,L=test, C=de")
        setting('plugins.security.restapi.roles_enabled', '["all_access", "security_rest_api_access"]')
        setting('plugins.security.system_indices.enabled', "true")
        setting('plugins.security.system_indices.indices', '[".plugins-ml-config", ".plugins-ml-connector", ".plugins-ml-model-group", ".plugins-ml-model", ".plugins-ml-task", ".plugins-ml-conversation-meta", ".plugins-ml-conversation-interactions", ".opendistro-alerting-config", ".opendistro-alerting-alert*", ".opendistro-anomaly-results*", ".opendistro-anomaly-detector*", ".opendistro-anomaly-checkpoints", ".opendistro-anomaly-detection-state", ".opendistro-reports-*", ".opensearch-notifications-*", ".opensearch-notebooks", ".opensearch-observability", ".ql-datasources", ".opendistro-asynchronous-search-response*", ".replication-metadata-store", ".opensearch-knn-models", ".geospatial-ip2geo-data*"]')

        setSecure(true)
    }

Copy link
Collaborator

@reta reta Dec 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this would be from the consumer of the gradle plugin.

Gotcha, thanks a lot @jmazanec15 , it looks quite logical to me to allow user credentials configuration this way, so what about altering OpenSearchNode::start() to consult system properties only if credentials were not set? That should not break anyone I think (hard to imagine cluster and node(s) to have different credentials)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should not break anyone I think (hard to imagine cluster and node(s) to have different credentials)
Btw: its also possible from consuming plugin to call user per node via something like:

cluster.getFirstNode.user()

I think thats an option. However, right now, it looks like the system properties are intended to perform the override:

        if (System.getProperty("tests.opensearch.username") != null) {
            this.credentials.get(0).put("username", System.getProperty("tests.opensearch.username"));
            LOGGER.info("Overwriting username to: " + this.getCredentials().get(0).get("username"));
        }
        if (System.getProperty("tests.opensearch.password") != null) {
            this.credentials.get(0).put("password", System.getProperty("tests.opensearch.password"));
            LOGGER.info("Overwriting password to: " + this.getCredentials().get(0).get("password"));
        }

Id be curious if the security team has any strong preference before doing that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Id be curious if the security team has any strong preference before doing that.

Sure, let's wait for their feedback, thanks!

Copy link
Contributor

github-actions bot commented Dec 19, 2023

Compatibility status:

Checks if related components are compatible with change 0feaa31

Incompatible components

Incompatible components: [https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/performance-analyzer-rca.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/sql.git]

Avoids adding prefix "tests.opensearch." to system properties that
interact with gradle test cluster. Using this prefix causes these
properties to be passed to the nodes as settings:
https://github.com/opensearch-project/OpenSearch/blob/main/DEVELOPER_GUIDE.md#run-opensearch.

Signed-off-by: John Mazanec <[email protected]>
Sets the user option for the OpenSearchNode to set the credentials the
node should use. This allows us to avoid passing in the username and
password as system properties in order to launch the cluster with
security plugin installed. These credentials will be used for the health
check.

Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Copy link
Contributor

❌ Gradle check result for 6f70b12: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 50a027c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: John Mazanec <[email protected]>
Copy link
Contributor

❕ Gradle check result for 0feaa31: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testMultiGetWithNetworkDisruption_FailOpenEnabled

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link

codecov bot commented Dec 20, 2023

Codecov Report

Attention: Patch coverage is 0% with 11 lines in your changes are missing coverage. Please review.

Project coverage is 71.35%. Comparing base (2c8ee19) to head (0feaa31).
Report is 213 commits behind head on main.

Files Patch % Lines
...opensearch/gradle/testclusters/OpenSearchNode.java 0.00% 11 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #11641      +/-   ##
============================================
- Coverage     71.46%   71.35%   -0.11%     
+ Complexity    59257    59160      -97     
============================================
  Files          4906     4906              
  Lines        278184   278194      +10     
  Branches      40421    40424       +3     
============================================
- Hits         198791   198497     -294     
- Misses        62959    63225     +266     
- Partials      16434    16472      +38     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jmazanec15
Copy link
Member Author

jmazanec15 commented Dec 21, 2023

org.opensearch.search.SearchWeightedRoutingIT.testMultiGetWithNetworkDisruption_FailOpenEnabled Test looks flaky: #10755. This code will not impact that.

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added stalled Issues that have stalled and removed stalled Issues that have stalled labels Jan 23, 2024
@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Feb 24, 2024
@andrross
Copy link
Member

@jmazanec15 What do we need to take this PR forward? You mentioned wanting feedback from the security team. Can we rope in the relevant folks to provide feedback to hopefully get to a conclusion here?

@opensearch-trigger-bot opensearch-trigger-bot bot removed the stalled Issues that have stalled label Mar 13, 2024
@jmazanec15
Copy link
Member Author

PR was pending review from correct team. We were able to get around it by doing the following for each node in the cluster: https://github.com/opensearch-project/k-NN/blob/main/build.gradle#L95-L102. So we are unblocked.

        cluster.getNodes().forEach { node ->
            var creds = node.getCredentials()
            if (creds.isEmpty()) {
                creds.add(Map.of('username', 'admin', 'password', 'admin'))
            } else {
                creds.get(0).putAll(Map.of('username', 'admin', 'password', 'admin'))
            }
        }

This change would make it so that:

    cluster.user(Map.of("username", "admin", "password", "admin"))

@jmazanec15 jmazanec15 closed this Apr 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working security Anything security related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Unable to spin up security cluster with SSL config
4 participants