[BUG] org.opensearch.search.sort.FieldSortIT.testIssue6614 is flaky #11347

Closed
reta opened this issue Nov 27, 2023 · 5 comments · Fixed by #12259
Labels
bug (Something isn't working), flaky-test (Random test failure that succeeds on second run)

Comments

@reta
Collaborator

reta commented Nov 27, 2023

Describe the bug
The test case org.opensearch.search.sort.FieldSortIT.testIssue6614 is flaky:

org.opensearch.search.sort.FieldSortIT.testIssue6614 {p0={"search.concurrent_segment_search.enabled":"true"}}

java.lang.AssertionError: Unexpected ShardFailures: [[idx_24][2] failed, reason [BroadcastShardOperationFailedException[]; nested: RemoteTransportException[[node_s4][127.0.0.1:42895][indices:admin/refresh[s][p]]]; nested: RefreshFailedEngineException[Refresh failed]; nested: FileSystemException[/var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.search.sort.FieldSortIT_7E94A27DAB4FE81-001/tempDir-002/node_s4/nodes/0/indices/HUroXZumRjCNQDGxtHp-8A/2/index/_1_Lucene90_0.dvd: Too many open files]; ], [idx_24][2] failed, reason [BroadcastShardOperationFailedException[]; nested: RemoteTransportException[[node_s4][127.0.0.1:42895][indices:admin/refresh[s][p]]]; nested: RefreshFailedEngineException[Refresh failed]; nested: FileSystemException[/var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.search.sort.FieldSortIT_7E94A27DAB4FE81-001/tempDir-002/node_s4/nodes/0/indices/HUroXZumRjCNQDGxtHp-8A/2/index/_1_Lucene90_0.dvd: Too many open files]; ], [idx_24][1] failed, reason [BroadcastShardOperationFailedException[]; nested: RemoteTransportException[[node_s5][127.0.0.1:34397][indices:admin/refresh[s]]]; nested: RemoteTransportException[[node_s5][127.0.0.1:34397][indices:admin/refresh[s][p]]]; nested: RefreshFailedEngineException[Refresh failed]; nested: FileSystemException[/var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.search.sort.FieldSortIT_7E94A27DAB4FE81-001/tempDir-002/node_s5/nodes/0/indices/HUroXZumRjCNQDGxtHp-8A/1/index/_1_Lucene90_0.dvd: Too many open files]; ], [idx_24][1] failed, reason [BroadcastShardOperationFailedException[]; nested: RemoteTransportException[[node_s5][127.0.0.1:34397][indices:admin/refresh[s]]]; nested: RemoteTransportException[[node_s5][127.0.0.1:34397][indices:admin/refresh[s][p]]]; nested: RefreshFailedEngineException[Refresh failed]; nested: 
FileSystemException[/var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.search.sort.FieldSortIT_7E94A27DAB4FE81-001/tempDir-002/node_s5/nodes/0/indices/HUroXZumRjCNQDGxtHp-8A/1/index/_1_Lucene90_0.dvd: Too many open files]; ], [idx_24][0] failed, reason [BroadcastShardOperationFailedException[]; nested: RemoteTransportException[[node_s3][127.0.0.1:35941][indices:admin/refresh[s]]]; nested: RemoteTransportException[[node_s3][127.0.0.1:35941][indices:admin/refresh[s][p]]]; nested: RefreshFailedEngineException[Refresh failed]; nested: FileSystemException[/var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.search.sort.FieldSortIT_7E94A27DAB4FE81-001/tempDir-002/node_s3/nodes/0/indices/HUroXZumRjCNQDGxtHp-8A/0/index/_1_Lucene90_0.dvd: Too many open files]; ], [idx_24][0] failed, reason [BroadcastShardOperationFailedException[]; nested: RemoteTransportException[[node_s3][127.0.0.1:35941][indices:admin/refresh[s]]]; nested: RemoteTransportException[[node_s3][127.0.0.1:35941][indices:admin/refresh[s][p]]]; nested: RefreshFailedEngineException[Refresh failed]; nested: FileSystemException[/var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.search.sort.FieldSortIT_7E94A27DAB4FE81-001/tempDir-002/node_s3/nodes/0/indices/HUroXZumRjCNQDGxtHp-8A/0/index/_1_Lucene90_0.dvd: Too many open files]; ], [idx_24][4] failed, reason [BroadcastShardOperationFailedException[]; nested: RemoteTransportException[[node_s5][127.0.0.1:34397][indices:admin/refresh[s]]]; nested: RemoteTransportException[[node_s5][127.0.0.1:34397][indices:admin/refresh[s][p]]]; nested: RefreshFailedEngineException[Refresh failed]; nested: 
FileSystemException[/var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.search.sort.FieldSortIT_7E94A27DAB4FE81-001/tempDir-002/node_s5/nodes/0/indices/HUroXZumRjCNQDGxtHp-8A/4/index/_1.kdi: Too many open files]; ], [idx_24][4] failed, reason [BroadcastShardOperationFailedException[]; nested: RemoteTransportException[[node_s5][127.0.0.1:34397][indices:admin/refresh[s]]]; nested: RemoteTransportException[[node_s5][127.0.0.1:34397][indices:admin/refresh[s][p]]]; nested: RefreshFailedEngineException[Refresh failed]; nested: FileSystemException[/var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.search.sort.FieldSortIT_7E94A27DAB4FE81-001/tempDir-002/node_s5/nodes/0/indices/HUroXZumRjCNQDGxtHp-8A/4/index/_1.kdi: Too many open files]; ]]
Expected: <0>
     but: was <8>
	at __randomizedtesting.SeedInfo.seed([7E94A27DAB4FE81:D9A678479F741FE5]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.opensearch.test.hamcrest.OpenSearchAssertions.assertNoFailures(OpenSearchAssertions.java:377)
	at org.opensearch.test.OpenSearchIntegTestCase.refresh(OpenSearchIntegTestCase.java:1487)
	at org.opensearch.test.OpenSearchIntegTestCase.indexRandomForMultipleSlices(OpenSearchIntegTestCase.java:1768)
	at org.opensearch.test.OpenSearchIntegTestCase.indexRandom(OpenSearchIntegTestCase.java:1724)
	at org.opensearch.test.OpenSearchIntegTestCase.indexRandom(OpenSearchIntegTestCase.java:1620)
	at org.opensearch.test.OpenSearchIntegTestCase.indexRandom(OpenSearchIntegTestCase.java:1604)
	at org.opensearch.search.sort.FieldSortIT.testIssue6614(FieldSortIT.java:229)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:1583)
org.opensearch.search.sort.FieldSortIT.testIssue6614 {p0={"search.concurrent_segment_search.enabled":"false"}}

UncategorizedExecutionException[Failed execution]; nested: FileSystemException[/var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.search.sort.FieldSortIT_8A88631ACFFFDB9B-001/tempDir-002/node_s0/nodes/0/indices/vGIaFaF6RsKJ2xalzfJMqw/0/index/_9.fdm: Too many open files];
	at __randomizedtesting.SeedInfo.seed([8A88631ACFFFDB9B:54C7517A8A3F3AFF]:0)
	at app//org.opensearch.action.support.AdapterActionFuture.unwrapEsException(AdapterActionFuture.java:102)
	at app//org.opensearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:57)
	at app//org.opensearch.action.ActionRequestBuilder.get(ActionRequestBuilder.java:73)
	at app//org.opensearch.test.OpenSearchIntegTestCase.indexRandomForMultipleSlices(OpenSearchIntegTestCase.java:1737)
	at app//org.opensearch.test.OpenSearchIntegTestCase.indexRandom(OpenSearchIntegTestCase.java:1686)
	at app//org.opensearch.test.OpenSearchIntegTestCase.indexRandom(OpenSearchIntegTestCase.java:1582)
	at app//org.opensearch.test.OpenSearchIntegTestCase.indexRandom(OpenSearchIntegTestCase.java:1566)
	at app//org.opensearch.search.sort.FieldSortIT.testIssue6614(FieldSortIT.java:229)
	at [email protected]/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at [email protected]/java.lang.reflect.Method.invoke(Method.java:580)
	at app//com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at app//com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at app//com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at app//com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at app//org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at app//org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at app//com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at app//com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at app//com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at app//com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at app//com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at app//org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at app//org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at app//org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at [email protected]/java.lang.Thread.run(Thread.java:1583)
Caused by: java.nio.file.FileSystemException: /var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.search.sort.FieldSortIT_8A88631ACFFFDB9B-001/tempDir-002/node_s0/nodes/0/indices/vGIaFaF6RsKJ2xalzfJMqw/0/index/_9.fdm: Too many open files
	at org.apache.lucene.tests.mockfile.HandleLimitFS.onOpen(HandleLimitFS.java:67)
	at org.apache.lucene.tests.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:82)
	at org.apache.lucene.tests.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:163)
	at java.nio.file.Files.newOutputStream(Files.java:227)
	at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:394)
	at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:387)
	at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:220)
	at org.apache.lucene.tests.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:717)
	at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:75)
	at org.opensearch.index.store.ByteSizeCachingDirectory.createOutput(ByteSizeCachingDirectory.java:153)
	at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:75)
	at org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:43)
	at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:41)
	at org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter.<init>(Lucene90CompressingStoredFieldsWriter.java:126)
	at org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsFormat.fieldsWriter(Lucene90CompressingStoredFieldsFormat.java:140)
	at org.apache.lucene.codecs.lucene90.Lucene90StoredFieldsFormat.fieldsWriter(Lucene90StoredFieldsFormat.java:154)
	at org.apache.lucene.index.StoredFieldsConsumer.initStoredFieldsWriter(StoredFieldsConsumer.java:50)
	at org.apache.lucene.index.StoredFieldsConsumer.startDocument(StoredFieldsConsumer.java:57)
	at org.apache.lucene.index.IndexingChain.startStoredFields(IndexingChain.java:512)
	at org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:543)
	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:242)
	at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:432)
	at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1545)
	at org.apache.lucene.index.IndexWriter.softUpdateDocument(IndexWriter.java:1862)
	at org.opensearch.index.engine.InternalEngine.deleteInLucene(InternalEngine.java:1547)
	at org.opensearch.index.engine.InternalEngine.delete(InternalEngine.java:1360)
	at org.opensearch.index.shard.IndexShard.delete(IndexShard.java:1295)
	at org.opensearch.index.shard.IndexShard.applyDeleteOperation(IndexShard.java:1269)
	at org.opensearch.index.shard.IndexShard.applyDeleteOperationOnPrimary(IndexShard.java:1209)
	at org.opensearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:613)
	at org.opensearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:469)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at org.opensearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:533)
	at org.opensearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:414)
	at org.opensearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:124)
	at org.opensearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:235)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:911)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.lang.Thread.run(Thread.java:1583)

To Reproduce

./gradlew ':server:internalClusterTest' --tests "org.opensearch.search.sort.FieldSortIT" -Dtests.method='testIssue6614 {p0={"search.concurrent_segment_search.enabled":"false"}}'
./gradlew ':server:internalClusterTest' --tests "org.opensearch.search.sort.FieldSortIT" -Dtests.method='testIssue6614 {p0={"search.concurrent_segment_search.enabled":"true"}}'

Expected behavior
The test should always pass

Plugins
Standard


Host/Environment (please complete the following information):

  • CI


@Poojita-Raj
Contributor

Poojita-Raj commented Feb 7, 2024

Looking through the test failures, they do not all occur at the same spot.
The main cause is "Too many open files", which we observe mostly in the indexRandomForMultipleSlices method of OpenSearchIntegTestCase, when it indexes random bogus docs, deletes them, and then refreshes. The test generally fails on this refresh.
We also see a "Too many open files" failure in indexRandom, when it calls assertNoFailures on the shards.
The other exceptions are (1) a suite timeout and (2) a failed ack response.

I am looking into the main "Too many open files" failure first, since the others seem to be manifestations of the same root cause.

@Poojita-Raj
Contributor

Printing ProcessProbe.getInstance().getMaxFileDescriptorCount() shows that the maximum file descriptor count is 10240.
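For anyone who wants to check the same limit outside the OpenSearch test harness, here is a minimal sketch that reads the open and maximum file descriptor counts from the JDK's management beans. It assumes a Unix-like JVM where the platform bean implements com.sun.management.UnixOperatingSystemMXBean; the class name FdLimitProbe is made up for illustration, and this is not how ProcessProbe itself is necessarily implemented. Note also that the stack trace in the issue shows the exception being raised by Lucene's HandleLimitFS mock filesystem, so the OS-level limit printed here may not be the limit the test actually hits.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper, not part of OpenSearch.
final class FdLimitProbe {

    /**
     * Returns {open, max} file descriptor counts for this JVM process,
     * or {-1, -1} when the platform bean does not expose them (e.g. on Windows).
     */
    static long[] fileDescriptorCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] { unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount() };
        }
        return new long[] { -1L, -1L };
    }

    public static void main(String[] args) {
        long[] counts = fileDescriptorCounts();
        System.out.println("open fds = " + counts[0] + ", max fds = " + counts[1]);
    }
}
```

Comparing the open count before and after indexing would show how close the process gets to the OS limit, independently of any limit imposed by the test framework.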

@Poojita-Raj
Contributor

Below are some of the important configurable parameters for this test:
numIndices = between 0 and 25
numShards = between 0 and 10 per index
numReplicas = between 0 and cluster().numDataNodes() - 1
numDataNodes = between 1 and 3

At the maximum values this gives 25 * 10 = 250 primary shards and, with 3 data nodes (each primary plus up to 2 replicas), 250 * 3 = 750 shard copies. This is the maximum number of shards for this test, which could result in "Too many open files".

I reran the test with numIndices fixed at its highest value and saw reproducible failures on all the linked failure seeds. I then tried lower values: with numIndices set to 18, all of those previously failing seeds pass.
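The worst-case arithmetic above can be written out as a short sketch. The class and method names are hypothetical, and the formula assumes every index uses the maximum shard count and the maximum replica count allowed by the number of data nodes, which is an upper bound rather than what any particular seed produces.

```java
// Hypothetical helper for illustration; not part of the OpenSearch test framework.
final class ShardCopyEstimate {

    /** Upper bound on shard copies: every primary shard plus its replicas. */
    static int maxShardCopies(int maxIndices, int maxShardsPerIndex, int maxDataNodes) {
        // numReplicas is at most numDataNodes - 1, so each primary has at most that many replicas.
        int maxReplicas = Math.max(0, maxDataNodes - 1);
        return maxIndices * maxShardsPerIndex * (1 + maxReplicas);
    }

    public static void main(String[] args) {
        System.out.println(maxShardCopies(25, 10, 3)); // prints 750
        System.out.println(maxShardCopies(18, 10, 3)); // prints 540
    }
}
```

Under the same assumptions, capping numIndices at 18 lowers the bound from 750 to 540 shard copies, which is in line with the observation that the failing seeds pass at that value.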

@peternied
Member

@Poojita-Raj It looks like this has resurfaced. Is it possible that this failure happened because the filesystem or its constraints on the Jenkins worker differ from the environment where you validated the maximum configuration?

Naive question: is there value in the maximum number of indices being 25 rather than 5? Lowering it seems like a quick fix.

java.lang.AssertionError: 
Expected: an empty iterable
     but: [<RemoteTransportException[[node_s2][127.0.0.1:37805][indices:data/write/bulk[s][p]]]; nested: FileSystemException[/var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.search.sort.FieldSortIT_BF0E230010DCABB-001/tempDir-002/node_s2/nodes/0/indices/a4BOv7HXQfqPTsawdgthhQ/3/index/_1.fdt: Too many open files];>,<RemoteTransportException[[node_s2][127.0.0.1:37805][indices:data/write/bulk[s]]]; nested: RemoteTransportException[[node_s2][127.0.0.1:37805][indices:data/write/bulk[s][p]]]; nested: FileSystemException[/var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.search.sort.FieldSortIT_BF0E230010DCABB-001/tempDir-002/node_s2/nodes/0/indices/a4BOv7HXQfqPTsawdgthhQ/3/index/_0_Lucene90FieldsIndex-doc_ids_0.tmp: Too many open files];>,<RemoteTransportException[[node_s1][127.0.0.1:46193][indices:data/write/bulk[s]]]; nested: RemoteTransportException[[node_s1][127.0.0.1:46193][indices:data/write/bulk[s][p]]]; nested: FileSystemException[/var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.search.sort.FieldSortIT_BF0E230010DCABB-001/tempDir-002/node_s1/nodes/0/indices/a4BOv7HXQfqPTsawdgthhQ/7/index/_0.fdm: Too many open files];>,<RemoteTransportException[[node_s2][127.0.0.1:37805][indices:data/write/bulk[s]]]; nested: RemoteTransportException[[node_s2][127.0.0.1:37805][indices:data/write/bulk[s][p]]]; nested: FileSystemException[/var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.search.sort.FieldSortIT_BF0E230010DCABB-001/tempDir-002/node_s2/nodes/0/indices/a4BOv7HXQfqPTsawdgthhQ/0/index/_0.fdt: Too many open files];>,<RemoteTransportException[[node_s0][127.0.0.1:36253][indices:data/write/bulk[s]]]; nested: RemoteTransportException[[node_s0][127.0.0.1:36253][indices:data/write/bulk[s][p]]]; nested: 
FileSystemException[/var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.search.sort.FieldSortIT_BF0E230010DCABB-001/tempDir-002/node_s0/nodes/0/indices/a4BOv7HXQfqPTsawdgthhQ/8/index/_1_Lucene90FieldsIndexfile_pointers_3.tmp: Too many open files];>]

@reta
Collaborator Author

reta commented Jun 19, 2024

Closing in favour of #14287

@reta reta closed this as completed Jun 19, 2024