Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closing immediately after dialing successfully throws "writable end state is "writing" not "ready" #72

Open
marcus-pousette opened this issue Jan 27, 2024 · 11 comments

Comments

@marcus-pousette
Copy link

marcus-pousette commented Jan 27, 2024

I migrated some test code previously using mplex to yamux and ran into issues for tests where I shutdown nodes immediately after dialing.

I get this error "writable end state is "writing" not "ready"

When this error is thrown read status is 'closing'

The "writeStatus" becomes 'writing' when the node is shutting down here here https://github.com/libp2p/js-libp2p/blob/c9ed1c7d62e9af974789eb753d6f8e3c6410df94/packages/utils/src/abstract-stream.ts#L155)

I am not expecting some uncaught errors to be thrown when I am shutting down a node after dialing and if there is a stream creation is in progress

Adding a delay after dialing before closing will stop the error to be thrown.

I am using libp2p (1.2.1) @libp2p/tcp (9.0.13), yamux (6.0.1)

Here is the callstack for when the error is throwing.

YamuxStream.sink (/Users/admin/git/peerbit/node_modules/@libp2p/utils/dist/src/abstract-stream.js:77)
byteStream (/Users/admin/git/peerbit/node_modules/it-byte-stream/dist/src/index.js:41)
lpStream (/Users/admin/git/peerbit/node_modules/it-length-prefixed-stream/dist/src/index.js:37)
Module.handle (/Users/admin/git/peerbit/node_modules/@libp2p/multistream-select/dist/src/handle.js:56)
<anonymous> (/Users/admin/git/peerbit/node_modules/libp2p/dist/src/upgrader.js:264)
process.processTicksAndRejections (internal/process/task_queues:95)
Promise.then (Unknown Source:0)
onIncomingStream (/Users/admin/git/peerbit/node_modules/libp2p/dist/src/upgrader.js:262)
incomingStream (/Users/admin/git/peerbit/node_modules/@chainsafe/libp2p-yamux/dist/src/muxer.js:476)
handleStreamMessage (/Users/admin/git/peerbit/node_modules/@chainsafe/libp2p-yamux/dist/src/muxer.js:410)
handleFrame (/Users/admin/git/peerbit/node_modules/@chainsafe/libp2p-yamux/dist/src/muxer.js:361)
YamuxMuxer.sink (/Users/admin/git/peerbit/node_modules/@chainsafe/libp2p-yamux/dist/src/muxer.js:94)
processTicksAndRejections (internal/process/task_queues:95)
await (Unknown Source:0)
processTicksAndRejections (internal/process/task_queues:95)
await (Unknown Source:0)
processTicksAndRejections (internal/process/task_queues:95)
await (Unknown Source:0)
_createConnection (/Users/admin/git/peerbit/node_modules/libp2p/dist/src/upgrader.js:381)
upgradeOutbound (/Users/admin/git/peerbit/node_modules/libp2p/dist/src/upgrader.js:234)
processTicksAndRejections (internal/process/task_queues:95)
await (Unknown Source:0)
processTicksAndRejections (internal/process/task_queues:95)
await (Unknown Source:0)
processTicksAndRejections (internal/process/task_queues:95)
await (Unknown Source:0)
processTicksAndRejections (internal/process/task_queues:95)
await (Unknown Source:0)
dial (/Users/admin/git/peerbit/node_modules/@libp2p/tcp/dist/src/index.js:106)
await (Unknown Source:0)
dial (/Users/admin/git/peerbit/node_modules/libp2p/dist/src/transport-manager.js:81)
queue.add.peerId.peerId (/Users/admin/git/peerbit/node_modules/libp2p/dist/src/connection-manager/dial-queue.js:153)
processTicksAndRejections (internal/process/task_queues:95)
await (Unknown Source:0)
run (/Users/admin/git/peerbit/node_modules/@libp2p/utils/dist/src/queue/job.js:56)
tryToStartAnother (/Users/admin/git/peerbit/node_modules/@libp2p/utils/dist/src/queue/index.js:79)
add (/Users/admin/git/peerbit/node_modules/@libp2p/utils/dist/src/queue/index.js:121)
dial (/Users/admin/git/peerbit/node_modules/libp2p/dist/src/connection-manager/dial-queue.js:123)
openConnection (/Users/admin/git/peerbit/node_modules/libp2p/dist/src/connection-manager/index.js:309)
dial (/Users/admin/git/peerbit/node_modules/libp2p/dist/src/libp2p.js:212)
connect (/Users/admin/git/peerbit/packages/transport/libp2p-test-utils/src/session.ts:36)
<anonymous> (/Users/admin/git/peerbit/packages/transport/stream/src/__tests__/stream.test.ts:2896)
processTicksAndRejections (internal/process/task_queues:95)
await (Unknown Source:0)
Promise.then.completed (node_modules/jest-circus/build/utils.js:298)
callAsyncCircusFn (node_modules/jest-circus/build/utils.js:231)
_callCircusTest (node_modules/jest-circus/build/run.js:316)
await (Unknown Source:0)
_runTest (node_modules/jest-circus/build/run.js:252)
await (Unknown Source:0)
_runTestsForDescribeBlock (node_modules/jest-circus/build/run.js:126)
await (Unknown Source:0)
_runTestsForDescribeBlock (node_modules/jest-circus/build/run.js:121)
await (Unknown Source:0)
run (node_modules/jest-circus/build/run.js:71)
await (Unknown Source:0)
runAndTransformResultsToJestFormat (node_modules/jest-circus/build/legacy-code-todo-rewrite/jestAdapterInit.js:122)
jestAdapter (/Users/admin/git/peerbit/node_modules/jest-circus/build/legacy-code-todo-rewrite/jestAdapter.js:79)
@achingbrain
Copy link
Collaborator

achingbrain commented Feb 8, 2024

I'm having problems replicating this - here's what I'm running. It starts two nodes, registers a handler on one, dials the protocol handler from the other then shuts both nodes down.

import { createLibp2p } from 'libp2p'
import { tcp } from '@libp2p/tcp'
import { noise } from '@chainsafe/libp2p-noise'
import { yamux } from '@chainsafe/libp2p-yamux'

const PROTOCOL = '/test/protocol'

async function createNode () {
  return createLibp2p({
    transports: [
      tcp()
    ],
    connectionEncryption: [
      noise()
    ],
    streamMuxers: [
      yamux()
    ],
    addresses: {
      listen: [
        '/ip4/0.0.0.0/tcp/0'
      ]
    },
    connectionManager: {
      minConnections: 0
    }
  })
}

console.info('start')
const [
  node1,
  node2
] = await Promise.all([
  createNode(),
  createNode()
])

await node2.handle(PROTOCOL, () => {})

console.info('dial')
await node1.dialProtocol(node2.getMultiaddrs(), PROTOCOL)

console.info('stop')
await node1.stop()
console.info('start')
await node1.start()
console.info('ok')

package.json:

{
  "type": "module",
  "dependencies": {
    "@chainsafe/libp2p-noise": "^15.0.0",
    "@chainsafe/libp2p-yamux": "^6.0.2",
    "@libp2p/tcp": "^9.0.15",
    "libp2p": "^1.2.3"
  }
}

@marcus-pousette
Copy link
Author

marcus-pousette commented Feb 8, 2024

I have not succesfully succeeded to reproduce with a minimal setup, starting from yours yet. But I have a failing test here, if I remove the delay.

I will try out and come back if I figure out what kind of configuration gives this error.

@achingbrain
Copy link
Collaborator

Ah, ok - my setup above was a bit wrong then, I've changed it to stop & start the dialing node as per the failing test. It still seems to work though.

I've also checked out the peerbit repo, the test passes for me without the delay:

 RUNS  packages/transport/stream/src/__tests__/stream.test.ts
 PASS  packages/transport/stream/src/__tests__/stream.test.ts (107.608 s)

but I do get an (unrelated?) test failure with what looks like an internal Jest error:

 FAIL  packages/clients/peerbit-server/node/src/__tests__/api.integration.test.ts
  ● Test suite failed to run

    DataCloneError: function transformRequest(data, headers) {
        const contentType = headers.getContentType() || '';
        const ha...<omitted>... } could not be cloned.

      at messageParent (node_modules/jest-worker/build/workers/messageParent.js:24:34)

@achingbrain
Copy link
Collaborator

Peerbit is a pretty neat project btw - you should PR the libp2p README to add the logo.

@marcus-pousette
Copy link
Author

marcus-pousette commented Feb 9, 2024

Ah, ok - my setup above was a bit wrong then, I've changed it to stop & start the dialing node as per the failing test. It still seems to work though.

I've also checked out the peerbit repo, the test passes for me without the delay:

 RUNS  packages/transport/stream/src/__tests__/stream.test.ts
 PASS  packages/transport/stream/src/__tests__/stream.test.ts (107.608 s)

but I do get an (unrelated?) test failure with what looks like an internal Jest error:

 FAIL  packages/clients/peerbit-server/node/src/__tests__/api.integration.test.ts
  ● Test suite failed to run

    DataCloneError: function transformRequest(data, headers) {
        const contentType = headers.getContentType() || '';
        const ha...<omitted>... } could not be cloned.

      at messageParent (node_modules/jest-worker/build/workers/messageParent.js:24:34)

That is interesting. Then it is something flaky going on depending on how fast the computer runs.

But check this one

import { createLibp2p } from 'libp2p'
import { tcp } from '@libp2p/tcp'
import { noise } from '@chainsafe/libp2p-noise'
import { yamux } from '@chainsafe/libp2p-yamux'

export const delay = (ms: number) => {
    return new Promise<void>((res, rej) => {
        setTimeout(() => {
            res();
        }, ms);
    });
};


it('test flaky', async () => {
    async function createNode() {
        const node = await createLibp2p<any>({
            transports: [
                tcp()
            ],
            connectionEncryption: [
                noise()
            ],
            streamMuxers: [
                yamux()
            ],
            addresses: {
                listen: [
                    '/ip4/127.0.0.1/tcp/0'
                ]
            },

        })

        return node;
    }

    console.info('start')
    const [
        node1,
        node2
    ] = await Promise.all([
        createNode(),
        createNode()
    ])

    console.info('dial')
    node2.dial(node1.getMultiaddrs()).catch((e) => { }) // dont wait here
    await delay(5) // but put a little bit of delay
    console.info('stop')
    await node1.stop();
    console.log("start again")
    await node1.start()
    console.log("stop all")
    await Promise.all([node1, node2].map(x => x.stop()))
    console.log("done")

})

When I run this on my computer with

delay(5)

the test never finishes (timeout at 60seconds)

delay(0) works

delay(100) works

It is maybe related to the issue

@marcus-pousette
Copy link
Author

Peerbit is a pretty neat project btw - you should PR the libp2p README to add the logo.

Thank you! Appreciate it 🙏 Will do!

@wemeetagain
Copy link
Member

is this still an issue?

@tabcat
Copy link
Contributor

tabcat commented Oct 10, 2024

Was able to replicate and created a separate repo at https://github.com/tabcat/delay5

@tabcat
Copy link
Contributor

tabcat commented Oct 10, 2024

This seems isolated to the tcp transport. I was not able to replicate with webSockets swapped for tcp on the ws branch. Before I tried swapping yamux for mplex which also failed the delay 5 test.

@tabcat
Copy link
Contributor

tabcat commented Oct 11, 2024

Was unable to replicate with the listed versions and the issue I found using the latest versions looks unrelated.

@tabcat
Copy link
Contributor

tabcat commented Oct 11, 2024

I ran peerbit start and stop tests without the delay on latest version of master and they passed 5/5 times. This does not seem to be an issue anymore. Would you mind confirming? @marcus-pousette

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants