Skip to content

Commit

Permalink
netlink refactoring and various fixes (#2112)
Browse files Browse the repository at this point in the history
* nbd netlink refactoring (#1963)

- fix context leak when nl_socket_modify_cb fails
- use clojure instead of a static method + custom context as a socket callback
- move TNetlinkDevice methods outside of the class definition to reduce nesting
- use nl_send_auto + nl_wait_for_ack instead of nl_send_sync to differentiate between send and recv errors
- remove "Device" suffixes from TNetlinkDevice methods

* nbd netlink refactoring part 2 (#1971)

- rename Timeout -> RequestTimeout
- rename DeadConnectionTimeout -> ConnectionTimeout
- move Send method from TNetlinkMessage to TNetlinkSocket
- clarify comments regarding ioctl device timeout
- get rid of ShouldStop atomic
- close client socket regardless of deleteDevice flag
- replace hard-coded 1 day ioctl device timeout with connection timeout
  which serves the same purpose
- make Start idempotent

* replace ExtactValue with GetValue (#1984)

* do not call Start on already started nbd endpoint (#1985)
  • Loading branch information
tpashkin authored Sep 23, 2024
1 parent e6d8b1a commit b7ecfb4
Show file tree
Hide file tree
Showing 8 changed files with 247 additions and 213 deletions.
2 changes: 1 addition & 1 deletion cloud/blockstore/config/server.proto
Original file line number Diff line number Diff line change
Expand Up @@ -183,7 +183,7 @@ message TServerConfig
// NBD request timeout in seconds, only makes sense if using netlink
optional uint32 NbdRequestTimeout = 112;

// NBD connection timeout in seconds, only makes sense if using netlink
// NBD connection timeout in seconds
optional uint32 NbdConnectionTimeout = 113;

// Endpoint Proxy unix socket path. Triggers proxy device factory usage
Expand Down
5 changes: 4 additions & 1 deletion cloud/blockstore/libs/daemon/common/bootstrap.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -552,10 +552,13 @@ void TBootstrapBase::Init()
EndpointProxyClient);
}

// The only case we want kernel to retry requests is when the socket is dead
// due to nbd server restart. And since we can't configure ioctl device to
// use a new socket, request timeout effectively becomes connection timeout
if (!nbdDeviceFactory) {
nbdDeviceFactory = NBD::CreateDeviceFactory(
Logging,
TDuration::Days(1)); // timeout
Configs->ServerConfig->GetNbdConnectionTimeout()); // timeout
}

EndpointManager = CreateEndpointManager(
Expand Down
15 changes: 7 additions & 8 deletions cloud/blockstore/libs/endpoint_proxy/server/server.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -575,8 +575,6 @@ struct TServer: IEndpointProxyServer
Scheduler,
ep.RequestStats,
volumeStats);

ep.Client->Start();
STORAGE_INFO(request.ShortDebugString().Quote()
<< " - Started DurableClient");

Expand Down Expand Up @@ -635,18 +633,19 @@ struct TServer: IEndpointProxyServer
TDuration::Days(1), // connection timeout
true); // reconfigure device if exists
} else {
// For netlink devices we have to balance request timeout between
// time it takes to fail request for good and resend it if socket
// is dead due to proxy restart. We can't configure ioctl device
// to use a fresh socket, so there is no point configuring it
// The only case we want kernel to retry requests is when the socket
// is dead due to nbd server restart. And since we can't configure
// ioctl device to use a new socket, request timeout effectively
// becomes connection timeout
ep.NbdDevice = NBD::CreateDevice(
Logging,
*ep.ListenAddress,
request.GetNbdDevice(),
TDuration::Days(1)); // request timeout
TDuration::Days(1)); // timeout
}

auto status = ep.NbdDevice->Start().ExtractValue();
auto future = ep.NbdDevice->Start();
const auto& status = future.GetValue();
if (HasError(status)) {
STORAGE_ERROR(request.ShortDebugString().Quote()
<< " - Unable to start nbd device: "
Expand Down
Loading

0 comments on commit b7ecfb4

Please sign in to comment.