[k8s] when open ssl in graph, can not start up #217

Closed
jinyingsunny opened this issue Jul 21, 2023 · 3 comments
Labels
affects/none PR/issue: this bug affects none version. process/fixed Process of bug severity/none Severity of bug type/bug Type: something is unexpected

Comments

@jinyingsunny

Key configuration:

spec:
  sslCerts:
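    serverSecret: "server-s"  # certificate supplied via a secret; the certificate has already been created
    caSecret: "ca-s"  # certificate supplied via a secret; the certificate has already been created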
  graphd:
    config:
      enable_graph_ssl: "true"
      enable_http2_routing: "false"
      path_batch_size: "1000"
      stream_timeout_ms: "30000"

graphd fails on startup with the following error:

root@k8s-master:/home/sunny.liu/k8s_file# kubectl -n nebula logs nebula-graphd-0
++ hostname
++ hostname
+ exec /usr/local/nebula/bin/nebula-graphd --flagfile=/usr/local/nebula/etc/nebula-graphd.conf --meta_server_addrs=nebula-metad-0.nebula-metad-headless.nebula.svc.cluster.local:9559 --local_ip=nebula-graphd-0.nebula-graphd-svc.nebula.svc.cluster.local --ws_ip=nebula-graphd-0.nebula-graphd-svc.nebula.svc.cluster.local --daemonize=false
I20230721 03:22:59.189301     1 GraphDaemon.cpp:135] Starting Graph HTTP Service
I20230721 03:22:59.191583    11 WebService.cpp:130] Web service started on HTTP[19669]
I20230721 03:22:59.191615     1 GraphDaemon.cpp:149] Number of networking IO threads: 4
I20230721 03:22:59.191618     1 GraphDaemon.cpp:158] Number of worker threads: 4
I20230721 03:22:59.191628     1 GraphDaemon.cpp:174] Starting black box worker.
I20230721 03:22:59.193837     1 MetaClient.cpp:89] Create meta client to "nebula-metad-0.nebula-metad-headless.nebula.svc.cluster.local":9559
I20230721 03:22:59.193862     1 MetaClient.cpp:90] root path: /usr/local/nebula, data path size: 0
I20230721 03:22:59.220309     1 MetaClient.cpp:4064] Load leader of "nebula-storaged-0.nebula-storaged-headless.nebula.svc.cluster.local":9779 in 1 space
I20230721 03:22:59.220324     1 MetaClient.cpp:4064] Load leader of "nebula-storaged-1.nebula-storaged-headless.nebula.svc.cluster.local":9779 in 1 space
I20230721 03:22:59.220335     1 MetaClient.cpp:4064] Load leader of "nebula-storaged-2.nebula-storaged-headless.nebula.svc.cluster.local":9779 in 1 space
I20230721 03:22:59.220338     1 MetaClient.cpp:4070] Load leader ok
I20230721 03:22:59.221055     1 MetaClient.cpp:180] Register time task for heartbeat!
I20230721 03:22:59.235059     1 GraphSessionManager.cpp:337] Total of 1 sessions are loaded
I20230721 03:22:59.235294     1 MemoryUtils.cpp:171] MemoryTracker set static ratio: 0.8
I20230721 03:22:59.235589     1 Snowflake.cpp:17] WorkerId init success: 1
terminate called after throwing an instance of 'std::runtime_error'
  what():  `certs' is not a regular file or a directory.
*** Aborted at 1689909779 (Unix time, try 'date -d @1689909779') ***
*** Signal 11 (SIGSEGV) (0x0) received by PID 1 (pthread TID 0x7f4dc756b700) (linux TID 43) (code: 128), stack trace: ***
/usr/local/nebula/bin/nebula-graphd(_ZN5folly10symbolizer17getStackTraceSafeEPmm+0x31)[0x260f3c1]
/usr/local/nebula/bin/nebula-graphd(_ZN5folly10symbolizer21SafeStackTracePrinter15printStackTraceEb+0x26)[0x25fb506]
/usr/local/nebula/bin/nebula-graphd[0x25f9497]
/lib64/libpthread.so.0(+0xf62f)[0x7f4dd913062f]
/lib64/libc.so.6(abort+0x297)[0x7f4dd8d8abc7]
/usr/local/nebula/bin/nebula-graphd[0x1226302]
/usr/local/nebula/bin/nebula-graphd(_ZN10__cxxabiv111__terminateEPFvvE+0x5)[0x2b47305]
/usr/local/nebula/bin/nebula-graphd(_ZSt9terminatev+0x10)[0x2b47370]
/usr/local/nebula/bin/nebula-graphd(__cxa_throw+0x43)[0x2b474c3]
/usr/local/nebula/bin/nebula-graphd[0x11c52e0]
/usr/local/nebula/bin/nebula-graphd[0x12ae0f9]
/usr/local/nebula/bin/nebula-graphd[0x2bc4bbf]
/lib64/libpthread.so.0(+0x7ea4)[0x7f4dd9128ea4]
/lib64/libc.so.6(clone+0x6c)[0x7f4dd8e5196c]
(safe mode, symbolizer not available)

The full configuration is as follows:

root@k8s-master:/home/sunny.liu/k8s_file# kubectl -n nebula get nc nebula -o yaml
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
  creationTimestamp: "2023-07-21T02:59:56Z"
  generation: 3
  name: nebula
  namespace: nebula
  resourceVersion: "1095336"
  uid: bd9bb824-a629-43b9-995d-bed3a824cb5e
spec:
  enablePVReclaim: true
  exporter:
    image: vesoft/nebula-stats-exporter
    maxRequests: 20
    replicas: 1
    version: latest
  graphd:
    config:
      enable_graph_ssl: "true"
      logtostderr: "1"
      redirect_stdout: "false"
      stderrthreshold: "0"
    image: reg.vesoft-inc.com/rc/nebula-graphd-ent
    replicas: 1
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 200m
        memory: 500Mi
    version: v3.5.0-sc
  imagePullPolicy: Always
  imagePullSecrets:
  - name: image-nebula-ent-sc-secret
  metad:
    dataVolumeClaim:
      resources:
        requests:
          storage: 2Gi
      storageClassName: local-path
    image: reg.vesoft-inc.com/rc/nebula-metad-ent
    licenseManagerURL: 10.107.98.210:9119
    logVolumeClaim:
      resources:
        requests:
          storage: 1Gi
      storageClassName: local-path
    replicas: 1
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 300m
        memory: 500Mi
    version: v3.5.0-sc
  nodeSelector:
    nebula: cloud
  reference:
    name: statefulsets.apps
    version: v1
  schedulerName: default-scheduler
  sslCerts:
    caCert: tls.crt
    caSecret: ca-s
    clientCACert: ca.crt
    clientCert: tls.crt
    clientKey: tls.key
    serverCert: tls.crt
    serverKey: tls.key
    serverSecret: server-s
  storaged:
    dataVolumeClaims:
    - resources:
        requests:
          storage: 2Gi
      storageClassName: local-path
    enableAutoBalance: true
    image: reg.vesoft-inc.com/rc/nebula-storaged-ent
    logVolumeClaim:
      resources:
        requests:
          storage: 1Gi
      storageClassName: local-path
    replicas: 3
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 300m
        memory: 500Mi
    version: v3.5.0-sc
  topologySpreadConstraints:
  - topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
status:
  conditions:
  - lastTransitionTime: "2023-07-21T03:14:00Z"
    lastUpdateTime: "2023-07-21T03:14:00Z"
    message: Workload is in progress
    reason: WorkloadNotUpToDate
    status: "False"
    type: Ready
  graphd:
    phase: Update
    version: v3.5.0-sc
    workload:
      collisionCount: 0
      currentRevision: nebula-graphd-d4d658dd
      observedGeneration: 3
      replicas: 1
      updateRevision: nebula-graphd-574b665b99
      updatedReplicas: 1
  metad:
    phase: Running
    version: v3.5.0-sc
    workload:
      collisionCount: 0
      currentRevision: nebula-metad-766f4bd96f
      observedGeneration: 1
      readyReplicas: 1
      replicas: 1
      updateRevision: nebula-metad-766f4bd96f
      updatedReplicas: 1
  storaged:
    hostsAdded: true
    phase: Running
    version: v3.5.0-sc
    workload:
      collisionCount: 0
      currentRevision: nebula-storaged-6f66bbc87f
      observedGeneration: 1
      readyReplicas: 3
      replicas: 3
      updateRevision: nebula-storaged-6f66bbc87f
      updatedReplicas: 3
  version: 3.5.0-sc-ent

Describe the bug (required)

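graphd fails as soon as SSL is enabled, and it keeps reporting errors no matter how the configuration is changed.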
Your Environments (required)

nebula-ent-sc-rc package
The image is:
image: reg.vesoft-inc.com/rc/nebula-storaged-ent
version: v3.5.0-sc

@jinyingsunny jinyingsunny added the type/bug Type: something is unexpected label Jul 21, 2023
@github-actions github-actions bot added affects/none PR/issue: this bug affects none version. severity/none Severity of bug labels Jul 21, 2023
@jinyingsunny
Author

Additional logs provided by Qiao Lei:

I20230721 04:41:49.974695   130 AsyncSSLSocket.cpp:1272] SSL_accept returned: -1
I20230721 04:41:49.975649   130 AsyncSSLSocket.cpp:1208] AsyncSSLSocket::handleAccept() this=0x7fafcc313700, fd=folly::NetworkSocket(274), state=2, sslState=2, events=2
I20230721 04:41:49.975822   130 AsyncSSLSocket.cpp:1864] AsyncSSLSocket::sslVerifyCallback() this=0x7fafcc313700, fd=folly::NetworkSocket(274), preverifyOk=1
I20230721 04:41:49.975854   130 AsyncSSLSocket.cpp:1864] AsyncSSLSocket::sslVerifyCallback() this=0x7fafcc313700, fd=folly::NetworkSocket(274), preverifyOk=1
I20230721 04:41:49.976163   130 AsyncSSLSocket.cpp:1292] AsyncSSLSocket 0x7fafcc313700: fd folly::NetworkSocket(274) successfully accepted; state=2, sslState=5, events=0
I20230721 04:41:49.976176   130 FizzAcceptorHandshakeHelper.cpp:171] Client did not select a next protocol
I20230721 04:41:49.976621   122 HBProcessor.cpp:36] Receive heartbeat from "nebula-sc-storaged-0.nebula-sc-storaged-headless.default.svc.cluster.local":9779, role = STORAGE
I20230721 04:41:49.976745   122 HBProcessor.cpp:53] Machine "nebula-sc-storaged-0.nebula-sc-storaged-headless.default.svc.cluster.local":9779 is not registered
I20230721 04:41:50.982069   122 HBProcessor.cpp:36] Receive heartbeat from "nebula-sc-storaged-0.nebula-sc-storaged-headless.default.svc.cluster.local":9779, role = STORAGE
I20230721 04:41:50.982165   122 HBProcessor.cpp:53] Machine "nebula-sc-storaged-0.nebula-sc-storaged-headless.default.svc.cluster.local":9779 is not registered
I20230721 04:41:51.615051    52 RaftPart.cpp:2203] [Port: 9560, Space: 0, Part: 0] Send heartbeat
I20230721 04:41:51.986585   122 HBProcessor.cpp:36] Receive heartbeat from "nebula-sc-storaged-0.nebula-sc-storaged-headless.default.svc.cluster.local":9779, role = STORAGE
I20230721 04:41:51.986677   122 HBProcessor.cpp:53] Machine "nebula-sc-storaged-0.nebula-sc-storaged-headless.default.svc.cluster.local":9779 is not registered
I20230721 04:41:52.707355   131 AsyncSSLSocket.cpp:353] actual destruction of AsyncSSLSocket(this=0x7fafcb9d8000, evb=0x7fafcb95f000, fd=folly::NetworkSocket(-1), state=3, sslState=9, events=0)
I20230721 04:41:52.708853   131 FizzAcceptorHandshakeHelper.cpp:125] Fizz handshake error with (peer=[::ffff:10.244.0.0]:36504, local=[::ffff:10.244.1.47]:9559) after 1 ms; 1119 bytes received & 1049 bytes sent: fizz::FizzVerificationException: client certificate failure: certificate verification failed: invalid CA certificate
I20230721 04:41:52.708899   131 Acceptor.cpp:476] Acceptor=0x7fafb4a9c928 onEmpty()
I20230721 04:41:52.989961   122 HBProcessor.cpp:36] Receive heartbeat from "nebula-sc-storaged-0.nebula-sc-storaged-headless.default.svc.cluster.local":9779, role = STORAGE
I20230721 04:41:52.990046   122 HBProcessor.cpp:53] Machine "nebula-sc-storaged-0.nebula-sc-storaged-headless.default.svc.cluster.local":9779 is not registered
I20230721 04:41:53.352020    53 RaftPart.cpp:2203] [Port: 9560, Space: 0, Part: 0] Send heartbeat
I20230721 04:41:55.160094    54 RaftPart.cpp:2203] [Port: 9560, Space: 0, Part: 0] Send heartbeat
I20230721 04:41:57.247464    51 RaftPart.cpp:2203] [Port: 9560, Space: 0, Part: 0] Send heartbeat
I20230721 04:41:57.628710    55 DiskManager.cpp:163] Refresh filesystem info of "/usr/local/nebula/data/meta"
I20230721 04:41:58.841428   132 AsyncSSLSocket.cpp:353] actual destruction of AsyncSSLSocket(this=0x7fafca7d1e00, evb=0x7fafca754000, fd=folly::NetworkSocket(-1), state=3, sslState=9, events=0)
I20230721 04:41:58.843006   132 FizzAcceptorHandshakeHelper.cpp:125] Fizz handshake error with (peer=[::ffff:10.244.0.0]:43626, local=[::ffff:10.244.1.47]:9559) after 1 ms; 1119 bytes received & 1049 bytes sent: fizz::FizzVerificationException: client certificate failure: certificate verification failed: invalid CA certificate

@MegaByte875
Contributor

After investigation: nebula-go needs to set tlsConfig.MaxVersion = tls.VersionTLS12. After rebuilding with that change, the nebula cluster starts up successfully.

@MegaByte875
Contributor

MegaByte875 commented Jul 28, 2023

#222

@github-actions github-actions bot added the process/fixed Process of bug label Jul 29, 2023