This repository has been archived by the owner on Dec 3, 2021. It is now read-only.

gNMIc Lesson #339

Open
Mierdin opened this issue Jul 17, 2020 · 23 comments

Comments

@Mierdin
Member

Mierdin commented Jul 17, 2020

Very cool new project called gNMIc, which offers a CLI for gNMI.

https://gnmic.kmrd.dev/

An NRE Labs lesson on this seems very feasible, and IMO @hellt should have right of first refusal for this.

@hellt any ideas for a simple topology that would be effective in helping to illustrate the capabilities and help folks get up to speed on the tool? Also, ideas on general topic areas that might go into an effective lesson outline?

@hellt
Contributor

hellt commented Jul 17, 2020

Thanks @Mierdin
That's a nice venue to try out gnmic, indeed.
At a bare minimum, a single network element would do. To explore the multi-target capabilities we should have two nodes, and for a multi-vendor setting we would want to introduce multiple vendors.

As to the topology, the nodes can be completely isolated on the dataplane, as the networking aspects are not relevant for the gNMI protocol operations. The only common connectivity which is needed is the management network.
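To make the multi-target idea concrete, here is a minimal Python sketch that only assembles a gnmic command line for several targets reachable over the management network. The host names `junos1` and `junos2` and the port are placeholders, and the use of a comma-separated `--address` list together with `--insecure` reflects my reading of the gnmic documentation, so treat it as an assumption to verify against the version you install.

```python
import shlex


def gnmic_command(targets, port, rpc="capabilities", insecure=True):
    """Build a gnmic argv list addressing one or more targets.

    targets: management-network host names or IPs (placeholders here).
    port:    gNMI port exposed by every target (an assumption).
    """
    # Assumption: gnmic accepts several targets as a comma-separated list.
    addresses = ",".join(f"{host}:{port}" for host in targets)
    argv = ["gnmic", "--address", addresses]
    if insecure:
        # Plain-text gRPC, no TLS; only sensible inside an isolated lab.
        argv.append("--insecure")
    argv.append(rpc)
    return argv


# Two hypothetical nodes; nothing is executed, the command is only printed.
print(shlex.join(gnmic_command(["junos1", "junos2"], 50051)))
```

Because the nodes only need to be reachable on the management network, nothing in this sketch depends on how (or whether) they are connected on the dataplane.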

@Mierdin
Member Author

Mierdin commented Jul 17, 2020

Agreed. We have both Junos and Cumulus currently, but I don't believe Cumulus supports gNMI out of the box unless we load up some kind of server there ourselves.

We are also of course always on the lookout for new images and I'd be happy to help with that if that's something you're interested in contributing.

@hellt
Contributor

hellt commented Jul 20, 2020

Then it's fine to just have two vMXes connected to each other with a single interface to allow some pings to flow between them.

How do I contribute this lesson? I am quite new to nre.labs, so I will gladly take any reference points.

@Mierdin
Member Author

Mierdin commented Jul 21, 2020

Currently the only Junos flavor is vQFX, but I have been working on cRPD support and I'm hoping to have that available within the next few weeks. For this, I think cRPD would be a much better bet for what we're trying to do here.

Since you're new, I'd definitely start here. Some of that might be a bit boring, since you clearly know how to "github" but there's also some stuff specific to NRE Labs you might find useful.

I think a good first step is to build an endpoint image that has gnmic installed. If you want to take a crack at this for your first PR, feel free. Since it's written in Go, the best bet is likely to do a multi-stage build of some kind so we can compile from source first, and then bring the binaries over to a simpler image. In case you haven't done this before, I do this to build antidote itself (which powers NRE Labs) if you are interested in a working example.

From there, we just need to run sshd so users can connect to a working terminal. You can take a look at our utility image for inspiration, but I would recommend borrowing only the auth and ssh config stuff, since we probably don't want/need all of the Python stuff from that image.

Let's see if we can tackle that first, and then hopefully once that's done, I'll be done with the work I described here and we can figure out content.

@hellt
Contributor

hellt commented Jul 24, 2020

Thanks, I've read through most of the getting started guides, and I wonder if it's really necessary to create an endpoint image for gnmic.

What if I leverage a general-purpose utility image and also teach learners how to use the gnmic installer to download the latest and/or a specific version of it? I think that is useful as well, since that step is needed should they decide to install gnmic outside of the nre.labs environment.

@Mierdin
Member Author

Mierdin commented Jul 25, 2020

As a policy, the platform doesn't allow connections outside of the lesson environment, which is why the documentation is oriented around everything being self-contained.

That said, if you would prefer the short route to doing this, you should be able to construct a simple Dockerfile that uses antidotelabs/utility as the base, and take whatever steps are needed to install gnmic, and you should be good to go.

@hellt
Contributor

hellt commented Jul 25, 2020

ok, that fact escaped me.
Please find the endpoint image PR here #342

@hellt
Contributor

hellt commented Jul 26, 2020

@Mierdin I saw a message that you will have a proper vacation soon. If there are any steps that I can take preemptively to create the gnmic lesson before you go, you can count on me.

@Mierdin
Member Author

Mierdin commented Jul 27, 2020

Thanks for mentioning that - we won't be doing a new full release before I go, so don't worry too much about trying to cram this in the next few weeks. I've been working on new infra to be able to support the new images we'll need for lessons like this, but it's too much to try to get done right before I go away for a while, so I'd rather play it safe and get as close to the finish line as I can before I go, but wait until I get back to actually cross it 😄

That said, I'd like to make sure folks like you are able to move forward in my absence. The preview service is currently running on the "old" cluster that's currently powering the main nrelabs.io site. In order to let you use it to preview your content in a PR, I'd need to get it running on the new cluster. I'm also spending today hunting down some pointers on gNMI with cRPD (it's a pretty new feature) to ensure it's a good target for this lesson. If we run into issues, I don't think adding a vMX image would be a problem, just would be a little extra work, so I'd like to try cRPD first and see if we can get away with that.

Regardless of all that, you are welcome to, at any time, open a PR for the new lesson content. You can use the antidote CLI tool to generate a skeleton lesson, or if you wish, you can use this lesson meta file I put together for some basic testing:

---
name: Telemetry At Your Fingertips with gNMIc
slug: gnmic-telemetry
category: tools
diagram: ""
video: ""
tier: prod
description: In this lesson, we'll explore the use of a tool called gNMIc to make sense of gNMI-based operations at the command-line.
shortDescription: gnmic
tags:
- telemetry

endpoints:

- name: junos1
  image: crpd
  additionalPorts: [51051]
  presentations:
  - name: cli
    port: 22
    type: ssh

- name: gnmic
  image: gnmic
  presentations:
  - name: cli
    port: 22
    type: ssh

stages:
- description: First Steps
  guideType: markdown
  stageVideo: ""

authors:
- name: Roman Dodin
  link: https://github.com/hellt

If you go that route, you'll want to ensure you still run the antidote validate <curriculum directory> command to make sure everything's valid. You're welcome to start this PR any time, but no guarantees on if the preview service will be meaningful to you until I swing it over and ensure it works on the new cluster. Provided I am able to get to it (hoping so), I will give you a heads up once that work is done. Until then you can keep pushing content to your PR the way you think it should work, and we can address any problems once previews are functioning on the new cluster.
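As a rough illustration of the kind of consistency check worth doing on a file like the one above, here is a minimal Python sketch. It is not a substitute for `antidote validate`, and the rules it applies (required fields, an ssh presentation per endpoint, the gNMI port listed in `additionalPorts`) are assumptions made for this example, not the platform's actual validation logic. The metadata is written as a Python dict mirroring the YAML so the sketch needs no third-party parser.

```python
def check_lesson(meta, gnmi_endpoint, gnmi_port):
    """Return a list of human-readable problems found in a lesson dict."""
    problems = []
    # Hypothetical minimum set of fields for this example.
    for field in ("name", "slug", "category", "endpoints", "stages"):
        if not meta.get(field):
            problems.append(f"missing field: {field}")

    endpoints = {ep["name"]: ep for ep in meta.get("endpoints", [])}
    for name, ep in endpoints.items():
        presentations = ep.get("presentations", [])
        if not any(p.get("type") == "ssh" for p in presentations):
            problems.append(f"{name}: no ssh presentation")

    # The gNMI target must expose the port the client will dial.
    target = endpoints.get(gnmi_endpoint)
    if target is None:
        problems.append(f"no endpoint named {gnmi_endpoint}")
    elif gnmi_port not in target.get("additionalPorts", []):
        problems.append(
            f"{gnmi_endpoint}: port {gnmi_port} not in additionalPorts"
        )
    return problems


# Mirrors the lesson meta file above (only the fields the check reads).
lesson = {
    "name": "Telemetry At Your Fingertips with gNMIc",
    "slug": "gnmic-telemetry",
    "category": "tools",
    "endpoints": [
        {"name": "junos1", "image": "crpd", "additionalPorts": [51051],
         "presentations": [{"name": "cli", "port": 22, "type": "ssh"}]},
        {"name": "gnmic", "image": "gnmic",
         "presentations": [{"name": "cli", "port": 22, "type": "ssh"}]},
    ],
    "stages": [{"description": "First Steps", "guideType": "markdown"}],
}

print(check_lesson(lesson, "junos1", 51051))
```

The useful part is the last rule: if the port configured on the device and the port listed in `additionalPorts` ever drift apart, the check reports it before anyone tries to connect.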

@Mierdin
Member Author

Mierdin commented Jul 30, 2020

FYI, as I posted in https://discuss.nrelabs.io/t/new-kata-cluster-is-live-seeking-feedback/287/3, the preview service is now running on the new cluster, and I validated this with a quick temporary test using #346.

That said, I've only just started tinkering around with gNMI on the cRPD image (as mentioned it's really new) so not sure how much work is left to do there. I've confirmed the image version supports it, so I'm fairly confident it's a configuration issue (which can be provided as part of the lesson using the regular stage configuration methods). If I am able to get more info I'll post here.

@hellt
Contributor

hellt commented Jul 31, 2020

Thanks!
I guess if I create a PR with the lesson skeleton you pasted above I would be able to get it running on a testing cluster with some connectivity between the gnmic and crpd?

@Mierdin
Member Author

Mierdin commented Aug 2, 2020

Yes, though you should use #346 as a reference instead. There are a few other things that need to be done beyond the lesson metadata file, but that is the bulk of it.

@hellt
Contributor

hellt commented Aug 2, 2020 via email

@Mierdin
Member Author

Mierdin commented Mar 23, 2021

@hellt Just wanted to drop a quick update. I've been working on enabling builds for endpoint images within the CI pipeline, and believe I am ready for someone else to test it. This makes it so that you don't have to contribute an image first, and then the content separately, which is a silly constraint I've wanted to solve for a while, and finally got around to it. You should just need to open a PR with both the image and lesson changes needed, and the preview system will take it from there.

If you still have the time/interest, I think a gNMIc lesson is a great candidate for this. I also hunted down the configuration needed for cRPD to support gNMI. You'll want to modify the additionalPorts field to use port 50051, and then cRPD will need the following stanzas added:

set system services extension-service request-response grpc clear-text port 50051
set system services extension-service request-response grpc skip-authentication
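Since the same port has to appear both in these stanzas and in the lesson's `additionalPorts` field, one option is to render the stanzas from a single value. A minimal Python sketch, where the function name `grpc_stanzas` is made up for this example and the command text is taken verbatim from the two lines above:

```python
def grpc_stanzas(port, skip_authentication=True):
    """Render the cRPD 'set' commands that enable clear-text gRPC."""
    base = "set system services extension-service request-response grpc"
    lines = [f"{base} clear-text port {port}"]
    if skip_authentication:
        # Disables authentication; acceptable only in an isolated lab.
        lines.append(f"{base} skip-authentication")
    return lines


GNMI_PORT = 50051  # keep in sync with additionalPorts in the lesson file

print("\n".join(grpc_stanzas(GNMI_PORT)))
```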

The ability to auto-build these images is really recently added, so there will probably be wrinkles to iron out but I'm happy to help you through it if you're willing to be the guinea pig :) I haven't even announced it formally or documented it properly yet, but wanted to see if you'd be willing to put it through its paces first.

@hellt
Contributor

hellt commented Mar 23, 2021

Hey @Mierdin
Yes, I think it will be possible to make a nice lesson out of it.
I would like to pause until the first weeks of April, since by then I might have another containerized open NOS to introduce to that lesson.

I think that pluralism in NOS selection will make it even more educational.

@Mierdin
Member Author

Mierdin commented Mar 24, 2021

@hellt No worries, sounds great! Totally on board with adding a new containerized NOS. Let me know if there are any base images/disks that need to be kept private, like we've done with cRPD; we'll add those to the private GCP storage bucket that our build pipeline has access to.

@hellt
Contributor

hellt commented Aug 15, 2021

Hi @Mierdin
It's been taking us longer than I'd expected, but it's finally all coming together.

In continuation of our multivendor gNMI lesson, where do I start to make the SR Linux containerized NOS a citizen of nrelabs?

@Mierdin
Member Author

Mierdin commented Aug 16, 2021

Woot! This makes me happy. And more good news is that since SR Linux is openly available, this makes the process that much easier. The contribution process is much the same as I mentioned further up. You'll want to start here: https://docs.nrelabs.io/creating-contributing/getting-started but in summary, here are the steps:

  1. Clone this repo and use the antidote tool to bootstrap a new lesson - this is just a skeleton so you'll need to add configs/content/etc but it's a good starting point so you can start seeing your previews in the PR you'll open. Feel free to stay minimal for now - this can always get re-done, and I think it would be more useful to make sure the SR Linux image works well in NRE Labs first before spending a lot of time on lesson content, etc. So, a simple lesson which has a single SSH presentation to an SR Linux endpoint with a single stage, and a mostly blank lesson guide should be fine.
  2. Add an image to the images/ directory for the new SR Linux image. This will involve creating a new Dockerfile with some sensible configurations.
  3. Commit your changes in a branch and open a PR. This will kick off some GH actions workflows that build your new image and start a temporary instance of NRE Labs you can use to preview what you have thus far.

Once you're able to do this, I should be able to guide you further. And if you have any questions at all, don't hesitate to ask.

@hellt
Contributor

hellt commented Aug 17, 2021 via email

@Mierdin
Member Author

Mierdin commented Aug 17, 2021

We use multus, which by default uses a netX naming scheme. Looking at latest multus docs, it appears this became configurable at some point, which is good news but a) not sure if we're running a version that lets us do this and b) there would have to be platform modifications to expose this option and also to facilitate a multus upgrade if needed - they tend to break things between even minor versions. There is a networkInterfaces field in the image metadata file that I have intended to use for this purpose (currently unused) so I'm generally on board with the change if this is needed; would just take some time.

On the other hand, the image flavor untrusted runs a container in a Kata VM, which should give you free rein to rename interfaces as needed, so if it is possible for you to inject a script before the entrypoint (which other endpoints already do anyways, including crpd) that might be at least a quicker way to go.

Let me know what you think - my suggestion is for you to look into figuring out how hard it would be to make the image compatible with the existing paradigm of eth0, net0, net1, net2, etc, while I look into the scope of changes needed to make this more flexible.
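To illustrate the "rename before the entrypoint" route, here is a minimal Python sketch that lists the names the platform hands out today and generates the `ip link` commands a pre-entrypoint script could run. The target names `e1-1` and `e1-2` are hypothetical placeholders, not names confirmed anywhere in this thread, and the sketch only prints commands rather than executing them.

```python
def platform_interfaces(count):
    """Names assigned today: eth0 first, then net0, net1, ..."""
    if count < 1:
        raise ValueError("an endpoint has at least the eth0 interface")
    return ["eth0"] + [f"net{i}" for i in range(count - 1)]


def rename_commands(wanted):
    """Map net0, net1, ... onto the names the NOS expects, in order.

    wanted: desired names for the non-management interfaces
            (hypothetical values in the example below).
    """
    current = platform_interfaces(len(wanted) + 1)[1:]  # skip eth0
    commands = []
    for old, new in zip(current, wanted):
        # A link must be down while it is renamed.
        commands.append(f"ip link set dev {old} down")
        commands.append(f"ip link set dev {old} name {new}")
        commands.append(f"ip link set dev {new} up")
    return commands


print("\n".join(rename_commands(["e1-1", "e1-2"])))
```

Whether this is workable depends on the container having the privileges to rename links, which is the point of running the image with the untrusted flavor.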

@Mierdin
Member Author

Mierdin commented Aug 18, 2021

@hellt Good news is that we're running a version of Multus that allows me to specify the interface name - just did a quick pod test on our cluster and it works great. Working on a patch to antidote-core now to finally make use of the networkInterfaces field in the image definition to expose this.

Quick question - is it still okay that eth0 is the first interface?

@hellt
Contributor

hellt commented Aug 18, 2021 via email

@Mierdin
Member Author

Mierdin commented Sep 16, 2021

Okay, the antidote-core PR is merged and I loaded that code into the preview system so it should be ready to use there. Let me know if you run into any issues.
