Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: generic api to run promql query for metrics #2073

Merged
merged 20 commits into from
Oct 14, 2024

Conversation

adarsh0728
Copy link

@adarsh0728 adarsh0728 commented Sep 19, 2024

The PR aims to provide a generic API to run PromQL to fetch metrics for the UI.

The overall goal is to improve debuggability experience for platform and application engineers.

@adarsh0728 adarsh0728 linked an issue Sep 19, 2024 that may be closed by this pull request
@adarsh0728 adarsh0728 changed the title feat #2058: generic api to run promql query for metrics feat: generic api to run promql query for metrics Sep 19, 2024
Comment on lines 1207 to 1209

func buildTimeSeriesData(ctx context.Context, v1api v1.API, data MetricSpecData, query string, format PrometheusResponseFormat) []any {
var timeSeriesData = make([]any, 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add code comments

ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()

var timeSeriesData = make([][]float64, 0)
Copy link
Contributor

@veds-g veds-g Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please fix lint errors

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 172 to 173
// Get the time series data for a specific metric.
r.POST("/namespaces/:namespace/pipelines/:pipeline/getMetricData", handler.GetMetricData)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we planning to cover monovertex with the same endpoint?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as of now, planning to have a separate endpoint for monovertex (for same endpoint we may have to add another field in the body and that could make API more cluttered). Any suggestions are welcome

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets update the comment to include for pipeline for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we provide a query generic query service which works for any metrics?

Signed-off-by: adarsh0728 <[email protected]>
Copy link

codecov bot commented Sep 25, 2024

Codecov Report

Attention: Patch coverage is 0.68027% with 146 lines in your changes missing coverage. Please review.

Project coverage is 63.39%. Comparing base (3dbed43) to head (6e91261).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
server/apis/v1/promql_builder.go 0.00% 64 Missing ⚠️
server/apis/v1/handler.go 0.00% 47 Missing ⚠️
server/apis/v1/response_metrics.go 0.00% 34 Missing ⚠️
server/routes/routes.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2073      +/-   ##
==========================================
- Coverage   64.25%   63.39%   -0.87%     
==========================================
  Files         324      327       +3     
  Lines       30650    31354     +704     
==========================================
+ Hits        19695    19877     +182     
- Misses       9913    10435     +522     
  Partials     1042     1042              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@adarsh0728 adarsh0728 marked this pull request as ready for review September 25, 2024 13:46
Comment on lines 10 to 12

type MetricMetaData struct {
NumaMetricName string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type MetricMetaData struct {
NumaMetricName string
type MetricMetadata struct {
NumaMetricName string

Comment on lines 1277 to 1279
client, err := api.NewClient(api.Config{
Address: h.prometheusServerUrl,
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will end up creating a new client for every api invocation. Can we store it inside the handler struct and reuse it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, will resolve this.

Comment on lines 1285 to 1287
v1api := v1.NewAPI(client)
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Lets not create a new one for every invocation.

Copy link
Contributor

@yhl25 yhl25 Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also make sure they are safe to be used across go routines.

@veds-g
Copy link
Contributor

veds-g commented Sep 25, 2024

Lets add a flag in config map for server.disable.prometheusChart, similar to server.readonly. Pass it via /sysinfo, so that it can be used in UI to decide when to render charts.

@@ -80,5 +82,7 @@ func NewServerCommand() *cobra.Command {
command.Flags().StringVar(&serverAddr, "server-addr", sharedutil.LookupEnvStringOr("NUMAFLOW_SERVER_ADDRESS", "https://localhost:8443"), "The external address of the Numaflow server.")
command.Flags().StringVar(&corsAllowedOrigins, "cors-allowed-origins", sharedutil.LookupEnvStringOr("NUMAFLOW_SERVER_CORS_ALLOWED_ORIGINS", ""), "The values for allowed cors AllowOrigins header field, separated by comma.")
command.Flags().StringVar(&daemonClientProtocol, "daemon-client-protocol", sharedutil.LookupEnvStringOr("NUMAFLOW_SERVER_DAEMON_CLIENT_PROTOCOL", "grpc"), "The protocol used to connect to the Pipeline daemon service from Numaflow UX server, defaults to 'grpc'.")
command.Flags().StringVar(&prometheusServerUrl, "prometheus-server-url", sharedutil.LookupEnvStringOr("NUMAFLOW_SERVER_PROMETHEUS_URL", ""), "The prometheus server url for querying metrics data.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using args, please mount the configmap as a volume.

@@ -129,6 +129,12 @@ metadata:
name: numaflow-cmd-params-config
---
apiVersion: v1
data: null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null?

Comment on lines 172 to 173
// Get the time series data for a specific metric.
r.POST("/namespaces/:namespace/pipelines/:pipeline/getMetricData", handler.GetMetricData)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we provide a query generic query service which works for any metrics?

@adarsh0728 adarsh0728 requested a review from yhl25 October 1, 2024 11:45

type PrometheusClient struct {
// prometheus metric config from yaml
ConfigData []map[string]any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abstract to structs.

// prometheus server url in the config
ServerUrl string `yaml:"url"`
// patterns in the config
Patterns []map[string]any `yaml:"patterns"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abstract to structs.

@@ -165,6 +165,8 @@ func v1Routes(ctx context.Context, r gin.IRouter, dexObj *v1.DexObject, localUse
r.GET("/namespaces/:namespace/mono-vertices/:mono-vertex/metrics", handler.GetMonoVertexMetrics)
// Get the health information of a mono vertex.
r.GET("/namespaces/:namespace/mono-vertices/:mono-vertex/health", handler.GetMonoVertexHealth)
// Get the time series data for a specific metric.
r.POST("/metricData", handler.GetMetricData)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/metrics-proxy

@@ -206,5 +206,6 @@ func CreateAuthRouteMap(baseHref string) authz.RouteMap {
"GET:" + baseHref + "api/v1/namespaces/:namespace/mono-vertices/:mono-vertex/metrics": authz.NewRouteInfo(authz.ObjectMonoVertex, true),
"POST:" + baseHref + "api/v1/namespaces/:namespace/mono-vertices": authz.NewRouteInfo(authz.ObjectMonoVertex, true),
"GET:" + baseHref + "api/v1/namespaces/:namespace/mono-vertices/:mono-vertex/health": authz.NewRouteInfo(authz.ObjectMonoVertex, true),
"POST:" + baseHref + "api/v1/metricData": authz.NewRouteInfo(authz.ObjectAll, true),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kohlisid - please review

config/namespace-install.yaml Show resolved Hide resolved
config/namespace-install.yaml Show resolved Hide resolved
name: numaflow-metric-server-config
data:
config.yaml: |
url: http://my-release-kube-prometheus-prometheus.default.svc.cluster.local:9090
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give commented out examples.

@whynowy whynowy marked this pull request as draft October 2, 2024 06:07
var fieldsMap = make(map[string]any)

var (
requestBody map[string]any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall me abstract the input data into a struct? map[string]any means no restriction.

"gopkg.in/yaml.v2"
)

type PrometheusClient struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This struct has no method, maybe there's a better way abstract it?

)

// read prometheus metric config yaml from volume mount path
data, err = os.ReadFile("/etc/numaflow/metrics/config.yaml")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make this file path into a constant/config instead of hardcoding.

}

// get expression for the pattern
expr := pattern.Expression
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe don't need this, we are checkingpattern.Expressiondirectly above as well, can be passed to subMatch directly.


// find index and pattern from config based on req pattern name
for i, p := range h.prometheusClient.ConfigData {
if p.Name == requestBody["name"] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a case where requestBody["name"] does not exist?

}

// throw error if pattern name not found in the config
if index == -1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can directly compare with pattern != nil right for the case it is not found.
Index might not be required here, I don't see it getting further in the code. Is that correct?

return fmt.Sprintf("%f", v)
case int64:
return fmt.Sprintf("%d", v)
case map[string]any:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is any allowed? Do we have any specific types that we support?

Comment on lines 158 to 159
rangeVector: 5m
aggregator: sum
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this overridable?

@whynowy
Copy link
Member

whynowy commented Oct 4, 2024

I created a feature branch dbt, instead of attempting merging to main, please merge it to the feature branch. After getting the e2e working (frontend, backend), then create a PR to main.

@adarsh0728 adarsh0728 changed the base branch from main to dbt October 7, 2024 03:13
@vigith vigith marked this pull request as ready for review October 9, 2024 15:41
@adarsh0728
Copy link
Author

Merging this into feature branch. Will create separate PRs for other issues. Once e2e is done, will create a PR into main.

@adarsh0728 adarsh0728 merged commit e56b41b into numaproj:dbt Oct 14, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Backend: define an API to run PromQL query from the Numaflow Server
6 participants