Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

APIs to support metadata object creation and querying #2049

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

npalaska
Copy link
Member

@npalaska npalaska commented Dec 17, 2020

Address the issue #2048

API support for creating or querying User Metadata objects.
According to the discussion in #2048 we have decided to implement the dashboard graphql interface using the REST APIs. All the API endpoints are either protected with an optional or strict user authentication. E.g. A user need to be logged-in in order to create the metadata object, public metadata object can not be created. However, at any time, user can publish the their metadata objects to make them public. Public metadata objects are read-only, therefore a non logged-in user can get a list of public metadata objects but can not modify or delete it.

This PR handles the basic API support such as Metadata object creation (POST), Metadata session query (GET, DELETE, PUT) with optional Pbench Authorization header included as described below.

Client can send any text blob as metadata with a user defined key in the json request when creating a metadata object. Dashboard will assign some key such as "saved metadata" (or whatever we want to call it) and eventually "favorites" to persist the currently in-browser favorites metadata objects list. However, key can be anything and not just restricted to saved/favorites.

  1. Create metadata:
    • POST /v1/metadata
    • Handles Pbench Metadata object creation via JSON request.
    • Authorization is required to create metadata objects. Non logged-in users can not create metadata objects.
    • Request Payload:
    headers={Authorization:   Bearer <Pbench authentication token (user received upon login)>}
    json={
        "key": "<user defined key for the metadata object>"
        "value": "<Any user defined text blob>" E.g.'{"config": "<config>", "description": "<description>"}' but doesn't have to be a stringified json.
       }
    
  2. Get Metadatas:
    • GET /v1/metadata/<metadata_key>
    • Returns all the Metadata objects of a specified metadata key associated with a logged in user and still in metadata table.
    • If authorization is not provided public metadata objects of a specified key are returned.
    • The key need to be a part of the URI.
    • Request Payload:
    headers={Authorization:   Bearer <Pbench authentication token (user received upon login)>} # optional
    
  3. Query single Metadata object:
    • GET /v1/metadata/<metadata_id>
    • Need metadata object id in the URI.
    • Returns Metadata object as a json that is associated with a requested metadata id.
    • If authorization is not provided only public metadata objects can be queried.
    • Request Payload:
    headers={Authorization:   Bearer <Pbench authentication token (user received upon login)>} # optional
    
  4. Update single metadata object:
    • PUT /v1/metadata
    • Need metadata object id in the URI.
    • Updates the Metadata object with a requested parameters in json payload.
    • Update is restricted on the protected fields such as created or metadata id
    • Update is also not allowed on public metadata objects except the admin users.
    • Returns the Metadata object as a json with updated values.
    • Request Payload:
    headers={Authorization:   Bearer <Pbench authentication token (user received upon login)>}
    example json={
        "value":  "<New text blob>"
        ...
       }
    
  5. Delete single Metadata object:
    • DELETE /v1/metadata/<metadata_id>
    • Need metadata object id in the URI.
    • Deletes the Metadata object associated with a requested metadata id.
    • Authorization header is required to delete the metadata objects, a user can not delete the public metadata except admin users.
    • Request Payload:
    headers={Authorization:   Bearer <Pbench authentication token (user received upon login)>}
    
  6. Publish metadata object:
    • POST /v1/metadata/<metadata_id>/publish
    • Need metadata object id in the URI.
    • Publish user's metadata object of given id for public access. Once published the action can not be reverted.
    • Authorization header is required to publish the metadata objects.
    • Request Payload:
    headers={Authorization:   Bearer <Pbench authentication token (user received upon login)>}
    
  • Defines a graphql mutation createMetadata that mimic the current front-end createUrl mutation.
  • Add two simple queries getMetadataById and getMetadataByUserid
  • Defines the graphql schema on the mutation and queries
  • Expose the /user/metadata endpoint where a post request does not accept the graphql query but a simple JSON request
  • Endpoint parses the JSON request to make a valid graphql query to execute against the schema and returns the flask response back to client

@gurbirkalsi gurbirkalsi self-requested a review January 14, 2021 20:47
lib/pbench/server/api/resources/graphql_api.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/graphql_schema.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/models.py Outdated Show resolved Hide resolved
@portante portante changed the base branch from master to main January 28, 2021 22:05
@npalaska npalaska force-pushed the graphql-model branch 2 times, most recently from 371962b to ba71d77 Compare March 9, 2021 18:11
@npalaska npalaska changed the title First pass at implementing graphql schemas in server APIs to support Metadata creation and querying Mar 9, 2021
@npalaska npalaska changed the title APIs to support Metadata creation and querying APIs to support Session metadata creation and querying Mar 9, 2021
@npalaska npalaska marked this pull request as ready for review March 11, 2021 05:51
@webbnh
Copy link
Member

webbnh commented Mar 11, 2021

Nikhil, I'm coming late to this game, so my apologies if I'm poking at settled issues, but I have a few questions/comments on the description of this PR before I look at the code.

  • The descriptions of the two POST requests don't indicate that they return anything. Shouldn't they be returning the ID of the session just created?
  • The DELETE request requires an authentication token. Does this mean that there is no way to delete public/publicly-created metadata objects? Is that a problem, or do they age out and get automatically removed somehow?
  • The PUT request seems to have the same issue. (And, there's a spurious example in the request example.)

And, just for my education, why are there separate URIs for creating and listing "favorite Metadata" and "saved Metadata", when there are common URIs for getting, updating, and deleting them? That is, it seems like "favorite" objects are just a particular flavor of Metadata objects...should the server care about the distinction (or can that be dealt with on the client side), and, if so, should the distinction be made through an embedded field or a request parameter instead of offering a separate URI for them?

@dbutenhof
Copy link
Member

Address the issue #2048

API support for creating or querying Metadata or dashboard saved sessions.

I think we should get away from the term "sessions" here, which I don't think really makes sense in the generalized context. Also, "Metadata or dashboard saved sessions"?? Saved sessions are one kind of metadata we're supporting in this API.

We're saving "values" or "documents" associated with the metadata type (or "key").

Client defined custom metadatas fields such as configs/description are saved as a text blobs which need to be sent with post request as a json string. Metadata are either favorite or just saved depending on which url we use when creating a Metadata as follows:

You seem to have settled on having the client provide/retrieve JSON rather than making them render a string, which -- without having looked at the code yet -- I resume means you've used SQLAlchemy's JSON column support. I'd already checked that it's apparently supported for both Postgresql and sqlite3, so that should be good.

In this case, I don't think "text blobs" is a good description; we're apparently working with JSON documents, so we can just say that.

I also hadn't anticipated locking the metadata keys to just "saved" and "favorites" ... it seems that a client ought to be able to add additional metadata keys without requiring server changes. Then again, with Flask URI templates, this might be exactly what you're doing?

Do we really want to shorten the concept of "saved session" to just "saved"?? It sounds very generic! (Of course, if we're handling opaque JSON documents, the server implementation actually is very generic, and if you're really taking the "metadata key" from a URI template rather than hardcoding "saved" and "favorites", then we can change the names at any time independent of this PR. (And that's what I'd prefer: we just have "key" and "value", without any server interpretation.)

  1. Create favorite metadata:
    Handles Pbench favorite Metadata session creation via JSON request
    POST /v1/metadata/favorite
    headers={Authorization:   Bearer <Pbench authentication token (user received upon login)>} # optional
    json={
        "value": '{"config": "<config>", "description": "<description>"}'
       }
    

I doubt that "favorite" metadata will have "config"/"description" fields in the JSON, and the way this is written (especially that it's the same in both saved sessions and favorites) suggests that the field values are required and perhaps enforced by the server. I'd rather not imply that. (Although the PR description isn't product documentation, it will become a permanent part of the git log.)

  1. Query single Metadata session:
    Need metadata session id in the URI
    Returns Metadata session object as a json associated with a requested metadata id
    GET /v1/metadata/<metadata_id>
    headers={Authorization:   Bearer <Pbench authentication token (user received upon login)>}
    

Again, "session" feels awkward here. We have "saved sessions" and "favorites". Here we're querying the value of (literally) a metadata table row ID. I don't think that's exactly how we want to advertise it; but "metadata item ID"? "metadata id"? Hmm.

  • Defines a graphql mutation createMetadata that mimic the current front-end createUrl mutation.
  • Add two simple queries getMetadataById and getMetadataByUserid
  • Defines the graphql schema on the mutation and queries
  • Expose the /user/metadata endpoint where a post request does not accept the graphql query but a simple JSON request
  • Endpoint parses the JSON request to make a valid graphql query to execute against the schema and returns the flask response back to client

Didn't we dump the GraphQL schema? This seems extraneous and potentially confusing.

@webbnh
Copy link
Member

webbnh commented Mar 11, 2021

Didn't we dump the GraphQL schema? This seems extraneous and potentially confusing.

I took the "strike-through" to mean that these items had been dumped...but I've started reviewing the code, and at least elements of them seem to be present there. (E.g., what does lib/pbench/server/api/resources/graphql_schema.py line 74 do?)

Copy link
Member

@webbnh webbnh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff here, but I have lots of questions (some small, some large) and the usual handful of nits.

lib/pbench/server/api/resources/graphql_schema.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/graphql_schema.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/graphql_schema.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/metadata_api.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/metadata_api.py Outdated Show resolved Hide resolved
"User trying to update fields that are not present in the metadata database. Fields: {}",
non_existent,
)
abort(400, message="Invalid fields in update request payload")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't want to take pity on the requestor and indicate which fields are invalid?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we can add that

Copy link
Member Author

@npalaska npalaska Mar 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the second thought however, should we let the requester figure out which fields are invalid, this is in accordance with our design that we try not to give too many details to the requester, we just tell them what went wrong

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything around security, that's true; not so sure we want to extend that to non-security failures. I'd rather be helpful where it makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the client can issue this request over and over with different fields, it would be mechanically straightforward for the client developer to figure out what part of the request is bad...but, it would be annoying for him/her. So, really, I see no reason why we shouldn't just indicate the offending field(s) in the message and save all the trouble.

Comment on lines 273 to 313
# Check if the metadata payload contain fields that are either protected or
# not present in the metadata db. If any key in the payload does not match
# with the column name we will abort the update request.
non_existent = set(post_data.keys()).difference(
set(Metadata.__table__.columns.keys())
)
if non_existent:
self.logger.warning(
"User trying to update fields that are not present in the metadata database. Fields: {}",
non_existent,
)
abort(400, message="Invalid fields in update request payload")
protected = set(post_data.keys()).intersection(set(Metadata.get_protected()))
for field in protected:
if getattr(metadata_session, field) != post_data[field]:
self.logger.warning(
"User trying to update the non-updatable fields. {}: {}",
field,
post_data[field],
)
abort(403, message="Invalid update request payload")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like I've seen this code before...I suggest refactoring it into a utility routine, perhaps in some common parent class?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah its a bit redundant

lib/pbench/server/database/models/metadata.py Outdated Show resolved Hide resolved
lib/pbench/server/database/models/metadata.py Outdated Show resolved Hide resolved
lib/pbench/server/database/models/metadata.py Outdated Show resolved Hide resolved
@npalaska
Copy link
Member Author

You seem to have settled on having the client provide/retrieve JSON rather than making them render a string, which -- without having looked at the code yet -- I resume means you've used SQLAlchemy's JSON column support. I'd already checked that it's apparently supported for both Postgresql and sqlite3, so that should be good.

In this case, I don't think "text blobs" is a good description; we're apparently working with JSON documents, so we can just say that.

I also hadn't anticipated locking the metadata keys to just "saved" and "favorites" ... it seems that a client ought to be able to add additional metadata keys without requiring server changes. Then again, with Flask URI templates, this might be exactly what you're doing?

Well right now it actually is a text blob we require value to be a text and that text can be a json string, thats why I gave an example as "value": '{"config": "<config>", "description": "<description>"}' but it can be any string, its upto the client how they want to mange the metadata. I have also looked into SQLAlchemy's JSON column support and we can certainly implement that.

The reason for adding favorite and saved key values to URI is because how the flask manages URI endpoint definition with add_resources. If we want to have a POST endpoint as just /metadata and GET endpoint on metadata/favorite to fetch only favorites then we have to have two separate resource classes to serve them since they are two separate URIs. And now that I think about it I think we should do that even if it means having to create 2 separate resource classes for them because that gives the user flexibility to add any keys of their choice when they POST the metadata value.

I doubt that "favorite" metadata will have "config"/"description" fields in the JSON, and the way this is written (especially that it's the same in both saved sessions and favorites) suggests that the field values are required and perhaps enforced by the server. I'd rather not imply that. (Although the PR description isn't product documentation, it will become a permanent part of the git log.)

Yeah I agree I should fix the wording, the value field can accept any blob of text and it doesnt have to be a json string, its just a client defined text metadata, which client can fetch and parse it.

@npalaska
Copy link
Member Author

  • The descriptions of the two POST requests don't indicate that they return anything. Shouldn't they be returning the ID of the session just created?

Yes I will update the description

  • The DELETE request requires an authentication token. Does this mean that there is no way to delete public/publicly-created metadata objects? Is that a problem, or do they age out and get automatically removed somehow?

Ah, yeah thats an interesting question I think they should age out and get automatically removed rather than allowing everyone to delete the public metadata objects, but a good point we can discuss it.

  • The PUT request seems to have the same issue. (And, there's a spurious example in the request example.)

Yeah should we allow public metadata objects to be editable? I am not sure

And, just for my education, why are there separate URIs for creating and listing "favorite Metadata" and "saved Metadata", when there are common URIs for getting, updating, and deleting them? That is, it seems like "favorite" objects are just a particular flavor of Metadata objects...should the server care about the distinction (or can that be dealt with on the client side), and, if so, should the distinction be made through an embedded field or a request parameter instead of offering a separate URI for them?

2 separate URIs for favorite and saved metadata is because of the way how flask manages the endpoint definition with add_resource. We could have a URI POST /metadata to create metadata with any key and a GET on /metadata/<key string> but then we have to have different resource classes serving these 2 different URI pattern. I can't have one resource class with post and get methods serving 2 different URI patterns. That's why I used POST on /metadata/<key string> to match the URI pattern. However being said that I think we do need POST /metadata to create metadata and GET /metadata/<key string> to get metadata with specified key even if we have to create 2 separate classes for them because its hindering our ability to use flexible key values.

lib/pbench/server/api/__init__.py Show resolved Hide resolved
lib/pbench/server/api/resources/graphql_schema.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/metadata_api.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/metadata_api.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/metadata_api.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/metadata_api.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/metadata_api.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/metadata_api.py Outdated Show resolved Hide resolved
lib/pbench/server/database/models/metadata.py Outdated Show resolved Hide resolved
lib/pbench/server/database/models/metadata.py Outdated Show resolved Hide resolved
@dbutenhof
Copy link
Member

  • The descriptions of the two POST requests don't indicate that they return anything. Shouldn't they be returning the ID of the session just created?

Yes I will update the description

  • The DELETE request requires an authentication token. Does this mean that there is no way to delete public/publicly-created metadata objects? Is that a problem, or do they age out and get automatically removed somehow?

Ah, yeah thats an interesting question I think they should age out and get automatically removed rather than allowing everyone to delete the public metadata objects, but a good point we can discuss it.

Currently a user can click/unclick the favorite button on any dataset, at any time. There's no ownership, so it doesn't matter if it belongs to "someone else". Again, this behavior is really a quirk of our evolutionary cycle and won't be possible once we're done because there won't be any unowned datasets. I suppose that doesn't necessarily mean that an un-logged-in user won't be able to favorite any published datasets, though; which would still have to be shared by all un-logged-in viewers.

Anyway, yes, absolutely anything owned by "public" (or the "null user", or just "owned context") has no protections, and can be created, modified, or deleted at whim by anyone else who's not logged in. It's a stupid job, but it's our job, and we're going to do it.

So, yeah, delete needs to work on unowned metadata without authorization. Whether for saved views or for favorites.

@npalaska
Copy link
Member Author

so @dbutenhof yes I should remove all the session references that I am using but do you think its fine to keep the key as a part of the URI because I feel its not flexible enough. You have to hit the certain URI consisting of metadata_key value to create a metadata of that key type. For example to create favorite metadata object we would hit post /metadata/favorite we could certainly have an endpoint /metadata/<key string> to create metadata of any key type but I am not sure if thats a good practice. We could achieve the same result by accepting a json key metadata_key with post request serving on /metadata

Also if we are to allow any key type, we probably can't use the MetadataKeys enum (enum will lock the list of keys we can accept). In a way thats good rather than flooding our database with all sorts of user defined keys but I am confused.

Another question is on the GET request for /metadata/<key string> should we be accepting any key and return 200 with the metadata if we find the data with that key or return 200 without any metadata if we don't find any data?

@dbutenhof
Copy link
Member

so @dbutenhof yes I should remove all the session references that I am using but do you think its fine to keep the key as a part of the URI because I feel its not flexible enough. You have to hit the certain URI consisting of metadata_key value to create a metadata of that key type. For example to create favorite metadata object we would hit post /metadata/favorite we could certainly have an endpoint /metadata/<key string> to create metadata of any key type but I am not sure if thats a good practice. We could achieve the same result by accepting a json key metadata_key with post request serving on /metadata

Also if we are to allow any key type, we probably can't use the MetadataKeys enum (enum will lock the list of keys we can accept). In a way thats good rather than flooding our database with all sorts of user defined keys but I am confused.

Another question is on the GET request for /metadata/<key string> should we be accepting any key and return 200 with the metadata if we find the data with that key or return 200 without any metadata if we don't find any data?

Yes, the Enum implies a limited set of keys, but that's what you've already implemented. My vision was that we simply support an open set of key=value pairs tied to the user (or anti-user). That means we just use the key string they passed in (probably lowercased) and there's no way to enum-ify or digit-ify it. If you'd like to change to that model, I'd be thrilled. You chose the ENUM, and I'm just saying if we go that way I'd rather use the ENUM column type and have a readable string in the database rather than a numeric equivalent.

And while I think the idea of /metadata/<key> endpoint hierarchy is cute and expressive, we need a JSON body anyway for the value; and having having POST/PUT take {"key": "<key>", "value": "<value>"} JSON is just fine as well.

@dbutenhof
Copy link
Member

Another question is on the GET request for /metadata/<key string> should we be accepting any key and return 200 with the metadata if we find the data with that key or return 200 without any metadata if we don't find any data?

Missed this one. Yeah; if you ask for all "foo"s and there is no "foo", is that success or failure? I think that we're basically returning

[
    {"key": "value1"},
    {"key": "value2"}
]

or perhaps just

{
    "key": [
        "value1",
        "value2"
    ]
}

or even (since they asked for a specific key) just

[
    "value1",
    "value2"
]

And in the latter case, having no "key" values for the current user/anti-user might just return 200 with

[]

and be done. The theory being that asking for the values for "mykeythathasnovalues" is an empty list.

But another option would be to handle a query for a key that has no values as HTTP 204 (no content). Successful, but you get zip. The empty array might be easier for the client to handle, though.

@npalaska
Copy link
Member Author

Anyway, yes, absolutely anything owned by "public" (or the "null user", or just "owned context") has no protections, and can be created, modified, or deleted at whim by anyone else who's not logged in. It's a stupid job, but it's our job, and we're going to do it.

So, yeah, delete needs to work on unowned metadata without authorization. Whether for saved views or for favorites.

Hmm, it feels a little odd though, I mean anyone can write a web crawler and keep deleting the public datasets and we would have no idea whats happening. Either we can strictly force user to login to create and use metadata objects or at least provide a basic protection against unwarranted deletes on public metadatas that's there for the legacy purpose, isn't?

Comment on lines +396 to +399
Post request for publishing the metadata object for public access.
This requires a Pbench auth token in the header field. Only authorized users can make their metadata public.
Right now once the metadata object is made public it can not be reverted back to the private entity
TODO: Is it desirable to be able to revert the object back as a private entity
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite what I thought we were talking about. The idea wasn't to make a user's metadata accessible without authorization, but rather to have a mechanism to build a URI that would essentially embed the metadata directly without going to the DB.

For example, something like http:/server.example.com/dashboard/sharedview?token=<read-only-token>&config=<encodedsavedviewconfigstring>

Now at least part of this could be a "metadata" server API to package it up, if we want. I think the dashboard would have a "share" button for each current page, and would generate a URI including the information necessary to re-open that view for anyone with the URI, including the React context and document IDs being displayed. There might also be a "share" link on the Sessions view to package up the saved view and create a URI the same way.

If we generate the link in the server a non-UI client could use it, too, but I think we'll still need dashboard integration.

On the other hand... this would additionally enable a "publish view" that's somewhat similar to publishing a dataset, and maybe that's not entirely a bad thing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we didn't discuss about it, this was a prototype idea that I thought would be good to have. As you correctly pointed out it enables user to publish a certain view if they want to and that view would be accessible to all the logged-in and non logged-in users. If users are comfortable with publishing their dataset, they may want to publish their metadata view as well for others to access (and even for themselves without logging in next time).

I am not sure if this would add much of a value for the user experience, but an option we could give. However, I am certainly open for any suggestions.

class UserMetadata(Database.Base):
""" Metadata Model for storing user metadata details """

# TODO: Think about the better name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a perfectly reasonable table name to me.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @npalaska, this might be a silly question but what do we mean by user metadata here, in context with the dashboard? I believe that the user (via dashboard) has to send some data to the REST API but do not know what it is

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aquibbaig, I believe the intention is to provide Server-side storage for any user state which should persist between login sessions, anything which the Dashboard might want to store on behalf of the user, such as favorite datasets or dataset tags or labels, views, window geometries, default languages, or other settings, etc., any context which is associated with a user rather than a dataset.

Storing user-generated favorites/tags/labels with datasets doesn't work well if the user doesn't own the dataset or if different users want to mark the same dataset differently. Users might want to "favorite" a dataset published by someone else. Since the notion of "favorite datasets" directs what the user sees, it's an attribute of the user, not the dataset. The metadata gives the Dashboard a place to store this stuff so that it can be fetched each time the user logs in.

The "metadata" API is supposed to be very generic, to give it maximum flexibility for supporting the Dashboard and any other client uses that we haven't imagined yet.

updated = Column(DateTime, nullable=False, default=datetime.datetime.now())
value = Column(Text, unique=False, nullable=False)
key = Column(String(128), nullable=False)
user_id = Column(Integer, ForeignKey("users.id", ondelete="CASCADE"), nullable=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem with this "publish by setting user_id == None" is that once it's been published there's no way anyone can delete it or update it.

If we really want to do this (instead of or in addition to the generated "share view" URI), I think you should add an "access" like what I did in the ES schema; and publishing doesn't set the owner to None but just changes "access" from private to public. That way the original owner of the published item can still change it or eventually delete it. (And we can also keep track of who created it.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, "access" could evolve into an ACL.

"""
Post request for creating metadata instance for a user.

The url requires a metadata session key such as /metadata/<key>
the only keys acceptable to create the metadata sessions are favourite/saved
The url requires a metadata object key such as /metadata/<key>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be no longer true, since you removed the post argument and the template string from the URI.

"Metadata {}: Logged in user_id {} is different than the one provided in the URI {}",
action,
current_user_id,
"Metadata user verification: Logged in user_id {} is different than the one provided in the URI {}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User isn't provided "in the URI". You're comparing the user_id in the metadata object against the token in the Authorization header.

except Exception:
self.logger.exception(
"Exception occurred in the GET request while querying the Metadata model, id: {}",
id,
)
abort(500, message="INTERNAL ERROR")

# Verify if the metadata session id in the url belongs to the logged in user
self.verify_metadata(metadata_session, "get")
# Verify if the metadata object id in the url belongs to the logged in user
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, "url" isn't really right.

"User trying to update fields that are not present in the metadata database. Fields: {}",
non_existent,
)
abort(400, message="Invalid fields in update request payload")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything around security, that's true; not so sure we want to extend that to non-security failures. I'd rather be helpful where it makes sense.

Copy link
Member

@portante portante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we consider this comment first before proceeding?

@npalaska
Copy link
Member Author

Can we consider this comment first before proceeding?

@portante The session module from the dashboard is same as the metadata object module (sqlalchemy table) defined here. This table has a value column that can store a text blob and this blog can be anything dashboard/client wants it to be along with a user defined key to segregate different types of metadata objects. The reason for using a sql table is to persist the metadata. These metadatas can be saved/favorite view/sessions that user wants to save so that they can get consistent view when they login everytime. You seem to be suggesting that we use the cashing mechanism instead of persistent storage. I am not sure how and why the metadata objects should be short lived. A user would want to persist their choice of favorite/saved views to get a consistent dashboard experience isn't it?

Earlier the dashboard was using the Graphql interface to create/query the metadata objects (session module from dashboard). However the only difference is we decided to implement it using the REST protocol keeping everything else same.

@portante
Copy link
Member

Can we consider this comment first before proceeding?

@portante The session module from the dashboard is same as the metadata object module (sqlalchemy table) defined here.

Can we move this higher-level discussion to the issue #2048 instead of in this PR? I'll be answering the above comment there.

resource_class_args=(config, logger, token_auth),
)
api.add_resource(
QueryMetadata,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering what's the difference between "GetMetadata" and "QueryMetadata"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GetMetadata returns all metadata values for the authorized user, while QueryMetadata specifies a metadata id and returns just that value.

Having two classes somewhat obscures the relationship, but if you look at the URI registration in __init__.py you'll see that they're both GET /metadata with effectively an optional id ... that is, GetMetadata maps to just GET /metadata while QueryMetadata maps to GET /metadata/<id>. I think Nikhil looked into combining those and found that Flask didn't provide a straightforward way to make the <id> parameter optional; the pattern wouldn't match if it was omitted, so he had to add the second class to make it work either way.

@Uchedanieln
Copy link

Optional

@dbutenhof
Copy link
Member

Optional

I'm not sure how to interpret this comment, but I'll add a little commentary...

This PR is very old and will likely never be finished: we're moving to Keycloak for user authentication and I expect that "user metadata" (along with roles and groups) will be maintained in our Keycloak broker instance rather than building our own SQL tables locally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Of and relating to application programming interfaces to services and functions enhancement Server
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

Implement metadata support in server (to replace Graphql)
7 participants