Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query Always Returns undefined (Summary references hallucinated data that does not match Dashboard) #14

Open
jasonstinnett opened this issue May 7, 2024 · 7 comments

Comments

@jasonstinnett
Copy link

I'm facing an issue where the correct values for Query Title, Query Note, and Query Filters are all being successfully passed to Gemini. However, Query is always undefined.

On the front end, this the extension displays a summary for each tile on the dashboard. However the data mentioned in the summary & next steps is always made up and does not match the actual data on the dashboard.

Here's what I've tested to date:

  • Expanded the logging in the websocket-service to confirm which data was being successfully passed to the service & Gemini
  • Redeploying the latest main branch code as of 5/6/2024. (I originally deployed the repo in mid-March)
  • Tested adding the extension to several different UDDs:
    • Tried copies of production dashboards from different projects
    • Tried dashboards built on top of more and less complicated explores
    • Tried creating a dashboard with a single graph and a single filter, using a very simple explore
  • Tried disabling the 'Disallow Numeric Query IDs (Recommended by Google Cloud)' Legacy Feature
  • Reviewed the README to ensure I did not miss any steps

Are there any other details I can provide to help troubleshoot, or any ideas for other approaches that I can test?

@LukaFontanilla
Copy link
Collaborator

@jasonstinnett, given the latest redeploy I assume your websocket server is using the gemini-1.0-pro-001 model version? If not, I would first recommend instantiating that model.

The second thing I would recommend is clearing localStorage in the browser. The dashboard metadata get's cached, so if the summary extension was added to a report and run. Then new tiles & filters were added and removed, the summary would still pull the cached metadata until it expired or was cleared manually (although that wouldn't explain total hallucinations).

Additionally I might suggest adding some logging to the websocket server, specifically to the run looker query function to ensure first, that the query metadata matches what you'd expect, then logging the data to ensure the csv contains the expected underlying data. I would recommend testing this first on a one tile dashboard that doesn't pull from a merged result query. Reasoning for this, to determine where the issue is stemming from (incorrect configuration of Looker query OR hallucination on LLM side)

@jasonstinnett
Copy link
Author

@LukaFontanilla appreciate the input. Confirming I'm using the gemini-1.0.pro-001 model. I tried clearing storage and that did not resolve the issue.

I do not believe the issue is with Gemini.

Based on my logging, I was thinking the issue originates on the frontend here. It seems like 'undefined' (line 166) is always being passed to the websocket service.

Note that all other data points are being passed correctly - here's an example when I log the context variable in the websocket service from a test dashboard.

Note this test dashboard has a single bar chart tile and a single filter (the extension does not generate anything unless I have a filter present). The test dashboard uses an explore that includes a single BigQuery table and no additional transformations.

{"context":"
            Dashboard Detail:  

            Query Details:  "Query Title: Which Domains are Most Represented? 
 Query Note: This chart shows which domains are most represented in this dataset. 
 Query Fields: sample_web_data.domain,sample_web_data.count 
 Query Data: undefined 
"}

Am I correctly understanding that 'undefined' value is coming from the frontend?

Either way, I believe we've isolated it to an issue with querying Looker.

@LukaFontanilla
Copy link
Collaborator

@jasonstinnett to confirm, from the frontend if you log the queries variables here you receive undefined? If so, can you then log the variables destructured here? If the frontend is passing undefined to the websocket server, then the dashboard tile metadata fetching is the problem. However if the dashboard metadata and tile info is getting passed, then it perhaps is an issue with the Looker API auth or Query invocation on the server side.

@ryan-gould-accenture
Copy link

Hi @jasonstinnett did you ever resolve this? I seem to be running into a similar issue when I am using a looker core instance. I see when I log the metadata that the relevant queries are present but when anything is actually generated it is hallucinated data. I think that this may be related to an Auth issue but curious if you found a fix. Thanks.

@jasonstinnett
Copy link
Author

jasonstinnett commented Jun 24, 2024

@ryan-gould-accenture this has been on my backburner for the past month, but I do have time set aside this week to test some of Luka's most recent advice. I can provide an update either way at the end of the week.

@jasonstinnett
Copy link
Author

@LukaFontanilla Thank you for the frontend line references. In the end it was a simple an auth issue... Incorrect client_id / client_secret in my looker.ini for the websocket service.

I propose refactoring the code so that we do not send the request to Gemini if runLookerQuery raises an exception. It seems like Gemini will not be able to generate a helpful and informed response without that data.

With this approach we avoid the scenario where it appears the system is working, but the LLM is not doing a good job (a response is generated, the first part of the response references actual details from the dashboard, but then the analysis / recommendations are hallucinated). It also saves a small amount of Vertex cost and guides the developer directly to the issue.

I’d love your feedback on this approach and can submit a PR.

Here are some additional findings, in case it helps others in the future:

  • I learned that the backend uses the values of Dashboard Detail, Query Title, Query Note, and Query Fields as-is from the frontend. The only datapoint that the backend retrieves from the API is the Query Data . That explains why, in the context example at the end of my previous message, most of the data points are populated correctly and only Query Data shows ‘undefined’.
  • Once I understood the division of labor it was clear that the problem was on the backend. I initially saw the mixture of correct and missing data and assumed it was a more nuanced issue.
  • In Cloud Logging you’ll see a HTML 404 error page logged as your exception. It seems this is how the Looker API responds to invalid credentials (I did not interpret this as an auth issue but that might be a lack of experience with the Looker API.)
    cloud-logging-404-html

cc @ryan-gould-accenture hopefully those details help you troubleshoot. Looking in Cloud Logging for the 404 html is probably the easiest way to confirm that it's an auth issue (or some other issue in the websocket service).

@LukaFontanilla
Copy link
Collaborator

@jasonstinnett, I'm aligned on this PR. My suggestions for the PR would be to communicate this to the frontend in some way. Something like adding a new websocket event message and emitting a clear error from the server when the Looker API fails (either from auth or something else), having that logged to the browser console would be nice indeed.

Btw, thank you for sharing your findings to aid others. Appreciate it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants