Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PG instrumentation does not parse quoted table names in queries into the 'db.collection.name' attribute #1173

Open
mostfunkyduck opened this issue Sep 17, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@mostfunkyduck
Copy link

Description of the bug
The PG instrumentation library is using a regex here to extract the table name from queries in order to set the db.collection.name trace attribute. This regex does not match quoted table names, for instance SELECT * from "foo" will not return a match.

Share details about your runtime

Operating system details: Linux, Ubuntu 24.04
RUBY_ENGINE: "ruby"
RUBY_VERSION: "3.3.4"
RUBY_DESCRIPTION: "ruby 3.3.4 (2024-07-09 revision be1089c8ec) [x86_64-linux]"

This also happens on various production instances, I can get that information if needed.

Share a simplified reproduction if possible

I've been reproducing this by adding this to the tests for pg on a local fork

diff --git a/instrumentation/pg/test/opentelemetry/instrumentation/pg/instrumentation_test.rb b/instrumentation/pg/test/opentelemetry/instrumentation/pg
/instrumentation_test.rb
index cc50728e..7c1c2590 100644
--- a/instrumentation/pg/test/opentelemetry/instrumentation/pg/instrumentation_test.rb
+++ b/instrumentation/pg/test/opentelemetry/instrumentation/pg/instrumentation_test.rb
@@ -275,6 +275,13 @@ describe OpenTelemetry::Instrumentation::PG::Instrumentation do
       client.query('DROP TABLE test_table') # Drop table to avoid conflicts
     end

+    it 'extracts table name with quotes' do
+      client.query('CREATE TABLE "test_quotes_table" (personid int, name VARCHAR(50))')
+
+      _(span.attributes['db.collection.name']).must_equal '"test_quotes_table"'
+      client.query('DROP TABLE test_quotes_table') # Drop table to avoid conflicts
+    end
+
     describe 'when db_statement is obfuscate' do
       let(:config) { { db_statement: :obfuscate } }

This seems to fix it, though I'm pretty green at ruby and the inner workings of this library, so I don't know if there are knock-on effects here.

diff --git a/instrumentation/pg/lib/opentelemetry/instrumentation/pg/patches/connection.rb b/instrumentation/pg/lib/opentelemetry/instrumentation/pg/pat
ches/connection.rb
index 98814950..6330abee 100644
--- a/instrumentation/pg/lib/opentelemetry/instrumentation/pg/patches/connection.rb
+++ b/instrumentation/pg/lib/opentelemetry/instrumentation/pg/patches/connection.rb
@@ -15,7 +15,7 @@ module OpenTelemetry
         # Module to prepend to PG::Connection for instrumentation
         module Connection # rubocop:disable Metrics/ModuleLength
           # Capture the first word (including letters, digits, underscores, & '.', ) that follows common table commands
-          TABLE_NAME = /\b(?:FROM|INTO|UPDATE|CREATE\s+TABLE(?:\s+IF\s+NOT\s+EXISTS)?|DROP\s+TABLE(?:\s+IF\s+EXISTS)?|ALTER\s+TABLE(?:\s+IF\s+EXISTS)?)
\s+([\w\.]+)/i
+          TABLE_NAME = /\b(?:FROM|INTO|UPDATE|CREATE\s+TABLE(?:\s+IF\s+NOT\s+EXISTS)?|DROP\s+TABLE(?:\s+IF\s+EXISTS)?|ALTER\s+TABLE(?:\s+IF\s+EXISTS)?)
\s+(["\w\.]+)/i

           PG::Constants::EXEC_ISH_METHODS.each do |method|
             define_method method do |*args, &block|
@mostfunkyduck mostfunkyduck added the bug Something isn't working label Sep 17, 2024
@hannahramadan
Copy link
Contributor

Hi @mostfunkyduck! Thank you submitting this issue and sharing what you did to get things working properly. Would you be interested in making a contribution?

For testing, the file instrumentation/pg/test/fixtures/sql_table_name.json feeds each entry into this test. I think we could update the fixtures file to keep things consistent. Perhaps adding something like:

    {
      "name": "table_name_with_single_quotes",
      "sql": "SELECT columns FROM 'test_table'"
    },
    {
      "name": "table_name_with_double_quotes",
      "sql": "SELECT columns FROM \"test_table\""
    }

For the updated regex, is it desirable to include the quotes in the end result? Your suggestion would take care of this (although could be updated to recognize single quotes too).

If we don't want to include the quotes, and to account for both single and double quotes, we could do:

TABLE_NAME = /\b(?:FROM|INTO|UPDATE|CREATE\s+TABLE(?:\s+IF\s+NOT\s+EXISTS)?|DROP\s+TABLE(?:\s+IF\s+EXISTS)?|ALTER\s+TABLE(?:\s+IF\s+EXISTS)?)\s+["']?([\w\.]+)["']?/i

Would love to hear your thoughts ◡̈

@mostfunkyduck
Copy link
Author

I can take a shot at it! Just need to go through the contributor's guide.

We probably should not include the quotes for consistency's sake - two queries on the same table should produce the same attribute regardless of whether quotes are used - so I'll keep them outside the capture.

As far as single quotes, this SO post quotes the postgres manual as saying that single quotes are only for string constants, so they'd be a syntax error if they were wrapping the table name.

@hannahramadan
Copy link
Contributor

@mostfunkyduck - excited to see your contribution come through 🎉 I agree on keeping the quotes out and its also great to learn about single quotes v double quotes in this context, thank you for sharing that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants