Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Searching for email addresses #224

Open
Jaco-Pretorius opened this issue Feb 18, 2015 · 13 comments
Open

Searching for email addresses #224

Jaco-Pretorius opened this issue Feb 18, 2015 · 13 comments

Comments

@Jaco-Pretorius
Copy link

I'm trying to search for users based on either name or email address. As far as I can tell postgresql interprets the '@' symbol as a special character and breaks my search.

Here is my setup on the search:

  pg_search_scope :search_by_name_and_email,
    against: { name: "A", email: "B" },
    using: { tsearch: { prefix: true }}

When I pass through a query of 'user@ex' I get the following sql:

SELECT "users".*, ((ts_rank((setweight(to_tsvector('simple', coalesce("users"."name"::text, '')), 'A') || setweight(to_tsvector('simple', coalesce("users"."email"::text, '')), 'B')), (to_tsquery('simple', ''' ' || 'user@ex' || ' ''' || ':*')), 0))) AS pg_search_rank FROM "users"  WHERE (((setweight(to_tsvector('simple', coalesce("users"."name"::text, '')), 'A') || setweight(to_tsvector('simple', coalesce("users"."email"::text, '')), 'B')) @@ (to_tsquery('simple', ''' ' || 'user@ex' || ' ''' || ':*'))))  ORDER BY pg_search_rank DESC, "users"."id" ASC

It seems like postgresql is breaking up my query at the '@' token, based on the following:

select * from ts_parse('default', 'user@ex');
 tokid | token 
-------+-------
     1 | user
    12 | @
     1 | ex
(3 rows)
select * from ts_debug('simple', 'user@ex');
   alias   |   description   | token | dictionaries | dictionary | lexemes 
-----------+-----------------+-------+--------------+------------+---------
 asciiword | Word, all ASCII | user  | {simple}     | simple     | {user}
 blank     | Space symbols   | @     | {}           |            | 
 asciiword | Word, all ASCII | ex    | {simple}     | simple     | {ex}
(3 rows)

Should I be escaping my query in some way, or does pg_search simply not support searching by fields which contain an '@' symbol? (I'm guessing we would have the same problem with other 'special' characters in posgresql, such as '&', '|' or '_'.

@nertzy
Copy link
Collaborator

nertzy commented Feb 18, 2015

If PostgreSQL is treating @ as a space, then I think it should be added to our DISALLOWED_CHARACTERS array. Shouldn't be too difficult to add a few test cases and then update the implementation. I'd want to make sure queries for longer, more complete email addresses still work, because I believe the tsearch parser recognizes full email addresses.

If you or anyone can make a pull request, it would be appreciated as well.

I might be able to get to this soon. Otherwise

Grant Hutchins

On Feb 18, 2015, at 2:30 PM, Jaco Pretorius [email protected] wrote:

I'm trying to search for users based on either name or email address. As far as I can tell postgresql interprets the '@' symbol as a special character and breaks my search.

Here is my setup on the search:

pg_search_scope :search_by_name_and_email,
against: { name: "A", email: "B" },
using: { tsearch: { prefix: true }}
When I pass through a query of 'user@ex' I get the following sql:

SELECT "users"., ((ts_rank((setweight(to_tsvector('simple', coalesce("users"."name"::text, '')), 'A') || setweight(to_tsvector('simple', coalesce("users"."email"::text, '')), 'B')), (to_tsquery('simple', ''' ' || 'user@ex' || ' ''' || ':')), 0))) AS pg_search_rank FROM "users" WHERE (((setweight(to_tsvector('simple', coalesce("users"."name"::text, '')), 'A') || setweight(to_tsvector('simple', coalesce("users"."email"::text, '')), 'B')) @@ (to_tsquery('simple', ''' ' || 'user@ex' || ' ''' || ':*')))) ORDER BY pg_search_rank DESC, "users"."id" ASC
It seems like postgresql is breaking up my query at the '@' token, based on the following:

select * from ts_parse('default', 'user@ex');
tokid | token
-------+-------
1 | user
12 | @
1 | ex
(3 rows)
select * from ts_debug('simple', 'user@ex');
alias | description | token | dictionaries | dictionary | lexemes
-----------+-----------------+-------+--------------+------------+---------
asciiword | Word, all ASCII | user | {simple} | simple | {user}
blank | Space symbols | @ | {} | |
asciiword | Word, all ASCII | ex | {simple} | simple | {ex}
(3 rows)
Should I be escaping my query in some way, or does pg_search simply not support searching by fields which contain an '@' symbol? (I'm guessing we would have the same problem with other 'special' characters in posgresql, such as '&', '|' or '_'.


Reply to this email directly or view it on GitHub.

@Jaco-Pretorius
Copy link
Author

Hey @nertzy thanks for the quick reply. I actually did dig around (before posting above) and tried adding the '@' to DISALLOWED_CHARACTERS, but it didn't make a difference :-/

@pboling
Copy link

pboling commented Aug 6, 2015

We just ran into this on version 0.7.6. Hacked solution was to clean the query param in the controller:

    filtered_query = params[:query] ? params[:query].split(/\W/) : nil
    if filtered_query.present?
     model.search...

Then we decided to upgrade to latest version 1.0.4, and the issue has apparently been resolved, hack is no longer needed. I think this issue can be closed.

@amarshall
Copy link
Contributor

@Jaco-Pretorius Does upgrading to newer PgSearch solve this for you too?

@real-ashwin
Copy link

Is there a workaround for this? I am using Postgres 9.4.4 and pg_search-1.0.5

@danielbonnell
Copy link

danielbonnell commented Oct 18, 2016

I'm using pg_search 1.0.6 and Postgres 9.4. Having the same issue. If I have [email protected] as the value in my email column, and my query is bob, I get an empty array. The only way I get a match is if my query is exactly [email protected] or is bob. If I try to search for anything after the @ sign, no results are returned. So for example, example.com or @example.com return nothing.

@elithecho
Copy link

I'm still having issues with this as well using Postgres 9.4 and 1.06
including @ will not match any emails

@vinhboy
Copy link

vinhboy commented Jan 19, 2017

In case anyone comes here with the same problem @danielbonnell has in the comment above

The answer is that trigram search just simply does not work like that.

Here is a good explanation on why it won't work.

You're just going to have to use good-old LIKE

@danielbonnell
Copy link

@vinhboy I fortunately realized I didn't need trigram for my use case. So I changed my setup and now it works as expected.

class Contact < ActiveRecord::Base
    pg_search_scope :search_for, against: [:first_name, :last_name, :email],
        using: {
            tsearch: { prefix: true, any_word: true }
        },
        ranked_by: '(contacts.engagement)',
        order_within_rank: 'contacts.last_sign_in_at DESC'
end

@sbatyuk
Copy link

sbatyuk commented Aug 13, 2019

Here is one potential solution: https://stackoverflow.com/a/35669178

@sp2410
Copy link

sp2410 commented Sep 6, 2022

To solve this, I used a cached column to store tsvector on creation time where I stored results of to_tsvector('simple', regexp_split_to_array(email, E'\[.@]')::text). Then I used the above using query to match.

@cabello
Copy link

cabello commented Dec 9, 2022

@sp2410 if I try this method I get an error as soon as I try to create the column, how did you bypass that?

ALTER TABLE users
        ADD COLUMN searchable tsvector GENERATED ALWAYS AS (
          setweight(to_tsvector('simple', regexp_split_to_array(coalesce(email,''), E'\[.@]')::text), 'A') ||
          setweight(to_tsvector('simple', coalesce(last_name,'')), 'B') ||
          setweight(to_tsvector('simple', coalesce(first_name,'')), 'C')
        ) STORED;

Error:

PG::InvalidObjectDefinition: ERROR:  generation expression is not immutable

As soon as I remove the regex_split_to_array call, the migration resumes working.

@jbennett
Copy link

PR #509 would help with this by just using a basic ilike search. Let me know if this would work for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests