Sunspot is a Ruby library for expressive, powerful interaction with the Solr search engine. Sunspot is built on top of the RSolr library, which provides a low-level interface for Solr interaction; Sunspot provides a simple, intuitive, expressive DSL backed by powerful features for indexing objects and searching for them.
Sunspot is designed to plug easily into any ORM, or even into non-database-backed objects such as the filesystem.
This README provides a high-level overview; class-by-class and method-by-method documentation is available in the API reference.
For questions about how to use Sunspot in your app, please use the Sunspot Mailing List or search Stack Overflow.
Add to Gemfile:
gem 'sunspot_rails'
gem 'sunspot_solr' # optional pre-packaged Solr distribution for use in development
Bundle it!
bundle install
Generate a default configuration file:
rails generate sunspot_rails:install
If sunspot_solr was installed, start the packaged Solr distribution with:
bundle exec rake sunspot:solr:start # or sunspot:solr:run to start in foreground
Add a searchable block to the objects you wish to index.
class Post < ActiveRecord::Base
  searchable do
    text :title, :body
    text :comments do
      comments.map { |comment| comment.body }
    end
    boolean :featured
    integer :blog_id
    integer :author_id
    integer :category_ids, :multiple => true
    double :average_rating
    time :published_at
    time :expired_at
    string :sort_title do
      title.downcase.gsub(/^(an?|the)/, '')
    end
  end
end
text fields will be full-text searchable. Other fields (e.g., integer and string) can be used to scope queries.
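The :sort_title block above normalizes titles with a plain regular expression, so it helps to see what that expression actually produces. The sketch below applies the same expression to sample titles in plain Ruby, next to a stricter variant. The method names and the stricter pattern are illustrative assumptions, not part of Sunspot.

```ruby
# Same normalization as the :sort_title block above.
def sort_title_original(title)
  title.downcase.gsub(/^(an?|the)/, '')
end

# Hypothetical stricter variant: strip only a whole leading article
# together with the whitespace that follows it.
def sort_title_strict(title)
  title.downcase.sub(/\A(an?|the)\s+/, '')
end

sort_title_original('The Best Pizza') # => " best pizza" (leading space kept)
sort_title_original('Another Slice')  # => "other slice" ("an" is stripped)
sort_title_strict('The Best Pizza')   # => "best pizza"
sort_title_strict('Another Slice')    # => "another slice"
```

If leading spaces or truncated words would skew your sort order, a stricter pattern along these lines may be worth considering.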
Post.search do
  fulltext 'best pizza'
  with :blog_id, 1
  with(:published_at).less_than Time.now
  order_by :published_at, :desc
  paginate :page => 2, :per_page => 15
  facet :category_ids, :author_id
end
Given the Post model set up in the earlier steps:
# All posts with a `text` field (:title, :body, or :comments) containing 'pizza'
Post.search { fulltext 'pizza' }
# Posts with pizza, scored higher if pizza appears in the title
Post.search do
fulltext 'pizza' do
boost_fields :title => 2.0
end
end
# Posts with pizza, scored higher if featured
Post.search do
fulltext 'pizza' do
boost(2.0) { with(:featured, true) }
end
end
# Posts with pizza *only* in the title
Post.search do
fulltext 'pizza' do
fields(:title)
end
end
# Posts with pizza in the title (boosted) or in the body (not boosted)
Post.search do
fulltext 'pizza' do
fields(:body, :title => 2.0)
end
end
Solr allows searching for phrases: search terms that are close together.
In the default query parser used by Sunspot (edismax), phrase searches are represented as a double quoted group of words.
# Posts with the exact phrase "great pizza"
Post.search do
fulltext '"great pizza"'
end
If specified, query_phrase_slop sets the number of words that may appear between the words in a phrase.
# One word can appear between the words in the phrase, so "great big pizza"
# also matches, in addition to "great pizza"
Post.search do
fulltext '"great pizza"' do
query_phrase_slop 1
end
end
Phrase boosts add boost to terms that appear in close proximity; the terms do not have to appear in a phrase, but if they do, the document will score more highly.
# Matches documents with great and pizza, and scores documents more
# highly if the terms appear in a phrase in the title field
Post.search do
fulltext 'great pizza' do
phrase_fields :title => 2.0
end
end
# Matches documents with great and pizza, and scores documents more
# highly if the terms appear in a phrase (or with one word between them)
# in the title field
Post.search do
fulltext 'great pizza' do
phrase_fields :title => 2.0
phrase_slop 1
end
end
Fields not defined as text (e.g., integer, boolean, time, etc.) can be used to scope (restrict) queries before full-text matching is performed.
# Posts with a blog_id of 1
Post.search do
with(:blog_id, 1)
end
# Posts with an average rating between 3.0 and 5.0
Post.search do
with(:average_rating, 3.0..5.0)
end
# Posts with a category of 1, 3, or 5
Post.search do
with(:category_ids, [1, 3, 5])
end
# Posts published since a week ago
Post.search do
with(:published_at).greater_than(1.week.ago)
end
# Posts not in category 1 or 3
Post.search do
without(:category_ids, [1, 3])
end
# All examples in "positive" also work negated using `without`
# Passing an empty array is equivalent to a no-op, allowing you to replace this...
Post.search do
with(:category_ids, id_list) if id_list.present?
end
# ...with this
Post.search do
with(:category_ids, id_list)
end
# Posts that do not have an expired time or have not yet expired
Post.search do
any_of do
with(:expired_at).greater_than(Time.now)
with(:expired_at, nil)
end
end
# Posts with blog_id 1 and author_id 2
Post.search do
all_of do
with(:blog_id, 1)
with(:author_id, 2)
end
end
Disjunctions and conjunctions may be nested:
Post.search do
  any_of do
    with(:blog_id, 1)
    all_of do
      with(:blog_id, 2)
      with(:category_ids, 3)
    end
  end
end
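To make the nesting concrete, the scope above reads as "blog_id is 1, OR (blog_id is 2 AND category_ids contains 3)". The following plain-Ruby sketch applies that same logic to in-memory hashes. The sample data and method names are hypothetical, and the real filtering happens inside Solr, not in Ruby.

```ruby
# Hypothetical sample documents standing in for indexed posts.
SAMPLE_DOCS = [
  { id: 1, blog_id: 1, category_ids: [7] },
  { id: 2, blog_id: 2, category_ids: [3, 4] },
  { id: 3, blog_id: 2, category_ids: [5] },
  { id: 4, blog_id: 9, category_ids: [3] }
].freeze

# any_of { blog_id 1; all_of { blog_id 2; category_ids 3 } }
def matches_nested_scope?(doc)
  doc[:blog_id] == 1 ||
    (doc[:blog_id] == 2 && doc[:category_ids].include?(3))
end

def matching_ids(docs)
  docs.select { |doc| matches_nested_scope?(doc) }.map { |doc| doc[:id] }
end

matching_ids(SAMPLE_DOCS) # => [1, 2]
```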
Scopes/restrictions can be combined with full-text searching. The scope/restriction pares down the objects that are searched for the full-text term.
# Posts with blog_id 1 and 'pizza' in any text field
Post.search do
  with(:blog_id, 1)
  fulltext("pizza")
end
All results from Solr are paginated.
The results array that is returned has methods mixed in that allow it to operate seamlessly with common pagination libraries like will_paginate and kaminari.
By default, Sunspot requests the first 30 results from Solr.
search = Post.search do
fulltext "pizza"
end
# Imagine there are 60 *total* results (at 30 results/page, that is two pages)
results = search.results # => Array with 30 Post elements
search.total # => 60
results.total_pages # => 2
results.first_page? # => true
results.last_page? # => false
results.previous_page # => nil
results.next_page # => 2
results.out_of_bounds? # => false
results.offset # => 0
To retrieve the next page of results, recreate the search and use the paginate method.
search = Post.search do
fulltext "pizza"
paginate :page => 2
end
# Again, imagine there are 60 total results; this is the second page
results = search.results # => Array with 30 Post elements
search.total # => 60
results.total_pages # => 2
results.first_page? # => false
results.last_page? # => true
results.previous_page # => 1
results.next_page # => nil
results.out_of_bounds? # => false
results.offset # => 30
A custom number of results per page can be specified with the :per_page option to paginate:
search = Post.search do
fulltext "pizza"
paginate :page => 1, :per_page => 50
end
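The page values shown in the examples above follow from simple arithmetic on the total count, the page number, and the page size. Here is a minimal plain-Ruby sketch of that arithmetic. The page_info helper is hypothetical and only mirrors the numbers shown above; it is not how Sunspot or the pagination libraries compute them internally.

```ruby
# Hypothetical helper reproducing the pagination values shown above.
def page_info(total:, page: 1, per_page: 30)
  total_pages = (total.to_f / per_page).ceil
  {
    total_pages: total_pages,
    offset: (page - 1) * per_page,
    previous_page: page > 1 ? page - 1 : nil,
    next_page: page < total_pages ? page + 1 : nil
  }
end

first  = page_info(total: 60, page: 1)
second = page_info(total: 60, page: 2)

first[:total_pages]    # => 2
first[:next_page]      # => 2
second[:offset]        # => 30
second[:previous_page] # => 1
second[:next_page]     # => nil
```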
Faceting is a feature of Solr that determines the number of documents that match a given search and an additional criterion. This allows you to build powerful drill-down interfaces for search.
Each facet returns zero or more rows, each of which represents a particular criterion conjoined with the actual query being performed. For field facets, each row represents a particular value for a given field. For query facets, each row represents an arbitrary scope; the facet itself is just a means of logically grouping the scopes.
# Posts that match 'pizza' returning counts for each :author_id
search = Post.search do
fulltext "pizza"
facet :author_id
end
search.facet(:author_id).rows.each do |facet|
puts "Author #{facet.value} has #{facet.count} pizza posts!"
end
# Posts faceted by ranges of average ratings
search = Post.search do
  facet(:average_rating) do
    row(1.0..2.0) do
      with(:average_rating, 1.0..2.0)
    end
    row(2.0..3.0) do
      with(:average_rating, 2.0..3.0)
    end
    row(3.0..4.0) do
      with(:average_rating, 3.0..4.0)
    end
    row(4.0..5.0) do
      with(:average_rating, 4.0..5.0)
    end
  end
end
# e.g.,
# Number of posts with rating within 1.0..2.0: 2
# Number of posts with rating within 2.0..3.0: 1
search.facet(:average_rating).rows.each do |facet|
puts "Number of posts with rating within #{facet.value}: #{facet.count}"
end
# Posts faceted by range of average ratings
Sunspot.search(Post) do
facet :average_rating, :range => 1..5, :range_interval => 1
end
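As a rough illustration of what a range facet with :range => 1..5 and :range_interval => 1 counts, the plain-Ruby sketch below splits the range into buckets and counts sample values in each. It assumes each bucket includes its lower bound and excludes its upper bound; the helper names and sample ratings are hypothetical, and Solr performs the real counting.

```ruby
# Hypothetical helper: split a numeric range into half-open buckets.
def range_buckets(range, interval)
  range.begin.step(range.end - interval, interval).map do |start|
    (start...(start + interval))
  end
end

# Count how many sample values fall into each bucket.
def bucket_counts(values, range, interval)
  range_buckets(range, interval).to_h do |bucket|
    [bucket, values.count { |value| bucket.cover?(value) }]
  end
end

ratings = [1.5, 1.9, 2.0, 3.2, 4.7] # made-up average ratings
bucket_counts(ratings, 1..5, 1).each do |bucket, count|
  puts "Rating in #{bucket}: #{count}"
end
```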
By default, Sunspot orders results by "score": the Solr-determined relevancy metric. Sorting can be customized with the order_by method:
# Order by average rating, descending
Post.search do
fulltext("pizza")
order_by(:average_rating, :desc)
end
# Order by relevancy score and in the case of a tie, average rating
Post.search do
fulltext("pizza")
order_by(:score, :desc)
order_by(:average_rating, :desc)
end
# Randomized ordering
Post.search do
fulltext("pizza")
order_by(:random)
end
Solr 3.1 and above
Solr supports sorting on multiple fields using custom functions. Supported operators and more details are available on the Solr Wiki.
To sort results by a custom function, use the order_by_function method.
Functions are defined with prefix notation:
# Order by sum of two example fields: rating1 + rating2
Post.search do
fulltext("pizza")
order_by_function(:sum, :rating1, :rating2, :desc)
end
# Order by nested functions: rating1 + (rating2*rating3)
Post.search do
fulltext("pizza")
order_by_function(:sum, :rating1, [:product, :rating2, :rating3], :desc)
end
# Order by fields and constants: rating1 + (rating2 * 5)
Post.search do
fulltext("pizza")
order_by_function(:sum, :rating1, [:product, :rating2, '5'], :desc)
end
# Order by average of three fields: (rating1 + rating2 + rating3) / 3
Post.search do
fulltext("pizza")
order_by_function(:div, [:sum, :rating1, :rating2, :rating3], '3', :desc)
end
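To show how the prefix notation in these examples nests, the sketch below renders an expression as function-call syntax in plain Ruby. The function_string helper is hypothetical and purely illustrative: it is not Sunspot's implementation, and the field names Sunspot actually sends to Solr may differ from the symbols used here.

```ruby
# Hypothetical helper: arrays are nested function calls, anything else
# (field symbols, constant strings) is rendered as-is.
def function_string(expression)
  return expression.to_s unless expression.is_a?(Array)

  name, *arguments = expression
  rendered = arguments.map { |argument| function_string(argument) }
  "#{name}(#{rendered.join(',')})"
end

function_string([:sum, :rating1, [:product, :rating2, '5']])
# => "sum(rating1,product(rating2,5))"
function_string([:div, [:sum, :rating1, :rating2, :rating3], '3'])
# => "div(sum(rating1,rating2,rating3),3)"
```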
Solr 3.3 and above
Solr supports grouping documents, similar to an SQL GROUP BY. More information about result grouping/field collapsing is available on the Solr Wiki.
Grouping is only supported on string fields that are not multivalued. To group on a field of a different type (e.g., integer), add a denormalized string field:
class Post < ActiveRecord::Base
  searchable do
    # Denormalized `string` field because grouping can only be performed
    # on string fields
    string(:blog_id_str) { |p| p.blog_id.to_s }
  end
end
# Returns only the top scoring document per blog_id
search = Post.search do
group :blog_id_str
end
search.group(:blog_id_str).matches # Total number of matches to the query
search.group(:blog_id_str).groups.each do |group|
  puts group.value # blog_id of each document in the group
  # By default, there is only one document per group (the highest
  # scoring one); if `limit` is specified (see below), multiple
  # documents can be returned per group
  group.results.each do |result|
    # ...
  end
end
Additional options are supported by the DSL:
# Returns the top 3 scoring documents per blog_id
Post.search do
group :blog_id_str do
limit 3
end
end
# Returns documents ordered within each group by average_rating (by
# default, the ordering is score)
Post.search do
group :blog_id_str do
order_by(:average_rating, :desc)
end
end
# Facet count is based on the most relevant document of each group
# matching the query (>= Solr 3.4)
Post.search do
group :blog_id_str do
truncate
end
facet :blog_id_str, :extra => :any
end
Sunspot 2.0 only
Sunspot 2.0 supports geospatial features of Solr 3.1 and above.
Geospatial features require a field defined with latlon:
class Post < ActiveRecord::Base
searchable do
# ...
latlon(:location) { Sunspot::Util::Coordinates.new(lat, lon) }
end
end
# Searches posts within 100 kilometers of (32, -68)
Post.search do
with(:location).in_radius(32, -68, 100)
end
# Searches posts within 100 kilometers of (32, -68) with `bbox`. This is
# an approximation so searches run quicker, but it may include other
# points that are slightly outside of the required distance
Post.search do
with(:location).in_radius(32, -68, 100, :bbox => true)
end
# Searches posts within the bounding box defined by the corners (45,
# -94) to (46, -93)
Post.search do
with(:location).in_bounding_box([45, -94], [46, -93])
end
# Orders documents by closeness to (32, -68)
Post.search do
order_by_geodist(:location, 32, -68)
end
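For intuition about what "within 100 kilometers" means in the radius examples above, here is a plain-Ruby sketch of a great-circle distance check using the haversine formula. It assumes a spherical Earth with a mean radius of 6371 km; the method names are hypothetical, and Solr does the actual filtering, possibly with different constants.

```ruby
EARTH_RADIUS_KM = 6371.0 # mean Earth radius; an assumption of this sketch

# Great-circle distance in kilometers between two lat/lon points.
def haversine_km(lat1, lon1, lat2, lon2)
  to_rad = Math::PI / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

def within_radius?(center, point, radius_km)
  haversine_km(*center, *point) <= radius_km
end

center = [32, -68]
within_radius?(center, [32.5, -68], 100) # => true  (roughly 56 km away)
within_radius?(center, [33.5, -68], 100) # => false (roughly 167 km away)
```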
Solr 4 and above
Solr joins allow you to filter objects by joining on additional documents. More information can be found on the Solr Wiki.
class Photo < ActiveRecord::Base
searchable do
text :caption, :default_boost => 1.5
time :created_at
integer :photo_container_id
end
end
class PhotoContainer < ActiveRecord::Base
searchable do
text :name
join(:caption, :type => :string, :join_string => 'from=photo_container_id to=id')
join(:photos_created, :type => :time, :join_string => 'from=photo_container_id to=id', :as => 'created_at_d')
end
end
PhotoContainer.search do
with(:caption, 'blah')
with(:photos_created).between(Date.new(2011,3,1), Date.new(2011,4,1))
end
Highlighting allows you to display snippets of the part of the document that matched the query.
The fields you wish to highlight must be stored.
class Post < ActiveRecord::Base
searchable do
# ...
text :body, :stored => true
end
end
Highlighting matches on the body field, for instance, can be achieved like this:
search = Post.search do
fulltext "pizza" do
highlight :body
end
end
# Will output something similar to:
# Post #1
# I really love *pizza*
# *Pizza* is my favorite thing
# Post #2
# Pepperoni *pizza* is delicious
search.hits.each do |hit|
puts "Post ##{hit.primary_key}"
hit.highlights(:body).each do |highlight|
puts " " + highlight.format { |word| "*#{word}*" }
end
end
Solr can return some statistics on indexed numeric fields. Fetching statistics for average_rating:
search = Post.search do
stats :average_rating
end
puts "Minimum average rating: #{search.stats(:average_rating).min}"
puts "Maximum average rating: #{search.stats(:average_rating).max}"
search = Post.search do
stats :average_rating, :blog_id
end
It's possible to facet field stats on another field:
search = Post.search do
stats :average_rating do
facet :featured
end
end
search.stats(:average_rating).facet(:featured).rows.each do |row|
  puts "Minimum average rating for featured=#{row.value}: #{row.min}"
end
Take care when requesting facets on a stats field, since all facet results are returned by Solr!
search = Post.search do
stats :average_rating do
facet :featured
end
stats :blog_id do
facet :average_rating
end
end
Sunspot can extract related items using more_like_this. When searching for similar items, you can pass a block with the following options:
- fields :field_1[, :field_2, ...]
- minimum_term_frequency ##
- minimum_document_frequency ##
- minimum_word_length ##
- maximum_word_length ##
- maximum_query_terms ##
- boost_by_relevance true/false
class Post < ActiveRecord::Base
searchable do
# The :more_like_this option must be set to true
text :body, :more_like_this => true
end
end
post = Post.first
results = Sunspot.more_like_this(post) do
fields :body
minimum_term_frequency 5
end
To specify that a field should be boosted in relation to other fields for all queries, you can specify the boost at index time:
class Post < ActiveRecord::Base
searchable do
text :title, :boost => 5.0
text :body
end
end
Stored fields keep an original (untokenized/unanalyzed) version of their contents in Solr.
Stored fields allow data to be retrieved without also hitting the underlying database (usually an SQL server). They are also required for highlighting and more like this queries.
Stored fields come at some performance cost in the Solr index, so use them wisely.
class Post < ActiveRecord::Base
searchable do
text :body, :stored => true
end
end
# Retrieving stored contents without hitting the database
Post.search.hits.each do |hit|
puts hit.stored(:body)
end
Sunspot simply stores the type and primary key of objects in Solr. When results are retrieved, those primary keys are used to load the actual object (usually from an SQL database).
# Using #results pulls in the records from the object-relational
# mapper (e.g., ActiveRecord + a SQL server)
Post.search.results.each do |result|
puts result.body
end
To access information about the results without querying the underlying database, use hits:
# Using #hits gives back all information requested from Solr, but does
# not load the object from the object-relational mapper
Post.search.hits.each do |hit|
puts hit.stored(:body)
end
If you need both the result (ORM-loaded object) and the Hit (e.g., for faceting, highlighting, etc.), you can use the convenience method each_hit_with_result:
Post.search.each_hit_with_result do |hit, result|
# ...
end
If you are using Rails, objects are automatically indexed to Solr as a part of the save callbacks.
There are a number of ways to index manually within Ruby:
# On a class itself
Person.reindex
Sunspot.commit
# On mixed objects
Sunspot.index [post1, item2]
Sunspot.index person3
Sunspot.commit
# With autocommit
Sunspot.index! [post1, item2, person3]
If you make a change to the object's "schema" (code in the searchable block), you must reindex all objects so the changes are reflected in Solr:
bundle exec rake sunspot:solr:reindex
# or, to be specific to a certain model with a certain batch size:
bundle exec rake sunspot:solr:reindex[500,Post] # some shells will require escaping [ with \[ and ] with \]
# to skip the prompt asking you if you want to proceed with the reindexing:
bundle exec rake sunspot:solr:reindex[,,true] # some shells will require escaping [ with \[ and ] with \]
The default Sunspot Session is not thread-safe. If used in a multi-threaded environment (such as Sidekiq), you should configure Sunspot to use the ThreadLocalSessionProxy:
Sunspot.session = Sunspot::SessionProxy::ThreadLocalSessionProxy.new
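The general idea behind a thread-local session is that each thread lazily gets its own session object instead of sharing one. The plain-Ruby sketch below illustrates that idea with a stand-in class; FakeSession and thread_local_session are hypothetical names, and this is not Sunspot's implementation of ThreadLocalSessionProxy.

```ruby
# Hypothetical stand-in for a session object; not Sunspot's class.
class FakeSession
  attr_reader :pending

  def initialize
    @pending = []
  end
end

# Returns the current thread's session, creating it on first use.
# Thread.current[] storage is private to the calling thread.
def thread_local_session
  Thread.current[:fake_session] ||= FakeSession.new
end

sessions = 3.times.map do
  Thread.new { thread_local_session }.value
end

sessions.map(&:object_id).uniq.size               # => 3, one per thread
thread_local_session.equal?(thread_local_session) # => true within a thread
```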
To add or modify parameters sent to Solr, use adjust_solr_params:
Post.search do
adjust_solr_params do |params|
params[:q] += " AND something_s:more"
end
end
Configure Sunspot by creating a config/sunspot.yml file or by setting a SOLR_URL or a WEBSOLR_URL environment variable.
The defaults are as follows.
development:
  solr:
    hostname: localhost
    port: 8982
    log_level: INFO
test:
  solr:
    hostname: localhost
    port: 8981
    log_level: WARNING
You may want to use SSL for production environments with a username and password. For example, set SOLR_URL to a URL of the form https://username:password@<your-solr-host>/solr.
You can examine the value of Sunspot::Rails::Configuration.solr_url at runtime.
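When a SOLR_URL carries credentials, it can be useful to check which parts it contains before deploying. The sketch below uses Ruby's standard URI library to split such a URL into its components. The host solr.example.com, the credentials, and the solr_url_parts helper are placeholders for illustration only; this does not show how Sunspot itself reads the variable.

```ruby
require 'uri'

# Hypothetical helper: break a Solr URL into its components.
def solr_url_parts(url)
  uri = URI.parse(url)
  {
    scheme: uri.scheme,
    user: uri.user,
    password: uri.password,
    host: uri.host,
    port: uri.port,
    path: uri.path
  }
end

# Placeholder URL; substitute your own host and credentials.
parts = solr_url_parts('https://username:password@solr.example.com/solr')

parts[:scheme] # => "https"
parts[:user]   # => "username"
parts[:host]   # => "solr.example.com"
parts[:port]   # => 443 (the default port for https)
parts[:path]   # => "/solr"
```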
Install the required gem dependencies:
cd /path/to/sunspot/sunspot
bundle install
Start a Solr instance on port 8983:
bundle exec sunspot-solr start -p 8983
# or `bundle exec sunspot-solr run -p 8983` to run in foreground
Run the tests:
bundle exec rake spec
If desired, stop the Solr instance:
bundle exec sunspot-solr stop
Install the gem dependencies for sunspot:
cd /path/to/sunspot/sunspot
bundle install
Start a Solr instance on port 8983:
bundle exec sunspot-solr start -p 8983
# or `bundle exec sunspot-solr run -p 8983` to run in foreground
Navigate to the sunspot_rails directory:
cd ../sunspot_rails
Run the tests:
rake spec # all Rails versions
rake spec RAILS=3.1.1 # specific Rails version only
If desired, stop the Solr instance:
cd ../sunspot
bundle exec sunspot-solr stop
Install the yard and redcarpet gems:
$ gem install yard redcarpet
Uninstall the rdiscount gem, if installed:
$ gem uninstall rdiscount
Generate the documentation from the topmost directory:
$ yardoc -o docs */lib/**/*.rb - README.md
- Using Sunspot, Websolr, and Solr on Heroku (mrdanadams)
- Full Text Searching with Solr and Sunspot (Collective Idea)
- Full-text search in Rails with Sunspot (Tropical Software Observations)
- Sunspot Full-text Search for Rails/Ruby (The Rail World)
- A Few Sunspot Tips (spiral_code)
- Sunspot: A Solr-Powered Search Engine for Ruby (Linux Magazine)
- Sunspot Showed Me the Light (ben koonse)
- RubyGems.org — A case study in upgrading to full-text search (Websolr)
- How to Implement Spatial Search with Sunspot and Solr (Code Quest)
- Sunspot 1.2 with Spatial Solr Plugin 2.0 (joelmats)
- rails3 + heroku + sunspot : madness (anhaminha)
- heroku + websolr + sunspot (Onemorecloud)
- How to get full text search working with Sunspot (Hobo Cookbook)
- Full text search with Sunspot in Rails (hemju)
- Using Sunspot for Free-Text Search with Redis (While I Pondered...)
- Fuzzy searching in SOLR with Sunspot (pipe :to => /dev/null)
- Default scope with Sunspot (Cloudspace)
- Index External Models with Sunspot/Solr (Medihack)
- Testing with Sunspot and Cucumber (Collective Idea)
- Cucumber and Sunspot (opensoul.org)
- Testing Sunspot with Cucumber (spiral_code)
- Running cucumber features with sunspot_rails (Kabisa Blog)
- Testing Sunspot with Test::Unit (Type Slowly)
- Sunspot Quickstart (WebSolr)
- Solr, and Sunspot (YT!)
- The Saga of the Switch (mrb -- includes comparison of Sunspot and Ultrasphinx)
- Conditional Indexing with Sunspot (mikepack)
Sunspot is distributed under the MIT License, copyright (c) 2008-2013 Mat Brown