Building site search with Ferret
I recently had the opportunity to build a search feature using the Ferret project, and I wanted to document what I learned while it was still fresh in my mind. First, getting Ferret and acts_as_ferret installed was simple and straightforward. This blog post got me off to a good start.
The first index it created was OK, but I was looking for more relevant results. The :boost option, which weights matches in one field more heavily than matches in another, made the difference I was looking for. Here is how I set it up in my model.
# use the acts_as_ferret plugin to assist in creating the index
acts_as_ferret :fields => {
  :main_tags   => { :boost => 4, :store => :yes },
  :description => { :boost => 2 },
  :keywords    => { :boost => 1, :store => :yes }, # :store => :yes is required for :lazy to work
  :photo_code  => { :boost => 1 }
}
The production server has over 230k records in it. It took over 6 hours to index everything (I'm still working out how to speed that up), so for development I started with just a few hundred records. That only took a couple of minutes to index.
Once I had my index done I was ready to search it.
search_string = params[:search_term]
@results = Photo.find_by_contents(search_string, :limit => 50) #defaults to 10 results
(pagination notes are coming in a later post)
One of the unique requirements of this project was the need to refine the search on a specific set of keywords. For example, we need to know how many of the search results are also tagged with age, gender or ethnic keywords. So, to show a refine option such as “Mid Adult Women (2,544)”, I used the :lazy option in acts_as_ferret, which reads the listed stored fields from the index and spares a hit to the database. Going to the database for these refine search options put page loads over 10 seconds, and over 20 seconds in some cases. Since that is completely unacceptable, I'm glad the Ferret team came up with :lazy. It may have saved my job.
Photo.find_by_contents(search_string + " AND Mid Adult Women", {:lazy => [:keywords]}).total_hits
I learned that find_by_contents will ignore the :condition option for extra query info. However, you can combine multiple AND and OR clauses as part of the search string itself. Now I can refine my search many times over without ever touching the database!
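As a minimal sketch of that idea, the helper below builds such a search string from a base query and a list of refine keywords. The name refine_query is hypothetical (it is not part of Ferret or acts_as_ferret), and the example search terms are made up. Wrapping the base query in parentheses and putting each keyword in double quotes, so that a multi-word keyword is treated as a phrase, is my assumption about how you would want the query parser to group terms; check it against the query syntax your Ferret version accepts.

```ruby
# Hypothetical helper, not part of Ferret or acts_as_ferret.
# Joins a base search string with refine keywords using AND.
def refine_query(search_string, keywords)
  # Parenthesize the base query so any OR inside it stays grouped.
  clauses = ["(#{search_string})"]
  keywords.each do |keyword|
    # Drop embedded double quotes, then quote the keyword as a phrase (assumption).
    clauses << "\"#{keyword.delete('"')}\""
  end
  clauses.join(" AND ")
end

# Example values are illustrative only.
puts refine_query("beach OR ocean", ["Mid Adult Women", "Outdoors"])
```

The resulting string can be passed to find_by_contents with the :lazy option in the same way as the single-keyword call above.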
In the end I am searching over 230k records, getting as many as 20k results on some searches, and showing total_hits on over 60 refine search options in less than 0.4 seconds with only one hit to the database. Pretty cool.
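To make the shape of that concrete, here is a rough sketch of how the counts for a list of refine options could be collected. refine_counts is a hypothetical helper, and it takes the counting step as a block so the sketch runs without a Ferret index; in the application the block would be the lazy find_by_contents(...).total_hits call shown earlier. The keywords other than "Mid Adult Women" and all of the numbers are made up.

```ruby
# Hypothetical helper: returns [keyword, hit_count] pairs, one per refine option.
# The block receives the full query string and must return a hit count.
def refine_counts(search_string, keywords)
  keywords.map do |keyword|
    query = "#{search_string} AND #{keyword}" # same string form as in the post
    [keyword, yield(query)]
  end
end

# Stand-in for the index so the example runs on its own (made-up numbers).
fake_hits = { "beach AND Mid Adult Women" => 12, "beach AND Senior Men" => 3 }

counts = refine_counts("beach", ["Mid Adult Women", "Senior Men"]) do |query|
  # In the application this would be something like:
  #   Photo.find_by_contents(query, {:lazy => [:keywords]}).total_hits
  fake_hits.fetch(query, 0)
end

counts.each { |keyword, hits| puts "#{keyword} (#{hits})" }
```

Keeping the counting step in a block also makes it easy to swap in a cached or batched lookup later without changing the view code that renders the refine options.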
Next up is pagination.