Full Text Search in MongoDB

A guest post by Sam Millman, an active participant on both the official MongoDB user group at mongodb-user and on Stackoverflow MongoDB.

With version 2.4 of MongoDB comes the (experimental) feature of full text searching (FTS). This allows you to perform Google-like searches across your collections.

It should be noted that the FTS features of MongoDB are not ready for production. Please do not attempt to use this new feature in production until it has been vetted as being suitable.

There are no direct helpers for issuing the commands needed for FTS indexing and querying, so you must use MongoDB’s command functions directly.

Why Full Text Search

What if you want to enable a search feature for your site? Maybe one that could search all videos or all users. You might be asking yourself, “Why bother with full text search when I can just do: db.users.find({name:/^sammaye/})?”

Well, yes, you can of course do that, but it would not be very efficient; regular expressions (regexes) rarely are. Not only that, but you are probably searching only in English. What if you want to search in Russian, or in another language altogether? How about further afield, with the likes of Arabic? What if you want to search in the middle of a block of text? Then the regex can no longer be anchored to the start of the field, and the query can’t make effective use of an index. What about synonyms? Or sorting results by the keyword density within the fields of the document?

You might also be asking yourself, “What is the point of stemming?” Stemming is the process of reducing a word to its root form. As an example, “walking” becomes “walk” and “trees” becomes “tree” after stemming. Stemming is not perfect, though: “argue”, “argued” and “argues” all stem to “argu”, which is not a real word, while “arguments” stems to “argument”, so the two groups do not match each other. Even so, stemming means you can search by both the actual word and the normalized form of what the user entered, which makes your searching a lot more effective against real queries.
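To make the idea concrete, here is a deliberately naive suffix-stripping sketch in JavaScript. It is not the stemmer MongoDB uses (this post does not describe that); the function name naiveStem and its three-suffix rule list are invented purely for illustration.

```javascript
// Hypothetical, deliberately naive stemmer: strips a few common English
// suffixes. Real stemmers apply many more rules than this.
function naiveStem(word) {
  var w = word.toLowerCase();
  var suffixes = ["ing", "ed", "s"];
  for (var i = 0; i < suffixes.length; i++) {
    var s = suffixes[i];
    // Only strip when at least three letters would remain.
    if (w.length - s.length >= 3 && w.slice(-s.length) === s) {
      return w.slice(0, w.length - s.length);
    }
  }
  return w;
}

// Different surface forms collapse to the same stem, so they match.
var sameStem = naiveStem("Walking") === naiveStem("walks");
```

With these rules “walking”, “walked” and “walks” all reduce to “walk”, but “argued” becomes “argu” while “argues” becomes “argue”, which shows how easily simple rules produce inconsistent stems.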

All in all, relying on regexes issued from the client is not a scalable way to do full text search in MongoDB.

Before MongoDB 2.4, the normal way to deal with this problem while still using indexes was to keep a keywords array within the root document, as shown within the documentation. This meant using a multikey field alongside, possibly, non-multikey fields, which can bring its own problems to using indexes effectively. This approach also leaves you to solve the problems of stemming, synonyms, ranking and foreground indexing yourself.
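A minimal sketch of the keywords array approach might look like the following. The helper name toKeywords, the videos collection and the keywords field name are all assumptions made for this example, and the tokenizer only understands ASCII letters and digits.

```javascript
// Hypothetical helper: split free text into a de-duplicated array of
// lowercase keywords, suitable for storing in a multikey-indexed field.
function toKeywords(text) {
  var words = text.toLowerCase().split(/[^a-z0-9]+/);
  var seen = {};
  var keywords = [];
  for (var i = 0; i < words.length; i++) {
    var w = words[i];
    if (w.length > 0 && !Object.prototype.hasOwnProperty.call(seen, w)) {
      seen[w] = true;
      keywords.push(w);
    }
  }
  return keywords;
}

// Build a document whose keywords field the application maintains itself.
var video = { title: "Walking among the trees" };
video.keywords = toKeywords(video.title);

// In the mongo shell, with a hypothetical "videos" collection:
//   db.videos.ensureIndex({ keywords: 1 })
//   db.videos.insert(video)
//   db.videos.find({ keywords: "walking" })
```

Note that a search for “walk” would not find this document, because nothing here stems the stored keywords or the search term.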

The keywords array is completely managed on the client side by the application, or rather by the user. So, the new full text search feature is designed to be a replacement for this method. Although FTS in MongoDB has some way to go in its development, what has been brought out so far is extremely interesting and worth a look.

Creating an FTS index

Before any searching can be done, you must enable the textSearchEnabled parameter, either through the command line options when starting MongoDB:

./mongod --setParameter textSearchEnabled=true

Or by running an administration command against a running server:

db.adminCommand( { setParameter : "*", textSearchEnabled : true } );

Once that is done, you can specify text as the index type for a field within ensureIndex:

db.collection.ensureIndex( { "description": "text" } )

If more than one text index is created on a collection, searching will fail, so make sure you fit all the needed fields within a single ensureIndex call.
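Fitting several fields into one ensureIndex call can be sketched as below. The helper textIndexSpec and the title and description field names are hypothetical; the only thing taken from the post is that each field is given the value "text".

```javascript
// Hypothetical helper: build one index specification that marks every
// listed field as "text", so a single ensureIndex call covers them all.
function textIndexSpec(fields) {
  var spec = {};
  for (var i = 0; i < fields.length; i++) {
    spec[fields[i]] = "text";
  }
  return spec;
}

var spec = textIndexSpec(["title", "description"]);
// spec is { title: "text", description: "text" }

// In the mongo shell, with a hypothetical "videos" collection:
//   db.videos.ensureIndex(spec)
```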

Querying the FTS index

Now that MongoDB has been set up with FTS, you need to run the command that will allow you to search a collection. Currently the feature is accessed through the text command:

db.collection.runCommand( "text", { search : "walk" } )

This searches every field covered by the text index for the word “walk”, in both its stemmed and unstemmed forms.
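The post does not show what the command returns. As a sketch, assuming the reply has the shape { results: [ { score: ..., obj: ... } ], ok: 1 } (check this against the output of your own server), a small wrapper could unpack the matches. The name searchText and the stand-in collection are invented for this example.

```javascript
// Hypothetical helper: run the text command and return the matches in the
// order the server returned them.
// Assumed reply shape: { results: [ { score: ..., obj: ... } ], ok: 1 }
function searchText(collection, term) {
  var reply = collection.runCommand("text", { search: term });
  if (!reply || !reply.ok) {
    throw new Error("text command failed: " + JSON.stringify(reply));
  }
  var docs = [];
  for (var i = 0; i < reply.results.length; i++) {
    docs.push({ score: reply.results[i].score, doc: reply.results[i].obj });
  }
  return docs;
}

// Stand-in collection so the sketch runs outside the shell; in the mongo
// shell you would pass a real collection such as db.videos instead.
var fakeCollection = {
  runCommand: function (name, options) {
    return { results: [{ score: 0.75, obj: { title: "Walking home" } }], ok: 1 };
  }
};

var hits = searchText(fakeCollection, "walk");
```

Wrapping the command this way keeps the reply handling in one place, which is useful while the feature is experimental and its output may change.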

Current limitations

The feature released is not nearly as complex as what is provided by most other FTS technologies. The search currently has no “exact” matching, and supports only Russian and a number of Latin-script languages. For example, you can’t search the titles of a certain user’s videos in Arabic, and even searches in Latin-script languages are not always reliable.

It also lacks faceting (the ability to use the aggregation framework on search results) and the ability to search subdocuments.

As a final point, the result is currently limited to a single BSON document, and BSON documents in MongoDB are limited to 16 megabytes in size. This means that you can currently return only 16 megabytes of results at a time.
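One way to stay well under that ceiling is to ask for fewer, smaller results. The sketch below assumes the text command accepts limit and project options, which is my understanding of the 2.4 documentation rather than something shown in this post; the helper name boundedTextOptions and the videos collection are hypothetical.

```javascript
// Hypothetical helper: build options for the text command that cap the
// number of results and return only the named fields.
// Assumes the command accepts "limit" and "project" options.
function boundedTextOptions(term, maxResults, fieldNames) {
  var projection = {};
  for (var i = 0; i < fieldNames.length; i++) {
    projection[fieldNames[i]] = 1;
  }
  return { search: term, limit: maxResults, project: projection };
}

var options = boundedTextOptions("walk", 10, ["title"]);

// In the mongo shell, with a hypothetical "videos" collection:
//   db.videos.runCommand("text", options)
```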

I shouldn’t have to mention this, of course, since this feature is extremely alpha.

Conclusion

This post has taken you through the basics of the new FTS feature, which was released with MongoDB 2.4. We covered the creation of an FTS index and the querying of it from the shell, along with some limitations of the current alpha feature.

About the author

Sam Millman has been using MongoDB for almost 4 years now, starting when it had just been released. He is an active participant on both the official MongoDB user group at mongodb-user and on Stackoverflow MongoDB. He has a love for all things web based and enjoys actively building web awesomeness in jQuery and PHP with a little Python on the side. You can contact him either at his blog at http://www.sammaye.wordpress.com/ or on Twitter as @sam_millman.
