
Improving CouchDB Performance

In a previous piece, The Technology Behind Couchbase, I described the technology that underpins Couchbase Server 1.8 and the forthcoming Couchbase Server 2.0. In this post, I will show you how to improve the performance of CouchDB.

Within CouchDB, there is a one-to-one relationship between the design documents that you create, and the indexes that are produced in the process.

That’s important because if you put, say, 20 view definitions into one design document, all 20 views will be updated together the next time someone accesses any one of them. The index is built or updated when a view is accessed, and if there are lots of views to update, the time between sending the request and getting the reply can be significant.

One simple way to improve your performance is to reduce the number of view definitions in a single design document.
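As a minimal sketch of that split, the helper below takes a design document body that holds several views and returns one design document per view. The function name split_design_doc and the '<original id>_<view name>' naming scheme are illustrative assumptions, not a CouchDB convention, and the map functions are placeholders.

```python
def split_design_doc(ddoc):
    """Return one design document per view defined in ddoc.

    ddoc is a plain dict shaped like a CouchDB design document:
    {'_id': '_design/recipes', 'views': {name: {'map': ...}, ...}}
    """
    base_id = ddoc['_id']  # e.g. '_design/recipes'
    split = []
    for name, definition in sorted(ddoc.get('views', {}).items()):
        split.append({
            # Hypothetical naming scheme: '<original id>_<view name>'
            '_id': '%s_%s' % (base_id, name),
            'language': ddoc.get('language', 'javascript'),
            # Each new design document carries exactly one view
            'views': {name: definition},
        })
    return split


# Placeholder design document with two views sharing one index
combined = {
    '_id': '_design/recipes',
    'views': {
        'by_recipe': {'map': 'function(doc) { emit(doc.title, null); }'},
        'by_name': {'map': 'function(doc) { emit(doc.name, null); }'},
    },
}

separate = split_design_doc(combined)
for doc in separate:
    print(doc['_id'], list(doc['views']))
```

Each resulting dict could then be stored as its own design document, for instance with db.save(doc) from the couchdb module. Bear in mind that the view URLs your clients request change when the design document names change.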

Using stale views

Another way is to make use of the stale parameter to your view. For example:

GET http://couchbase:5984/recipes/_design/recipes/_view/by_recipe?stale=ok

This tells CouchDB to return the view information based on the current index without forcing an update before the results are returned.

Obviously this is much quicker, because there’s no update occurring before the info is returned. But it runs the risk of returning old information, hence the stale moniker. Using this method for all your requests will mean your index is never updated.

Updating after

One solution to this is to continue to use stale views, but instead of specifying ok, use the value update_after. This returns the index result as quickly as possible by still using the stale view, but then asks CouchDB to update the index after the results have been returned:

GET http://couchbase:5984/recipes/_design/recipes/_view/by_recipe?stale=update_after

This updates the index after the results are returned, but may also delay queries made by other clients until the view update has been completed.
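To make the two stale options concrete, here is a small sketch of a helper that builds the view URL for either mode. The function name view_url and the VALID_STALE tuple are assumptions made for illustration; only the URL layout and the ok and update_after values come from the requests shown above.

```python
from urllib.parse import urlencode

# The two stale values described in this article
VALID_STALE = ('ok', 'update_after')


def view_url(base, db, design, view, stale=None):
    """Build a view URL, optionally adding the stale parameter."""
    if stale is not None and stale not in VALID_STALE:
        raise ValueError('stale must be one of %r' % (VALID_STALE,))
    url = '%s/%s/_design/%s/_view/%s' % (base.rstrip('/'), db, design, view)
    if stale is not None:
        url += '?' + urlencode({'stale': stale})
    return url


base = 'http://couchbase:5984'
# No stale parameter: the index is updated before results are returned
fresh = view_url(base, 'recipes', 'recipes', 'by_recipe')
# stale=ok: return the existing index without updating it
fast = view_url(base, 'recipes', 'recipes', 'by_recipe', stale='ok')
# stale=update_after: return the existing index, then update it
deferred = view_url(base, 'recipes', 'recipes', 'by_recipe',
                    stale='update_after')
print(fresh)
print(fast)
print(deferred)
```

Rejecting unknown values up front is a design choice in this sketch, so that a typo in the parameter fails in the client before any request is sent.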

Using the _changes feed

CouchDB includes a _changes feed that outputs information when the database changes. You can set up a process to watch the changes feed and then trigger a view update when a specific number of changes have taken place. For example, the Python script below (which uses the couchdb module) monitors the changes feed for the recipes database, and requests the view (causing an update) when 10 changes have been received:

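from couchdb import Server

cdbs = Server('http://localhost:5984/')
db = cdbs['recipes']
# the since parameter defaults to 'last_seq' when using continuous feed
ch = db.changes(feed='continuous', heartbeat='1000')

# number of changes seen since the last view request
counter = 0

for line in ch:
    counter += 1
    if counter >= 10:
        # db.view() is lazy in couchdb-python; reading the rows sends
        # the request that triggers the index update
        list(db.view('recipes/by_name'))
        counter = 0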

You can now set all of your clients to use stale views, because this background process will update the view as changes are made to the DB, and you are no longer relying on your client requests to control your index updates.
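The batching logic of such a background process can be sketched separately from the server connection, which makes it easy to try out. In the sketch below, refresh_every and the simulated feed are hypothetical names and data used only for illustration; against a live server you would pass the continuous changes feed and a callback that reads the view.

```python
def refresh_every(changes, threshold, refresh):
    """Call refresh() once per `threshold` changes; return the call count."""
    pending = 0    # changes seen since the last refresh
    refreshes = 0  # how many times refresh() has been called
    for change in changes:
        pending += 1
        if pending >= threshold:
            refresh()
            refreshes += 1
            pending = 0
    return refreshes


# Simulated feed of 25 changes standing in for the _changes feed
simulated_feed = [{'seq': n, 'id': 'doc%d' % n} for n in range(1, 26)]

calls = []
count = refresh_every(simulated_feed, 10, lambda: calls.append('refresh'))
print(count)
```

With the couchdb module, the same function could be driven by db.changes(feed='continuous') and a callback such as lambda: list(db.view('recipes/by_name')), assuming the recipes database and view from the script above.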

About the Author

  A professional writer for over 15 years, Martin ‘MC’ Brown is the author of, and a contributor to, over 26 books covering an array of topics, including the recently published Getting Started with CouchDB. His expertise spans myriad development languages and platforms: Perl, Python, Java, JavaScript, Basic, Pascal, Modula-2, C, C++, Rebol, Gawk, Shellscript, Windows, Solaris, Linux, BeOS, Microsoft WP, Mac OS and more. He is a former LAMP Technologies Editor for LinuxWorld magazine and is a regular contributor to ServerWatch.com, LinuxPlanet, ComputerWorld and IBM developerWorks. As a Subject Matter Expert for Microsoft he provided technical input to their Windows Server and certification teams. He draws on a rich and varied background as founder member of a leading UK ISP, systems manager and IT consultant for an advertising agency and Internet solutions group, technical specialist for an intercontinental ISP network, and database designer and programmer, and as a self-confessed compulsive consumer of computing hardware and software. MC is currently the VP of Technical Publications and Education for Couchbase and is responsible for all published documentation, training program and content, and the Couchbase Techzone, and can be reached at mcslp.net.


2 Responses to Improving CouchDB Performance

  1. kxepal says:

    > db.view('recipes/by_name')
    It's better to limit the output, since we aren't interested in all of it:
    > db.view('recipes/by_name', limit=1)
    This will reduce the memory footprint because we won't read all of the view output, but it will still trigger the view update.

  2. MC Brown says:

    That’s certainly completely valid – although that particular issue was not what the article was trying to address.

    It’s safe to say that limits and paging (especially with offset over key/id information) all have an effect on the performance of your view queries. So too does including entire docs in your query definition, or writing over-complicated views.