workqueue couch db replication procedure
From time to time we need to clean up the central couch databases because they accumulate too many deleted docs (>90%), which causes:
- a slowdown in the system in general, including replication to the agents workqueue_inbox db
- an increase in the views size (central workqueue views are now about 70GB!)
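To judge whether a cleanup is due, the share of deleted docs can be computed from the database info document, which reports doc_count and doc_del_count. The helper below is a minimal sketch: the function name deleted_pct is made up for illustration, and it assumes the compact single-line JSON that couch normally returns.

```shell
# Print the percentage of deleted documents, given the database info JSON
# (the output of GET /<db>) on stdin. Assumes compact, single-line JSON.
deleted_pct() {
    info=$(cat)
    live=$(printf '%s\n' "$info" | sed -n 's/.*"doc_count":\([0-9][0-9]*\).*/\1/p')
    deleted=$(printf '%s\n' "$info" | sed -n 's/.*"doc_del_count":\([0-9][0-9]*\).*/\1/p')
    live=${live:-0}          # treat a missing field as zero
    deleted=${deleted:-0}
    total=$((live + deleted))
    if [ "$total" -eq 0 ]; then
        echo 0
    else
        echo $((100 * deleted / total))   # integer percentage, rounded down
    fi
}

# Usage on a node with couch running (not executed here):
#   curl -ks http://localhost:5984/workqueue | deleted_pct
```

A value above 90 would match the situation described above.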
There are at least two ways to clean up couch databases: either delete the whole database (when we do not care about its data, since it can get replicated back), or temporarily create another database containing only the documents we want (the so-called permanent docs). For the workqueue database in CMSWEB, the latter is the procedure we need to follow.
Since we migrated to CouchServer 1.6.1, the _replicate database is gone and all replication tasks are carried out by the _replicator db.
It's also worth mentioning that Alan ran into several couch oddities (and sometimes couch/replication crashes), such as:
- permission denied when creating documents in the new database (thus requiring the user_ctx option, more below)
- dozens/hundreds of processes triggering the couch replication, which caused most of the instabilities (can be seen with _active_tasks, which prints TONS of replication tasks instead of only 1)
- a replication filter written in erlang causes many more replication processes to be spawned, increasing the couch server instability.
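A quick way to spot the "TONS of replication tasks" symptom is to count the replication entries in the _active_tasks output. This is only a sketch: count_replication_tasks is a hypothetical helper name, and it assumes couch prints compact JSON with "type":"replication" for each replication task.

```shell
# Count replication entries in the _active_tasks JSON read from stdin.
# Assumes compact JSON, i.e. no spaces around the colon.
count_replication_tasks() {
    # grep -o prints one line per match; wc -l counts those lines
    grep -o '"type":"replication"' | wc -l | tr -d ' '
}

# Usage on a node with couch running (not executed here):
#   curl -ks http://localhost:5984/_active_tasks | count_replication_tasks
```

For the procedure on this page, the expected count is 1.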
These instructions are meant to be performed on one of the developers' VMs, after copying over the original workqueue.couch database file from the CMSWEB production node. When running these steps on CMSWEB, there might be minor changes, like the user_ctx and/or scheduling an outage before swapping the db files.
Let us start by creating the replication filter function in javascript. This function is very simple: it does NOT replicate documents that are deleted (doc._deleted:true); all other documents get replicated. Run the following command in the terminal:
cat > /tmp/filter.js << EOF
function(doc, req) {
  if (doc._deleted) {
    return false;
  }
  return true;
}
EOF
Now we need to move this tmp file under the workqueue couchapps directory AND correct/update the ownership and permissions (owner might be different in CMSWEB):
mv /tmp/filter.js /data/srv/current/apps/workqueue/data/couchapps/WorkQueue/filters/FilterDeletedDocs.js
sudo chown _sw:_sw /data/srv/current/apps/workqueue/data/couchapps/WorkQueue/filters/FilterDeletedDocs.js
Make sure the correct workqueue.couch database (the one you want to replicate/clean up) is in place, with the right permissions/ownership. The file is:
/data/srv/state/couchdb/database/workqueue.couch
Restart the couch server so that it uploads the new filter function (in FilterDeletedDocs.js) to the WorkQueue design document:
(A=/data/cfg/admin; cd /data; $A/InstallDev -s stop:couchdb)
(A=/data/cfg/admin; cd /data; $A/InstallDev -s start:couchdb)
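After the restart, it may be worth confirming that the filter actually made it into the design document before starting the replication. The sketch below assumes the design document is reachable as workqueue/_design/WorkQueue (inferred from the filter name WorkQueue/FilterDeletedDocs used later) and that the filter key matches the file name without its extension; design_has_filter is a hypothetical helper name.

```shell
# Succeed if the design document JSON on stdin contains a key named $1.
# This is a plain text match, not a real JSON parse.
design_has_filter() {
    grep -q "\"$1\":"
}

# Usage on a node with couch running (not executed here):
#   curl -ks http://localhost:5984/workqueue/_design/WorkQueue \
#     | design_has_filter FilterDeletedDocs && echo "filter installed"
```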
OPTIONAL: we might have to fix the permissions of this file to avoid errors in the logs (which have always been there). Not sure whether CMSWEB needs to do this:
sudo chmod 770 /data/srv/current/auth/couchdb/hmackey.ini
For CMSWEB nodes, we need to disable the cronjobs that run couch database and view compaction, to avoid overloading the node. Check this before proceeding to the next step.
Then it's time to start the replication by creating a new document in the _replicator database. Run this command:
curl -ks -X POST http://localhost:5984/_replicator -H 'Content-Type:application/json' -d '{"_id":"rep_wq", "source":"workqueue", "target":"newwq", "continuous": true, "create_target":true, "filter":"WorkQueue/FilterDeletedDocs", "user_ctx": {"name": "_admin", "roles": ["_admin"]}}'
These options are:
- _id: we can define the doc id in couch 1.6.1, which helps to keep track of this document
- source: the database to replicate documents from
- target: the database to replicate documents to (if they are both local, we can just provide their names)
- continuous: set to true to make sure any new documents written to the source will also get replicated to the target
- create_target: creates the target database if it does not exist
- filter: the filter function which every single source document has to pass before getting replicated
- user_ctx: we need to provide a user and its role in order to write documents to the new database (CHECK: is it the same in CMSWEB?)
The previous command returns a revision number; you can also check the state of this replication by fetching the replication document you just posted:
curl -ks "http://localhost:5984/_replicator/rep_wq"
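In couch 1.x the replicator adds a _replication_state field to the document (for example "triggered" or "error"), so the state can be pulled out of the response directly. The helper below is a sketch under that assumption; replication_state is a hypothetical name, and it prints "unknown" when the field is not there yet.

```shell
# Print the _replication_state of a replication document read from stdin.
# Assumes compact, single-line JSON as returned by couch.
replication_state() {
    state=$(sed -n 's/.*"_replication_state":"\([a-z]*\)".*/\1/p')
    echo "${state:-unknown}"
}

# Usage on a node with couch running (not executed here):
#   curl -ks http://localhost:5984/_replicator/rep_wq | replication_state
```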
In order to check the status of the replication, one can run the _active_tasks command or get a summary of each database:
curl -ks "http://localhost:5984/_active_tasks"
curl -ks -X GET 'http://localhost:5984/workqueue'
curl -ks -X GET 'http://localhost:5984/newwq'
Once the newwq database has the same doc_count as the source database (workqueue), the replication is up to date.
In the CMSWEB environment, we still need a workqueue outage of 15 minutes or so, to make sure no new documents are written to workqueue. Then check newwq again: if both databases still have the same number of valid documents, stop the couch server, swap the files, and start couch.
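The final swap could look roughly like the commands printed by the sketch below. It only prints the commands and does not run them. Several things here are assumptions: that newwq lives in newwq.couch next to workqueue.couch, the .bak backup name, and the use of the developer-VM InstallDev commands (CMSWEB nodes will likely stop and start couch differently, and the file moves may need sudo).

```shell
# Print (do not run) the commands for the final file swap.
# $1: database directory; defaults to the path used earlier on this page.
# newwq.couch and the .bak backup name are assumptions.
print_swap_commands() {
    dbdir=${1:-/data/srv/state/couchdb/database}
    echo '(A=/data/cfg/admin; cd /data; $A/InstallDev -s stop:couchdb)'
    echo "mv $dbdir/workqueue.couch $dbdir/workqueue.couch.bak"
    echo "mv $dbdir/newwq.couch $dbdir/workqueue.couch"
    echo '(A=/data/cfg/admin; cd /data; $A/InstallDev -s start:couchdb)'
}

print_swap_commands
```

Keeping the old file around as a backup makes it possible to roll back if the new database turns out to be incomplete.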