This was an attempt to free up disk space. We had created a lot of indexes to speed up reads, and while investigating the disk usage of our MongoDB database we felt we could give up some indexes that were not in use. I was running MongoDB version 2.0.2. Here is how I investigated and completed the task.
THE GIVENS
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 7.9G 5.1G 2.8G 65% /
tmpfs 7.4G 0 7.4G 0% /dev/shm
/dev/md0 512G 508G 3.8G 100% /data
I saw that only 3.8 GB was left on /data, and verified that /data is the dbpath for mongod.
Mongo was running version 2.0.2.
/etc/mongod.conf showed that mongod was running as part of a replica set.
So is this server the master? Let's check it out.
$ mongo -u XXXX -p XXXXX admin
MongoDB shell version: 2.0.2
connecting to: admin
SECONDARY> db.getCollectionNames()
Tue Jun 11 09:24:08 uncaught exception: error: { "$err" : "not master and slaveok=false", "code" : 13435 }
So this is not the master. Switch the session to slaveOk mode so reads are allowed on this secondary.
Alternatively, you can use the mongo REST web API on port 28017 to find out which node is the primary.
SECONDARY> rs.slaveOk()
SECONDARY>
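If you would rather stay in the shell than open the REST interface, the standard replica-set helpers will also tell you who the primary is. A minimal sketch run from this same session (member names will be whatever your replica set was initiated with):
SECONDARY> db.isMaster().primary
SECONDARY> rs.status().members.forEach(function(m) { print(m.name + " : " + m.stateStr); })
The first line prints the host:port of the current primary; the second lists every member with its state.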
START THE DETECTIVE WORK
Looking at what constituted the data:
SECONDARY> db.adminCommand('listDatabases')
{
"databases" : [
{
"name" : "local",
"sizeOnDisk" : 28121759744,
"empty" : false
},
{
"name" : "admin",
"sizeOnDisk" : 218103808,
"empty" : false
},
{
"name" : "hindenberg",
"sizeOnDisk" : 515095134208,
"empty" : false
},
{
"name" : "test",
"sizeOnDisk" : 218103808,
"empty" : false
},
{
"name" : "config",
"sizeOnDisk" : 1,
"empty" : true
}
],
"totalSize" : 543653101568,
"ok" : 1
}
sizeOnDisk is reported in bytes. The local and hindenberg databases are taking roughly 26 GB and 480 GB respectively.
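To save the mental arithmetic, the same listing can be printed in gigabytes straight from the shell; a small sketch built on the listDatabases command that produced the output above:
SECONDARY> db.adminCommand('listDatabases').databases.forEach(function(d) {
...   print(d.name + " : " + (d.sizeOnDisk / (1024 * 1024 * 1024)).toFixed(1) + " GB");
... })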
According to http://docs.mongodb.org/manual/reference/local-database/, the local database holds the replication data and other instance-specific internals, so let's check the size of the oplog.
SECONDARY> rs.slaveOk()
SECONDARY> use local
switched to db local
SECONDARY> db.oplog.rs.dataSize()
25940727652
SECONDARY>
So the oplog is about 24 GB, which accounts for most of the local database. Nothing worth reclaiming here, so let's move on.
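As a cross-check, the shell has a built-in helper that summarizes the oplog, printing its configured size and the time window it currently covers (output omitted here):
SECONDARY> db.printReplicationInfo()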
Next, we need to check the other large database, "hindenberg".
SECONDARY> db.stats()
{
"db" : "hindenberg",
"collections" : 16,
"objects" : 443885912,
"avgObjSize" : 543.3192339837088,
"dataSize" : 241171753684,
"storageSize" : 254077221152,
"numExtents" : 223,
"indexes" : 20,
"indexSize" : 243503445136,
"fileSize" : 515078356992,
"nsSizeMB" : 16,
"ok" : 1
}
We see that the hindenberg database is using about 236 GB of storage for data and another 226 GB for indexes, roughly 462 GB in total. We need to see which indexes exist and how much space they occupy on each collection.
SECONDARY> db.printCollectionStats()
ghostrider
{
"ns" : "hindenberg.ghostrider",
"count" : 15605608,
"size" : 1428461880,
"avgObjSize" : 91.53516351301404,
"storageSize" : 3965091840,
"numExtents" : 30,
"nindexes" : 1,
"lastExtentSize" : 667406336,
"paddingFactor" : 1,
"flags" : 1,
"totalIndexSize" : 633746288,
"indexSizes" : {
"_id_" : 633746288
},
"ok" : 1
}
---
random
{
"ns" : "hindenberg.random",
"count" : 75,
"size" : 35760,
"avgObjSize" : 476.8,
"storageSize" : 40960,
"numExtents" : 2,
"nindexes" : 1,
"lastExtentSize" : 32768,
"paddingFactor" : 1,
"flags" : 1,
"totalIndexSize" : 8176,
"indexSizes" : {
"_id_" : 8176
},
"ok" : 1
}
---
bbws
{
"ns" : "hindenberg.bbws",
"count" : 20586,
"size" : 1714552,
"avgObjSize" : 83.2872826192558,
"storageSize" : 2793472,
"numExtents" : 5,
"nindexes" : 1,
"lastExtentSize" : 2097152,
"paddingFactor" : 1,
"flags" : 1,
"totalIndexSize" : 670432,
"indexSizes" : {
"_id_" : 670432
},
"ok" : 1
}
---
jiggs
{
"ns" : "hindenberg.jiggs",
"count" : 2700687,
"size" : 832824148,
"avgObjSize" : 308.3749238619655,
"storageSize" : 986931200,
"numExtents" : 20,
"nindexes" : 4,
"lastExtentSize" : 175112192,
"paddingFactor" : 1,
"flags" : 1,
"totalIndexSize" : 843084592,
"indexSizes" : {
"_id_" : 85692656,
"sOn" : 337644272,
"smId" : 205029552,
"fid" : 214718112
},
"ok" : 1
}
---
stfu
{
"ns" : "hindenberg.stfu",
"count" : 28920,
"size" : 4862320,
"avgObjSize" : 168.13001383125865,
"storageSize" : 11182080,
"numExtents" : 6,
"nindexes" : 2,
"lastExtentSize" : 8388608,
"paddingFactor" : 1.0099999999981542,
"flags" : 1,
"totalIndexSize" : 4194288,
"indexSizes" : {
"_id_" : 997472,
"gmem" : 3196816
},
"ok" : 1
}
---
lms
{
"ns" : "hindenberg.lms",
"count" : 1,
"size" : 64,
"avgObjSize" : 64,
"storageSize" : 8192,
"numExtents" : 1,
"nindexes" : 1,
"lastExtentSize" : 8192,
"paddingFactor" : 1,
"flags" : 1,
"totalIndexSize" : 8176,
"indexSizes" : {
"_id_" : 8176
},
"ok" : 1
}
---
yolos
{
"ns" : "hindenberg.yolos",
"count" : 2,
"size" : 144,
"avgObjSize" : 72,
"storageSize" : 8192,
"numExtents" : 1,
"nindexes" : 1,
"lastExtentSize" : 8192,
"paddingFactor" : 1,
"flags" : 1,
"totalIndexSize" : 8176,
"indexSizes" : {
"_id_" : 8176
},
"ok" : 1
}
---
mofos
{
"ns" : "hindenberg.mofos",
"count" : 416702947,
"size" : 235652429060,
"avgObjSize" : 565.5165886311814,
"storageSize" : 243732003136,
"numExtents" : 144,
"nindexes" : 4,
"lastExtentSize" : 2146426864,
"paddingFactor" : 1.4299999977435327,
"flags" : 1,
"totalIndexSize" : 242065139568,
"indexSizes" : {
"_id_" : 19739627488,
"smId" : 34082947808,
"cpair_son" : 78027059152,
"pson" : 110215505120
},
"ok" : 1
}
---
migrations
{
"ns" : "hindenberg.migrations",
"count" : 12,
"size" : 1220,
"avgObjSize" : 101.66666666666667,
"storageSize" : 8192,
"numExtents" : 1,
"nindexes" : 1,
"lastExtentSize" : 8192,
"paddingFactor" : 1,
"flags" : 1,
"totalIndexSize" : 8176,
"indexSizes" : {
"_id_" : 8176
},
"ok" : 1
}
---
system.indexes
{
"ns" : "hindenberg.system.indexes",
"count" : 20,
"size" : 1912,
"avgObjSize" : 95.6,
"storageSize" : 8192,
"numExtents" : 1,
"nindexes" : 0,
"lastExtentSize" : 8192,
"paddingFactor" : 1,
"flags" : 0,
"totalIndexSize" : 0,
"indexSizes" : {
},
"ok" : 1
}
---
system.js
{
"ns" : "hindenberg.system.js",
"count" : 2,
"size" : 1316,
"avgObjSize" : 658,
"storageSize" : 8192,
"numExtents" : 1,
"nindexes" : 1,
"lastExtentSize" : 8192,
"paddingFactor" : 1,
"flags" : 1,
"totalIndexSize" : 8176,
"indexSizes" : {
"_id_" : 8176
},
"ok" : 1
}
---
system.profile
{
"ns" : "hindenberg.system.profile",
"count" : 8901888,
"size" : 3293505232,
"avgObjSize" : 369.9782823598769,
"storageSize" : 5368713184,
"numExtents" : 3,
"nindexes" : 0,
"lastExtentSize" : 1075859456,
"paddingFactor" : 1,
"flags" : 0,
"totalIndexSize" : 0,
"indexSizes" : {
},
"capped" : 1,
"max" : 2147483647,
"ok" : 1
}
---
system.users
{
"ns" : "hindenberg.system.users",
"count" : 2,
"size" : 196,
"avgObjSize" : 98,
"storageSize" : 8192,
"numExtents" : 1,
"nindexes" : 1,
"lastExtentSize" : 8192,
"paddingFactor" : 1,
"flags" : 1,
"totalIndexSize" : 8176,
"indexSizes" : {
"_id_" : 8176
},
"ok" : 1
}
---
ufo_preferences
{
"ns" : "hindenberg.ufo_preferences",
"count" : 1000,
"size" : 71520,
"avgObjSize" : 71.52,
"storageSize" : 1003520,
"numExtents" : 1,
"nindexes" : 0,
"lastExtentSize" : 1003520,
"paddingFactor" : 1,
"flags" : 0,
"totalIndexSize" : 0,
"indexSizes" : {
},
"capped" : 1,
"max" : 1000,
"ok" : 1
}
---
ufo_details
{
"ns" : "hindenberg.ufo_details",
"count" : 208,
"size" : 3039296,
"avgObjSize" : 14612,
"storageSize" : 9408512,
"numExtents" : 5,
"nindexes" : 2,
"lastExtentSize" : 7077888,
"paddingFactor" : 1,
"flags" : 1,
"totalIndexSize" : 32704,
"indexSizes" : {
"_id_" : 8176,
"u_1" : 24528
},
"ok" : 1
}
---
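Instead of eyeballing the printCollectionStats() output above, a short loop can rank the collections by index size. A rough sketch, assuming stats() is passed a scale factor so the numbers come back in gigabytes:
SECONDARY> var byIndexSize = [];
SECONDARY> db.getCollectionNames().forEach(function(name) {
...   var s = db.getCollection(name).stats(1024 * 1024 * 1024);  // scale sizes to GB
...   byIndexSize.push({ ns: s.ns, indexGB: s.totalIndexSize, storageGB: s.storageSize });
... });
SECONDARY> byIndexSize.sort(function(a, b) { return b.indexGB - a.indexGB; });
SECONDARY> printjson(byIndexSize[0]);   // collection with the largest index footprint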
Looking at the totalIndexSize field in each of the documents above, the collection "mofos" clearly has the largest index size. Listing its indexes:
SECONDARY> db.mofos.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "hindenberg.mofos",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"smId" : 1
},
"unique" : true,
"ns" : "hindenberg.mofos",
"name" : "smId_1",
"dropDups" : true
},
{
"v" : 1,
"key" : {
"cPair" : 1,
"sOn" : -1,
"_id" : -1
},
"ns" : "hindenberg.mofos",
"name" : "cpair_son"
},
{
"v" : 1,
"key" : {
"p" : 1,
"sOn" : -1,
"_id" : -1
},
"ns" : "hindenberg.mofos",
"name" : "p_Son"
}
]
SECONDARY> db.mofos.dropIndex("cpair_son");
{ "errmsg" : "not master", "ok" : 0 }
So the index was not dropped. REASON: read on.
Indexes can be created on replica-set members independently, and they can be dropped independently too. People do this to tune application behaviour; for example, slave servers take most of the reads, so more indexes are created on the slaves.
Here is the caveat:
Some servers are part of the replica set from the start, and the indexes are created later, once growing data volume leads to slow queries. When we create an index on the primary, the operation is written to the oplog and replayed on the secondaries during replication.
So indexes that were created through oplog replay are considered to belong to the primary, and the secondary refuses to drop them while it is part of the replica set. This is a bug in mongo which I think should be taken care of in later versions.
SO HOW TO FIX IT:
-----------------------------
Once we take a server out of a replica set, it behaves as an independent, standalone node. That is a good time to hit mongo with admin-level commands that ignore replica-set behaviour, and when the space is released we add it back to the replica set.
So I changed /etc/mongod.conf to comment out the replSet configuration and to bind the mongod process to another port, say 37017, so that the primary cannot find this secondary while we work on it:
#replSet = hindenberg_replica_set
port = 37017
$ sudo service mongod start
Starting mongod: [ OK ]
forked process: 5048
all output going to: /var/log/mongo/mongod.log
[~]$ tail -f /var/log/mongo/mongod.log
Tue Jun 11 13:56:23 [initandlisten] MongoDB starting : pid=5048 port=37017 dbpath=/data/ 64-bit host=mongohost
Tue Jun 11 13:56:23 [initandlisten] db version v2.0.2, pdfile version 4.5
Tue Jun 11 13:56:23 [initandlisten] git version: 514b122d308928517f5841888ceaa4246a7f18e3
Tue Jun 11 13:56:23 [initandlisten] build info: Linux bs-linux64.10gen.cc 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41
Tue Jun 11 13:56:23 [initandlisten] options: { auth: "true", config: "/etc/mongod.conf", dbpath: "/data", directoryperdb: "true", fork: "true", keyFile: "/data/rep.key", logappend: "true", logpath: "/var/log/mongo/mongod.log", port: 37017, rest: "true" }
Tue Jun 11 13:56:23 [initandlisten] journal dir=/data/primary_db/journal
Tue Jun 11 13:56:23 [initandlisten] recover : no journal files present, no recovery needed
Tue Jun 11 13:56:24 [initandlisten] waiting for connections on port 37017
Tue Jun 11 13:56:24 [websvr] admin web console waiting for connections on port 38017
^C
[~]$ sudo netstat -ntupl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:37017 0.0.0.0:* LISTEN 5048/mongod
tcp 0 0 0.0.0.0:38017 0.0.0.0:* LISTEN 5048/mongod
So we see that the mongod process is now listening on port 37017. Now we log in to mongo and try the dropIndex again. Note that the "SECONDARY" prefix in front of the > prompt is gone, showing the server is no longer running as part of a replica set.
$ mongo -u XXXXX -p XXXXXX admin --port 37017
MongoDB shell version: 2.0.2
connecting to: 127.0.0.1:37017/admin
> use hindenberg
switched to db hindenberg
> db.mofos.dropIndex("cpair_son");
{ "nIndexesWas" : 4, "ok" : 1 }
> db.mofos.dropIndex("p_sOn");
{ "nIndexesWas" : 3, "ok" : 1 }
> db.stats()
{
"db" : "hindenberg",
"collections" : 16,
"objects" : 444191482,
"avgObjSize" : 543.3492968602221,
"dataSize" : 241351129416,
"storageSize" : 254077221152,
"numExtents" : 223,
"indexes" : 18,
"indexSize" : 55336328992,
"fileSize" : 515078356992,
"nsSizeMB" : 16,
"ok" : 1
}
We now see that the indexSize has come down from 226 GB to about 51 GB.
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 7.9G 5.1G 2.8G 65% /
tmpfs 7.4G 0 7.4G 0% /dev/shm
/dev/md0 512G 507G 5.4G 99% /data
We also see that the disk space has not been freed up. REASON: Mongo allocates space on disk ahead of time. Even for a small amount of data it pre-creates data files (growing up to 2 GB each), fills them with zeroes, and keeps appending until they are full, allocating the next file as needed. Within these files space is handed out in chunks called extents, and db.stats() reports the extent count; in the case above it is 223. The fileSize field (about 480 GB here) is what hindenberg actually occupies on disk.
When data or an index is deleted, mongo does not release the extents; it keeps them around for reuse. So we will not see the disk space come back until the freed space is reused by mongo, or until we rewrite the files ourselves.
For that we need to compact the collections or repair the database:
$ mongo -u XXXXX -p XXXXXX admin --port 37017
MongoDB shell version: 2.0.2
connecting to: 127.0.0.1:37017/admin
> use hindenberg
switched to db hindenberg
> db.mofos.runCommand("compact")
{ "ok" : 1 }
>
You may want to refer to http://docs.mongodb.org/manual/reference/command/compact/ for additional options to the compact command.
If you have dropped a significant number of indexes across multiple databases and collections, that is when you may prefer running a full repairDatabase instead of selecting each database and collection and compacting them individually; see http://docs.mongodb.org/manual/reference/command/repairDatabase/
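And if it is just one database with a handful of collections, a small loop saves the typing. A sketch, assuming the node is still running standalone; compact blocks the server while it runs, and it may refuse capped collections such as system.profile:
> db.getCollectionNames().forEach(function(name) {
...   if (name.indexOf("system.") == 0) return;   // skip system collections
...   printjson(db.runCommand({ compact: name }));
... });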
Anyway, that makes about 150 GB of free space on the partition.
You may also want to look for duplicate or redundant indexes in your databases and drop them.
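A quick way to find candidates is to list every index key pattern in the database and look for indexes whose keys are a prefix of another index on the same collection. A minimal sketch that only prints the key patterns and leaves the judgement to you:
> use hindenberg
> db.system.indexes.find().forEach(function(idx) {
...   print(idx.ns + "  " + idx.name + "  " + tojson(idx.key));
... });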
Finally, I reverted the changes to /etc/mongod.conf so the server became part of the replica set again and listened on the default port 27017. The cluster was happy to accept the healthy mongo node back.
MISSION ACCOMPLISHED. WIN WIN for everyone.
Suggested Reading:
- http://docs.mongodb.org/manual/tutorial/remove-indexes/
- http://www.slideshare.net/mongodb/indexing-with-mongodb
- http://docs.mongodb.org/manual/administration/indexes/
Hope this tutorial helps you. Please let me know.