
Is MongoDB producing more than 100% storage overhead? I inserted 22GB of data and it occupies 50GB on disk

Jan 26, 2012
X Chen
I have done a simple experiment to test MongoDB's performance and
disk usage. I inserted 22GB of data and it occupies 50GB on disk. I
describe the experiment in detail below.

Setup:

     Version - MongoDB 2.0.2
     Environment:
          Single node without any replication or sharding
          VM via VirtualBox
          Linux Ubuntu 64bit
          100GB fixed virtual disk and 1GB memory

Experiment program:

     Language: C# with the MongoDB C# driver
     Target and procedure: Very simple. I repeatedly create a new
{KEY, VALUE} pair and insert it into MongoDB.
     Settings:
          Number of insertions = 1024 * 1024 * 1024 / 3
          Size of the KEY = 20 bytes (byte array), a counter
incremented by 1 for each insertion, i.e., KEY = {1, 2, 3, ...,
1024*1024*1024}
          Size of the VALUE = 100 bytes (byte array), randomly
generated with the Random class.
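
A minimal sketch of how such {KEY, VALUE} pairs could be generated, written in Python rather than the original C#. The big-endian encoding of the counter and the helper names (make_key, make_value, make_pairs) are assumptions for illustration; the actual test program is not shown in this post, and the database insert itself is left out.

```python
import os

KEY_SIZE = 20     # bytes, as stated in the settings
VALUE_SIZE = 100  # bytes, as stated in the settings


def make_key(counter: int) -> bytes:
    """Encode the counter as a 20-byte array (big-endian is an assumption)."""
    return counter.to_bytes(KEY_SIZE, "big")


def make_value() -> bytes:
    """Return 100 random bytes, standing in for the C# Random class."""
    return os.urandom(VALUE_SIZE)


def make_pairs(count: int, start: int = 1):
    """Yield (key, value) pairs with the key incremented by 1 each time."""
    for counter in range(start, start + count):
        yield make_key(counter), make_value()


# Generate a few pairs and sum the raw payload they carry.
pairs = list(make_pairs(3))
payload_bytes = sum(len(k) + len(v) for k, v in pairs)
print(payload_bytes)
```

Each pair carries 120 bytes of raw payload; anything the database adds per document comes on top of that.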

Results:

     The experiment was meant to insert about 40GB of data
(120 bytes per insertion) into MongoDB, which I believe is simple
enough. However, I stopped once the inserted data reached 22GB
because I noticed the storage overhead. The data I actually
inserted is about 22GB, but the indexdb.* files total 50GB.
So there is more than 100% storage overhead.
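
The figures above can be checked with a few lines of arithmetic. This is only a sketch in Python using the numbers from this post; treating "GB" as 1024^3 bytes for the planned volume is an assumption.

```python
GIB = 1024 ** 3

RECORD_BYTES = 20 + 100          # KEY + VALUE payload per insertion
planned_inserts = GIB // 3       # 1024 * 1024 * 1024 / 3, rounded down
planned_gib = planned_inserts * RECORD_BYTES / GIB

inserted_gb = 22                 # payload inserted before stopping
on_disk_gb = 50                  # total size of the indexdb.* files
overhead_gb = on_disk_gb - inserted_gb
overhead_pct = 100 * overhead_gb / inserted_gb

print(f"planned: {planned_gib:.1f} GiB")
print(f"overhead: {overhead_gb} GB ({overhead_pct:.0f}% of inserted data)")
```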


My own thoughts:

     I have read quite a bit of MongoDB's docs. From what I
have read, there might be two kinds of storage overhead:
     1) The oplog. But it is meant to be capped at about 5% of disk
space. In my case, that is about 5GB.
     2) Preallocated data files. I didn't change any mongod settings,
so I think 2GB is allocated in advance. Assuming the latest 2GB file
in use is also nearly empty, that is at most 4GB of overhead in
total.

So by my calculation, whatever the size of the data I insert, there
should be at most 9GB of overhead. But the overhead is now
50GB - 22GB = 28GB, and I have no clue what is inside that 28GB. If
the overhead is always more than 100%, that is quite a lot.
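
The estimate above, spelled out as a short Python sketch. The variable names are illustrative, and the 5GB and 2GB figures are my own assumptions from this post, not measured values.

```python
# Expected overhead, per the two sources listed above (all values in GB).
oplog_cap_gb = 5            # about 5% of the 100GB disk (assumed cap)
preallocated_file_gb = 2    # data file allocated in advance (assumed default)
nearly_empty_file_gb = 2    # latest file in use, assumed almost empty

expected_max_overhead_gb = (
    oplog_cap_gb + preallocated_file_gb + nearly_empty_file_gb
)

# Observed overhead from the experiment.
observed_overhead_gb = 50 - 22

unexplained_gb = observed_overhead_gb - expected_max_overhead_gb
print(expected_max_overhead_gb, observed_overhead_gb, unexplained_gb)
```

So even if both assumed sources are at their maximum, a large part of the 28GB is left without an explanation.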

Can anyone explain this to me?

