M102: MongoDB for DBAs – Home works

In this article you will find solutions for the M102: MongoDB for DBAs homework. Before looking at the answers directly, try the exercises yourself and then compare your results with the solutions given here.

Week 1
Homework: Homework 1.1
Download and install MongoDB from www.mongodb.org. Then run the database as a single server instance on your PC (that is, run the mongod binary). Then, run the administrative shell.

From the shell prompt type

> db.isMaster().maxBsonObjectSize

at the “>” prompt.
What do you get as a result?
Answer
16777216


Homework: Homework 1.2
Download the handout. Take a look at its content.
Now, import its contents into MongoDB, into a database called “pcat” and a collection called “products”. Use the mongoimport utility to do this.
When done, run this query in the mongo shell:

>db.products.find( { type : "case" } ).count()

What’s the result?

Steps
Go to the directory where the products.json is present.
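From that directory, a typical import command (assuming the handout file is named products.json, as above) looks like this:

mongoimport --db pcat --collection products < products.json

Then run the count query shown above in the mongo shell.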

Answer
3


Homework: Homework 1.3
At this point you should have pcat.products loaded from the previous step. You can confirm this by running in the shell:
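db.products.count()

It should return 11, matching the data set imported in Homework 1.2.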

Now, what query would you run to get all the products where brand equals the string “ACME”?

Steps:

Answer
db.products.find({brand: "ACME"});


Homework: Homework 1.4
How would you print out, in the shell, just the value in the “name” field, for all the product documents in the collection, without extraneous characters or braces, sorted alphabetically, ascending? (Check all that would apply.)

Answer
var c = db.products.find( { } ).sort( { name : 1 } ); c.forEach( function( doc ) { print( doc.name ) } ); (Correct)
var c = db.products.find( { } ).sort( { name : -1 } ); while( c.hasNext() ) { print( c.next().name); }
db.products.find( { }, { name : 1, _id : 0 } ).sort( { name : 1 } )
var c = db.products.find( { }, { name : 1, _id : 0 } ).sort( { name : 1 } ); while( c.hasNext() ) { print( c.next().name); } (Correct)


Week 2
Homework: Homework 2.1
We will use the pcat.products collection from week 1. So start with that; if not already set up, import it:

mongoimport --db pcat -c products < products.json

You can find products.json from the Download Handouts link.

In the shell, go to the pcat database. If you type:

use pcat;
db.products.count()

the shell should return 11.

Next, download homework2.js from the Download Handouts link. Run the shell with this script:
mongo --shell pcat homework2.js

First, make a mini-backup of the collection before we start modifying it. In the shell:
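One simple way to make the backup (the target name products_bak matches what the note below expects) is to copy the collection with an aggregation $out stage; older shells also offer a copyTo() helper:

db.products.aggregate( [ { $match : {} }, { $out : "products_bak" } ] )
// or, in older shell versions:
// db.products.copyTo("products_bak")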

If you have any issues you can restore from "products_bak"; or, you can re-import with mongoimport. (You would perhaps need in that situation to empty the collection first or drop it; see the --drop option on mongoimport --help.)

In the shell, type:

homework.a()

What is the output? (The above will check that products_bak is populated.)

Answer
3.05


Homework: Homework 2.2
Add a new product to the products collection of this form:
{
"_id" : "ac9",
"name" : "AC9 Phone",
"brand" : "ACME",
"type" : "phone",
"price" : 333,
"warranty_years" : 0.25,
"available" : true
}

Note: because of the shell's automatic line continuation, you can generally cut and paste the document above rather than typing it all out; just wrap it in the appropriate insert statement to get it added.
Next, load into a shell variable the object corresponding to _id : ObjectId("507d95d5719dbef170f15c00")

  • Then change term_years to 3 for that document. (And save that to the database.)
  • Then change over_rate for sms in limits to 0.01 from 0. Save that too.
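A minimal sketch of those steps in the shell (load the document into a variable, edit it, and save it back) could look like this:

var doc = db.products.findOne( { _id : ObjectId("507d95d5719dbef170f15c00") } );
doc.term_years = 3;                  // change term_years to 3
doc.limits.sms.over_rate = 0.01;     // change limits.sms.over_rate from 0 to 0.01
db.products.save(doc);               // write the modified document back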

At the shell prompt type: homework.b()

What is the output?

Answer
0.050.019031


Homework: Homework 2.3
How many products have a voice limit? (That is, have a voice field present in the limits subdocument.)

Input your answer below, no spaces.

While you can parse this one by eye, please try to use a query that will do the work of counting it for you.

Steps
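One query that does the counting for you (assuming the voice limit lives in the limits.voice subfield, as described above) is:

db.products.find( { "limits.voice" : { $exists : true } } ).count()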

Answer
3


Week 3
Homework: Homework 3.1

Start a mongod server instance (if you still have a replica set, that would work too).
Next, download the handout and run: mongo --shell localhost/performance performance.js

homework.init()

Build an index on the "active" and "tstamp" fields. You can verify that you've done your job with
db.sensor_readings.getIndexes()
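For the index itself, a single compound index over both fields does the job; for example:

db.sensor_readings.createIndex( { active : 1, tstamp : 1 } )
// older shells use the equivalent ensureIndex() helper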

When you are done, run:

homework.a()

and enter the numeric result below (no spaces).

Note: if you would like to try different indexes, you can use db.sensor_readings.dropIndexes() to drop your old index before creating a new one. (For this problem you will only need one index beyond the _id index which is present by default.)

Answer
6


Homework: Homework 3.2

In a mongo shell run homework.b(). This will run in an infinite loop printing some output as it runs various statements against the server.

We'll now imagine that on this system a user has complained of slowness and we suspect there is a slow operation running. Find the slow operation and terminate it.

In order to do this, you'll want to open a second window (or tab) and there, run a second instance of the mongo shell, with something like:

$ mongo --shell localhost/performance performance.js
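In that second shell, a typical way to locate and then terminate the slow operation is db.currentOp() followed by db.killOp() (the opid below is only a placeholder; use the one reported on your system):

db.currentOp().inprog.forEach( function(op) {
    if (op.secs_running && op.secs_running > 10) printjson(op);   // surface long-running operations
} );
db.killOp(12345)   // replace 12345 with the opid of the slow operation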

Keep the other shell with homework.b() going while this is happening. Once you have eliminated the slow operation, run (on your second tab):

homework.c()

and enter the output below. Once you have it right and are ready to move on, ctrl-c (terminate) the shell that is still running the homework.b() function.

Answer
12


Homework: Homework 3.3

Download and extract the json file in products.zip

Then perform the following in the terminal (or at the command prompt):

mongoimport -d pcat -c products --drop products.json

If that looks somewhat familiar, that's because it's (nearly) the same command you used to import the pcat.products collection for Homework 2.1; the only difference is the --drop option, which drops the collection if it's already present. This version of the collection, however, contains the state the collection would be in once you've solved all of the Chapter 2 homework.

Next, go into the pcat database.

mongo pcat

Create an index on the products collection for the field, "for".

After creating the index, do a find() for products that work with an "ac3" phone ("ac3" is present in the "for" field).
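A sketch of those steps in the shell:

db.products.createIndex( { "for" : 1 } )
db.products.find( { "for" : "ac3" } ).count()
db.products.find( { "for" : "ac3" } ).explain()   // check how many documents were examined and whether the index was used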

  • How many products match this query?
  • Run the same query, but this time do an explain(). How many documents were examined?
  • Does the explain() output indicate that an index was used?

Answer
Q1: 0
Q1: 1
Q1: 3
Q1: 4 (Correct)
Q2: 1
Q2: 4 (Correct)
Q2: 5
Q2: 12
Q3: No
Q3: Yes (Correct)


Homework: Homework 3.4

Which of the following are available in WiredTiger but not in MMAPv1? Check all that apply.

Document level locking (Correct)
Indexes
Data compression (Correct)
Collection level locking
Covered Queries



M101JS: MongoDB for Node.js Developers – Final Exam Detail

M101JS Final Exam Elaboration


Question 1

Step 1:
Download the Enron email dataset enron.zip, unzip it and restore it using mongorestore.
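For example (assuming the archive extracts to a dump directory, as mongodump-produced archives usually do):

mongorestore dump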

Step 2
Find one document to see the document structure.
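For example, in the enron database:

db.messages.findOne()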

Step 3
Now you need to write a query to calculate the number of messages sent from “andrew.fastow@enron.com” to “jeff.skilling@enron.com”.
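A query along these lines does the counting (assuming the sender and recipients are stored under headers.From and headers.To, as in this dataset):

use enron
db.messages.find( { "headers.From" : "andrew.fastow@enron.com",
                    "headers.To"   : "jeff.skilling@enron.com" } ).count()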

The required answer is 3.




Question 2




Question 3




Question 4
You can find the solution at GitHub.




Question 5

To test this, you need to insert a few documents and create indexes matching the question. In the code below we insert 3003 documents into a test collection named testcollection.
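A sketch of such an insert loop (the field values are only illustrative):

for (var i = 0; i < 3003; i++) {
    db.testcollection.insert( { a : Math.floor(Math.random() * 20000),
                                b : Math.floor(Math.random() * 20000),
                                c : Math.floor(Math.random() * 20000) } );
}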

In the queries below we create, on that collection, the indexes listed in the question.
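For example:

db.testcollection.createIndex( { c : 1 } )                  // c_1
db.testcollection.createIndex( { a : 1, b : 1 } )           // a_1_b_1
db.testcollection.createIndex( { a : 1, c : 1 } )           // a_1_c_1
db.testcollection.createIndex( { a : 1, b : 1, c : -1 } )   // a_1_b_1_c_-1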

Finally, run the query given in the question and look at which indexes appear in the explain output below. Note that both the winning plan and the rejected plans indicate indexes that could be used for the query.
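For example (in 3.0+ shells the "allPlansExecution" verbosity also shows the rejected plans):

db.testcollection.find( { a : { $lt : 10000 }, b : { $gt : 5000 } }, { a : 1, c : 1 } )
                 .sort( { c : -1 } )
                 .explain("allPlansExecution")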




Question 6
Add an index on last_name, first_name if one does not already exist.
This is false because adding an index does not speed up inserts; it actually slows them down, since the index has to be maintained on every insert. Adding an index never helps insert performance.

Remove all indexes from the collection, leaving only the index on _id in place.
This is true because without the extra indexes the database doesn’t have to maintain them on each insert, so inserts become quicker.

Provide a hint to MongoDB that it should not use an index for the inserts
This is false because a hint tells the database which index to use for a read query; you cannot tell the database to skip maintaining its indexes, which are always kept up to date on writes, and hints do not apply to inserts at all.

Set w=0, j=0 on writes
This is true because with this write concern you get the best raw performance through so-called fire-and-forget writes: the client does not wait for any acknowledgement from the server.

Build a replica set and insert data into the secondary nodes to free up the primary nodes.
This is false because you can only write data to the primary node, never to secondary nodes.




Question 7




Question 8
Maybe, it depends on whether Node 2 has processed the write.
This is the correct choice because if only Node 1 processed the write and Node 2 did not, a rollback happens when Node 1 rejoins; but if Node 2 had already replicated the write before Node 1 went down, no rollback is needed.




Question 9
patient_id is the correct answer because there are a great many patients, so the key has high cardinality and gives a good distribution of data, whereas the other options have relatively few distinct values, which makes it hard to distribute the data evenly.




Question 10
The query returned 120,477 documents.
This is false because the query returns 83,057 documents.

The query used an index to figure out which documents match the find criteria.
This is false because the query matches on ‘headers.Date’, while the index is on ‘headers.From’.

The query scanned every document in the collection.
This statement is correct because the indexBounds show a key range from MinKey to MaxKey being scanned, which means every document in the collection is examined.

The query avoided sorting the documents because it was able to use an index’s ordering.
This is true because ‘headers.From’ has an index, indexes are kept in sorted order, and the query can therefore return documents in index order without a separate sort.


M101JS: MongoDB for Node.js Developers – Final Exam

I was in a dilemma about whether to publish the M101JS final exam answers, because finding the answers online might stop you from trying to solve them yourself and tempt you to copy instead. At the same time, you may be busy with other things, and this article can help you finish the M101JS final exam in less time. In addition, I found a few websites answering some questions incorrectly, and copying from them could lower your final grade in the M101JS: MongoDB for Node.js Developers final exam. ☺

Final: Question 1
Please download the Enron email dataset enron.zip, unzip it and then restore it using mongorestore. It should restore to a collection called “messages” in a database called “enron”. Note that this is an abbreviated version of the full corpus. There should be 120,477 documents after restore.

Inspect a few of the documents to get a basic understanding of the structure. Enron was an American corporation that engaged in a widespread accounting fraud and subsequently failed.

In this dataset, each document is an email message. Like all Email messages, there is one sender but there can be multiple recipients.

Construct a query to calculate the number of messages sent by Andrew Fastow, CFO, to Jeff Skilling, the president. Andrew Fastow’s email address was andrew.fastow@enron.com. Jeff Skilling’s email was jeff.skilling@enron.com.
For reference, the number of email messages from Andrew Fastow to John Lavorato (john.lavorato@enron.com) was 1.

Answer
1
3 (Correct)
5
7
9
12

Elaboration is here


Final: Question 2
Please use the Enron dataset you imported for the previous problem. For this question you will use the aggregation framework to figure out pairs of people that tend to communicate a lot. To do this, you will need to unwind the To list for each message.

This problem is a little tricky because a recipient may appear more than once in the To list for a message. You will need to fix that in a stage of the aggregation before doing your grouping and counting of (sender, recipient) pairs.
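A sketch of such a pipeline (assuming, as in the earlier questions, that the sender and recipient lists live in headers.From and headers.To): it unwinds the recipients, collapses duplicate recipients within each message, then counts (sender, recipient) pairs:

db.messages.aggregate( [
    { $unwind : "$headers.To" },
    { $group  : { _id : { id : "$_id", from : "$headers.From", to : "$headers.To" } } },
    { $group  : { _id : { from : "$_id.from", to : "$_id.to" }, count : { $sum : 1 } } },
    { $sort   : { count : -1 } },
    { $limit  : 1 }
] )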

Which pair of people have the greatest number of messages in the dataset?

Answer
susan.mara@enron.com to jeff.dasovich@enron.com (Correct)
soblander@carrfut.com to soblander@carrfut.com
susan.mara@enron.com to james.steffes@enron.com
evelyn.metoyer@enron.com to kate.symes@enron.com
susan.mara@enron.com to alan.comnes@enron.com

Elaboration is here


Final: Question 3
In this problem you will update a document in the Enron dataset to illustrate your mastery of updating documents from the shell.

Please add the email address “mrpotatohead@mongodb.com” to the list of addresses in the “headers.To” array for the document with “headers.Message-ID” of “<8147308.1075851042335.JavaMail.evans@thyme>”
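One way to perform that update from the shell (using $push to append to the existing array; $addToSet would also work):

db.messages.update(
    { "headers.Message-ID" : "<8147308.1075851042335.JavaMail.evans@thyme>" },
    { $push : { "headers.To" : "mrpotatohead@mongodb.com" } }
)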

After you have completed that task, please download final3.zip from the Download Handout link and run final3-validate.js to get the validation code and put it in the box below without any extra spaces. The validation script assumes that it is connecting to a simple mongo instance on the standard port on localhost.

Answer
vOnRg05kwcqyEFSve96R

Elaboration is here


Final: Question 4
Enhancing the Blog to support viewers liking certain comments

In this problem, you will be enhancing the blog project to support users liking certain comments, with the like counts showing up in the permalink page.

Start by downloading Final4.zip and posts.json from the Download Handout link and loading up the blog dataset posts.json. The user interface has already been implemented for you. It’s not fancy. The /post URL shows the like counts next to each comment and displays a Like button that you can click on. That Like button POSTS to the /like URL on the blog, makes the necessary changes to the database state (you are implementing this), and then redirects the browser back to the permalink page.

This full round trip and redisplay of the entire web page is not how you would implement liking in a modern web app, but it makes it easier for us to reason about, so we will go with it.

Your job is to search the code for the string “TODO: Final exam question – Increment the number of likes” and make any necessary changes. You can choose whatever schema you want, but note that the entry_template makes some assumptions about how the like value will be encoded; if you go with a different convention than it assumes, you will need to make some adjustments.
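The exact code depends on the schema you choose, but a rough shell-level equivalent of the update the handler needs to issue, assuming posts are looked up by permalink and each comment in the post's comments array carries a numeric counter (num_likes is a hypothetical field name here), might be:

var permalink = "some-post-permalink";   // hypothetical value taken from the POSTed form
var ordinal = 0;                         // index of the liked comment, also from the form
var inc = {};
inc["comments." + ordinal + ".num_likes"] = 1;
db.posts.update( { permalink : permalink }, { $inc : inc } );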

The validation script does not look at the database. It looks at the blog.

The validation script, final4-validate.js, will fetch your blog, go to the first post’s permalink page and attempt to increment the vote count. You run it as follows:

node final4-validate.js

Remember that the blog needs to be running as well as Mongo. The validation script takes some options if you want to run outside of localhost.
After you have gotten it working, enter the validation string below.

Answer
VQ3jedFjG5VmElLTYKqS

Elaboration is here


Final: Question 5
Suppose you have a collection stuff which has the _id index,

and one or more of the following indexes as well:
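The answer choices below imply index definitions along these lines:

db.stuff.createIndex( { c : 1 } )                  // c_1
db.stuff.createIndex( { a : 1, b : 1 } )           // a_1_b_1
db.stuff.createIndex( { a : 1, c : 1 } )           // a_1_c_1
db.stuff.createIndex( { a : 1, b : 1, c : -1 } )   // a_1_b_1_c_-1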

Now suppose you want to run the following query against the collection.

db.stuff.find({'a':{'$lt':10000}, 'b':{'$gt': 5000}}, {'a':1, 'c':1}).sort({'c':-1})

Which of the indexes could be used by MongoDB to assist in answering the query? Check all that apply.

Answer
c_1 (Correct)
a_1_b_1 (Correct)
a_1_c_1 (Correct)
_id_
a_1_b_1_c_-1 (Correct)

Elaboration is here


Final: Question 6
Suppose you have a collection of students whose documents include, at minimum, last_name, first_name and student_id fields.

Now suppose that basic inserts into the collection, which only include the last name, first name and student_id, are too slow (we can’t do enough of them per second from our program). What could potentially improve the speed of inserts? Check all that apply.

Answer

Add an index on last_name, first_name if one does not already exist.
Remove all indexes from the collection, leaving only the index on _id in place (Correct)
Provide a hint to MongoDB that it should not use an index for the inserts
Set w=0, j=0 on writes (Correct)
Build a replica set and insert data into the secondary nodes to free up the primary nodes

Elaboration is here


Final: Question 7
You have been tasked to cleanup a photosharing database. The database consists of two collections, albums, and images. Every image is supposed to be in an album, but there are orphan images that appear in no album. Here are some example documents (not from the collections you will be downloading).

From the above, you can conclude that the image with _id = 99705 is in album 67. It is not an orphan.

Your task is to write a program to remove every image from the images collection that appears in no album. Or put another way, if an image does not appear in at least one album, it’s an orphan and should be removed from the images collection.

Download and unzip Final7.zip and use mongoimport to import the collections in albums.json and images.json.

When you are done removing the orphan images from the collection, there should be 89,737 documents in the images collection. To prove you did it correctly, what is the total number of images with the tag ‘kittens’ after the removal of orphans? As a sanity check, there are 49,932 images tagged ‘kittens’ before you remove the orphans.
Hint: you might consider creating an index or two or your program will take a long time to run.
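A sketch of such a cleanup in the shell, assuming each album document has an images array listing the _ids of the images it contains and each image document has a tags array:

db.albums.createIndex( { images : 1 } )               // makes the per-image lookup fast
db.images.find().forEach( function(img) {
    if (db.albums.findOne( { images : img._id } ) == null) {
        db.images.remove( { _id : img._id } );        // orphan: referenced by no album
    }
} );
db.images.find( { tags : "kittens" } ).count()        // the figure asked for above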

Answer
49,932
47,678
38,934
45,911
44,822 (Correct)

Elaboration is here


Final: Question 8
Suppose you have a three node replica set. Node 1 is the primary. Node 2 is a secondary, Node 3 is a secondary running with a delay of two hours. All writes to the database are issued with w=majority and j=1 (by which we mean that the getLastError call has those values set).

A write operation (could be insert or update) is initiated from your application using the Node.js driver at time=0. At time=5 seconds, the primary, Node 1, goes down for an hour and node 2 is elected primary. Note that your write operation has not yet returned at the time of the failure. Note also that although you have not received a response from the write, it has been processed and written by Node 1 before the failure. Node 3, since it has a slave delay option set, is lagging.

Will there be a rollback of data on Node 1 when Node 1 comes back up? Choose the best answer.

Answer

Yes, always
No, never
Maybe, it depends on whether Node 3 has processed the write
Maybe, it depends on whether Node 2 has processed the write (Correct)

Elaboration is here


Final: Question 9
Imagine an electronic medical record database designed to hold the medical records of every individual in the United States. Because each person has more than 16MB of medical history and records, it’s not feasible to have a single document for every patient. Instead, there is a patient collection that contains basic information on each person and maps the person to a patient_id, and a record collection that contains one document for each test or procedure. One patient may have dozens or even hundreds of documents in the record collection.

We need to decide on a shard key to shard the record collection. What’s the best shard key for the record collection, provided that we are willing to run inefficient scatter-gather operations to do infrequent research and run studies on various diseases and cohorts? That is, think mostly about the operational aspects of such a system. And by operational, we mean, think about what the most common operations that this systems needs to perform day in and day out.

Answer

patient_id (Correct)
_id
Primary care physician (your principal doctor that handles everyday problems)
Date and time when medical record was created
Patient first name
Patient last name

Elaboration is here


Final: Question 10
Understanding the output of explain

We perform the following query on the enron dataset:

and get the following explain output.
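Based on the elaboration earlier in this article, the query filters on headers.Date and sorts on headers.From, so it is roughly of this shape (the date value here is purely illustrative, not the one used in the exam):

db.messages.find( { "headers.Date" : { $gt : ISODate("2001-04-01") } } )
           .sort( { "headers.From" : 1 } )
           .explain()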

Check below all the statements that are true about the way MongoDB handled this query.

Answer

The query returned 120,477 documents.
The query used an index to figure out which
documents match the find criteria.
The query scanned every document in the collection. (Correct)
The query avoided sorting the
documents because it was able to use an index’s ordering. (Correct)

Elaboration is here

Setting External Text Editor in MongoDB

You can use your own editor in the mongo shell by setting the EDITOR environment variable before starting the shell. EDITOR specifies the path to the editor used by the edit shell command; a JavaScript variable named EDITOR set inside the shell will override the environment variable. Once EDITOR is set, you can edit a variable or a function with that editor by typing edit <variable> or edit <function>. I am using Windows 10 to set up EDITOR, but the same process works on Linux or Mac OS.

Step 1
You need to create a new environment variable named EDITOR whose value is the path of the editor you want to use as the external editor. I am using Notepad as the external editor; on my computer it is located at C:\Windows\notepad.exe.

Setting external editor in MongoDB

Step 2
Open a command prompt and start MongoDB server by typing mongod in the command prompt.
Open another command prompt and start MongoDB client by typing mongo in the command prompt.

Step 3
Now it’s time to try a few examples with the external editor you set up above. In the first demonstration, define a variable a with the value 1000 in the mongo shell. Then open that variable in the external editor with the edit a command: Notepad opens showing 1000; change the value to 2000, save the file, and close it. The variable a now holds 2000, which you can confirm by typing a in the mongo shell.
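That sequence looks roughly like this in the shell (Notepad opens when you type edit a; change 1000 to 2000 there, save, and close it):

> var a = 1000
> edit a
> a
2000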

Step 4
In this step you can see how to edit a function using edit mongo shell command.

Define a function helloWorld as below:

function helloWorld() {}

Now you can edit this function using the edit shell command as below:

edit helloWorld

Once you give the command, the external text editor (Notepad) opens and you can modify the function, for example as below:
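For example, you could change it to print a greeting:

function helloWorld() {
    print("Hello World!");
}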

Now type helloWorld to see the function definition and helloWorld() to execute the function as below:
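With the edited function above, the result looks something like this:

> helloWorld
function helloWorld() {
    print("Hello World!");
}
> helloWorld()
Hello World!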

Setup and Configure Replica Sets in MongoDB

MongoDB is becoming more popular day by day because of its great features and ease of use. High availability and easy scalability are among its best features, and both are delivered through the replica set concept. In this article I will focus on the high availability side: we will cover replication, how to set up and configure a replica set, and finally test a few database operations to see replication and failover in detail.

A replica set is a group (cluster) of mongod instances, possibly spread across geographically separate servers, whose nodes regularly communicate and replicate to each other so that up-to-date data is available through any instance. A replica set contains between 2 and 12 nodes, but at least 3 nodes are needed for failover voting.

Before setting up a replica set, let’s cover a few terms and the working mechanism first. There are four types of nodes in MongoDB replica sets.

  1. Primary node: In a MongoDB replica set there is exactly one primary node, which stores the data and receives all reads and writes (although you can change the read preference to send reads to secondary nodes). If all reads and writes go to the primary you get up-to-date data, i.e. strong consistency. If reads go to secondary nodes, you might not get the latest data, because syncing from the primary to the secondaries can lag by some amount of time; this is termed eventual consistency.
  2. Secondary node: Like the primary, secondary nodes store data. They stay in sync with the primary to keep their data up to date. Whenever the primary goes down, an election is held among the secondaries to choose a new primary for failover. This failover is automatic and needs no human intervention. Each node gets one vote in the election of the new primary.
  3. Backup node: This node is used only for data backups and is never elected primary.
  4. Arbiter node: An arbiter does not store any data; it exists only to vote, so that a strict majority can be reached when electing a primary during failover.

Now that you know the basic concept of replication, it’s time to create a replica set. In this demonstration I am using 3 nodes, although you could use up to 12 if you wished, and I am running everything on a single machine.

Let’s follow the given steps to create and configure the 3 nodes replica sets in MongoDB. The configuration should be as mentioned below:

  • Port 27017 (default port) – primary node, 1.log as log file name, data files in rs1
  • Port 27018 – secondary node, 2.log as log file name, data files in rs2
  • Port 27019 – secondary node, 3.log as log file name, data files in rs3

The replica set name is “acemyskillsrepsets”.

Step 1
You need to create three directories to store the three mongod instances’ data separately.
On windows
Open command prompt and give the following command.

C:\>mkdir \data\rs1 \data\rs2 \data\rs3

On unix or mac

mkdir –p /data/rs1 /data/rs2 /data/rs3

Step 2
Now let’s start three mongod instances as follows.
On windows

C:\>start mongod --replSet acemyskillsrepsets --logpath \data\rs1\1.log --dbpath \data\rs1 --port 27017 --smallfiles --oplogSize 64
C:\>start mongod --replSet acemyskillsrepsets --logpath \data\rs2\2.log --dbpath \data\rs2 --port 27018 --smallfiles --oplogSize 64
C:\>start mongod --replSet acemyskillsrepsets --logpath \data\rs3\3.log --dbpath \data\rs3 --port 27019 --smallfiles --oplogSize 64

On unix or mac

mongod --replSet acemyskillsrepsets --logpath /data/rs1/1.log --dbpath /data/rs1 --port 27017 --smallfiles --oplogSize 64 --fork
mongod --replSet acemyskillsrepsets --logpath /data/rs2/2.log --dbpath /data/rs2 --port 27018 --smallfiles --oplogSize 64 --fork
mongod --replSet acemyskillsrepsets --logpath /data/rs3/3.log --dbpath /data/rs3 --port 27019 --smallfiles --oplogSize 64 --fork

Now three mongod servers are running, but they are not yet configured or initialized to work together as a replica set.

Step 3
You need to interconnect all three nodes. Open a mongo shell connected to one of them, build a configuration document, and finally call rs.initiate(config) to start the replica set. I am using the replica set name “acemyskillsrepsets”, but you can use any other name.
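A typical configuration document for the three nodes above, followed by the initiate call:

config = {
    _id : "acemyskillsrepsets",
    members : [
        { _id : 0, host : "localhost:27017" },
        { _id : 1, host : "localhost:27018" },
        { _id : 2, host : "localhost:27019" }
    ]
};
rs.initiate(config);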

Now the replica set with three nodes is set up and configured. You can check its status with rs.status().

If you look at the result, it clearly shows that there are three nodes, that localhost:27017 is the primary, and that the other two are secondaries. Congratulations! The replication system named “acemyskillsrepsets” is now ready to use.

Testing
Now we need to insert a document into the primary node and after insertion read the inserted document.
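For example, on the primary (the movie document used here matches the one referred to in the oplog discussion below):

db.movies.insert( { name : "up", releasedyear : 2009 } )
db.movies.find().pretty()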

Now let’s connect to a secondary node and read the data inserted through the primary. If we can read the data on a secondary, it means the secondaries are in sync with the primary and the primary’s writes are being replicated to them.

Open another command prompt and connect to the secondary node at localhost:27018. Once connected, if you try to read data you will get an error, because by default you cannot read from a secondary. To allow reads from the secondary, run rs.slaveOk().
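The sequence on the secondary looks roughly like this:

mongo localhost:27018
db.movies.find()     // fails: reads on a secondary are refused until you allow them
rs.slaveOk()
db.movies.find()     // now returns the document replicated from the primary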

You can only read data from the secondary nodes; you cannot insert documents into them. Try the following query, which gives you an error.
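For example:

db.movies.insert( { name : "this write will fail" } )   // rejected: writes must go to the primary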

Replication Internals
To understand how replication happens internally, you need to look at the oplog.rs collection on each node, primary and secondaries alike.

Let’s connect to the primary node (localhost:27017) and switch to the local database. You can use the show collections command to list all the collections in the local database, and then inspect the oplog with db.oplog.rs.find().pretty().
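That sequence is:

use local
show collections
db.oplog.rs.find().pretty()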

In that output you can see three operations: creating the replica set, creating the movies collection, and inserting the movie “up” into it. You will find a similar oplog.rs collection on every secondary node; this is the log that is synced between the nodes and replayed to perform the replication.

Now let’s connect to the secondary node at localhost:27018 and look at its oplog.rs; you will see essentially the same contents there too.

Failover in Replication
To see failover in action, let’s shut down the primary node. The shell prompt shows which node you are connected to, and you can also check with rs.isMaster(). If you are on a secondary, connect to localhost:27017, the primary. Once you are on the primary, shut it down with db.shutdownServer(). As soon as the primary is down, an automatic election selects a new primary, which takes only a few seconds.

Now let’s connect to localhost:27018 and run rs.status(). The result shows that one of the two former secondaries has become primary. When the node that was shut down comes back up, it will rejoin as a secondary.

What happens if a document is inserted into the newly elected primary while the old primary is still down? It’s simple: when the previously down server comes back up, it rejoins as a secondary and syncs all the data (via oplog.rs) from the newly elected primary. Let’s walk through that scenario in the following section.

  1. Inserting a movie document into the new primary node.
    acemyskillsrepsets:PRIMARY> db.movies.insert({name: "The Day After Tomorrow", releasedyear: 2004});
    WriteResult({ "nInserted" : 1 })
    
  2. Make the previously down node to up
    C:\>start mongod --replSet acemyskillsrepsets --logpath  \data\rs1\1.log --dbpath \data\rs1 --port 27017 --smallfiles --oplogSize 64
    
  3. Connect to localhost:27017 (the node that was down) and you can see all the documents, including the one inserted while it was down.

  4. Lastly, you can check the oplog.rs collection on this node and compare it with the primary’s; the contents match.

That’s the end of this article. I believe you now understand the basic setup and configuration of replica sets in MongoDB. After reading and practicing this article, try building a replica set across separate computers or, if possible, geographically separated servers, which better reflects a real deployment; just don’t forget to adjust your firewall configuration in that case. Happy Coding!!! Cheers!!!

CRUD Operations in MongoDB – Part 2

This is the second part of the CRUD Operations in MongoDB article. In Part 1 you learned about the Create and Read (CR) operations; in this Part 2 you will learn the remaining basic operations, Update and Delete (UD), using the same movies collection from Part 1.

We now move on to the third operation, Update. There are four main ways to use the update method in MongoDB, which are listed below.

  1. Replacement of wholesale document
  2. Manipulating the individual fields in a document
  3. Upserts operation in a document
  4. Multiple updates in a collection.

1. Replacement of wholesale document
The update method takes at least two arguments. The first argument is a query that selects which documents in the collection to update, analogous to the WHERE clause in SQL. The second argument is the document to write. In this type of update, whatever you put in that second argument replaces the entire matched document except its primary key (_id).

In the following, the first query shows the movie “Jurassic World”. The second query updates the “Jurassic World” document, replacing it with “The Day After Tomorrow”. After running this query, the whole document is replaced except the _id primary key.
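A sketch of those two queries (the replacement document shown is illustrative):

db.movies.find( { name : "Jurassic World" } ).pretty()
db.movies.update( { name : "Jurassic World" },
                  { name : "The Day After Tomorrow", releasedyear : 2004 } )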

2. Manipulating the individual fields in a document
By using the $set operator we can update the specific fields in a document. If the fields are not present in the document, it adds the fields to the document. In the following query we are adding two new fields (genre, runningtimeinminutes) to the recently added movie “The Day After Tomorrow”. The second query is displaying the updated movie as output.
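For example (the genre value is illustrative):

db.movies.update( { name : "The Day After Tomorrow" },
                  { $set : { genre : "Sci-Fi", runningtimeinminutes : 124 } } )
db.movies.find( { name : "The Day After Tomorrow" } ).pretty()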

The $inc operator is used to increment a field in MongoDB. If the field you mention with $inc doesn’t exist, it is created with the increment value. Suppose you need to increase runningtimeinminutes of “The Day After Tomorrow” by 6 minutes: it is currently 124 minutes, and after using $inc its value becomes 130 minutes.
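For example:

db.movies.update( { name : "The Day After Tomorrow" },
                  { $inc : { runningtimeinminutes : 6 } } )   // 124 + 6 = 130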

By using $unset operator in MongoDB we can remove any specific field from the document. Now let’s remove genre from the “The Day After Tomorrow” movie.
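For example:

db.movies.update( { name : "The Day After Tomorrow" }, { $unset : { genre : 1 } } )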

Using $push, $pop, $pull, $pullAll, $addToSet
Let’s look at these operators in a separate demo collection, because it’s a little difficult to demonstrate them in the movies collection.

The $push operator extends an array by adding a new element at the end. In the following query, 25 is pushed onto the end of the array.
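Assuming a demo document that starts out as { _id : 1, z : [ 10, 20, 12, 13 ] } (the _id value is illustrative), the push looks like this:

db.demo.insert( { _id : 1, z : [ 10, 20, 12, 13 ] } )
db.demo.update( { _id : 1 }, { $push : { z : 25 } } )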

The $pop operator removes an element from either the start or the end of an array. Our z array now contains [ 10, 20, 12, 13, 25 ]. To remove an element from the end you pass 1, and to remove one from the start you pass -1. In the first query below we pass 1 to $pop and 25 is removed; in the next query we pass -1 and 10 is removed.
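For example:

db.demo.update( { _id : 1 }, { $pop : { z : 1 } } )    // removes 25 from the end
db.demo.update( { _id : 1 }, { $pop : { z : -1 } } )   // removes 10 from the start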

The $pushAll operator appends several elements to an array at once. For instance, the following query adds 30, 40 and 50 to the z array in the demo collection.
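For example:

db.demo.update( { _id : 1 }, { $pushAll : { z : [ 30, 40, 50 ] } } )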

Unlike $pop, which removes an element only from the start or the end, the $pull operator removes matching elements from any position inside an array.
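For example, removing 40 from the middle of the array:

db.demo.update( { _id : 1 }, { $pull : { z : 40 } } )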

The $pullAll operator removes several elements from any position inside an array. Our z array is now [ 20, 12, 13, 30, 50 ]; let’s remove the elements 12 and 30 from it.
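For example:

db.demo.update( { _id : 1 }, { $pullAll : { z : [ 12, 30 ] } } )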

The $addToSet operator adds an element to the array only if it is not already present; if the element already exists, nothing is added. For instance, let’s add the element 100 twice: you will see only one 100 in the z array of the demo collection.
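For example:

db.demo.update( { _id : 1 }, { $addToSet : { z : 100 } } )
db.demo.update( { _id : 1 }, { $addToSet : { z : 100 } } )   // no change the second time
db.demo.findOne( { _id : 1 } )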

Upserts operation in a document
The third type of update in MongoDB is the upsert. Our movies collection does not contain the movie “The Passion of the Christ”, so if you run the following query nothing happens, because zero documents match.

But MongoDB supports a third argument, upsert: true/false, which inserts a new document when the query matches nothing. Let’s rerun the query above with upsert: true, which adds a new document to the collection since there is no match.
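A sketch of both attempts (the directedby value is illustrative):

db.movies.update( { name : "The Passion of the Christ" },
                  { $set : { directedby : "Mel Gibson" } } )                    // matches nothing, no change
db.movies.update( { name : "The Passion of the Christ" },
                  { $set : { directedby : "Mel Gibson" } }, { upsert : true } ) // inserts a new document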

Multiple updates in a collection
Although the query selects multiple documents, the default behavior of update method in MongoDB is to update one single document. If you wish to update multiple documents in a query you have to supply a third argument with multi:true.

In the following query the empty document selects all the documents in the collection but it adds musicby field only to the first document in the collection.

To update all the matching documents in the collection, you have to supply a third argument, multi: true, in the query.
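For example (the musicby value is illustrative):

db.movies.update( {}, { $set : { musicby : "Hans Zimmer" } } )                   // updates only the first document
db.movies.update( {}, { $set : { musicby : "Hans Zimmer" } }, { multi : true } ) // updates every document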

That covers the update method in MongoDB. Now we move on to the final CRUD operation: Delete.

Delete
The remove method is used to remove documents in MongoDB. The following query removes a movie “The Day After Tomorrow” from the movies collection.
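For example:

db.movies.remove( { name : "The Day After Tomorrow" } )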

To remove all the documents from a collection either you can use db.movies.remove({}) or db.movies.drop() query. The first query keeps all the metadata of the collection like indexes but in the second query all the metadata of the collection is lost. You need to recreate the metadata once dropped.
In the following query we are removing all the documents from the movies collection using remove() method but you can also use the drop() method.
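For example:

db.movies.remove( {} )   // or: db.movies.drop()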

That brings us to the end of this two-part article covering all the CRUD operations; I hope you enjoyed it. If you practiced all the examples along the way, you should now be comfortable with CRUD operations in MongoDB. Happy Coding! Cheers!!!

CRUD Operations in MongoDB – Part 1

CRUD stands for Create, Read, Update and Delete. Almost all programmers are familiar with the SQL world, where Insert, Select, Update and Delete are the equivalent operations. In MongoDB, CRUD maps to the Insert, Find, Update and Remove methods, and SQL’s table and record concepts correspond to MongoDB’s collection and document. This article is divided into two parts, CRUD Operations in MongoDB – Part 1 and CRUD Operations in MongoDB – Part 2. This is Part 1, covering the basic Create and Read (CR) operations in MongoDB using a movies collection.

Now that we know the basic definition of CRUD, it’s time to insert some documents into the database. To insert documents into a MongoDB database, we need to start the mongod server and the mongo shell. If you don’t have MongoDB installed on your computer, see my previous article on installing, configuring and starting MongoDB on Windows; installing MongoDB on Mac OS and Linux is very similar. Follow these steps to run mongod (the server) and the mongo shell (the client):

  1. Open command prompt and give mongod command which starts the mongodb server
  2. Open another command prompt and give mongo command which starts mongo shell.

When the mongo shell starts, by default it prints its version and connects to the test database. If you want the shell to connect to a different database at startup, you can pass the database name on the command line.
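For example, to connect straight to the hollywood database used in this article:

mongo hollywood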

You can use show dbs command to list out all the database names present in the MongoDB.

To get the database name that we are currently connected, we can use the db command in the Mongo shell.

To switch to another database, or to create a new one, use the use command. If the database does not exist yet, it is created (lazily, when you first store data in it) under the supplied name.

You can delete a database with the dropDatabase command, but you must first switch into the database you want to delete using the use command.
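In order, the commands described above list all databases, show the current database, switch to (or create) hollywood, and drop the current database:

show dbs
db
use hollywood
db.dropDatabase()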

Throughout this article we will use a hollywood database with a movies collection. Each movie document contains fields such as name, directedby, releasedyear, genre, runningtimeinminutes and musicby.

Now it’s time to create the hollywood database. In the following snippet we create the database, insert a document into the movies collection with db.movies.insert({}), and finally display the collection with db.movies.find().
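A minimal version of that snippet (the movie document is illustrative):

use hollywood
db.movies.insert( { name : "Avatar", directedby : "James Cameron", releasedyear : 2009 } )
db.movies.find()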

insert
You can also insert several documents into the movies collection at once by supplying multiple documents in an array. Once the bulk insert succeeds, you can run find to see all the inserted documents.
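For example, a bulk insert of three illustrative documents:

db.movies.insert( [
    { name : "Gravity", releasedyear : 2013 },
    { name : "Interstellar", releasedyear : 2014 },
    { name : "Inception", releasedyear : 2010 }
] )
db.movies.find()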

We have already inserted four documents into the movies collection. Now it’s time to import 30 more documents using the mongoimport command. To import the documents, follow these steps:

  1. Download the Hollywood.csv
  2. Open command prompt and move to the folder where the downloaded Hollywood.csv file is located
  3. Give the command as:
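A typical import command (assuming the database and collection names used throughout this article):

mongoimport --db hollywood --collection movies --type csv --headerline --file Hollywood.csv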

Once all the documents are imported into the movies collection, you can verify it by just counting the number of documents present in the collection using count command of MongoDB.
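For example:

db.movies.count()   // should report 34: the 4 documents inserted earlier plus the 30 imported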

I hope you understand how to insert single document, multiple documents and importing documents from csv into the existing collection. This is all about CRUD operations C (Create) part. Now we are moving into R (Read) part in detail.

findOne
The findOne method in MongoDB returns a single document from a collection (essentially an arbitrary one when no criteria are given). You can use findOne in three ways.

Firstly, you can call it without any argument, which simply returns one document from the collection. Let’s use it on the movies collection to get a single document. The _id is the primary key and is globally unique within a collection.
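For example:

db.movies.findOne()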

Secondly, you can use it with one argument which specifies what criteria a document should match to display a single document. The following findOne method displays a single document whose name field contains “Avatar”.
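For example:

db.movies.findOne( { name : "Avatar" } )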

Thirdly, you can supply a second argument which stipulates what fields you want back from the database. In the following command we tell findOne to return a single document whose name contains Avatar, showing only the name and directedby fields in the result. By default MongoDB returns the primary key _id field, which we explicitly hide by setting it to false.
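For example:

db.movies.findOne( { name : "Avatar" }, { name : true, directedby : true, _id : false } )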