SQLvsNOSQL

Comparison between MySQL and MongoDB

##Background

The basic concept behind a SQL database is the relational database, which strictly uses relations (tables) to store data and matches data through common characteristics found in the dataset. A relational database is a set of tables containing data fitted into predefined categories. Each table contains one or more data categories in columns, and each row contains a unique instance of data for the categories defined by the columns. One example of a relational database is Microsoft SQL Server. Users can still access data without knowing the structure of the underlying table. However, SQL databases are expensive and difficult to scale, because scaling a relational database means distributing it across multiple servers, and handling tables across different servers is hard to manage [1]. Also, data in a SQL database has to fit into tables, and designing the database structure becomes very complex when the data does not fit into tables.

While SQL databases are widely used, a great variety of alternative databases has emerged in the past few years because of doubts about "one size fits all" thinking concerning data stores. These new databases are commonly subsumed under the term NoSQL. The basic characteristic of NoSQL is that it does not require fixed tables. Moreover, NoSQL databases usually avoid join operations and typically scale horizontally. Another important characteristic is that NoSQL trades off "ACID" (atomicity, consistency, isolation, and durability) and allows the schema to differ from record to record to varying degrees [2]. Some advantages of NoSQL databases are: 1) data can be inserted without defining a rigid database schema; 2) data is automatically spread across several servers without the help of other applications; 3) data is cached in system memory to increase performance. There are three popular types of NoSQL database: key-value stores, column-oriented databases, and document-based stores.

The comparison of SQL and NoSQL databases has been a popular topic for a long time. In the paper "A performance comparison of SQL and NoSQL databases", Yishan Li and Sathiamoorthy Manoharan found that while NoSQL databases are generally good at storing key-value data, not all NoSQL databases perform better than SQL databases. They also observed that, within a NoSQL database, performance varies with the type of operation [3]. In the paper "RDBMS vs NoSQL: Performance and Scaling Comparison", Christoforos Hadjigeorgiou found that MongoDB can perform much better for complicated queries at the cost of data duplication. The author also concluded that MySQL performs best at deletion, whereas MongoDB excels at inserting documents [4].

##Goals

The focus of our report is to compare the single-thread and multi-thread performance, and the join performance, of MySQL and MongoDB. We compare add, find, and delete operations with different amounts of data stored in the database. The rest of the report is organized as follows. The section Differences between SQL and NoSQL introduces the major differences between the two databases. The section Testing System Design introduces the structure of our project, and the section Testing Methods describes in detail the approaches used to test the two databases. Finally, our test results are discussed and analyzed.

##Differences between SQL and NoSQL

Several major differences between MySQL and MongoDB are presented below.

###Memory Usage

While MySQL has a fairly reasonable memory usage mechanism, MongoDB tends to consume as much memory as it can get. When reading, the data held in memory acts as a cache; when writing, memory lets MongoDB turn random writes into sequential writes. Both effects improve performance considerably. However, MongoDB's design limits control over memory usage, which leads to high memory occupation. As long as enough memory is available, MongoDB delivers satisfying read and write performance, but its performance drops severely when memory is nearly exhausted.

###Scalability

MySQL is a relational database, which requires that the data on its server fit into tables, and designing the structure can be difficult when the data does not. This makes MySQL inconvenient to use when the data is complex and diverse. MongoDB, as a NoSQL database, is more flexible: because it does not require fixed tables, it can store a variety of data as users require [3]. The two also scale differently. MySQL databases scale vertically, by increasing the power of the hardware, whereas NoSQL databases scale horizontally, by adding database servers to the pool of resources to spread the load.

###Complex query performance

MongoDB can handle more complicated queries faster, mainly because of its simpler schema. However, this comes at the cost of data duplication: a MongoDB database may contain large amounts of duplicated data. Moreover, because of its use of subdocuments, MongoDB shows an advantage over MySQL when complex queries are combined with subqueries. If complex queries are typical in an application, MongoDB is a good choice, provided the cost in storage and memory is taken into account.

###Single thread VS. Multiple threads

Another important difference between MySQL and MongoDB is their performance with a single thread and with multiple threads. When the same types of queries are performed, the results with 1, 2, or more threads can be compared in terms of time. According to the results in "RDBMS vs NoSQL: Performance and Scaling Comparison", performance depends heavily on the queries used in the tests and may not be monotonic in the number of threads. Also, beyond 4 threads the performance of both databases declines, with no apparent advantage of one over the other [4].

###Data Association

When designing a SQL database, we normally use separate tables for different types of entities, for example customers, orders, items, and messages. We also need relationships between these tables: customers make orders, and orders contain items. These relationships must be represented in the database, and fetching related data with SQL requires JOIN queries. In MongoDB, a data model can instead use embedded documents to describe relationships between connected data. Decisions about how to model data affect application performance and database capacity.

###Security

In discussing security, it is necessary to consider protecting the entire server host against all applicable types of attack: eavesdropping, altering, playback, and denial of service. While MySQL is considered fairly secure, MongoDB still raises some security concerns. By default, MongoDB installs with no password credentials, which is an authentication weakness. There are also authorization weaknesses, because any created user has read-only access to the whole database. Besides these, MongoDB shows admin authorization weaknesses, weaknesses arising from its multiple interfaces, and some other security issues.

###Other differences

MySQL supports atomic transactions: the ability to group multiple operations within a transaction and roll back the whole group as if it were a single operation. MongoDB does not support transactions, but single operations are atomic. (Multi-document transactions were added to MongoDB later, in version 4.0.) MySQL uses a specific language for data manipulation, e.g. Select, Insert, and Update statements, while NoSQL databases perform these tasks through object-oriented APIs [5]. As for consistency, MySQL can be configured for strong consistency, while in NoSQL it depends on the product [6].
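As a rough illustration of the rollback behaviour described above, the following sketch groups two updates into one transaction. It uses Python's built-in `sqlite3` module purely as a stand-in for MySQL, and the `account` table, its columns, and the `transfer` helper are hypothetical names introduced for this example only.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from one row to another; undo everything on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst))
        return True
    except ValueError:
        return False

def balances(conn):
    """Return {id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

# Hypothetical sample data: two accounts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()
```

A failed transfer leaves both rows untouched, because the first update is rolled back together with the rest of the transaction.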

Aspect | MySQL | MongoDB
------------ | ------------- | -------------
Memory usage | Reasonable memory usage | Needs enough memory to stay efficient
Complex query performance | Unsatisfying complex query performance | Handles more complicated queries faster
Single thread vs. multiple threads | Varies with the operation | No apparent advantage beyond 4 threads
Data association | Supports dynamic SQL queries such as table joins | Supports embedded collections
Security | Fairly secure | Still has some security issues
Transactions | Supports atomic transactions | Does not support transactions, but single operations are atomic
Data manipulation | Specific language using Select, Insert, and Update | Through object-oriented APIs
Consistency | Can be configured for strong consistency | Depends on the product

##Testing System Design

We designed a DAO (data access object) layer for the two databases, MySQL and MongoDB. On top of the DAO layer, the test environment is set up using Spring Test and JUnit. This section introduces Spring, Spring Data MongoDB, and Hibernate.

###Spring

Spring is a popular application framework written in Java and used by many developers to create quality applications. The framework consists of many parts that provide different services, and it helps developers concentrate on the business logic [8]. The technology most closely identified with Spring is the Dependency Injection (DI) flavor of Inversion of Control. Dependency injection (sometimes called wiring) helps glue different classes together while keeping them independent. Testing an application written with Spring is simple because environment-dependent code is moved into the framework. Furthermore, by using JavaBean-style POJOs, it becomes easier to use dependency injection for injecting test data [9].

###Spring Data MongoDB

The Spring Data MongoDB project provides integration with the MongoDB document database. Its key functional areas are a POJO-centric model for interacting with a MongoDB DBCollection and easy writing of a Repository-style data access layer.

The class MongoTemplate is the central class of Spring's MongoDB support and provides a rich feature set for interacting with the database. The template offers convenience operations to create, update, delete, and query MongoDB documents, and provides a mapping between your domain objects and MongoDB documents.

###Hibernate

Another tool we use in our project is Hibernate. Hibernate is an object-relational mapping and persistence framework for Java that allows developers to map plain old Java objects to relational database tables. Hibernate mainly aims to relieve the developer of common data persistence tasks [10]. With its help, developers can make the most of data query and retrieval facilities, because Hibernate maps Java objects to database tables efficiently. In short, Hibernate provides improved productivity, performance, maintainability, and portability [11].

A Session in Hibernate is used to get a physical connection to a database. The Session object is lightweight and designed to be instantiated each time an interaction with the database is needed. Persistent objects are saved and retrieved through a Session object. Session objects should not be kept open for a long time because they are usually not thread safe; they should be created and destroyed as needed.

##Testing Methods

The previous section listed several differences between MySQL and MongoDB. Among them, we chose multi-thread performance and join performance to test on the two databases. This section discusses our test methods in detail.

###Single thread and multiple threads test

In the single-thread test, we use three basic operations on the two databases: add, find, and delete. When adding, we insert random data into both databases. For finding, we first insert a certain amount of data into the database and then execute find operations to look up data in it. For deleting, we also insert data first and then run delete operations to remove specific records. The performance of all three operations is measured by time.

For the add operation, we assume that the less time a database needs to finish all the adding tasks, the better it performs. We treat the amount of data as the independent variable and the time needed as the dependent variable. As the number of records to be inserted increases, the database that needs less time is the winner.
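The measurement described above can be sketched as a small timing harness. This is only an illustration under stated assumptions: a plain Python list stands in for the database, and `time_inserts`, `run_insert_benchmark`, and the record layout are hypothetical names, not the project's actual DAO code.

```python
import random
import time

def time_inserts(insert_one, n_records, seed=0):
    """Return elapsed milliseconds for inserting n_records random records."""
    rng = random.Random(seed)
    start = time.perf_counter()
    for i in range(n_records):
        insert_one({"_id": i, "value": rng.random()})
    return (time.perf_counter() - start) * 1000.0

def run_insert_benchmark(make_store, sizes):
    """Independent variable: number of records. Dependent variable: time."""
    results = {}
    for n in sizes:
        store = make_store()  # fresh, empty store for every group
        results[n] = time_inserts(store.append, n)
    return results

# Usage with an in-memory list as the stand-in store.
timings = run_insert_benchmark(list, [1000, 5000])
```

Running the same harness against two different stores and comparing the two result dictionaries size by size gives the kind of comparison the report makes.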

For the find operation, the prerequisite is that data already exists in the database. Several find operations are performed and the time needed is measured. We split the measurement into two parts. First, we keep the number of records in the database fixed and increase the number of records to be found. This tells us which database is more suitable for a heavy find load when the amount of stored data remains unchanged. Second, we keep the number of records to be found fixed and change the number of records already in the database. This tells us which database is more suitable for finding targets in a large amount of data.

We take a similar approach to measuring the efficiency of the delete operation. First, we keep the number of records in the database fixed, change the number of records to be deleted, and check the response time of each database. Second, we keep the number of records to be deleted fixed and increase the number of records in the database, so that we can see which database performs better as the data grows.
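The two experimental designs above can be written down as lists of (database size, target count) pairs. The sketch below is a hypothetical illustration: the function names are invented for this example, and a Python dict stands in for the database in `run_find`.

```python
import random
import time

def vary_targets_plan(db_size, target_counts):
    """Design 1: fixed database size, varying number of target records."""
    return [(db_size, targets) for targets in target_counts]

def vary_db_size_plan(db_sizes, target_count):
    """Design 2: fixed number of target records, varying database size."""
    return [(size, target_count) for size in db_sizes]

def run_find(db_size, n_targets, seed=0):
    """Find n_targets random records; return (records found, elapsed ms)."""
    store = {i: {"_id": i} for i in range(db_size)}  # stand-in database
    ids = random.Random(seed).sample(range(db_size), n_targets)
    start = time.perf_counter()
    found = [store[i] for i in ids]
    return len(found), (time.perf_counter() - start) * 1000.0

# Small sizes keep the example quick; the report itself uses up to 200000.
plan_1 = vary_targets_plan(2000, [100, 500, 1000])
plan_2 = vary_db_size_plan([500, 1000, 2000], 500)
results = [run_find(size, targets) for size, targets in plan_1 + plan_2]
```

The same two plans apply to the delete test by swapping the lookup for a removal.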

After the single-thread test, we move on to the multi-thread test. We test database performance with 2, 4, and 8 threads. The operations tested are the same as in the single-thread test: add, find, and delete. The only difference is that the same operations are performed with 2, 4, and 8 threads. The results show whether multiple threads bring any improvement or negative effect, and whether one database performs worse or better as the number of threads increases.

##Testing Result

To compare the single-thread performance of SQL and NoSQL databases, we chose MySQL (SQL) and MongoDB (NoSQL) and tested their insertion, findbyId, and deletion performance by comparing time cost. We used different numbers of records and different strategies for the comparison.

####Single Thread insertion Performance Comparison

To test the insertion performance of MySQL and MongoDB, we added different numbers of records to each database and compared the time cost. The number of records ranges from 1000 to 200000 in 9 groups, and for each group we measured the execution time of adding the same number of records to both databases.

######Data

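Records inserted | 1000 | 5000 | 10000 | 20000 | 40000 | 80000 | 140000 | 160000 | 200000
---------------- | ---- | ---- | ----- | ----- | ----- | ----- | ------ | ------ | ------
MongoDB | 1411 | 2263 | 2641 | 2668 | 5352 | 8600 | 13137 | 17449 | 21908
MySQL | 703 | 2742 | 3089 | 5425 | 10452 | 15705 | 23175 | 29037 | 37783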

We tested 9 groups of records, ranging from 1000 to 200000 (see the table). The first row gives the number of records inserted, and the second and third rows give the time cost of adding that many records to MongoDB and MySQL.

######Figure

######Analysis

According to the diagram above, MongoDB's line lies below MySQL's line, except at the smallest size (1000 records), where MySQL is faster (703 versus 1411). This means MongoDB is faster than MySQL at inserting the same number of records for every larger group. Based on these results, we conclude that MongoDB's insertion performance is better than MySQL's.

####Single Thread Find Performance Comparison I

We adopted two methods to test the find performance of MySQL and MongoDB. The first measures the time cost of finding a varying number of records while the number of records in the database is fixed (200000). The second measures the time cost of finding a fixed number of records (5000) while the number of records in the database increases from 5000 to 200000.

######Data

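Records found | 1000 | 5000 | 10000 | 20000 | 40000 | 80000 | 140000 | 160000 | 200000
------------- | ---- | ---- | ----- | ----- | ----- | ----- | ------ | ------ | ------
MongoDB | 1369 | 1049 | 2827 | 3647 | 5775 | 10865 | 15659 | 18071 | 28114
MySQL | 15 | 74 | 136 | 247 | 320 | 413 | 449 | 505 | 561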

We tested 9 groups of data, with the number of records to find ranging from 1000 to 200000 (see the table). The first row gives the number of records to find, and the second and third rows give the time cost of finding that many records in MongoDB and MySQL.

######Figure

######Analysis

According to the diagram above, MongoDB's line lies above MySQL's line. This means MySQL is faster than MongoDB at finding a varying number of records in a database with a fixed number of records. We also see that MySQL's line grows very slowly as the number of records to find increases, so the number of records requested has comparatively little effect on MySQL's find performance.

####Single Thread Find Performance Comparison II

######Data

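Records in database | 5000 | 10000 | 20000 | 40000 | 80000 | 120000 | 160000 | 200000
------------------- | ---- | ----- | ----- | ----- | ----- | ------ | ------ | ------
MongoDB | 1979 | 2310 | 1774 | 1921 | 2166 | 1944 | 2525 | 1889
MySQL | 138 | 110 | 72 | 69 | 76 | 74 | 74 | 75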

We tested 8 groups of data. The first row gives the number of records in the database, increasing from 5000 to 200000, and the second and third rows give the time cost of finding 5000 records in MongoDB and MySQL at each database size.

######Figure

######Analysis

According to the diagram above, MongoDB's line lies above MySQL's line. This means MySQL is faster than MongoDB at finding 5000 records in databases with an increasing number of records. We also see that MySQL's line is smooth and level, which means the number of records in the database has only a limited effect on MySQL's findbyId performance.

####Single Thread Delete Performance Comparison I

To test the delete performance of MySQL and MongoDB, we adopted two strategies. The first randomly deletes a varying number of records from a database with a fixed number of records. The second randomly deletes a fixed number of records from databases of varying size.

######Data

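Records deleted | 1000 | 5000 | 10000 | 20000 | 40000 | 80000 | 140000 | 160000 | 200000
--------------- | ---- | ---- | ----- | ----- | ----- | ----- | ------ | ------ | ------
MongoDB | 192 | 697 | 1167 | 1688 | 3221 | 4535 | 6310 | 9147 | 12319
MySQL | 847 | 1947 | 2449 | 4478 | 8054 | 11940 | 15842 | 19022 | 21822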

We tested 9 groups of data. The first row gives the number of records deleted from the database, and the second and third rows give the time cost of deleting that many records from MongoDB and MySQL.

######Figure

######Analysis

According to the diagram above, MongoDB's line lies below MySQL's line. This means MongoDB is faster than MySQL at deleting a varying number of records from a database with a fixed number of records (200000).

####Single Thread Delete Performance Comparison II

######Data

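Records in database | 5000 | 10000 | 20000 | 40000 | 80000 | 140000 | 160000 | 200000
------------------- | ---- | ----- | ----- | ----- | ----- | ------ | ------ | ------
MongoDB | 1035 | 673 | 624 | 641 | 568 | 638 | 621 | 587
MySQL | 1073 | 1144 | 1391 | 1570 | 1493 | 1800 | 1652 | 1947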

We tested 8 groups of data. The first row gives the number of records in the database, from 5000 to 200000, and the second and third rows give the time cost of deleting 5000 records from MongoDB and MySQL at each database size.

######Figure

######Analysis

According to the diagram above, the blue line of MongoDB lies below the red line of MySQL. This means MongoDB performs better than MySQL at deleting a fixed number of records (5000) from databases with a varying number of records.

###More in NoSQL DB (Multi-Thread Test)

To test the multi-thread performance of NoSQL databases, we picked MongoDB and tested its insert, findbyId, and delete performance by comparing the time cost with different numbers of threads. We used one, two, four, and eight threads to execute the queries, so that we could see whether using multiple threads is an effective way to save time in NoSQL databases such as MongoDB.

####Multi-Thread Add Performance

To test the insert performance, we used a controlled-variable approach: different numbers of threads add the same number of records to the database. We then increase the number of records inserted and check the insertion time to see whether the number of threads makes any difference.

#####Data

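Threads / Records inserted (time in ms) | 1000 | 10000 | 20000 | 40000 | 80000 | 100000 | 140000 | 160000 | 200000
--------------------------------------- | ---- | ----- | ----- | ----- | ----- | ------ | ------ | ------ | ------
1 Thread | 1228 | 5493 | 7793 | 10214 | 18614 | 21057 | 24557 | 23865 | 33061
2 Threads | 1175 | 4707 | 6859 | 8906 | 15489 | 15341 | 18995 | 23215 | 28381
4 Threads | 1188 | 5346 | 8120 | 12122 | 17250 | 18727 | 22316 | 26486 | 29256
8 Threads | 1138 | 6076 | 10204 | 13968 | 18527 | 22397 | 24609 | 29854 | 32018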

We tested 9 groups of data. Each column of the first row gives the number of records inserted into the database, from 1000 to 200000. The corresponding columns in the second to fifth rows give the time in milliseconds spent on the insertion. For two or more threads, the time given is that of the last thread to finish.

For example, to insert 1000 records using 4 threads, each thread inserts only 250 records; the four threads took 1020 ms, 1100 ms, 869 ms, and 1188 ms. The time shown in the table is therefore the maximum of the four, 1188 ms, because the insertion cannot be considered finished until the last thread finishes its job.
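The "last thread to finish" rule can be sketched as follows. This is a minimal illustration, not the project's test code: `split_evenly` and `timed_parallel` are hypothetical helper names, and `work` is any callable standing in for a batch insert.

```python
import threading
import time

def split_evenly(n_records, n_threads):
    """Divide n_records among n_threads as evenly as possible."""
    base, extra = divmod(n_records, n_threads)
    return [base + (1 if i < extra else 0) for i in range(n_threads)]

def timed_parallel(work, n_records, n_threads):
    """Run work(share) in each thread; return (reported_ms, per_thread_ms)."""
    shares = split_evenly(n_records, n_threads)
    per_thread_ms = [0.0] * n_threads

    def runner(index, share):
        start = time.perf_counter()
        work(share)
        per_thread_ms[index] = (time.perf_counter() - start) * 1000.0

    threads = [threading.Thread(target=runner, args=(i, s))
               for i, s in enumerate(shares)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The operation only counts as finished when the slowest thread is done.
    return max(per_thread_ms), per_thread_ms

# Worked example from the text: four threads, 250 records each.
example_ms = [1020, 1100, 869, 1188]
reported = max(example_ms)
```

Here `reported` is 1188, matching the 4-thread entry for 1000 records in the table.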

#####Figure

#####Analysis

From the figure, we can see that the number of threads does little to improve the insertion time. The lines cross each other at times and increase roughly linearly as the number of inserted records rises.

The reason multiple threads have no effect on insertion performance is probably the locking system of the database. When one thread has entered the critical section, the other threads have to wait until it finishes. In that case, the threads merely divide the task among themselves and take almost the same total time as a single thread. Therefore, for insertion, there is no point in using multiple threads to access the database, since multiple threads may consume more resources than a single thread.
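The effect of a single lock on parallel work can be shown with a toy model. The report only says the lock is probably the cause, so this sketch does not model MongoDB's actual locking; `run_with_global_lock` is a hypothetical name, and `time.sleep` stands in for the work done inside the critical section.

```python
import threading
import time

def run_with_global_lock(n_threads, work_seconds):
    """Start n_threads that each hold one shared lock while working.

    Returns the wall-clock seconds until all threads have finished.
    """
    lock = threading.Lock()

    def worker():
        with lock:  # only one thread at a time gets past this line
            time.sleep(work_seconds)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

# Four threads, 0.02 s of locked work each.
elapsed = run_with_global_lock(4, 0.02)
```

Because the work is serialized by the lock, the total is about the sum of the four work periods rather than the length of one, which is the pattern the insertion results suggest.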

####Multi-Thread Find Performance

To test the find performance, we implemented two test methods, both using a controlled-variable approach. The first finds an increasing number of records in a database with a fixed number of records. The second finds a fixed number of records while the number of records in the database increases. Both methods compare single-thread and multi-thread results, and the time is measured to see whether multi-thread execution brings any benefit.

Find Performance Test I

In Find Performance Test I, the task is to find an increasing number of records in a database with 200000 records. The query time is measured for a single thread and compared with the results for two or more threads.

######Data

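Threads / Records found (time in ms) | 1000 | 5000 | 10000 | 20000 | 40000 | 80000 | 120000 | 160000 | 200000
------------------------------------ | ---- | ---- | ----- | ----- | ----- | ----- | ------ | ------ | ------
1 Thread | 669 | 2556 | 4336 | 7670 | 10210 | 19119 | 23878 | 33536 | 40495
2 Threads | 531 | 2205 | 3672 | 6132 | 7463 | 13545 | 16597 | 22391 | 24650
4 Threads | 592 | 1863 | 3291 | 5376 | 8999 | 12298 | 19754 | 20307 | 26495
8 Threads | 360 | 1479 | 2647 | 5427 | 7908 | 13954 | 18270 | 19893 | 26214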

We tested 9 groups of data. Each column of the first row gives the number of records to be retrieved by the query, from 1000 to 200000. The corresponding columns in the second to fifth rows give the time in milliseconds spent on the query. For two or more threads, the time given is that of the last thread to finish.

######Figure

######Analysis

As the figure above shows, the single-thread run takes longer than the multi-thread runs. At 200000 records it takes roughly 50% to 65% more time: 40495 ms compared with 24650 ms to 26495 ms. From this perspective, we can infer that using multiple threads reduces the time of query-like operations on NoSQL databases such as MongoDB. However, the times for two, four, and eight threads are almost the same, which means the performance gained by adding threads is limited. If two threads are already performing the query, adding more threads will not bring the significant improvement one might expect.
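The comparison at 200000 records can be checked with a few lines of arithmetic. The timings below are copied from the last column of the table; the variable and function names are introduced here for illustration only.

```python
# Time in ms to find 200000 records, keyed by number of threads.
find_test_1_at_200000 = {1: 40495, 2: 24650, 4: 26495, 8: 26214}

def extra_time_percent(single_ms, multi_ms):
    """How much longer the single-thread run takes, in percent."""
    return (single_ms - multi_ms) / multi_ms * 100.0

single = find_test_1_at_200000[1]
extra = {
    threads: round(extra_time_percent(single, ms))
    for threads, ms in find_test_1_at_200000.items()
    if threads > 1
}
```

The result is 64% for two threads, 53% for four, and 54% for eight, so the gain from going beyond two threads is not visible in this column.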

Find Performance Test II

In Find Performance Test II, the task is to find 5000 records while the number of records in the database increases from 10000 to 200000. The query time is measured for a single thread and compared with the results for two or more threads.

######Data

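Threads / Records in database (time in ms) | 10000 | 20000 | 40000 | 80000 | 160000 | 200000
------------------------------------------ | ----- | ----- | ----- | ----- | ------ | ------
1 Thread | 3024 | 4126 | 2715 | 2820 | 3626 | 3282
2 Threads | 2577 | 2792 | 2538 | 2567 | 3544 | 3053
4 Threads | 2054 | 2285 | 2080 | 2527 | 2849 | 2116
8 Threads | 1788 | 1630 | 1889 | 1755 | 1916 | 2265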

We tested 6 groups of data. Each column of the first row gives the number of records existing in the database, from 10000 to 200000. The corresponding columns in the second to fifth rows give the time in milliseconds spent on finding 5000 records in that database. For two or more threads, the time given is that of the last thread to finish.

######Figure

######Analysis

We can infer from the figure above that the search time does not grow much as the number of records in the database increases; it stays between roughly 1600 ms and 4100 ms. Multiple threads do reduce the time cost of the query, but the gain is hardly worthwhile. The largest gap between single and multiple threads is about 2500 ms (4126 ms versus 1630 ms at 20000 records), and spending an extra 2.5 seconds to find 5000 records in the worst case is not a problem. However, if the number of records to be found grows, using multiple threads may become the more attractive approach.

####Multi-Thread Delete Performance

To test the multi-thread delete performance of the NoSQL database, we implemented two test methods, both using a controlled-variable approach. The first deletes an increasing number of records from a database with a fixed number of records. The second deletes a fixed number of records while the number of records in the database increases. Both methods compare single-thread and multi-thread results, and the time is measured to see whether multi-thread execution brings any benefit.

Delete Performance Test I

In Delete Performance Test I, the task is to delete an increasing number of records from a database with 200000 records. The deletion time is measured for a single thread and compared with the results for two or more threads.

######Data

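Threads / Records deleted (time in ms) | 1000 | 5000 | 10000 | 20000 | 40000 | 80000 | 120000 | 160000 | 200000
-------------------------------------- | ---- | ---- | ----- | ----- | ----- | ----- | ------ | ------ | ------
1 Thread | 202 | 819 | 1286 | 2380 | 5075 | 11026 | 20619 | 24445 | 31485
2 Threads | 270 | 855 | 1507 | 2879 | 3934 | 5965 | 18703 | 25558 | 29463
4 Threads | 398 | 862 | 1679 | 2398 | 5443 | 10588 | 21461 | 29038 | 34204
8 Threads | 319 | 1290 | 1402 | 2304 | 4006 | 15926 | 21977 | 21817 | 26349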

We tested 9 groups of data. Each column of the first row gives the number of records to be deleted, from 1000 to 200000. The corresponding columns in the second to fifth rows give the time in milliseconds spent on the deletion. For two or more threads, the time given is that of the last thread to finish.

######Figure

######Analysis

As the figure shows, the time spent on deletion increases with the number of delete operations, as expected. Multiple threads do not help to shorten the deletion time. The reason is probably the same as for insertion: the lock in the database prevents multiple threads from entering the critical section of the deletion process at the same time. Therefore, only one thread can proceed while the others wait until it finishes.

Delete Performance Test II

In Delete Performance Test II, the task is to delete 5000 records while the number of records in the database increases from 10000 to 200000. The deletion time is measured for a single thread and compared with the results for two or more threads.

######Data

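Threads / Records in database (time in ms) | 10000 | 20000 | 40000 | 80000 | 120000 | 160000 | 200000
------------------------------------------ | ----- | ----- | ----- | ----- | ------ | ------ | ------
1 Thread | 1418 | 1890 | 1481 | 1411 | 1986 | 1026 | 936
2 Threads | 1238 | 1415 | 1024 | 1706 | 1595 | 934 | 1055
4 Threads | 1412 | 1319 | 930 | 1371 | 1404 | 919 | 1053
8 Threads | 1280 | 1487 | 1053 | 1190 | 996 | 1233 | 1186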

We tested 7 groups of data. Each column of the first row gives the number of records existing in the database, from 10000 to 200000. The corresponding columns in the second to fifth rows give the time in milliseconds spent on deleting 5000 records from that database. For two or more threads, the time given is that of the last thread to finish.

######Figure

######Analysis

As the figure above shows, the time needed for this task is fairly small, only about 1000 ms to 2000 ms, and we cannot see any performance gain from using multiple threads. We think the reason is the same as in the previous tests: the lock in the database prevents multiple threads from entering its critical section at the same time.

###Data Association

###Relation Mapping

Relational design in MySQL

one to many	many to many

- Embedded document in MongoDB

In MongoDB, we apply the following model structure to represent a one-to-many relation.


    {
    	"_id":
    	"name":
    	"password":
    	"requestList":[
    		{
    		   "_id":
    		   "comment":
    		   "house":{
    		   	  "_id":
    		   	  "name":
    		   	  "type":
    		   }
    		},
    		{
    		   "_id":
    		   "comment":
    		   "house":{
    		   	  "_id":
    		   	  "name":
    		   	  "type":
    		   }
    		}
    	]
    }

Search in multiple tables in MySQL

In MySQL, we must define separate tables to represent different models, and each column should hold the smallest unit of data. To express the relationships between tables, foreign keys or join tables are used in the schema. In this scenario, searching in MySQL involves several tables and requires more complicated queries such as joins.

Data

Records (requests = records × 100) | 100 | 1000 | 10000
---------------------------------- | --- | ---- | -----
user+request | 73 | 597 | 4937
request+house_request+house | 456 | 7473 | 88244

The number of records is the size of the "house" and "user" tables; each house and each user record is associated with 100 requests. To search a user's request list, we use the SQL "select * from request where user_id=?". To map the relation between request and house, MySQL supports join tables, and the SQL looks like this: "select * from request inner join house_request on request.id=house_request.request_id inner join house on house.id=house_request.house_id where house.id=?"
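The two queries above can be tried on a tiny dataset. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL, which is an assumption made only to keep the example self-contained. The table names and join columns come from the queries in the text; the remaining columns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE request (id INTEGER PRIMARY KEY, user_id INTEGER, comment TEXT);
CREATE TABLE house (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
CREATE TABLE house_request (request_id INTEGER, house_id INTEGER);
""")
# Hypothetical sample rows: three requests, two houses.
conn.executemany("INSERT INTO request VALUES (?, ?, ?)",
                 [(1, 1, "a"), (2, 1, "b"), (3, 2, "c")])
conn.executemany("INSERT INTO house VALUES (?, ?, ?)",
                 [(10, "h10", "flat"), (11, "h11", "villa")])
conn.executemany("INSERT INTO house_request VALUES (?, ?)",
                 [(1, 10), (2, 11), (3, 10)])
conn.commit()

def requests_of_user(user_id):
    """Single-table lookup: request ids belonging to one user."""
    rows = conn.execute(
        "SELECT id FROM request WHERE user_id = ? ORDER BY id", (user_id,))
    return [row[0] for row in rows]

def requests_of_house(house_id):
    """Two joins through the join table: request ids attached to one house."""
    rows = conn.execute(
        "SELECT request.id FROM request "
        "INNER JOIN house_request ON request.id = house_request.request_id "
        "INNER JOIN house ON house.id = house_request.house_id "
        "WHERE house.id = ? ORDER BY request.id", (house_id,))
    return [row[0] for row in rows]
```

The first function touches only the request table, while the second has to combine three tables before it can filter, which is where the extra cost measured above comes from.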

Analysis

As we can see, searching for the requests that belong to a house takes roughly 6 to 18 times as long as searching for a user's requests (456 versus 73 at 100 records, 88244 versus 4937 at 10000 records). Note that when there are 10000 user and house records, the request table holds 100*10000 rows. Querying a user's requests only searches the "request" table, while querying a house's requests needs 2 join operations, which consume a lot of time. The join also produces a larger result set, and querying that result set tends to be extremely slow. In general, a query across multiple tables takes much more time than a query on a single table, and the situation gets worse as the tables scale up.

Search in embedded document in MongoDB

In MongoDB, each record is stored as a document in a collection. A field inside a document can itself hold an embedded collection (a list of embedded documents), so we can use nested collections to store the relationship inside the document. A major advantage over SQL is that a search in MongoDB stays inside one document and needs no join.

Data

| #Records | 100 | 1000 | 10000 | 100000 |
|---|---|---|---|---|
| user | 82 | 56 | 95 | 103 |
| user.requestList | 42 | 102 | 1407 | 22778 |
| user.requestList.house | 45 | 112 | 1891 | 72789 |

In MongoDB, each user record contains a requestList collection of 100 requests. We search at three levels of the user collection. First, we search for a specific user by name using the query "{'name':?}". Then we compare this with a query on the embedded collection, "{'requestList.comment':?}". Finally, we query the collection nested inside it, requestList.house, using "{'requestList.house.name':?}".
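To make the three query levels concrete, the sketch below emulates dotted-path matching over plain Python dictionaries. It is not the MongoDB driver API and covers only simple equality matches; the helpers `values_at` and `find` and the sample documents are hypothetical.

```python
def values_at(doc, path):
    """Yield every value reachable by a dotted path, descending into lists."""
    if isinstance(doc, list):
        for item in doc:
            yield from values_at(item, path)
        return
    head, _, rest = path.partition(".")
    if not isinstance(doc, dict) or head not in doc:
        return
    value = doc[head]
    if rest:
        yield from values_at(value, rest)
    elif isinstance(value, list):
        yield from value
    else:
        yield value


def find(collection, query):
    """Return documents in which every dotted path in `query` has a matching value."""
    return [
        doc for doc in collection
        if all(wanted in values_at(doc, path) for path, wanted in query.items())
    ]


# Hypothetical sample data using the field names from the model.
users = [
    {"name": "alice", "requestList": [
        {"comment": "quiet", "house": {"name": "Oak House"}},
        {"comment": "sunny", "house": {"name": "Elm House"}},
    ]},
    {"name": "bob", "requestList": [
        {"comment": "quiet", "house": {"name": "Pine House"}},
    ]},
]

print([u["name"] for u in find(users, {"name": "alice"})])
print([u["name"] for u in find(users, {"requestList.comment": "quiet"})])
print([u["name"] for u in find(users, {"requestList.house.name": "Elm House"})])
```

The top-level query compares one field per document, while the embedded queries have to visit every entry of every requestList, which is consistent with the embedded searches slowing down as the record count grows.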

Analysis

Once a specific user is found, no extra effort is needed to find that user's requests, and the lookup time varies little as the data scales up, because every search takes place in a single collection. With an embedded query, however, the search has to visit each embedded collection inside the documents.
With embedded collections, it is hard to find the set of requests that belong to a specific house. We could embed a similar request collection inside the house collection, but that duplicates the request records. In this schema, a document with embedded collections also takes more disk space and may have to be split, because the maximum document size in MongoDB is 16MB. To avoid this, embedded collections should be kept small.
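A quick way to see how the embedded list drives document size is to estimate it for different list lengths. The sketch below uses the length of the JSON encoding as a rough proxy for the stored size, which is an assumption (the real BSON size differs), and the 200-character comment and the helper names are hypothetical.

```python
import json

MAX_DOC_BYTES = 16 * 1024 * 1024  # the 16MB per-document limit mentioned above


def approx_size(doc):
    """Rough estimate: UTF-8 length of the JSON encoding (not the exact BSON size)."""
    return len(json.dumps(doc).encode("utf-8"))


def make_user(n_requests):
    """Build a user document with n embedded requests (hypothetical values)."""
    request = {
        "_id": 0,
        "comment": "x" * 200,
        "house": {"_id": 0, "name": "h", "type": "t"},
    }
    return {
        "_id": 1,
        "name": "u",
        "password": "p",
        "requestList": [dict(request, _id=i) for i in range(n_requests)],
    }


def fits(doc):
    """True if the estimated size stays under the document limit."""
    return approx_size(doc) < MAX_DOC_BYTES


for n in (100, 10_000, 100_000):
    doc = make_user(n)
    print(n, approx_size(doc), fits(doc))
```

Under these assumptions a list of 100 requests is far below the limit, while a list of 100000 requests of this size would exceed it, which is why small embedded collections are preferable.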

##Discussion

We observed a multi-thread bottleneck in the Find test of MongoDB. We did not expect this, because NoSQL databases are generally considered to have the advantage over SQL databases in massive, distributed workloads, e.g. big data, large numbers of users, large numbers of computers, big supply chains, big science, and so on. We therefore suspect that our dataset is too small compared with such workloads, which would explain why adding more threads does not give a further performance improvement. We expect a much larger improvement when the deployment is scaled out with more infrastructure, such as a cluster of NoSQL servers.

##Conclusion

Single-thread Performance

According to the single-thread performance test, MongoDB performs better in insertion and deletion. The number of records in the database affects the deletion performance of MySQL but has little effect on the deletion performance of MongoDB. This is because MySQL relies on transaction and event mechanisms to keep data safe, which slows down insertion and deletion. MongoDB, in contrast, is a document-oriented database that stores data as JSON-like documents, and it applied no transaction or transaction-safety mechanism in our tests, so it usually performs better than MySQL in insertion and deletion. MySQL performs better in findbyID, and the number of records has no clear effect on the findbyID performance of either MySQL or MongoDB. The main reason is that MySQL's index is based on a B-Tree, a data structure that improves search performance.

####Multi-thread Performance

As the section above shows, using multiple threads for insertion and deletion on a single instance of a NoSQL database such as MongoDB improves the elapsed time only slightly. However, multi-threading has a significant effect on query performance: even on a single instance, performance improved by at least 50%. Further improvement could be achieved by scaling out to a cluster of servers.

Data Association

Association in MySQL always requires an extra column or table to store the key of the related table. Queries that span multiple tables with complicated SQL are inefficient, so unnecessary joins between large tables should be avoided. MongoDB avoids such joins by using embedded collections: the associated data is stored inside the document and can be located much faster, without querying other documents. This may cost extra space in exchange for efficient queries.
In a scenario that requires a heavily relational design, MySQL needs the table structure to be strictly defined, but with dynamic SQL queries we can easily associate data and map the relations. MongoDB, without SQL, has difficulty expressing relations between different collections. In a huge-scale scenario with few relations, however, queries are expected to stay inside a single table, where SQL loses its advantage. There MongoDB, with its embedded structure, is the better fit and is more flexible for future extensions of the data model.

Demo Link

https://www.youtube.com/watch?v=ILf0m_i8IV4

##Future Work

Because of limited time and equipment, the following work is left for the future.

- More powerful hardware will be used so that larger datasets can be tested and more precise conclusions can be reached.
- Besides the JUnit test, a front-end test will be included in our research, because it could yield more results for the multi-thread test of the SQL database.
- The performance of other NoSQL databases, such as Cassandra, will be evaluated.
- Hardware extension: MySQL consumes a large amount of hardware resources, so its performance is expected to improve with better hardware. MongoDB also supports distributed deployment, in which the database runs on multiple nodes, and its performance there is expected to differ from that of a single node.

##References

[1] Differences between SQL and NoSQL, http://www.thewindowsclub.com/difference-sql-nosql-comparision

[2] SQL VS. NoSQL, http://www.thegeekstuff.com/2014/01/sql-vs-nosql-db/

[3] Ramon Lawrence, “Integration and Virtualization of Relational SQL and NoSQL Systems including MySQL and MongoDB”, in International Conference on Computational Science and Computational Intelligence, 2014.

[4] Christoforos Hadjigeorgiou, “RDBMS vs NoSQL: Performance and Scaling Comparison”, in MSc in High Performance Computing, 2013.

[5] Alexandru Boicea, Florin Radulescu, and Laura Ioana Agapin, “MongoDB vs Oracle - database comparison”, in 2012 Third International Conference on Emerging Intelligent Data and Web Technologies.

[6] Yishan Li, Sathiamoorthy Manoharan, “A performance comparison of SQL and NoSQL databases”, in IEEE 2013.

[7] Spring framework reference, http://docs.spring.io/spring/docs/current/spring-framework-reference/html/mvc.html

[8] Spring Framework, http://www.tutorialspoint.com/spring/spring_web_mvc_framework.htm

[9] Introduction to Spring MVC, http://crunchify.com/spring-mvc-introduction-to-spring-3-mvc-framework/

[10] Introduction to Hibernate, http://www.java2blog.com/2013/01/introduction-to-hibernate-framework.html

[11] Hibernate Tutorial, https://docs.jboss.org/hibernate/orm/3.2/reference/en/html/tutorial.html
