Skip to content

CSC510/SQLvsNOSQL

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SQLvsNOSQL

Comparison between MySQL and MongoDB

##Background The basic concept of SQL database is Relational database. The definition of relational database is that it strictly uses relations to store data. The way that a relational database matches data is that it uses common characteristics found in the dataset. In a table of a relational database, data is divided into sets of rows and columns. One example of relational database is Microsoft SQL server. It is a set of tables containing data fitted into predefined categories. Each table contains one or more data categories in columns, and each row contains a unique instance of data for the categories defined by the columns. Without knowing the structure of the database table, users can still have access data from the database. But SQL is quite expensive and difficult to scale, because the scaling of relational database has to be distributed on to multiple servers, and handling tables across different servers is a chaos [1]. Also, data in SQL server has to fit into tables, and it will be very complex to design database structure if data doesn’t fit into tables.

While SQL database is widely used, in the past few years, a great variety of alternative databases emerged because of the doubt of “one size fits all” thinking concerning data stores. These new database is commonly subsumed under the term of NoSQL. The basic characteristic of NoSQL is that, it would not require fixed table. Moreover, NoSQL database usually avoid join operations, and typically scale horizontally. Another important characteristic of NoSQL is that it trades off “ACID” (atomicity, consistency, isolation and durability), and it allows the schema of data differ from record to record to vary degrees[2]. Here are some advantages of NoSQL database: 1) data can be inserted into database without defining a rigid database schema. 2) data is automatically spread onto several servers without the help of other applications. 3) NoSQL cache data in system memory to increase performance. There are three types of popular NoSQL databases: key-value stores, column-oriented database, and document-based stores.

The comparison of SQL and NoSQL database is a hot buzz in the air for a pretty long time. In the paper A performance comparison of SQL and NoSQL databases, Yishan Li and Sathiamoorthy Manoharan found that while NoSQL databases are generally good at storing key-value data, not all NoSQL databases perform better than SQL database. They also observed that in NoSQL database, different types of operation will lead to various performance [3]. In the paper RDBMS vs NoSQL: Performance and Scaling Comparison, Christoforos Hadjigeorgiou found that MongoDB can perform much better for complicated queries at the cost of data duplication. Another conclusion the author proposed is that MySQL performs best at deletion whereas MongoDB excels at inserting documents [4].

##Goals The focus of our report is to compare the single thread and multiple threads performance, and join table performance of MySQL and MongoDB. We compare read, find, and delete operations, with different amount of data stored in database. The rest of the paper is organized as follows. In Section Differences between SQL and NoSQL we will introduce the major differences between the two databases. In Section Testing System Design the structure of our project will be introduced, and in Section Test methods two major approaches used to test the two databases in details. At last, our test result will be discussed and analyzed.

##Differences between SQL and NoSQL Several major differences between MySQL and MongoDB are presented as following.

###Memory Usage While MySQL has quite reasonable memory usage mechanism, MongoDB is very good at running out of memory. When reading, the data in memory perform as cache, and when writing, memory can turn random writing operation into order writing operation, both of which lead to a high promotion of performance. However, the mechanism of MongoDB limits the control of memory usage, which would lead to a high memory occupation. When there is still enough memory, MongoDB performs satisfying reading and writing. But its performance reduces severely when memory is almost running out.

###Scalability As we know, MySQL is a kind of relational database, which requires that the data in its server has to fit into tables, and it can be difficult to design structure if data doesn’t fit into tables. The characteristic of MySQL result in its poor scalability, so that it can be inconvenient to use MySQL when data is complex and diverse. However, as a kind of NoSQL database, MongoDB is quite flexible and has an acceptable scalability. Because MongoDB doesn’t require fixed tables, it can store variety of data as requirement of users [3]. MySQL databases are vertically scalable, it is scaled by increasing the horsepower of the hardware, whereas the NoSQL databases are horizontally scalable, and it is scaled by increasing the databases servers in the pool of resources to reduce the load.

###Complex query performance MongoDB could handle more complicated queries faster, mainly because of its simpler schema. However, the awesome complex query performance of MongoDB is at the cost of data duplication, which means that it may contain large amounts of data duplicates. Moreover, because of the use of subdocuments, MongoDB shows advantage over MySQL when combine complex queries with some subqueries. If complex queries are typical in an application, then MongoDB is a good choice while taking in account the cost in storage and memory.

###Single thread VS. Multiple threads Another important aspect of difference between MySQL and MongoDB is their performance in single thread and multiple threads. When the same types of queries are performed, results of 1, 2 or more threads can be measured in scale of time. The performance is highly dependent on different queries used in tests, and might not be monotonic with numbers of threads, due to results in RDBMS vs NoSQL: Performance and Scaling Comparison. Also, beyond 4 threads both databases’ performance show declining and no apparent advantage of one over the other[4].

###Data Association In a SQL database, when creating a database, common sense dictates that we use separate tables for different types of entities. Some examples are: customers, orders, items, messages etc... But we also need to have relationships between these tables. For instance, customers make orders, and orders contain items. These relationships need to be represented in the database. Also, when fetching data with SQL, we need to use certain types of JOIN queries to get what we need. In the MongoDB, a data model uses embedded documents to describe relationships between connected data. Decisions that affect how you model data can affect application performance and database capacity.

###Security In discussing security, it is necessary to consider fully protecting the entire server host against all types of applicable attacks: eavesdropping, altering, playback, and denial of service. While MySQL performs quite safely, MongoDB still shows some doubts around security issues. By default MongoDB installs with no password credentials, which leads to an authentication weakness. There are also authorization weaknesses in MongoDB, for the reason that any created user has read-only access to the whole database. Besides, there are also admin authorization weakness, multiple interfaces weakness, and some other security issues shown in MongoDB.

###Other differences MySQL supports for atomic transactions. The ability to contain multiple operations within a transaction and rollback the whole thing as if it were a single operation. MongoDB does not support transactions, but single operations are atomic. MySQL uses specific language for data manipulation, e.g. Select, Insert, and Update statements, while NoSQL finish tasks through object-oriented APIs[5]. As for consistency, MySQL can be configured for strong consistency, while NoSQL depends on different products[6].

             |     MySQL     |     MongoDB
------------ | ------------- | -------------
Memory Usage |	reasonable memory usage | Need enough memory to keep efficient
Complex query performance |	unsatisfying complex query performance | handle more complicated queries faster
Single thread VS. Multiple thread |   vary with different operations    |   no apparent advantage over 4 threads
Data association  |  support dynamic SQL query like join tables   |  support embedded collections
Security | give a quite safe performance | still have some security issues
Supports transactions | supports for atomic transactions | does not support transactions but single operations are atomic
Data manipulation | Specific language using Select, Insert, and Update | Through object-oriented APIs
Consistency | Can be configured for strong consistency | Depends on products

##Testing System Design We designed the DAO layer for two databases MySQL and MongoDB. Based on DAO layer, the test environment is set up using spring test and Junit. In this section, Spring, Spring Data Mongo and Hibernate will be introduced.

###Spring Spring is a popular application framework written in Java and is used by many developers to create quality applications. This framework consists of many parts which provide different services, and it helps developers to pay attention to the business logic in a proper manner[8]. The technology that Spring is most identified with is the Dependency Injection (DI) flavor of Inversion of Control. Dependency Injection (or sometime called wiring) helps in gluing different classes together and same time keeping them independent. Testing an application written with Spring is simple because environment-dependent code is moved into this framework. Furthermore, by using JavaBean-style POJOs, it becomes easier to use dependency injection for injecting test data[9].

###Spring Data MongoDB The Spring Data MongoDB project provides integration with the MongoDB document database. Key functional areas of Spring Data MongoDB are a POJO centric model for interacting with a MongoDB DBCollection and easily writing a Repository style data access layer.

The class MongoTemplate is the central class of the Spring’s MongoDB support providing a rich feature set to interact with the database. The template offers convenience operations to create, update, delete and query for MongoDB documents and provides a mapping between your domain objects and MongoDB documents.

###Hibernate Another tool we use in our project is hibernate. Hibernate is an object – relational mapping and persistence framework for Java that allows developers to map plain old Java objects to relational database tables. Hibernate mainly aims at relieving the developer from the common data persistence related tasks[10]. With the help of hibernate, developers can get maximum effects of data query and retrieval facilities, because hibernate maps the objects in Java with tables in database in an efficient manner. In a word, hibernate provides improved productivity, performance, maintainability, and portability[11].

A Session in Hibernate is used to get a physical connection with a database. The Session object is lightweight and designed to be instantiated each time an interaction is needed with the database. Persistent objects are saved and retrieved through a Session object.The session objects should not be kept open for a long time because they are not usually thread safe and they should be created and destroyed them as needed.

##Testing Methods In previous section, several differences between MySQL and MongoDB are listed. Among them we decide to choose multiple threads performance and join performance to test the two database. In this section, our test methods will be discussed in details.

###Single thread and multiple threads test In single thread test, we use three basic queries of the two database: add, find, and delete. When adding, we insert random data into two databases. As for finding, at first we insert a certain amount of data into database, and then execute find operation to find data in the database. And when deleting, we also insert data first, then delete operation is operated to delete specific data. The performance of all the three operations are measured by time.

For the add operation, we suppose that the shorter time one database will use to finish all the adding tasks, the better the database performs. We set the amount of data as independent variable, and the time needed as dependent variable. When the number of data needed to be inserted increase, we choose the database which needs less time as the winner.

For the find operation, the prerequisite is that there have already existed data in database. Several find operations will be performed, and the time needed will be measured. We divide the measurement into two aspects. First, we keep the number of data in database stable, and increase the number of data needed to be found. This approach could help us know which database is more suitable for heavy find load, when the number of data remains unchanged. Another method is that we keep the number of data need to be found stable, and change the number of data which has already existed in database. In this way we could find out whether the database is more suitable for finding targets in a large number of data.

We have a similar approach for measuring the efficiency of delete operation. First we keep the number of data in database invariable, and change the number of records need to be deleted, then check the reaction time of each database. The other approach is that we keep the number of records needs to be deleted stable, and increase the number of data in database, so that we could know which database performs better when data increases.

After finishing single thread test, we now start to work on multiple threads test. We decide to test database performance in 2, 4, and 8 threads. The approaches we used to test multiple threads are the same as those used in single thread test: add, find, and delete. The difference between single thread test and multiple threads test is that, we will perform the same operations in 2, 4, and 8 threads. The test result will show if multiple threads can cause any improvement or negative effects, and if one database could perform worse or better as the number of threads increases.

##Testing Result

In order to compare single thread performance of SQL and NoSQL databases, we choose MySQL(SQL) and MongoDB(NoSQL) to test insertion, findbyId and deletion performance by comparing their time cost. We have used different numbers of records and strategies to compare the performance between SQL and NoSQL.

####Single Thread insertion Performance Comparison

To test the insertion performance of MySQL and MongoDB, we had used a method of adding different size of records into database and compared their time cost. The number of records changes from 1000 to 200000, and we divided them into 9 groups and test their executing time by adding the same number of records into two databases at each time.

######Data

#Records 1000 5000 10000 20000 40000 80000 140000 160000 200000
MongoDB 1411 2263 2641 2668 5352 8600 13137 17449 21908
MySQL 703 2742 3089 5425 10452 15705 23175 29037 37783

We have tested 9 groups of records, and the number of records is from 1000 to 200000 (See Table). The data in the first row stands for different size of records, and the data in the second and third rows are time cost of adding different number of records into MongoDB and MySQL.

######Figure

######Analysis

According to above diagram, we can see MongoDB's line is under MySQL's line. It means MongoDB is faster than MySQL in insertion the same size of records into databases. Based on above test results, we conclude that MongoDB's adding performance is better than MySQL's adding performance.

####Single Thread Find Performance Comparison I

We have adopted two methods to test the find performance of MySQL and MongoDB. One method is to test the time cost in finding varied number of records from databases, in this case, the the number of records in the databases are fixed (200000). The other method is to test the time cost in finding fixed records (5000) from databases, the number of records in databases increases from 5000 to 200000.

######Data

#Records 1000 5000 10000 20000 40000 80000 140000 160000 200000
MongoDB 1369 1049 2827 3647 5775 10865 15659 18071 28114
MySQL 15 74 136 247 320 413 449 505 561

We have tested 9 groups of data, and the numbers of records change from 1000 to 200000 (See Table). The data in the first row stands for different records' size, the second and third rows are time cost for two databases. Each column between the second and third rows stands for time cost of finding the same number of records from MongoDB and MySQL.

######Figure

######Analysis

According to above diagram, the line of MongoDB is over the line of MySQL. It means that MySQL is faster than MongoDB in finding varied number of records in databases which have fixed number of records. We also see that the line of MySQL grows very slow in finding the increasing number of records from fixed number of records in database. We can get that the the size of records in database has little affection on MySQL's finding performance.

####Single Thread Find Performance Comparison II

######Data

#Records 5000 10000 20000 40000 80000 120000 160000 200000
MongoDB 1979 2310 1774 1921 2166 1944 2525 1889
MySQL 138 110 72 69 76 74 74 75

We have tested 8 groups of data. The data in the first row stands for different records' size, the second and third rows are time cost for two databases. Each column between the second and third rows stands for time cost of finding 5000 records from databases with increasing number of records from 5000 to 200000.

######Figure

######Analysis

According to above diagram, the line of MongoDB is over the line of MySQL. It means that MySQL is faster than MongoDB in finding 5000 records from databases with increasing number of records. We also see that the line of MySQL is smooth and level when they find a fixed number of records from increasing number of records in database, which means the size of records in database has limited affection on MySQL's findbyID performance.

####Single Thread Delete Performance Comparison I

In order to test the delete performance of MySQL and MongoDB, we have adopted two strategies to test their time cost. One way is to randomly delete fixed number of records from varied data size in databases, the other way is to randomly delete varied number of records from fixed records from databases.

######Data

#Records 1000 5000 10000 20000 40000 80000 140000 160000 200000
MongoDB 192 697 1167 1688 3221 4535 6310 9147 12319
MySQL 847 1947 2449 4478 8054 11940 15842 19022 21822

We have tested 9 groups of data. The data in the first row stands for different records' size of finding from databases, and each column between the second and third rows stands for time cost when finding the same number of records from MongoDB and MySQL.

######Figure

######Analysis

According to the above diagram, the line of MongoDB is under the line of MySQL. It means that MongoDB is faster than MySQL in deletion varied number of records from databases which have fixed number of records(200000).

####Single Thread Delete Performance Comparison II

######Data

#Records 5000 10000 20000 40000 80000 140000 160000 200000
MongoDB 1035 673 624 641 568 638 621 587
MySQL 1073 1144 1391 1570 1493 1800 1652 1947

We have tested 8 groups of data. The data in the first row stands for different records size from 5000 to 20000 in the databases, and each column between the second and third rows stands for the time cost in finding 5000 records from databases with varied size of records.

######Figure

######Analysis

According to above diagram, the blue line of MongoDB is under the the red line of MySQL. It means MongoDB has a better performance than MySQL in deletion fixed number of records (5000) from databases with varied number of records.

###More in NoSQL DB (Multi-Thread Test)

In order to test the multi-thread performance of NoSQL databases, we have picked the Mongo Database and tested its insert, findbyId, delete performance by comparing its time cost in different threads. We have used single thread, dual threads, four threads and eight threads to execute query so that we could see whether using multiple threads is an effective way to save time in Mongo like NOSQL databases.

####Multi-Thread Add Performance

To test the insert performance, we have utilized the control variates method by using different number of threads to add the same number of records to the database. And after that we increase the records inserted to the database and check the time for inserting to see if there is any difference using different number of threads.

#####Data

#Threads/Data Inserted 1000 10000 20000 40000 80000 100000 140000 160000 200000
1 Thread 1228 5493 7793 10214 18614 21057 24557 23865 33061
2 Threads 1175 4707 6859 8906 15489 15341 18995 23215 28381
4 Threads 1188 5346 8120 12122 17250 18727 22316 26486 29256
8 Threads 1138 6076 10204 13968 18527 22397 24609 29854 32018

We have tested 9 groups of data, each column of the first row means the number of data that have been inserted into the database, from 1000 to 200000. And the corresponding column on the second to the fifth row is the number of time in millisecond that is spent on this insertion. For two or more threads, the time given is the last-finished time the threads take to perform the insertion.

For example, if the insert 1000 records of data using 4 threads, each of the threads will only insert 250 items of data, each taking 1020 ms, 1100 ms, 869 ms, 1188 ms. Therefore, the total amount of time showed on the table would the max of the four time which is 1188 ms. The insertion can not be defined as finished until the last thread finishes its job.

#####Figure #####Analysis

From the figure, we can easily see that the number of threads doesn't have much improvement on the time used to do the insertion. The time line travels across each other at times and increase linearly with the upsuring inserted records.

The reason why multi-thread has no effect on the insertion performance is probably because of the lock system of the database. When one thread has entered the critical section, other thread has to wait until the ongoing thread finish. In that case, the multiple threads just divide the task separetely and use almost the same amount of time as the single thread does. Therefore, for insertion, there is no point to use multi-thread to visit the database as multiple threads may consume more resources than single thread.

####Multi-Thread Find Performance

To test the find performance, we have implemented two test methods. Both of the methods have utilized the control variates method. One is to find certain number of records in the database while the number of records in the database is increasing. The other is to find an increasing number of records in a database with fixed number of records. Both of the methods compare the result with single thread and multi-thread. The time is counted to see if multi-thread execution has any benefit on the time factor.

Find Performance Test I

In Find Performance test I, the task is to find an increasing number of records in the database with 200000 records. The time of the query is calculated and compared with the result using two or more threads.

######Data

#Threads 1000 5000 10000 20000 40000 80000 120000 160000 200000
1 Thread 669 2556 4336 7670 10210 19119 23878 33536 40495
2 Threads 531 2205 3672 6132 7463 13545 16597 22391 24650
4 Threads 592 1863 3291 5376 8999 12298 19754 20307 26495
8 Threads 360 1479 2647 5427 7908 13954 18270 19893 26214

We have tested 9 groups of data, each column of the first row means the number of data that needs to be acquired through the query process, from 1000 to 200000. And the corresponding column on the second to the fifth row is the number of time in millisecond that is spent on this query. For two or more threads, the time given is the last-finished time of the threads take to perform the query.

######Figure

######Analysis

As you can see from the above figure, the time spend on the single thread is larger than multi-thread ones. It almost takes 50% more time than the multi-thread ones, which is 40000 ms compared to 25000 ms. From this prospective, we can infer that using multi-thread will improve the time factor on the query like execution on the Mongo like NoSQL database. However, we could also see that the time spent on query by two or more threads are almost the same, which means the performance gained through adding threads is limited. If you have already got two threads performing the query on the NoSQL databases, adding more threads won't bring you significant improvement as you may expect.

Find Performance Test II

In Find Performance test II, the task is to find 5000 records in the database while the number of records in the database is increasing, from 10000 to 200000. The time of the query is calculated and compared with the result using two or more threads.

######Data

#Threads 10000 20000 40000 80000 160000 200000
1 Thread 3024 4126 2715 2820 3626 3282
2 Threads 2577 2792 2538 2567 3544 3053
4 Threads 2054 2285 2080 2527 2849 2116
8 Threads 1788 1630 1889 1755 1916 2265

We have tested 7 groups of data, each column of the first row means the number of records existing in the database, from 10000 to 200000. And the corresponding column on the second to the fifth row is the number of time in millisecond that is spent on finding 5000 records in this database. For two or more threads, the time given is the last-finished time of the threads take to perform the query.

######Figure

######Analysis

We can infer from the above figure that the time spent on searching won't increase a lot as the amount of records existing in the database increases. The time is around 2000 ms to 4000 ms. Multiple threads do have some impact on the time cost of the query process, but it is not worthwhile. The most significant gap between single and multiple threads is 2000 ms which is 2 seconds. It won't be an issue for using another 2 seconds to find 5000 data in a database in worst-case scenario. However, if the number of records needs to found increases, maybe multi-thread way would be a more acceptable methods.

####Multi-Thread Delete Performance

To test the delete multi-thread performance of NoSQL database, we have implemented two test methods. Both of the methods have utilized the control variates method. One is to delete certain number of records in the database while the number of records in the database is increasing. The other is to delete an increasing number of records in a database with fixed number of records. Both of the methods compare the result with single thread and multi-thread. The time is counted to see if multi-thread execution has any benefit on the time factor.

Delete Performance Test I

In Delete Performance test I, the task is to delete an increasing number of records in a database with 200000 records. The time of the deletion is calculated and compared with the result using two or more threads.

######Data

#Threads 1000 5000 10000 20000 40000 80000 120000 160000 200000
1 Thread 202 819 1286 2380 5075 11026 20619 24445 31485
2 Threads 270 855 1507 2879 3934 5965 18703 25558 29463
4 Threads 398 862 1679 2398 5443 10588 21461 29038 34204
8 Threads 319 1290 1402 2304 4006 15926 21977 21817 26349

We have tested 9 groups of data, each column of the first row means the number of data that needs to be deleted through the deletion process, from 1000 to 200000. And the corresponding column on the second to the fifth row is the number of time in millisecond that is spent on this query. For two or more threads, the time given is the last-finished time of the threads take to perform the deletion.

######Figure

######Analysis

As we can infer from the figure, the time spent on the deletion increases with the number of delete operation increase, as we expected. The multiple threads don't help to shorten the time spend on the deletion. This is probably the same reason as in the insertion, the lock in the database prohibit multiple threads entering the critical section of the deletion process. Therefore, only one thread would be able to perform normally and others are set to wait until it finishes.

Delete Performance Test II

In Delete Performance test II, the task is to delete an 5000 records in the database while the number of records in the database is increasing, from 10000 to 200000. The time of the deletion is calculated and compared with the result using two or more threads.

######Data

#Threads 10000 20000 40000 80000 120000 160000 200000
1 Thread 1418 1890 1481 1411 1986 1026 936
2 Threads 1238 1415 1024 1706 1595 934 1055
4 Threads 1412 1319 930 1371 1404 919 1053
8 Threads 1280 1487 1053 1190 996 1233 1186

We have tested 7 groups of data, each column of the first row means the number of records existing in the database, from 10000 to 200000. And the corresponding column on the second to the fifth row is the number of time in millisecond that is spent on deleting 5000 records in this database. For two or more threads, the time given is the last-finished time of the threads take to perform the query.

######Figure

######Analysis

As we can infer from the above figure, the time in performing this task is fairly small, only takes about 1000 ms to 2000 ms. And we cannot see any performance gain by using multi-thread methods. I think the reason is the same as mentioned in the former test methods that the lock in the database prohibits the multiple threads from enterring into its critical section.

###Data Association

###Relation Mapping

  • Relational design in MySQL
 
one to manymany to many
- Embedded document in MongoDB In MongoDB, we would apply following model structure to represent one to many relation.

{
	"_id":
	"name":
	"password":
	"requestList":{
		{ 
		   "_id":
		   "comment":
		   "house":{
		   	  "_id":
		   	  "name":
		   	  "type":
		   }
		},
		{ 
		   "_id":
		   "comment":
		   "house":{
		   	  "_id":
		   	  "name":
		   	  "type":
		   }
		}
	}
}

Search in mulitple tables in MySQL

In MySQL, we must define separate tables to represent different models and each column should be the smallest unit. To associate the relationship between tables, foreign key or join table are applied under the schema. In this scenario, searching in MySQL involves several tables using complicated query like join.

Data
     
#Records
(request*100)
1001000 10000
user+request 73 597 4937
reqeust+house_request+house 456 7473 88244

The records is the size of "house" and "user" table, each house and user record associated with 100 requests. To search a user's request list, using the sql "select * from request where user_id=?". To map the relation between request and house, MySQL supports join table and the sql would be like this "select * from request inner join house_request on request.id=house_request.request_id inner join house on house.id=house_request.house_id where house.id=?"

Analysis

As we can see, searching requests responding to house takes 10+ times than searching user's requests. Notice that when records of user and house is 10000, the request table size is 100*10000. Querying user's requests only search in the "request" table while house's requests would take 2 join operations which consumes large time. The result of join even generates larger size of result set. Query in that result set pretends to be extremely slow. Generally, query across multiple tables using sql operations would take much time than single table. The situation get worse when these table scaling up.

Search in embedded document in MongoDB

In MongoDB, each collection is stored as document. Each coloumn inside a collection also could be a collection. So we could use nested collections to store the relationship inside the document. A huge advantage over SQL when searching in MongoDB is only inside one document without joining.

Data
         
#Records100100010000100000
user 82 56 95 103
user.requestList 42 102 1407 22778
user.requestList.house 45 112 1891 72789

In MongoDB, each user record contains a size of 100 requestList collection. We search on 3 level in user collection. First we search a specific user on name using query "{'name':?}". Then we compared with query in embedded collection using embedded query "{'requestList.comment':?}". Further query on embedded collection inside requestList.house by using query "{'requestList.house.name':?}".

Analysis

There is no extra effort to find user's requests when we find a specific user and varies a little when scaling up. Any search takes in single collection. But with embedded query, searching would happen in each embedded collections inside the document.
Using the embedded collection, it is hard to find a set of requests responding to a specific house. We can insert such a embedded request collection inside the house collection which causes duplicate records on request. In this schema, the document with embedded collections also takes more space in disk and may causes split because the maximum document size in mongoDB is 16MB. To avoid this, We would expect a small size of embedded collection.

##Discussion

  • We have spotted that the Multi-thread bottle-neck in the Find test of MongoDB. However, we don't think it should be as NoSQL database definitely prevails SQL database when it deals with the massive distributed thing, eg. big data, big numbers of users, big numbers of computers, big supply chains, big science, and so on. Therefore, we are thinking if our dataset are too small compared to the BIG thing, which is the reason why adding more threads are not giving more performance improvement. It should have an overwhelming improvement when its scale get expanded by using more infrastructure such as a cluster of NoSQL server.

##Conclusion

Single-thread Performance

According to single-thread performance test, we can see that MongoDB has a better performance in Insertion and deletion, and the number of records in databases affects the deletion performance of MySQL, but has little affection on the deletion performance of MongoDB. That's because MySQL makes use of transaction and event mechanism in order to keep data safe. It will decrease the insertion and deletion performance. However, MongoDB is document-oriented database, which can store data as a JSON document. There is no transaction and transaction-safe mechanism in it. Therefore, it usually has a bettern performance than MySQL in insertion and deletion. MySQL has a better performance in findbyID, and the number of records in databases can't clearly affect the findbyID performance of MySQL and MongoDB. The main reason is that MySQL's index is based on B-Tree, this advanced data structure can improve the searching performance for MySQL.

####Multi-thread Performance

  • As we can see from the above Section, implementing multiple threads in insertion and deletion in Mongo like NoSQL database on a single instance has little improvement on time factors. However, this method have signficant influence on query performance, even on a single instance, the performance could be improved by at least 50%. More improvement could be achieved by utilizing expanded scale of cluster servers.

Data Association

Association in MySQL always require extra column or table to store key of related table. It is low efficient when query happens across multiple tables using complicated SQL. We have to avoid unnecessary join with different large size tables. MongoDB achieved this by using embedded collection. The association data is stored inside the document and much faster to locate without query for other documents. It may sacrifice extra space to achieve high efficiency in query.
In a scenario that requires much relational design, it is needed in MySQL to strictly define the structure of tables. With dynamic SQL query, we could easily associated data and map the relation. And MongoDB is hard to express the relations between different collections without SQL. But in a less related and huge scale scenario, query is expected inside single table which SQL lost its power. MongoDB is more suit using embedded structure and flexible to future extension in data model.

Demo Link

https://www.youtube.com/watch?v=ILf0m_i8IV4

##Future Work

For the restriction of time and equipment, some work will be included in our future tasks.

  • More powerful hardware will be implemented, so that more large data can be used in tests, as a result of which more precise conclusion can be reached.
  • Besides the JUnit Test, the front-end test will be included in our research, for the reason that it could bring more results on Multi-thread test of SQL database.
  • The performance of other supported NoSQL database such as Cassandra will be involved.
  • Hardware Extension: MySQL consumes large volume of hardware resource, it is expected a improvement in performance with hardware improvement. MongoDB also support distributed deployment which database run in multiple nodes. The performance is expected different with single node.

##References [1]Differences between SQL and NoSQL http://www.thewindowsclub.com/difference-sql-nosql-comparision

[2]SQL VS. NoSQL http://www.thegeekstuff.com/2014/01/sql-vs-nosql-db/

[3] Ramon Lawrence, “Integration and Virtualization of Relational SQL and NoSQL Systems including MySQL and MongoDB”, in International Conference on Computational Science and Computational Intelligence, 2014.

[4] Christoforos Hadjigeorgiou, “RDBMS vs NoSQL: Performance and Scaling Comparison”, in MSc in High Performance Computing, 2013.

[5] Alexandru Boicea, Florin Radulescu, and Laura Ioana Agapin, “MongoDB vs Oracle - database comparison”, in 2012 Third International Conference on Emerging Intelligent Data and Web Technologies.

[6] Yishan Li, Sathiamoorthy Manoharan, “A performance comparison of SQL and NoSQL databases”, in IEEE 2013.

[7]Spring framework reference: http://docs.spring.io/spring/docs/current/spring-framework-reference/html/mvc.html

[8]Spring Framework http://www.tutorialspoint.com/spring/spring_web_mvc_framework.htm

[9]Introduction to Spring MVC http://crunchify.com/spring-mvc-introduction-to-spring-3-mvc-framework/

[10]Introduction to Hibernate http://www.java2blog.com/2013/01/introduction-to-hibernate-framework.html

[11]Hibernate Tutorial https://docs.jboss.org/hibernate/orm/3.2/reference/en/html/tutorial.html

About

Comparison between SQL and MongoDB

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages