Modern Data Science

This course (unit) was originally designed for elite-class Bachelor's and Master's students at several top Asia-Pacific universities, including Deakin University (SIT742) and its partner university Southwest University (since 2015).
Materials in this module include resources collected from various open-source online repositories.
If you find any issue or bug in this document, please submit an issue at
Prerequisite unit 👉 :
Subsequent unit 👉 :
Pull requests are welcome:
Point of Contact 👉 : Prof. Gang Li

Prepared by 🌷 TULIP Lab

💡 Content

Designed primarily for aspiring data scientists, this course (aka unit) lays the foundational groundwork for modern methods, techniques, and applications in data science. Upon successful completion, students will be able to utilize distributed storage and computing platforms to process and analyze big data, employing contemporary techniques in data analytics.

The learning activities in this course are structured to help students develop knowledge and skills in working with tabular data and the systems that hold it, such as relational databases and distributed storage systems, with a focus on platforms like Apache Spark. In data analytics, students will explore various data mining and machine learning methods. Students will also have the opportunity to study advanced concepts such as differential privacy and frequent pattern discovery using association rule mining algorithms.

This course offers a blend of theory and practical application, aimed at providing a comprehensive mathematical toolkit essential for future data scientists.

📒 Modules

Students will have access to a comprehensive range of subject materials, comprising slide handouts and relevant readings. Students should begin each session by thoroughly reviewing the relevant slide handouts and readings to gain a solid understanding of the content.

Additionally, students are encouraged to supplement their knowledge through independent research, using online resources or textbooks that cover the topics under study.

This unit requires 44 class hours in total: 22 hours of lectures and 22 hours of workshops, interactive discussions, or student presentations.

🗓️ Lectures Plan

The unit's lecture plan is organized around 6 modules, as shown below:

㊙️

All lecture slide handouts are password protected and available to Deakin SIT742 students on the CloudDeakin site.

🔬 Module	🏷️ Category	📒 Topic	🎯 ULOs
0️⃣	Preliminary	📖 Unit Induction	ULO1
1️⃣	Preliminary	📖 Python Foundations for Big Data	ULO1, ULO2
2️⃣	Core	📖 Big Data	ULO2, ULO3
3️⃣	Core	📖 Big Data Manipulation	ULO4, ULO5
4️⃣	Core	📖 Big Data Analytics	ULO4, ULO5
5️⃣	Advanced	📖 Advanced Topics in Big Data	ULO1, ULO3

🗓️ Workshop Plan

The repository of this unit's workshop (practical classes) can be found at:

We recommend completing the practicals associated with every module. You may install Python and Apache Spark on your own machine, but it is much easier to run the materials on a cloud platform, such as:

Google Colab: used in SIT742 practical classes.
Databricks (Community version): provided by the company founded by the original creators of Apache Spark.

Students come into this unit with varied technical backgrounds, so you may schedule your own study based on your available time and background. We assume no prior knowledge of Python programming, though some understanding of computer programming will be helpful. The following is our recommended practical schedule.

🔬 Session	🏷️ Category	📒 Topic
1️⃣	📖 Unit Induction	M02C, M02D
2️⃣	📖 Python Foundations for Data Science	M02E, M02F
3️⃣	📖 Python Foundations for Data Science	M02G, M02H
4️⃣	📖 Big Data	M03D, M03E
5️⃣	📖 Big Data	M03X, M04A
6️⃣	📖 Data Manipulation	M04B, M04F
7️⃣	📖 Data Manipulation	M04G, M04H
8️⃣	📖 Data Analytics	M05A, M05B
9️⃣	📖 Data Analytics	M05C, M05E
🔟	📖 Advanced Topics in Data Science	M06A, M06B
🏆	📖 Advanced Topics in Data Science	M06D, M06E

🈵 Assessment

Each student cohort may be assessed differently, depending on the specific requirements of your unit chair (a professor at your university).

The assessment mainly aims to measure students' achievement of the Unit Learning Outcomes (ULOs, a.k.a. objectives) and to check their mastery of the theory and methods covered in the unit.

📖 Assessment Plan

The detailed assessment specification and marking rubrics can be found at: M00D-Assessment. The relationship between each assessment task and the ULOs is shown as follows:

🔬 Task	👨‍🏫 Category	🎯 ULO1	🎯 ULO2	🎯 ULO3	Percentage
1️⃣	Presentation	50%	25%	25%	30%
2️⃣	Project	25%	50%	25%	50%
3️⃣	Other	33%	33%	34%	20%
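
The table above can be read as a two-level weighting. As a minimal sketch, assuming each ULO column gives the share of that task's marks attributed to the ULO (an interpretation, not something the unit confirms), the overall contribution of each ULO to the final mark could be worked out as follows. The names `task_weights`, `ulo_shares`, and `ulo_contributions` are hypothetical and used only for illustration.

```python
# Task weights as a percentage of the final mark (from the assessment table).
task_weights = {"Presentation": 30, "Project": 50, "Other": 20}

# Assumed reading of the table: percentage of each task attributed to each ULO.
ulo_shares = {
    "Presentation": {"ULO1": 50, "ULO2": 25, "ULO3": 25},
    "Project": {"ULO1": 25, "ULO2": 50, "ULO3": 25},
    "Other": {"ULO1": 33, "ULO2": 33, "ULO3": 34},
}


def ulo_contributions(weights, shares):
    """Return each ULO's contribution to the final mark, in percent."""
    totals = {}
    for task, weight in weights.items():
        for ulo, share in shares[task].items():
            # weight% of the final mark, of which share% belongs to this ULO
            totals[ulo] = totals.get(ulo, 0.0) + weight * share / 100
    return totals


contributions = ulo_contributions(task_weights, ulo_shares)
for ulo, value in sorted(contributions.items()):
    print(f"{ulo}: {value:.1f}% of the final mark")
```

Under this reading, the three contributions sum to 100% of the final mark, with ULO2 carrying the largest share because the Project is the heaviest task.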

🗓️ Submission Due Dates

2024: The final assessment files are due on 🗓️ Saturday, 27/07/2024 (tentative). All tasks are individual work (groups of one member only).

You are expected to submit each assessment component on time. Do not leave everything to the last moment, because we will provide feedback that you are expected to use in later assessments.

㊙️

If you find that you are having trouble meeting your deadlines, contact the Unit Chair.

📚 References

This course recommends several key references:

Mining of Massive Datasets, by Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman
Doing Data Science: Straight Talk from the Frontline, by Cathy O'Neil, Rachel Schutt
Learning Spark: Lightning-Fast Data Analytics, by Jules S. Damji, et al.

👉 Contributors

Thanks go to these wonderful people 🌷

Made with contributors-img.

