Data Science Projects repository

Welcome to my Data Science Projects repository on GitHub. Here you will find files that may be useful for your own data science projects.

Machine Learning with Scikit-Learn and TensorFlow

Emotion and identity detection from face images

Convolutional neural network

In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep, feed-forward artificial neural networks, most commonly applied to analyzing visual imagery.
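
To make the "convolutional" part concrete, here is a minimal plain-Python sketch of the sliding-window operation a convolutional layer applies to an image. It is not taken from this project: the function name `conv2d_valid`, the 4x4 image, and the 2x2 kernel are all illustrative assumptions, and deep learning frameworks implement the same idea (strictly speaking a cross-correlation) far more efficiently.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 grayscale patch with a vertical edge in the middle.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 kernel that responds to left-to-right intensity increases.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The non-zero column in the output marks where the edge sits. In a CNN the kernel values are learned during training rather than written by hand.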

This project includes code that lets users create a dataset from a collection of images and prepare the train and test files for the model.

Train Test Split (Scikit-Learn)
Training the model
Evaluation of the model
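
The split and evaluation steps above can be sketched as follows. The project itself uses Scikit-Learn for the split; this stand-in uses only the standard library so that it runs anywhere. The helper names `split_train_test` and `accuracy`, the file names, and the labels are hypothetical.

```python
import random


def split_train_test(samples, test_fraction=0.25, seed=42):
    """Shuffle the samples reproducibly and hold out a fraction for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical (image file, emotion label) pairs.
samples = [(f"face_{i:03d}.png", "happy" if i % 2 == 0 else "sad") for i in range(20)]
train, test = split_train_test(samples)
print(len(train), len(test))  # 15 5
```

Holding out the test images before training keeps the evaluation honest: the model never sees them until `accuracy` is computed on its predictions.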

Source:

Machine Learning on the Iris dataset (classification model)

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs.

Algorithms:

K-nearest Neighbours (KNN)
Logistic regression
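
As an illustration of the first algorithm, the sketch below implements k-nearest neighbours by hand on a few Iris-style measurements (sepal length, sepal width, petal length, petal width, in cm). It deliberately avoids Scikit-Learn so that the voting logic stays visible; the function name `knn_predict` and the six sample rows are illustrative, not the project's code or data files.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Illustrative measurements in the style of the Iris dataset.
train_X = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
]
train_y = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

print(knn_predict(train_X, train_y, [5.0, 3.4, 1.5, 0.2]))  # setosa
print(knn_predict(train_X, train_y, [6.5, 3.0, 5.5, 2.0]))  # virginica
```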

Source:

https://www.dataschool.io/

Machine Learning on Weather Data (classification model)

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs.

Algorithms:

Decision Tree
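
A decision tree is grown by repeatedly choosing the split that makes the resulting groups as pure as possible. The sketch below shows one such choice on a single feature, using Gini impurity. The pressure readings, the "rain"/"dry" labels, and the function names are hypothetical and are not taken from the course dataset.

```python
def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher for a mixed one."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Try midpoints between sorted values; return (threshold, weighted impurity)."""
    best = (None, float("inf"))
    candidates = sorted(set(values))
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical morning air-pressure readings (hPa) and the weather that followed.
pressure = [1002, 1005, 1008, 1015, 1018, 1021]
outcome = ["rain", "rain", "rain", "dry", "dry", "dry"]
print(best_threshold(pressure, outcome))  # (1011.5, 0.0)
```

A full tree applies the same search recursively to each side of the chosen split, across all available features.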

Source:

UCSanDiegoX: Python for Data Science

Regression Analysis using Machine Learning

Regression analysis consists of a set of machine learning methods that allow us to predict a continuous outcome variable (y) based on the value of one or multiple predictor variables (x).
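
For the simplest case, one predictor and a straight line, the fit can be written out directly with ordinary least squares. This is a minimal sketch with made-up numbers; `fit_line` is an illustrative name and not part of this repository.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
print(slope * 5 + intercept)  # predicted y for x = 5: 11.0
```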

Source:

Machine Learning on Kaggle Soccer Dataset (clustering model)

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean.

Algorithms:

k-means clustering
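
The assign-then-update loop behind k-means can be sketched in a few lines. The points and starting centroids below are hypothetical two-dimensional values chosen for readability, not soccer data, and `kmeans` is an illustrative helper rather than the project's implementation.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign points to the nearest centroid, then recompute means."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                # Mean of each coordinate over the cluster members.
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D observations forming two visible groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
print(clusters[0])  # [(1, 1), (1.5, 2), (1, 0.5)]
print(clusters[1])  # [(8, 8), (9, 9), (8, 9.5)]
```

The result depends on the starting centroids, which is why library implementations usually run several random initialisations and keep the best one.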

Source:

UCSanDiegoX: Python for Data Science
https://en.wikipedia.org/wiki/Cluster_analysis
https://en.wikipedia.org/wiki/K-means_clustering

Anomaly Detection (KDD CUP 99 network intrusion data)

Anomaly detection is relevant in many application domains, where it often provides critical, actionable information. Isolation Forest is an outlier detection technique that isolates anomalies directly rather than modelling normal observations. Like Random Forest, it is built on an ensemble of binary (isolation) trees.

Algorithms:

Isolation Forest
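
The intuition is that an anomaly needs fewer random splits to be separated from the rest of the data. The sketch below demonstrates this on one-dimensional values. It is a deliberate simplification: a real Isolation Forest also subsamples the data, picks a random feature at each split, and converts path lengths into a normalised score. The values and function names are hypothetical.

```python
import random


def isolation_depth(x, data, rng, depth=0, max_depth=10):
    """Count the random splits needed to separate `x` from the other values."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the values that fall on the same side of the split as x.
    same_side = [v for v in data if (v < split) == (x < split)]
    return isolation_depth(x, same_side, rng, depth + 1, max_depth)


def mean_depth(x, data, n_trees=200, seed=0):
    """Average isolation depth of `x` over many random trees."""
    rng = random.Random(seed)
    return sum(isolation_depth(x, data, rng) for _ in range(n_trees)) / n_trees


# Hypothetical measurements: seven similar values and one obvious outlier.
values = [10.0, 10.2, 10.5, 11.0, 11.5, 12.0, 9.5, 50.0]
outlier_depth = mean_depth(50.0, values)
normal_depth = mean_depth(10.5, values)
print(outlier_depth < normal_depth)  # True
```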

Source:

Time Series Analysis - ARIMA Forecasting

A time series is a sequence of data points recorded at constant time intervals. It is analyzed to determine the long-term trend, to forecast future values, or to perform other forms of analysis. ARIMA models are applied in cases where the data show evidence of non-stationarity: an initial differencing step (the "integrated" part of the model) can be applied one or more times to eliminate it.
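
The differencing step can be illustrated without any library. In this sketch the series is a made-up sequence with a curved trend, and `difference` is an illustrative helper. The `order` argument plays the role of the d parameter in ARIMA(p, d, q).

```python
def difference(series, order=1):
    """Replace each value by its change from the previous one, `order` times."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


# Made-up series with a quadratic trend (clearly non-stationary).
series = [1, 4, 9, 16, 25, 36]
print(difference(series))           # [3, 5, 7, 9, 11]  (still trending)
print(difference(series, order=2))  # [2, 2, 2, 2]      (constant, trend removed)
```

Each round of differencing shortens the series by one point, so the fitted model forecasts differences that must be summed back to recover the original scale.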

Source:

Multivariate Time Series Forecasting - Long Short-Term Memory (LSTM)

Multivariate time series analysis considers more than one time-dependent variable simultaneously. Each variable depends not only on its own past values but also, to some degree, on the other variables. Long Short-Term Memory (LSTM) recurrent neural networks can model problems with multiple input variables almost seamlessly.
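
Before an LSTM can be trained, the series is usually reframed as overlapping windows of shape (samples, time steps, features), each paired with the value to predict. The sketch below shows that framing in plain Python. The two-variable rows and the helper name `make_windows` are assumptions for illustration, not the project's preprocessing code.

```python
def make_windows(rows, n_steps, target_index=0):
    """Turn a multivariate series into (window, next target value) pairs."""
    X, y = [], []
    for start in range(len(rows) - n_steps):
        X.append(rows[start:start + n_steps])           # n_steps rows, all variables
        y.append(rows[start + n_steps][target_index])   # target at the next time step
    return X, y


# Hypothetical series with two variables per time step.
rows = [[10, 1], [11, 2], [12, 3], [13, 4], [14, 5]]
X, y = make_windows(rows, n_steps=2)
print(len(X), len(X[0]), len(X[0][0]))  # 3 2 2  -> (samples, time steps, features)
print(y)  # [12, 13, 14]
```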

Source:

Using Python to perform Data Manipulation or to automate tasks

ElasticSearch

This project lets users connect to an Elasticsearch server, extract data, and write it to a JSON file.

Elasticsearch is a full-text, distributed NoSQL database.

The code also records which indexes have already been scanned, so that later runs fetch only new entries.
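
One way such bookkeeping could work is a small state file that lists the indexes already processed. The sketch below shows only that part and does not talk to Elasticsearch. The function names and the state-file layout (a JSON array of index names) are assumptions and may differ from what the project does.

```python
import json
from pathlib import Path


def load_seen(state_path):
    """Return the set of index names recorded in the state file (empty if none)."""
    path = Path(state_path)
    if path.exists():
        return set(json.loads(path.read_text(encoding="utf-8")))
    return set()


def select_new(available, state_path):
    """Keep only the indexes that have not been scanned before, in their given order."""
    seen = load_seen(state_path)
    return [name for name in available if name not in seen]


def mark_scanned(names, state_path):
    """Add the given index names to the state file."""
    seen = load_seen(state_path) | set(names)
    Path(state_path).write_text(json.dumps(sorted(seen)), encoding="utf-8")
```

A run would call `select_new` with the index names reported by the server, export those indexes, and then call `mark_scanned` so the next run skips them.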

You can use the JSON to CSV project to convert the output into a structured table format.
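
A conversion of this kind can be sketched with the standard `json` and `csv` modules. The example assumes the JSON file holds a flat list of objects; the record fields (`id`, `host`, `status`) are made up, and `json_to_csv` is an illustrative function rather than the script in this repository.

```python
import csv
import io
import json


def json_to_csv(json_text, out_file):
    """Write a JSON array of flat objects to `out_file` as CSV."""
    records = json.loads(json_text)
    # Collect every key that appears, in first-seen order, to form the header.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)


# Made-up records; the second one has an extra field.
sample = '[{"id": 1, "host": "web-1"}, {"id": 2, "host": "web-2", "status": 200}]'
buffer = io.StringIO()
json_to_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text.splitlines())  # ['id,host,status', '1,web-1,', '2,web-2,200']
```

Records that lack a field get an empty cell, so rows stay aligned even when documents do not share the same keys.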

Using Python to send Emails from Gmail

Custom sender address: because recipients are more likely to recognize your own email address, your message is more likely to be delivered and less likely to be caught by spam filters.
Custom subject field: you can set up an interactive subject field.
Recipients defined by a mailing list (CSV file): the project extracts information from a mailing list and groups recipients, for example by company.
Language-specific content: different content can be sent according to the language given in the mailing list.
HTML messages: the project lets you import HTML content and send it in the selected language.
Attachments: the project lets you send more than one attachment, of any file type, addressed to the correct recipient group.
Usage log: the project creates a log for further reference.
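
The grouping and message-building features listed above can be sketched with the standard `csv` and `email` modules. Everything specific here is an assumption: the CSV column names (`email`, `company`, `language`), the addresses, the subjects, and the attachment bytes are placeholders, and the sketch builds messages without sending them.

```python
import csv
import io
from collections import defaultdict
from email.message import EmailMessage


def group_recipients(csv_file):
    """Group addresses from the mailing list by (company, language)."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[(row["company"], row["language"])].append(row["email"])
    return groups


def build_message(sender, recipients, subject, html_body, attachments=()):
    """Create an HTML message with a plain-text fallback and optional attachments."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content("This message is best viewed in an HTML-capable mail client.")
    msg.add_alternative(html_body, subtype="html")
    for filename, data in attachments:
        # Generic binary type, so any file type can be attached.
        msg.add_attachment(data, maintype="application",
                           subtype="octet-stream", filename=filename)
    return msg


# Placeholder mailing list and per-language content.
mailing_list = io.StringIO(
    "email,company,language\n"
    "ana@example.com,Acme,en\n"
    "bob@example.com,Acme,en\n"
    "carla@example.org,Globex,pt\n"
)
subjects = {"en": "Monthly report", "pt": "Relatorio mensal"}
bodies = {"en": "<p>Hello!</p>", "pt": "<p>Ola!</p>"}

messages = [
    build_message("me@example.com", emails, subjects[lang], bodies[lang],
                  attachments=[("report.bin", b"placeholder bytes")])
    for (company, lang), emails in group_recipients(mailing_list).items()
]
print(len(messages))  # 2
```

To actually send through Gmail, each message would typically be passed to `smtplib.SMTP_SSL("smtp.gmail.com", 465)` after logging in, using `send_message`. Credentials and logging are left out of this sketch.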

Data Analytics with Python by Web scraping

This project lets you extract information from a website using Python (BeautifulSoup).

BeautifulSoup is a Python library that helps you navigate, search, and modify the parse tree of an HTML or XML document.

This information will be presented in a word cloud visualization.
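
A word cloud is drawn from word frequencies, so the core of the pipeline is "HTML in, word counts out". The project uses BeautifulSoup for the parsing step; the stand-in below uses the standard library's `html.parser` instead, so that it runs without extra packages. The HTML snippet, the stopword list, and the names `TextExtractor` and `word_frequencies` are all illustrative.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def word_frequencies(html, stopwords=frozenset({"the", "and", "of", "a", "to", "in", "is"})):
    """Count lower-cased words in the page text, ignoring common stopwords."""
    parser = TextExtractor()
    parser.feed(html)
    words = re.findall(r"[a-z']+", " ".join(parser.chunks).lower())
    return Counter(w for w in words if w not in stopwords)


# Illustrative page; a real run would download the HTML first.
html = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<h1>Data science</h1>"
    "<p>Data is the new oil, and data needs cleaning.</p>"
    "</body></html>"
)
frequencies = word_frequencies(html)
print(frequencies.most_common(1))  # [('data', 3)]
```

The resulting counts are what a word cloud renderer scales its font sizes by.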
