Intel® oneAPI Data Analytics Library (oneDAL) is a powerful machine learning library that helps speed up big data analysis. oneDAL solvers are also used in Intel® Distribution for Python to optimize scikit-learn.
Intel® oneAPI Data Analytics Library is an extension of Intel® Data Analytics Acceleration Library (Intel® DAAL).
- Build your high-performance data science application with Intel® oneDAL
- Python API
- Scikit-learn patching
- Distributed multi-node mode
- oneDAL Apache Spark MLlib samples
- Installation
- Installation from Source
- Examples
- Samples
- Documentation
- Technical Preview Features
- oneDAL and Intel® DAAL
Intel® oneDAL uses all capabilities of Intel® hardware, which allows you to get a significant performance boost on classic machine learning algorithms.
We provide highly optimized algorithmic building blocks for all stages of data analytics: preprocessing, transformation, analysis, modeling, validation, and decision making.
The current version of oneDAL provides Data Parallel C++ (DPC++) API extensions to the traditional C++ interface.
Data volumes are growing exponentially, and so is the need for high-performance, scalable frameworks that can analyze this data and extract value from it. Besides superior performance on a single node, the oneDAL distributed computation mode also provides excellent strong and weak scaling (see the charts below).
Intel® oneDAL K-means fit, strong scaling results | Intel® oneDAL K-means fit, weak scaling results |
---|---|
technical details: FPType: float32; HW: Intel Xeon Processor E5-2698 v3 @2.3GHz, 2 sockets, 16 cores per socket; SW: Intel® DAAL (2019.3), MPI4Py (3.0.0), Intel® Distribution Of Python (IDP) 3.6.8; Details available in the article https://arxiv.org/abs/1909.11822
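For readers unfamiliar with the two terms, here is a minimal sketch of how strong and weak scaling efficiency are commonly computed from wall-clock timings. The function names and the timing values are hypothetical and chosen only for illustration; they are not oneDAL measurements and do not reproduce the charts.

```python
def strong_scaling_efficiency(t_one_node, t_n_nodes, n_nodes):
    """Fixed total problem size: the ideal time on n nodes is t_one_node / n_nodes."""
    return t_one_node / (n_nodes * t_n_nodes)


def weak_scaling_efficiency(t_one_node, t_n_nodes):
    """Problem size grows with the node count: the ideal time stays at t_one_node."""
    return t_one_node / t_n_nodes


# Hypothetical timings in seconds, made up for illustration only
# (these are NOT oneDAL measurements).
strong = strong_scaling_efficiency(t_one_node=80.0, t_n_nodes=12.5, n_nodes=8)
weak = weak_scaling_efficiency(t_one_node=10.0, t_n_nodes=12.5)
print(f"strong scaling efficiency: {strong:.2f}")
print(f"weak scaling efficiency: {weak:.2f}")
```

An efficiency of 1.0 means ideal scaling in both cases; values below 1.0 reflect communication and synchronization overhead.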
Check out our examples and documentation for information about our API.
Intel® oneDAL has a Python API that is provided as a standalone Python library called daal4py. Below is an example of how daal4py can be used to compute K-means clusters:
import numpy as np
import pandas as pd
import daal4py as d4p
data = pd.read_csv("local_kmeans_data.csv", dtype = np.float32)
init_alg = d4p.kmeans_init(nClusters = 10,
                           fptype = "float",
                           method = "randomDense")
centroids = init_alg.compute(data).centroids
alg = d4p.kmeans(nClusters = 10, maxIterations = 50, fptype = "float",
                 accuracyThreshold = 0, assignFlag = False)
result = alg.compute(data, centroids)
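The example above has two stages: choose initial centroids, then refine them iteratively. As a rough illustration of what those stages compute, here is a plain-Python sketch of the textbook Lloyd's algorithm. It is a teaching aid under stated assumptions, not oneDAL's implementation: `kmeans_sketch` and `squared_distance` are hypothetical helper names, the toy data is made up, and the `accuracyThreshold` parameter is not modeled (the loop simply stops when no centroid moves).

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sketch(points, n_clusters, max_iterations, seed=0):
    """Plain Lloyd's algorithm; a teaching sketch, not oneDAL's implementation."""
    rng = random.Random(seed)
    # Initialization stage: pick distinct random points as starting centroids.
    centroids = [list(p) for p in rng.sample(points, n_clusters)]
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(n_clusters)]
        for p in points:
            nearest = min(range(n_clusters),
                          key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids


# Two well-separated toy groups (made-up data).
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
toy_centroids = kmeans_sketch(toy_points, n_clusters=2, max_iterations=50)
print(sorted(toy_centroids))
```

On this toy input the two centroids settle at the means of the two groups, whichever starting points are drawn.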
The Python interface to Intel® oneDAL provided by daal4py lets you create scikit-learn compatible estimators, transformers, and clusterers that are powered by oneDAL and nearly as efficient as native programs.
Speedups of Intel® oneDAL powered Scikit-learn over the original Scikit-learn, 28 cores, 1 thread/core |
---|
technical details: FPType: float32; HW: Intel(R) Xeon(R) Platinum 8276L CPU @ 2.20GHz, 2 sockets, 28 cores per socket; SW: scikit-learn 0.22.2, Intel® DAAL (2019.5), Intel® Distribution Of Python (IDP) 3.7.4; Details available in the article https://medium.com/intel-analytics-software/accelerate-your-scikit-learn-applications-a06cacf44912 |
daal4py has an API that matches the scikit-learn API. This allows you to speed up your existing projects by changing one line of code:
from daal4py.sklearn.svm import SVC
from sklearn.datasets import load_digits
digits = load_digits()
X, y = digits.data, digits.target
svm = SVC(kernel='rbf', gamma='scale', C = 0.5).fit(X, y)
print(svm.score(X, y))
In addition, daal4py can replace some scikit-learn methods with oneDAL solvers, which makes it possible to get a performance gain without any code changes. This approach is the basis of scikit-learn in Intel® Distribution for Python. You can patch stock scikit-learn with the following command-line flag:
python -m daal4py my_application.py
Patches can also be enabled programmatically:
from sklearn.svm import SVC
from sklearn.datasets import load_digits
from time import time
svm_sklearn = SVC(kernel="rbf", gamma="scale", C=0.5)
digits = load_digits()
X, y = digits.data, digits.target
start = time()
svm_sklearn = svm_sklearn.fit(X, y)
end = time()
print(end - start) # output: 0.141261...
print(svm_sklearn.score(X, y)) # output: 0.9905397885364496
from daal4py.sklearn import patch_sklearn
patch_sklearn() # <-- apply patch
from sklearn.svm import SVC
svm_d4p = SVC(kernel="rbf", gamma="scale", C=0.5)
start = time()
svm_d4p = svm_d4p.fit(X, y)
end = time()
print(end - start) # output: 0.032536...
print(svm_d4p.score(X, y)) # output: 0.9905397885364496
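Note that the example imports `SVC` a second time after calling `patch_sklearn()`. The sketch below illustrates why import order matters with this style of patching in general: a name bound before the patch keeps pointing at the original class. It uses a hypothetical stand-in module named `toy_ml` so that it runs without scikit-learn or daal4py, and it shows the generic monkey-patching technique only; how daal4py applies its patches internally is not covered here.

```python
import sys
import types

# Build a tiny stand-in module (hypothetical name "toy_ml") so the sketch runs
# without scikit-learn or daal4py installed.
toy_ml = types.ModuleType("toy_ml")


class StockSVC:
    backend = "stock"


class AcceleratedSVC:
    backend = "accelerated"


toy_ml.SVC = StockSVC
sys.modules["toy_ml"] = toy_ml


def patch_toy_ml():
    """Rebind the public name so later lookups get the accelerated class."""
    sys.modules["toy_ml"].SVC = AcceleratedSVC


from toy_ml import SVC as svc_before_patch  # bound before the patch

patch_toy_ml()

from toy_ml import SVC as svc_after_patch  # bound after the patch

print(svc_before_patch.backend)  # stock
print(svc_after_patch.backend)   # accelerated
```

In practice this suggests applying the patch before importing the estimators you want accelerated, as the example above does.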
Data scientists often need different tools to analyze regular and big data. daal4py offers several processing models, which makes it easy to enable distributed multi-node mode.
import numpy as np
import pandas as pd
import daal4py as d4p
d4p.daalinit() # <-- Initialize SPMD mode
data = pd.read_csv("local_kmeans_data.csv", dtype = np.float32)
init_alg = d4p.kmeans_init(nClusters = 10,
                           fptype = "float",
                           method = "randomDense",
                           distributed = True) # <-- change model to distributed
centroids = init_alg.compute(data).centroids
alg = d4p.kmeans(nClusters = 10, maxIterations = 50, fptype = "float",
                 accuracyThreshold = 0, assignFlag = False,
                 distributed = True) # <-- change model to distributed
result = alg.compute(data, centroids)
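In SPMD mode every process works on its own portion of the data. To give an intuition for why that can still produce a single global result, here is a plain-Python simulation of one K-means update in which each "worker" reports only per-cluster sums and counts. This is a sketch of the general partial-sum pattern, not a description of oneDAL's internals; `partial_sums` and `reduce_and_update` are hypothetical helper names, the data is made up, and no MPI is involved.

```python
def partial_sums(chunk, centroids):
    """Per-worker step: sum and count the local points nearest to each centroid."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in chunk:
        nearest = min(range(len(centroids)),
                      key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts


def reduce_and_update(partials, centroids):
    """Combine all workers' partial results into new global centroids."""
    new_centroids = []
    for k, old in enumerate(centroids):
        total = sum(counts[k] for _, counts in partials)
        if total == 0:
            new_centroids.append(list(old))  # empty cluster keeps its centroid
            continue
        new_centroids.append(
            [sum(sums[k][d] for sums, _ in partials) / total for d in range(len(old))]
        )
    return new_centroids


# Made-up data, split across two simulated workers.
toy_data = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (12.0, 10.0)]
start_centroids = [[0.0, 0.0], [10.0, 10.0]]
chunks = [toy_data[0::2], toy_data[1::2]]  # worker 0 and worker 1
distributed_result = reduce_and_update(
    [partial_sums(chunk, start_centroids) for chunk in chunks], start_centroids)
single_node_result = reduce_and_update(
    [partial_sums(toy_data, start_centroids)], start_centroids)
print(distributed_result == single_node_result)
```

Because sums and counts combine additively, the reduced result matches what a single node would compute on the full data set.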
For more details browse our daal4py documentation.
oneDAL provides Scala and Java interfaces that match the Apache Spark MLlib API and use oneDAL solvers under the hood. This implementation allows you to get a 3-18x increase in performance compared to the default Apache Spark MLlib.
technical details: FPType: double; HW: 7 x m5.2xlarge AWS instances; SW: Intel DAAL 2020 Gold, Apache Spark 2.4.4, emr-5.27.0; Spark config num executors 12, executor cores 8, executor memory 19GB, task cpus 8
Check the samples tab for more details.
You can install oneDAL:
- from oneDAL home page as a part of Intel® oneAPI Base Toolkit.
- from GitHub*.
See Installation from Sources for details.
Besides the C++ and Python APIs, oneDAL also provides APIs for C++ SYCL and Java. Check out the tabs below for more examples.
Samples are examples of how oneDAL can be used in different applications.
Technical preview features are introduced to gain early feedback from developers. A technical preview feature is subject to change in future releases. Using a technical preview feature in a production code base is therefore strongly discouraged.
In C++ APIs, technical preview features are located in the daal::preview and onedal::preview namespaces. In Java APIs, technical preview features are located in packages that have the com.intel.daal.preview name prefix.
The preview features list:
- Graph Analytics:
  - Undirected graph without edge and vertex weights (undirected_adjacency_array_graph), 32-bit vertex index only
  - Jaccard Similarity Coefficients for all vertex pairs, a batch algorithm that processes the graph by blocks
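To make the Jaccard item concrete: for two vertices, the coefficient is commonly defined as the number of shared neighbors divided by the number of distinct neighbors of either vertex. The plain-Python sketch below computes it for every vertex pair of a small undirected, unweighted graph. It does not use the oneDAL preview API; `jaccard_all_pairs` is a hypothetical helper, the graph is made up, and conventions (for example, whether a vertex counts as its own neighbor) may differ in the library.

```python
from itertools import combinations


def jaccard_all_pairs(adjacency):
    """Return {(u, v): coefficient} for every vertex pair with u < v."""
    result = {}
    for u, v in combinations(sorted(adjacency), 2):
        union = adjacency[u] | adjacency[v]
        shared = adjacency[u] & adjacency[v]
        # Two isolated vertices have an empty union; define that case as 0.0.
        result[(u, v)] = len(shared) / len(union) if union else 0.0
    return result


# Made-up undirected, unweighted graph: edges 0-1, 0-2, 1-2, 2-3.
toy_graph = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2},
}
coefficients = jaccard_all_pairs(toy_graph)
print(coefficients[(0, 3)])
```

A block-wise variant would restrict `u` and `v` to ranges of vertex indices and process one pair of ranges at a time, which keeps the output for each step bounded.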
This repository contains branches corresponding to both oneAPI and classical versions of the library. We encourage you to use oneDAL located under the master branch.
Product | Latest release | Branch | Resources |
---|---|---|---|
oneDAL | 2021.1-beta08 | master, rls/onedal-beta08-rls | Home page, Documentation, System Requirements |
Intel® DAAL | 2020 Gold | rls/daal-2020-u2-rls | Home page, Developer Guide, System Requirements |