-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deep learning #1025
Deep learning #1025
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for submitting this PR. I like this feature and I think it should be merged into TPOT if the performance in computational time and accuracy/score is competitive with other scikit-learn estimators in default TPOT configuration (e.g vs. RandomForestEstimator). Please provide more details about the performance test (with/without GPU) and also the search space of hyperparameters. Regarding the callbacks function from tensorflow.keras
, it can be integrated later.
@@ -0,0 +1,38 @@ | |||
import nose | |||
from sklearn.datasets import make_classification, make_regression | |||
from sklearn.neural_network import MLPClassifier, MLPRegressor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this line since this two seems not to be used in the scripts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pleas also fix those failed unit tests in this test scripts.
) | ||
return model | ||
|
||
class DeepLearningClassifier(KerasClassifier): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the error in AppVayor, it failed to import tpot due to no tensorflow in the environment. Maybe using HAS_TENSORFLOW
to determine if those new classes should be created.
layer_sizes = [20, 100, 50, 20, 60, 100] | ||
X, y = make_classification(random_state=1) | ||
|
||
def check(X, X_transformed, embedding_layer_size): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is duplicated and please only keep one.
Great, thank you for taking a look and pointing some things out. I'll do the following three things as next steps:
Quick update: I'm realizing that in order to do a 'fair' comparison, I'm going to need to tune all of the default parameters for these networks to as similar as possible to |
Another update. Although I was able to hack this together to make a Hence, I think I am going to focus efforts on reviving that PR and will pause work on this one until then. |
Another comment. Testing performance, I found that for small datasets on my laptops CPU, tensorflow/keras was considerably slower than |
If anyone is interested, this could easily be revived using https://github.com/adriangb/scikeras. MLP example: https://github.com/adriangb/scikeras/blob/master/tests/mlp_models.py |
What does this PR do?
Attempting to revive #895 while expanding upon it a little to allow for more general integration of
Keras
models intoTPOT
.The main focus is basic sequential models (
tensorflow.keras.models.Sequential
).This PR proposes a method of wrapping them to allow parametric generation of deep learning models in
TPOT
. Example classes are created for a classifier, a regressor and a transformer (the latter being the original intent of #895).This is a very rough proof of concept, but I believe that the versatile functionality of
tensorflow.keras
will allow for fancy stuff like callbacks to be integrated into TPOT's logging, etc.Where should the reviewer start?
Please start by looking over the general implementation and API in
deep_learning.py
.I realize that extensive documentation is needed, I will do that at some point, I'm hoping to get some feedback before I spend a bunch of time on that.
How should this PR be tested?
I wrote some basic tests. More is needed.
I also think I found a bug in
tensorflow.keras.utils.generic_utils.has_arg
which would be a blocker for this. I submitted a PR (tensorflow/tensorflow#37004) on thetensorflow
repo, we'll see where that goes. If tests on this PR fail, that's probably why.I also ran some basic tests comparing the performance of these parametrically generated estimators with some basic Keras tutorials I found, it's about the same if not better.
What are the relevant issues?
#895 , #809
PS: @chappers, I made a new PR because it's a new branch on my fork. I squashed your commits into a single commit for now, I will make sure you are a co-author if this gets merged.