Replies: 6 comments 2 replies
---
Hey @mcleantom! The best way at the time of writing would probably be to implement the vector math as-is. One of the future goals on the AeroSandbox TODO list is to implement an interface to black-box functions where you can either supply your own gradient or use finite-differencing. Once this is in place, you'll be able to drop in your PyTorch/TF model and simply provide its gradient.
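In the meantime, a minimal sketch of "implementing the vector math as-is" for a single dense layer might look like this (hypothetical shapes; assumes the input is handled as a column vector):

```python
import aerosandbox as asb
import aerosandbox.numpy as np

# Stand-in trained parameters for one dense layer, y = W x + b:
W = np.random.randn(5, 3)
b = np.random.randn(5, 1)

opti = asb.Opti()
x = opti.variable(init_guess=np.ones((3, 1)))  # network input as a design variable
y = W @ x + b  # plain vector math; CasADi traces this symbolically, so it stays differentiable
```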
---
Long-winded update, but see NeuralFoil and its AeroSandbox implementation as an example of exactly this!
---
Hey @peterdsharpe, thanks for the reply! I've had a look at the NeuralFoil library and I've had a go at building a simple neural network in ASB (or, casadi under the hood), and I'm having an issue calculating the dot product within casadi. The neural network code, with the conversion from PyTorch, looks like:

```python
import torch.nn as nn
import aerosandbox.numpy as np


class Layer:
    def __init__(self, input_size, output_size, weights=None, biases=None):
        self.input_size = input_size
        self.output_size = output_size
        if weights is None:
            self.weights = np.random.randn(input_size, output_size)
        else:
            self.weights = weights
        if biases is None:
            self.biases = np.random.randn(output_size)
        else:
            self.biases = biases

    def forward(self, x):
        return np.dot(x, self.weights) + self.biases


class Tanh:
    def __init__(self):
        pass

    def forward(self, x):
        return np.tanh(x)


class Relu:
    def __init__(self):
        pass

    def forward(self, x):
        return np.maximum(x, 0)


class Sequential:
    def __init__(self, layers):
        self.layers = layers

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    @classmethod
    def from_torch(cls, model):
        layers = []
        for layer in model:
            if isinstance(layer, nn.Linear):
                layers.append(Layer(
                    input_size=layer.in_features,
                    output_size=layer.out_features,
                    weights=np.array(layer.weight.detach().numpy().T),
                    biases=np.array(layer.bias.detach().numpy())
                ))
            elif isinstance(layer, nn.ReLU):
                layers.append(Relu())
            elif isinstance(layer, nn.Tanh):
                layers.append(Tanh())
        return cls(layers)

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)
```

And my unit test for this looks like:

```python
from HydroSandbox.models.neural_network import Sequential
from unittest import TestCase
import torch
import torch.nn as tnn
import aerosandbox.numpy as np
import aerosandbox as asb
torch.manual_seed(0)

class TestNeuralNetworks(TestCase):
    def get_torch_model(self):
        torch_model = tnn.Sequential(
            tnn.Linear(3, 5),
            tnn.ReLU(),
            tnn.Linear(5, 2),
            tnn.ReLU(),
            tnn.Linear(2, 1)
        )
        torch_model.eval()
        return torch_model

    def test_torch_model_and_aerosandbox_model_same_result(self):
        model_input = np.ones((1, 3))
        torch_model = self.get_torch_model()
        torch_input = torch.tensor(model_input, dtype=torch.float32)
        torch_output = torch_model(torch_input)
        aero_model = Sequential.from_torch(torch_model)
        aero_output = aero_model(model_input)
        assert np.isclose(torch_output.detach().numpy(), aero_output, atol=1e-5).all()
        opti = asb.Opti()
        opti_input = opti.variable(init_guess=model_input)
        aero_model(opti_input)
```

The assertion works fine, but the line `aero_model(opti_input)` raises the following error:

```
Error
Traceback (most recent call last):
  File "/home/tom/src/vpp3/HydroSandbox/models/test_neural_network.py", line 34, in test_torch_model_and_aerosandbox_model_same_result
    aero_model(opti_input.T)
  File "/home/tom/src/vpp3/HydroSandbox/models/neural_network.py", line 70, in __call__
    return self.forward(*args, **kwargs)
  File "/home/tom/src/vpp3/HydroSandbox/models/neural_network.py", line 49, in forward
    x = layer.forward(x)
  File "/home/tom/src/vpp3/HydroSandbox/models/neural_network.py", line 21, in forward
    return np.dot(x, self.weights) + self.biases
  File "/home/tom/miniconda3/envs/vpp3/lib/python3.10/site-packages/aerosandbox/numpy/linalg_top_level.py", line 19, in dot
    return _cas.dot(a, b)
  File "/home/tom/miniconda3/envs/vpp3/lib/python3.10/site-packages/casadi/casadi.py", line 36361, in dot
    return _casadi.dot(*args)
RuntimeError: .../casadi/core/mx_node.cpp:955: Assertion "size2()==y.size2() && size1()==y.size1()" failed:
MXNode::dot: Dimension mismatch. dot requires its two arguments to have equal shapes, but got (3, 1) and (5, 3).
```

With numpy, the matrices are handled correctly for the dot product:

```python
import numpy as np

A = np.array([[1, 2, 3]])
B = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
C = np.dot(A, B)  # array([[30, 36, 42]])
```

However, when you use a casadi matrix, there is a dimension mismatch error:

```python
import casadi as ca

A = ca.MX(*A.shape)
B = ca.MX(*B.shape)
C = ca.dot(A, B)
```

```
Traceback (most recent call last):
  File "/home/tom/.local/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3433, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-72-07337ba0e18e>", line 1, in <module>
    C = ca.dot(A, B)
  File "/home/tom/miniconda3/envs/vpp3/lib/python3.10/site-packages/casadi/casadi.py", line 36361, in dot
    return _casadi.dot(*args)
RuntimeError: .../casadi/core/matrix_impl.hpp:2000: Assertion "x.size()==y.size()" failed:
dot: Dimension mismatch
```

I was wondering if you knew why this might happen?
---
Hi @mcleantom, I regularly have the same issue. It happens because `casadi.dot` and `numpy.dot` behave differently depending on the shapes of the input arrays: `numpy.dot` performs a matrix multiplication instead of a dot product when A or B are 2-D arrays, whereas `casadi.dot` does not. This is why you get this error: a standard dot product is not actually defined for the shapes you are using.
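To make the difference concrete, here is a small illustration (the shapes are chosen arbitrarily):

```python
import numpy as np
import casadi as ca

A = np.array([[1.0, 2.0, 3.0]])  # shape (1, 3)
B = np.ones((3, 3))              # shape (3, 3)

np.dot(A, B)  # 2-D inputs: numpy silently dispatches to matrix multiplication -> shape (1, 3)

x = ca.MX.sym("x", 1, 3)
y = ca.MX.sym("y", 1, 3)
ca.dot(x, y)                        # casadi's dot is a true inner product: equal shapes in, scalar out
ca.mtimes(x, ca.MX.sym("B", 3, 3))  # casadi's matrix multiply, the analogue of np.dot on 2-D arrays
```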
I thought about two solutions as a workaround for this problem:

@peterdsharpe any opinion on this?
---
More specifically for this problem - Charles' diagnosis is exactly correct in that a dot product probably isn't the right tool for the job here, and the NumPy definition of the dot product function is quite overloaded. Rather, this should really be a matrix multiply (just based on tensor shapes). One idea for a backend-agnostic implementation (i.e., one that plays well with NumPy, CasADi, and PyTorch) is to use the Python matmul operator `@`, replacing `np.dot(x, self.weights) + self.biases` with `self.weights @ x + self.biases`, which is the same syntax seen in NeuralFoil's evaluation. I know for sure that this works in CasADi and NumPy, and I would bet (fingers crossed) that this works with PyTorch as well - a quick sanity-check sketch is below. Let me know!
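A minimal cross-backend sketch of that swap (a sketch, not tested as written; it assumes `x` is treated as a column vector and the weights/biases live in NumPy arrays):

```python
import numpy
import casadi as ca
import torch

W = numpy.random.randn(5, 3)  # stand-in weights
b = numpy.random.randn(5, 1)  # stand-in biases, kept 2-D so broadcasting is unambiguous

x_np = numpy.ones((3, 1))
y_np = W @ x_np + b  # NumPy: plain matrix multiply, shape (5, 1)

x_mx = ca.MX.sym("x", 3, 1)
y_mx = W @ x_mx + b  # CasADi: NumPy defers to the MX matmul overload, giving a symbolic (5, 1)

x_t = torch.ones(3, 1, dtype=torch.float64)
y_t = torch.from_numpy(W) @ x_t + torch.from_numpy(b)  # PyTorch: shape (5, 1)
```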
---
Thanks for the help, I managed to get it working. Here is my updated code:

```python
import torch.nn as nn
import aerosandbox.numpy as np


class Layer:
    def __init__(self, input_size, output_size, weights=None, biases=None):
        self.input_size = input_size
        self.output_size = output_size
        if weights is None:
            self.weights = np.random.randn(input_size, output_size)
        else:
            self.weights = weights
        if biases is None:
            self.biases = np.random.randn(output_size)
        else:
            self.biases = biases

    def forward(self, x):
        # Reshape the biases to a column vector so the addition broadcasts correctly.
        return self.weights @ x + np.reshape(self.biases, (-1, 1))


class Tanh:
    def __init__(self):
        pass

    def forward(self, x):
        return np.tanh(x)


class Relu:
    def __init__(self):
        pass

    def forward(self, x):
        return np.maximum(x, 0)


class Sequential:
    def __init__(self, layers):
        self.layers = layers

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    @classmethod
    def from_torch(cls, model):
        layers = []
        for layer in model:
            if isinstance(layer, nn.Linear):
                layers.append(Layer(
                    input_size=layer.in_features,
                    output_size=layer.out_features,
                    weights=np.array(layer.weight.detach().numpy()),
                    biases=np.array(layer.bias.detach().numpy())
                ))
            elif isinstance(layer, nn.ReLU):
                layers.append(Relu())
            elif isinstance(layer, nn.Tanh):
                layers.append(Tanh())
        return cls(layers)

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)
```

There was a mistake in the conversion from the PyTorch model to the Sequential model (previously I transposed the weights to try and get it to work). The unit test now looks like:

```python
from HydroSandbox.models.neural_network import Sequential
from unittest import TestCase
import torch
import torch.nn as tnn
import aerosandbox.numpy as np
import aerosandbox as asb
torch.manual_seed(0)

class TestNeuralNetworks(TestCase):
    def get_torch_model(self):
        torch_model = tnn.Sequential(
            tnn.Linear(3, 5),
            tnn.ReLU(),
            tnn.Linear(5, 2),
            tnn.ReLU(),
            tnn.Linear(2, 1)
        )
        torch_model.eval()
        return torch_model

    def test_torch_model_and_aerosandbox_model_get_same_result_2(self):
        N_in_features = 26
        N_out_features = 6
        torch_model = tnn.Sequential(
            tnn.Linear(N_in_features, 128),
            tnn.ReLU(),
            tnn.Linear(128, 128),
            tnn.ReLU(),
            tnn.Linear(128, 128),
            tnn.ReLU(),
            tnn.Linear(128, N_out_features),
        )
        x_in = np.random.randn(1, N_in_features)
        torch_result = torch_model(torch.tensor(x_in).float())
        aero_model = Sequential.from_torch(torch_model)
        aero_result = aero_model(x_in.T)
        self.assertTrue(np.isclose(torch_result.detach().numpy(), aero_result.T, atol=1e-5).all())
        opti = asb.Opti()
        x_opti = opti.variable(init_guess=x_in.T).T  # Can't create the variable without transposing
        opti_result = aero_model(x_opti.T)
        opti.minimize(opti_result[0])
        try:  # We just want to be able to use the opti.debug.value function, which is only available after solve.
            sol = opti.solve(max_iter=0)
        except Exception:
            pass
        # Not sure why it returns a 1d array; maybe that information is lost when converted to a variable?
        x_opti_value = np.atleast_2d(opti.debug.value(x_opti))
        opti_result_value = opti.debug.value(opti_result)
        aero_model_result_2 = aero_model(x_opti_value.T).flatten()
        self.assertTrue(np.isclose(opti_result_value, aero_model_result_2, atol=1e-5).all())
```

I would be interested in working on the aerosandbox implementation of the black-box function interface mentioned above. I would also be interested in getting either l4casadi or ml-casadi to work with aerosandbox; it would be useful to be able to generalize this so that any torch model works (currently, this code only works with Sequential neural networks with tanh/relu activations). Do you have any examples of using the casadi.Function object within aerosandbox? I see its use quite frequently, but I am not sure how it actually works (my current, possibly wrong, mental model is sketched below). Thanks for your help.
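Here is the minimal casadi.Function usage I've pieced together so far, in case it helps frame the question:

```python
import casadi as ca

# Wrap a symbolic expression as a reusable function:
x = ca.MX.sym("x", 3)           # symbolic 3-vector input
y = ca.sumsqr(x)                # symbolic expression built from x
f = ca.Function("f", [x], [y])  # name, list of inputs, list of outputs

print(f(ca.DM([1, 2, 3])))      # numeric evaluation -> 14
```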
---
I have a model which uses a neural network to predict a force, given a set of inputs. Is there an easy way to implement this model in AeroSandbox that will allow the optimisation algorithm to work? Should I just manually implement the vector math of w*a + b, or is there an easier way to integrate a TensorFlow/PyTorch NN model into the code?