Matt Hagy
This project demonstrates how to convert simple Python expressions into Scala expressions. For example, consider the following Python expression.
[a(x, z) for x in l if b(x, y)]
Within the context of this work, the equivalent Scala expression is
l.filter((x) => b(x, y)).map((x) => a(x, z))
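To make the correspondence concrete, here is a minimal rule-based sketch using Python's standard `ast` module (Python 3.9+ for `ast.unparse`). It is not how this project works, since the project learns the mapping with NMT. It only illustrates the target mapping: each `if` clause becomes a `filter` call and the element expression becomes a final `map` call. The function name `comprehension_to_scala` is hypothetical, and the sketch assumes a single `for` clause over a simple loop variable, with sub-expressions built from names and function calls.

```python
import ast


def comprehension_to_scala(src: str) -> str:
    """Translate a single-generator Python list comprehension into a Scala
    filter/map chain. Rule-based illustration only, not the NMT model."""
    node = ast.parse(src, mode="eval").body
    if not isinstance(node, ast.ListComp) or len(node.generators) != 1:
        raise ValueError("expected a list comprehension with one 'for' clause")
    gen = node.generators[0]
    if not isinstance(gen.target, ast.Name):
        raise ValueError("expected a simple loop variable")
    var = gen.target.id
    scala = ast.unparse(gen.iter)
    # Each `if` clause becomes a .filter call, applied in source order.
    for cond in gen.ifs:
        scala += f".filter(({var}) => {ast.unparse(cond)})"
    # The element expression becomes the final .map call.
    scala += f".map(({var}) => {ast.unparse(node.elt)})"
    return scala


print(comprehension_to_scala("[a(x, z) for x in l if b(x, y)]"))
# l.filter((x) => b(x, y)).map((x) => a(x, z))
```

A hand-written translator like this breaks down quickly (for example, Python's `and`/`or` are not Scala operators), which is part of what makes a learned approach interesting.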
To learn more about this project, including a discussion of results, see the corresponding blog post, A Python to Scala transpiler using neural machine translation (NMT).
This transpiler is implemented with neural machine translation (NMT) using a recurrent neural network (RNN). The RNN is adapted from Zafarali Ahmed's excellent keras-attention project, which was originally developed to convert dates from varied human-readable formats to a machine-readable format.
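For readers new to attention, the sketch below shows one generic dot-product attention step in plain Python: the decoder's current state (the query) is scored against every encoder state, the scores are normalized with a softmax, and the weighted sum of encoder states becomes the context vector used to emit the next output symbol. This is an illustrative assumption, not code from keras-attention, whose attention layer may use a different (for example, additive) scoring function; `attend` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float],
           encoder_states: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """One dot-product attention step: score each encoder state against the
    decoder query, normalize the scores, and return (weights, context)."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder states.
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(weights)  # ≈ [0.731, 0.269]
```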
See the Jupyter notebook python_scala_comprehension_transpiler.ipynb for the results. Note that you can view this file directly in the GitHub web interface without cloning the project or running Jupyter.
If you'd like to reproduce this work, you'll want to use a machine with a GPU, because otherwise training the RNN will be too slow. This project includes scripts for interacting with an AWS EC2 VM instance.
The equivalent Python and Scala expressions are generated programmatically by a Scala program. You can find its source code in the src directory.
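The real generator is written in Scala. As a rough illustration of the idea only, the Python sketch below produces random equivalent pairs from a tiny hypothetical grammar: a list comprehension with an optional filter on the Python side, and the matching filter/map chain on the Scala side. The name lists, the grammar, and the 50% filter probability are all assumptions; the project's generator and its output format may differ.

```python
import random
from typing import Tuple

FUNCS = ["a", "b", "c", "f", "g"]   # hypothetical function names
VARS = ["x", "y", "z"]              # hypothetical variable names
COLLECTIONS = ["l", "m", "xs"]      # hypothetical collection names


def call(rng: random.Random, loop_var: str) -> str:
    # A two-argument call whose first argument is the loop variable.
    return f"{rng.choice(FUNCS)}({loop_var}, {rng.choice(VARS)})"


def random_pair(rng: random.Random) -> Tuple[str, str]:
    """Return one (Python, Scala) pair of equivalent expressions."""
    var = rng.choice(VARS)
    coll = rng.choice(COLLECTIONS)
    elt = call(rng, var)
    python = f"[{elt} for {var} in {coll}"
    scala = coll
    if rng.random() < 0.5:  # optionally add a filter clause to both sides
        cond = call(rng, var)
        python += f" if {cond}"
        scala += f".filter(({var}) => {cond})"
    python += "]"
    scala += f".map(({var}) => {elt})"
    return python, scala


rng = random.Random(0)  # fixed seed so runs are reproducible
for _ in range(3):
    print(*random_pair(rng), sep="\t")
```

Generating both sides from one random structure guarantees each pair is a correct translation, which is what makes synthetic training data cheap to produce in bulk.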
To build the Java JAR, you'll need to install sbt. With Homebrew, you can install sbt using the following command.
brew install sbt
You can then simply run the following command in the project root directory.
sbt assembly
Next, you can generate the expression data by running these commands.
mkdir data
bash scripts/push_expressions.sh 400000 > data/expressions
(Note: this was the largest dataset I could process in the subsequent training step on a VM with 30 GB of memory.)
The argument controls how many expressions are generated, so you can adjust it as needed.
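To see why memory becomes the limit, the sketch below shows a typical character-level encoding and a back-of-the-envelope size estimate. It assumes, as in many character-level sequence-to-sequence setups, that each expression is mapped to character indices, padded to a fixed length, and one-hot encoded as 32-bit floats; this project's exact encoding is not described here. The maximum length (100) and vocabulary size (60 characters plus a padding index) are hypothetical placeholders, and the helper names are illustrative.

```python
from typing import Dict, List


def build_vocab(texts: List[str]) -> Dict[str, int]:
    # Index 0 is reserved for padding; characters get 1..N in sorted order.
    chars = sorted(set("".join(texts)))
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    # Map characters to indices, truncate to max_len, then right-pad with 0.
    ids = [vocab[ch] for ch in text[:max_len]]
    return ids + [0] * (max_len - len(ids))


def one_hot_bytes(n_examples: int, max_len: int, vocab_size: int,
                  bytes_per_value: int = 4) -> int:
    # Size of an (n_examples, max_len, vocab_size) float32 one-hot array.
    return n_examples * max_len * vocab_size * bytes_per_value


pairs = ["[a(x, z) for x in l if b(x, y)]",
         "l.filter((x) => b(x, y)).map((x) => a(x, z))"]
vocab = build_vocab(pairs)
print(encode(pairs[0], vocab, max_len=40))
# Hypothetical sizes: 400,000 examples, length 100, 61 symbols.
print(one_hot_bytes(400_000, 100, 61) / 1e9, "GB per one-hot array")
```

Under these assumed sizes a single one-hot array is already close to 10 GB, and training needs both input and output arrays plus working memory, so the dataset size grows into the VM's memory budget quickly.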
Start by creating a VM instance following the instructions in Launch an AWS Deep Learning AMI. Select an instance type that includes a GPU; I used a g3s.xlarge instance.
Next, you'll need to configure two environment variables that tell this project's scripts how to connect to your VM.
First, specify the SSH PEM key that you configured for this VM, for example:
export AWS_EC2_PRIVATE_KEY=~/Downloads/key.pem
Second, specify the DNS address of the VM. You can find it in the EC2 control panel for this VM instance under the attribute "Public DNS".
export VM_DNS=NAME.REGION.compute.amazonaws.com
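Because every helper script depends on these two variables, a quick pre-flight check can save a confusing SSH error later. The sketch below is a hypothetical helper (not part of this project's scripts) that reports whether both variables are set and whether the key file exists.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

REQUIRED = ("AWS_EC2_PRIVATE_KEY", "VM_DNS")


def check_vm_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems with the VM environment variables (empty if OK)."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    key = env.get("AWS_EC2_PRIVATE_KEY")
    # expanduser() handles a literal "~" if the shell did not expand it.
    if key and not Path(key).expanduser().is_file():
        problems.append(f"private key not found: {key}")
    return problems


if __name__ == "__main__":
    for problem in check_vm_env():
        print(problem)
```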
Connect to the VM using SSH with the following script.
bash scripts/ssh_tunnel.sh
This also creates a port tunnel between the VM and your laptop so you can access the Jupyter notebook instance from your laptop.
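The contents of scripts/ssh_tunnel.sh are not shown here. As a hedged illustration of what such a tunnel involves, the sketch below builds an ssh command with an identity file (`-i`) and a local port forward (`-L`) for Jupyter's default port 8888. The `ubuntu` login name (the usual default on Ubuntu-based Deep Learning AMIs) and the port are assumptions, and the project's script may use different options; `ssh_tunnel_argv` is a hypothetical name.

```python
import shlex
from typing import List


def ssh_tunnel_argv(key_path: str, host: str, user: str = "ubuntu",
                    port: int = 8888) -> List[str]:
    """Build an ssh command that forwards local `port` to the same port on the VM."""
    return [
        "ssh",
        "-i", key_path,                    # identity (PEM) file
        "-L", f"{port}:localhost:{port}",  # local port forward for Jupyter
        f"{user}@{host}",
    ]


argv = ssh_tunnel_argv("key.pem", "NAME.REGION.compute.amazonaws.com")
print(shlex.join(argv))
# ssh -i key.pem -L 8888:localhost:8888 ubuntu@NAME.REGION.compute.amazonaws.com
```

Passing `argv` to `subprocess.run` would open the session; building it as a list avoids shell-quoting problems with unusual key paths.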
On the VM, clone the keras-attention project into the top level of your home directory by running the following command.
git clone https://github.com/datalogue/keras-attention.git
This will create the directory keras-attention in your home directory.
Next, we want to launch the Jupyter notebook server on our VM. We run Jupyter inside a screen session so that it keeps running if we lose our SSH connection to the VM.
You can create a screen session by running the following command in a shell.
screen -S jupyter
(The -S flag names this screen session so that we can easily reattach to it later.)
Within the screen session, launch the Jupyter notebook.
jupyter notebook
Jupyter will start and print a long URL similar to the following.
Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
http://localhost:8888/?token=876bc...
Follow the instructions and copy the URL into a browser on your laptop. You should see the Jupyter main UI. From there you can open notebooks. (See instructions below for copying notebooks to the VM.)
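If you only need the token (for example, to paste into the token field of Jupyter's login page), it is the `token` query parameter of that URL. A small standard-library sketch to extract it follows; `jupyter_token` is a hypothetical helper and the token in the example is made up.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def jupyter_token(url: str) -> Optional[str]:
    """Return the `token` query parameter from a Jupyter login URL, if any."""
    values = parse_qs(urlparse(url).query).get("token")
    return values[0] if values else None


print(jupyter_token("http://localhost:8888/?token=abc123"))  # abc123
```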
In the VM terminal, you can detach from the screen session by pressing Control and A together, then D. (To reattach later, run screen -r jupyter.) You'll want to keep the SSH session open to preserve the tunnel between the VM and your laptop. If you lose the SSH connection, you can always reconnect by re-running scripts/ssh_tunnel.sh.
Next, we want to copy the Python/Scala expressions from our laptop to the VM. First, we'll create the necessary directories on the VM filesystem. Within the SSH session, run the following command on the VM.
mkdir -p nmt_python_scala_transpiler/data
Next, temporarily close the SSH session so that we can run the copy command without having to open a new terminal and reconfigure the environment variables.
You can exit the SSH session by pressing Control and D together.
Run the following script on your laptop to copy data.
bash scripts/push_expressions.sh
Next, we'll want to copy the Jupyter notebook from our laptop to the VM. Run the following script to do so.
bash scripts/push_notebook.sh
Once the upload finishes, you can resume your SSH session to recreate the SSH tunnel so that you can access the Jupyter server on the VM from your laptop.
bash scripts/ssh_tunnel.sh
You can now open the notebook in the Jupyter web app from your laptop's browser. Open the file named nmt_python_scala_transpiler.ipynb.
You can run all cells in this notebook to repeat this work. The training cell is an infinite loop, so interrupt it once you're satisfied with the level of training. I let the notebook run overnight using a GPU on EC2.
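The notebook's training cell isn't reproduced here, but the pattern described above, an endless loop that stops cleanly when you interrupt the kernel, can be sketched as follows. In Jupyter, interrupting the kernel raises `KeyboardInterrupt` in the running cell. `train_one_epoch` and `on_epoch_end` are hypothetical callbacks standing in for the model's fit and checkpoint calls.

```python
from typing import Callable, List


def train_until_interrupted(
        train_one_epoch: Callable[[int], float],
        on_epoch_end: Callable[[int, float], None] = lambda epoch, loss: None,
) -> List[float]:
    """Run epochs until the user interrupts the kernel (KeyboardInterrupt),
    returning the losses of the epochs that completed."""
    losses: List[float] = []
    epoch = 0
    try:
        while True:
            loss = train_one_epoch(epoch)
            losses.append(loss)
            # e.g. save weights here so progress survives the interrupt
            on_epoch_end(epoch, loss)
            epoch += 1
    except KeyboardInterrupt:
        print(f"Stopped after {len(losses)} completed epochs")
    return losses
```

Checkpointing in `on_epoch_end` matters because an interrupt can land mid-epoch; only work saved at epoch boundaries is guaranteed to be consistent.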
Lastly, you can copy the modified Jupyter notebook back to your laptop by running the following script on your laptop.
bash scripts/pull_notebook.sh
Thanks for checking out this project! I welcome any revisions or extensions to this work so feel free to submit a PR.