diff --git a/.github/workflows/cd-docs.yml b/.github/workflows/cd-docs.yml index bf6355d4140..ff042e74017 100644 --- a/.github/workflows/cd-docs.yml +++ b/.github/workflows/cd-docs.yml @@ -5,12 +5,12 @@ on: none: description: "Deploy Syft Documentation" required: false - pull_request: - branches: [dev] - paths: [docs/] - push: - branches: [dev] - paths: [docs/] + # pull_request: + # branches: [dev] + # paths: [docs/] + # push: + # branches: [dev] + # paths: [docs/] jobs: cd-docs: diff --git a/docs/source/deployment/glossary.rst b/docs/source/deployment/glossary.rst index 1cfa7688e9c..a257c3ca4f8 100644 --- a/docs/source/deployment/glossary.rst +++ b/docs/source/deployment/glossary.rst @@ -43,10 +43,6 @@ PyGrid ~~~~~~~~~~~~~~~~~~~~~ ``PyGrid`` is a ``peer-to-peer network`` of data owners and data scientists who can collectively train AI models using ``PySyft``. ``PyGrid`` is also the central server for conducting both model-centric and data-centric ``federated learning``. You may control PyGrid via our user interface, ``PyGrid Admin``. -HaGrid ~~~~~~~~~~~~~~~~~~~~~ ``Hagrid`` (HAppy GRID!) is a ``command-line tool`` that speeds up the deployment of ``PyGrid``, the software providing a peer-to-peer network of data owners and data scientists who can collectively train models. - Remote Data Science ~~~~~~~~~~~~~~~~~~~~~ A sub-field of data science wherein a data scientist is able to extract insights from a dataset owned by a data owner, but only those insights which the data owner explicitly decides to allow; the owner's preferences are enforced by information-restricting technologies such as cryptography, information security, and distributed systems. diff --git a/docs/source/deployment/index.rst b/docs/source/deployment/index.rst deleted file mode 100644 index 461ecb3734e..00000000000 --- a/docs/source/deployment/index.rst +++ /dev/null @@ -1,673 +0,0 @@ -.. 
_advanced_deployment: - -=========================================== -Advanced Deployment: Introduction to HaGrid -=========================================== - -.. toctree:: - :maxdepth: 3 - -Hagrid (HAppy GRID!) is a command-line tool that speeds up the -deployment of PyGrid, the software providing a peer-to-peer network of -data owners and data scientists who can collectively train AI models -using `PySyft `__. - -Hagrid is able to orchestrate a collection of PyGrid Domain and Network -nodes and scale them in a local development environment (based on a -docker-compose file). By stacking multiple copies of this Docker setup, you -can simulate multiple entities (e.g. countries) that collaborate over -data and experiment with more complicated data flows such as SMPC. - -Similarly to the local deployment, Hagrid can bootstrap Docker on a -Vagrant VM or on a cloud VM, helping you deploy in a user-friendly way -on Azure, AWS\* and GCP*. - -*\* Deploying to AWS and GCP is still under development.* - -Working with Hagrid & Syft API versions: - -- **Development mode:** - You can experiment with your own local checked-out version of Syft - and bootstrap a local Jupyter Notebook where you can use the Syft - & Grid API to communicate with a prod/local dev system. - -- **Production mode:** - You can specify the branch and repository you want to fork (including your own fork) and Hagrid will monitor those branches in a cron job, pull new changes and restart the services to apply them, so your deployed system always stays up to date. - -Prerequisites -=============== - -The following operating systems are currently supported: Linux, Windows, macOS. Please ensure you have at least 8GB of RAM if you intend to run Hagrid locally. - -Setting up a virtual environment using Python 3.9 -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -1. Ensure **Python 3.8+** is installed on your system. 
To easily handle further dependencies, we suggest using conda: - - a. Install conda `following these instructions `_ depending on your OS. - - b. Create a new env specifying the Python version (we recommend Python 3.8/3.9) in the terminal: - - .. code-block:: bash - - $ conda create -n myenv python=3.9 - $ conda activate myenv - $ conda deactivate  # to exit - -Using latest pip -~~~~~~~~~~~~~~~~~ - -**Pip** is required to install dependencies, so make sure you have it installed and up-to-date by following these `instructions `__. - -If you have it installed, please check it is the latest version: - -.. code-block:: bash - - $ pip install --upgrade pip && pip -V  # Linux - $ python -m pip install --upgrade pip  # Windows - - -Install Jupyter Notebook -~~~~~~~~~~~~~~~~~~~~~~~~~ - -1. A very convenient way to interact with a deployed node is via Python, using a Jupyter Notebook. You can install it by running: - - .. code-block:: bash - - $ pip install notebook - -2. If you encounter issues, you can also install it using Conda: - - .. code-block:: bash - - $ conda install -c conda-forge notebook - -3. To launch the Jupyter Notebook, you can run the following in your terminal: - - .. code-block:: bash - - $ jupyter notebook - -Installing and configuring Docker -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -1. Install **Docker** and **Docker Compose V2,** which is needed to orchestrate Docker, as explained below: - - For **Linux**: - - a. Install **Docker**: - - .. code-block:: bash - - $ sudo apt-get upgrade docker && docker run hello-world - - b. Install **Docker Compose V2** as described `here `__. - - c. You should see ‘Docker Compose version v2’ when running: - - .. code-block:: bash - - $ docker compose version - - d. If not, go through the `instructions here `__ or, if you are using Linux, you can try: - - .. 
code-block:: bash - - $ mkdir -p ~/.docker/cli-plugins - $ curl -sSL https://github.com/docker/compose-cli/releases/download/v2.0.0-beta.5/docker-compose-linux-amd64 -o ~/.docker/cli-plugins/docker-compose - $ chmod +x ~/.docker/cli-plugins/docker-compose - - e. Also, make sure you can run Docker without sudo: - - .. code-block:: bash - - $ echo $USER  # should return your username - $ sudo usermod -aG docker $USER - - - For **Windows**, **macOS**: - - a. You can install Docker Desktop as explained `here for Windows `_ or `here for macOS `_. - - b. ``Docker Compose V2`` should be enabled by default. If you encounter issues, you can check it by: - - - Go to the Docker menu, click ``Preferences (Settings on Windows)`` > ``Experimental features``. - - - Make sure the ``Use Docker Compose V2`` box is checked. - - c. Ensure at least 8GB of RAM is allocated in the Docker Desktop app: - - - Go to 'Preferences' -> 'Resources' - - - Drag the 'Memory' dot until it says at least 8.00GB - - - Click 'Apply & Restart' - -2. Make sure you are using the **dev** branch of the PySyft repository (the branch can be found `here `__) - - -Explore locally with the PySyft API -==================================== - -1. Install **tox**: - - .. code-block:: bash - - $ pip install tox - -2. Move to the correct branch in the PySyft repository: - - .. code-block:: bash - - $ git checkout dev - -3. Check the current tasks that can be run by tox: - - .. code-block:: bash - - $ tox -l - -4. Open an editable Jupyter Notebook, which doesn't require running in a container: - - .. code-block:: bash - - $ tox -e syft.jupyter - - -Local deployment using Docker -==================================== - -1. Install Hagrid: - - .. code-block:: bash - - $ pip install -U hagrid - -2. Launch a Domain Node: - - .. code-block:: bash - - $ hagrid launch domain - - - .. note:: - - On the first run **it might take ~5-10 mins** to build the PyGrid Docker image. Afterwards, you should see something like: - - .. 
code-block:: bash - - Launching a domain PyGrid node on port 8081 ! - - - TYPE: domain - - NAME: mystifying_wolf - - TAG: 035c3b6a378a50f78cd74fc641d863c7 - - PORT: 8081 - - DOCKER: v2.2.3 - - Optionally, you can provide additional args here to use a certain repository and branch, as: - - .. code-block:: bash - - $ hagrid launch domain --repo $REPO --branch $BRANCH - -3. Go to ``localhost:port/login`` in your browser (using the port specified in your CLI, here *8081*) to see the PyGrid Admin UI where you, as a data owner, can manage your PyGrid deployment. - - a. Log in using the following credentials: - - .. code-block:: python - - info@openmined.org - - changethis - - - b. Explore the interface or you can even do requests via `Postman `__. You can check all the available endpoints at http://localhost:8081/api/v1/openapi.json/ and have all the following environment variables set (a more detailed explanation can be found in `this video section `__): - - |image0| - - The auth token can be obtained by doing a login request as follows: - - |image1| - -4. While the Domain Node is online, you can start a Jupyter Notebook as described `above <#explore-locally-with-the-pysyft-api-no-containers-involved>`__ to use PySyft to communicate with it via a Python client rather than the REST API. Connecting to it can be done as follows: - - .. code-block:: python - - import syft as sy - - domain = sy.login(email='info@openmined.org', password='changethis', port=8081) - - domain.store - - domain.requests - - domain.users - -5. To stop the node, run: - - .. code-block:: bash - - $ hagrid land --tag=035c3b6a378a50f78cd74fc641d863c7  # using the TAG specified in your CLI - - -Local deployment using Vagrant and VirtualBox -=============================================== - -This is particularly useful to experiment with the Ansible scripts and test new changes. - -1. Run ``hagrid status`` and ensure all dependencies are checked, to make sure you have Vagrant and VirtualBox installed. 
- - |image2| - -2. For installing Vagrant, check the `instructions here. `__ - -3. In addition to Vagrant, we need to install a plugin called landrush that allows using a custom DNS that points to the IP address used in the VM: - - .. code-block:: bash - - $ vagrant plugin install landrush - -4. Move to the correct branch and directory in the PySyft repository: - - .. code-block:: bash - - $ git checkout 0.6.0 - $ cd packages/grid - - -5. Create the environment using vagrant for the first time: - - .. code-block:: bash - - $ vagrant init - $ vagrant up - - - When the VM is booted up, it starts the Docker service, and then the Docker service starts all the containers as configured. As it is just created, provisioning always runs automatically. - - When deploying locally, the tasks listed in ‘main.yml’ for the node are not run. Therefore, it does not have to do the lengthy - setup every time (installing Docker, cloning PySyft and launching the cronjob to reload PySyft). - - .. note:: The tasks for the containers and nodes respectively can be found in \*.yml files defined in ``packages/grid/ansible/roles/containers`` and ``packages/grid/ansible/roles/nodes`` - -6. If you intend to run it frequently and not only once, first run ``vagrant status`` to see if the env has already been created; if it has, run ``vagrant up --provision`` every time to launch the provisioners, otherwise it just resumes the existing machine. - -7. To access the VM via SSH and jump to the user we are creating in Vagrant: - - .. code-block:: bash - - $ vagrant ssh - $ sudo su - om - $ whoami # should return 'om' - -8. You can go to ``http://10.0.1.2/login`` which is at port 80 to access the PyGrid Admin UI, which you can explore, query via Postman or in a - local Jupyter Notebook using a Python client as described in `steps 3 and 4 here <#local-deployment-using-docker>`__. - -9. 
To shut down the machine currently managed by Vagrant, you can run the following after exiting this node shell: - - .. code-block:: bash - - $ vagrant halt - -10. Alternatively, destroy it using: - - .. code-block:: bash - - $ vagrant destroy - - -Deploying on Kubernetes -======================== - -We provide an option to deploy the stack using Kubernetes. To test and run this locally we use ``minikube`` and ``devspace``. - -These are the prerequisites needed, explained step by step below: - -* docker -* hyperkit -* minikube -* devspace -* kubectl -* kubectx - -macOS -~~~~~ - -* **Hyperkit** - -Ingress is not working with Docker on Mac, and the issue is `being tracked here `_. Until it is fixed we will use the ``hyperkit`` backend. - -#. Install hyperkit by running: - -.. code-block:: bash - - $ brew install hyperkit - - -* **Docker** - -#. See above about using ``hyperkit`` on Mac until the ingress issue is fixed. - -#. We will be using Docker - however you do not need to ``enable kubernetes`` in your Docker Desktop App. If it is enabled, disable it and click `Apply & Restart`. - -#. This is because we will use ``minikube``, which will create and manage all the k8s resources we require as a normal container in the Docker engine. We install it by running: - -.. code-block:: bash - - $ brew install minikube - - - -* **Minikube** - -1. ``minikube`` is a mini master k8s node that you can run on your local machine in a similar manner to Docker. To use minikube you need it to be running: - -.. code-block:: bash - - $ minikube config set driver hyperkit - $ minikube start --disk-size=40g - $ minikube addons enable ingress - -2. If you ever need to reset ``minikube`` you can do: - -.. code-block:: bash - - $ minikube delete --all --purge - -3. Once ``minikube`` is running, you should see the container in Docker by running: - -.. 
code-block:: bash - - $ docker ps - CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES - 57f73851bf08 gcr.io/k8s-minikube/kicbase:v0.0.25 "/usr/local/bin/entr…" 46 hours ago Up About a minute 127.0.0.1:57954->22/tcp, 127.0.0.1:57955->2376/tcp, 127.0.0.1:57957->5000/tcp, 127.0.0.1:57958->8443/tcp, 127.0.0.1:57956->32443/tcp minikube - - - -* **Kubectl** - -``kubectl`` is the CLI tool for Kubernetes. If you have run ``minikube``, it should have configured your kubectl to point to the local minikube cluster by default. - -You should be able to see this if you run the following command: - -.. code-block:: bash - - $ kubectl get all - NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE - service/kubernetes ClusterIP 10.96.0.1 443/TCP 45h - -* **k8s Namespaces** - -To understand the usage of ``k8s Namespaces``, think of a namespace as a grouping of resources and permissions which lets you easily create and destroy everything related to a single keyword. - -.. code-block:: bash - - $ kubectl get namespaces - NAME STATUS AGE - default Active 45h - kube-node-lease Active 45h - kube-public Active 45h - kube-system Active 45h - kubernetes-dashboard Active 45h - -All k8s clusters have a ``default`` namespace, and the other ones here are from Kubernetes and minikube. - -We will use the namespace ``openmined`` to make it clear what belongs to the Grid stack and what is something else. To create it, we can run: - -.. code-block:: bash - - $ kubectl create namespace openmined - -.. code-block:: bash - - $ kubectl get all -n openmined - No resources found in openmined namespace. - - -* **Kubectx** - -``kubectx`` is a package of helpful utilities which can help you do things like set a default namespace. - -.. code-block:: bash - - $ brew install kubectx - -Now we can use a tool like ``kubens`` to change the default namespace to openmined. - -.. code-block:: bash - - $ kubens openmined - Context "minikube" modified. - Active namespace is "openmined". 
- -Now when we use commands without `-n` we get openmined by default. - -.. code-block:: bash - - $ kubectl get all - No resources found in openmined namespace. - -* **Helm Charts** - -The most popular way to deploy applications to k8s is with a tool called Helm. Helm provides another layer of abstraction over Kubernetes YAML configuration, with hierarchical variables, templates and a package definition which can be hosted over HTTP, allowing custom applications to depend on other prefabricated Helm charts, or letting you provide consumable packages of your own code as a Helm chart itself. - -* **devspace** - -To make development and deployment of our Kubernetes code easier, we use a tool called ``devspace`` which aims to be like a hot-reloading, dev-optimised version of `docker compose` but for Kubernetes. More documentation can be `found here `_. - -Additionally, ``devspace`` allows us to deploy using Helm by auto-generating the values and charts from the ``devspace.yaml``, so a single source of truth can be created which includes both production Helm charts and Kubernetes YAML configuration as well as local dev overrides. - -.. code-block:: bash - - $ brew install devspace - - -Deploy to local dev -~~~~~~~~~~~~~~~~~~~ - -1. Check that you have the right namespace: - -.. code-block:: bash - - $ devspace list namespaces - Name Default Exists - default false true - kube-node-lease false true - kube-public false true - kube-system false true - kubernetes-dashboard false true - openmined *true* true - -2. Run the ``dev`` command with ``devspace``: - -* To run a network with the headscale VPN: - -.. code-block:: bash - - $ cd packages/grid - $ devspace dev -b -p network - -* To run a domain without the headscale VPN: - -.. code-block:: bash - - $ cd packages/grid - $ devspace dev -b -p domain - -3. Connect the VPN in dev: - -You can connect the VPN using all the opened ports with: - -.. 
code-block:: bash - - $ cd packages/grid - $ python3 vpn/connect_vpn.py http://localhost:8088 http://localhost:8087 http://headscale:8080 - -4. Destroy the local deployment: - -.. code-block:: bash - - $ devspace purge - -5. Delete persistent volumes - -The database and the VPN containers have persistent volumes. - -* You can check them with: - -.. code-block:: bash - - $ kubectl get persistentvolumeclaim - -* Then delete the PostgreSQL one as follows: - -.. code-block:: bash - - $ kubectl delete persistentvolumeclaim app-db-data-db-0 - -6. Check which images / tags are being used - -This will show all the unique images and their tags currently deployed, which is useful -when debugging which version is actually running in the cluster. - -.. code-block:: bash - - $ kubectl get pods --all-namespaces -o jsonpath="{.items[*].spec.containers[*].image}" | tr -s '[[:space:]]' '\n' | sort | uniq -c - - -7. Restart a container / pod / deployment - -* To get all the deployments: - -.. code-block:: bash - - $ kubectl get deployments - NAME READY UP-TO-DATE AVAILABLE AGE - backend 1/1 1 1 18m - backend-stream 1/1 1 1 18m - backend-worker 1/1 1 1 18m - frontend 1/1 1 1 18m - queue 1/1 1 1 19m - -* Restart the backend-worker: - -.. code-block:: bash - - $ kubectl rollout restart deployment backend-worker - - -Deploy to Google Kubernetes Engine (GKE) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -1. Configure the kubectl context with GKE: - -.. code-block:: bash - - $ gcloud container clusters get-credentials --region us-central1-c staging-cluster-1 - -2. Check that you have the correct context: - -.. code-block:: bash - - $ kubectx - -3. Configure your Google Container Registry (GCR): - -.. code-block:: bash - - $ gcloud auth configure-docker - -4. Check your settings with ``devspace print``: - -.. code-block:: bash - - $ devspace print -p domain --var=CONTAINER_REGISTRY=gcr.io/reflected-space-315806/ - -5. 
You should see that you are creating a domain and that the container registry variable changes the image name to: - -.. code-block:: yaml - - images: - backend: - image: gcr.io/reflected-space-315806/openmined/grid-backend - -.. note:: This will tell ``devspace`` to publish to the GCR for your active GCP project. - -6. Create the openmined namespace: - -.. code-block:: bash - - $ kubectl create namespace openmined - -7. Tell devspace to use the openmined namespace: - -.. code-block:: bash - - $ devspace use namespace openmined - -8. Deploy to GKE: - -.. code-block:: bash - - $ devspace deploy -p domain --var=CONTAINER_REGISTRY=gcr.io/reflected-space-315806/ - -9. Access a container directly: - -.. code-block:: bash - - $ devspace enter - -10. Attach to container stdout: - -.. code-block:: bash - - $ devspace attach - -11. Use port forwarding to access an internal service: - -.. code-block:: bash - - $ kubectl port-forward deployment/tailscale :4000 - - -Deploying to Azure -==================================== - -1. Get your virtual machine on Azure ready - - a. To create one, you can either go to `portal.azure.com `__ or use `this 1-click template `__ available off the shelf. - - b. If you proceed to create it yourself, make sure you respect the following: - - i. Use ``Ubuntu Server 20.04`` or newer - - ii. Select ``SSH``, ``HTTP``, ``HTTPS`` as inbound ports - - iii. Have at least ``2x CPU``, ``4GB RAM``, ``40GB HDD``. - - .. note:: - During creation, write down the username used and save the key locally. In case warnings arise regarding an unprotected key, you can run: - - .. code-block:: bash - - $ sudo chmod 600 key.pem - -2. To deploy to Azure, the following can be run: - - .. 
code-block:: bash - - $ hagrid launch node --username=azureuser --key-path=~/hagriddeploy_key.pem domain to 51.124.153.133 - - - Additionally, you will be asked if you want to provide another repository and branch to fetch and update HAGrid, which you can skip by pressing ``Enter``. - -3. If successful, you can now access the deployed node at the specified IP address and interact with it via the PyGrid Admin UI at http://51.124.153.133/login (replace the IP with yours) or use Postman to do API requests. - -.. |image0| image:: ../_static/deployment/image2.png - :width: 95% - -.. |image1| image:: ../_static/deployment/image1.png - :width: 95% - -.. |image2| image:: ../_static/deployment/image3.png - :width: 95% diff --git a/docs/source/guides/data-owner/00-deploy-domain.rst b/docs/source/guides/data-owner/00-deploy-domain.rst deleted file mode 100644 index 0d11a065ce8..00000000000 --- a/docs/source/guides/data-owner/00-deploy-domain.rst +++ /dev/null @@ -1,190 +0,0 @@ -Deploying your own Domain Server -=============================================== - -**Data Owner Tutorials** - -◻️ 00-deploy-domain 👈 - -◻️ 01-upload-data - -.. note:: - **TIP:** To run this tutorial interactively in Jupyter Lab on your own machine type: - -:: - - pip install -U hagrid - hagrid quickstart data-owner - - - -Data owners are those with ``datasets`` 💾 they want to make available for -study by an outside party. - -This tutorial will help you understand how a Data Owner can -``launch`` their own Domain Server to securely host private datasets. - - **Note:** Throughout the tutorials, we also mean Domain Servers whenever we refer to Domain Node. Both mean the same and are used interchangeably. - -Why do Data Owners Deploy Domain Servers? ----------------------------------------- - -The concept of Remote Data Science starts with a server-based model -that we call ``Domain Server``. 
It allows people/data owners 👨 to load -their ``private`` data into these servers and create an account with -a username and password for Data Scientists💻. - -The advantage of using a Domain Server is that you can catalyze the impact your dataset can have by allowing... - -#. a Data Scientist to only get ``answers`` to the types of ``questions`` you allow them to -#. and by allowing them to get those answers without needing to directly ``access`` or have a copy of your data - - -|00-deploy-domain-00| - - -This means that, because your organization retains governance over the information it stewards without -needing to share direct ``copies`` of data with collaborators, domain servers create an opportunity for more -collaboration and more research to happen without losing ``control`` of your data and risking things like IP. - -Steps To Deploy a Domain ------------------------- - -How collaboration gets streamlined will be covered in our tutorials about connecting to a ``"Network Node."`` We will discuss -how control is maintained in our tutorials about ``"How to assign a Privacy Budget."`` For this tutorial, however, -let's start by learning how to deploy a domain server. - -📒 Overview of this tutorial: - -* **Installing** the required software -* **Running** the servers -* **Checking** the status of the deployed server - -|00-deploy-domain-01| - -A few things to note before starting: - -- **PySyft** = Privacy-Preserving Library -- **PyGrid** = Networking and Management Platform -- **HAGrid** = Deployment and Command Line Tool - -Step 1: Install wizard -~~~~~~~~~~~~~~~~~~~~~~~ - -To simplify the installation process, we have an `install wizard` that will help you -set up the latest versions of `hagrid` and `syft` on your machine. - -You can go to the install wizard at any time by running the below command: - -:: - - hagrid quickstart - - -.. warning:: - The next step will show you how to launch a domain node. 
If - you run into any ``issue`` running the above installation wizard, consider - looking for the ``error`` you are getting on our - `GitHub-Issue `__ page. - Still not able to figure out the problem? Don’t worry. We are here to - help you. Join the OpenMined - `slack `__ - community and explain your problem in the ``#general`` channel, and - any one of us might be able to help you. - - -Step 2: Launching a Domain Server -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Great work, people!! Once you have installed all the dependencies, it is -time to use ``HAGrid`` to launch your Domain Node. - -To launch a domain node, there are three things that you -need to know: - -1. **What type of node do you need to deploy?** -There are two different types of nodes: Domain Node and Network Node. By -default, HAGrid launches the ``primary`` node, which is our Domain Node. - -2. **Where are you going to launch this node to?** -We need to specify that we want to launch it to the ``docker container`` at -port ``8081``. - -3. **What is the name of your Domain Node going to be?** -For that, don’t forget to set ``DOMAIN_NAME`` to your -preference. - -After completing the Install Wizard, run the cell below to launch your very first domain node. - -:: - - In: - - # edit DOMAIN_NAME and run this cell - - DOMAIN_NAME = "My Domain" - - !hagrid launch {DOMAIN_NAME} to docker:8081 --tag=latest - -While this command runs, you will see various ``volumes`` and -``containers`` being created. Once this step is complete, move on to -the next step, where we will learn to monitor the health of -our Domain Node. - -Step 3: Checking your Domain Server -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -One exciting benefit of HAGrid is that it makes it easier for your organization / IT department -to ``monitor`` & ``maintain`` the status of your system as you move forward with other steps. -Let's do a quick health check to ensure the Domain is up and running. 
- - -:: - - In: - - # run this cell - !hagrid check localhost:8081 - - Out: - - Detecting External IP... - ┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━┓ - ┃ PyGrid ┃ Info ┃ ┃ - ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━┩ - │ host │ 20.31.143.254 │ ✅ │ - │ UI (βeta) │ http://20.31.143.254/login │ ✅ │ - │ api │ http://20.31.143.254/api/v1 │ ✅ │ - │ ssh │ hagrid ssh 20.31.143.254 │ ✅ │ - │ jupyter │ http://20.31.143.254:8888 │ ✅ │ - └───────────┴─────────────────────────────┴────┘ - -If your output is similar to the above image, voila!! A -``Domain`` ``Node`` was just ``born``. When it’s ready, you will see the -following in the output: - -- **host:** ``IP address`` of the launched Domain Node. -- **UI (Beta):** Link to an ``admin portal`` that allows you to - control the Domain Node from a web browser. -- **api:** ``Application layer`` that we run in our notebooks to make - the experience more straightforward and intuitive. -- **ssh:** ``Key`` to get into the virtual machine. -- **jupyter:** Notebook ``environment`` you will use to upload your - datasets. - -Congratulations 👏 You have now successfully deployed a Domain Server! -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Now what? --------- - -Once you, as a Data Owner, have deployed your Domain Node representing your theoretical organization's -private data server, the next step is to :doc:`Upload Private Data to a Domain Server <01-upload-data>` for research or project use. - - In our following tutorial, we will see how you as a Data Owner can preprocess the data, mark it with the correct - metadata and upload it to the Domain Node you've just deployed. - -.. |00-deploy-domain-00| image:: ../../_static/personas-image/data-owner/00-deploy-domain-00.gif - :width: 95% - -.. 
|00-deploy-domain-01| image:: ../../_static/personas-image/data-owner/00-deploy-domain-01.jpg - :width: 95% diff --git a/docs/source/guides/data-owner/01-upload-data.rst b/docs/source/guides/data-owner/01-upload-data.rst deleted file mode 100644 index 3eea4bda6ca..00000000000 --- a/docs/source/guides/data-owner/01-upload-data.rst +++ /dev/null @@ -1,252 +0,0 @@ -Uploading Private Data to a Domain Server -============================================================ - -**Data Owner Tutorials** - -☑️ 00-deploy-domain - -◻️ 01-upload-data 👈 - -.. note:: - **TIP:** To run this tutorial interactively in Jupyter Lab on your own machine type: - -:: - - pip install -U hagrid - hagrid quickstart data-owner - - - -Welcome back to another Data Owner tutorial. In the last tutorial, -you learned :doc:`How to Deploy a Domain Server <00-deploy-domain>` that represents -your organization’s private data servers. But right now, -the node you just deployed is empty. - -After today’s tutorial, you will learn how to ``upload data`` to your new -``domain node``, which involves annotating and doing ETL before -uploading it to our Domain Node/server. - - **Note:** Throughout the tutorials, we also mean Domain Servers - whenever we refer to Domain Node. Both mean the same and are used - interchangeably. - -Steps to Upload Private Data ---------------------------- - -📒 Overview of this tutorial: - -#. **Preprocessing** of Data -#. **Marking** it with the correct metadata -#. **Uploading** data to the Domain Server - -|01-upload-data-00| - -Step 1: Import Syft -~~~~~~~~~~~~~~~~~~~ - -To utilize the privacy-enhancing features offered in PyGrid and to -communicate with your domain node, you must first ``import`` OpenMined's -``private`` deep learning library: PySyft. - -Let's import Syft by running the below cell: - -:: - - In: - - # run this cell - try: - import syft as sy - print("Syft is imported") - except: - print("Syft is not installed. 
Please use the 🧙🏽‍♂️ Install Wizard above.") - - Out: Syft is imported - -.. _step2: - -Step 2: Log into Domain -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -By default, only the Domain node ``Admin`` can upload data, -so to upload your data, you will need to first log in as the admin. -(*Upload data permissions can be customized after logging into the domain node.*) - -To log in to your Domain node, you will need to define which Domain you are logging into and who you are. In this case, it will take the form of: - -* IP Address of the domain host -* Your user account Email and Password - - **WARNING:** Change the default username and password below to a more secure and private combination of your preference. - -:: - - In: - - # run this cell - try: - domain_client = sy.login( - port=8081, - email="info@openmined.org", - password="changethis" - ) - except Exception as e: - print("Unable to login. Please check your domain is up with `!hagrid check localhost:8081`") - - Out: - - Connecting to 20.253.155.183... done! Logging into openmined... done! - -Lovely :) You have just logged in to your Domain. - -.. note:: - Steps to change the default admin credentials for the Domain Owner are shown below 👇 - -|01-upload-data-01| - - -Step 3: Prepare Dataset -~~~~~~~~~~~~~~~~~~~~~~~ - -For this tutorial, we will use a simple dataset of four people's ``ages``. - - -:: - - In: - - # run this cell - try: - import pandas as pd - data = {'ID': ['011', '015', '022', '034'], - 'Age': [40, 39, 9, 8]} - - dataset = pd.DataFrame(data) - print(dataset.head()) - except Exception: - print("Install the latest version of Pandas using the command: %pip install pandas") - - Out: - - ID Age - 011 40 - 015 39 - 022 9 - 034 8 - -.. _step4: - -Step 4: Annotate Data for Automatic DP -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Now that we have our dataset, we can begin annotating it with -privacy-specific metadata called Auto DP metadata. 
Auto DP -metadata allows the PySyft library to protect and adjust the -visibility different Data Scientists will have into any one of -our data subjects. ``Data Subjects`` are the entities whose privacy -we want to protect. So, in this case, they are the individual -family members. - -.. note:: - In order to protect the ``privacy`` of the people within our dataset we - first need to specify who those people are. In this example we have - created a column with unique ``ID’s`` for each person in this dataset. - -Important steps: -^^^^^^^^^^^^^^^^ - -- ``data subjects`` are entities whose privacy we want to protect -- each feature needs to define the appropriate ``minimum`` and - ``maximum`` ranges -- when defining min and max values, we are actually defining the - ``theoretical`` amount of values that could be learned about that - aspect. -- To help obscure the variables someone may learn about these datasets - we then need to set an appropriate ``lower_bound`` to the ``lowest`` possible persons age ``(0)``, - and the ``upper_bound`` to the ``highest`` possible (mostly) persons age ``(100)``. - - -:: - - In: - - # run this cell - data_subjects = sy.DataSubjectArray.from_objs(dataset["ID"]) - - age_data = sy.Tensor(dataset["Age"]).annotate_with_dp_metadata( - lower_bound=0, upper_bound=100, data_subjects=data_subjects - ) - -.. - - **Note:** If your project has a training set, validation set and test - set, you must annotate each data set with Auto DP metadata. - -.. _step5: - -Step 5: Upload the Dataset -~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Once you have prepared your data, it’s time to upload it to the Domain -node. To help Data Scientists later ``search`` and ``discover`` our -datasets, we will add details like a ``name`` and a ``description`` of -what this dataset represents. - - **Note:** If your project has a train, validation and test set, you - need to add them as assets. In this case, it is just our age column. 
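Before uploading, it can be worth a quick sanity check that every value really lies inside the bounds declared in Step 4, since those bounds are meant to describe the feature's theoretical range. A plain-Python check (an illustration only, not part of the Syft API):

```python
# Ages from the tutorial dataset and the bounds declared in Step 4
ages = [40, 39, 9, 8]
lower_bound, upper_bound = 0, 100

# Flag anything outside the declared theoretical range [0, 100]
out_of_range = [a for a in ages if not lower_bound <= a <= upper_bound]

assert out_of_range == []  # all tutorial ages fall inside the bounds
```

If this list were non-empty, you would want to correct or clip those records before annotating and uploading the data.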
- -:: - - In: - - # run this cell - domain_client.load_dataset( - name="Family_Age_Dataset", - assets={ - "Age_Data": age_data, - }, - description="Our dataset contains the Ages of our four Family members with unique ID's. There are 2 columns and 4 rows in our dataset." - ) - - Out: - - Dataset is uploaded successfully !!! - - -Step 6: Check the Dataset -~~~~~~~~~~~~~~~~~~~~~~~~~~~ -To ``check`` the dataset you uploaded to the Domain Node, go ahead and -run the below command, and it will list ``all`` the datasets on this -Domain with their Names, Descriptions, Assets, and Unique IDs. - -:: - - In: - - # run this cell - domain_client.datasets - -Awesome 👏 !! You have uploaded the dataset onto your Domain Server! -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -By uploading the dataset onto the Domain Node, Data Owners are opening -up the possibilities of different Data Scientists being able to study it -without downloading it and without the Data Owners doing any -experiment-specific work while Data Scientists are studying their -private data. - -What’s Next? ------------- -Alright, so we have walked through :doc:`How to deploy a -Domain Node <00-deploy-domain>` and :doc:`How to prepare and upload a dataset to that Domain -Node <01-upload-data>` so that Data Scientists can study our datasets without being -able to download them. - - In the following tutorial, we will see how Data Scientists can find - datasets and work across all the different Domain nodes. - -.. |01-upload-data-00| image:: ../../_static/personas-image/data-owner/01-upload-data-00.jpg - :width: 95% - -.. 
|01-upload-data-01| image:: ../../_static/personas-image/data-owner/01-upload-data-01.gif - :width: 95% \ No newline at end of file diff --git a/docs/source/guides/data-owner/02-create-account-configure-pb.rst b/docs/source/guides/data-owner/02-create-account-configure-pb.rst deleted file mode 100644 index 9d98384d4a8..00000000000 --- a/docs/source/guides/data-owner/02-create-account-configure-pb.rst +++ /dev/null @@ -1,328 +0,0 @@ -Creating User Accounts on your Domain Server -=============================================== - -**Data Owner Tutorials** - -☑️ :doc:`00-deploy-domain <00-deploy-domain>` - -☑️ :doc:`01-upload-data <01-upload-data>` - -◻️ 02-create-account👈 - -HAGrid Quickstart Setup ---------------------------- - -To run this tutorial interactively in Jupyter Lab on your own machine type, -you need to start a ``HAGrid Quickstart environment`` as follows: - -:: - - pip install -U hagrid - hagrid quickstart data-owner - - -If you already have a HAGrid Quickstart environment operating, run the following to download the tutorials notebooks: - -:: - - from hagrid import quickstart - quickstart.download(“data-owner”) - - ------ - - -Domain Owners can directly ``create`` user accounts for Data Scientists to use their -domain nodes. When the domain owner creates a new user account, by default that user -will have the lowest level of permissions to access that data (means data is highly private) -and will be assigned ``0`` Privacy Budget. - -In today's tutorial we will learn how to create a user account, how to check permissions, -and how to assign a privacy budget to that user. Then we'll touch on why setting a privacy -budget is important later in your workflow. - - -🚨 Pre-Requisites Steps ---------------------------- - -Before you can create user accounts on your domain, you have to first: - -#. :ref:`Annotate your dataset with the appropriate DP metadata ` -#. :ref:`Upload your dataset to Domain Server ` - -.. 
note:: - The above prerequisite steps are covered in the previous tutorial :doc:`How to upload private data to the Domain - Node <01-upload-data>`. Please execute those steps before implementing this tutorial. - -📒 Overview of this tutorial ------------------------------- - -#. **Import** Syft & **Login** to Domain Server -#. **Define** account credentials -#. **Check** account permissions - -|02-create-account-configure-pb-00| - -Step 1: Import Syft & Login to Domain Server -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -To utilize the privacy-enhancing features offered in PyGrid and to -create an account for the user, you must first ``import`` OpenMined's -``private`` deep learning library: PySyft. - -Let's import Syft by running the below cell: - -:: - - In: - - # run this cell - try: - import syft as sy - print("Syft is imported") - except: - print("Syft is not installed. Please use the 🧙🏽‍♂️ Install Wizard above.") - - Out: Syft is imported - -To login to your Domain node, you will need to define which Domain you are logging into and who you are. In this case, it will take the form of: - -* IP Address of the domain host -* Your user account Email and Password - -.. WARNING:: - ``info@openmined.org`` and ``changethis`` are the default admin credentials for any domain node that is launched by - the user in the documentation. Change the default email and password below to a more secure and - private combination of your preference. - -:: - - In: - - # run this cell - try: - domain_client = sy.login( - port=8081, - email="info@openmined.org", - password="changethis" - ) - except Exception as e: - print("Unable to login. Please check your domain is up with `!hagrid check localhost:8081 --silent`") - - Out: - - Connecting to 20.253.155.183... done! Logging into openmined... done! - -Lovely :) You have just logged in to your Domain. 
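The warning above about default credentials applies to notebooks too: hardcoding the admin password means it can end up in version control. One lightweight alternative is to read credentials from the environment. This is a sketch only; the environment-variable names are our own convention, not a Syft feature:

```python
import os

def credential(var_name, default):
    """Read a credential from the environment, falling back to a default.

    The variable names used below are our own convention, not part of Syft.
    """
    return os.environ.get(var_name, default)

# Set these in the shell that launches Jupyter, e.g.:
#   export DOMAIN_ADMIN_EMAIL=info@openmined.org
#   export DOMAIN_ADMIN_PASSWORD=<your-strong-password>
email = credential("DOMAIN_ADMIN_EMAIL", "info@openmined.org")
password = credential("DOMAIN_ADMIN_PASSWORD", "changethis")
```

The resulting ``email`` and ``password`` can then be passed to ``sy.login(email=email, password=password, ...)`` in place of the hardcoded strings.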
- - -Step 2: Create a User Account -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -After you have launched and logged into your domain as an ``admin``, you can create user accounts for others to use. - -.. WARNING:: - In this case, we will create an account for a Data Scientist from within our own team or organization. - -.. note:: - You should only create direct user accounts on your domain node for those who have been - appropriately vetted and verified by your organization. To expand research done on your - datasets to those not directly within or verified by your organization, you should ``connect`` - your ``domain`` to one or more networks so that proper verification measures have been taken. - You can learn more about this in our "Connect Your Domain to a Network" tutorial. - -There are ``three`` different ways for a new user account to be created on your domain. - -* **Option A**, by a Domain Owner creating a new user account and specifying their - credentials directly through the notebook API. -* **Option B**, by a Domain Owner creating a new user account and specifying their credentials - through PyGrid’s default UI interface. -* **Option C**, by a potential user finding or being given the Domain node’s profile URL and - submitting an application that a Domain Owner can triage. (This functionality is currently in Beta). - -.. note:: - In all three cases, the user of your domain will be assigned the role of Data Scientist by default. - -A. Using PySyft: Create account from Domain Client -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -To create a Data Scientist account for someone within your team or organization, you need to tell your Domain 4 things: - -#. **Name**: Name of the individual -#. **Email**: Associated email address of the individual -#. **Password**: Password they would need to login into your domain (this can be changed later when they customize their ``account settings``) -#. 
**Budget**: When you specify a ``budget``, you assign this account with a ``privacy budget`` of ``0``. This privacy budget, set in units of ``epsilon``, is the limiter that blocks a data scientist from knowing too much about any one data subject in your dataset. - - **Note:** In future exercises, we will explore how privacy budget limits affect data subject visibility. - Still, for now, we will set the ``privacy budget`` to its default of ``0`` (means data is highly private), - the lowest level of permission to access the data. - Also, by default, the role assigned to a user is a Data Scientist. - -:: - - In: - - # run this cell - data_scientist_details = domain_client.create_user( - name="Jane Doe", - email="jane@email.com", - password="supersecurepassword", - budget=0 - ) - - Out: - - User created successfully! - -Once you have created an account, you can ``verify`` if the user account was made successfully. - -:: - - In: - - # list the users that have registered to the domain - domain_client.users - -Print the details of the account you created and share the ``credentials`` with the Data Scientists. - -:: - - In: - - # run the cell then copy the output - print("Please give these details to the Data Scientists ⬇️") - print(data_scientist_details) - - Out: - - Please give these details to the Data Scientists ⬇️ - {'name': 'Jane Doe', 'email': 'jane@email.com', 'password': 'supersecurepassword', 'url': '20.253.155.183'} - - -B. Using PySyft: Create account from Domain URL -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -A user can also ``sign-up`` or create an account on a Domain node if they have access to the ``URL`` to the Domain. -Instead of creating an account individually for each Data Scientist, a Data Owner can ``share`` the URL to their -Domain node and ask their team members to ``register`` to the Domain. - -To register to a Domain, you need the following details: - -#. **Name**: Name of the individual -#. 
**Email**: Email of the individual that will be used to log into the Domain -#. **Password**: A secured password to log into the Domain -#. **Url**: Url to the domain node. -#. **Port**: Port number - -:: - - In: - - # run this cell - import syft as sy - domain_client = sy.register( - name="Jane Doe", - email="jane@email.com", - password="supersecurepassword", - url="localhost", - port=8081 - ) - -On successful registration, the user is auto-logged into the domain. - -.. note:: - By default the role assigned to the registered user is of a ``Data Scientist`` and the assigned ``privacy budget`` is ``0``. The future tutorial series will cover a better explanation of `setting the privacy budget`. - -C. Using PyGrid UI: Create account as a Domain Admin -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -PyGrid's UI is meant to help Domain Owners get a bigger picture view of their domains and manage them. - -When we use the ``hagrid launch`` command to start our private data server, we define the ``port`` where -we want to launch the server. By default, the port is launched at ``8081``. - - **Note:** Make sure your docker application is up and running in the background. - -We will use this ``port number`` to visit the following UI interface at the URL: - -:: - - http://localhost: - - e.g. - - http://localhost:8081 - - -Once you are on PyGrid's web page, execute following steps to create an account for Data Scientist: - -.. WARNING:: - ``info@openmined.org`` and ``changethis`` are the default admin credentials for any domain node that is launched by - the user in the documentation. Change the default email and password below to a more secure and - private combination of your preference. - -#. Login using your admin credentials (**Email:** info@openmined.org | **Password:** changethis) -#. Create a new user account by clicking on the ``+ Create User`` button -#. 
Specify the following fields - * **Name**: Name of the individual - * **Email**: Email of the individual that will be used to log into the Domain - * **Password**: A secured password to log into the Domain - * **Role**: Assign them the role of Data Scientist (By default user account will take the role with the lowest amount of permission which in this case is the **Data Scientist** role.) -#. Set appropriate Privacy Budget (By default, they have ``0e`` privacy budget) - -|02-create-account-configure-pb-04| - - -Step 3: Check Permissions -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Now that we have created an account for our Data Scientist, let's check to see if it -was made and if we need to change any permissions. - -.. note:: - Permissions are determined by the ``role`` a user has been assigned by the Data Owner. - By default a user will be created with the role with the ``lowest`` set of ``permissions``. - To simplify the concepts, let us consider the below scenario. - -Scenario ------------ - -Let's login to our PyGrid's UI as we did earlier when we had to create an account -for the user in the prior steps. On the homepage, go to the ``Permissions`` tab, -where you will notice the different roles and associated permissions with them. - -.. note:: - Each role has a set of default ``permissions``, but they can be changed according to the norms of each organization. - -|02-create-account-configure-pb-01| - -#. **Data Scientist (default)**: This role is for users who will be performing computations on your datasets. They may be known users or those who found your domain through search and discovery. By default, this user can see a list of your datasets and can request to get results. This user will also be required to sign a Data Access Agreement if you have required one in the Domain Settings Configurations. -#. **Compliance Officer**: This role is for users who will help you manage requests made on your node. They should be users you trust. 
They cannot change domain settings or edit roles but are, by default, able to accept or deny user requests on behalf of the domain node. -#. **Administrator**: This role is for users who will help you manage your node. These should be users you trust. The main difference between this user and a Compliance Officer is that this user, by default, not only can manage requests but can also edit Domain Settings. This is the highest level of permission outside of an Owner. -#. **Owner**: Only one Owner account is assigned to any domain node. The owner account is the highest level permission and is a requirement for deploying a domain node. If you ever want to transfer ownership of your domain node to someone else, you can do so by following these steps. - -Suppose you created a user account for a person named ``John Smith``; by default, -the role assigned to John will be a ``Data Scientist``. But you want to change the -role of John to ``Data Protection Officer`` instead of a Data Scientist. - -#. Select the user and click on its name. -#. Go to ``Change role``, and in the drop-down option, select ``Compliance Officer``. -#. You can see the permissions given to the Compliance Officer below their role. The default permissions can be changed in the ``Permissions`` tab, as shown in the above image. -#. Click ``Change Role``, and the role of John Smith has now successfully changed to the Compliance Officer. - -|02-create-account-configure-pb-02| - - -Now our domain node is available for the data scientists to use 👏 ---------------------------------------------------------------------- - -.. |02-create-account-configure-pb-00| image:: ../../_static/personas-image/data-owner/02-create-account-configure-pb-00.jpg - :width: 95% - -.. |02-create-account-configure-pb-01| image:: ../../_static/personas-image/data-owner/02-create-account-configure-pb-01.gif - :width: 95% - -.. 
|02-create-account-configure-pb-02| image:: ../../_static/personas-image/data-owner/02-create-account-configure-pb-02.gif - :width: 95% - -.. |02-create-account-configure-pb-04| image:: ../../_static/personas-image/data-owner/02-create-account-configure-pb-04.gif - :width: 95% diff --git a/docs/source/guides/data-owner/03-join-network.rst b/docs/source/guides/data-owner/03-join-network.rst deleted file mode 100644 index 89a227a7f62..00000000000 --- a/docs/source/guides/data-owner/03-join-network.rst +++ /dev/null @@ -1,169 +0,0 @@ -Joining a Network -=============================================== - -**Data Owner Tutorials** - -☑️ 00-deploy-domain - -☑️ 01-upload-data - -☑️ 02-create-account - -◻️ 03-join-network👈 - -.. note:: - **TIP:** To run this tutorial interactively in Jupyter Lab on your own machine type: - -:: - - pip install -U hagrid - hagrid quickstart data-owner - - -A Network Node is a node that connects different domains to a broader base of data scientists (also known as a network's members). It is a server which exists outside of any data owner's institution, providing search & discovery, VPN, and authentication services to the network of data owners and data scientists. - -.. note:: - Data is only stored on the separate Domain Servers. Network Nodes do not contain data, they simply provide an extra layer of services to Domain Nodes and Data Science users. - -Let us give an example: assume you are in a hospital and the hospital has different cancer-related datasets hosted on their domain. The hospital now wants to increase the research impact their datasets can have but does not want to do so at the cost of risking a privacy leak nor at the risk of moving their data. By joining a network (for example one hosted by WHO) a Domain Owner can increase the searchability of their datasets to appropriate audiences without those datasets needing to leave the Domain servers. 
In today's tutorial we will learn how to join a network and apply our domain to it.


🚨 Pre-Requisites Steps
---------------------------

Before you can join a network, you have to first:

* `Login to your Domain Node`

.. note::
    The above prerequisite step is covered in an existing tutorial `How to deploy a Domain Node `_. Please execute those steps before implementing this tutorial.

📒 Overview of this tutorial
--------------------------------

#. **Login** to your Domain Server
#. **Find** a Network
#. **Apply** our Domain to the Network
#. **Verify** our Domain on the Network

Step 1: Import Syft
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Syft is the main library our Domain servers run on, so to start we will need to import Syft so that the methods in later steps will work.

::

    In:

    # run this cell
    import syft as sy


Step 2: Login to your domain
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once you have imported Syft and have your domain node up, with its credentials available to you, connect and login to the domain hosted at the URL generated in Step 4 of the Deploy Domain notebook.

.. WARNING::
    The below cell has default credentials; please change them accordingly.

::

    In:

    # run this cell
    domain_client = sy.login(
        url="http://localhost:8081/", email="info@openmined.org", password="changethis"
    )

Step 3: Fetch all Available Networks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now we've come to the main part: let's take a look at what networks are available for us to join.
The command below will fetch all of the currently available networks. This list may change as more networks get created or as they come online and go offline.

::

    In:

    # run this cell
    sy.networks

After looking at the available networks, you can now choose the one that best fits your needs and your domain.
For this tutorial we are going to choose the **OpenMined** network.

Step 4: Connect to the Network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In future iterations of PyGrid, Network nodes will be able to have domains join as Members or as Guests, but in the current iteration of PyGrid all domains start out by joining as Guests. To apply to a network as a guest, we first need to connect to the network server.

Connecting to a network can be done via its name, URL, or index in the above list.

::

    In:

    # run this cell
    network_client = sy.networks[0]

On a successful connection, `network_client` will contain an authenticated client to the network.

Step 5: Fetch all Domains on the Network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now that we have an authenticated client with the network, let's fetch and see the currently connected domains on the network.

We can list all of them with the below command:

::

    In:

    # run this cell
    network_client.domains

Since we have not applied our domain yet, it should not be visible in the output of the above command.

Step 6: Apply our Domain to the Network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this step, we will be joining the OpenMined network. If our application to join gets accepted, our domain will then be listed among the available domains on this network, which will help Data Scientists find and work with our datasets.

.. note::
    This step might take multiple retries before actually getting connected, so please don't worry!

The below command will apply our domain node to the network we just authenticated with:

::

    In:

    # run this cell
    domain_client.apply_to_network(network_client)


Step 7: Verify our Domain on the same Network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this step, we will verify whether we have successfully joined the network node or not.
We will do this by listing the domains available on this network and seeing whether our domain appears. - -:: - - In: - - # run this cell - network_client.domains - -If you can see your domain's name here, then hoorah! - -If you haven't, don’t worry, go through the above steps and see if you missed anything. - -Step 8: Verify the VPN status -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Now, let us verify that our domain is succesfully connected to the Network node via VPN. - -Run the cell below as mentioned: - -:: - - In: - - # run this cell - domain_client.vpn_status() - -You should receive the domain ID in the `peers list` in the connected field. This confirms our connection to the network, Yay! - -Now our domain node applied on the network and we have succesfully joined it!👏 \ No newline at end of file diff --git a/docs/source/guides/data-owner/04-configure-pb.rst b/docs/source/guides/data-owner/04-configure-pb.rst deleted file mode 100644 index 3fbbde4ed70..00000000000 --- a/docs/source/guides/data-owner/04-configure-pb.rst +++ /dev/null @@ -1,369 +0,0 @@ -Configuring Privacy Budget on your Domain Server -================================================== - -**Data Owner Tutorials** - -☑️ 00-deploy-domain - -☑️ 01-upload-data - -☑️ 02-create-account - -☑️ 03-join-network - -◻️ 04-configure-pb👈 - -.. note:: - **TIP:** To run this tutorial interactively in Jupyter Lab on your own machine type: - -:: - - pip install -U hagrid - hagrid quickstart data-owner - - -A privacy budget is a collection of quantitative measures through which a Data Owner can -pre-determine the degree of information access they grant to a user using their domain server. -For PyGrid, you can think of a privacy budget as a specified limit to the ``visibility`` a user -can have into any one data subject on your domain server. 
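This notion of a quantitative limit can be illustrated with a toy example of the standard Laplace mechanism, a textbook differential-privacy construction shown here purely for intuition (it is not Syft's actual implementation): the smaller the privacy budget epsilon, the larger the noise added to a query's result.

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise via inverse-transform sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_count(values, epsilon, rng):
    """Count items, perturbed to satisfy epsilon-DP; a count has sensitivity 1."""
    scale = 1.0 / epsilon
    return len(values) + laplace_noise(scale, rng)

rng = random.Random(0)
records = [1] * 500  # e.g. 500 patient records

# A generous budget barely perturbs the answer...
loose = noisy_count(records, epsilon=10.0, rng=rng)
# ...while a tiny budget can swamp it with noise.
tight = noisy_count(records, epsilon=0.01, rng=rng)
```

The noise scale is `sensitivity / epsilon`, so halving the budget doubles the typical noise: accuracy is traded directly for privacy.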
-As we saw in the :doc:`creating user accounts tutorial <02-create-account-configure-pb>`, when you -create a user account in PyGrid, by default that user is assigned the lowest level of ``permissions`` -and is given a privacy budget of ``0`` which means that they have ``0`` visibility into your domain’s data subjects. - -In today's tutorial, you will discover the underlying concept behind Differential Privacy and -how setting a privacy budget for a user determines how much can be learned from any data subject - - -🚨 Pre-Requisites Steps ---------------------------- -Before you can specify a privacy budget for your domain users, you must first ``prepare`` the dataset, ``upload`` it, and -``create`` a user account for your team members or Data Scientists. -The prerequisite steps are covered in the previous -tutorial :doc:`Creating User Accounts on your Domain Server <02-create-account-configure-pb>` and -:ref:`Uploading Private Data to a Domain Server `. -Please execute those steps before implementing this tutorial. - -📒 Overview of this tutorial ---------------------------- - -#. **Introduction** to Differential Privacy -#. **Login** to PyGrid UI as a Domain Admin -#. **Explore** different Privacy Budgets - -Step 1: Introduction to Differential Privacy -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -In this step, lets understand the concept behind differential privacy and privacy budget by considering a simple scenario. - -A. Scenario -############## -Consider there are ``500`` patients represented in ``2`` different datasets. One dataset is -about general ``medical history``; the other has some but not all of the ``500`` patients -and is focused on patients who have had ``mammography`` images taken in the past year. Now -let's say that ``Jane Doe`` is a patient in both and is open to being studied for -``breast cancer research`` as long as she can remain unidentifiable in the study. - -B. 
Quick Definition: Differential Privacy -############################################ -A core feature of Syft is that Syft allows you to use a ``PET(Privacy Enhancing technology)`` called -Differential Privacy to protect the ``Privacy`` of the individuals or data subjects -within your datasets. In this case, Differential Privacy is maintained when a -query across both datasets ``with`` Jane Doe in it versus that same query on both -datasets ``without`` Jane Doe creates the ``same output``. Noise is added to help average -out and make up the difference between having Jane there versus not. In other words, Jane Doe becomes a very -difficult, if not impossible, straw to find within the haystack. - -From a top-level view, this means a couple of things: - -* Differential Privacy can help a Data Scientist see trends in data ``without`` being able to ``identify`` the participants. -* The more a specific data subject involved in the query ``stands out`` in a dataset, the more noise has to be added to ``obfuscate`` them. -* There is a natural ``tradeoff`` between how much ``Privacy`` is preserved versus how much ``Accuracy`` is given to the Data Scientist.. -* You can set a privacy limit in PyGrid and trust that a Data Scientist will not be able to get answers to a query that surpasses that limit on any one ``Data Subject``. (see the image 👇 for reference) -* Data scientists can download answers that remain within specified ``privacy limits``, creating a streamlined flow where answering questions using an org's Domain Server will be as easy as going to the organization's public website. (see the image 👇 for reference) - -|04-configure-pb-02| - -C. Quick Definition: Epsilon or Privacy Budget -################################################ -Differential Privacy in practice is an algorithm that obscures an individual data subject's -contributions to the given ``results`` of a ``query``. 
Privacy Budget measured in units of ``Epsilon`` -is a way to measure the potential ``privacy loss`` or ``visibility`` you are allowing into any one of those data subjects. - -.. note:: - Syft specifically ``tracks`` privacy budgets against individual data subjects instead - of the ``dataset`` as a whole. This may be different from other tools that use - Differential Privacy. This allows more ``utility`` on the dataset. - -D. Takeaway -############### -When you assign a ``privacy budget`` in Syft, you specify a ``risk tolerance`` on what -level of ``visibility`` you feel comfortable having that Data Scientist have on your -data subjects. You are balancing this with keeping the ``accuracy`` they get on a -helpful level and maximizing the benefit of your dataset(s). - -Let's say, in the above scenario, you allow your ``Data Scientist`` to have ``0.5e`` to -conduct their Breast Cancer Research. You can interpret ``e`` to mean: - -* That this Data Scientist will have ``0.5x`` more ``visibility`` into any one data subject like Jane Doe -* That this Data Scientist is ``0.5x`` more likely to ``learn`` something unique about Jane Doe -* That this Data Scientist can ``learn no more than 0.5e`` on Jane Doe - -.. note:: - If a query would expose more than ``0.5e`` about ``Jane Doe``, then Jane Doe would get - dropped from the result, and noise would be used to mitigate the difference. - -Step 2: Login to PyGrid UI as a Domain Admin -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -When we use the ``hagrid launch`` command to start our private data server, we define -the ``port`` where we want to launch the server. - -.. note:: - By default, the port is launched at ``8081``. - -|04-configure-pb-00| - -We will use this port number to visit the following UI interface at the ``URL``: - -:: - - http://localhost: - - e.g. 
- - http://localhost:8081 - -|04-configure-pb-01| - -The default email and password for the domain are: - -* **email:** info@openmined.org -* **password:** changethis - -Once we're logged in, you can move to the next section, which explores setting a privacy budget. - -Step 3: Explore Different Privacy Budget -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. _step3a: - -A. Assign Data Scientist Account with 0.5e Privacy Budget -############################################################## -When you create a user account on your domain server, the privacy budget assigned to the -user is ``0e``, and the role assigned will be a data scientist by default. - -Follow the steps in the image below to change the privacy budget of our data scientist to ``0.5e``. - -.. note:: - John Smith is a Data Scientist whose account we created for demonstration purposes - in the :doc:`create user accounts tutorial <02-create-account-configure-pb>`. - -|04-configure-pb-03| - - -B. Make a Query With 0.5e Privacy Budget As a Data Scientist -################################################################# - -After you have changed the privacy budget to ``0.5e``, it's time for Domain Owners to -wear the hat of a Data Scientist. Let's make a ``query`` using 0.5e and then analyze the ``results`` -to compare how close the value of the results is to the actual value. - -Firstly, we should ``login`` to the domain as a data scientist using the same credentials through which -we created a data scientist account in :doc:`creating user accounts tutorial <02-create-account-configure-pb>`. - -The credentials to login as a Data Scientist are: - -* **Email:** janedoe@email.com -* **Password:** supersecretpassword - -.. WARNING:: - We will use the same ``age dataset`` defined in the previous tutorial to keep things simple. - So, before Data Scientists can make a ``query``, Domain Owners have - to :ref:`prepare the dataset and upload it to the Domain Servers`. 

::

    In:

    # run this cell
    import syft as sy

    ds_domain_client = sy.login(
        email="janedoe@email.com",
        password="supersecretpassword",
        port=8081,
        url="localhost"
    )

Now, as a Data Scientist, you can ``verify`` the privacy budget using the below command ⬇️

::

    In:

    # run this cell
    print("Allotted PB: ", ds_domain_client.privacy_budget)

    Out:

    Allotted PB: 0.5

Let's grab the age data from the domain and define a simple query to calculate the ``mean age``.

::

    In:

    age_data = ds_domain_client.datasets[0]["Age_Data"]
    age_mean = age_data.mean()
    age_mean_public = age_mean.publish(sigma=20)

    # Check if mean data exists
    age_mean_public.exists

    # Download/Get mean age
    age_mean_public.get(delete_obj=False)

    print("Remaining PB: ", ds_domain_client.privacy_budget)

    Out:

    Remaining PB: 0.000120578321

.. note::
    Remember, sigma represents how much noise the user wants added to the result.
    The noise is selected randomly from a Gaussian distribution with sigma as the
    standard deviation and zero mean.

The first thing to remember when setting ``sigma`` is that a sigma that is very low
relative to the published value adds little noise, so the query will spend a large
``privacy budget`` to return such an accurate result.

Here we want the noise to be picked randomly with a standard deviation of ``20``.
Decreasing the value of ``sigma`` yields more accurate results, but at the expense of
more privacy budget being spent and more information being leaked about the private data.

**Example:** Let's assume the value being published is ``100000``. Adding a slight noise of ``20``
gives roughly ``100020``, which is insignificant noise in comparison, and thus a large
budget would be spent.
Conversely, if the value being published is ``0.1`` and you add noise of ``20``, then
the result could come out around ``20.1``, which is far from the actual value; the accuracy
of the result suffers, although little PB is spent.

C. Make a Query With 7.5e Privacy Budget As a Data Scientist
#################################################################

The privacy budget is cumulative and doesn't represent the actual spent value. Once something is
known, you can't remove that knowledge. Let us ``increase`` the ``privacy budget``, run
the same query as above again, and compare the accuracy of the result and the privacy budget spent.

.. WARNING::
    You need to go to :ref:`Step 3.A <step3a>` and change the privacy budget to ``7.5e`` this time, as shown in the image.

After you have changed the privacy budget to ``7.5e``, we will again make a ``query`` and then ``analyze`` the results.

::

    In:

    import syft as sy

    ds_domain_client = sy.login(
        email="janedoe@email.com",
        password="supersecretpassword",
        port=8081,
        url="localhost"
    )

    print("Allotted PB: ", ds_domain_client.privacy_budget)

    age_data = ds_domain_client.datasets[0]["Age_Data"]
    age_mean = age_data.mean()
    age_mean_public = age_mean.publish(sigma=20)

    # Check if mean data exists
    age_mean_public.exists

    # Download/Get mean age
    age_mean_public.get(delete_obj=False)

    print("Remaining PB: ", ds_domain_client.privacy_budget)

    Out:

    Allotted PB: 7.5
    Remaining PB: 1.0740261245118496

Now, if you try to view the variable ``age_mean`` in a new cell, you will notice three things about this pointer:

#. **PointerID:** ID of the pointer
#. **Status [Ready/ Processing]:** Indicates whether the result behind the pointer has been computed on the server side
#. **Representation:** Synthetic data/values that the pointer could represent
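These three fields can be sketched in plain Python. The class below is an illustration only, not Syft's actual pointer implementation; its name and attributes are assumptions modeled on the fields just described:

```python
import uuid


class IllustrativePointer:
    """Toy stand-in for a Syft pointer (illustration only, not the real class).

    It carries the three fields described above: an ID, a status flag, and a
    synthetic representation that imitates the real (private) data.
    """

    def __init__(self, synthetic_value, ready=True):
        self.id = uuid.uuid4().hex              # PointerID: identifies the remote object
        self.ready = ready                      # Status: has the server finished computing?
        self.synthetic_value = synthetic_value  # Representation: imitation of the real data

    def __repr__(self):
        status = "Ready" if self.ready else "Processing"
        return (
            f"PointerId: {self.id}\n"
            f"Status: {status}\n"
            f"Representation: {self.synthetic_value}"
        )


ptr = IllustrativePointer(synthetic_value=[64.316])
print(ptr)
```

The key design point is that the pointer never holds the private value itself; anything it displays before ``.get()`` is synthetic.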

::

    In:

    print(age_mean)

    Out:

    PointerId: da75693b1fd0439ab0a623dd183ff8ce
    Status: Ready
    Representation: array([64.31603086])

    (The data printed above is synthetic - it is an imitation of the real data.)

D. Make a Query With 10e Privacy Budget As a Data Scientist
#################################################################
One last time, let us change the value of the ``privacy budget`` to ``10e``, run the
same query as above again, and compare the accuracy of the result and the privacy budget spent.

.. WARNING::
    You need to go to :ref:`Step 3.A <step3a>` and change the privacy budget to ``10e`` this time, as shown in the image.

After you have changed the privacy budget to ``10e``, we will again make a ``query`` and then ``analyze`` the results.

::

    In:

    import syft as sy

    ds_domain_client = sy.login(
        email="janedoe@email.com",
        password="supersecretpassword",
        port=8081,
        url="localhost"
    )

    print("Allotted PB: ", ds_domain_client.privacy_budget)

    age_data = ds_domain_client.datasets[0]["Age_Data"]
    age_mean = age_data.mean()
    age_mean_public = age_mean.publish(sigma=20)

    # Check if mean data exists
    age_mean_public.exists

    # Download/Get mean age
    age_mean_public.get(delete_obj=False)

    print("Remaining PB: ", ds_domain_client.privacy_budget)

    Out:

    Allotted PB: 10.0
    Remaining PB: 3.5740261245118496

Congratulations 👏 You have learned to configure your Privacy Budget on your Domain Server!!
----------------------------------------------------------------------------------------------

.. |04-configure-pb-00| image:: ../../_static/personas-image/data-owner/04-configure-pb-00.png
   :width: 95%

.. |04-configure-pb-01| image:: ../../_static/personas-image/data-owner/04-configure-pb-01.png
   :width: 50%

.. |04-configure-pb-02| image:: ../../_static/personas-image/data-owner/04-configure-pb-02.gif
   :width: 95%

..
|04-configure-pb-03| image:: ../../_static/personas-image/data-owner/04-configure-pb-03.gif
   :width: 95%
\ No newline at end of file
diff --git a/docs/source/guides/data-owner/04-create-network.rst b/docs/source/guides/data-owner/04-create-network.rst
deleted file mode 100644
index 7358d9a7932..00000000000
--- a/docs/source/guides/data-owner/04-create-network.rst
+++ /dev/null
@@ -1,53 +0,0 @@
Creating a Network
===============================================


What is a Network Node?
-----------------------------------------------------

A Network Node is a node that connects different domains to a broader base of data scientists (also known as a network's members). It is a server which exists outside of any data owner's institution, providing services to the network of data owners and data scientists.

In short, a Network node provides a secure interface between its cohorts, or Domains, and its members, or Data Scientists.

Let us give an example: assume you are in a hospital, and the hospital has different cancer-related datasets hosted on its domain. The hospital's data owners now want to increase the visibility and searchability of these datasets, so that more researchers and doctors can utilise them and advance our understanding and diagnosis of cancer.

However, due to privacy concerns, they do not want to give access to random actors, for example by sharing the URL of the domain with everyone. To tackle this privacy issue while keeping the datasets accessible, the domain owner can join a Network Node (for example, one hosted by the WHO), thereby opening up their datasets to a much larger audience in a private and secure manner.


Why do you need a new Network Node?
---------------------------------------------------------------------------------
Before requesting a Network Creation, please read the following carefully:


Ask yourself the below questions based on your use case:

* Do you want to enable data owners to host their datasets without sharing their domain URL?
* Do you have datasets serving a similar purpose?
* Do you want to improve the visibility and searchability of the datasets hosted on your Network Node?
* Do you want data scientists and researchers to connect to your Network Node to perform remote Data Science?


If you answer the above questions with a **Yes**, then you might be looking to create your own network. Fill out the form below and we will get back to you with further instructions on how to proceed.

.. note::
    We will be using the email you provide here for further communication.



.. raw:: html


diff --git a/docs/source/guides/data-scientist/00-connect-to-domain.rst b/docs/source/guides/data-scientist/00-connect-to-domain.rst
deleted file mode 100644
index c912aefd341..00000000000
--- a/docs/source/guides/data-scientist/00-connect-to-domain.rst
+++ /dev/null
@@ -1,136 +0,0 @@
Connecting to a Domain Server
====================================

**Data Scientist Tutorials**

◻️ 00-connect-to-domain👈

◻️ 01-search-for-datasets

.. note::
    **TIP:** To run this tutorial interactively in Jupyter Lab on your own machine, type:

::

    pip install -U hagrid
    hagrid quickstart data-scientist


Data Scientists are end users who want to perform ``computations`` or ``answer`` specific questions using
the dataset(s) of one or more data owners. The very first thing Data Scientists have to do in order
to submit their requests is ``login`` and ``connect`` to the Domain Server that hosts the data they would
like to make requests of, or to connect to a network through which they can search for different
datasets.
Today's tutorial will show you how you, as a Data Scientist, can connect to an
organization's domain server using PySyft.

To connect to a Domain Server, we will use the login credentials assigned to us by
the Domain Owner. By default, we as Data Scientists have the lowest level of ``permission``
to access the data (which means the data is highly private) and will be assigned a Privacy Budget of ``0``.

.. note::
    Check out this tutorial to understand how Domain Owners
    can :doc:`create a user account <../data-owner/02-create-account-configure-pb>` on their Domain Servers.

    Throughout the tutorials, "users" and "Data Scientists" mean the same thing
    and are used interchangeably.

Steps to Connect to a Domain Server
-------------------------------------

📒 Overview of this tutorial:

#. **Obtain** Login Credentials
#. **Login** to the Domain as a Data Scientist
#. **Explore** some useful starting commands


.. note::
    PyGrid Admin (the UI) is only meant to be used by domain or data owners, so a data scientist
    would never log in to the domain node via the UI.

.. _step-ds-1:

Step 1: Obtain Login Credentials
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To utilize the ``privacy-enhancing`` features and play around with your ``privacy budget``, as a
Data Scientist you must first get your login ``credentials`` from the domain owner.
To log in to the domain server, you will need the following information:

* email
* password
* URL of the domain
* port of the domain

.. WARNING::
    Change the default username and password below to a more secure and private combination of your preference.

::

    In:

    # run this cell
    import syft as sy
    domain_client = sy.register(
        name="Alice",
        email="alice@email.com",
        password="supersecurepassword",
        url="localhost",
        port=8081
    )

.. note::
    By default, the role assigned to the registered user is Data Scientist, and the assigned privacy budget is 0.
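The defaults stated in the note above can be summed up in a small plain-Python sketch. This is not PyGrid's real data model; the function and field names below are assumptions used purely for illustration:

```python
# Illustration of the registration defaults described above (NOT PyGrid's real schema):
# a newly registered user starts as a Data Scientist with a privacy budget of 0.
DEFAULT_ROLE = "Data Scientist"
DEFAULT_PRIVACY_BUDGET = 0.0


def register_user(name: str, email: str) -> dict:
    """Return a new user record carrying the tutorial's stated defaults."""
    return {
        "name": name,
        "email": email,
        "role": DEFAULT_ROLE,
        "privacy_budget": DEFAULT_PRIVACY_BUDGET,
    }


alice = register_user("Alice", "alice@email.com")
print(alice["role"], alice["privacy_budget"])
```

In other words, until a Domain Owner raises the budget (as shown in the data-owner tutorials), any published query by this user spends against a budget of zero.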


Step 2: Login to the Domain as a Data Scientist
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once you have the above information, you can open a ``Jupyter Notebook`` and begin ``logging`` into the domain server.

To start, you will need to import Syft:

::

    In:

    import syft as sy

Then you can provide your login credentials by typing:

::

    In:

    domain = sy.login(email="____", password="____", url="____", port=8081)


Step 3: Explore some useful starting commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As a Data Scientist, you can ``explore`` the Domain Server using the Python ``Syft`` library.

.. note::
    We will explore each command in more depth in the next series of tutorials.

::

    In:

    # name of the domain
    domain.name

    # View datasets on the domain
    domain.datasets

    # View store on the domain
    domain.store

Awesome 👏 You have now successfully connected to a Domain Node!!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

What's Next?
---------------
Alright, now that you are connected to a Domain node, the first thing we would like to do
is look for the datasets available on it.

    The following tutorial will show how Data Scientists can search for a dataset on the Domain Node.
diff --git a/docs/source/guides/data-scientist/01-search-for-datasets.rst b/docs/source/guides/data-scientist/01-search-for-datasets.rst
deleted file mode 100644
index af8b10abcd5..00000000000
--- a/docs/source/guides/data-scientist/01-search-for-datasets.rst
+++ /dev/null
@@ -1,169 +0,0 @@
Search for Datasets on a Domain Server
============================================================

**Data Scientist Tutorials**

☑️ 00-connect-to-domain

◻️ 01-search-for-datasets👈

..
note::
    **TIP:** To run this tutorial interactively in Jupyter Lab on your own machine, type:

::

    pip install -U hagrid
    hagrid quickstart data-scientist



In the last tutorial, you learned :doc:`How to Connect to a Domain Server <00-connect-to-domain>`,
which allows us to connect to your organization's private data servers.

Once we are connected to the data servers, the first thing we
would like to do is look for the available datasets on them. This
is exactly what we are going to cover in this tutorial.

After today's tutorial, you will know how to ``search for datasets``
on the ``domain node`` you are connected to.

    **Note:** Throughout the tutorials, we mean Domain Servers
    whenever we refer to Domain Nodes. Both refer to the same thing and are used
    interchangeably.

Steps to Search for Datasets on a Domain
-----------------------------------------

📒 Overview of this tutorial:

#. **Login** to the Domain
#. **List** the Datasets on the Domain
#. **Choose** a Dataset
#. **Preview** the Description of the chosen Dataset

|01-upload-data-00|

Step 1: Import Syft
~~~~~~~~~~~~~~~~~~~

To utilize the privacy-enhancing features offered in PyGrid and to
communicate with your domain node, you must first ``import`` OpenMined's
``private`` deep learning library: PySyft.

Let's import Syft by running the cell below:

::

    In:
    # run this cell

    import syft as sy
    print("Syft is imported")

    # If Syft is not installed, please use the 🧙🏽‍♂️ Install Wizard above

    Out: Syft is imported

Step 2: Log into Domain
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's log in to our Domain with the credentials provided to you by a domain owner.
If you do not have a domain owner, you can create one locally for yourself following
the tutorials starting here: `data-owner/00-deploy-domain <../data-owner/00-deploy-domain.html>`_.

To log in to your Domain node, you will need to define which Domain you are logging into and who you are.

In this case, it will take the form of:

* IP Address and Port of the domain host
* Your user account Email and Password

.. warning::
    Make sure to use the Data Scientist credentials provided to you.

::

    In:

    # Modify the port, email, and password accordingly! We are using the ones that will be generated for those who followed the Data-Owner tutorials and are now here.
    domain_client = sy.login(
        url="localhost",
        port=8081,
        email="jane@email.com",
        password="supersecurepassword"
    )

    Out:
    Connecting to ... done! Logging into ... done!

Amazing :) You have just logged in to your Domain and now have a domain client to explore further.

Step 3: Search for Datasets on the Domain
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now that we have an authenticated domain client, we can
look for the datasets available on this domain with the
following command:

::

    In:
    domain_client.datasets


|01-upload-data-01-datasets|


This should show you all the available datasets
on the domain node, along with the metadata for
each dataset.

Step 4: Select a Dataset and Preview It
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now that we can view the available datasets, we
can fetch one using its index within the
datasets list and store a pointer to it (here
called ``family_age_dataset``) to refer to it easily afterwards.

::

    In:

    family_age_dataset = domain_client.datasets[0]
    family_age_dataset


|01-upload-data-02-pointer-to-dataset|

.. note::
    We are assuming that you are following the
    data-owner tutorials, hence we select and name
    the family-age dataset. Feel free to rename the
    variable for easier readability based on your
    use case.


Awesome 👏 !!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You have fetched all the available datasets, created a pointer to one of them, and previewed it!
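The fetch-and-alias pattern from Step 4 can be mimicked with ordinary Python containers. The snippet below is a toy stand-in for ``domain_client.datasets``; the dictionary structure and the asset values are assumptions for illustration (only the names "Age_Data" and the family-age dataset come from the tutorials):

```python
# Toy catalog standing in for domain_client.datasets (illustration only).
# The asset key "Age_Data" follows the data-owner tutorials; the values are made up.
datasets = [
    {"name": "Family Age Dataset", "assets": {"Age_Data": [25, 32, 47, 51]}},
]

# Fetch a dataset by its index, just as domain_client.datasets[0] does above...
family_age_dataset = datasets[0]

# ...and keep a short alias so later steps can refer to it easily.
age_data = family_age_dataset["assets"]["Age_Data"]
print(family_age_dataset["name"], len(age_data))
```

The design point is simply that the index selects a dataset record, and the alias saves you from repeating the full lookup in every later cell.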


Now that we have a pointer to a dataset on the domain, we are one step
closer to performing remote data science and calling various methods on it.

What's Next?
------------
Alright, so now is the perfect time to utilize the pointer we just created,
explore the dataset in detail, and see the amazing operations that we
can perform on it.

    In the following tutorial, we will see how Data Scientists can explore
    a dataset securely.

.. |01-upload-data-00| image:: ../../_static/personas-image/data-scientist/01-search-for-datasets-00.png
   :width: 95%

.. |01-upload-data-01-datasets| image:: ../../_static/personas-image/data-scientist/01-search-for-datasets-01-datasets.png
   :width: 95%

.. |01-upload-data-02-pointer-to-dataset| image:: ../../_static/personas-image/data-scientist/01-search-for-datasets-02-pointer-to-dataset.png
   :width: 95%
diff --git a/docs/source/guides/index.rst b/docs/source/guides/index.rst
index a302342a005..ff7da37d0b1 100644
--- a/docs/source/guides/index.rst
+++ b/docs/source/guides/index.rst
@@ -22,9 +22,6 @@ while using these new ``privacy-enhancing techniques``.
 **TIP:** To run all the tutorials interactively in Jupyter Lab on your own machine, type:

 ::
-
-   pip install -U hagrid
-   hagrid quickstart

 Once you have the installation completed, the best place to start is by ``identifying`` your role.
diff --git a/docs/source/install_tutorials/have_prerequisites.rst b/docs/source/install_tutorials/have_prerequisites.rst
deleted file mode 100644
index 736f909d960..00000000000
--- a/docs/source/install_tutorials/have_prerequisites.rst
+++ /dev/null
@@ -1,56 +0,0 @@
.. _have_prerequisites:

==================================
I have all the dependencies
==================================

.. toctree::
    :maxdepth: 3


1. **Create a new env specifying the Python version (we recommend Python 3.8/3.9) in the terminal:**

    .. code-block:: bash

        conda create -n syft_env python=3.9
        conda activate syft_env

2.
**Install PySyft and Hagrid**

To install the OpenMined stack that you need in order to deploy a node, please run:

.. code-block:: bash

    pip install -U syft hagrid


PySyft is a library which contains the tools to run privacy-preserving machine learning.
Hagrid is a command-line tool that speeds up the deployment of PyGrid, the provider of a peer-to-peer network of
data owners and data scientists who can collectively train AI models using Syft.

3. **Launch the Domain Node**

You only have one final step remaining now, before you unleash the power of Hagrid!
The final step is to launch a domain node, which is as easy as:

.. code-block:: bash

    hagrid launch

To stop the running domain, run:

.. code-block:: bash

    hagrid land

But before stopping it, you can go to ``localhost:8081`` in your browser to interact with the PyGrid Admin UI, where, as a Data Owner, you can manage your datasets as well as incoming requests from data scientists.
You can log in using the following credentials:

.. code-block:: python

    info@openmined.org


    changethis

Now you're all set up to fully start using PySyft!
diff --git a/docs/source/install_tutorials/linux.rst b/docs/source/install_tutorials/linux.rst
deleted file mode 100644
index 694b38be0a9..00000000000
--- a/docs/source/install_tutorials/linux.rst
+++ /dev/null
@@ -1,187 +0,0 @@
.. _linux_install:

==================================
Installation on Linux
==================================

.. toctree::
    :maxdepth: 3

This documentation helps you install and deploy a Domain Node on Ubuntu Linux, version ``20.04.03`` or newer, in the simplest way possible.

.. note::
    Do you use a distribution other than Ubuntu? Don't worry, just replace ``apt`` & ``apt-get`` with your package manager.

..
seealso::

    For more advanced tutorials, such as cloud, ansible, vagrant, kubernetes, or virtualbox deployment, please check
    the advanced deployment documentation.



1. **Launching a Terminal Instance**

We will use the Linux Terminal to install all the prerequisites and launch the domain. A quick way to launch the terminal is by pressing ``Ctrl+Alt+T``. Let's go!

2. **Installing Python 3.9**

We'll be working with Python 3.9 or newer. To check if you have it installed, you may run:

.. code-block:: bash

    python3 --version

Your output should look something like ``Python 3.x.y``, where ``x >= 9``.

If you don't have the correct version of Python, installing it is as easy as running the following:

.. code-block:: bash

    sudo apt update
    sudo apt install python3.9
    python3 --version

3. **Installing and using Pip**

Pip is the most widely used package installer for Python and will make installing the required dependencies MUCH easier.
You can install it by running the following:

.. code-block:: bash

    python -m ensurepip --upgrade

If you already have it installed, you can make sure it's the latest version by running:

.. code-block:: bash

    python -m pip install --upgrade pip

Your output should look something like ``Requirement already satisfied: pip in ...``.

4. **Conda and setting up a virtual environment**

Conda is a package manager that helps you easily install many data science and machine learning packages, and also create a separate environment when a certain set of dependencies needs to be installed.
To install Conda, you can:

a. Download the Anaconda installer.

b. Run the following code, modifying it depending on where you downloaded the installer (e.g. ``~/Downloads/``):

    .. code-block:: bash

        bash ~/Downloads/Anaconda3-2020.02-Linux-x86_64.sh

    .. note::

        The file name might differ if it is a newer version of Anaconda.

c. Create a new env specifying the Python version (we recommend Python 3.8/3.9) in the terminal:

    .. code-block:: bash

        conda create -n syft_env python=3.9
        conda activate syft_env


d. To exit, you can run:

    .. code-block:: bash

        conda deactivate

5. **Install Jupyter Notebook**

A very convenient way to interact with a deployed node is via Python, using a Jupyter Notebook. You can install it by running:

.. code-block:: bash

    pip install jupyterlab

If you encounter issues, you can also install it using Conda:

.. code-block:: bash

    conda install -c conda-forge notebook

To launch the Jupyter Notebook, you can run the following in your terminal:

.. code-block:: bash

    jupyter notebook

6. **Installing and configuring Docker**

Docker is a framework which allows us to run the infrastructure needed for PySyft in an isolated environment called a ``container``, which you can use off the shelf without many concerns.
If it sounds complicated, please don't worry; we will walk you through all the steps, and you'll be done in no time!
Additionally, we will also use Docker Compose V2, which allows us to run multi-container applications.


a. Install **Docker**:

    .. code-block:: bash

        sudo apt-get upgrade docker && docker run hello-world

b. Install **Docker Compose V2** as described in the official instructions.

c. Run the below command to verify the install:

    .. code-block:: bash

        docker compose version

    You should see something like ``Docker Compose version 2.x.y`` in the output when running the above command.

d. If you see something else, go through the official instructions, or if you are using Linux, you can try the following:

    .. code-block:: bash

        mkdir -p ~/.docker/cli-plugins
        curl -sSL https://github.com/docker/compose/releases/download/v2.2.3/docker-compose-linux-x86_64 -o ~/.docker/cli-plugins/docker-compose
        chmod +x ~/.docker/cli-plugins/docker-compose

e.
Also, make sure you can run Docker without sudo:

    .. code-block:: bash

        echo $USER  # should return your username
        sudo usermod -aG docker $USER

7. **Install PySyft and Hagrid**

The hardest part is done! To install the OpenMined stack that you need in order to deploy a node, please run:

.. code-block:: bash

    pip install -U syft hagrid


PySyft is a library which contains the tools to run privacy-preserving machine learning.
Hagrid is a command-line tool that speeds up the deployment of PyGrid, the provider of a peer-to-peer network of
data owners and data scientists who can collectively train AI models using Syft.

8. **Launch the Domain Node**

Congrats on making it this far! You only have one final step remaining before you unleash the power of Hagrid!
The final step is to launch a domain node, which is as easy as:

.. code-block:: bash

    hagrid launch

To stop the running domain, run:

.. code-block:: bash

    hagrid land

But before stopping it, you can go to ``localhost:8081`` in your browser to interact with the PyGrid Admin UI, where, as a Data Owner, you can manage your datasets as well as incoming requests from data scientists.
You can log in using the following credentials:

.. code-block:: python

    info@openmined.org

    changethis

Now you're all set up to fully start using PySyft!
diff --git a/docs/source/install_tutorials/osx_11_5_1.rst b/docs/source/install_tutorials/osx_11_5_1.rst
deleted file mode 100644
index d7af44e999f..00000000000
--- a/docs/source/install_tutorials/osx_11_5_1.rst
+++ /dev/null
@@ -1,466 +0,0 @@
.. _macOS_install:

=================================
macOS Tutorial (Big Sur - 11.5.1)
=================================

Welcome to the beginner installation tutorial for domain deployment on your personal macOS machine!

If your macOS machine runs on M1, carefully follow the special steps listed for your machine in the tutorial.

Step 1: Double check macOS version (optional)
=============================================
Before you start this tutorial, let's make sure you're running the right version of
macOS. Click the Apple logo at the top left corner, then click "About this Mac" and you'll
see something like:

|find_osx_version|

See where this image says "11.5.1"? Yours should say the same! If it does, then you're
ready to begin!


Step 2: Open Terminal
=====================

Almost every step of this tutorial will be conducted within the Terminal app of macOS. Start by
opening up the Terminal application by pressing Cmd+Space and typing "Terminal". Then hit Enter.
When Terminal opens, it should look something like this (colors may differ).

|osx_terminal|

If you see something like this (again... colors may differ), then you're all set to proceed to the next step!

Step 3: Install Conda
=====================

(These steps are from https://docs.anaconda.com/anaconda/install/mac-os/ and are copied here
for your convenience and clarity. If any part of your installation doesn't work, please fall
back on the official documentation page.)

* Step 3.1: Open the Anaconda Installer download page.
* Step 3.2: Find the big green "Download" button and click it. It looks like this:

    |conda_button|

* Step 3.3: When prompted with the download, click 'Save' (saving to your Desktop is fine)

    |click_save|

* Step 3.4: Navigate to where you saved the file (probably either your Desktop or Downloads folder), and double-click the icon.

    When you do so, you might see a warning like the following:

    |conda_icon|

    If so, just click 'Allow' and then you'll see a screen like:

    |conda_install_1|

* Step 3.5: Click "Continue" and you'll see a screen like this:

    |conda_install_2|

* Step 3.6: Click "Continue" and you'll see a screen like this:

    |conda_install_3|

* Step 3.7: Click "Continue" and you'll see a screen like this:

    |conda_install_4|

* Step 3.8: Click "Accept" and you'll see a screen like this:

    |conda_install_5|

* Step 3.9: Click "Install" and you'll see a screen like this:

    |conda_install_6|

    After a moment or two a popup will appear like this:

    |conda_install_6_popup|

    Click "OK" and keep waiting...

    After a moment or two a popup *might* appear like this:

    Click "OK" and keep waiting...

    While you wait... if you see a dialog like this...

    |conda_install_6_popup_already_installed|

    Then you already have conda installed. Click "OK" and then click "Continue"
    until the installation dialog finishes (it'll tell you the installation "Failed",
    but that's only because you already have conda installed) and then proceed to
    Step 4 of this tutorial.

    If, however, you didn't get a warning saying that conda was already installed,
    proceed to step 3.10.

* Step 3.10: Keep waiting until the window changes to this:

    |conda_install_7|

* Step 3.11: Click "Continue" and you'll see a final confirmation screen.

    CONGRATULATIONS!!!! You installed Anaconda!!! You may click the "Close" button and
    proceed to Step 4.

Step 4: Activate Conda Environment
==================================

* Step 4.1: If you have the 'Terminal' app open from Step 2, quit it (CMD-Q) and
  re-open it using the same technique you used in Step 2 to open the application.
  (This is to ensure that Terminal is aware of your new conda installation.)

* Step 4.2: Check to make sure conda is properly installed

    In your freshly opened Terminal window, type the following:

    .. code-block:: bash

        conda --version

    This should print something like "conda 4.10.1". If it instead says "conda not found",
    return to Step 3 and re-install conda.

* Step 4.3: Update Conda

    .. code-block:: bash

        conda update conda -y

* Step 4.4: Create a conda virtual environment with Python 3.9

    .. code-block:: bash

        conda create -n syft_env python=3.9 -y

* Step 4.5: Activate the conda environment

    .. code-block:: bash

        conda activate syft_env

    When you run this command, you'll see the word 'syft_env' in your terminal to indicate that you're
    now in the syft virtual environment. For the rest of this tutorial, enter all of your commands
    into this particular terminal. If ever you close this window, when you re-open a new Terminal
    window, just re-run this step (4.5) and you'll be ready to start again!


Step 5: Install Necessary Python Packages
=========================================

* Step 5.0: If you closed your Terminal window since Step 4, open a new Terminal application window and run the following.

    .. code-block:: bash

        conda activate syft_env

    If your Terminal window is still open from Step 4, you can skip this step and proceed directly to step 5.1.

* Step 5.1: Update Pip

    Within our virtual environment, we're going to use the 'pip' package manager to install all of our
    necessary Python libraries. But before we do, we need to make sure we're running the latest version of pip.
    You can do so by running the following command.

    .. code-block:: bash

        pip install --upgrade pip

* Step 5.2: Install Jupyter Lab

    .. code-block:: bash

        pip install jupyterlab

    If you encounter an error when running this command, try the following instead:

    ..
code-block:: bash - - conda install -c conda-forge jupyterlab - -* Step 5.3: Confirm you have git installed - - For the python package in step 5.4, you'll need to have git installed. - Most modern macOS machines come with git already installed, but if the following - command doesn't work for you... - - .. code-block:: bash - - git --version - - ...then follow git's installation instructions for macOS here: https://git-scm.com/book/en/v2/Getting-Started-Installing-Git - -* Step 5.4: Install Hagrid - - .. code-block:: bash - - pip install -U hagrid - -* Step 5.5: Install Syft - - .. code-block:: bash - - pip install -U syft - - -Step 6: Install Docker -====================== - -* Step 6.0: If you are using an Apple macOS M1 device, install Rosetta 2 prior to installing Docker: - - .. code-block:: bash - - softwareupdate --install-rosetta - -* Step 6.1: Open the macOS Docker Install Page: https://docs.docker.com/desktop/mac/install/ - - |docker_install_1| - -* Step 6.2: Click whichever button corresponds to the chip in your Mac ('Mac with Intel chip' if you're not sure). When you do so you'll see something that looks like this: - - |docker_install_2| - -* Step 6.3: Once you hit Save, Docker.dmg will save onto your hard disk (likely in Desktop or Downloads). Find it and double-click it. - - |docker_install_3| - -* Step 6.4: Once you double-click Docker.dmg, a window should come up that looks like: - - |docker_install_4| - -* Step 6.5: In the window that appeared, drag the Docker logo into the Applications folder. A dialog might appear which takes a few minutes to disappear as files are transferred.
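Once Docker is installed, running `docker --version` in Terminal is a quick sanity check. If you ever want to compare that version programmatically, a small parsing sketch could look like the following; the function name is our own, and the version string shown is just an example of the format Docker prints:

```python
import re

def parse_docker_version(output: str) -> tuple:
    """Extract (major, minor, patch) from `docker --version` output.

    Expects a string like 'Docker version 20.10.8, build 3967b7d'.
    """
    match = re.search(r"version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized docker version string: {output!r}")
    return tuple(int(part) for part in match.groups())

# parse_docker_version("Docker version 20.10.8, build 3967b7d") == (20, 10, 8)
print(parse_docker_version("Docker version 20.10.8, build 3967b7d"))
```

In practice you would feed this the captured output of `docker --version` (for example via `subprocess.run`), then compare the tuple against whatever minimum version you need.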
- -* Step 6.6: Once the dialog closes, find the 'Docker' application in your Applications folder: - - |docker_install_5| - -* Step 6.7: When you double-click it, you'll see a dialog like the following: - - - -* Step 6.8: Click "Open" and after a few moments the following screen will appear: - - |docker_install_6| - -* Step 6.9: The yellow color in the bottom right means Docker is still booting. Wait until it's green to proceed. It will look like: - - |docker_install_12| - - Do not close Docker. Proceed to the next step. - - -Step 7: Increase the RAM Docker uses to 8GB -=========================================== - -* Step 7.0: If the Docker window has been closed, look at the top bar of your screen on the right for a small whale logo that looks like this: - - |docker_logo| - - Click the logo and then click "Dashboard" to bring up the Docker window you may recognize from Step 6. - - |docker_install_7| - -* Step 7.1: Click the Gear icon in the top right corner of the Docker window and you'll see a screen like so: - - |docker_install_8| - -* Step 7.2: Click "Resources" and the window will change to: - - |docker_install_9| - -* Step 7.3: Drag the small blue circle next to "Memory" until 8GB of memory has been allocated. Your window will look like this: - - |docker_install_10| - -* Step 7.4: Click "Apply & Restart" and then wait until the bottom left tab turns from yellow back to green. - - - -* Step 7.5: You are done! You may now proceed back to the main Docker dashboard by clicking the "X" in the top right corner of the Docker window, taking you to a dashboard that looks like: - - |docker_install_11| - -Congratulations! You're now fully installed and ready to go!!! You may now close your Terminal! - -Step 8: Test Hagrid (optional) -============================== - -* Step 8.1: Launch Hagrid - - Just to make sure our installation is correct and working, open a new terminal and run the following: - - ..
code-block:: bash - - conda activate syft_env - hagrid launch test - - Wait several minutes. You should see LOTS of logging. The logging will occasionally hang during downloads. If your - internet is slow you'll need to be patient. The logging should eventually stop with the message "Application startup complete." - - |hagrid_startup_complete| - - You can then load "http://localhost:8081" to see a deployed UI which looks like: - - |pygrid_ui| - - Congratulations! Looks like everything was installed properly! - -* Step 8.2: Launch Jupyter Lab - - With hagrid still running, open a new terminal (Command-N if you have Terminal selected) and run the following: - - .. code-block:: bash - - conda activate syft_env - jupyter lab - - A new browser window should open up. - - |syft_1| - -* Step 8.3: Open a new Jupyter Notebook by clicking the "Python 3" square icon (with the python logo). The window will change to: - - |syft_2| - -* Step 8.4: Enter the following code into the top cell and then hit "Shift Enter". - - - .. code-block:: python - - import syft as sy - domain = sy.login(email="info@openmined.org", password="changethis", port=8081) - - - After running the cell, you should see the following output (or something similar): - - |syft_3| - - And if so, Congratulations!!! You're 100% set up and we've tested to make sure! - -* Step 8.5: Close Jupyter Lab - - Close the jupyter lab browser tab. Then find the terminal window where we ran 'jupyter lab' and close the terminal window. If - a dialog box pops up saying "Do you want to terminate running processes in this window?", click "Terminate". - -* Step 8.6: Land Hagrid - - Open a new terminal window and run: - - .. code-block:: bash - - conda activate syft_env - hagrid land test - conda deactivate - -Well done! - -.. |osx_terminal| image:: ../_static/install_tutorials/osx_terminal.png - :width: 50% - -.. |find_osx_version| image:: ../_static/install_tutorials/find_osx_version.png - :width: 50% - -..
|conda_button| image:: ../_static/install_tutorials/conda_button.png - :width: 50% - -.. |click_save| image:: ../_static/install_tutorials/click_save.png - :width: 50% - -.. |conda_icon| image:: ../_static/install_tutorials/conda_icon.png - :width: 50% - -.. |conda_install_1| image:: ../_static/install_tutorials/conda_install_1.png - :width: 50% - -.. |conda_install_2| image:: ../_static/install_tutorials/conda_install_2.png - :width: 50% - -.. |conda_install_3| image:: ../_static/install_tutorials/conda_install_3.png - :width: 50% - -.. |conda_install_4| image:: ../_static/install_tutorials/conda_install_4.png - :width: 50% - -.. |conda_install_5| image:: ../_static/install_tutorials/conda_install_5.png - :width: 50% - -.. |conda_install_6| image:: ../_static/install_tutorials/conda_install_6.png - :width: 50% - -.. |conda_install_6_popup| image:: ../_static/install_tutorials/conda_install_6_popup.png - :width: 50% - -.. |conda_install_6_popup_already_installed| image:: ../_static/install_tutorials/conda_install_6_popup_already_installed.png - :width: 50% - -.. |conda_install_6_popup_access| image:: ../_static/install_tutorials/conda_install_6_popup_access.png - :width: 50% - -.. |conda_install_7| image:: ../_static/install_tutorials/conda_install_7.png - :width: 50% - -.. |conda_install_8| image:: ../_static/install_tutorials/conda_install_8.png - :width: 50% - -.. |docker_install_1| image:: ../_static/install_tutorials/docker_install_1.png - :width: 50% - -.. |docker_install_2| image:: ../_static/install_tutorials/docker_install_2.png - :width: 50% - -.. |docker_install_3| image:: ../_static/install_tutorials/docker_install_3.png - :width: 50% - -.. |docker_install_4| image:: ../_static/install_tutorials/docker_install_4.png - :width: 50% - -.. |docker_install_5| image:: ../_static/install_tutorials/docker_install_5.png - :width: 50% - -.. |docker_install_6| image:: ../_static/install_tutorials/docker_install_6.png - :width: 50% - -.. 
|docker_install_7| image:: ../_static/install_tutorials/docker_install_7.png - :width: 50% - -.. |docker_install_8| image:: ../_static/install_tutorials/docker_install_8.png - :width: 50% - -.. |docker_install_9| image:: ../_static/install_tutorials/docker_install_9.png - :width: 50% - -.. |docker_install_10| image:: ../_static/install_tutorials/docker_install_10.png - :width: 50% - -.. |docker_install_11| image:: ../_static/install_tutorials/docker_install_11.png - :width: 50% - -.. |docker_install_12| image:: ../_static/install_tutorials/docker_install_12.png - :width: 50% - -.. |docker_logo| image:: ../_static/install_tutorials/docker_logo.png - :width: 50% - -.. |hagrid_startup_complete| image:: ../_static/install_tutorials/hagrid_startup_complete.png - :width: 50% - -.. |pygrid_ui| image:: ../_static/install_tutorials/pygrid_ui.png - :width: 50% - -.. |syft_1| image:: ../_static/install_tutorials/syft_1.png - :width: 50% - -.. |syft_2| image:: ../_static/install_tutorials/syft_2.png - :width: 50% - -.. |syft_3| image:: ../_static/install_tutorials/syft_3.png - :width: 50% diff --git a/docs/source/install_tutorials/overview.rst b/docs/source/install_tutorials/overview.rst deleted file mode 100644 index 895ff982cee..00000000000 --- a/docs/source/install_tutorials/overview.rst +++ /dev/null @@ -1,103 +0,0 @@ -Beginner-level PySyft and PyGrid Installation Tutorials -******************************************************* - -.. toctree:: - :maxdepth: 3 - -Welcome to the domain deployment installation tutorials! -This section of our documentation is designed to be the -simplest way to get you started deploying a PyGrid Domain -to an OSX, Linux, or Windows machine and interacting with it -as a data scientist using PySyft. If you're looking -for cloud deployment, or more advanced tutorials such as -ansible, vagrant, kubernetes, or virtualbox deployment, please see the -`advanced deployment documentation `__. 
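Python's standard `platform` module reports the same operating-system information these tutorials ask you to look up by hand. As a sketch, one could route a reader to the right section like this; the mapping and function name are ours, not part of PySyft or PyGrid:

```python
import platform

# Hypothetical mapping from platform.system() values to the tutorial
# sections in this guide.
TUTORIAL_FOR_OS = {
    "Darwin": "OSX Tutorials",
    "Linux": "Linux Tutorials",
    "Windows": "Windows Tutorials",
}

def tutorial_for(system_name: str) -> str:
    """Pick the matching tutorial section, with a fallback for other OSes."""
    return TUTORIAL_FOR_OS.get(system_name, "Unsupported operating system")

# On a Mac this prints "OSX Tutorials", on Ubuntu "Linux Tutorials", etc.
print(tutorial_for(platform.system()))
```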
- -The purpose of these tutorials is to help you install everything -you need to run a Domain node from your personal machine (such -as if you're running through OpenMined -`courses `__ -or -`tutorials `__). -To that end, we will also be installing everything you might need to run Jupyter -notebooks with PySyft installed, such as if you're pretending to be -both Data Owner and Data Scientist as a part of a tutorial or course. - -Step 1: Are you on OSX, Windows, or Linux? -========================================== - -Installation differs greatly depending on whether your personal machine is -running OSX, Linux, or Windows. PySyft and PyGrid are relatively new pieces -of software so not all versions of these are supported. However, the first -step of your journey is to figure out which operating system you are running -and choose the right tutorial for installation. Then within the dropdowns below, -choose which version is right for you. Once you've found the right version, -and completed the tutorial for that version, you'll be all done!!! Good luck! - -There are 3 types of operating systems for you to choose from: OSX, Linux, and Windows. - -OSX Tutorials -~~~~~~~~~~~~~ - -If you know you're running OSX but you're not sure what version you're running, -click the Apple logo at the top left corner, then click "About this Mac" and you'll -see something like: - -|find_osx_version| - -See where this image says "11.5.1"? Figure out what number yours says in that place -and use that number to determine which of these installation tutorials you should -follow to complete your installation. If you don't see your number, choose the -closest that you can. - -#. `Big Sur (11.5.1) `__. - -Linux Tutorials -~~~~~~~~~~~~~~~ - -If you know that you're running Linux but you're not sure what version you're running, -open up a command line and type: - -.. 
code-block:: bash - - $ lsb_release -a - - Which should print something like the following: - - |find_ubuntu_version| - - See where this image says "20.04.3"? Figure out what number yours says in that place - and use that number to determine which of these installation tutorials you should follow. - If you don't see your number, choose the closest that you can. - - #. `Ubuntu (20.04.3 - Focal Fossa) `__. - -Windows Tutorials -~~~~~~~~~~~~~~~~~ - -If you know that you're running Windows but you're not sure what version you're running, -press (Windows Key + R) and then in the text box that appears type: - - .. code-block:: bash - - winver - -and hit (Enter)! This should print something like the following: - - |find_windows_version| - - See where this image says "Windows 10" and "20H2"? Figure out what numbers yours say in those places - and use those numbers to determine which of these installation tutorials you should - follow to complete your installation. If you don't see one of your numbers, choose the - closest that you can. - - #. `Windows 10 (20H2) `__. - -Best of luck on your journey! - -.. |find_osx_version| image:: ../_static/install_tutorials/find_osx_version.png - :width: 50% - -.. |find_ubuntu_version| image:: ../_static/install_tutorials/find_ubuntu_version.png - :width: 50% - -.. |find_windows_version| image:: ../_static/install_tutorials/find_windows_version.png - :width: 50% diff --git a/docs/source/install_tutorials/windows.rst b/docs/source/install_tutorials/windows.rst deleted file mode 100644 index 708ebca4b4b..00000000000 --- a/docs/source/install_tutorials/windows.rst +++ /dev/null @@ -1,207 +0,0 @@ -.. _windows_install: - -================= -Windows Tutorials -================= - -The following instructions are for Windows 10 version 2004 or higher. - -Now, traditionally, getting things as big and imposing as PySyft to work on Windows is... really, really challenging. -Luckily for us, we've got a few tricks up our sleeves to make the process super easy.
- -So sit back, relax, grab a few cookies, and *enjoy!* - -Step 1: Enabling WSL2 -===================== - -Our first and most important step is going to be to enable the Windows Subsystem for Linux (WSL). -This lets you run a Linux-based environment (including most command line tools and applications!) directly on Windows, -unmodified, and without any of the drawbacks of more traditional solutions like virtual machines or dual-booting. - - -Installing this incredible piece of software is as easy as opening PowerShell or Command Prompt in the Start Menu, and entering:: - - wsl --install - -And that's it! It'll start installing all the dependencies and getting things in order. -If you run into any issues here, please refer to `this link `_, which covers common WSL installation issues. - -.. Specifying an alternate way to install wsl along with distro from microsoft store start -**Alternate way** -================= - -**Install WSL from Microsoft Store** -If the command line has you feeling confused, fear not! There's a more user-friendly approach to installing WSL on Windows. We can bypass the command line altogether and download a package of all the components from the Microsoft Store. Not only that, but this method runs WSL isolated from Windows 11 and updates will be available through the Microsoft Store, so you won't have to wait for the next version of the operating system to install the newest version. - -To install WSL from the Microsoft Store, use these steps: - - -1. Enable Virtual Machine Platform -================================== - - - Open **Start** - - Search for **Turn Windows Features on or off** and click the - top result to open the app - - Check the **Virtual Machine Platform** - - Click the **OK** button - - Click the **Restart button** - -After completing these steps, you can download the app from the Microsoft Store. - - 2. 
Install Windows Subsystem for Linux app - ========================================== - -- Open the `Windows Subsystem for Linux Store Page `_ -- Click the **Get** button -- Click the **Open** button -- Click the **Get** button again - - 3. Install Linux Distro - ======================= -- Open the **Microsoft Store** app. -- Search for a Linux distro, for example `Ubuntu `_. -- Click the **Get** button. -- Click the **Open** button. - -*Congratulations! Once you complete these steps, WSL will install on Windows 11, including the support for Linux GUI apps and the Linux distribution.* - -*To access the command line for your Linux distribution, search for "wsl" in the search bar and select the top result, which should be a penguin logo.* - - .. end - -Step 2: Setting up Linux User Info -================================== - -Well done! You've *almost* got an entire Linux kernel and distribution on your machine, and you did this with **barely one line of code!** -There's just one last step needed. And luckily for us, it's an easy one... - -We now have to add a new user to our brand new and shiny Linux distro. To do this, we'll have to pick a username and password. -Please note: this account and password have no relation to your regular Windows username or password. They're specific to the Linux -distro that you just installed. - -Once you provide a username and password, **congratulations!** You have a fully fledged Linux distro. You may not have realized it, but you've just unlocked -a whole new universe of possibilities and interesting tools. - -Step 3: Updating & Upgrading -============================ - -Now that you have a shiny new copy of Linux, your next step will be to update and upgrade it. -This is pretty easy to do in Linux, and it's something we can do with *just one command!* - -In your new Ubuntu terminal, enter the following command:: - - sudo apt update && sudo apt upgrade - -You might need to enter the password of the account you created in Step 2.
You might also need to press Y and hit enter to allow the updates. -But you're on a roll- nothing will stop you from getting the most up-to-date, and secure version of your Linux distro! - -Note: We'd actually recommend doing this reasonably often (once every few days) to maintain a safe and up-to-date distro. - -Optional: Installing Windows Terminal -===================================== - -We'd recommend installing the Windows Terminal, and using that to launch your Linux Distribution instead of PowerShell, Command Prompt, or the default -Ubuntu shell that comes bundled in. - -This isn't strictly necessary, but it doesn't take too long, improves the command line experience, and will probably make you happier. - -Please go `here `_ if you're interested. - -Step 4: Installing Conda -======================== - -Wow! We've made it pretty far together in a pretty short amount of time. - -We've already installed a Linux distribution, (and if you followed the Optional step, have a *swanky* new terminal!) and we're getting *really* close to installing our software. -Our next step is an important one. It'll help us make sure our software can install without any conflicts, and once installed, that it will be stable, and work as intended! - -We're going to use a tool called Anaconda to do this. It'll help us create something called a "Virtual Environment." - -To install Anaconda, please follow the yellow brick road I lay down here below: - -- `Head to the Anaconda website `_, and find the latest Linux installer. -- Right click the installer, and select **"Copy Link Address"** -- Head back to your WSL terminal, and type "wget " and then right click next to it. This should paste the link you copied, which should produce something like:: - - wget https://repo.anaconda.com/archive/Anaconda3-2022.05-Linux-x86_64.sh - -- You got it! Not only did you get it, you made it look **easy.** Now just hit enter. -- At this point, Conda will start installing. 
Type "yes" and hit Enter for all the various prompts that follow (Accepting the Terms and Conditions, Running Conda Init, etc.) -- Once this is done, close and restart your WSL terminal. -- Once restarted, verify that conda is working using the following command:: - - conda env list - -Wait wait wait wait just a second. -Do you realize what just happened? - -You've just successfully installed Anaconda!! Hooray! -Trust me, your life is about to become a LOT easier. - - -- Let's now tap into your newfound powers with Anaconda and create a new virtual environment called "syft_env" by running the following in your WSL shell:: - - conda create -n syft_env python=3.9 -y - -- Let's verify that we created our "syft_env" successfully with the following command (Deja Vu, anyone?):: - - conda env list - -- You should see two environments in the output. Hooray! Now let's activate the syft virtual env, and let the fun *really* begin:: - - conda activate syft_env - -- Now let's use it to conveniently install a few packages:: - - sudo apt install python3-pip - pip3 install pandas matplotlib numpy - pip3 install jupyterlab - -- If the last command fails, try the following instead:: - - conda install -c conda-forge jupyterlab - - -Step 5: Become the Docker Doctor -================================ - -The last tool needed to complete your arsenal is called Docker. -You can install it by following the instructions `here `_. - -Note: The Windows user account that launches WSL 2 has to be added to the local group "docker-users". On Windows 10 Home, run netplwiz to add the Windows user to the group "docker-users". - -Once you have it running, you just have to ensure the following: -- You've allocated a sufficient amount of RAM (we recommend at least 8GB, but you can get by with less) -- You're using the WSL 2 backend - -Congratulations, you have reached the end of your journey. Now it is time for your **ultimate test!** Deploying a domain node.
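Docker Desktop's WSL 2 backend only applies when your shell actually runs inside WSL. A quick way to check, sketched below with helpers of our own devising (not part of any Syft tooling), is to look for the "microsoft" tag that WSL kernels embed in `/proc/version`:

```python
from pathlib import Path

def looks_like_wsl(proc_version: str) -> bool:
    """WSL kernels include 'microsoft' in their /proc/version string."""
    return "microsoft" in proc_version.lower()

def running_under_wsl() -> bool:
    """Safe on any OS: returns False when /proc/version doesn't exist."""
    path = Path("/proc/version")
    return path.exists() and looks_like_wsl(path.read_text())

print("WSL detected:", running_under_wsl())
```

Run this inside your Ubuntu terminal; if it reports `False` there, you are likely in plain PowerShell rather than the WSL shell the following steps assume.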
- - Note that your ultimate test is **optional**; you can do this part later. - - -Step 6: Install Hagrid and PySyft -================================= - -- With the power of WSL and Anaconda, installing our software is as easy as:: - - pip3 install syft - pip3 install hagrid - - -Optional: Deploy a Domain Node! -=============================== - -Everything we've done so far has been to make this next part as easy as possible. This is the moment we've all been waiting for. - -To launch a domain node called "test_domain", ensure your virtual environment ("syft_env" in the steps above) is active, that Docker Desktop is running, and run the command below in your WSL terminal:: - - hagrid launch test_domain - -Note: If you get the error message "test_domain is not valid for node_type please use one of the following options: ['domain', 'network']", rerun the command, changing test_domain to domain. - -You should see the containers begin to appear in Docker! - -**CONGRATULATIONS!!!** - -You have reached the promised land. You're ready to begin remote data science. -It was a pleasure walking you through the installation process. Now be sure to use your newfound powers and abilities for good! diff --git a/notebooks/tutorials/hello-syft/01-hello-syft.ipynb b/notebooks/tutorials/hello-syft/01-hello-syft.ipynb index 8a7f6a674d2..12f01679e3c 100644 --- a/notebooks/tutorials/hello-syft/01-hello-syft.ipynb +++ b/notebooks/tutorials/hello-syft/01-hello-syft.ipynb @@ -83,7 +83,7 @@ "source": [ "## Launch a dummy server \n", - "In this tutorial, for the sake of demonstration, we will be using in-memory workers as dummy servers. For details of deploying a server on your own using `syft` and `hagrid`, please refer to the `quickstart` tutorials." + "In this tutorial, for the sake of demonstration, we will be using in-memory workers as dummy servers. For details on deploying a server on your own using `syft`, please refer to the deployment documentation."
] }, { diff --git a/packages/grid/frontend/src/lib/components/Datasets/DatasetModalNew.svelte b/packages/grid/frontend/src/lib/components/Datasets/DatasetModalNew.svelte index 66587a5fd63..3b0cb81318a 100644 --- a/packages/grid/frontend/src/lib/components/Datasets/DatasetModalNew.svelte +++ b/packages/grid/frontend/src/lib/components/Datasets/DatasetModalNew.svelte @@ -43,10 +43,10 @@ > 2 -

Install HAGrid by running the code below in your Jupyter Notebook

+

-

pip install -U hagrid

+

  • @@ -57,12 +57,10 @@ 3

    - Once HAGrid is installed open the "Upload Dataset" quickstart tutorial notebook by - running the code below in your Jupyter Notebook.

    -

    hagrid quickstart

    +