{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"https://www.bitmat.it/wp-content/uploads/2020/10/google-cloud-logo.jpg\" style=\"width: 25%; float: left; margin-top: 7em;\" />\n",
    "\n",
    "<div style=\"font-size: 20px;\">\n",
    "Master in Computer Science\n",
    "</div>\n",
    "\n",
    "\n",
    "<div style=\"font-size: 24px; margin-top: 2rem;\">\n",
    "Architectures for big data</div>\n",
    "\n",
    "<h1 style=\"font-size: 36px; margin-top: 5rem; margin-bottom: 3rem; text-align: center;\">A (partial) view on the Google Cloud Platform</h1>\n",
    "\n",
    "<h2>Prof. Dario Malchiodi</h2>\n",
    "\n",
    "<div style=\"width: 50%; margin: auto;\">\n",
    "  <img src=\"img/logo-di.jpg\" style=\"height: 9rem; margin-top: 4.75rem;\" align=\"left\">\n",
    "  <img src=\"img/logo-unimi.jpg\" style=\"height: 8rem; margin-top: 5rem;\" align=\"left\">\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"http://etc.ch/aA8N\"><img src=\"http://etc.ch/aA8N/?qr\" style=\"width: 30%; float: left;\"></a>\n",
    "\n",
    "# First things first...\n",
    "\n",
    "## A poll!\n",
    "\n",
    "<a href=\"https://directpoll.com/r?XDbzPBdEt8j1rJnVSCHYrxgAN2vGi3SL1RsJkHj\">Start poll / next question</a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "hide_input": true,
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<iframe src=\"https://directpoll.com/r?XDbzPBd3ixYqg8cKH16NgmVzCrk5XsHAqGh8Z6Y\" width=\"900\" height=\"400\"></iframe>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%%html\n",
    "<iframe src=\"https://directpoll.com/r?XDbzPBd3ixYqg8cKH16NgmVzCrk5XsHAqGh8Z6Y\" width=\"900\" height=\"400\"></iframe>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Setting up the computational environment\n",
    "\n",
    "This document describes the steps to be executed to install and setup the tools necessary in order to be able to implement the algorithms explained in the course.\n",
    "\n",
    "The material described in this document is frequently subject to change, especially when it comes to registration and use of Web-based tools. If anything described here should differ from what you experience:\n",
    "<ul>\n",
    "    <li>you are following a different procedure from the one here described (thus be sure to stick to these instructions);</li>\n",
    "    <li>the terms of service are changed and this document is no more adherent to them (thus check that you are not reading material from previous years and if the problem persists send me an e-mail);</li>\n",
    "    <li>you have exhausted your credits.</li>\n",
    "</ul>    \n",
    "    \n",
    "The technical instructions refer to a Fedora-like operating system. Most of the instructions will work out of the box for other Linux- or MacOS-based installations. Windows users can either refer to the WSL or to other tools simulating a Linux environment, or install a Linux-based virtual machine.\n",
    "\n",
    "**Notation**\n",
    "\n",
    "- Text delimited by caporali quotes (such as «File») refers to something to be searched verbatim in a Web-based interface, such as the name of a menu, or the text on a button;\n",
    "- monospaced inline text with gray background (such as `ls`) denotes either a bash command, or a file or path name;\n",
    "- a block of monospaced text starting by `local>` and `VM>` contains a terminal excerpt (where the two situations represent, respectively, a terminal on the local machine or on a virtual machine);\n",
    "- a block of monospaced text with gray background shows the content of a text file."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"https://www.rightinbox.com/wp-content/uploads/How-to-Change-Your-Gmail-Address-.png\" style=\"width: 40%; float: left; margin-right: 2em;\" />\n",
    "\n",
    "## Account creation\n",
    "\n",
    "This workshop refers to a subset of the Google cloud ecosystem. This means that a Google account is needed in order to be able to run the experiments in this part. In the unlikely case that you don't have a Google account, go to https://accounts.google.com/SignUp and create one, possibily specifying some form of secure authentication such as two-factors login. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"https://miro.medium.com/max/776/1*Lad06lrjlU9UZgSTHUoyfA.png\" style=\"width: 30%; float: right; margin-top: 4em;\" />\n",
    "\n",
    "## The easy way: Colaboratory\n",
    "\n",
    "The simpler option you can follow in order to run experiments is that of using Google colaboratory, a jupyter-like environment accessible via Web and automatically backed by machines hosted by Google. On the one hand, this is accessible for free, it provides powerful hardware, and it requires a very low effort in terms of software installation. On the other one, resources are allocated only temporarily and with specific limits. Thus this tools represent a meaningful solution in the early phases of study and experimentation, but it is not recommended to be used for big projects. Activating it is simple: just connect to [Google Colaboratory](http://colab.research.google.com) and sign in with your account. You'll be able to directly work with a jupyter-like notebook and to save it within Google drive."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"https://www.bitmat.it/wp-content/uploads/2020/10/google-cloud-logo.jpg\" style=\"width: 25%; float: left; margin-top: 6em; margin-bottom: 6em\" />\n",
    "\n",
    "## The hard way: Google Cloud Platform\n",
    "\n",
    "In order to have more control on the resources used, and to be able to set up a stable development and production envorinment without having to physically manage hardware, there are several alternatives based on service-oriented architectures. Basically speaking, it is possible to use remotely available resources. Here, \"resources\" has to be intended in a broad term and might refer to computational power, storage, memory, bandwith and so on. Most of the time these resources are cloud-based and rely on virtualization techniques. In opposition to the on-premises installation on one or more local machines, which are 24/7 available to their owner, users pay for the actual use of these resources, which therefore need a proper management in order to avoid unneeded costs.\n",
    "\n",
    "There are several providers offering cloud-based computing services, and among these the most known are [Amazon Web Services](https://aws.amazon.com/) (AWS), [Microsoft Azure](http://azure.microsoft.com/), and [Google Cloud Platform](https://cloud.google.com/) (GCP). We will refer here to the latter, although the concepts we will speak about have their counterpart on the other providers."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Registration and project creation\n",
    "\n",
    "First of all, note that the services we will refer to hereafter are not provided for free. When you register to GCP you are offered a 300US\\\\$ credit (as a comparison, running the code shown in this worksop costs significantly less than 1US\\\\$).\n",
    "\n",
    "<div class=\"alert alert-warning\" role=\"alert\">\n",
    "Registration requires you to enter a valid credit card number which will be automatically charged when the obtained credits have been used. In any case never provide such kind of information, <span style=\"font-weight: bold;\">unless you are ready to pay for resources used after your credits expire.</span>\n",
    "    \n",
    "Moreover, always double-check that you have not disposed unused resources, as you risk to pay for them even if they are idle.\n",
    "</div>\n",
    "\n",
    "### The intimidating Google cloud console\n",
    "\n",
    "Go to https://console.cloud.google.com: if you never created a project you will automatically be prompted to do so, otherwise click on the dropdown in the upper-left part of the page (just after «Google Cloud Platform») and click on «NEW PROJECT»; in both cases:\n",
    "\n",
    "- provide a name and an ID for your project (in the rest of this document I will use «AMD-unimi» for both, feel free to choose the name you prefer, unless this is already been used by someone else or it does not respect the format required), noting that the it will not be possible to modify the ID in the future,\n",
    "- click on «CREATE».\n",
    "\n",
    "After a while your project is listed on the console and thus becomes available. Go back to the console and check that your project is selected (its name should appear right after «Google Cloud Platform» in the upper part of the page).\n",
    "\n",
    "Select «Billing» in the menu on the left of the page (to be possibly visualized through clicking on the three-bars button on the left of «Google Cloud Platform») and connect your billing account (notably, the one you have associated to your credit for students). Should you have not yet associated any credit to your account, you can do so clicking on «Manage billing accounts».\n",
    "\n",
    "Get back to the console, select «APIs and Services»/«Dashboard», and click the «ENABLE APIS AND SERVICES» button in the upper part of the page. You will be redirected to a page listing the accessible API: search «Compute Engine API» section and click on it (you can type this directly in the search box in order to speed up the process), then click on the «ENABLE API» button. Repeat the process for the «Cloud Dataproc» API."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Local installation of the Google cloud SDK\n",
    "\n",
    "Although all the operations we will refer to can be accomplished via Web-based interfaces, we will make use of command-line tools, which can be accessed either using a special, cloud-based shell (see the next sections), or on your local machine using a terminal. Note that the use of command-line tools is essential when some form of process automatization is needed.\n",
    "\n",
    "<div class=\"alert alert-info\" role=\"alert\">\n",
    "Each linux flavor has its own package manager: here I refer to <code>dnf</code>, shipped with the recent releases of Fedora.\n",
    "</div>\n",
    "\n",
    "Fedora's package manager is not aware of Google cloud SDK, thus the corresponding repository should be added to the list of resources accessed by <code>dnf</code>. The best option in order to add a new file <code>/etc/yum.repos.d/google-cloud-sdk.repo</code>(note that root permissions are required, thus this is an action to be performed likely through <code>sudo</code>), having the following content:\n",
    "\n",
    "<pre class=\"file\">[google-cloud-sdk]\n",
    "name=Google Cloud SDK\n",
    "baseurl=https://packages.cloud.google.com/yum/repos/cloud-sdk-el7-x86_64\n",
    "enabled=1\n",
    "gpgcheck=1\n",
    "repo_gpgcheck=1\n",
    "gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg\n",
    "       https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg\n",
    "</pre>\n",
    "\n",
    "Now the installation can be carried out as usual:\n",
    "\n",
    "<pre class=\"terminal\">local> sudo dnf install google-cloud-sdk\n",
    "</pre>\n",
    "\n",
    "The next step consists in initializing the SDK, associating it to the project we have created. This is done through the following command:\n",
    "\n",
    "<pre class=\"terminal\">local> gcloud init\n",
    "</pre>\n",
    "\n",
    "The initialization process is composed by several steps, detailed here below.\n",
    "\n",
    "1. Selection of a Google account: you should already find the one you registered with as a possible choice, although there is the possibility to insert another account (note that an authentication step will be fired up, for instance through the mediation of a Web browser).\n",
    "2. Selection of a project: you should be prompted with a list with your existing projects, within which you can select the project you have created for the course.\n",
    "3. Selection of region and zone: answer «Y» when asked if you want to select the compute region and zone. This allows you to specify a default region and zone for the commands you will issue using the SDK. Although several architectural patterns might apply here, we will keep it simple and refer to the one called «single-region deployment», consisting in placing everything in the same zone, selecting a region close to our ISP. We will select «europe-west6-a», corresponding to one of the zones in the Zurich datacenter (one of the nearest in Milano at the time of writing). Note that only some of the available zones are listed, and it might be required to type <code>list</code> in order to show all options.\n",
    "\n",
    "\n",
    "<div class=\"alert alert-info\" role=\"alert\">\n",
    "Google servers are physically located in different locations worldwide, and this localization is hierarchically organized into regions (such as «western Europe», or «central US»), with each region hosting datacenters in different zones. Moving data between zones typically involves a non-negligible service cost. Refer to <a href=\"https://cloud.google.com/compute/docs/regions-zones/\">https://cloud.google.com/compute/docs/regions-zones/</a> for a list of regions and zones. \n",
    "</div>\n",
    "\n",
    "<img src=\"img/gcp-zones.png\" style=\"width: 70%;\" />"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Launch a virtual machine\n",
    "\n",
    "Everything is not set up in order to easily fire up a virtual machine (VM) with preinstalled software: on a local terminal, just execute\n",
    "\n",
    "<pre  class=\"terminal\">local> export IMAGE_FAMILY=\"tf2-latest-cpu\"\n",
    "local> export ZONE=\"europe-west6-a\"\n",
    "local> export INSTANCE_NAME=\"amd-instance\"\n",
    "local> export INSTANCE_TYPE=\"n1-standard-1\"\n",
    "\n",
    "local> gcloud compute instances create $INSTANCE_NAME \\\n",
    "              --zone=$ZONE \\\n",
    "              --image-family=$IMAGE_FAMILY \\\n",
    "              --image-project=deeplearning-platform-release \\\n",
    "              --maintenance-policy=TERMINATE \\\n",
    "              --machine-type=$INSTANCE_TYPE \\\n",
    "              --boot-disk-size=200GB \\\n",
    "              --preemptible\n",
    "</pre>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-warning\" role=\"alert\">\n",
    "When you use this command later on, you might get an error due to the fact that your auth token is expired. In such case, just execute `gcloud auth login` in order to login again (note that you should be prompted some credential access mechanism, typically through a Web browser).\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The code is organized so that personalization can easily be accomplished through modification of the variables initially set. More precisely:\n",
    "\n",
    "- `IMAGE_FAMILY` refers to the software to be automatically installed when the VM is created;\n",
    "- `ZONE` contains the zone where the virtual machine is created (cfr. previous paragraph);\n",
    "- `INSTANCE_NAME` is a chosen ID allowing to subsequently connect to the VM;\n",
    "- `INSTANCE_TYPE` denotes the type of instance (basically a name for the combination of CPU type and RAM amount)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"img/gcp-machine-types.png\" style=\"width:70%;\" />\n",
    "\n",
    "<img src=\"img/gcp-machine-subtype.png\" style=\"width:70%;\" />"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Thus the previous command creates a VM named «amd-instance», which is placed in the same zone of our project, is equipped with the latest version of TensorFlow 2 (a deep learning library we will use later on in the course), and has the simplest instance type, containing 1 CPU and slightly less than 4 Gbytes of RAM (note that the more powerful a VM, the higher its usage cost: https://cloud.google.com/compute/docs/machine-types lists the available instance types, and https://cloud.google.com/compute/vm-instance-pricing shows the corresponding prices). In the creation command, also the following options have been specified:\n",
    "\n",
    "- `image-project` declares the project within which all referenced image families are organized (see `IMAGE_FAMILY` in previous list);\n",
    "- `maintenance-policy` dictates what should be done if the phisical machine hosting the VM need to be placed under maintenance (here, `TERMINATE` requires that in this case the VM should be terminated); \n",
    "- `boot-disk-size` sets up a 200 Gbytes-sized boot disk;\n",
    "- `preemptible` marks the VM as _preemptible_, which means that it will automatically disposed if its resources are needed, and anyhow they will run for a limited amount of time."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the purpose here is that of verifying that our setup is actually working, rather than to provide a production environment. This is why the provided values refer to a VM with very limited resources, which have a particularly low cost. When designing experiments, it is important to reflect on a proper balance between resource capacity (considering, for instance, the use of one ore more GPUs) and costs."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It take some minutes to the command above in order to finish its execution, whose progress can be monitored within the Google Cloud console, either under the tab «Resources» in the main dashboard or within «Compute Engine»/«VM instances». Once the VM has been created, it is fully accessible: you can for instance log in via ssh, either clicking on the «SSH» button (this opens up a web-backed ssh client) or using a terminal as follows."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<pre class=\"terminal\">local> gcloud compute ssh --zone=$ZONE \\\n",
    "              jupyter@$INSTANCE_NAME -- -L 8080:localhost:8080\n",
    "</pre>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Connecting via terminal could require to add your SSH keys to the Google console. Note however that, although a terminal might be handy for several reasons, most of the time we will interact with the VM only via Web, using one or more jupyter notebooks (which have been installed according to the specification of `IMAGE_FAMILY` during the VM creation). Indeed the above ssh connection also sets up a forwarding between the port 8080 on the local machine and the analogous one in the VM, as everything follows `--` in previous command is automatically dispatched to the ssh client. This allows us to open a browser and connect to http://localhost:8080/tree and directly use jupyter notebooks on the VM. Note also that a directory `tutorials` is automatically created and populated with some tutorials on the use of GCP within jupyter.\n",
    "\n",
    "In order to verify that everything has been installed, create a new notebook based on python 3 and execute the following code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import scipy as sp\n",
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt\n",
    "import tensorflow as tf\n",
    "\n",
    "tf.__version__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you obtain any import error, some of the libraries has not been correctly installed. Moreover, if you don't get as output a string starting by `'2.'`, the latest TensorFlow release has not been installed. In both cases, check that you followed the described procedure."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\" role=\"alert\">\n",
    "If you find up a Web interface different from the usual one in jupyter, double-check the URL entered in the browser:\n",
    "«http://localhost:8080/tree» points to the standard notebook, while «http://localhost:8080» leads to jupyter lab, a more recent interface for jupyter notebooks.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Shutting down the VM\n",
    "\n",
    "Always take in mind that as a resource is alive, it will consume your credit (or, worse, it will cost you money if you have enabled billing and consumed your credits). Thus you should learn the habit of shutting down all resources when they are not needed anymore. In our case, this requires:\n",
    "\n",
    "1. to save and shutdown our notebook: actually we didn't really write anything but a couple of python idioms in order to check the actual availability of the installed software, thus we could simply shut down the instances as shown in the next point; in a development context we should save (using the «File»/«Save and Checkpoint» menu item), then shutdown the notebook («File»/«Close and halt») and close the browser window;\n",
    "2. to shutdown the cloud instances, issuing the following command on our local machine:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<pre class=\"terminal\">local> gcloud compute instances delete $INSTANCE_NAME</pre>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "which deletes (upon explicit confirmation) all the allocated resources. Note that this also includes the disk containing the notebook we previously created. We will deal with this issue in a while.\n",
    "\n",
    "If now you get back to the Google console, you should see a peak in the graph displaying the amount of API requested, as well as notice that there are no more resources allocated."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Decoupling VM and storage\n",
    "\n",
    "As we have seen in previous section, on the one hand it is recommended to shut down a VM once it is no more needed in order to perform a computation, altough this also has the effect of permanently deleting its attached storage, thus including the results of the computation itself, which typically are notebooks, textual or binary data generally stored as one or more files. An obvious solution to this problem is that of keeping VMs alive at least for the time of transferring such results in a safe place, such as within our local machine. This can be done through the `gcloud` utility as follows:\n",
    "\n",
    "<pre class=\"terminal\">\n",
    "local> export LOCAL_PATH='/home/local'\n",
    "local> export INSTANCE_FILE='~/data.txt'\n",
    "local> gcloud compute scp $LOCAL_PATH jupyter@$INSTANCE_NAME:$INSTANCE_FILE\n",
    "</pre>\n",
    "\n",
    "Here\n",
    "- `INSTANCE_NAME` contains the name of a running VM instance,\n",
    "- `LOCAL_PATH` is the pathname of the local directory where data should be saved (replace «/home/local» with an existing path),\n",
    "- `INSTANCE_FILE` is the pathname of the file to be copied from the VM (again, replace the contents of this variable with an appropriate value)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Directory-recursive copies are possible, via using the `--recurse` option. Of course it is possible to use the same tool in order to copy something from the local machine to the VM, for instance configuration files or data to be processed.\n",
    "\n",
    "Another long-term storage place is a cloud-based resource which can live independently of a VM. Such storage is organized in terms of file-system-like objects called _buckets_. Buckets can be managed, as well as VMs, either via the Web console or using a terminal, although in this case the command to be used is `gsutil` (automatically installed alongside the Google Cloud SDK). For instance\n",
    "\n",
    "<pre class=\"terminal\">local> gsutil mb -l europe-west6 gs://amd-bucket/</pre>\n",
    "\n",
    "has the effect of creating a bucket identified by the name «amd-bucket». The URI-like reference «gs://amd-bucket» underlines the fact that, exactly as with projects, we are here dealing with global identifiers, thus it is not possible to create buckets having names already chosen by someone else. Note that in this case we specified only a region («europe-west6») instead than a specific zone. The `gsutil` tool can be straightforwardly used in order to move data between a bucket and a local machine, exactly with the same semantics of `cp` in a bash. For insance\n",
    "\n",
    "<pre class=\"terminal\">local> export LOCAL_PATH='~'\n",
    "local> export BUCKET_NAME='amd-bucket'\n",
    "local> export BUCKET_PATH='data.txt'\n",
    "local> gsutil cp gs://$BUCKET_NAME/$BUCKET_PATH $LOCAL_PATH\n",
    "</pre>\n",
    "\n",
    "copies a file named «data.txt» from a bucket to the home directory of the local machine. The tool is also available when we ssh into a VM, and thus can be used in order to transfer data between the ephemeral disk of the VM and the persistent storage of a bucket, using the URI-like namespace for objects inside a bucket, such as in\n",
    "\n",
    "<pre class=\"terminal\">VM> export VM_PATH='~/notebooks/result.csv'\n",
    "VM> export BUCKET_NAME='amd-bucket'\n",
    "VM> export BUCKET_PATH='result.csv'\n",
    "VM> gsutil cp $VM_PATH gs://$BUCKET_NAME/$BUCKET_PATH\n",
    "</pre>\n",
    "\n",
    "which exports the results of a computation done on a VM (the file «results.csv») in a bucket. Finally, recall that, although cheaper than VMs, buckets have themselves a cost and thus it is recommended to dispose of them if they are not needed for long-term storage (which could be a good option in case of massive data which cannot be saved in local HDs/SSDs). This can be done again via `gsutil`:\n",
    "\n",
    "<pre class=\"terminal\">local> gsutil rm -r gs://amd-bucket</pre>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Setup a small cloud\n",
    "\n",
    "If needed, we can adapt the procedure shown in previous sections to the case of creating a cloud of machines, for instance in order to build up a distributed environment based on Hadoop and/or Spark (don't worry if these names do not sound to you, we'll learn about them during classes). Just to give an example, issuing the following command in the local terminal has the effect of creating a small cloud with one master machine and two workers.\n",
    "\n",
    "<pre class=\"terminal\">local> gcloud dataproc clusters create amd-cluster \\\n",
    "              --num-workers=2 \\\n",
    "              --scopes=cloud-platform \\\n",
    "              --worker-machine-type=n1-standard-1 \\\n",
    "              --master-machine-type=n1-standard-1 \\\n",
    "              --zone=$ZONE\n",
    "</pre>\n",
    "\n",
    "Besides `zone`, having the same meaning of before, the options are:\n",
    "- `num-workers`, setting the number of worker machines in the cloud,\n",
    "- `scopes`, declaring one ore more scope (each referring to a set of API which can be accessed by the cloud);\n",
    "- `worker-machine-type` and `master-machine-type`, describing the type of images to be used for workers and master, respectively.\n",
    "\n",
    "Also in this case it will take some time, but if you check your Google console you'll notice under the tab «Resources» that you are running three Compute Engine instances and using one bucket. Here _instance_ and _bucket_ mean \"computing resource\" and \"disk space\", respectively.  Clicking on that section of the tab will show you some more details, and in particular you'll notice that VMs are named «amd-cluster-m», «amd-cluster-w0», and «amd-cluster-w1» in order to differentiate between master and workers. You can ssh into these instances, again clicking on the «SSH» button or using a terminal as follows. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<pre class=\"terminal\">local> export USER='amd'\n",
    "local> gcloud compute ssh --zone=$ZONE $USER@amd-cluster-m\n",
    "</pre>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, the `USER` variable needs to be set to the username for the master machine, which is equal to the Google username of your account (typically, the part on the left of the 'at' sign in the related <span style=\"font-family: 'Roboto Mono', monospace;\">@gmail.com</span> address).\n",
    "\n",
    "Also in this case, an explicit cleaning phase needs to be executed when the cluster have been used.\n",
    "\n",
    "<pre class=\"terminal\">local> gcloud dataproc clusters delete amd-cluster\n",
    "</pre>\n",
    "\n",
    "Coming back to the Google console, you should see that there are no more computing resources allocated, although the «Resources» tab should report that some storage is still in use. Clicking on the related section shows up a list of buckets, likely containing only one item whose name starts by \"dataproc\". For now, just select this item and click the «DELETE» button, confirming the action."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Double-check unused resources and keep an eye on your billing costs\n",
    "\n",
    "If you have followed the instructions described in this document, you should have deleted all VMs and buckets and thus you should not be paying for unused, yet still running, resources. However it is always better to perform a double-check: connect to the Google Cloud console and verify that you have not left anything alive.\n",
    "\n",
    "Finally, still in the console, have a look at the «Billing» tab, where you should see an estimate of the costs for the services we have used so far. It is likely that, unless it took you more than half an hour (well, probabily a fairly higher amount of time) in order to reproduce all the described steps, you will see an estimated charge of 0 USD. If you check the detailed charges, however, you will notice that the resources we used have actually been accounted for, although they were used for so small a time that we did not incur in any cost charge. Note nonetheless that the shown charges only represent an estimate, thus take the habit of periodically control the effective costs which have been charged on your account."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Connecting Colab and GCP\n",
    "\n",
    "Colab provides limited resources, that is the provided virtual machine is automatically shut down after a fixed period (at the time of writing, 12 hours). Moreover, despite the fact that this virtual machine has a remarkable computational power (it can be boosted using GPUs or TPUs), it could represent an insufficient resource for dealing with complex problems. When a more reliable computing environment is needed and the Colab frontend can be hosted on a machine provided by us, for instance a VM spinned up using GCP, as previously explained. We repeat here the code to be executed from the local machine in order to ssh to the VM when the latter is up and running:\n",
    "\n",
    "<pre class=\"terminal\">local> gcloud compute ssh --zone=europe-west6-a \\\n",
    "              jupyter@amd-instance -- -L 8080:localhost:8080\n",
    "</pre>\n",
    "\n",
    "Recall that this command also puts in place an ssh-based forwarding of the 8080 local port to the same one on the VM: we have seen that this is important in order to access the remote jupyter installation, and using the same technique it will be possible to tie that installation to a Colab frontend. This is done as follows:\n",
    "\n",
    "1. pointing to https://colab.research.google.com/ using a browser,\n",
    "2. click on the arrow right after the button «Connect» (in the upper right part of the page),\n",
    "2. selecting «Connect to local runtime...»,\n",
    "3. inserting «http://localhost:8080/» in the shown form.\n",
    "\n",
    "This will execute all the following cell computations within the virtual machine."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Challenge!\n",
    "\n",
    "- Chose a problem.\n",
    "- Set up a solution for that problem which:\n",
    "  - spins up a virtual machine on GCP,\n",
    "  - transfers software and data to be processed,\n",
    "  - execute sowftware,\n",
    "  - saves the results in a persistent location."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example\n",
    "\n",
    "[Adversarial examples](https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/generative/adversarial_fgsm.ipynb)\n",
    "\n",
    "<div class=\"row\">\n",
    "<img src=\"img/dog-labrador.png\" style=\"width: 20%; float: left; margin-top: 1em;\" />\n",
    "<img src=\"img/dog-hound.png\" style=\"width: 20%; float: left;\" />\n",
    "<img src=\"img/dog-whippet.png\" style=\"width: 20%; float: left;\" />\n",
    "<img src=\"img/dog-saluki.png\" style=\"width: 20%; float: left;\" />\n",
    "</div>\n",
    "\n",
    "<div class=\"row\">\n",
    "<span style=\"width: 20%; float: left; text-align: center;\">Labrador</span>\n",
    "<span style=\"width: 20%; float: left; text-align: center;\">Hound</span>\n",
    "<span style=\"width: 20%; float: left; text-align: center;\">Whippet</span>\n",
    "<span style=\"width: 20%; float: left; text-align: center;\">Saluki</span>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example\n",
    "\n",
    "[Neural style trasfer](https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/generative/style_transfer.ipynb)\n",
    "\n",
    "<div class=\"row\">\n",
    "<img src=\"img/dog-kandinsky.png\" style=\"width: 20%; float: left; margin-top: 1em;\" />\n",
    "<img src=\"img/dog-vangogh.png\" style=\"width: 20%; float: left;\" />\n",
    "<img src=\"img/dog-munch.png\" style=\"width: 20%; float: left;\" />\n",
    "<img src=\"img/dog-mondrian.png\" style=\"width: 20%; float: left;\" />\n",
    "</div>\n",
    "\n",
    "<div class=\"row\">\n",
    "<span style=\"width: 20%; float: left; text-align: center;\">Kandinsky</span>\n",
    "<span style=\"width: 20%; float: left; text-align: center;\">Van Gogh</span>\n",
    "<span style=\"width: 20%; float: left; text-align: center;\">Munch</span>\n",
    "<span style=\"width: 20%; float: left; text-align: center;\">Mondrian</span>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"http://etc.ch/aA8N\"><img src=\"http://etc.ch/aA8N/?qr\" style=\"width: 30%; float: left;\"></a>\n",
    "\n",
    "# Before we go...\n",
    "\n",
    "## A last question\n",
    "\n",
    "<a href=\"https://directpoll.com/r?XDbzPBdEt8j1rJnVSCHYrxgAN2vGi3SL1RsJkHj\">Next question</a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<iframe src=\"https://directpoll.com/r?XDbzPBd3ixYqg8cKH16NgmVzCrk5XsHAqGh8Z6Y\" width=\"900\" height=\"400\"></iframe>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%%html\n",
    "<iframe src=\"https://directpoll.com/r?XDbzPBd3ixYqg8cKH16NgmVzCrk5XsHAqGh8Z6Y\" width=\"900\" height=\"400\"></iframe>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  },
  "latex_envs": {
   "LaTeX_envs_menu_present": true,
   "autoclose": true,
   "autocomplete": true,
   "bibliofile": "biblio.bib",
   "cite_by": "apalike",
   "current_citInitial": 1,
   "eqLabelWithNumbers": true,
   "eqNumInitial": 1,
   "hotkeys": {
    "equation": "Ctrl-E",
    "itemize": "Ctrl-I"
   },
   "labels_anchors": false,
   "latex_user_defs": false,
   "report_style_numbering": true,
   "user_envs_cfg": false
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": false,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}