This content originally appeared on DEV Community 👩‍💻👨‍💻 and was authored by Lucas M. Ríos

💠 A simple way to start a Dataproc cluster on GCP.

🔗Related content

You can find the related repo on:

🐱‍🏍GitHub

You can connect with me on:

🧬LinkedIn

Summary 🧾

Everything in this post is done from the shell.

We will start a cluster in Dataproc. To do that, we first need the Google Cloud SDK installed and a bucket created in Cloud Storage.

For more information about this and what it can do, click here.

Pre-conditions:

You need the SDK installed. You can do that by following these steps. Note: I set an alias, msdk, to run the SDK.
You need a bucket already created. You can do that by following these steps. Note: I created a bucket called xlmriosx_bucket.
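
For the second pre-condition, a bucket can also be created from the shell. This is only a sketch: it assumes `gsutil` from the SDK is available on your PATH (rather than behind the msdk alias), and `ensure_bucket` is a hypothetical helper name, not something from the original repo.

```shell
# Hypothetical helper: creates the bucket only if it does not exist yet.
# Usage: ensure_bucket BUCKET_NAME LOCATION
ensure_bucket() {
    # "gsutil ls -b" lists the bucket itself and fails if it cannot be read.
    if gsutil ls -b "gs://$1" >/dev/null 2>&1; then
        echo "Bucket gs://$1 already exists"
    else
        # "gsutil mb -l" makes a bucket in the given location.
        gsutil mb -l "$2" "gs://$1"
    fi
}
```

It could be called as `ensure_bucket "xlmriosx_bucket" "us-central1"`. Note that the existence check also fails when you lack permission to read the bucket, so treat a failed creation as a hint to check access.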

1st - Set a name for the cluster ✍️

We assign the name the cluster will have to a variable.

I use the following command:

CLUSTER_NAME="xlmriosx_cluster"
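
Before creating anything, it can help to check the name locally. As far as I know, Dataproc only accepts cluster names that start with a lowercase letter, contain only lowercase letters, digits and hyphens (up to 52 characters in total), and do not end with a hyphen, so a name containing an underscore may be rejected. The helper below is a minimal sketch of that check; `is_valid_cluster_name` is a hypothetical name, not part of the SDK.

```shell
# Hypothetical helper: returns 0 when the name matches the pattern
# described above (an assumption about Dataproc's naming rule).
is_valid_cluster_name() {
    # Total length must be between 1 and 52 characters.
    [ "${#1}" -ge 1 ] && [ "${#1}" -le 52 ] || return 1
    # Lowercase letter first; then lowercase letters, digits or hyphens;
    # the last character must not be a hyphen.
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z]([a-z0-9-]*[a-z0-9])?$'
}

# Example: a hyphenated variant of the name passes the check.
if is_valid_cluster_name "xlmriosx-cluster"; then
    echo "Name looks valid"
else
    echo "Name would probably be rejected" >&2
fi
```

If the check fails for your name, replacing underscores with hyphens is usually enough.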

2nd - Create cluster 💠

With the SDK installed and aliased as msdk, I use the following command to create a cluster:

With alias:

msdk gcloud beta dataproc clusters create $CLUSTER_NAME \
    --optional-components=ANACONDA,JUPYTER \
    --image-version=$IMAGE_VERSION \
    --enable-component-gateway \
    --region=$REGION \
    --num-workers=$NUM_WORKERS \
    --master-machine-type=$MASTER_MACHINE_TYPE \
    --worker-machine-type=$WORKER_MACHINE_TYPE \
    --bucket=$BUCKET_NAME \
    --tags=$TAGS

Where:
$CLUSTER_NAME -> Name your cluster will have. In this case, xlmriosx_cluster.

$IMAGE_VERSION -> Version of the image that we will use. Ex.: 1.3

$REGION -> Region where you want your cluster to run. Ex.: us-central1

$NUM_WORKERS -> Number of workers (nodes) that will process data. Ex.: 2

$MASTER_MACHINE_TYPE -> Machine type that the master node will use. Ex.: n1-standard-1

$WORKER_MACHINE_TYPE -> Machine type that the worker nodes will use. Ex.: n1-standard-1

$BUCKET_NAME -> Name of the bucket that the cluster will use as source and destination. Ex.: xlmriosx_bucket

$TAGS -> Tag that identifies the process the cluster was created for. Ex.: datascience
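
The create command expects every one of these variables to be set beforehand. A minimal sketch using the example values from the list above follows; `require_vars` is a hypothetical helper added for illustration, and `$CLUSTER_NAME` is assumed to be set already in the first step.

```shell
# Example values taken from the list above; adjust them to your project.
IMAGE_VERSION="1.3"
REGION="us-central1"
NUM_WORKERS=2
MASTER_MACHINE_TYPE="n1-standard-1"
WORKER_MACHINE_TYPE="n1-standard-1"
BUCKET_NAME="xlmriosx_bucket"
TAGS="datascience"

# Hypothetical helper: fails with a message if any named variable is empty.
require_vars() {
    for var_name in "$@"; do
        # Look up the variable whose name is stored in var_name.
        eval "var_value=\${$var_name:-}"
        if [ -z "$var_value" ]; then
            echo "Missing variable: $var_name" >&2
            return 1
        fi
    done
}

require_vars IMAGE_VERSION REGION NUM_WORKERS MASTER_MACHINE_TYPE \
    WORKER_MACHINE_TYPE BUCKET_NAME TAGS && echo "All variables are set"
```

Running the check before the create command avoids sending flags with empty values to the SDK.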

3rd - Verify that your cluster was created ✅

We will list the clusters to verify that it was created:

gcloud beta dataproc clusters list --region=$REGION

OUTPUT:
| NAME | WORKER_COUNT | PREEMPTIBLE_WORKER_COUNT | STATUS | ZONE | SCHEDULED_DELETE |
| --- | --- | --- | --- | --- | --- |
| xlmriosx_cluster | 3 | - | RUNNING | us-central1-c | - |
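
If you only need the state of one cluster rather than the whole list, `gcloud dataproc clusters describe` with a `--format` projection can print it. The sketch below wraps that call in two hypothetical helpers, `cluster_state` and `is_cluster_running`. The field path `status.state` is how I understand the cluster resource to be laid out, so treat it as an assumption and confirm it with a plain `describe` first.

```shell
# Hypothetical helper: prints the state of a cluster (for example RUNNING).
# Usage: cluster_state CLUSTER_NAME REGION
cluster_state() {
    gcloud dataproc clusters describe "$1" \
        --region="$2" \
        --format="value(status.state)"
}

# Hypothetical helper: succeeds only when the cluster reports RUNNING.
is_cluster_running() {
    [ "$(cluster_state "$1" "$2")" = "RUNNING" ]
}
```

For example: `is_cluster_running "$CLUSTER_NAME" "$REGION" && echo "Cluster is up"`.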

4th - Open the Jupyter notebook in your local browser 🕸

See "Viewing and Accessing Component Gateway URLs" to learn how to use the Component Gateway links in the Cloud Console. Clicking them opens the Jupyter notebook and JupyterLab UIs, which run on the cluster's master node, in your local browser.
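
The same links can probably be read from the shell as well. The sketch below assumes the cluster resource exposes them under `config.endpointConfig.httpPorts`, which is my understanding of the Dataproc API rather than something stated in this post. `component_gateway_urls` and `jupyter_urls` are hypothetical helper names.

```shell
# Hypothetical helper: prints the Component Gateway entries of a cluster.
# Usage: component_gateway_urls CLUSTER_NAME REGION
component_gateway_urls() {
    gcloud dataproc clusters describe "$1" \
        --region="$2" \
        --format="yaml(config.endpointConfig.httpPorts)"
}

# Hypothetical helper: keeps only the lines that mention Jupyter.
jupyter_urls() {
    component_gateway_urls "$1" "$2" | grep -i "jupyter"
}
```

For example: `jupyter_urls "$CLUSTER_NAME" "$REGION"`. If nothing is printed, check that the cluster was created with `--enable-component-gateway` and the JUPYTER component.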

5th - Say thanks, leave a like, and share if this was helpful or interesting 😁🖖

APA

Lucas M. Ríos | Sciencx (2023-02-16T20:07:16+00:00) How create cluster in Dataproc and install Jupyter?. Retrieved from https://www.scien.cx/2023/02/16/how-create-cluster-in-dataproc-and-install-jupyter/