My recommendation is to go with OpenJDK 8. Note: you will have to perform this step on all machines involved.
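Since this step has to be repeated on every machine, a small check can confirm that each one actually ended up with Java 8. The following is a minimal sketch, not part of the original setup: the helper names `java_version_banner` and `is_java_8` are hypothetical, and it assumes that `java` is on the `PATH` and that Java 8 reports itself as version `1.8.x`.

```python
import shutil
import subprocess


def java_version_banner(executable="java"):
    """Return the first line of `java -version`, or None if Java is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else None


def is_java_8(banner):
    """Java 8 identifies itself as version "1.8.x" in its banner."""
    return banner is not None and '"1.8' in banner


if __name__ == "__main__":
    banner = java_version_banner()
    if banner is None:
        print("Java was not found on PATH")
    else:
        print(banner)
        print("Java 8 detected" if is_java_8(banner) else "This is not Java 8")
```

Running the script on each machine gives a quick way to spot a node where the JDK is missing or a different version was picked up.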
I have not seen Spark running on native Windows so far. For this tutorial I used a MacBook Air with Ubuntu 17.04 and my desktop system with Windows 10 running the Windows Subsystem for Linux (yeah!) with Ubuntu 16.04 LTS.
![Apache Spark download page](https://i0.wp.com/exitcondition.com/wp-content/uploads/2019/04/Apache-Spark-Download-Page.png)
Spark is a framework for running computations on large amounts of data. Why do you need something like Spark? Think, for example, about a small dataset that fits easily into memory, say a few GB at most. You will probably load the entire dataframe using Pandas, R or your tool of choice, and after some quick cleaning and visualization you will be almost done, with no major computing-performance hassles if you are using a proper computer (or cloud infrastructure). Now imagine that you have to process a 1 TB (or bigger) dataset and train an ML algorithm on it. Even with a powerful computer, that is crazy. Spark gives you two features you need to handle these data monsters:

Parallel computing: you use not one but many computers to speed up your calculations.

![install spark on windows with jupyter](https://miro.medium.com/max/1200/1*O372tkXzgxk2O5-5V3wXpg.jpeg)
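The idea behind parallel computing can be sketched on a single machine with nothing but the Python standard library. This is only an analogy and does not use Spark's API: the function names are hypothetical, and worker threads stand in for the separate computers of a cluster. It illustrates the pattern of splitting the data into partitions, computing a partial result for each, and combining the partials. It does not demonstrate a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(values, num_partitions):
    """Cut a list into roughly equal contiguous chunks."""
    # Ceiling division, with a floor of 1 so an empty list is handled safely.
    size = max(1, -(-len(values) // num_partitions))
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_sum_of_squares(partition):
    """Work done independently on one partition (one 'machine')."""
    return sum(x * x for x in partition)


def parallel_sum_of_squares(values, num_workers=4):
    """Split the data, process each partition in a worker, combine the results."""
    partitions = split_into_partitions(values, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, partitions))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))  # prints 285
```

In a real cluster each partition would live on a different machine, so no single computer ever has to hold the whole dataset in memory.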