To understand the advantages cloud computing provides when it comes to data science, let’s imagine a world with as much data as we have today, but without servers. In such an unfortunate scenario, firms would need databases that run locally, right?
So, every time you, as a data scientist, wanted to run a new analysis or refresh an existing algorithm, you’d have to transfer the data from the central database to your machine and then work on it locally. This unfortunate world would have several main drawbacks…
For example, manual intervention would be necessary to retrieve data… Your machine would become a single point of failure for the analyses you have worked on locally… Processing speed would be capped by the computing power of your computer… Chances are you would be able to work with only a limited amount of data because of the limited computing resources at your disposal… Moreover, under this setup, you wouldn’t be able to leverage real-time data to build recommender systems or any other type of machine learning algorithm that requires ‘live’ data.
Doesn’t sound like the perfect scenario, does it?
Well, that’s why we invented servers. But these servers had drawbacks of their own. Fortunately, we now have clouds. They overshadow local servers in almost every conceivable aspect. After all, data scientists should be focused on developing great algorithms, testing hypotheses, and taking advantage of all available data, without having to wait hours to see the results of the tests they are performing, and certainly without having to worry about how much memory they have left on their computer.
And yes, sometimes data scientists still end up waiting hours for an algorithm to train, but in the cloud, they have the option to pay for more computing power and get the job done faster. That’s yet another advantage of cloud computing over local servers.