A scalable on-line movie recommender using Spark and Flask

This Apache Spark tutorial guides you step by step through using the MovieLens dataset to build a movie recommendation web service, with collaborative filtering based on Spark's Alternating Least Squares (ALS) implementation and Python/Flask.
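As a rough illustration of what ALS computes, here is a small NumPy sketch. It is not Spark's implementation and not the tutorial's code. In Spark the equivalent step is typically `ALS.train` from `pyspark.mllib.recommendation`. The function name `als`, its parameters (`rank`, `reg`, `iterations`) and the toy ratings matrix are hypothetical. The sketch uses plain ridge regularization, which may differ from how Spark scales its regularization parameter. The idea is to factor the ratings matrix as `U @ V.T`. With `V` held fixed, each user's row of `U` has a closed-form least-squares solution over only the items that user rated. Then `U` is held fixed and `V` is solved the same way, alternating between the two.

```python
import numpy as np


def als(ratings, mask, rank=2, reg=0.1, iterations=20, seed=0):
    """Hypothetical sketch: factor ratings ~ U @ V.T using only observed entries (mask > 0)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    ridge = reg * np.eye(rank)  # keeps each small system well-conditioned
    for _ in range(iterations):
        # Hold item factors fixed; solve a regularized least-squares problem per user.
        for u in range(n_users):
            rated = mask[u] > 0
            Vu = V[rated]
            U[u] = np.linalg.solve(Vu.T @ Vu + ridge, Vu.T @ ratings[u, rated])
        # Hold user factors fixed; solve a regularized least-squares problem per item.
        for i in range(n_items):
            rated = mask[:, i] > 0
            Ui = U[rated]
            V[i] = np.linalg.solve(Ui.T @ Ui + ridge, Ui.T @ ratings[rated, i])
    return U, V


# Toy user x movie ratings (0 = not rated); made-up data for illustration only.
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)
mask = (ratings > 0).astype(float)

U, V = als(ratings, mask)
pred = U @ V.T  # predicted scores, including for unrated (user, movie) pairs
```

The unrated cells of `pred` are the scores a recommender would rank to suggest new movies to each user.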

Data Science Engineering, your way

Today we made public a series of tutorials on Data Science Engineering. In them we compare how different concepts in the discipline can be implemented in today's two dominant ecosystems: R and Python.

Spark & Python Notebooks VI: SQL & Dataframes

The fifth episode in our Spark series introduced Decision Trees with MLlib. This new notebook moves away from MLlib for a while to introduce SparkSQL and the concept of a DataFrame, which will speed up our analysis and make it easier to communicate.

Spark & Python Notebooks V: Decision Trees & Model Selection

The fourth episode in our Spark series introduced Logistic Regression with MLlib. This new notebook explains how to use the library to build a Decision Tree classifier on a large dataset. It also shows how useful trees are for understanding our data and even for performing model selection.