A look at the world wine market using Python, Pandas, and Seaborn

This article looks at current wine prices by region and appellation, as listed in the Wine.com website catalog. We will use Python libraries such as Pandas and Seaborn.

Exploring geographical data with SparkR and ggplot2

This analysis uses SparkR’s ability to handle large datasets to explore the 2013 American Community Survey dataset, specifically its geographical features. We aggregate the data using the tools introduced in the SparkR documentation and in our series of notebooks, and then use ggplot2’s mapping capabilities to put those aggregations into a geographical context.

Linear Models with SparkR 1.5: uses and present limitations

In this analysis we use SparkR’s machine learning capabilities to try to predict property value from other variables in the 2013 American Community Survey dataset. You can also check the associated Jupyter notebook. Along the way we show the current limitations of SparkR’s MLlib, and also those of linear models as a predictive method, no matter how much data we have.

A visual on tuberculosis evolution using Python and Bokeh

In this second look at the world situation of infectious tuberculosis from 1990 to 2007, we want to show how a simple visual representation of tabular data, in this case a Bokeh heatmap, can reveal a lot of information that, although already present in the table, might be harder to perceive there.
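
As a rough illustration of the idea, the sketch below reshapes a small long-format table into the country-by-year grid that a heatmap would color cell by cell. It is only a sketch under stated assumptions: the column names (country, year, cases_per_100k) are hypothetical, the values are made-up placeholders rather than real tuberculosis figures, and Pandas stands in for the data preparation step only (the Bokeh plotting itself is not shown).

```python
import pandas as pd

# Made-up placeholder values, NOT real tuberculosis figures.
# Column names are hypothetical and chosen only for illustration.
records = [
    {"country": "A", "year": 1990, "cases_per_100k": 50},
    {"country": "A", "year": 1991, "cases_per_100k": 45},
    {"country": "B", "year": 1990, "cases_per_100k": 120},
    {"country": "B", "year": 1991, "cases_per_100k": 130},
]
long_df = pd.DataFrame(records)

# One row per country, one column per year: the grid a heatmap colors.
grid = long_df.pivot(index="country", columns="year", values="cases_per_100k")

# Year-over-year change per country, the kind of trend that is easy to
# spot as a color gradient but harder to read from raw numbers.
change = grid.diff(axis=1)

print(grid)
print(change)
```

With a grid like this, each cell maps to one colored rectangle, so a rising or falling trend shows up as a gradient along a row instead of a sequence of numbers to compare by eye.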