Data Insight blog

Jun 15, 2015

Spark & Python Notebooks V: Decision Trees & Model Selection

The fourth episode in our Spark series introduced Logistic Regression with MLlib. This new notebook explains how to use the library to build a classifier using Decision Trees on a large dataset. It also shows how powerful trees are in order to understand our data and even perform model selection.

Jun 8, 2015

Spark & Python Notebooks IV: Logistic Regression & Model Selection

The third episode in our Spark series introduced the MLlib library and its Statistics and Exploratory Data Analysis capabilities. This fourth notebook explains how to use the library to build a classifier using Logistic Regression on a large dataset. It also describes two different approaches to model selection.

Oct 4, 2014

Ridge regression model selection with R

If recently we used best subset as a way of reducing the unnecessary model complexity, this time we are going to use the Ridge regression technique.

Sep 27, 2014

Best subset model selection with R

Linear regression models are easy to fit and interpret. Moreover, they are suprisingly accurate in many real world situations where the relationship between the response and the predictors is approximately linear. However, it is often the case that not all the variables used in a multiple regression model are in associated with the response.