
Checking linearity with scatter plots
A basic way to check the linearity assumption is to make a scatter plot with the dependent variable on the y axis and an independent variable on the x axis. If the relation appears to be linear, the assumption is supported for that variable. In any interesting problem it's very hard to find a scatter plot that shows a perfectly clear linear relation, and if one does appear, we should be a little suspicious and check the data carefully. To avoid reinventing the wheel, we will use the plot_scatterplot() function we created in Chapter 2, Understanding Votes with Descriptive Statistics:
plot_scatterplot(
    data = data,
    var_x = "Age_18to44",
    var_y = "Proportion",
    var_color = FALSE,
    regression = TRUE
)
plot_scatterplot(
    data = data,
    var_x = "Students",
    var_y = "Proportion",
    var_color = FALSE,
    regression = TRUE
)
As we can see, the scatter plot on the left shows a clear linear relation: as the percentage of people between 18 and 44 years of age (Age_18to44) increases, the proportion of people in favor of leaving the EU (Proportion) decreases. On the right, the relation between the percentage of students in a ward (Students) and Proportion is clearly linear in the initial region (where Students is between 0 and 20). Beyond that the relation still seems to be linear, but it is muddied by a few observations with a very high percentage of students. Even so, we can still assume a linear relation between Students and Proportion.
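One way to put a number on the "polluted by high-Students observations" remark is to compare the least-squares slope fitted on all wards with the slope fitted only on wards below a cutoff. The sketch below is an illustration, not code from the book: `compare_slopes()` is a hypothetical helper, the cutoff of 20 is taken from the text, and the `demo` data frame is synthetic data that is linear by construction, so the two slopes should be close. On the real `data`, a large gap between them would suggest the extreme wards are pulling the fitted line.

```r
# Hypothetical helper (not from the book): compare the slope of var_y on var_x
# fitted on all rows with the slope fitted only on rows where var_x <= cutoff.
compare_slopes <- function(data, var_x, var_y, cutoff) {
  formula <- reformulate(var_x, response = var_y)  # builds var_y ~ var_x
  full_fit <- lm(formula, data = data)
  low_fit <- lm(formula, data = data[data[[var_x]] <= cutoff, ])
  c(
    full = unname(coef(full_fit)[var_x]),
    below_cutoff = unname(coef(low_fit)[var_x])
  )
}

# Synthetic, made-up data for illustration only: a linear relation plus noise,
# with a right-skewed Students column so a few rows exceed the cutoff.
set.seed(1)
n <- 300
demo <- data.frame(Students = rexp(n, rate = 1 / 8))
demo$Proportion <- 0.65 - 0.006 * demo$Students + rnorm(n, sd = 0.04)

slopes <- compare_slopes(demo, "Students", "Proportion", cutoff = 20)
print(round(slopes, 4))
```

On the chapter's data the call would be `compare_slopes(data, "Students", "Proportion", cutoff = 20)`.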
Since we're fitting a Multiple Linear Regression, the assumption should be checked for the rest of the independent variables as well. We omit those plots here to save space, but we encourage you to make them. Keep in mind that it's very hard to find a clearly linear relation for all of them; how well a variable satisfies this assumption is mostly an indication of its predictive power in the regression. As long as each relation appears to be at least roughly linear, we should be all set.
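Checking the remaining variables one by one can be automated with a small loop. The following is a minimal sketch using only base R graphics, not the book's `plot_scatterplot()`: the helper name `check_linearity()` is hypothetical, it draws one scatter plot per predictor with an overlaid least-squares line, and it returns the Pearson correlation of each predictor with the response as a rough numeric summary of linear association (a correlation near zero does not rule out a non-linear relation, so the plots remain the main check). The `demo` data frame is synthetic, made up purely so the example runs on its own.

```r
# Hypothetical helper (not from the book): scatter plot of var_y against each
# variable in vars_x, with a least-squares line, laid out in a grid.
check_linearity <- function(data, var_y, vars_x) {
  n_cols <- ceiling(sqrt(length(vars_x)))
  n_rows <- ceiling(length(vars_x) / n_cols)
  old_par <- par(mfrow = c(n_rows, n_cols))
  on.exit(par(old_par), add = TRUE)  # restore the previous layout afterwards
  for (var_x in vars_x) {
    plot(data[[var_x]], data[[var_y]], xlab = var_x, ylab = var_y, pch = 20)
    abline(lm(data[[var_y]] ~ data[[var_x]]), col = "red")
  }
  # Pearson correlation of each predictor with the response, named by predictor
  sapply(vars_x, function(v) cor(data[[v]], data[[var_y]], use = "complete.obs"))
}

# Synthetic, made-up data for illustration only.
set.seed(42)
n <- 200
demo <- data.frame(
  Age_18to44 = runif(n, 20, 60),
  Students = rexp(n, rate = 1 / 8)
)
demo$Proportion <- 0.9 - 0.008 * demo$Age_18to44 -
  0.004 * demo$Students + rnorm(n, sd = 0.03)

cors <- check_linearity(demo, "Proportion", c("Age_18to44", "Students"))
print(round(cors, 2))
```

With the chapter's data you would pass `data` and a character vector holding the names of all the independent variables in your model.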