1. Researchers continuously try to understand why people do or do not vote in elections. Using the
dataset, examine the relationship between the dependent variable percent of the population voting
and two independent variables of your choice, representing education, poverty, unemployment,
crime, or income. Use interval data.
a. Write a hypothesis about the relationship between the dependent variable and each independent
variable. Remember to indicate an anticipated direction.
I will use % of population voting (C27) as the dependent variable and SAT grand total (C21) as the
independent variable. My hypotheses are:
H1: As SAT scores increase, the % of population voting will increase.
H0: There is no relationship between SAT scores and % of population voting.
b. Create a scatterplot for each relationship. Do the data fit a linear model? Explain why or why
not.
To create the scatterplot, choose Graph, Scatterplot, Simple, then select the y variable (C27 % of
population voting) and the x variable (C21 SAT grand total).
Here is the scatterplot:
The scatterplot suggests that % of population voting tends to increase with SAT score. The
relationship appears to be roughly linear.
c. Obtain the following statistics for each relationship: constant, slope or regression coefficient, r,
standard error of the slope
To find the constant, slope, etc., in Minitab Express, we use Statistics, Regression, Simple Regression.
Enter the response variable (C27 % of population voting) and the continuous predictor (C21 SAT
grand total). You don’t need to use the Options or the Graph tab.
Here is the output:
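The correlation coefficient is the square root of R-squared (R²) from the regression output:
r = √0.0784 = 0.28. The sign of r matches the sign of the slope.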
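For readers who want to check these statistics outside Minitab, here is a minimal Python sketch of the standard least-squares formulas for the constant, slope, r, and standard error of the slope. The function name `simple_regression` and the data values are hypothetical illustrations; they are not the course dataset and not Minitab's own routine. The sketch also checks that r equals the square root of R-squared, carrying the sign of the slope.

```python
import math


def simple_regression(x, y):
    """Return (constant, slope, r, se_slope) for the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    constant = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    # Residual sum of squares; the residual variance uses n - 2 degrees of freedom.
    sse = sum((yi - (constant + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return constant, slope, r, se_slope


# Hypothetical values for illustration only (NOT the course dataset).
sat = [950, 1000, 1020, 1050, 1100, 1150]
voting = [48.0, 55.0, 50.0, 58.0, 54.0, 60.0]

constant, slope, r, se_slope = simple_regression(sat, voting)
r_squared = r ** 2
# Recover r from R-squared: take the square root and attach the sign of the slope.
r_from_r2 = math.copysign(math.sqrt(r_squared), slope)

print("constant:", constant)
print("slope:", slope)
print("r:", r, "R-squared:", r_squared)
print("standard error of the slope:", se_slope)
```

The same square-root step applied to the Minitab value is `math.sqrt(0.0784)`, which gives the 0.28 reported above.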
We can also calculate the correlation coefficient directly using Statistics, Correlation,
Correlation. Select the two variables and click OK.
Here is the output:
d. Write the regression equation for each relationship.
The regression equation appears at the bottom of the regression output in Minitab (see part c).
e. How well is each relationship described by the linear regression model? Cite evidence to support
your observation.
Use the regression output to answer this question.
f. For each relationship decide whether or not it supports the hypotheses you wrote.
Use the regression output and the scatterplot to answer this question.
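One common way to make this decision is to divide the slope by its standard error (both from part c) to get a t statistic, then check that the slope has the hypothesized sign and that t exceeds a critical value from a t table. The sketch below assumes this approach; the function names, the slope and standard error values, and the critical value are all hypothetical placeholders, so substitute the numbers from your own regression output.

```python
def slope_t_statistic(slope, se_slope):
    """t statistic for testing whether the slope differs from zero."""
    return slope / se_slope


def supports_hypothesis(slope, se_slope, critical_t, expected_sign=+1):
    """One-sided check for a directional hypothesis.

    Returns True only when the slope has the hypothesized sign
    (+1 for increasing, -1 for decreasing) and the t statistic
    exceeds the critical value in that direction.
    """
    t = slope_t_statistic(slope, se_slope)
    return t * expected_sign > critical_t


# Hypothetical numbers for illustration only (NOT from the course output).
example_slope = 0.05
example_se = 0.02
# Assumed critical value: one-sided 5% level with roughly 48 degrees of freedom.
example_critical_t = 1.68

print("t =", slope_t_statistic(example_slope, example_se))
print("supports H1:", supports_hypothesis(example_slope, example_se, example_critical_t))
```

Because H1 states a direction, a slope with the wrong sign fails the check no matter how large its t statistic is.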
2. Use the variables in the dataset to create and test a multiple regression model of your own
design.
a. Write a verbal model that includes four to five independent variables. Your write-up should
suggest the value of studying the relationships, and provide evidence supporting the
hypothesized direction.
My model will use per pupil expenditures (C19) as the response variable and public school final
enrollment (C16), public school faculty (C17), SAT grand total (C21) and mean household income
(C40) as the independent variables.
I predict that expenditures will have a direct (positive) relationship with all four independent variables.
b. Test your model and present your findings in a table or equation.
To test my model, I use Statistics, Regression, Multiple Regression. Enter the response variable [per
pupil expenditures (C19)] and the continuous predictors [public school final enrollment (C16), public
school faculty (C17), SAT grand total (C21) and mean household income (C40)]. Click on OK.
Here is the regression output:
c. Write a verbal summary of your findings. If your hypotheses were not supported, suggest
possible reasons for the lack of support.
All of the variables contribute to the model with the exception of SAT grand total and mean household
income (why?). R² = 26.6%, which tells me that about 27% of the variation in ____ (you fill in) is
explained by _____ (you fill in).
It is interesting to note that public school final enrollment has a negative coefficient, indicating an
inverse relationship with expenditures. Is this what we expected? What does this mean in the
context of this model?
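To see where the coefficients and R² in a multiple regression come from, the following Python sketch fits a model by solving the least-squares normal equations and then computes R² as the share of variation in the response explained by the predictors. This is an illustration under stated assumptions, not Minitab's algorithm: the function names `solve` and `multiple_regression` are hypothetical, the data are invented, and only two predictors are used to keep the example short.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def multiple_regression(rows, y):
    """Fit y on the predictors in rows; return (coefficients, r_squared).

    coefficients[0] is the constant; the rest follow the predictor order.
    """
    design = [[1.0] + list(r) for r in rows]  # prepend a column of ones
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, d)) for d in design]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total variation
    return coef, 1 - sse / sst


# Hypothetical data for illustration only (NOT the course dataset):
# each row is [enrollment in thousands, mean household income in thousands].
predictors = [[120, 45], [300, 52], [80, 48], [500, 60], [220, 50], [150, 55]]
spending = [6.1, 6.8, 6.5, 7.2, 6.4, 7.5]  # per pupil expenditures in thousands

coefficients, r_squared = multiple_regression(predictors, spending)
print("constant and coefficients:", coefficients)
print("R-squared:", r_squared)
```

The sign of each coefficient gives the direction of that predictor's relationship with the response while the other predictors are held fixed, which is the sense in which a negative coefficient indicates an inverse relationship.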