Figure 1: Scatterplot Matrix of Data


Figure 2: Histogram of Median Income


Figure 3: Histogram of the Proportion of Residents with a Bachelor’s Degree


Figure 4: Histogram of Unemployment Rate


Figure 5: Histogram of Average Household Size


Figure 6: Boxplot of Region


Figure 7: Initial Model Summary

## 
## Call:
## lm(formula = medianIncome ~ pctBach + pctUnemployed + avgHouse + 
##     region, data = income)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.902  -3.997  -0.219   3.692  25.481 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      3.33032    3.12608   1.065   0.2872    
## pctBach          1.19672    0.06105  19.603  < 2e-16 ***
## pctUnemployed   -1.07963    0.10388 -10.394  < 2e-16 ***
## avgHouse        14.65802    1.28533  11.404  < 2e-16 ***
## regionNortheast  6.48713    1.08507   5.979 4.00e-09 ***
## regionSoutheast -1.90928    0.76426  -2.498   0.0128 *  
## regionSouthwest -4.08575    1.01331  -4.032 6.29e-05 ***
## regionWest      -2.00312    1.03756  -1.931   0.0540 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.74 on 562 degrees of freedom
## Multiple R-squared:  0.6583, Adjusted R-squared:  0.6541 
## F-statistic: 154.7 on 7 and 562 DF,  p-value: < 2.2e-16
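
As a quick consistency check on Figure 7, the printed values alone are enough to recover three quantities. This is a worked sketch using only those values, with $p = 7$ counting the non-intercept coefficients (three numeric predictors and four region indicators). The residual degrees of freedom identify the number of observations used in the fit, and the adjusted $R^2$ and $F$-statistic follow from $R^2$:

$$
\begin{aligned}
n - p - 1 &= 562 \;\Rightarrow\; n = 570,\\
\bar{R}^2 &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} = 1 - 0.3417 \cdot \frac{569}{562} \approx 0.6540,\\
F &= \frac{R^2 / p}{(1 - R^2)/(n - p - 1)} = \frac{0.6583/7}{0.3417/562} \approx 154.7.
\end{aligned}
$$

The small gap from the printed adjusted $R^2$ of 0.6541 comes from $R^2$ being rounded to four digits. Every later model in this appendix also satisfies $n = 570$ (for example, $557 + 12 + 1$ in Figure 12), so the fits appear to use the same observations.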

Figure 8: Distribution of Population Variable Before Log Transformation


Figure 9: Distribution of Population Variable After Logarithmic Transformation


Figure 10: Scatterplot of Poverty Rate and Median Income


Figure 11: Scatterplot of Median Income and log(population) by Region


Figure 12: Summary Output with Interaction Term between log(population) and Region

## 
## Call:
## lm(formula = medianIncome ~ pctBach + pctUnemployed + avgHouse + 
##     log(population) * region, data = income)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.211  -3.923   0.005   3.659  24.364 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      -1.29143    4.29127  -0.301  0.76357    
## pctBach                           1.05553    0.07264  14.532  < 2e-16 ***
## pctUnemployed                    -1.18570    0.10556 -11.233  < 2e-16 ***
## avgHouse                         13.21591    1.32590   9.968  < 2e-16 ***
## log(population)                   1.07142    0.39107   2.740  0.00635 ** 
## regionNortheast                 -34.13597   10.12528  -3.371  0.00080 ***
## regionSoutheast                   4.35310    6.22527   0.699  0.48468    
## regionSouthwest                   6.13410    6.75428   0.908  0.36418    
## regionWest                        9.05138    5.92776   1.527  0.12734    
## log(population):regionNortheast   3.34152    0.87525   3.818  0.00015 ***
## log(population):regionSoutheast  -0.62392    0.59803  -1.043  0.29726    
## log(population):regionSouthwest  -0.98777    0.65297  -1.513  0.13091    
## log(population):regionWest       -1.02977    0.57415  -1.794  0.07343 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.577 on 557 degrees of freedom
## Multiple R-squared:  0.6775, Adjusted R-squared:  0.6705 
## F-statistic: 97.51 on 12 and 557 DF,  p-value: < 2.2e-16
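
Because Figure 12 interacts log(population) with region, the slope on log(population) differs by region. For region $r$, the slope is the main-effect coefficient plus $\hat\gamma_r$, that region's interaction coefficient. The reference region takes the main effect alone; it is the baseline factor level, which the output does not name. A worked sketch from the printed estimates:

$$
\frac{\partial\,\widehat{\text{medianIncome}}}{\partial \log(\text{population})} = \hat\beta_{\log(\text{pop})} + \hat\gamma_r =
\begin{cases}
1.07142 & \text{reference region}\\
1.07142 + 3.34152 = 4.41294 & \text{Northeast}\\
1.07142 - 0.62392 = 0.44750 & \text{Southeast}\\
1.07142 - 0.98777 = 0.08365 & \text{Southwest}\\
1.07142 - 1.02977 = 0.04165 & \text{West}
\end{cases}
$$

Only the Northeast interaction is significant at the 5% level. Its large negative main effect ($-34.13597$) is offset by the steeper slope. Holding the other predictors fixed, the Northeast-minus-reference gap $-34.13597 + 3.34152\,\log(\text{population})$ is zero at $\log(\text{population}) \approx 10.22$ (R's log() is the natural logarithm). The Northeast is therefore predicted above the reference region only for larger populations.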

Figure 13: Scatterplot Matrix of Strong Linear Relationships (pctHS, pctEmployed, pctMarried, and pctMarriedHouse)


Figure 14: Scatterplot Matrix of Strong Linear Relationships (pctPrivateHC, pctPublicHC, and pctEmployerHC)


Figure 15: Scatterplot of Median Income vs. Population with a Logarithmic Transformation


Figure 16: Model Output for the Initial Model with pctHS as an Additional Predictor

## 
## Call:
## lm(formula = medianIncome ~ pctBach + pctUnemployed + avgHouse + 
##     pctHS + region, data = income)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.725  -3.981  -0.219   3.608  25.313 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -0.60444    5.37226  -0.113 0.910458    
## pctBach          1.26982    0.10156  12.503  < 2e-16 ***
## pctUnemployed   -1.05365    0.10782  -9.772  < 2e-16 ***
## avgHouse        14.78915    1.29377  11.431  < 2e-16 ***
## pctHS            0.06704    0.07443   0.901 0.368149    
## regionNortheast  6.33726    1.09793   5.772  1.3e-08 ***
## regionSoutheast -1.81973    0.77082  -2.361 0.018579 *  
## regionSouthwest -3.83806    1.05013  -3.655 0.000282 ***
## regionWest      -1.74486    1.07663  -1.621 0.105650    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.741 on 561 degrees of freedom
## Multiple R-squared:  0.6588, Adjusted R-squared:  0.6539 
## F-statistic: 135.4 on 8 and 561 DF,  p-value: < 2.2e-16

Figure 17: Model Output for the Initial Model with pctPublicHC as an Additional Predictor

## 
## Call:
## lm(formula = medianIncome ~ pctBach + pctUnemployed + avgHouse + 
##     pctPublicHC + region, data = income)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.139  -3.554   0.172   3.277  23.704 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     44.46475    4.84029   9.186  < 2e-16 ***
## pctBach          0.73993    0.07070  10.466  < 2e-16 ***
## pctUnemployed   -0.44499    0.11250  -3.956 8.61e-05 ***
## avgHouse         7.34271    1.36538   5.378 1.11e-07 ***
## pctPublicHC     -0.60738    0.05767 -10.532  < 2e-16 ***
## regionNortheast  7.02737    0.99368   7.072 4.56e-12 ***
## regionSoutheast -1.72441    0.69917  -2.466  0.01395 *  
## regionSouthwest -2.90111    0.93353  -3.108  0.00198 ** 
## regionWest      -0.27622    0.96297  -0.287  0.77434    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.164 on 561 degrees of freedom
## Multiple R-squared:  0.7147, Adjusted R-squared:  0.7107 
## F-statistic: 175.7 on 8 and 561 DF,  p-value: < 2.2e-16

Figure 18: Model Output for the Initial Model with pctPrivateHC as an Additional Predictor

## 
## Call:
## lm(formula = medianIncome ~ pctBach + pctUnemployed + avgHouse + 
##     pctPrivateHC + region, data = income)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -40.184  -2.946  -0.205   2.719  24.931 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -34.56687    3.77524  -9.156  < 2e-16 ***
## pctBach           0.71386    0.06235  11.449  < 2e-16 ***
## pctUnemployed    -0.28900    0.10493  -2.754  0.00607 ** 
## avgHouse         14.18280    1.10272  12.862  < 2e-16 ***
## pctPrivateHC      0.57884    0.04060  14.257  < 2e-16 ***
## regionNortheast   5.97321    0.93118   6.415    3e-10 ***
## regionSoutheast   0.62774    0.67911   0.924  0.35569    
## regionSouthwest   1.67395    0.95827   1.747  0.08121 .  
## regionWest        2.08065    0.93472   2.226  0.02641 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.779 on 561 degrees of freedom
## Multiple R-squared:  0.7492, Adjusted R-squared:  0.7456 
## F-statistic: 209.5 on 8 and 561 DF,  p-value: < 2.2e-16

Figure 19: Model Output for the Initial Model with pctEmployerHC as an Additional Predictor

## 
## Call:
## lm(formula = medianIncome ~ pctBach + pctUnemployed + avgHouse + 
##     pctEmployerHC + region, data = income)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -30.8301  -3.3400  -0.1718   2.9261  23.9453 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -5.25492    2.75695  -1.906   0.0572 .  
## pctBach          0.82407    0.05879  14.017  < 2e-16 ***
## pctUnemployed   -0.50938    0.09808  -5.194 2.89e-07 ***
## avgHouse         9.23565    1.17057   7.890 1.59e-14 ***
## pctEmployerHC    0.52071    0.03694  14.094  < 2e-16 ***
## regionNortheast  4.91738    0.93992   5.232 2.38e-07 ***
## regionSoutheast -0.60407    0.66385  -0.910   0.3632    
## regionSouthwest  0.03900    0.91939   0.042   0.9662    
## regionWest       1.77718    0.93187   1.907   0.0570 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.797 on 561 degrees of freedom
## Multiple R-squared:  0.7477, Adjusted R-squared:  0.7441 
## F-statistic: 207.8 on 8 and 561 DF,  p-value: < 2.2e-16

Figure 20: Model Output for the Initial Model with pctMarried as an Additional Predictor

## 
## Call:
## lm(formula = medianIncome ~ pctBach + pctUnemployed + avgHouse + 
##     pctMarried + region, data = income)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -37.074  -3.936  -0.210   3.247  21.820 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -28.04716    4.28148  -6.551 1.30e-10 ***
## pctBach           1.28153    0.05700  22.482  < 2e-16 ***
## pctUnemployed    -0.56114    0.10921  -5.138 3.84e-07 ***
## avgHouse         14.85696    1.18676  12.519  < 2e-16 ***
## pctMarried        0.48622    0.04901   9.921  < 2e-16 ***
## regionNortheast   7.52896    1.00720   7.475 2.98e-13 ***
## regionSoutheast  -1.05613    0.71076  -1.486  0.13786    
## regionSouthwest  -2.92412    0.94276  -3.102  0.00202 ** 
## regionWest       -2.49602    0.95914  -2.602  0.00950 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.222 on 561 degrees of freedom
## Multiple R-squared:  0.7093, Adjusted R-squared:  0.7052 
## F-statistic: 171.1 on 8 and 561 DF,  p-value: < 2.2e-16

Figure 21: Model Output for the Initial Model with pctPoverty and Its Square as Additional Predictors

## 
## Call:
## lm(formula = medianIncome ~ pctBach + pctUnemployed + avgHouse + 
##     pctPoverty + I(pctPoverty^2) + region, data = income)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.7751  -2.4285  -0.1139   2.2769  18.9909 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     44.952435   2.612758  17.205  < 2e-16 ***
## pctBach          0.701727   0.040951  17.136  < 2e-16 ***
## pctUnemployed    0.117395   0.075872   1.547   0.1224    
## avgHouse        10.184353   0.807770  12.608  < 2e-16 ***
## pctPoverty      -2.845072   0.135196 -21.044  < 2e-16 ***
## I(pctPoverty^2)  0.039348   0.003021  13.025  < 2e-16 ***
## regionNortheast  4.289551   0.669844   6.404 3.21e-10 ***
## regionSoutheast  1.926483   0.485602   3.967 8.22e-05 ***
## regionSouthwest  1.144354   0.647474   1.767   0.0777 .  
## regionWest      -0.119917   0.641727  -0.187   0.8518    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.136 on 560 degrees of freedom
## Multiple R-squared:  0.8718, Adjusted R-squared:  0.8697 
## F-statistic:   423 on 9 and 560 DF,  p-value: < 2.2e-16
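
With both pctPoverty and its square in the model (Figure 21), the effect of poverty on predicted median income depends on the poverty level. Differentiating the fitted quadratic gives a worked sketch of that marginal effect and its turning point. Here $\hat\beta_1$ and $\hat\beta_2$ denote the pctPoverty and I(pctPoverty^2) estimates:

$$
\begin{aligned}
\frac{\partial\,\widehat{\text{medianIncome}}}{\partial\,\text{pctPoverty}} &= \hat\beta_1 + 2\hat\beta_2\,\text{pctPoverty} = -2.845072 + 0.078696\,\text{pctPoverty},\\
\text{pctPoverty}^{*} &= -\frac{\hat\beta_1}{2\hat\beta_2} = \frac{2.845072}{0.078696} \approx 36.2.
\end{aligned}
$$

For example, the slope is about $-2.06$ at pctPoverty $= 10$ and about $-1.27$ at pctPoverty $= 20$. Higher poverty still lowers predicted income at these levels, but by less per percentage point. Whether the turning point near 36 is meaningful depends on how much of the data lies above it, which this output does not show.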

Figure 22: Model Output for the Initial Model with pctMarriedHouse as an Additional Predictor

## 
## Call:
## lm(formula = medianIncome ~ pctBach + pctUnemployed + avgHouse + 
##     pctMarriedHouse + region, data = income)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -41.694  -3.719  -0.212   3.161  23.365 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -16.75544    3.39452  -4.936 1.05e-06 ***
## pctBach           1.25832    0.05584  22.533  < 2e-16 ***
## pctUnemployed    -0.50621    0.10831  -4.674 3.71e-06 ***
## avgHouse          9.45179    1.26435   7.476 2.97e-13 ***
## pctMarriedHouse   0.53750    0.04956  10.846  < 2e-16 ***
## regionNortheast   6.81804    0.98791   6.901 1.40e-11 ***
## regionSoutheast  -1.34442    0.69744  -1.928  0.05440 .  
## regionSouthwest  -2.60536    0.93218  -2.795  0.00537 ** 
## regionWest       -2.20778    0.94440  -2.338  0.01975 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.133 on 561 degrees of freedom
## Multiple R-squared:  0.7175, Adjusted R-squared:  0.7135 
## F-statistic: 178.1 on 8 and 561 DF,  p-value: < 2.2e-16

Figure 23: Model Output for the Initial Model with log(population) as an Additional Predictor

## 
## Call:
## lm(formula = medianIncome ~ pctBach + pctUnemployed + avgHouse + 
##     log(population) + region, data = income)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -33.241  -4.038  -0.246   3.489  26.403 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.75753    3.26507   0.232  0.81661    
## pctBach          1.10030    0.07124  15.445  < 2e-16 ***
## pctUnemployed   -1.14677    0.10655 -10.762  < 2e-16 ***
## avgHouse        13.59488    1.34310  10.122  < 2e-16 ***
## log(population)  0.68906    0.26602   2.590  0.00984 ** 
## regionNortheast  5.74146    1.11732   5.139 3.83e-07 ***
## regionSoutheast -2.03266    0.76189  -2.668  0.00785 ** 
## regionSouthwest -4.01150    1.00861  -3.977 7.88e-05 ***
## regionWest      -1.60416    1.04376  -1.537  0.12488    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.706 on 561 degrees of freedom
## Multiple R-squared:  0.6624, Adjusted R-squared:  0.6575 
## F-statistic: 137.6 on 8 and 561 DF,  p-value: < 2.2e-16
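
Because population enters on the log scale in Figure 23, its coefficient describes proportional rather than absolute changes in population. A worked sketch, holding the other predictors fixed and using R's natural logarithm:

$$
\begin{aligned}
\text{doubling population:}\quad \Delta\widehat{\text{medianIncome}} &= 0.68906 \cdot \ln 2 \approx 0.478,\\
\text{1\% increase:}\quad \Delta\widehat{\text{medianIncome}} &= 0.68906 \cdot \ln 1.01 \approx 0.0069 \approx \hat\beta/100.
\end{aligned}
$$

Both changes are in the units in which medianIncome is recorded.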

Figure 24: Model Output for the Model with Added Variables

## 
## Call:
## lm(formula = medianIncome ~ pctBach + pctUnemployed + pctPublicHC + 
##     pctPrivateHC + pctEmployerHC + pctMarried + pctMarriedHouse + 
##     avgHouse + pctPoverty + I(pctPoverty^2) + log(population) + 
##     region, data = income)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.9888  -2.2627  -0.0614   2.0926  19.1450 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     64.508155   6.915550   9.328  < 2e-16 ***
## pctBach          0.483897   0.051860   9.331  < 2e-16 ***
## pctUnemployed    0.252294   0.078867   3.199 0.001458 ** 
## pctPublicHC     -0.242579   0.048105  -5.043 6.23e-07 ***
## pctPrivateHC    -0.145378   0.045023  -3.229 0.001316 ** 
## pctEmployerHC    0.147528   0.040336   3.657 0.000279 ***
## pctMarried      -0.105996   0.073333  -1.445 0.148911    
## pctMarriedHouse  0.192539   0.073325   2.626 0.008882 ** 
## avgHouse         3.160867   1.208195   2.616 0.009134 ** 
## pctPoverty      -2.792358   0.144430 -19.334  < 2e-16 ***
## I(pctPoverty^2)  0.040955   0.002981  13.737  < 2e-16 ***
## log(population)  0.623204   0.173646   3.589 0.000361 ***
## regionNortheast  3.590693   0.656014   5.473 6.70e-08 ***
## regionSoutheast  1.331682   0.460446   2.892 0.003976 ** 
## regionSouthwest  1.376873   0.647401   2.127 0.033880 *  
## regionWest       0.951050   0.630222   1.509 0.131851    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.779 on 554 degrees of freedom
## Multiple R-squared:  0.8941, Adjusted R-squared:  0.8913 
## F-statistic: 311.9 on 15 and 554 DF,  p-value: < 2.2e-16

Figure 25: Comparison of Models with and without a Quadratic Term


Figure 26: Residual Plot for the Model with a Quadratic pctPoverty Term


Figure 27: Model Output with a Cubic pctPoverty Term

## 
## Call:
## lm(formula = medianIncome ~ pctBach + pctUnemployed + pctPublicHC + 
##     pctPrivateHC + pctEmployerHC + pctMarried + pctMarriedHouse + 
##     avgHouse + pctPoverty + I(pctPoverty^2) + I(pctPoverty^3) + 
##     log(population) + region, data = income)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.7294  -2.2431  -0.0443   1.8356  17.5094 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     75.4935674  6.9495323  10.863  < 2e-16 ***
## pctBach          0.4298655  0.0510807   8.415 3.36e-16 ***
## pctUnemployed    0.1970514  0.0770211   2.558  0.01078 *  
## pctPublicHC     -0.2134773  0.0468948  -4.552 6.53e-06 ***
## pctPrivateHC    -0.1114624  0.0440187  -2.532  0.01161 *  
## pctEmployerHC    0.1268039  0.0392631   3.230  0.00131 ** 
## pctMarried      -0.1012288  0.0711116  -1.424  0.15515    
## pctMarriedHouse  0.1756318  0.0711542   2.468  0.01388 *  
## avgHouse         2.9738844  1.1719297   2.538  0.01144 *  
## pctPoverty      -4.8812225  0.3742212 -13.044  < 2e-16 ***
## I(pctPoverty^2)  0.1433403  0.0172535   8.308 7.55e-16 ***
## I(pctPoverty^3) -0.0014901  0.0002476  -6.019 3.19e-09 ***
## log(population)  0.7619289  0.1699441   4.483 8.94e-06 ***
## regionNortheast  3.4398760  0.6365926   5.404 9.72e-08 ***
## regionSoutheast  1.1640698  0.4473361   2.602  0.00951 ** 
## regionSouthwest  1.6153092  0.6289959   2.568  0.01049 *  
## regionWest       1.4815323  0.6174127   2.400  0.01674 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.664 on 553 degrees of freedom
## Multiple R-squared:  0.9006, Adjusted R-squared:  0.8978 
## F-statistic: 313.3 on 16 and 553 DF,  p-value: < 2.2e-16
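
Figure 27 adds a single term, I(pctPoverty^3), to the Figure 24 model. The nested-model partial $F$-test for that term is therefore equivalent to its $t$-test, with $F = t^2$. The worked check below uses only printed values, recovering each residual sum of squares as $\text{RSS} = \hat\sigma^2 \cdot \text{df}$ from the residual standard errors:

$$
\begin{aligned}
\text{RSS}_{24} &\approx 3.779^2 \times 554 \approx 7911.6, \qquad \text{RSS}_{27} \approx 3.664^2 \times 553 \approx 7424.0,\\
F &= \frac{(\text{RSS}_{24} - \text{RSS}_{27})/1}{\text{RSS}_{27}/553} \approx \frac{487.6}{13.42} \approx 36.3 \;\approx\; t^2 = (-6.019)^2 \approx 36.2.
\end{aligned}
$$

The two values differ only because the residual standard errors are printed to four significant digits.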

Figure 28: Residual Plot for Final Model


Figure 29: QQ Plot for Final Model


Figure 30: VIF Output for Final Model

##                       GVIF Df GVIF^(1/(2*Df))
## pctBach           3.071871  1        1.752676
## pctUnemployed     2.757156  1        1.660468
## pctPublicHC       5.397817  1        2.323320
## pctPrivateHC      9.032486  1        3.005410
## pctEmployerHC     5.992370  1        2.447932
## pctMarried        9.081810  1        3.013604
## pctMarriedHouse   8.399578  1        2.898203
## avgHouse          3.433620  1        1.853003
## pctPoverty      230.999949  1       15.198682
## I(pctPoverty^2) 833.986076  1       28.878817
## I(pctPoverty^3) 229.031705  1       15.133793
## log(population)   2.422183  1        1.556336
## region            3.045808  4        1.149378
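
For a term with Df degrees of freedom, the last column rescales the generalized VIF as $\text{GVIF}^{1/(2\,\text{Df})}$ so that terms with different numbers of coefficients are comparable. For a one-degree-of-freedom term, this is simply $\sqrt{\text{VIF}}$. A worked check against two rows of Figure 30:

$$
\begin{aligned}
\text{pctBach:}\quad & 3.071871^{1/2} \approx 1.7527,\\
\text{region (Df} = 4\text{):}\quad & 3.045808^{1/8} \approx 1.1494.
\end{aligned}
$$

A commonly cited rule of thumb treats a VIF above 10 as problematic; on this scale, the matching threshold is $\sqrt{10} \approx 3.16$. Only the three pctPoverty terms exceed it. This is expected, because raw polynomial terms are correlated with one another by construction. pctMarried and pctPrivateHC (about 3.01) sit just below the threshold.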

Figure 31: Expected Change in Median Income for a One-Standard-Deviation Increase in Each Predictor

##  pctBach 
## 2.265543
## pctUnemployed 
##      0.652522
## pctPublicHC 
##   -1.624543
## pctPrivateHC 
##    -1.168933
## pctEmployerHC 
##      1.214346
## pctMarried 
## -0.6589385
## pctMarriedHouse 
##        1.098819
##  avgHouse 
## 0.7222598
## log(population) 
##        1.071782
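
The values in Figure 31 appear to be each slope from the Figure 27 model multiplied by the sample standard deviation of its predictor. Two observations support this reading: every sign matches the corresponding Figure 27 coefficient, and pctMarried is included, which rules out the Figure 32 model. Assuming that construction:

$$
\Delta\widehat{\text{medianIncome}}_j = \hat\beta_j\, s_{x_j}, \qquad
\text{e.g. } s_{\text{pctBach}} \approx \frac{2.265543}{0.4298655} \approx 5.27.
$$

The implied standard deviation for pctBach is inferred from the two printed numbers and is not reported in the source. The absence of pctPoverty and region is consistent with this construction. A polynomial term cannot be shifted by one standard deviation on its own, and region is categorical.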

Figure 32: Model Output without pctMarried

## 
## Call:
## lm(formula = medianIncome ~ pctBach + pctUnemployed + pctPublicHC + 
##     pctPrivateHC + pctEmployerHC + pctMarriedHouse + avgHouse + 
##     pctPoverty + I(pctPoverty^2) + +I(pctPoverty^3) + log(population) + 
##     region, data = income)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.6110  -2.2452  -0.1178   1.9644  17.3239 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     70.6485643  6.0646488  11.649  < 2e-16 ***
## pctBach          0.4331119  0.0510770   8.480  < 2e-16 ***
## pctUnemployed    0.2111851  0.0764492   2.762 0.005928 ** 
## pctPublicHC     -0.2143074  0.0469346  -4.566 6.13e-06 ***
## pctPrivateHC    -0.1006821  0.0434025  -2.320 0.020718 *  
## pctEmployerHC    0.1375627  0.0385645   3.567 0.000392 ***
## pctMarriedHouse  0.0893411  0.0372958   2.395 0.016930 *  
## avgHouse         3.8280148  1.0076119   3.799 0.000161 ***
## pctPoverty      -4.8356300  0.3731933 -12.957  < 2e-16 ***
## I(pctPoverty^2)  0.1431291  0.0172688   8.288 8.72e-16 ***
## I(pctPoverty^3) -0.0014941  0.0002478  -6.030 3.00e-09 ***
## log(population)  0.7493768  0.1698724   4.411 1.23e-05 ***
## regionNortheast  3.6292745  0.6231100   5.824 9.72e-09 ***
## regionSoutheast  1.2355500  0.4449205   2.777 0.005672 ** 
## regionSouthwest  1.6984049  0.6268613   2.709 0.006950 ** 
## regionWest       1.5260297  0.6171919   2.473 0.013715 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.667 on 554 degrees of freedom
## Multiple R-squared:  0.9003, Adjusted R-squared:  0.8976 
## F-statistic: 333.4 on 15 and 554 DF,  p-value: < 2.2e-16
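
Dropping pctMarried (Figure 32) is again a one-term nested comparison against Figure 27. Its partial $F$-statistic therefore equals the square of pctMarried's $t$ value in that model:

$$
F = t^2 = (-1.424)^2 \approx 2.03 \quad \text{on } 1 \text{ and } 553 \text{ df}, \qquad p \approx 0.155.
$$

This matches the negligible change in fit (adjusted $R^2$ of 0.8978 versus 0.8976). Removing pctMarried also roughly halves the standard error of pctMarriedHouse, from 0.0711542 to 0.0372958. That drop is in line with the high GVIFs for these two predictors in Figure 30.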