library(dplyr)
library(ggplot2)
library(GGally)
library(vioplot)
library(corpcor)
library(ppcor)
library(mctest)
library(ggfortify)
library(lmtest)
library(MASS)
library(car)
library(DAAG)
library(jtools)
library(relaimpo)
data = read.csv("Admission_Data.csv")
df = data[,2:9]   # keep columns 2-9: the 7 predictors and the response (drops the first, ID-like column)
Response variable: Chance.of.Admit. Predictors: all remaining variables.
summary(df)
GRE.Score TOEFL.Score University.Rating SOP LOR
Min. :290.0 Min. : 92.0 Min. :1.000 Min. :1.000 Min. :1.000
1st Qu.:308.0 1st Qu.:103.0 1st Qu.:2.000 1st Qu.:2.500 1st Qu.:3.000
Median :317.0 Median :107.0 Median :3.000 Median :3.500 Median :3.500
Mean :316.5 Mean :107.2 Mean :3.114 Mean :3.374 Mean :3.484
3rd Qu.:325.0 3rd Qu.:112.0 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000
Max. :340.0 Max. :120.0 Max. :5.000 Max. :5.000 Max. :5.000
CGPA Research Chance.of.Admit
Min. :6.800 Min. :0.00 Min. :0.3400
1st Qu.:8.127 1st Qu.:0.00 1st Qu.:0.6300
Median :8.560 Median :1.00 Median :0.7200
Mean :8.576 Mean :0.56 Mean :0.7217
3rd Qu.:9.040 3rd Qu.:1.00 3rd Qu.:0.8200
Max. :9.920 Max. :1.00 Max. :0.9700
par(mfrow=c(4, 2))   # 4 x 2 grid: one panel per variable
colnames = names(df)
for(name in colnames) {
  # horizontal violin plot of each variable's distribution
  vioplot(df[[name]], horizontal=TRUE, col='gold', lineCol='gold', lty=0, colMed='floralwhite', yaxt='n', rectCol='dodgerblue4')
  title(main=name)
}
From our plots, we can see a bell shape for all variables other than Research, which is binary. This indicates that both the predictors and the response appear to follow a roughly normal distribution.
There is no extreme skew in the variables, which makes the confidence intervals for the predictor coefficients and for the mean response more meaningful.
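The observations below refer to a pairwise scatterplot matrix whose plotting command is not reproduced in this section; a minimal sketch, assuming GGally's ggpairs (the package is loaded above), would be:
ggpairs(df)   # scatterplots, densities and pairwise correlations for all 8 variables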
From the last row of this matrix, we can observe that most of the IVs appear to have a linear relationship with the response variable, except for the binary variable Research. The assumption of linearity between the DV and each IV therefore holds.
The pairwise correlations between the variables are all fairly high, which at first glance appears to violate the multiple linear regression assumption of no multicollinearity among the predictors.
To account for confounding effects from the other predictors, we compute the partial correlations between each pair of variables:
pcorr = as.data.frame(cor2pcor(cov(df)))
names(pcorr) = names(df)
rownames(pcorr) = names(df)
pcorr = format(pcorr, digits=1)
print.data.frame(pcorr)
GRE.Score TOEFL.Score University.Rating SOP LOR CGPA Research
GRE.Score 1.00 0.43 0.04 -0.03 -0.08 0.26 0.25
TOEFL.Score 0.43 1.00 0.08 0.09 -0.03 0.18 -0.07
University.Rating 0.04 0.08 1.00 0.37 0.13 0.09 0.04
SOP -0.03 0.09 0.37 1.00 0.29 0.15 0.02
LOR -0.08 -0.03 0.13 0.29 1.00 0.09 0.02
CGPA 0.26 0.18 0.09 0.15 0.09 1.00 -0.06
Research 0.25 -0.07 0.04 0.02 0.02 -0.06 1.00
Chance.of.Admit 0.16 0.14 0.07 0.02 0.18 0.48 0.16
Chance.of.Admit
GRE.Score 0.16
TOEFL.Score 0.14
University.Rating 0.07
SOP 0.02
LOR 0.18
CGPA 0.48
Research 0.16
Chance.of.Admit 1.00
The partial correlation coefficients suggest otherwise: there is much less multicollinearity, with only GRE.Score and TOEFL.Score having a value > 0.4. The partial correlation between CGPA and the response Chance.of.Admit is fairly high, but correlation with the response does not violate MLR's assumption of no multicollinearity among the IVs.
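As a cross-check (a sketch, since ppcor is loaded above but not otherwise used here), the same partial-correlation matrix can be computed directly:
round(pcor(df)$estimate, 2)   # partial correlations via ppcor; should match the table above up to rounding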
imcdiag(df[,1:7],df$Chance.of.Admit)
Call:
imcdiag(x = df[, 1:7], y = df$Chance.of.Admit)
All Individual Multicollinearity Diagnostics Result
VIF TOL Wi Fi Leamer CVIF Klein
GRE.Score 4.4642 0.2240 284.6458 342.2678 0.4733 -0.2921 0
TOEFL.Score 3.9042 0.2561 238.6295 286.9363 0.5061 -0.2555 0
University.Rating 2.6210 0.3815 133.1951 160.1584 0.6177 -0.1715 0
SOP 2.8352 0.3527 150.7931 181.3188 0.5939 -0.1855 0
LOR 2.0336 0.4917 84.9238 102.1152 0.7012 -0.1331 0
CGPA 4.7780 0.2093 310.4250 373.2656 0.4575 -0.3127 0
Research 1.4940 0.6693 40.5910 48.8080 0.8181 -0.0978 0
1 --> COLLINEARITY is detected by the test
0 --> COLLINEARITY is not detected by the test
University.Rating , SOP , coefficient(s) are non-significant may be due to multicollinearity
R-square of y on all x: 0.8219
* use method argument to check which regressors may be the reason of collinearity
===================================
All the predictors have a VIF below 5 (VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors), which indicates that multicollinearity is not a serious problem here.
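An equivalent check, as a quick sketch, is car::vif() applied to the full model that is fitted next; the values should reproduce the VIF column above:
vif(lm(Chance.of.Admit ~ ., data=df))   # variance inflation factors for each predictor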
fit = lm(Chance.of.Admit ~ ., data=df)
summary(fit)
Call:
lm(formula = Chance.of.Admit ~ ., data = df)
Residuals:
Min 1Q Median 3Q Max
-0.266657 -0.023327 0.009191 0.033714 0.156818
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.2757251 0.1042962 -12.232 < 2e-16 ***
GRE.Score 0.0018585 0.0005023 3.700 0.000240 ***
TOEFL.Score 0.0027780 0.0008724 3.184 0.001544 **
University.Rating 0.0059414 0.0038019 1.563 0.118753
SOP 0.0015861 0.0045627 0.348 0.728263
LOR 0.0168587 0.0041379 4.074 5.38e-05 ***
CGPA 0.1183851 0.0097051 12.198 < 2e-16 ***
Research 0.0243075 0.0066057 3.680 0.000259 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.05999 on 492 degrees of freedom
Multiple R-squared: 0.8219, Adjusted R-squared: 0.8194
F-statistic: 324.4 on 7 and 492 DF, p-value: < 2.2e-16
Fit: Chance.of.Admit = -1.28 + 0.00186(GRE.Score) + 0.00278(TOEFL.Score) + 0.00594(University.Rating) + 0.00159(SOP) + 0.0169(LOR) + 0.118(CGPA) + 0.0243(Research) (3s.f.)
This indicates that, on average and holding all other variables constant, a unit increase in GRE.Score/TOEFL.Score/University.Rating/SOP/LOR/CGPA/Research increases Chance.of.Admit by 0.00186/0.00278/0.00594/0.00159/0.0169/0.118/0.0243 respectively (for the binary Research variable, this is the effect of having research experience).
The p-value for the F-statistic is < 2.2e-16, so we reject the null hypothesis that the intercept-only model fits as well as the full MLR model, even at alpha = 0.001; the regression is highly statistically significant.
The adjusted R-squared of 0.8194 is high, which suggests the model fits the data well.
The coefficients for GRE.Score, TOEFL.Score, LOR, CGPA and Research are statistically significant at alpha = 0.01: their p-values are below 0.01, so we reject the null hypothesis that each of these coefficients is 0.
The p-values for University.Rating (0.119) and SOP (0.728) are greater than 0.01, so we fail to reject the null hypothesis that their coefficients are 0 at the 0.01 significance level.
autoplot(fit)
bptest(fit)
studentized Breusch-Pagan test
data: fit
BP = 30.516, df = 7, p-value = 7.634e-05
Using the Breusch-Pagan test to confirm, we reject at the 0.05 level the null hypothesis that the residual variance is constant and conclude that heteroscedasticity is present. This makes the OLS coefficient estimates less efficient (less precise) and the usual standard errors unreliable.
We therefore transform the response variable with the Box-Cox power transformation to bring it closer to normality and to address the heteroscedasticity.
bc = boxcox(Chance.of.Admit ~ ., data=df)
lambda = bc$x[which.max(bc$y)]   # lambda that maximises the profile log-likelihood
# Box-Cox / Tukey power transformation helper (note: this masks car::powerTransform)
powerTransform <- function(y, lambda1, lambda2 = NULL, method = "boxcox") {
  boxcoxTrans <- function(x, lam1, lam2 = NULL) {
    # with lambda2 = 0 this reduces to the one-parameter Box-Cox transformation
    lam2 <- if (is.null(lam2)) 0 else lam2
    if (lam1 == 0) {
      log(x + lam2)
    } else {
      (((x + lam2)^lam1) - 1) / lam1
    }
  }
  switch(method,
         boxcox = boxcoxTrans(y, lambda1, lambda2),
         tukey  = y^lambda1)
}
# re-run with transformation
bcfit <- lm(powerTransform(Chance.of.Admit, lambda) ~ ., data=df)
The Box-Cox procedure identifies an appropriate exponent (lambda) for transforming the response towards a "normal" shape; the lambda value is the power to which the data are raised. Here the procedure suggests lambda = 2.
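For example, with lambda = 2 the transform applied above reduces to (y^2 - 1)/2; a quick check using the helper defined earlier:
powerTransform(0.72, 2)   # (0.72^2 - 1) / 2 = -0.2408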
summary(bcfit)
Call:
lm(formula = powerTransform(Chance.of.Admit, lambda) ~ ., data = df)
Residuals:
Min 1Q Median 3Q Max
-0.165938 -0.019448 0.006959 0.024234 0.103178
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.6494147 0.0677829 -24.334 < 2e-16 ***
GRE.Score 0.0013096 0.0003265 4.011 6.97e-05 ***
TOEFL.Score 0.0021897 0.0005670 3.862 0.000128 ***
University.Rating 0.0064119 0.0024709 2.595 0.009741 **
SOP 0.0018655 0.0029653 0.629 0.529580
LOR 0.0101724 0.0026892 3.783 0.000174 ***
CGPA 0.0814929 0.0063074 12.920 < 2e-16 ***
Research 0.0179019 0.0042931 4.170 3.60e-05 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.03899 on 492 degrees of freedom
Multiple R-squared: 0.8493, Adjusted R-squared: 0.8471
F-statistic: 396.1 on 7 and 492 DF, p-value: < 2.2e-16
The adjusted R-squared increased from 0.8194 to 0.8471 while the predictors remain significant. However, this model is less interpretable, and we want our models to be as parsimonious as possible. We will explore more models later on.
cooksd <- cooks.distance(fit)
sample_size <- nrow(df)
plot(cooksd, pch="*", cex=2, main="Influential Obs by Cooks distance")
abline(h = 4/sample_size, col="red")   # rule-of-thumb cutoff of 4/n
text(x=1:length(cooksd)+1, y=cooksd, labels=ifelse(cooksd>4/sample_size, names(cooksd),""), col="red")   # label points above the cutoff
influential = as.numeric(names(cooksd)[(cooksd > (4/sample_size))])   # row indices above the cutoff
df2 = df[-influential, ]   # drop the influential observations
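As a quick sanity check on how many observations were flagged (the refitted model below reports 467 rows, i.e. 33 dropped from the original 500):
length(influential)   # number of rows with Cook's distance above 4/n
nrow(df2)             # rows remaining after removal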
fit2 = lm(Chance.of.Admit ~ ., data=df2)
summary(fit2)
Call:
lm(formula = Chance.of.Admit ~ ., data = df2)
Residuals:
Min 1Q Median 3Q Max
-0.165185 -0.026028 0.003584 0.027486 0.109968
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.2229063 0.0803825 -15.214 < 2e-16 ***
GRE.Score 0.0019839 0.0003834 5.175 3.42e-07 ***
TOEFL.Score 0.0023900 0.0006541 3.654 0.000288 ***
University.Rating 0.0041892 0.0029437 1.423 0.155386
SOP 0.0065602 0.0036266 1.809 0.071120 .
LOR 0.0150463 0.0030929 4.865 1.58e-06 ***
CGPA 0.1128585 0.0075521 14.944 < 2e-16 ***
Research 0.0239311 0.0050812 4.710 3.29e-06 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.04367 on 459 degrees of freedom
Multiple R-squared: 0.8932, Adjusted R-squared: 0.8916
F-statistic: 548.6 on 7 and 459 DF, p-value: < 2.2e-16
By removing these highly influential observations and refitting the model on the filtered data, the adjusted R-squared increased from 0.8194 to 0.8916 without introducing extra complexity to the model.
fit3 = lm(exp(Chance.of.Admit) ~ ., data=df2)
summary(fit3)
Call:
lm(formula = exp(Chance.of.Admit) ~ ., data = df2)
Residuals:
Min 1Q Median 3Q Max
-0.253893 -0.056622 0.007272 0.055939 0.216996
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.9539180 0.1581358 -12.356 < 2e-16 ***
GRE.Score 0.0039627 0.0007542 5.254 2.28e-07 ***
TOEFL.Score 0.0054752 0.0012868 4.255 2.54e-05 ***
University.Rating 0.0131840 0.0057911 2.277 0.0233 *
SOP 0.0118772 0.0071346 1.665 0.0966 .
LOR 0.0281279 0.0060847 4.623 4.93e-06 ***
CGPA 0.2331331 0.0148571 15.692 < 2e-16 ***
Research 0.0495501 0.0099961 4.957 1.01e-06 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.08592 on 459 degrees of freedom
Multiple R-squared: 0.9038, Adjusted R-squared: 0.9023
F-statistic: 616 on 7 and 459 DF, p-value: < 2.2e-16
By regressing the exponential of the response on the predictors, the adjusted R-squared increases from 0.8916 to 0.9023 while the predictors remain significant.
fit4 = lm(exp(Chance.of.Admit) ~ GRE.Score*University.Rating+TOEFL.Score+Research+SOP+LOR+CGPA, data=df2)
summary(fit4)
Call:
lm(formula = exp(Chance.of.Admit) ~ GRE.Score * University.Rating +
TOEFL.Score + Research + SOP + LOR + CGPA, data = df2)
Residuals:
Min 1Q Median 3Q Max
-0.254880 -0.051757 0.006082 0.056042 0.217137
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.9107422 0.3463524 -2.630 0.008838 **
GRE.Score 0.0007173 0.0012168 0.590 0.555800
University.Rating -0.3143396 0.0971980 -3.234 0.001309 **
TOEFL.Score 0.0054205 0.0012726 4.259 2.49e-05 ***
Research 0.0500247 0.0098858 5.060 6.07e-07 ***
SOP 0.0131953 0.0070659 1.867 0.062478 .
LOR 0.0288533 0.0060208 4.792 2.23e-06 ***
CGPA 0.2304806 0.0147127 15.665 < 2e-16 ***
GRE.Score:University.Rating 0.0010313 0.0003055 3.376 0.000799 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.08496 on 458 degrees of freedom
Multiple R-squared: 0.9061, Adjusted R-squared: 0.9045
F-statistic: 552.7 on 8 and 458 DF, p-value: < 2.2e-16
The model shows a significant interaction between GRE.Score and University.Rating: its p-value of 0.000799 is below 0.001, so the interaction term is significant at the 0.001 level.
The interaction means that the effect of GRE.Score on the response depends on the value of University.Rating (and vice versa), so the consequence of changing either variable cannot be assessed without accounting for the other.
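To illustrate, a sketch of the marginal effect of GRE.Score implied by the fitted interaction (coefficients taken from the fit4 summary above):
gre_effect <- function(rating) 0.0007173 + 0.0010313 * rating   # change in exp(Chance.of.Admit) per GRE point, as a function of University.Rating
gre_effect(1)   # ~0.0018 for a rating-1 university
gre_effect(5)   # ~0.0059 for a rating-5 university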
anova(fit3, fit4)
Analysis of Variance Table
Model 1: exp(Chance.of.Admit) ~ GRE.Score + TOEFL.Score + University.Rating +
SOP + LOR + CGPA + Research
Model 2: exp(Chance.of.Admit) ~ GRE.Score * University.Rating + TOEFL.Score +
Research + SOP + LOR + CGPA
Res.Df RSS Df Sum of Sq F Pr(>F)
1 459 3.3883
2 458 3.3061 1 0.082249 11.394 0.0007995 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
The first-order model is nested within the interaction model. Using ANOVA (a partial F-test) to compare the simpler first-order model with the model containing the interaction term, the p-value of 0.0007995 is below 0.001, so the null hypothesis that the reduced model is adequate is rejected at the 0.001 significance level. The interaction term therefore significantly improves the fit.
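The partial F-statistic can be reproduced by hand from the RSS values in the table above (a rough check; the small discrepancy comes from rounding of the printed RSS):
((3.3883 - 3.3061) / 1) / (3.3061 / 458)   # ~11.39, matching the reported F of 11.394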
fit5 = lm(exp(Chance.of.Admit) ~ GRE.Score*University.Rating+TOEFL.Score+Research+LOR+CGPA, data=df2)
summary(fit5)
Call:
lm(formula = exp(Chance.of.Admit) ~ GRE.Score * University.Rating +
TOEFL.Score + Research + LOR + CGPA, data = df2)
Residuals:
Min 1Q Median 3Q Max
-0.258558 -0.049296 0.007292 0.055636 0.227183
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.9891596 0.3447277 -2.869 0.00430 **
GRE.Score 0.0008017 0.0012193 0.658 0.51118
University.Rating -0.3000927 0.0971603 -3.089 0.00213 **
TOEFL.Score 0.0056502 0.0012701 4.449 1.08e-05 ***
Research 0.0498970 0.0099124 5.034 6.92e-07 ***
LOR 0.0319995 0.0057959 5.521 5.65e-08 ***
CGPA 0.2360230 0.0144492 16.335 < 2e-16 ***
GRE.Score:University.Rating 0.0009998 0.0003059 3.268 0.00116 **
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.08519 on 459 degrees of freedom
Multiple R-squared: 0.9054, Adjusted R-squared: 0.904
F-statistic: 627.7 on 7 and 459 DF, p-value: < 2.2e-16
Previously, SOP was not significant at the 0.05 level; even after removing it, the model's adjusted R-squared remains 0.904.
step <- stepAIC(fit5, direction="both")
Start: AIC=-2292.37
exp(Chance.of.Admit) ~ GRE.Score * University.Rating + TOEFL.Score +
Research + LOR + CGPA
Df Sum of Sq RSS AIC
<none> 3.3313 -2292.4
- GRE.Score:University.Rating 1 0.07753 3.4088 -2283.6
- TOEFL.Score 1 0.14364 3.4749 -2274.7
- Research 1 0.18390 3.5152 -2269.3
- LOR 1 0.22123 3.5525 -2264.3
- CGPA 1 1.93650 5.2678 -2080.4
step$anova
Stepwise Model Path
Analysis of Deviance Table
Initial Model:
exp(Chance.of.Admit) ~ GRE.Score * University.Rating + TOEFL.Score +
Research + LOR + CGPA
Final Model:
exp(Chance.of.Admit) ~ GRE.Score * University.Rating + TOEFL.Score +
Research + LOR + CGPA
Step Df Deviance Resid. Df Resid. Dev AIC
1 459 3.331267 -2292.37
A model with fewer parameters is preferred to one with more. AIC accounts for both the fit of the model and the number of parameters used: additional parameters incur a penalty, which balances over- and under-fitting. The stepwise procedure iteratively adds and removes variables one at a time and compares the AIC, keeping the model with the lowest value. In our case, AIC suggests no further addition or removal of variables.
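The AIC reported by stepAIC can be reproduced with extractAIC(), which uses the same criterion (a sketch):
extractAIC(fit5)   # returns the equivalent df (8) and AIC (~ -2292.4), matching the stepwise output above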
calc.relimp(fit5,type="lmg", rela=TRUE)
Response variable: exp(Chance.of.Admit)
Total response variance: 0.07558123
Analysis based on 467 observations
7 Regressors:
GRE.Score University.Rating TOEFL.Score Research LOR CGPA GRE.Score:University.Rating
Proportion of variance explained by model: 90.54%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg
GRE.Score 0.257942619
University.Rating 0.166817030
TOEFL.Score 0.156223517
Research 0.069331682
LOR 0.096284401
CGPA 0.250802724
GRE.Score:University.Rating 0.002598027
Average coefficients for different model sizes:
1X 2Xs 3Xs 4Xs 5Xs
GRE.Score 0.02074929 0.01423246 0.010334576 0.007621526 0.005180304
University.Rating 0.17676303 0.09256374 0.023688521 -0.058132939 -0.151442368
TOEFL.Score 0.03744779 0.02362103 0.015694761 0.010981393 0.008588772
Research 0.32556109 0.13545663 0.083229461 0.062005291 0.053312159
LOR 0.19917859 0.09530881 0.060420345 0.047430322 0.042830876
CGPA 0.42277371 0.35790428 0.312022662 0.278861853 0.258424700
GRE.Score:University.Rating NaN NaN 0.001107569 0.001084120 0.001057216
6Xs 7Xs
GRE.Score 0.002848126 0.0008016795
University.Rating -0.237042922 -0.3000927155
TOEFL.Score 0.007195438 0.0056502353
Research 0.050658928 0.0498970045
LOR 0.038606023 0.0319994512
CGPA 0.246952223 0.2360230300
GRE.Score:University.Rating 0.001028429 0.0009997840
Relative importance is measured with the Lindeman, Merenda and Gold (lmg, 1980) method, which decomposes the total R-squared by observing the increase in R-squared as predictors are added to the model sequentially. Because the order in which predictors are added matters, the method averages these R-squared contributions across all possible orderings.
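To illustrate the order dependence, a sketch comparing CGPA's R-squared contribution when it enters the model first versus last (these values are not shown in the output above):
summary(lm(exp(Chance.of.Admit) ~ CGPA, data=df2))$r.squared                 # R^2 when CGPA enters first
summary(fit5)$r.squared - summary(update(fit5, . ~ . - CGPA))$r.squared      # R^2 gain when CGPA enters last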
The features are ranked in this order with highest relative importance first: GRE.Score, CGPA, University.Rating, TOEFL.Score, LOR, Research and GRE.Score*University.Rating.
cv_new = CVlm(data=df2, fit5, m=3, printit=FALSE)
As there is >1 explanatory variable, cross-validation
predicted values for a fold are not a linear function
of corresponding overall predicted values. Lines that
are shown for the different folds are approximate
attr(cv_new, "ms")
[1] 0.007749426
Each fold's prediction accuracy does not vary much for any particular sample, and the fold-specific lines of best fit do not differ much in slope or level. The average mean squared error of the predictions across the 3 folds is 0.00775, which is low and indicates good predictive accuracy.
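For scale, the corresponding root-mean-square prediction error can be compared with the model's residual standard error (a quick check):
sqrt(attr(cv_new, "ms"))   # ~0.088, close to the residual standard error of 0.085 reported for fit5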
export_summs(fit5, error_format = "[{conf.low}, {conf.high}]", digits=5)
---------------------------------------------------------
Model 1
---------------------------
(Intercept) -0.98916 **
[-1.66660, -0.31172]
GRE.Score 0.00080
[-0.00159, 0.00320]
University.Rating -0.30009 **
[-0.49103, -0.10916]
TOEFL.Score 0.00565 ***
[0.00315, 0.00815]
Research 0.04990 ***
[0.03042, 0.06938]
LOR 0.03200 ***
[0.02061, 0.04339]
CGPA 0.23602 ***
[0.20763, 0.26442]
GRE.Score:University.Rating 0.00100 **
[0.00040, 0.00160]
---------------------------
N 467
R2 0.90542
---------------------------------------------------------
*** p < 0.001; ** p < 0.01; * p < 0.05.
Column names: names, Model 1
plot_summs(fit5)
effect_plot(fit4, pred = GRE.Score, interval = TRUE, plot.points = TRUE)
Using data df2 from global environment. This could cause incorrect results if df2 has been altered since the model was fit. You can manually provide the data to the "data =" argument.
effect_plot(fit4, pred = University.Rating, interval = TRUE, plot.points = TRUE)
effect_plot(fit4, pred = TOEFL.Score, interval = TRUE, plot.points = TRUE)
effect_plot(fit4, pred = Research, interval = TRUE, plot.points = TRUE)
effect_plot(fit4, pred = LOR, interval = TRUE, plot.points = TRUE)
effect_plot(fit4, pred = CGPA, interval = TRUE, plot.points = TRUE)
summary(fit5)
Call:
lm(formula = exp(Chance.of.Admit) ~ GRE.Score * University.Rating +
TOEFL.Score + Research + LOR + CGPA, data = df2)
Residuals:
Min 1Q Median 3Q Max
-0.258558 -0.049296 0.007292 0.055636 0.227183
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.9891596 0.3447277 -2.869 0.00430 **
GRE.Score 0.0008017 0.0012193 0.658 0.51118
University.Rating -0.3000927 0.0971603 -3.089 0.00213 **
TOEFL.Score 0.0056502 0.0012701 4.449 1.08e-05 ***
Research 0.0498970 0.0099124 5.034 6.92e-07 ***
LOR 0.0319995 0.0057959 5.521 5.65e-08 ***
CGPA 0.2360230 0.0144492 16.335 < 2e-16 ***
GRE.Score:University.Rating 0.0009998 0.0003059 3.268 0.00116 **
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.08519 on 459 degrees of freedom
Multiple R-squared: 0.9054, Adjusted R-squared: 0.904
F-statistic: 627.7 on 7 and 459 DF, p-value: < 2.2e-16
Fit: exp(Chance.of.Admit) = -0.989 + 0.000802(GRE.Score) - 0.300(University.Rating) + 0.00565(TOEFL.Score) + 0.0499(Research) + 0.0320(LOR) + 0.236(CGPA) + 0.00100(GRE.Score*University.Rating) (3 s.f.)
This indicates that, on average and holding the other terms constant, a unit increase in GRE.Score/University.Rating/TOEFL.Score/Research/LOR/CGPA/GRE.Score*University.Rating changes exp(Chance.of.Admit) by 0.000802/-0.300/0.00565/0.0499/0.0320/0.236/0.00100 respectively; because of the interaction term, the net effect of GRE.Score or University.Rating also depends on the other variable's value.
The p-value for the F-statistic is < 2.2e-16, so we reject the null hypothesis that the intercept-only model fits as well as the full model, even at alpha = 0.001; the regression is highly statistically significant.
The adjusted R-squared of 0.904 is high, which suggests the model fits the data very well.
The coefficients for all the IVs and the interaction term, except the GRE.Score main effect, are statistically significant at alpha = 0.01: their p-values are below 0.01, so we reject the null hypothesis that those coefficients are 0.
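Finally, a sketch of using the final model for prediction with a hypothetical applicant profile (the values below are illustrative, not from the data); because the model is fit on exp(Chance.of.Admit), predictions are back-transformed with log() to return to the original 0-1 scale:
new_app <- data.frame(GRE.Score=320, TOEFL.Score=110, University.Rating=4, LOR=4, CGPA=9.0, Research=1)   # hypothetical applicant
log(predict(fit5, newdata=new_app))   # predicted Chance.of.Admit on the original scale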