1. Many Holmes Institute instructors believe that students need to spend at least 2 hours studying outside of class for every hour of lecture. They believe that the number of hours students study to prepare for the exam significantly affects their marks. In contrast, a few lecturers believe that the number of preparation hours does not substantially affect students’ marks and that other factors should be considered. To study the relationship between the time each student spent preparing for the exam (in hours) and the reported mark, a sample of 100 students was selected randomly from a large statistics class. The data are stored in the file named “ASSIGNMENTDATA” on the course website.

**Answer the 9 questions below: (22 marks)**

a. What type of survey method could be used? Explain your answer. (1.5 marks)

b. What sampling method could be used to select the sample? Explain your answer. (1.5 marks)

c. On the basis of the given data, determine which variable should be treated as the dependent variable and which as the independent variable, and explain why. Also, identify the data type(s) for each variable. (2 marks)

d. What kinds of issues might we face in collecting the data using this type of survey method? List and explain two cases. (1 mark)

e. Using 8 classes with intervals of 20 - 30, 30 - 40, etc., for both of the variables selected in Question c, develop a distribution table including class intervals, frequency, relative frequency, and cumulative relative frequency for each variable. Then, draw a frequency histogram, a relative frequency histogram, and a cumulative relative frequency histogram for each variable. Also, comment on the shape of the frequency histogram for each variable and give reason(s) for your comment. (5.5 marks)

f. Draw and use an appropriate scatter plot to investigate the relationship between the two variables. Also, briefly explain which variable you placed on the X axis and which on the Y axis, and why. Finally, draw the fitted line for the plotted observations. (2.5 marks)

g. Present the equation of the fitted (regression) line from your answer to Question f. Then, estimate the effect of a one-unit increase in the independent variable on the dependent variable. (2.5 marks)

h. Prepare a numerical summary report about the data on the two variables by including the mean, median, range, variance, standard deviation, smallest and largest values, quartiles, interquartile range and the 30th percentile for each variable. (3.5 marks)

i. Compute a numerical measure of the strength and direction of the linear relationship between the two variables. Also, interpret this value. (2 marks)
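
The calculations asked for in parts e, g, h, and i can be sketched in Python as follows. This is only an illustration under stated assumptions: the `hours` and `marks` lists are hypothetical toy values (not the ASSIGNMENTDATA file), the function names are made up for this sketch, and each class is assumed to include its lower limit and exclude its upper limit. Quartile and percentile conventions differ between tools, so values from a spreadsheet may not match `statistics.quantiles` exactly.

```python
import math
import statistics

# Hypothetical toy data for illustration only; NOT the ASSIGNMENTDATA values.
hours = [22, 35, 41, 48, 53, 60, 67, 74, 82, 95]
marks = [31, 38, 45, 52, 50, 63, 70, 72, 85, 91]


def frequency_table(values, start=20, width=10, classes=8):
    """Rows of (lower, upper, frequency, relative freq, cumulative relative freq)."""
    n = len(values)
    rows = []
    cumulative = 0
    for k in range(classes):
        lower = start + k * width
        upper = lower + width
        # Assumed convention: lower limit included, upper limit excluded.
        freq = sum(1 for v in values if lower <= v < upper)
        cumulative += freq
        rows.append((lower, upper, freq, freq / n, cumulative / n))
    return rows


def least_squares(x, y):
    """Slope and intercept of the simple regression line y = b0 + b1 * x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def pearson_r(x, y):
    """Sample (Pearson) correlation coefficient between x and y."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def summary(values):
    """Numerical summary; quartile and percentile conventions vary by tool."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    p30 = statistics.quantiles(values, n=10)[2]  # third decile = 30th percentile
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "p30": p30,
    }


for row in frequency_table(hours):
    print(row)
slope, intercept = least_squares(hours, marks)
print(f"marks = {intercept:.2f} + {slope:.2f} * hours")
print("r =", round(pearson_r(hours, marks), 3))
print(summary(marks))
```

The slope is the estimated change in the dependent variable for a one-unit increase in the independent variable, and the sign and magnitude of `r` give the direction and strength of the linear relationship.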

2. To determine whether the height of sons is related to father’s height (x1) and mother’s height (x2), data were gathered, and part of the multiple regression Excel output is shown below. Fill in the table and answer the following questions. (8 marks)

a. What is the standard error of estimate? What does this statistic tell you? (0.5 mark)

b. What is the coefficient of determination? What does this statistic tell you? (1 mark)

c. What is the coefficient of determination adjusted for degrees of freedom? What do this statistic and the one referred to in part (b) tell you about how well the model fits the data? (1 mark)

d. Test the overall utility of the model. What does the test result tell you? (1.5 marks)

e. Interpret each of the coefficients. (2 marks)

f. Do these data allow the statistics practitioner to infer that the heights of the sons and the fathers are linearly related? (1 mark)

g. Do these data allow the statistics practitioner to infer that the heights of the sons and the mothers are linearly related? (1 mark)
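
The statistics in parts a to d are linked through the sums of squares in the regression output, which is what makes it possible to fill in missing cells of the table. The sketch below shows those standard relationships in Python. The function name and the input figures are hypothetical and chosen only for illustration; they are not the values from the Excel output.

```python
import math


def regression_fit_statistics(ssr, sse, n, k):
    """Fit statistics from the ANOVA sums of squares of a regression.

    ssr: regression (explained) sum of squares
    sse: error (residual) sum of squares
    n:   number of observations
    k:   number of independent variables (2 for father's and mother's height)
    """
    sst = ssr + sse        # total sum of squares
    df_error = n - k - 1   # residual degrees of freedom
    mse = sse / df_error   # mean square error
    msr = ssr / k          # mean square regression
    return {
        "standard_error": math.sqrt(mse),
        "r_squared": ssr / sst,
        "adj_r_squared": 1 - (sse / df_error) / (sst / (n - 1)),
        "f_statistic": msr / mse,
        "df": (k, df_error),
    }


# Hypothetical figures for illustration only; NOT taken from the Excel output.
fit = regression_fit_statistics(ssr=80.0, sse=20.0, n=23, k=2)
print(fit)
```

The F statistic is compared with the critical value of the F distribution with (k, n - k - 1) degrees of freedom. For parts f and g, the t statistic for each coefficient is the coefficient divided by its standard error, with n - k - 1 degrees of freedom.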