# Summary statistics, t test, Regression and Correlation

You are hired as a statistical analyst for Silver's Gym and your boss wants to examine the relationship between body fat and weight in men who attend the gym. After compiling the data for weight and body fat of 252 men who attend Silver's Gym, you find it relevant to examine the statistical measures and to perform hypothesis tests and regression analysis to help make general conclusions for body fat and weight in men.

Part I: Statistical Measures

Statistics is a very powerful topic that is used on a daily basis in many situations. For example, you may be interested in the age of the men who attend Silver's Gym. You could not assume that all men are the same age. Thus, it would be an inaccurate measure to state that "the average age of men who attend Silver's Gym is the same age as me."

Averages are only one type of statistical measurement that may be of interest. For example, your company may want to gauge sales during a certain time of year and keep costs low enough that the business makes money. These various statistical measurements are important because they help you draw general conclusions about a given population or sample.

To assist in your analysis for Silver's Gym, answer the following questions about the Body Fat Versus Weight data set.

Calculate the mean, median, range, and standard deviation for the Body Fat Versus Weight data set. Report your findings, and interpret the meaning of each measurement. Note that you are to calculate all four measures for body fat and again for weight.

What is the importance of finding the mean/median? Why might you find this information useful? In some data sets, the mean is more important than the median. For example, you want to know your mean overall grade average because the median grade average would be meaningless. However, you might be interested in a median salary to see the middle value of where salaries fall. Explain which measure, the mean or the median, is more applicable for this data set and this problem.

What is the importance of finding the range/standard deviation? Why might you find this information useful?
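As a rough sketch of how the four measures could be computed outside Excel, the snippet below uses Python's standard `statistics` module. The `body_fat` values are invented purely for illustration and are not taken from the Silver's Gym data set; the use of the sample standard deviation (n - 1 denominator) is also an assumption.

```python
import statistics

# Hypothetical body-fat percentages, invented for illustration only;
# they are NOT taken from the Silver's Gym data set.
body_fat = [12.3, 6.1, 25.3, 10.4, 28.7, 20.9, 19.2]

mean = statistics.mean(body_fat)            # arithmetic average
median = statistics.median(body_fat)        # middle value after sorting
value_range = max(body_fat) - min(body_fat) # maximum minus minimum
std_dev = statistics.stdev(body_fat)        # sample standard deviation (n - 1)

print(f"mean={mean:.2f} median={median:.2f} "
      f"range={value_range:.2f} sd={std_dev:.2f}")
```

The same four calls would be repeated for the weight column.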

Part II: Hypothesis Testing

Organizations sometimes want to go beyond describing the data and actually perform some type of inference on the data. Hypothesis testing is a statistical technique that is used to help make inferences about a population parameter. Hypothesis testing allows you to test whether a claim about a parameter is accurate or not.

Your boss makes the claim that the average body fat in men attending Silver's Gym is 20%. You believe that the average body fat for men attending Silver's Gym is not 20%. For claims such as this, you can set up a hypothesis test to reach one of two possible conclusions: either there is not enough evidence to reject the claim that the body fat average is 20%, or there is enough evidence to say that the claim is inaccurate.

To assist in your analysis for Silver's Gym, consider the following steps based on your boss's claim that the mean body fat in men attending Silver's Gym is 20%:

First, construct the null and alternative hypotheses based on the claim by your boss.

Using an alpha level of 0.05, perform a hypothesis test, and report your findings. Be sure to discuss which test you will be using and the reason for selection. Recall you found the body fat mean and standard deviation in Part I of the task.
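One way the test could be carried out is sketched below as a two-tailed one-sample t test, using the body fat mean and standard deviation reported in the solution preview together with n = 252. The critical value of about 1.97 is an assumption read from a t table for 251 degrees of freedom, not something computed here, and this sketch is not necessarily the calculation used in the attached report.

```python
import math

# Summary statistics for body fat as reported in the solution preview.
n = 252
sample_mean = 18.93849
sample_sd = 7.750856
mu_0 = 20.0  # boss's claim; H0: mu = 20, H1: mu != 20

standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - mu_0) / standard_error
df = n - 1

# Assumption: two-tailed critical value for alpha = 0.05 and df = 251,
# taken as roughly 1.97 from a t table rather than computed here.
t_critical = 1.97

reject_null = abs(t_stat) > t_critical
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_null}")
```

With these inputs the statistic comes out near -2.17, which lies beyond the assumed critical value, so under these assumptions the claim of a 20% mean would be rejected at the 0.05 level.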

Based on your results, interpret the final decision to report to your boss.

Parts I-II: Review and revise your individual project from last week. You must include parts I and II from Individual Project #4 as they will be graded again. Then, add the following responses to your document:

Part III: Regression and Correlation

Based on what you have learned from your research on regression analysis and correlation, answer the following questions about the Body Fat Versus Weight data set:

When performing a regression analysis, it is important to first identify your independent/predictor variable versus your dependent/response variable, or simply put, your x versus y variables. How do you decide which variable is your predictor variable and which is your response variable? Based on the Body Fat Versus Weight data set, which variable is the predictor variable? Which variable is the response variable? Explain.

Using Excel, construct a scatter plot of your data. Using the graph and intuition, determine whether there is a positive correlation, a negative correlation, or no correlation. How did you come to this conclusion?

Calculate the correlation coefficient, r, and verify your conclusion with your scatter plot. What does the correlation coefficient determine? Add a regression line to your scatter plot, and obtain the regression equation.
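For readers working outside Excel, the correlation coefficient can be sketched directly from its definition. The weight and body fat pairs below are invented for illustration and are not from the gym data set, and `pearson_r` is a hypothetical helper name.

```python
import math

# Hypothetical (weight, body fat %) pairs, invented for illustration only.
weights = [150.0, 165.0, 180.0, 195.0, 210.0]
body_fat = [12.0, 15.0, 19.0, 22.0, 27.0]

def pearson_r(xs, ys):
    """Pearson correlation: co-variation divided by the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(weights, body_fat)
print(f"r = {r:.3f}")
```

A value of r close to +1 indicates a strong positive linear relationship, close to -1 a strong negative one, and close to 0 little linear relationship.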

Does the line appear to be a good fit for the data? Why or why not? Regression equations help you make predictions. Using your regression equation, discuss what the slope means, and determine the predicted value of body fat (y) when weight (x) equals 0. Interpret the meaning of this equation.
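The slope and intercept of a least-squares line can be sketched as follows. The data are again invented for illustration (not the gym data set), and `fit_line` is a hypothetical helper name.

```python
# Hypothetical (weight, body fat %) pairs, invented for illustration only.
weights = [150.0, 165.0, 180.0, 195.0, 210.0]
body_fat = [12.0, 15.0, 19.0, 22.0, 27.0]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(weights, body_fat)

# Slope: predicted change in body fat (%) per one-unit increase in weight.
# Intercept: predicted body fat when weight equals 0.
predicted_at_zero = intercept + slope * 0.0
print(f"y = {intercept:.2f} + {slope:.4f} x; prediction at x = 0: {predicted_at_zero:.2f}")
```

In this made-up sample the prediction at a weight of 0 is negative, which illustrates why an intercept far outside the observed range of weights usually has no practical meaning.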

Part IV: Putting it Together

Your analysis is now complete, and you are ready to report your findings to your boss. In one paragraph, summarize your results by explaining your findings from the statistical measures, hypothesis test, and regression analysis of body fat and weight for the 252 men attending Silver's Gym.

Please include references.

https://brainmass.com/statistics/students-t-test/summary-statistics-t-test-regression-correlation-510718

#### Solution Preview

Please see attached document for the full report.

Part I: Statistical Measures

| Measure | Body Fat | Body Weight |
|---|---|---|
| Mean | 18.93849 | 178.9244 |
| Median | 19 | 176.5 |
| Standard Deviation | 7.750856 | 29.38916 |
| Range | 45.1 | 244.65 |

The mean (or average) is the arithmetic middle of the data. It is the 'socialist' value in the sense that those who have "too much" subsidise those who have "too little", so that in the end everybody has the same value: the average. In a world with no dispersion, where everybody was the same, everybody would sit exactly at the mean.

The median, on the other hand, is the middle value by position: we line people up in order of their body fat measurement and choose the middle person. For this reason, the median is a 'fairer' statistic in the sense that it is not affected by extreme values, since it depends only on the value in the middle of the ordering.
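The contrast between the two measures can be seen in a small sketch. The weights below are invented for illustration and are not from the gym data set.

```python
import statistics

# Hypothetical weights, invented for illustration only.
weights = [160.0, 170.0, 175.0, 180.0, 190.0]
with_outlier = weights + [360.0]  # add one extreme value

mean_before = statistics.mean(weights)
median_before = statistics.median(weights)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

# The mean shifts much more than the median when the extreme value is added.
print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before:.2f} -> {median_after:.2f}")
```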

For this data, the median might be more appropriate, since we do not want to state a 'socialist' value but rather a middle value not affected by extreme values on either side. The range shows the full spread of the data from the minimum to the maximum value, whereas the standard deviation shows a typical spread of the data around the mean, i.e. it takes the differences of all data from the mean and calculates a form of the average of these ...

#### Solution Summary

This solution contains a detailed step-by-step explanation of the calculation of summary statistics, hypothesis testing using t test, correlation and regression. A thorough demonstration of all calculations using a numerical example is included.