
Exploratory Data Analysis for Assessing Parental Relationships

Exercise 2 - Exploratory Data Analysis
Constructing Scales to Assess Parents' Relationships With Their Parents

Introduction: This exercise is designed to illustrate both exploratory data analysis (EDA) and the construction of scales from a set of survey items. The scales are designed to examine "Mother's Feelings About Her Own Father" (items MOF1 to MOF14) and "Father's Feelings About His Own Father" (items FAF1 to FAF14). The SPSS variable labels provide the exact wording of the scale items. The SPSS value labels indicate that valid values range from 1 to 5; a code of 6 means "don't know." I will illustrate the process with the MOF scale, and you will duplicate that process with the FAF scale.

Exploratory data analysis can be extremely complex; however, "pre-analysis data screening" has four main purposes: assessing the accuracy of the data, dealing with missing data, assessing the effects of outliers, and assessing how well the characteristics of the data fit the assumptions of a statistical procedure. Along the way, we will also construct a scale and assess its reliability.

Step 1 - Examine the items for accuracy.

A. I have used the DESCRIPTIVES procedure to examine the distribution of all of the items in the MOF scale; the output is shown below. (There are 14 items in this scale, MOF1 to MOF14. Ignore the MOF items with other labels for now.) What are the "valid" responses, and what codes have been assigned as missing values? Examine the Minimum and Maximum columns of the printout. Are there any responses that appear "out of range"? If so, you would need to "fix" them, either by locating the original questionnaire and entering the correct code or by assigning that response as missing for the case in question.

You can find the case in question by looking through the data file or by searching for the ID number of the case that has that value for that item; the Select Cases procedure in the Data menu can be used for that purpose. For example, what is the case ID number for the case with the maximum value of 55 on MOF06? This is clearly an out-of-range value. Because you do not have the original questionnaires, your only option for fixing out-of-range values in this data file is to assign that item as missing for the case containing the error.
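The screening logic in Step 1 can be sketched in plain Python. This is a minimal illustration, not SPSS: the records, case IDs, and function names are hypothetical, and it assumes that 1 to 5 are valid responses while 6 ("don't know") and 9 (the other missing code that appears in the MOF12 frequency table later in this exercise) are legitimate missing codes. Any other value is flagged, and flagged values are set to system-missing (represented here by None), mirroring the only fix available without the original questionnaires.

```python
# Hypothetical screening sketch: anything outside VALID and MISSING_CODES is treated as a data-entry error.
VALID = {1, 2, 3, 4, 5}   # substantive responses
MISSING_CODES = {6, 9}    # 6 = "don't know"; 9 = other missing (assumed from the MOF12 table)


def find_out_of_range(records, items):
    """Return (case_id, item, value) for every value that is neither valid nor a missing code."""
    problems = []
    for case_id, responses in records:
        for item in items:
            value = responses.get(item)
            if value is None or value in VALID or value in MISSING_CODES:
                continue  # system-missing, valid, or a declared missing code
            problems.append((case_id, item, value))
    return problems


def set_to_missing(records, problems):
    """Return a copy of the records with each flagged value replaced by None (system-missing)."""
    flagged = {(case_id, item) for case_id, item, _ in problems}
    return [
        (case_id, {item: None if (case_id, item) in flagged else value
                   for item, value in responses.items()})
        for case_id, responses in records
    ]


# Hypothetical records: (case ID, {item: response})
records = [
    (101, {"mof06": 4, "mof14": 3}),
    (102, {"mof06": 55, "mof14": 6}),
    (103, {"mof06": 5, "mof14": 99}),
]
problems = find_out_of_range(records, ["mof06", "mof14"])
print(problems)  # [(102, 'mof06', 55), (103, 'mof14', 99)]
cleaned = set_to_missing(records, problems)
```

Note that the "don't know" code 6 for case 102 is left alone: it is a legitimate missing code, not an error.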

Descriptive Statistics

N Minimum Maximum Mean Std. Deviation
My mom talked with me about her relationship with or feelings about her father. 650 1.00 5.00 2.9523 1.23478
My mother loved her father very much. 579 1.00 5.00 4.3506 1.05868
My mother felt warm and safe when she was with her father. 492 1.00 5.00 4.0447 1.25312
My mother felt as though she did not know her father. 534 1.00 5.00 1.9569 1.27761
My mother's father had a negative influence on her life. 541 1.00 5.00 1.9482 1.31695
My mother and her father enjoyed being together. 511 1.00 55.00 4.0645 2.58231
My mother was disappointed with her father. 528 1.00 5.00 1.9621 1.30772
My mother felt close to her father. 527 1.00 5.00 3.9620 1.31042
My mother felt tense and "on guard" when her father was around. 480 1.00 5.00 1.8500 1.22704
My mother looked up to her father. 538 1.00 5.00 3.9944 1.29817
My mother hated her father. 556 1.00 5.00 1.3759 .93427
My mother was afraid of her father. 538 1.00 5.00 1.6097 1.07626
My mother missed her father when he was away. 485 1.00 5.00 3.9423 1.25670
My mother's relationship with her father had a big effect on my life. 511 1.00 99.00 3.2460 4.49504
Valid N (listwise) 319

In the output above, both the 55 (MOF06) and the 99 (MOF14) are invalid. I changed the 55 to a 5 and the 99 to a 9 (the code used for missing responses) and reran the analysis. Note also that the number of complete cases across this set of items (Valid N (listwise) on the printout) is only 319, even though one item has as many as 650 valid responses. This illustrates the problem created by scales that do not receive complete responses from every case: in this example, more than half of the 685-case sample is lost.
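The gap between each item's N and Valid N (listwise) is easy to reproduce. The sketch below is plain Python with a handful of hypothetical cases; it assumes, as in this data set, that 6 and 9 are missing codes, with None standing for system-missing. An item's valid N counts the cases that answered that item, whereas listwise N counts only the cases that answered every item.

```python
MISSING_CODES = {6, 9}  # assumed missing-value codes; None = system-missing


def is_valid(value):
    """True when a response is neither system-missing nor a declared missing code."""
    return value is not None and value not in MISSING_CODES


def valid_n_per_item(rows, items):
    """Valid N for each item, as in the N column of DESCRIPTIVES."""
    return {item: sum(is_valid(row[item]) for row in rows) for item in items}


def listwise_valid_n(rows, items):
    """Number of cases with a valid response on every item."""
    return sum(all(is_valid(row[item]) for item in items) for row in rows)


# Hypothetical cases
rows = [
    {"mof01": 3, "mof02": 5, "mof03": 4},
    {"mof01": 2, "mof02": 6, "mof03": 5},
    {"mof01": 4, "mof02": 4, "mof03": None},
    {"mof01": 1, "mof02": 5, "mof03": 9},
]
items = ["mof01", "mof02", "mof03"]
print(valid_n_per_item(rows, items))  # {'mof01': 4, 'mof02': 3, 'mof03': 2}
print(listwise_valid_n(rows, items))  # 1
```

Applied to the figures in the printout above, 319 of the 685 cases (about 46.6%) survive listwise deletion.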

Step 2 - Reverse score any items that are negatively worded.

In this scale (as in the father scale), a higher score represents more positive feelings: here, more positive feelings about the respondent's maternal grandfather (i.e., the respondent's mother's father). Therefore, any item that expresses a negative sentiment must be reverse scored. In SPSS this is done with the RECODE procedure. (Reverse scoring is also sometimes called "reflecting" an item.) For example, the item "My mother was afraid of her father." (MOF12) must be recoded as follows:

mof12 My mother was afraid of her father.

Frequency Percent Valid Percent Cumulative Percent
Valid 1.00 never 368 53.7 68.4 68.4
2.00 seldom 78 11.4 14.5 82.9
3.00 occasionally 49 7.2 9.1 92.0
4.00 frequently 20 2.9 3.7 95.7
5.00 almost always 23 3.4 4.3 100.0
Total 538 78.5 100.0
Missing 6.00 don't know 142 20.7
9.00 5 .7
Total 147 21.5
Total 685 100.0

mof12r Reverse Scored mof12

Frequency Percent Valid Percent Cumulative Percent
Valid 1.00 23 3.4 4.3 4.3
2.00 20 2.9 3.7 8.0
3.00 49 7.2 9.1 17.1
4.00 78 11.4 14.5 31.6
5.00 368 53.7 68.4 100.0
Total 538 78.5 100.0
Missing System 147 21.5
Total 685 100.0

MOF12R Reverse Scored MOF12

Frequency Percent Valid Percent Cumulative Percent
Valid 1.00 23 3.4 3.4 3.4
2.00 20 2.9 2.9 6.3
3.00 49 7.2 7.2 13.4
4.00 78 11.4 11.4 24.8
5.00 368 53.7 53.7 78.5
6.00 142 20.7 20.7 99.3
9.00 5 .7 .7 100.0
Total 685 100.0 100.0

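Note that there are two versions of the recoded distribution above and that neither has value labels. What is the difference between the two versions? In the first (mof12r), codes 6 and 9 were recoded to system-missing, so "don't know" responses can no longer be distinguished from other missing responses. In the second (MOF12R), 6 and 9 were carried over unchanged but were not declared as missing values, so SPSS treats them as valid responses (Valid Percent equals Percent, and the valid total is 685). Note also that value labels do not automatically follow a recode into a new variable; you must add them so that your printout looks like the one below. Examine the recoded variable carefully to track the treatment of missing values. In this case we want to keep "missing" separate from "don't know," so each should have its own missing-value code (6 and 9 in the version below).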

MOF12R Reverse Scored MOF12

Frequency Percent Valid Percent Cumulative Percent
Valid 1.00 almost always 23 3.4 4.3 4.3
2.00 frequently 20 2.9 3.7 8.0
3.00 occasionally 49 7.2 9.1 17.1
4.00 seldom 78 11.4 14.5 31.6
5.00 never 368 53.7 68.4 100.0
Total 538 78.5 100.0
Missing 6.00 don't know 142 20.7
9.00 5 .7
Total 147 21.5
Total 685 100.0
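The recode that produces this final version can be sketched in Python. For a 1-to-5 item the rule is reversed = (1 + 5) - original, so 1 and 5 swap, 2 and 4 swap, and 3 stays 3; the missing codes 6 and 9 pass through unchanged so they can still be declared as separate missing values. This is an illustration, not SPSS syntax: the function name is hypothetical, and the input is rebuilt from the MOF12 frequency counts shown above. (In SPSS itself, RECODE ... INTO with ELSE=COPY, followed by MISSING VALUES and VALUE LABELS for the new variable, accomplishes the same thing.)

```python
from collections import Counter

SCALE_MIN, SCALE_MAX = 1, 5
MISSING_CODES = {6, 9}  # 6 = "don't know", 9 = other missing; both kept as-is


def reverse_score(value):
    """Reflect a 1-5 response; pass system-missing (None) and missing codes through unchanged."""
    if value is None or value in MISSING_CODES:
        return value
    if not SCALE_MIN <= value <= SCALE_MAX:
        raise ValueError(f"out-of-range value {value!r}; clean it before recoding")
    return (SCALE_MIN + SCALE_MAX) - value


# MOF12 rebuilt from the frequency table: code -> count
mof12_counts = {1: 368, 2: 78, 3: 49, 4: 20, 5: 23, 6: 142, 9: 5}
mof12 = [code for code, n in mof12_counts.items() for _ in range(n)]
mof12r = [reverse_score(v) for v in mof12]

counts = Counter(mof12r)
valid_n = sum(n for code, n in counts.items() if code not in MISSING_CODES)
print(sorted(counts.items()))  # [(1, 23), (2, 20), (3, 49), (4, 78), (5, 368), (6, 142), (9, 5)]
print(valid_n, round(100 * counts[5] / valid_n, 1))  # 538 68.4
```

The counts and the 68.4% valid percent for code 5 match the labeled MOF12R table, which is a quick check that the recode and the missing-value handling are both right.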

Step 3 - Evaluate each item's contribution to the overall scale using the SPSS RELIABILITY procedure. Its Item-Total Statistics table reports the following for each item:



• SCALE MEAN IF ITEM DELETED: The mean of the scale without that item.

• SCALE VARIANCE IF ITEM DELETED: The variance of the scale without that item.

• CORRECTED ITEM-TO-TOTAL CORRELATION: The correlation of that item with the sum of the other items. ("Corrected" means that the item itself is removed before the sum is calculated.) The higher the better.

• SQUARED MULTIPLE CORRELATION: The squared multiple correlation (R²) obtained when that item (as the dependent variable) is predicted from all of the other items (as independent variables). It tells us how well the other items can predict the value of that item. The higher the better.

• CRONBACH'S ALPHA IF ITEM DELETED: The value Cronbach's alpha would take if that item were removed from the scale. If alpha would increase, the item does not contribute to the reliability of the scale; if alpha would decrease, the item does contribute to it.
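To make these columns concrete, here is a minimal pure-Python sketch of the reliability calculations, assuming complete (listwise) data. Cronbach's alpha is computed as alpha = k / (k - 1) × (1 - sum of item variances / variance of the total score), where k is the number of items. The data are hypothetical: four items on six cases, with the fourth item deliberately unrelated to the others. The squared multiple correlation is omitted because it requires regressing each item on the rest. Treat this as an illustration of the definitions, not a replacement for the RELIABILITY output.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def totals(columns):
    """Per-case sum across item columns."""
    return [sum(case) for case in zip(*columns)]


def cronbach_alpha(columns):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total), sample variances."""
    k = len(columns)
    item_var = sum(variance(col) for col in columns)
    return k / (k - 1) * (1 - item_var / variance(totals(columns)))


def item_total_statistics(columns):
    """One dict per item with the 'if item deleted' statistics described above."""
    stats = []
    for i, item in enumerate(columns):
        rest = [col for j, col in enumerate(columns) if j != i]
        rest_total = totals(rest)
        stats.append({
            "scale_mean_if_deleted": mean(rest_total),
            "scale_variance_if_deleted": variance(rest_total),
            "corrected_item_total_r": pearson(item, rest_total),
            "alpha_if_deleted": cronbach_alpha(rest) if len(rest) > 1 else float("nan"),
        })
    return stats


# Hypothetical item columns (one list per item, one entry per case); item 4 is unrelated noise.
items = [
    [5, 4, 5, 2, 1, 3],
    [5, 5, 4, 2, 1, 3],
    [4, 5, 5, 1, 2, 3],
    [3, 1, 2, 3, 5, 2],
]
print(round(cronbach_alpha(items), 3))
for number, row in enumerate(item_total_statistics(items), start=1):
    print(number, round(row["corrected_item_total_r"], 3), round(row["alpha_if_deleted"], 3))
```

In this toy example item 4 has a negative corrected item-total correlation and alpha rises sharply when it is deleted: an exaggerated version of the pattern that would argue for dropping an item.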

Note that items #1 and #14 appear to have much lower corrected item-to-total correlations and squared multiple correlations than the other items. On examining their content, the lower values make sense: these items do not necessarily reflect positive or negative sentiments about the father, only that the father had an effect. Thus, we might consider eliminating these items, and this has been done below. Because of the large number of missing values across the items, only 319 cases had valid responses to all 14 items, so the issue of imputation may need to be considered at a later time.

The printout below was created with the SPSS COUNT procedure. I counted the number of user-missing and system-missing values in the set of 12 remaining items (i.e., after eliminating items #1 and #14). The printout indicates, for example, that 341 cases have no missing values and 82 have one item missing. (This is an example, not something I want you to do.)

mofmiss2 Missing Values in MOF Scale wo 1 14

Frequency Percent Valid Percent Cumulative Percent
Valid .00 341 49.8 49.8 49.8
1.00 82 12.0 12.0 61.8
2.00 32 4.7 4.7 66.4
3.00 33 4.8 4.8 71.2
4.00 21 3.1 3.1 74.3
5.00 25 3.6 3.6 78.0
6.00 21 3.1 3.1 81.0
7.00 14 2.0 2.0 83.1
8.00 13 1.9 1.9 85.0
9.00 19 2.8 2.8 87.7
10.00 16 2.3 2.3 90.1
11.00 18 2.6 2.6 92.7
12.00 50 7.3 7.3 100.0
Total 685 100.0 100.0
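The tally above can be sketched as follows: for each case, count how many of the 12 retained items are system-missing or carry a user-missing code, then tabulate those counts with percent and cumulative percent. This is plain Python rather than SPSS COUNT; the item names (mof02 to mof13, standing in for the 12 items left after dropping #1 and #14) and the four cases are hypothetical, and 6 and 9 are assumed to be the user-missing codes.

```python
from collections import Counter

MISSING_CODES = {6, 9}  # assumed user-missing codes; None = system-missing
ITEMS = [f"mof{i:02d}" for i in range(2, 14)]  # hypothetical names for the 12 retained items


def count_missing(row, items=ITEMS):
    """Number of items that are system-missing or carry a user-missing code for one case."""
    return sum(1 for item in items if row.get(item) is None or row[item] in MISSING_CODES)


def missing_table(rows, items=ITEMS):
    """(number missing, frequency, percent, cumulative percent) rows, like a FREQUENCIES table."""
    counts = Counter(count_missing(row, items) for row in rows)
    table, cumulative = [], 0
    for k in sorted(counts):
        cumulative += counts[k]
        table.append((k, counts[k],
                      round(100 * counts[k] / len(rows), 1),
                      round(100 * cumulative / len(rows), 1)))
    return table


complete = {item: 4 for item in ITEMS}
rows = [
    dict(complete),                           # no missing items
    {**complete, "mof05": 6},                 # one "don't know"
    {**complete, "mof03": None, "mof11": 9},  # one system-missing, one code 9
    {item: 6 for item in ITEMS},              # every item "don't know"
]
for line in missing_table(rows):
    print(line)
```

With the real data, the frequencies in the table above sum to 685, and 341 / 685 gives the 49.8% shown for zero missing items.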

Step 4 - Compute the MOF scale score using the SPSS COMPUTE procedure and run its frequency distribution with the statistics shown below.
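Computing the scale score amounts to summing the 12 retained items, using the reverse-scored versions of the negatively worded items. The stem-and-leaf plot in Step 5 is labeled "with imputed MV," but the imputation rule is not described here, so the sketch below adopts one common option as an assumption: person-mean imputation (each missing item is replaced by the case's mean on its answered items) when no more than max_missing items are missing, and a missing score otherwise. The function name, item names, and threshold are hypothetical. In SPSS syntax, a comparable result can be obtained by multiplying the number of items by the MEAN.n function, which returns a value only when at least n arguments are valid.

```python
MISSING_CODES = {6, 9}  # assumed missing codes; None = system-missing


def scale_score(row, items, max_missing=0):
    """Sum of the items, with person-mean imputation for up to max_missing missing items.

    Filling each missing item with the mean of the answered items is the same as
    prorating: sum(answered) * len(items) / len(answered). Returns None when too
    many items are missing.
    """
    answered = [row[item] for item in items
                if row.get(item) is not None and row[item] not in MISSING_CODES]
    if not answered or len(items) - len(answered) > max_missing:
        return None
    return sum(answered) * len(items) / len(answered)


items = ["mof02", "mof03", "mof12r"]            # hypothetical; mof12r is the reverse-scored MOF12
complete = {"mof02": 5, "mof03": 4, "mof12r": 5}
partial = {"mof02": 5, "mof03": 4, "mof12r": 6}  # "don't know" on one item

print(scale_score(complete, items))                # 14.0
print(scale_score(partial, items))                 # None (no imputation allowed)
print(scale_score(partial, items, max_missing=1))  # 13.5
```

With max_missing=0 the function behaves like a plain sum that is missing whenever any item is missing, which is the listwise situation described in Step 1.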

Step 5 - Examine the new scale for outliers and normality using the EXPLORE procedure.


MOF Scale with imputed MV Stem-and-Leaf Plot

Frequency Stem & Leaf

15.00 Extremes (=<21)
10.00 2 . 4&
15.00 2 . 67&
28.00 3 . 1233344
36.00 3 . 56678899
24.00 4 . 002344&
33.00 4 . 55667789
80.00 5 . 0011122223334444444
148.00 5 . 555555666666677777888888888999999999
99.00 6 . 0000000000000000000000000

Stem width: 10.00
Each leaf: 4 case(s)
& denotes fractional leaves.
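A stem-and-leaf display like the one above can be sketched as follows: stem width 10 (the stem is the tens digit), each stem split into a 0-4 line and a 5-9 line, each printed leaf standing for per_leaf cases, and "&" marking leftover cases that do not fill a whole leaf. This is a simplified illustration under those assumptions, not the exact EXPLORE algorithm: the low-extreme cutoff is passed in by hand (SPSS determines it from the data), empty stem lines are not printed, and the sample scores are hypothetical.

```python
from collections import Counter


def stem_and_leaf(values, per_leaf=4, low_extreme=None):
    """Return text lines for a split-stem display with stem width 10."""
    lines = []
    if low_extreme is not None:
        extremes = [v for v in values if v <= low_extreme]
        values = [v for v in values if v > low_extreme]
        if extremes:
            lines.append(f"{len(extremes):.2f} Extremes (=<{low_extreme})")
    groups = {}
    for v in values:
        stem, leaf = divmod(int(v), 10)
        groups.setdefault((stem, leaf >= 5), []).append(leaf)  # False = leaves 0-4, True = 5-9
    for (stem, _), leaves in sorted(groups.items()):
        counts = Counter(leaves)
        # one printed digit per full group of per_leaf cases
        digits = "".join(str(d) * (counts[d] // per_leaf) for d in sorted(counts))
        if any(n % per_leaf for n in counts.values()):
            digits += "&"  # some cases do not fill a whole leaf
        lines.append(f"{len(leaves):.2f} {stem} . {digits}")
    return lines


scores = [15, 18, 24, 24, 24, 24, 26, 27, 27, 27, 27, 60, 60, 60, 60]  # hypothetical scale scores
for line in stem_and_leaf(scores, per_leaf=4, low_extreme=21):
    print(line)
```

Reading the MOF plot the same way, the last line "6 . 000..." represents cases at the maximum possible score of 60, which is why the distribution looks piled up against its upper limit.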

The only reasonable conclusion after examining this scale is that most of the sample reports extremely positive attitudes about their mother's relationship with her father: the scores pile up near the maximum possible score of 60, with a long tail of low scores (including 15 extreme values of 21 or less). Transforming this strongly negatively skewed variable to make it "normal" would probably be unreasonable and unsuccessful. This suggests using statistics that do not require a normal distribution or that are "robust" to violations of that assumption.
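Two quick numerical checks support the same conclusion: a skewness statistic (negative values indicate a long tail of low scores) and Tukey-style fences at 1.5 interquartile ranges beyond the quartiles, a common rule for flagging outliers. The sketch below uses the adjusted Fisher-Pearson skewness coefficient and Python's statistics.quantiles for the quartiles; quartile definitions vary between programs, so the fences may differ slightly from the hinges EXPLORE uses. The scores are hypothetical.

```python
from statistics import mean, quantiles


def adjusted_skewness(xs):
    """Adjusted Fisher-Pearson skewness: G1 = sqrt(n(n-1)) / (n-2) * m3 / m2**1.5."""
    n = len(xs)
    m = mean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in xs) / n  # third central moment
    return (n * (n - 1)) ** 0.5 / (n - 2) * m3 / m2 ** 1.5


def tukey_fences(xs, k=1.5):
    """Lower and upper outlier fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(xs, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical scores, piled up near the maximum of 60
scores = [20, 35, 48, 52, 55, 56, 57, 58, 58, 59, 60, 60, 60]
low, high = tukey_fences(scores)
print(round(adjusted_skewness(scores), 2))  # negative: long tail of low scores
print(low, high)                             # 35.75 73.75
print([x for x in scores if x < low])        # [20, 35]
```

Because the upper fence lies above the highest possible score, only low scores can be flagged, which matches the "Extremes (=<21)" line in the MOF plot.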

YOUR ASSIGNMENT: Complete all of the analyses that I modeled above, but use FAF items #1 to #14 instead of the MOF items. After computing the FAF scale, run the SPSS Explore procedure to reproduce the output shown above for the MOF scale. Produce a write-up that describes your procedures, your rationale for excluding any of the 14 items from your scale, and the results of your analysis of the distribution and characteristics of the scale. Write this as a clear narrative, with an introduction that describes what you are doing and why. Use topic sentences that describe each section of the analysis, what you did, and your conclusions based on the statistical analyses.

You DO NOT need to use APA format. Cut and paste the SPSS printout into a Word document. Always put your last name and the assignment number in the file name.

The screens from SPSS shown below are designed to assist you with selecting the correct options and statistics to use with your SPSS RECODE and RELIABILITY procedures.


Solution Summary

The solution constructs scales to assess parents' relationships with their own parents and presents an exploratory data analysis.