
Hypothesis Testing: The Effectiveness of Ads

It's almost decision time, and the stakes are huge. With astronomical TV advertising costs per minute of airtime, it has been worthwhile to do some preliminary work so that nothing is wasted. In particular, you've been helping manage an effort to produce 22 ads for a personal hygiene product, even though only a few will ever be shown to the general public. All of them have been tested and ranked using the responses of representative consumers, each of whom was randomly selected and assigned to view one ad, answering questions before and after. A composite score from 0 to 10 points, representing both recall and persuasion, has been produced for each consumer in the sample.

At your firm, the ads traditionally have been ranked by their average composite scores, and the highest-ranked ads have run on nationwide TV. Recently, however, statistical hypothesis testing has been used to make sure that the ad or ads to be run score significantly better than a minimum of 3.5 points.

Everything looks straightforward this time, with the two best ads scoring significantly above the minimum. The decision meeting should be routine, with Country Picnic the favorite for the most airtime and Coffee Break as an alternative. The summaries follow, sorted in descending order by average composite score. The number of consumers who viewed each ad is n. The p-values are from one-sided hypothesis tests against the reference value 3.5, computed separately for each ad.

Ad                    n   Avg  StDev  StdErr       t       p
Country Picnic       49  3.95  0.789   0.113   3.895  0.0001
Coffee Break         51  3.70  0.744   0.104   1.921  0.0302
Anniversary          51  3.66  0.934   0.131   1.214  0.1153
Ocean Breeze         49  3.63  0.729   0.104   1.255  0.1078
Friends at Play      56  3.62  0.896   0.120   0.969  0.1683
Tennis Match         56  3.60  0.734   0.098   1.037  0.1521
Walking Together     51  3.57  0.774   0.108   0.687  0.2476
Swimming Pool        52  3.56  0.833   0.116   0.532  0.2984
Shopping             49  3.54  0.884   0.126   0.355  0.3619
Jogging              47  3.54  0.690   0.101   0.423  0.3372
Family Scene         54  3.54  0.740   0.101   0.404  0.3438
Mountain Retreat     49  3.53  0.815   0.116   0.298  0.3836
Cool & Comfortable   52  3.52  0.780   0.108   0.195  0.4229
Coffee Together      53  3.52  0.836   0.115   0.148  0.4415
City Landscape       47  3.51  0.756   0.110   0.058  0.4770
Friends at Work      53  3.50  0.674   0.093   0.020  0.4919
Sailing              48  3.49  0.783   0.113  -0.055  0.5219
Desert Oasis         55  3.48  0.716   0.097  -0.226  0.5890
Birthday Party       50  3.48  0.886   0.125  -0.175  0.5693
Weekend Brunch       53  3.45  0.817   0.112  -0.437  0.6681
Home from Work       55  3.35  0.792   0.107  -1.430  0.9207
Windy                47  3.34  0.678   0.099  -1.593  0.9410
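
To see where the StdErr, t, and p columns come from, here is a minimal Python sketch that recomputes them for one row from n, the average, and the standard deviation. It assumes the usual one-sample t test with n - 1 degrees of freedom, which the case does not state explicitly. The helper names (t_pdf, t_sf, one_sided_test) are illustrative, and the tail probability is found by simple numerical integration so that only the standard library is needed. Because the tabulated averages are rounded, the recomputed values should land close to, but not exactly on, the table's figures.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t), via Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area

def one_sided_test(n, avg, st_dev, mu0=3.5):
    """Return (standard error, t statistic, one-sided p-value) for H1: mean > mu0."""
    std_err = st_dev / math.sqrt(n)
    t_stat = (avg - mu0) / std_err
    p_value = t_sf(t_stat, n - 1)   # assumption: n - 1 degrees of freedom
    return std_err, t_stat, p_value

# Coffee Break row from the table: n = 51, average 3.70, standard deviation 0.744
std_err, t_stat, p_value = one_sided_test(51, 3.70, 0.744)
print(f"stdErr={std_err:.3f}  t={t_stat:.3f}  p={p_value:.4f}")
```

Each p-value here answers a question about one ad in isolation; nothing in the calculation accounts for the fact that 22 such tests were run side by side.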

Thinking it over, you have some second thoughts. You want to really understand what the decision is based on, and you remember material about errors in hypothesis testing from a course taken long ago. The probability of a type I error in each test is 0.05, so you expect about one ad in 20 to test as significantly good even if it isn't. With 22 ads, that means sometimes none would be significant, yet other times more than one could reasonably be significant.
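
The "one ad in 20" reasoning can be made concrete with a short calculation. The sketch below assumes that every ad's true mean is exactly 3.5 and that the 22 tests are independent (plausible here, since each consumer viewed only one ad), so the number of falsely significant ads follows a binomial distribution. The function name prob_k_significant is illustrative.

```python
from math import comb

ALPHA = 0.05   # type I error probability per test (from the case)
N_ADS = 22     # number of ads tested (from the case)

def prob_k_significant(k, n_ads=N_ADS, alpha=ALPHA):
    """Binomial probability that exactly k of n_ads tests reject
    when every null hypothesis is true and the tests are independent."""
    return comb(n_ads, k) * alpha**k * (1 - alpha)**(n_ads - k)

expected_false_positives = N_ADS * ALPHA
prob_at_least_one = 1 - prob_k_significant(0)

print(f"Expected false positives: {expected_false_positives:.2f}")
print(f"P(at least one significant ad): {prob_at_least_one:.3f}")
for k in range(4):
    print(f"P(exactly {k} significant): {prob_k_significant(k):.3f}")
```

Under these assumptions, about 1.1 ads are expected to look significant by chance alone, and the chance that at least one does is roughly two in three.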

Your speculation continues: Could it be that decisions are being made on the basis of pure randomness? Could it be that consumers, on average, rate these ads equally good? Could it be that all you have here is the randomness of the particular consumers who were chosen for each ad?

You decide to run a computer simulation model, setting the population mean score for all ads to exactly 3.5. Hitting the recalculation button on the spreadsheet 10 times, you observe that three times no ads are significant, five times one ad is significant, once two ads are significant, and once three ads are significant. Usually the significant ads are different each time. Even more troubling, the random simulated results look a lot like the real ones that are about to be used to make real decisions.
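
The case does not show the spreadsheet, so the following Python sketch is only one way such a simulation might be set up. It assumes normally distributed scores, 50 viewers per ad, and a population standard deviation of 0.8 (both chosen to resemble the table), and it uses a hard-coded approximate critical value for a one-sided 5% t test with 49 degrees of freedom. All names and constants other than the mean of 3.5 and the 22 ads are assumptions.

```python
import math
import random

MU0 = 3.5        # reference value and true mean for every simulated ad (from the case)
N_ADS = 22       # number of ads (from the case)
N_VIEWERS = 50   # assumption: close to the sample sizes in the table
SIGMA = 0.8      # assumption: close to the standard deviations in the table
T_CRIT = 1.677   # approximate one-sided 5% critical value of t with 49 df

def count_significant(rng):
    """One 'recalculation': simulate 22 equally good ads and
    return how many test as significantly above MU0."""
    hits = 0
    for _ in range(N_ADS):
        scores = [rng.gauss(MU0, SIGMA) for _ in range(N_VIEWERS)]
        mean = sum(scores) / N_VIEWERS
        var = sum((x - mean) ** 2 for x in scores) / (N_VIEWERS - 1)
        t_stat = (mean - MU0) / math.sqrt(var / N_VIEWERS)
        if t_stat > T_CRIT:
            hits += 1
    return hits

def run_simulation(n_runs, seed=None):
    """Repeat the recalculation n_runs times; return the count from each run."""
    rng = random.Random(seed)
    return [count_significant(rng) for _ in range(n_runs)]

# Ten recalculations, as in the case; each entry is the number of
# "significant" ads in one run even though no ad is truly better than 3.5.
print(run_simulation(10, seed=1))
```

Changing the seed plays the role of hitting the recalculation button. Over many runs the average count should settle near 22 × 0.05 = 1.1, which is the pattern the ten spreadsheet runs hint at.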

What is your interpretation of the effectiveness of the ads in this study? What would you recommend in this situation?

Solution Summary

This is a discussion on whether a marketing system is working.
