Project 1 Description


Project 1: M&M’s Exercise

Objective: The purpose of this lab is to learn about descriptive statistics and to become acquainted with a spreadsheet program.


The exercise has two parts: the first takes place in class, and the second is to be done as a homework exercise.

Part I:

  1. During class you will be given a medium-sized brown M&M’s package. The contents of the package are to be used in an experiment and are not to be eaten until you are told you may do so.
  2. Count the number of candies of each color in the package and record them on your data collection sheet below. Also count any defective candies, such as broken ones or ones without the ‘m’ printed on them. Keep your data sheet for use in Part II.
  3. Your instructor will now collect the data from the class.

Part II:

  1. Obtain the data file containing the information collected in class from the ENGR 315 WWW page.
  2. Your task is to report on your observations and those of your classmates. Your report should at least include:
    - Mean and standard deviation of total number of candies in the package
    - Mean and standard deviation of two colors of the candies (blue and one other color)
    - Graphical description of total numbers and one color information
  3. Return a report of your observations using some of the graphical methods described in Chapter 2. Comment on whether or not your package was "unusual."
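One way to back up the "unusual" comment with a number is to express your package's total as a z-score: how many sample standard deviations it lies from the class mean. The sketch below assumes a cutoff of 2 standard deviations as a rule of thumb; that cutoff, the function names, and the data values are all hypothetical and are not taken from the assignment or the real class data file.

```python
from statistics import mean, stdev

def z_score(value, sample):
    """Distance of `value` from the sample mean, in sample standard deviations."""
    return (value - mean(sample)) / stdev(sample)

def is_unusual(value, sample, cutoff=2.0):
    """Flag `value` if it lies more than `cutoff` standard deviations from the mean.

    The default cutoff of 2 is an assumed rule of thumb, not part of the assignment.
    """
    return abs(z_score(value, sample)) > cutoff

# Hypothetical class totals (NOT the real ENGR 315 data): candies per package.
class_totals = [55, 57, 56, 58, 54, 56, 57, 55, 56, 59]
my_total = 62  # hypothetical count for your own package

print(f"mean = {mean(class_totals):.2f}, s = {stdev(class_totals):.2f}")
print(f"z = {z_score(my_total, class_totals):.2f}, unusual: {is_unusual(my_total, class_totals)}")
```

The same check works for a single color: pass your blue count and the class's blue counts instead of the totals.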

Using a computer to assist in these computations is acceptable.


Questions and Answers concerning Project 1:

Question 1:
I was wondering whether the defective M&M's were included in the total column, or whether I need to add the two columns together to get the total number in the package.

Answer 1:
You do not need to add the two columns together. The "total" columns collectively account for the entire bag. The counts in the defective columns represent subsets of the counts in the total columns. It is good that you asked the question; in general, it is always important to clarify the meaning of your data when performing data analysis.

Question 2
When you compute the sample mean for the number of candies in the package, do you include the number of defectives in the total? Thanks.

Answer 2:
- The answer to your question is "it depends."
- Specifically, it depends on the question that you are trying to answer through the analysis. Do you want to know the total number of candies in the package (the numbers shown in the first 6 data columns)? Or are you interested only in the number of non-defective candies (the numbers in the first 6 columns minus the numbers of defective candies in the remaining columns)? The defective candies are still edible, so as a consumer it might not matter. If you were a quality control person, however, you might want to focus on the number of defective candies.
- This is for you to decide (and explain in your report). In this case, there is no right answer - only justified and explained ones.
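The two choices above differ only in whether the defective counts are subtracted. The sketch below assumes a hypothetical row layout (six color columns followed by six matching defective columns); check the real data file's columns before reusing it, and note that the numbers are invented for illustration.

```python
from statistics import mean

# Hypothetical data (NOT the real ENGR 315 file). Assumed layout per row:
# six color counts, then the defective counts for the same six colors.
rows = [
    [8, 10, 9, 6, 12, 11,   1, 0, 2, 0, 1, 0],
    [9, 11, 7, 8, 10, 12,   0, 1, 0, 0, 0, 1],
    [7, 12, 8, 7, 11, 10,   2, 0, 0, 1, 0, 0],
]

def total_candies(row):
    """All candies in the package: the color columns already include the defectives."""
    return sum(row[:6])

def non_defective_candies(row):
    """Candies without defects: color totals minus the defective counts."""
    return sum(row[:6]) - sum(row[6:])

print("mean total:        ", mean(total_candies(r) for r in rows))
print("mean non-defective:", mean(non_defective_candies(r) for r in rows))
```

Whichever definition you pick, state it in the report so the reader knows which mean you computed.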

Question 3:
I got the data into Excel following the directions on the web, but I don't know how to make it calculate means and standard deviations. Could you please outline Excel usage on the web page?

Answer 3
Here are some tips:

  1. Excel can walk you through the process: (1) put your cursor in the cell where you want the result; (2) select "Function" from the Insert menu to open the "Insert Function" dialog; (3) choose the function you want (most of the ones we use are under the "Statistical" category, but you might want to scroll through the list and see what is there); (4) follow the directions Excel provides for specifying the parameters of the function; (5) press Return, and you should see the result of your function call.
  2. Once you have done this, you might want to put the cursor back in the cell and look at its contents as listed at the top of the Excel window. The cell shows the result of the function call; the top of the window shows the function call itself. Once you get the hang of it, you can simply type the function call directly rather than going through the "Insert Function" process.
  3. For example, assume that you pasted the function "median" into cell A11, with the median referring to the data in the cell range A1:A10. In cell A11 you will see the result of the median function: the median of your ten data points. If you put your cursor in cell A11 and look at the top of the Excel screen, you will see how the function call is formatted. Specifically, it will look like "=MEDIAN(A1:A10)".
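If you want to double-check your spreadsheet results, the same statistics can be computed with Python's standard `statistics` module. This is a sketch for cross-checking only, using invented values in place of cells A1:A10. The main point it illustrates is that Excel's STDEV is the sample standard deviation (dividing by n - 1), while STDEVP is the population version (dividing by n).

```python
from statistics import mean, median, stdev, pstdev

# Hypothetical values standing in for cells A1:A10 (NOT real class data).
a1_to_a10 = [56, 58, 55, 57, 56, 54, 59, 56, 57, 55]

print("=AVERAGE(A1:A10) ->", mean(a1_to_a10))
print("=MEDIAN(A1:A10)  ->", median(a1_to_a10))
# Sample standard deviation (n - 1 in the denominator), like Excel's STDEV.
print("=STDEV(A1:A10)   ->", round(stdev(a1_to_a10), 4))
# Population standard deviation (n in the denominator), like Excel's STDEVP.
print("=STDEVP(A1:A10)  ->", round(pstdev(a1_to_a10), 4))
```

For a class sample of packages, the sample version (STDEV) is usually the appropriate one.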

Question 4

There's an obvious inconsistency in the M&M data for the total number of candies in the packages. Do we continue to use this value when graphing the histogram? (I don't know what values to use for the high and low when determining the number of bins.) The 141 (the inconsistency) influences the number of bins drastically. Could you let me know what I should be doing? Thanks,

Answer 4

What do you think you should do? Think back to the discussion of the heights and what to do with the 80", 90", and 999" values. One (999") is clearly not possible and probably should be thrown out. The other two are possible, although not probable. There are also measures, such as the median and the interquartile range, that are not sensitive to extreme values.
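To see why the median and interquartile range are called "not sensitive to extreme values," compare them with the mean on a small invented sample that contains one suspicious entry of 141. The data are hypothetical, and the quartiles use Python's "inclusive" convention, which is an assumption; the textbook and Excel may compute quartiles slightly differently.

```python
from statistics import mean, median, quantiles

# Hypothetical totals (NOT the real class data) with one suspicious 141.
totals = [55, 57, 56, 58, 54, 56, 57, 55, 56, 141]

def iqr(data):
    """Interquartile range Q3 - Q1, using the 'inclusive' quartile convention."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

without_141 = [t for t in totals if t != 141]
for label, data in [("with 141", totals), ("without 141", without_141)]:
    print(f"{label:12s} mean={mean(data):6.2f} median={median(data):5.1f} IQR={iqr(data):4.2f}")
```

On this sample, removing the single extreme value moves the mean by several candies, while the median does not change and the IQR changes only slightly.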

As for the histogram specifically, you can determine the number of bins without the large value and then plot the large value by hand.
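The approach above (choose the bins from the remaining data, then add the large value separately) can be sketched as follows. The square-root rule used here for the number of bins is one common choice and is an assumption; your textbook may recommend a different rule. The data and the helper name `bin_edges` are hypothetical.

```python
import math

def bin_edges(data, num_bins):
    """Equal-width bin endpoints from min(data) to max(data): num_bins + 1 values."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins
    return [lo + i * width for i in range(num_bins + 1)]

# Hypothetical totals (NOT the real class data) including a suspicious 141.
totals = [55, 57, 56, 58, 54, 56, 57, 55, 56, 141]
outlier = 141

kept = [t for t in totals if t != outlier]
k = math.ceil(math.sqrt(len(kept)))  # square-root rule (an assumed choice)
edges = bin_edges(kept, k)

print("number of bins:", k)
print("bin edges:", [round(e, 2) for e in edges])
print("plot by hand:", [t for t in totals if t == outlier])
```

The printed edges can be typed into a column and used as the bin range for Excel's histogram tool.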

In general, this is the type of judgment call that you will need to make as a practitioner doing statistics with a dataset. Use your best judgment.

Question 5

Where can I find out how to graph a histogram in Excel? Thanks.

Answer 5

There are two ways to create the histogram chart:

1. Starting from the data.

Excel has a histogram function within its "Data Analysis" functionality. "Data Analysis" should appear under the Tools menu; if it does not, you will need to get it installed (it is part of the Analysis ToolPak add-in). Either read the help (installation typically does not require disks) or ask a technical support person.

Go to Data Analysis, then select Histogram. You will be prompted for the range of input data and the bin range. You create the bin range by creating a column of numbers representing the endpoints of the bins. For example, if you wanted three bins, 0 to 1, 1 to 2, and 2 to 3, you would create a column containing the numbers 0, 1, 2, and 3. You would then provide the location of these numbers as the bin range (e.g., B1:B4).

Once you have provided the histogram function with the data range and the bin range, if you press enter - you will probably just get a frequency distribution. To get a histogram, you need to check the box associated with "chart output" on the histogram screen. If the chart output option is selected, you should get both a frequency distribution and a histogram.
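It helps to know how the histogram tool assigns values to bins. As Excel's documentation describes it, a value is counted in the first bin whose number it is less than or equal to, and values above the last bin number go into an extra "More" bin. So with the bin range 0, 1, 2, 3 the bins are effectively "at most 0", "above 0 up to 1", "above 1 up to 2", "above 2 up to 3", and "More". The sketch below reproduces that counting rule on invented data so you can check a frequency table by hand; the function name is hypothetical.

```python
from bisect import bisect_left

def excel_style_frequency(data, bin_upper_limits):
    """Count values per bin using upper-inclusive bins, plus a final 'More' bin.

    A value x goes into the first bin whose limit is >= x; bisect_left finds
    exactly that index in the sorted list of limits.
    """
    limits = sorted(bin_upper_limits)
    counts = [0] * (len(limits) + 1)  # the extra slot is the 'More' bin
    for x in data:
        counts[bisect_left(limits, x)] += 1
    labels = [str(b) for b in limits] + ["More"]
    return list(zip(labels, counts))

# Hypothetical data and the bin range 0, 1, 2, 3 from the example above.
for label, count in excel_style_frequency([0.5, 1.0, 1.5, 2.7, 3.2], [0, 1, 2, 3]):
    print(f"{label:>4}: {count}")
```

Note that a value exactly equal to a bin number (1.0 here) is counted in that bin, not the next one.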

2. Starting from frequency distribution (table of ranges and number of data points in range):

If you already have the frequency distribution and need to create a histogram, you can use the graphing features of Excel. Under the "insert" menu is the option "chart". You want to insert a column chart. Excel should walk you through the process of setting up the chart.

Question 6:

How should I work with the excel graphs and figures when creating the final report?  Do I need to integrate them into the text?  Can I include them in an appendix?

Answer 6

It would be terrific to get reports that integrate the text and the figures (histograms, stem and leaf diagrams, box plots, etc.).  However, incorporating the figures and tables into the text can be time consuming. 

An alternative, easier, and acceptable solution is to include all the figures and tables in an appendix and reference them from the report text. If this approach is taken, each figure and table should be clearly labeled (e.g., "Figure x. ..." or "Table y. ...") so that the text can, and should, refer to it.

Question 7

I have a question regarding calculation of whisker values for box plots. The book indicates the calculation as follows:

lower whisker = q1-1.5IQR with the whisker extending to the smallest data point within that range.
upper whisker = q3+1.5IQR with the whisker extending to the largest data point within that range.

From class I understood that

lower whisker = median-1.5IQR with the whisker extending to the smallest data point within that range.
upper whisker = median+1.5IQR with the whisker extending to the largest data point within that range.

Which is correct??

Answer 7

Good question!  The answer is that you are right (and I was mistaken).  The equations you have above for lower and upper whisker boundaries (using q1 and q3 respectively rather than q2) are correct.  This was a mistake in the lecture.

If you have created box plots using the other equations, please do not feel a need to change them. They will be considered correct if completed using the equations presented in class.
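For reference, the correct rule (fences at q1 - 1.5IQR and q3 + 1.5IQR, with each whisker ending at the most extreme data point inside its fence) can be sketched as below. The quartiles use Python's "inclusive" convention, which is an assumption; the textbook or Excel may place the quartiles slightly differently, which can shift the fences a little. The data and the function name are hypothetical.

```python
from statistics import quantiles

def whiskers(data):
    """Return (lower whisker end, upper whisker end, outliers) for a box plot.

    Fences are q1 - 1.5*IQR and q3 + 1.5*IQR; each whisker extends to the most
    extreme data point that still lies inside its fence.
    """
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return min(inside), max(inside), outliers

# Hypothetical totals (NOT the real class data).
totals = [55, 57, 56, 58, 54, 56, 57, 55, 56, 141]
low, high, out = whiskers(totals)
print(f"whiskers: {low} to {high}, plotted separately: {out}")
```

Points outside the fences, such as the 141 here, are usually drawn individually rather than included in a whisker.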

Question 8

I was wondering what is meant by "one color information" in "graphical description of total numbers and one color information". Thanks.

Answer 8

It means that there should be a graphical description of the total number of candies and then also a graphical description of either blue or red or yellow or green or brown. Logically, this would be best if the graphical description were either of the blue (since you were required to analyze it) or the other color you analyzed.