Project 1: M&Ms Exercise
Objective: The purpose of this lab is to learn about descriptive statistics and to get acquainted with a spreadsheet program.
The exercise is in two parts: the first takes place in class, and the second is to be done as homework.
Questions and Answers concerning Project 1:
There's an obvious inconsistency in the M&M data for the total number of candies in the packages (line 70). Do we continue to use this value when graphing the histogram? I don't know what values to use for the high and low when working out the number of bins; the 141 (the inconsistency) influences the bin number drastically. Could you let me know what I should be doing? Thanks.
What do you think you should do? Think back to the discussion of the heights -- what to do with the 80", 90" and 999". One is clearly not possible and probably should be thrown out. The other two are possible -- although not probable. There are measures that you can use that are not sensitive to the extreme values.
As for the histogram specifically, you can determine the number of bins without the large value and then plot the large value by hand.
In general, this is the type of judgment call that you will need to make as a practitioner doing statistics with a dataset. Use your best judgment.
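To make the point about extreme values concrete, here is a minimal Python sketch. The package totals are hypothetical (they are not the class data), with 141 standing in for the suspect value, and the square-root rule for the number of bins is only an assumption; use whatever rule was given in class.

```python
import math
import statistics

# Hypothetical package totals (NOT the class data); 141 is the suspect value.
totals = [54, 55, 56, 56, 57, 57, 58, 58, 59, 60, 141]

# The mean is pulled upward by the extreme value; the median is not.
mean_all = statistics.mean(totals)
median_all = statistics.median(totals)

# Set the suspect value aside and recompute.
trimmed = [t for t in totals if t != 141]
mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)

# Choose the bins from the trimmed data only.
# Square-root rule of thumb: an assumption, not necessarily the rule from class.
num_bins = math.ceil(math.sqrt(len(trimmed)))
bin_width = (max(trimmed) - min(trimmed)) / num_bins

print("mean with / without 141:  ", mean_all, mean_trimmed)
print("median with / without 141:", median_all, median_trimmed)
print("bins:", num_bins, "width:", bin_width)
```

In this made-up sample the median is the same with or without the 141, while the mean shifts by several candies. That is the sense in which some measures are not sensitive to extreme values.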
Where can I find out how to graph the histogram in Excel? Thanks.
There are two ways to create the histogram chart:
1. Starting from the data.
Excel has a histogram function within its "Data Analysis" functionality. "Data Analysis" should appear under the Tools menu; if it does not, you will need to get it installed. Either read the help (installation typically does not require disks) or ask a technical support person.
Go to Data Analysis, then select Histogram. You will be prompted for the range of input data and the bin range. You create the bin range by entering a column of numbers representing the endpoints of the bins. For example, if you wanted three bins (0 to 1, 1 to 2, and 2 to 3), you would create a column containing 0, 1, 2, and 3. You would then provide the location of these numbers as the bin range (e.g., B1:B4).
Once you have provided the histogram function with the data range and the bin range, pressing Enter will probably give you only a frequency distribution. To get a histogram, you need to check the "Chart Output" box on the histogram screen. With that option selected, you should get both a frequency distribution and a histogram.
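If you want to check the frequency distribution outside Excel, the following Python sketch counts values against a column of bin endpoints. It assumes that each endpoint acts as an inclusive upper bound and that anything above the last endpoint falls into a "More" row, which is how the Excel histogram tool is understood to behave; confirm this against your own output. The function name `frequency_table` and the sample data are hypothetical.

```python
from bisect import bisect_left

def frequency_table(data, bin_edges):
    """Count values per bin, treating each edge as an inclusive upper bound."""
    edges = sorted(bin_edges)
    counts = [0] * (len(edges) + 1)  # the last slot is the "More" row
    for x in data:
        # bisect_left gives the index of the first edge that is >= x,
        # or len(edges) when x is larger than every edge.
        counts[bisect_left(edges, x)] += 1
    labels = [f"<= {e}" for e in edges] + ["More"]
    return list(zip(labels, counts))

# Hypothetical data, with the bin endpoints 0, 1, 2, 3 from the example above.
data = [0.2, 0.5, 1.0, 1.3, 1.7, 2.4, 2.9, 3.0]
table = frequency_table(data, [0, 1, 2, 3])

for label, count in table:
    print(label, count)
```

Note that under this assumption a value exactly equal to an endpoint (such as 1.0) is counted in the bin that ends at that endpoint, not the one that starts there.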
2. Starting from frequency distribution (table of ranges and number of data points in range):
If you already have the frequency distribution and need to create a histogram, you can use the graphing features of Excel. Under the "insert" menu is the option "chart". You want to insert a column chart. Excel should walk you through the process of setting up the chart.
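As a rough illustration of what the column chart represents, this Python sketch turns a frequency distribution into a text chart with one bar per range. The table values and the function name `text_chart` are hypothetical; the Excel chart wizard does the equivalent graphically.

```python
def text_chart(freq_table):
    """Return one line per bin: the label, a bar of '#' characters, and the count."""
    width = max(len(label) for label, _ in freq_table)
    return [f"{label:<{width}} | {'#' * count} {count}" for label, count in freq_table]

# Hypothetical frequency distribution: (range, number of data points in range).
freq = [("0 to 1", 3), ("1 to 2", 5), ("2 to 3", 2)]
lines = text_chart(freq)
print("\n".join(lines))
```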
How should I work with the excel graphs and figures when creating the final report? Do I need to integrate them into the text? Can I include them in an appendix?
It would be terrific to get reports that integrate the text and the figures (histograms, stem and leaf diagrams, box plots, etc.). However, incorporating the figures and tables into the text can be time consuming.
An alternative, easier, and acceptable solution is to include all the figures and tables in an appendix and reference them from the text. If this approach is taken, then the figures and tables should be clearly marked with "Figure x...." or "Table y. ....". Thus, the figures and tables can be (and should be) referenced in the report text.
I have a question regarding calculation of whisker values for box plots. The book indicates the calculation as follows:
lower whisker = q1-1.5IQR with the whisker extending to the smallest data point within that range.
From class I understood that:
lower whisker = median-1.5IQR with the whisker extending to the smallest data point within that range.
Which is correct??
Good question! The answer is that you are right (and I was mistaken). The book's equations for the lower and upper whisker boundaries (using q1 and q3, respectively, rather than the median, q2) are correct. The version given in lecture was a mistake.
If you have already created box plots using the lecture equations, please do not feel a need to change them. Box plots completed with those equations will also be considered correct.
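The q1/q3 version of the calculation can be sketched in Python as follows. The data are hypothetical, and the quartile convention (the "inclusive" method of `statistics.quantiles`) is an assumption; the book or Excel may compute quartiles slightly differently, which can shift the boundaries a little.

```python
import statistics

def whiskers(data):
    """Return (lower whisker, upper whisker, lower boundary, upper boundary)."""
    # Quartile convention is an assumption; textbooks and spreadsheets differ.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    # Each whisker extends to the most extreme data point within the boundaries.
    inside = [x for x in data if lower_bound <= x <= upper_bound]
    return min(inside), max(inside), lower_bound, upper_bound

# Hypothetical package totals (NOT the class data).
data = [54, 55, 56, 56, 57, 57, 58, 58, 59, 60, 141]
low, high, lower_bound, upper_bound = whiskers(data)
print("whiskers:", low, high)
print("boundaries:", lower_bound, upper_bound)
```

In this sample the 141 lies beyond the upper boundary, so the upper whisker stops at 60 and the 141 would be plotted as a separate point.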
I was wondering what is meant by "one color information" in "graphical description of total numbers and one color information". Thanks.
It means that there should be a graphical description of the total number of candies and also a graphical description of one color: blue, red, yellow, green, or brown. Logically, the best choice is either blue (since you were required to analyze it) or the other color you analyzed.