SOM Logo

Empirical Study

III HOME

OVERVIEW

FIRST STEPS

EMPIRICAL STUDY

LITERATURE REVIEW

RESEARCH

FORMS

 

Dealing with Data

Spreadsheet or database? The first thing to do once you have collected your data is to enter it into a database or spreadsheet. What's the difference? For almost all III purposes, the two are equivalent for information retrieval and storage, and both can be easily imported to SPSS for analysis. Databases are a little more time-consuming to set up but offer the great advantage of preventing some kinds of data entry and manipulation errors. In a spreadsheet, each cell is independent, so if you accidentally hit the up arrow during data entry, you could overwrite existing data. The independence of each cell also means that if you do any kind of sorting, calculating, etc. its possible to leave cells out of the operation if you're not careful. A database minimizes opportunities for these kinds of errors by treating rows or columns as units rather than as strings of independent cells.

Use a spreadsheet when the dataset is relatively small (<100 subjects, <20 variables). In a small dataset, errors are easier to catch and correct. If you are working with large amounts of data, and/or if your data may be used by other investigators in the future, it may be worth the extra effort to use a database. If you use a spreadsheet, avoid any manipulations other than data entry. Manipulations to the data are best done in SPSS where there's no chance of them corrupting the raw data. Save the data as an Excel 4.0 worksheet. At this time SPSS can't open workbook files.

Shortcut to creating an Access database

Data format. Regardless of which kind of data storage system you choose, the format will be the same:

  • 1 subject per row
  • 1 piece of information per column
  • Use the topmost row of a spreadsheet for the column heading
  • Use the first column for subject IDs
  • If a measurement is repeated several times with the same subject, record each occasion of measurement in a different column using a unique column heading
  • If more than 1 response to a survey question is an option, give every possible response its own column and treat each one as a "yes or no" question. Here's an example:
    • I take a daily dose of (choose all that apply):
      • NSAIDs
      • nutritional supplements
      • prescription drugs
    This question would end up occupying 3 columns: NSAIDs - yes or no, nutritional supplements - yes or no, and prescription drugs - yes or no
  • Enter a value for each variable, if possible. For example, if a column is headed "history of hypoglycemia", enter the "no"s not just the "yes"s. Leave cells blank only if the information is missing.
  • Categorical variables can be recorded as words or numbers. Words are more meaningful and can usually be analyzed just fine in SPSS. Both Access and Excel are smart enough to fill in frequently used words once the first letter is entered, so text entries don't mean more typing.
  • If a column contains numerical data, don't include any text. SPSS will think this is a mistake and treat it as missing data.
  • Be consistent in how data is entered. SPSS won't know that "male" and "man" mean the same thing.

A good example:

Subject ID Sex Age Group Pain-pretreatment Pain-posttreatment Satisfied?
1 M 43 control 5.5 4.2 yes
2 M 75 control 6.8 6.1 yes
3 F 68 control 5.2 4.9 no
4 M 72 experimental 6.3 5.4 yes
5 M 49 experimental 7.2 5.9 yes
6 M 55 control 6.7 6.8 no
7 M 47 experimental 5.2 5.1 yes
8 F 70 experimental 6.3 5.9 no
9 F 43 experimental 6.6 5.3 yes
10 M 56 control 5.9 5.5 no

A bad example:

        Pain  
Subject ID Sex Age Group pre-post Satisfied?
1 1 43? cont 5.5 4.2 yes
2 1 75 control 6.8 6.1 yes
3 2 68 C 5.2 4.9 no
4 1 72 exp 6.3 5.4 yes
5 1 49 ex 7.2 5.9 yes
6 1 55 control 6.7 6.8 no
7 m 47 experimental 5.2 5.1 yes
8 1 70 E 6.3 5.9 no
9 2 43 experimental 6.6 5.3 yes
10 1 56 control 5.9 5.5 no