What the Biostatistician Needs to Know about Molecular Biology

Data mining for haplotypes

represent haplotypes as regular expressions with limited gaps, e.g. (-,-,-,0,0,*,-,-)

look for "overrepresented" haplotypes: those occurring more than a certain number of times in the cases (note: this is not a statistical comparison, but a fixed threshold)

observe that any sub-haplotype of an overrepresented haplotype (obtained by introducing more *'s) is also overrepresented, and recurse to find the maximal overrepresented haplotypes
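
The search described above can be sketched in Python as a level-by-level enumeration. This is a minimal illustration under stated assumptions, not the method from any particular paper or package: haplotypes are equal-length tuples of alleles, the only wildcard is "*" (the "-" symbol and the limit on gaps from the slide notation are not modeled), and "overrepresented" is taken to mean a match count of at least the threshold. The names matches, support, specializes and maximal_overrepresented are hypothetical.

```python
WILDCARD = "*"

def matches(pattern, haplotype):
    """A haplotype matches if it agrees at every non-wildcard position."""
    return all(p == WILDCARD or p == h for p, h in zip(pattern, haplotype))

def support(pattern, cases):
    """Number of case haplotypes matching the pattern."""
    return sum(matches(pattern, h) for h in cases)

def specializes(a, b):
    """True if a is strictly more specific than b (b is a sub-haplotype of a)."""
    return a != b and all(q == WILDCARD or p == q for p, q in zip(a, b))

def maximal_overrepresented(cases, threshold):
    """Return patterns with support >= threshold that have no
    overrepresented specialization (assumption: fixed count threshold)."""
    n = len(cases[0])
    alleles = [sorted({h[i] for h in cases}) for i in range(n)]
    frequent = []
    # Each frontier entry is (pattern, index of the last specified marker).
    # Extending only to the right generates every pattern exactly once.
    frontier = [((WILDCARD,) * n, -1)]
    while frontier:
        next_frontier = []
        for pattern, last in frontier:
            for i in range(last + 1, n):
                for a in alleles[i]:
                    candidate = pattern[:i] + (a,) + pattern[i + 1:]
                    # Pruning: a pattern below the threshold cannot have an
                    # overrepresented specialization, so it is never extended.
                    if support(candidate, cases) >= threshold:
                        frequent.append(candidate)
                        next_frontier.append((candidate, i))
        frontier = next_frontier
    return [p for p in frequent
            if not any(specializes(q, p) for q in frequent)]

# Toy data (made up for illustration): four case haplotypes over three markers.
cases = [(0, 0, 1), (0, 0, 0), (0, 0, 1), (1, 0, 1)]
result = maximal_overrepresented(cases, threshold=3)
print(result)  # [(0, 0, '*'), ('*', 0, 1)]
```

The pruning step is where the observation about sub-haplotypes is used: because adding wildcards can only increase the match count, a pattern that falls below the threshold is never extended. The all-wildcard pattern is used only as a starting point and is not reported.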