A Biological Information Resource for Students

Because of the massive amount of information about molecular sequences, structures and functions that is currently being generated by research programs such as the Human Genome Project, it is critical for all students in the life sciences to learn how to access, manipulate and comprehend this biological information. To facilitate this, we propose to develop a "Biological Information Resource" for students at the University of Washington. Database and software servers would be purchased and installed at the Locke Computer Center and Health Sciences Library to give undergraduate and graduate students general access to centralized biological sequence databases and to software programs that interact with these databases. Software and site licenses would be purchased to provide students access to comprehensive analytical and database management software packages. The resource would be accessible through existing campus computers and through Dial-IP from off campus. Two bioinformatics workstations would also be established in the Health Sciences Library, and an additional workstation would be set up in the Department of Pathobiology for software development and bioinformatics training. The Biological Information Resource would provide a much-needed resource for all students engaged in biological research and study.

Background

There has been a massive increase in genetic, molecular sequence and structural information available for the study of the biology of life and disease. New hardware and software developments (in particular, the Internet and World Wide Web) provide access to diverse databases containing this information. Learning how to access and manipulate this information is critical for education and research in the life sciences. A course on "Bioinformatics and Gene Sequence Analysis" (PABIO536) was established last year to teach how to access, analyze and understand biological sequence information. PABIO536 has drawn undergraduate and graduate students from 18 different departments. Applications for positions in the class have exceeded its maximum capacity, indicating strong student interest in, and need for, this type of instruction at the University. Because the available analytical programs are platform-specific (i.e., Macintosh, DOS/Windows, or UNIX) and costly, students do not have access to the necessary software outside of the classroom. Although centralized databanks, such as those maintained by the National Center for Biotechnology Information, allow organized accumulation and distribution of information, access to this data on the Internet can be problematic. Because of high demand, it is often difficult to connect to public servers, and response times can be slow. The development of a networked bioinformatics database and analysis system for UW users would resolve these problems. The proposed Biological Information Resource would contribute greatly to the ability of life sciences students to become proficient in the skills necessary for their future careers.

Project Description

We propose to establish a "Biological Information Resource" for students within the University of Washington which would provide general student access to the following:

1. Commercial software for cross-platform and remote computer access

2. Commercial software for molecular sequence analysis

3. Non-commercial bioinformatics software developed within the University

4. Biological sequence databases accessed and queried through a UW-based file server

5. Computer workstations specifically set up for bioinformatics analysis

1. Commercial software for cross-platform and remote computer access

We propose to purchase student licenses for the software required to connect Windows and Macintosh personal computers to UNIX and Windows NT servers. eXodus (White Pine), an X-Windows emulation program, enables a Macintosh or Windows user to connect to a server through a full-featured graphical interface. This program would give students better access to UNIX-based software, such as the GCG suite of gene sequence databases and analysis tools maintained by the Locke Computer Center. Since most users access GCG from a Windows or Macintosh computer, eXodus would allow students to take advantage of the graphical X-Windows interface, a new feature that greatly improves the usefulness of the system. The program would be provided free of charge to UW students who use the GCG package or other bioinformatics resources that support X-Windows.

2. Commercial software for molecular sequence analysis

We propose to purchase software packages and site licenses to enable UW students to access and use commercial molecular sequence software packages that are normally unavailable to students because of their cost. An example of such software is Sequencher (GeneCodes), a highly rated Macintosh program for DNA sequence assembly and analysis. The Health Sciences Library has purchased a site license for a single simultaneous user, which allows multiple users to run the program one at a time. Demand for this program has been high. A multiuser student-only Key Server (Sassafras Software) and an additional student-only Sequencher license would be purchased under this proposal. The Key Server would also be used to control access to other Macintosh and Windows bioinformatics software that would be licensed for student use. Various bioinformatics software, including molecular sequence analysis packages and primer design programs (e.g., GeneRunner, Primer, LaserGene (Windows), GenePro (DOS), MacVector and DNA Strider (Macintosh)), will be tested, and licenses negotiated for selected products. Support will be available from the Health Sciences Library Bioinformatics Consultant.

3. Non-commercial software for bioinformatic analysis

We are currently developing DNA and protein sequence analysis software accessible through a Web browser. Two undergraduate student programmers are involved in this project. A DNA restriction site analysis program with an HTML front end has already been developed and is available at "http://rose1.biostat.u.washington.edu/". Such programs will be made accessible to students through the campus network with a Web browser. This software is being developed for the PABIO536 Bioinformatics class. However, giving students general access to this software would provide greater continuity between their classroom training and their later analytical or research needs. Additional programs will be obtained from other non-commercial sources to provide a centralized source of bioinformatics tools on UW servers. These programs may be UNIX, DOS/Windows or Macintosh-based, and accessible either directly through the Web browser format or through the cross-platform graphical interfaces. Examples of such programs are Align and ClustalW for sequence alignments, PatMat, Blast and Fasta for sequence homology searches, and Web Restrict for restriction site analysis.
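
As an illustration of the kind of analysis a restriction site program performs, the following is a minimal sketch in Python. It is not the program referred to above; the function name find_sites and the three-enzyme table are assumptions chosen for the example. It scans a DNA sequence for each enzyme's recognition sequence and reports the 0-based start positions. Because the recognition sequences used here are palindromic, scanning one strand is sufficient.

```python
# Hypothetical enzyme table for illustration; a real tool would load a
# much larger list of enzymes and their recognition sequences.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_sites(sequence, enzymes=ENZYMES):
    """Return {enzyme name: [0-based start positions]} for every site found."""
    seq = sequence.upper()  # accept lower- or mixed-case input
    hits = {}
    for name, site in enzymes.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            # Resume one base later so overlapping sites are not missed.
            start = seq.find(site, start + 1)
        hits[name] = positions
    return hits


if __name__ == "__main__":
    for enzyme, positions in find_sites("ttGAATTCaaGGATCCgaattc").items():
        print(enzyme, positions)
```

A Web front end of the kind described would pass the sequence submitted in a form to a routine like this and format the returned positions as a results page.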

4. Biological sequence databases accessed and queried through a UW-based file server

We propose to establish a University Network Server, a Unix-based SparcStation, to regularly download, compress, and index by keyword current biological sequence databases, such as the DNA (GenBank) and protein (OWL) databases available from the National Center for Biotechnology Information (NCBI). The server would provide University-wide access to these databases for students in classes that incorporate molecular sequence analysis, as well as for those engaged in research. This local server would provide database acquisition and querying capabilities that are not possible through remote public servers. Using the Informix system as a core for the reformatted databases would allow analysis or querying of specific database subsets. This server would be housed and maintained in the Locke Computer Center. Additional disk drives would be purchased to handle the large sequence databases. Automated BLAST searching algorithms have already been developed. These programs will be installed on the requested Unix SparcStation and be available for use by all students.
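
The idea of keyword indexing and subset querying can be sketched as follows. This Python example is an assumption-laden illustration, not the software proposed above: the proposal names Informix as the database core, whereas this sketch uses a plain in-memory index, and the record layout, the function names build_keyword_index and query_subset, and the accession identifiers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records; real entries would be parsed from the downloaded
# database files, and these accession identifiers are made up.
SAMPLE_RECORDS = [
    {"accession": "DEMO001", "keywords": ["HIV-1", "envelope"]},
    {"accession": "DEMO002", "keywords": ["HIV-1", "polymerase"]},
    {"accession": "DEMO003", "keywords": ["herpesvirus", "polymerase"]},
]


def build_keyword_index(records):
    """Map each lower-cased keyword to the set of accessions that carry it."""
    index = defaultdict(set)
    for record in records:
        for keyword in record["keywords"]:
            index[keyword.lower()].add(record["accession"])
    return index


def query_subset(index, *keywords):
    """Return the sorted accessions of records carrying all given keywords."""
    if not keywords:
        return []
    # index.get avoids creating empty entries for unknown keywords.
    matches = [index.get(keyword.lower(), set()) for keyword in keywords]
    return sorted(set.intersection(*matches))


if __name__ == "__main__":
    index = build_keyword_index(SAMPLE_RECORDS)
    print(query_subset(index, "HIV-1", "polymerase"))
```

Building the index once, when the databases are downloaded, is what makes later subset queries fast: a search then touches only the records that share the requested keywords rather than the whole database.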

5. Computer workstations specifically set up for bioinformatics analysis

We propose to set up two bioinformatics workstations within the Health Sciences Library to provide general student access to bioinformatics software and biological sequence databases. These computers would contain the licensed Macintosh and Windows bioinformatics software, and an interface to the proposed servers. These stations would have training and research software as well as evaluation software for future inclusion in the Biological Information Resource. We also propose to purchase an additional bioinformatics workstation to be located in the Department of Pathobiology. This computer would be used for the development of gene sequence and analysis software to be included within the Biological Information Resource, and as an additional workstation for instruction, training and research in bioinformatics accessible to Pathobiology students and the Bioinformatics class. We also propose to set up a DNA sequence gel scanning and sequence assembly system using the SparcStation server. A $60,000 software package from BioImage and a $5,000 gel scanner have been donated to the Department of Pathobiology by Pathogenesis Corp., Seattle, WA. This gel scanner workstation would be set up in the Locke Computer Center. The server would be used to store DNA sequence gel images and to provide networked access to the gel images using an X-Windows interface.

Significance and Benefits

The establishment of a Biological Information Resource within the University would be an invaluable resource to the undergraduate and graduate students in the life sciences by providing them with enhanced access to molecular sequence databases and analysis software. The resource would provide valuable tools and services to students in the biological and medical sciences throughout the University, and would provide a continuum between the practical issues taught in classes and those faced by students in their subsequent studies, training and research. The development of Web-based analysis programs and access to these programs on a UW server would alleviate many of the problems experienced by students engaged in molecular sequence analysis training and research. It would allow this work to be done on either Macintosh or Windows PCs, and enable students to study, train and do research using different analysis programs on any computer connected to the UW network. Access to HTML-based analysis programs and other bioinformatics programs running on a local Web server would benefit all students in the University involved in research, and those taking other classes that utilize biological information. Over 75 UW students have GCG accounts, 40 students take PABIO536, and an additional 30 students take bioinformatics classes offered by the Health Sciences Library. At least 200 courses in 30 different departments include a molecular biology component, and many of the 3680 students with majors in these departments would benefit from the availability of these resources (see Potential Student Impact).

Student needs for bioinformatic analysis tools have been established through discussions in the PABIO536 Bioinformatics class and through seminars and discussions with students throughout the life sciences. We have polled the past graduates of the PABIO536 class by email and have received a strong, unanimous response affirming the need for this type of resource and the appropriateness of using the Student Technology Fee to fund it. There has also been a very positive response from student GCG users. At present there is no uniformity in the sequence analysis programs used in different research laboratories. Having a common set of analysis programs would enable the exchange of data and results between student researchers, which has not been possible with software run on different computer platforms. Access to up-to-date biological sequence databases on a local server would be valuable to students engaged in molecular sequence analysis. Because of the large amounts of molecular sequence information that have become available through the world-wide genome projects, the ability to perform bioinformatic analyses is now in great demand in the biotechnology and medical research job markets. Student fee support for the "Biological Information Resource" would provide a means for students to train and become qualified in this field.

Future of the Biological Information Resource

This proposal to develop a Biological Information Resource for students at the University of Washington is specifically designed to provide facilities and services to students which have not previously been available or which have not been readily accessible. The term "bioinformatics" was coined only in the last couple of years. However, this discipline is becoming more and more important for the study of biology. Although the present proposal is aimed at students, it is clear that a larger effort covering all those engaged in research and study of the molecular basis of life at the University is necessary. A successful outcome from this student proposal will provide the impetus for future development of this resource University-wide. We intend to pursue funding from the University Initiative and granting agencies to further support and develop this resource for students, as well as to expand it to others at the University. The Locke Computer Center and the Health Sciences Library are strongly committed to developing such a resource.

Procedure

Software procurement, installation and support: Exemplary software tools will be purchased and site licenses obtained for multiple users. These software packages will be installed on the requested PC workstations and servers. Suitable software will be distributed to students for installation on their personal machines. The software selection, acquisition, and installation will be performed by Dr. Rose, Dr. Yarfitz, Mr. Holzman, and their assistants, and distribution will be carried out by the Health Sciences Library and the Locke Center. Software support will be provided by Dr. Yarfitz (Health Sciences Library Biological Information Specialist) and Ted Holzman (Systems Programmer, Locke Computer Center). Dr. Yarfitz is a member of the Health Sciences Library staff, and is supported by Howard Hughes Medical Institute funds to develop biological information programs at the University. Student Technology Fee funds will be used for a part-time student assistant to work with Dr. Yarfitz on this project and to maintain and upgrade the software on the bioinformatics workstations in the Library. HTML-based bioinformatics software is currently under development in Dr. Rose's laboratory by student researchers and will be added to the resource as it is developed. A number of software packages have already been identified and could be purchased immediately. Other packages would be evaluated over the first 6 months of the funding period and selected ones would then be purchased.

Hardware procurement, installation and support: The servers and workstations will be purchased through joint agreement with the Locke Computer Center and the Health Sciences Library. The servers will be installed and supported by the Locke Computer Center, while the workstations will be installed and supported by the Health Sciences Library and the Department of Pathobiology. High-end Windows and Macintosh bioinformatics workstations will be located in the student area of the Health Sciences Library, and a Windows bioinformatics and software development workstation will be located in the Department of Pathobiology. An additional sequence reader/assembly UNIX workstation and scanner will be established at the Locke Computer Center. Funds will also be used to purchase additional disk drives and other components for existing Locke Center computers that will be used as bioinformatics servers. The Locke Computer Center and the Health Sciences Library have long-standing commitments to the development of student access to information and information analysis. The Locke Center, the Library, and the School of Public Health and Community Medicine/Department of Pathobiology see bioinformatics as an important area and have committed to provide continued support for development in this area. The hardware identified in this proposal would be purchased and installed within the first 4 months of the funding period.

Database management: Software for database management has already been developed. This software would be installed and sequence databases would be downloaded and prepared. The Locke Computer Center already supports biological sequence databases in conjunction with the UNIX-based GCG package of bioinformatic software which is offered to University students and researchers. The Center has offered to provide support for the development and maintenance of the additional databases for the Biological Information Resource. The databases would be maintained on the requested servers and different methods for accessing these databases would be developed.

Student comments on the availability of Bioinformatic Programs and Databases and the need for a Biological Information Resource

1. " To much time is wasted with problems with computer problems, program bugs etc."

2. "I felt that Genepro was limiting because it was only accessible in the computer lab. I would have preferred to use tools on the Web, especially since these tools will be available to me after the class is over".

3. "Lots of bugs in some software programs".

4. "Constant frustration at just getting the programs to work at all, let alone do the analysis we were supposed to do".

5. On the Biological Information Resource. "I think this is a great idea. I am a bioengineering graduate student who has used and anticipates continuing to use GCG as well as DNAaid and many Web sources. A centralized location for biological software would be an appropriate and needed used of student fund money."

6. "I think this is a VERY good use of the student tech fee."

7. "That sounds like a great idea - maybe you will be able to establish windows compatible sequence analysis software (then I guess I'll have to audit the new course)."

8. "I am writing in support of the establishment of a biological information resource here at the University of Washington. The idea of expanding such help into a unified biological resource is very exciting. I only hope that it can be done quickly enough that I may still benefit from it during the time left for me here at the UW."


Budget

05 Supplies and Materials

06 Equipment