The heart of this course is in the computer laboratory
exercises and projects. You can read and talk about
bioinformatics, but to really learn it, you have to
do it. This page discusses matters relevant to
computing in the course as a whole and gives an overview of
what to expect. The exercises and projects will give
specific information related to each of the topics
covered.

To those of you who have already embraced the Information
Age: You should find that the web-based instructional
support and learning activities will be an efficient and
timesaving way to experience the core content of the
course.

To those of you who have not yet driven the Information
Superhighway, or only tried it a few times: You may find
some of this intimidating at first. However, after exploring
these pages and working through the exercises and projects,
your comfort level will rapidly increase.

To all: Problems will occur.
Links to specified sites may change. A page on this site or
elsewhere may not load properly. Problems unique to a
particular server's access may surface. An analysis
application may crash or give repeated error messages.
Whatever it is, please communicate, both with me and with
your classmates. If the problem is associated with a
campus computer, or if you think it may be due to the SSU
server, or even if you are not sure, try the help line:
664-HELP.

Keeping a log: It is easy to get
caught up in surfing the web, but how do you keep track of
all the places you'd like to come back to? How do you
minimize the time spent and maximize the information
retrieved? How can you be sure that you recorded the URLs
[Internet addresses] correctly? How can you speed up
the process? The answer to all of the above questions is to
make and use a log.

A log is an incredibly handy tool and quite easy to set
up. Open a page in your favorite word processing program or
notepad. [Try using a recent version of Word, because
you can automatically turn URLs into active links by using
return/enter at the end of the address.] Size the window
so you can easily click on it whenever you want, while
saving most of your screen for the open web browser window.
Stagger the corners, so you can easily toggle between the
two windows. Alternatively, you may prefer to work with both
windows maximized and toggle between them by using the
navigation bar.

When you find something you want to save, be it a single
address or a whole page of information, you can simply
copy/paste between the browser and your log. You can add
your own notes and comments as you go, note questions you
have, ideas you want to follow in the future, and so on. Be
sure to save your log to a disk. If you use the school computer
labs, the log has an additional advantage as a substitute for bookmarks.
Since bookmarks are regularly removed from lab computers,
the log allows you to carry your bookmarks to any machine
you want. [Note: Zip drives are standard on the iMacs
and are available on the PCs. Some machines have 100 MB drives
and others have 250 MB drives. Both types will read and write
100 MB disks, so that is the best size of Zip disk to
get.]

The log can also be advantageous for completing assignments.
Besides editing your logs for your own use and making them
functional accessories to surfing, you can use them to store
material for answering homework questions and for projects.
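
If you prefer to keep your log as a plain text file rather than a word processor page, the same habit can be sketched in a few lines of Python. This is only an illustration, not part of the course materials: the function names and the tab-separated layout (date, address, comment) are assumptions made for this example. The check only confirms that an address is complete enough to paste back into a browser; it does not confirm that the page still exists.

```python
from datetime import date
from urllib.parse import urlparse


def is_plausible_url(url: str) -> bool:
    """Return True if the text looks like a complete http(s) address."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def format_entry(url: str, note: str, day: date) -> str:
    """Build one log line: date, address, and your own comment."""
    if not is_plausible_url(url):
        raise ValueError(f"This does not look like a full URL: {url!r}")
    return f"{day.isoformat()}\t{url.strip()}\t{note.strip()}"


def append_to_log(path: str, url: str, note: str) -> None:
    """Add one entry to the end of the log file, creating it if needed."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(format_entry(url, note, date.today()) + "\n")


# Example call (the file name "weblog.txt" is hypothetical):
# append_to_log("weblog.txt", "https://www.ncbi.nlm.nih.gov/", "NCBI home page")
```

Because each entry is appended to the end of the file, the log stays in the order you visited the sites, and the file can be carried on a disk to any machine.
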
It is easy to collect and store references, both from the
literature and for material on the Web. Web etiquette
expects that recognition is given for material found on-line
just as it is expected for books and journals. For
assignments and projects, it is easy to copy/paste onto
another page and then do a little editing. However, care
needs to be taken that the work is your own, and not
plagiarized.

Unit 1: Databases & Queries

Exercises start by providing a basic introduction
to useful sites related to bioinformatics and supporting
subjects and instruction on how to go about finding sites of
interest by using search engines. Developing efficient
search strategies early will help in many ways throughout
the semester, and beyond. Everyone is encouraged to extend
beyond the limits of the first exercise by exploring some
other links given on the "Links" page. Databases, including
accessing and searching, are introduced next, beginning with
literature databases. The first project, introduced
by Eileen Thatcher, involves finding some specific
literature relating to bioinformatics and discussing the
content of the papers found. Molecular sequence database
interfaces, such as NCBI, EBI, and NBIF, are introduced
along with different types of search strategies, which
include locating specific nucleic acid and protein
sequences and searching for homologous sequences. Analyzing
the quality of search results and learning how to refine
searches are important parts of this introduction. Other
types of databases are explored, including ones related to
protein structure, genomics, metabolic pathways, and
ecology. Biology Workbench, a multipurpose interface with
server-based project file storage, is introduced and is
expected to be useful for many activities during the
semester and beyond. The second project, introduced
by Judy Sakanari, involves using the molecular databases to
answer a question relating to molecular parasitology.

Unit 2: Genomics

Exercises using EMBOSS, introduced by Barbara
Chapman, give experience in analyzing raw sequence data,
including editing fragments, assembling them into contigs,
and aligning and displaying the contigs. As an extension of the search
strategies introduced in Unit 1, there is some practice in
finding and annotating genes. How sequences are submitted to
databases is demonstrated. Further exploration of databases
includes becoming familiar with genomic databases associated
with many of the genome projects [Arabidopsis,
Bacillus subtilis, C. elegans, and human, to name
a few] and using applications associated with them. The
project, introduced by Richard Whitkus, focuses on
genomic mapping.

Unit 3: Molecular Genetics

Exercises include applying what was learned in
Unit 2 to plasmid mapping and primer design. Microarrays
and their analysis are introduced, along with issues
relating to appropriate chip design and challenges in
database management. The project focuses on the use
of microarray data to examine the genetics of a disease.
[Following this unit, there is a mid-semester problem
set reviewing the first three units of the course.]

Unit 4: Phylogenetics

Exercises focus on multiple sequence alignment
[MSA] tools, such as ClustalW, and related
applications. The effects of selection versus genetic drift
are examined using closely related sequences. Sequence
profiles, which identify and describe motifs, and motif databases
are introduced. These include NCBI's COG [Clusters of
Orthologous Groups] database, which compares complete
genomes and can be used to identify gene families and
phylogenetic patterns. Phylogenetic tree building and tree
evaluation are introduced, and cladistic methods of parsimony
and maximum likelihood are examined. The project,
introduced by Derek Girman, focuses on phylogenetics in
mammals.

Unit 5: Protein Structure Prediction

Exercises involve use of SwissProt and PDB
databases. Motif and profile analysis begun in Unit 4 is
extended to examine relationships to structural patterns. 3D
structure viewers are used to view protein models and
homology modeling is introduced using Modeler.
Secondary and tertiary structure prediction is examined
using a variety of tools. The project, introduced by
Barbara Chapman, focuses on examining the protein structure of
death domains.

Unit 6: Metabolism & Networks

Exercises begin with exploration of metabolic
databases and their uses. This includes utilizing KEGG, an
encyclopedia of linked databases of genes, genomes, and
metabolic pathways. WIT [What is There], a
metabolic reconstruction resource, is also explored. Data mining
methods as applied to biological databases are introduced,
including self-organizing maps, decision trees, and neural
networks. The brief introduction given here should give an
idea of some exciting areas which are expected to grow very
rapidly, especially as more fully sequenced genomes become
available. The project will be announced shortly.

The final Project Proposal is your chance
to utilize what you have learned along the way and apply it
to a topic of your interest. This is intended to be a
proposal for either 1) a short project problem on the scale of
what you have been doing in the foregoing projects using
bioinformatics tools and techniques, or 2) a more involved
project requiring a semester or so, perhaps something
suitable for senior research. You need to design your
project and describe how you expect it to be completed.

After each unit is completed, there will be a
survey on that portion of the course. You will earn
points for completing the survey. The results will be used
to improve future offerings of the exercises and
projects.

Access to these exercises: Under construction. Click on the link Course
Materials Home, also found on the navigation bars.
Updated 8/28/03 by thatcher@sonoma.edu