Bioinformatics: Computing

Home | Site Index | Syllabus | Schedule | Study aids | Computing | Links | Course Materials Home

 

Introduction

The heart of this course is in the computer laboratory exercises and projects. You can read and talk about bioinformatics, but to really learn it, you have to do it. This page discusses matters relevant to computing in the course as a whole and gives an overview of what to expect. The exercises and projects will give specific information related to each of the topics covered.

To those of you who have already embraced the Information Age: You should find that the web-based instructional support and learning activities will be an efficient and timesaving way to experience the core content of the course.

To those of you who have not yet driven the Information Superhighway, or only tried it a few times: You may find some of this intimidating at first. However, after exploring these pages and working through the exercises and projects, your comfort level will rapidly increase.

To all: Problems will occur. Links to specified sites may change. A page on this site or elsewhere may not load properly. Problems specific to a particular server may surface. An analysis application may crash or give consistent error messages. Whatever the problem is, please communicate, both with me and with your classmates. If the problem is associated with a campus computer, or if you think it may be due to the SSU server, or even if you are not sure, try the help line: 664-HELP.

Keeping a log: It is easy to get caught up in surfing the web, but how do you keep track of all the places you'd like to come back to? How do you minimize the time spent and maximize the information retrieved? How can you be sure that you recorded the URLs [Internet addresses] correctly? How can you speed up the process? The answer to all of the above questions is to make and use a log.

A log is an incredibly handy tool and quite easy to set up. Open a page in your favorite word processing program or notepad. [Try using a recent version of Word, because it automatically turns a URL into an active link when you press Return/Enter at the end of the address.] Size the window so you can easily click on it whenever you want, while saving most of your screen for the open web browser window. Stagger the corners, so you can easily toggle between the two windows. Alternatively, you may prefer to work with both windows maximized and toggle between them by using the navigation bar.

When you find something you want to save, be it a single address or a whole page of information, you can simply copy and paste it from the browser into your log. You can add your own notes and comments as you go, record questions you have, ideas you want to follow up in the future, and so on. Be sure to save your log to a disk. If you use the school computer labs, the log has an additional advantage as a substitute for bookmarks: since bookmarks are regularly removed from lab computers, the log lets you carry your bookmarks to any machine you want. [Note: Zip drives are standard on iMacs and are available on PCs. Some machines have 100 MB drives and others have 250 MB drives. Both types can read and write 100 MB disks, so that is the best size of Zip disk to get.]

The log can also help you complete assignments. Besides editing your logs for your own use and making them functional accessories to surfing, you can use them to store material for answering homework questions and for projects. It is easy to collect and store references, both from the literature and from the Web. Web etiquette expects that credit is given for material found online, just as it is for books and journals. For assignments and projects, it is easy to copy and paste onto another page and then do a little editing. However, take care that the work is your own and not plagiarized.
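For those who prefer a script to a word processor, the log described in this section can also be kept as a plain text file. The following is only a minimal sketch in Python, not part of the course materials: the function names and the tab-separated layout (date, URL, note) are assumptions made for illustration.

```python
from datetime import date
from pathlib import Path


def format_entry(url, note, day=None):
    """Return one log line: ISO date, URL, and a free-text note, tab-separated."""
    day = day or date.today()
    return f"{day.isoformat()}\t{url.strip()}\t{note.strip()}"


def append_entry(log_path, url, note, day=None):
    """Append one entry to the log file, creating the file if it does not exist."""
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(format_entry(url, note, day) + "\n")
```

Calling append_entry with a file name of your choice (for example "weblog.txt", a hypothetical name), a URL, and a short comment adds one dated line to the file. Because the file is plain text, it can be carried to any machine and opened in any editor.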

Topic Focus of Computing Exercises and Projects:

Unit 1: Databases & Queries

The exercises start with a basic introduction to useful sites related to bioinformatics and supporting subjects, and with instruction on finding sites of interest by using search engines. Developing efficient search strategies early will help in many ways throughout the semester and beyond. Everyone is encouraged to go beyond the first exercise by exploring other links given on the "Links" page. Databases, including how to access and search them, are introduced next, beginning with literature databases. The first project, introduced by Eileen Thatcher, involves finding specific literature relating to bioinformatics and discussing the content of the papers found. Molecular sequence database interfaces, such as NCBI, EBI, and NBIF, are introduced along with different types of search strategies, including locating specific nucleic acid and protein sequences and searching for homologous sequences. Analyzing the quality of search results and refining the searches are an important part of this introduction. Other types of databases are explored, including ones related to protein structure, genomics, metabolic pathways, and ecology. Biology Workbench, a multipurpose interface with server-based project file storage, is introduced and is expected to be useful for many activities during the semester and beyond. The second project, introduced by Judy Sakanari, involves using the molecular databases to answer a question relating to molecular parasitology.

Unit 2: Genomics

Exercises using EMBOSS, introduced by Barbara Chapman, give experience in analyzing raw sequence data, including editing fragments, assembling them into contigs, and aligning and displaying the contigs. As an extension of the search strategies introduced in Unit 1, there is some practice in finding and annotating genes. How sequences are submitted to databases is demonstrated. Further exploration of databases includes becoming familiar with the genomic databases associated with many of the genome projects [Arabidopsis, Bacillus subtilis, C. elegans, and human, to name a few] and using the applications associated with them. The project, introduced by Richard Whitkus, focuses on genomic mapping.
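To give a feel for what assembling fragments into a contig means, here is a small sketch in Python. It is not how EMBOSS works internally: it assumes error-free reads on the same strand and joins two fragments only where the end of one exactly matches the start of the other. The function names and the min_len threshold are illustrative assumptions.

```python
def longest_overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    # Try the largest possible overlap first and shrink until one matches.
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def merge_fragments(left, right, min_len=3):
    """Join two fragments into one contig, or return None if they do not overlap."""
    size = longest_overlap(left, right, min_len)
    if size == 0:
        return None
    # Keep `left` whole and append only the non-overlapping tail of `right`.
    return left + right[size:]


contig = merge_fragments("ATGCGTAC", "GTACCTTA")
print(contig)  # ATGCGTACCTTA
```

Real sequence reads contain errors and may come from either strand, so assembly programs score approximate overlaps rather than demanding exact matches as this sketch does.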

Unit 3: Molecular Genetics

Exercises include applying what was learned in Unit 2 to plasmid mapping and primer design. Microarrays and their analysis are introduced, along with issues relating to appropriate chip design and challenges in database management. The project focuses on the use of microarray data to examine the genetics of a disease. [Following this unit, there is a mid-semester problem set reviewing the first three units of the course.]
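As a taste of the primer design exercises, two quantities commonly checked for a candidate primer are its GC content and an estimated melting temperature. The sketch below in Python uses the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius, a rough rule of thumb for short oligonucleotides. The exercises may use other tools and formulas; the function names and the example primer are made up for illustration.

```python
def gc_content(primer):
    """Fraction of bases in the primer that are G or C."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer):
    """Rough melting temperature in degrees Celsius by the Wallace rule."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


primer = "ATGCATGCATGCATGC"  # hypothetical 16-base primer
print(gc_content(primer))    # 0.5
print(wallace_tm(primer))    # 48
```

The Wallace rule ignores salt concentration and the order of the bases, so it is best treated as a first estimate.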

Unit 4: Phylogenetics

Exercises focus on multiple sequence alignment [MSA] tools, such as ClustalW, and related applications. The effects of selection versus genetic drift are examined using closely related sequences. Sequence profiles to identify and describe motifs and motif databases are introduced. This includes NCBI's COG [Clusters of Orthologous Groups] database, which compares complete genomes and can be used to identify gene families and phylogenetic patterns. Phylogenetic tree building and tree evaluation are introduced, and cladistic methods of parsimony and maximum likelihood are examined. The project, introduced by Derek Girman, focuses on phylogenetics in mammals.
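One simple way to compare closely related sequences once they have been aligned is the p-distance, the proportion of compared sites at which two sequences differ. The sketch below in Python is an illustration only, not one of the tools used in the exercises. It assumes the two sequences come from the same alignment (equal length, gaps written as "-") and skips any column where either sequence has a gap.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences, ignoring gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differences / compared


print(p_distance("ACGT-A", "ACCTTA"))  # 0.2
```

Pairwise distances of this kind are the input to distance-based tree building. The parsimony and maximum likelihood methods named above work from the aligned sites themselves instead.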

Unit 5: Protein Structure Prediction

Exercises involve use of the Swiss-Prot and PDB databases. The motif and profile analysis begun in Unit 4 is extended to examine relationships to structural patterns. 3D structure viewers are used to view protein models, and homology modeling is introduced using MODELLER. Secondary and tertiary structure prediction are examined using a variety of tools. The project, introduced by Barbara Chapman, focuses on examining the protein structure of death domains.

Unit 6: Metabolism & Networks

Exercises begin with exploration of metabolic databases and their uses. This includes using KEGG, an encyclopedia of linked databases of genes, genomes, and metabolic pathways. WIT [What Is There], a metabolic reconstruction resource, is also explored. Data mining methods as applied to biological databases are introduced, including self-organizing maps, decision trees, and neural networks. The brief introduction given here should give an idea of some exciting areas that are expected to grow very rapidly, especially as more fully sequenced genomes become available. The project will be announced shortly.

The final Project Proposal is your chance to use what you have learned along the way and apply it to a topic of your interest. This is intended to be a proposal for either 1) a short project problem on the scale of what you have been doing in the foregoing projects, using bioinformatics tools and techniques, or 2) a more involved project requiring a semester or so, perhaps something suitable for senior research. You need to design your project and describe how you expect it to be completed.

After each of the units is completed, there will be a survey on this portion of the course. You will earn points for completing the survey. The results will be used to improve future offerings of the exercises and projects.

Access to these exercises: Under construction

Click on the link Course Materials Home, also found on the navigation bars.

 Updated 8/28/03 by thatcher@sonoma.edu