Welcome to the enchanted world of Chlamydomonas genome annotation!

Consult the website www.chlamydomonas.info for updates and useful hints
   
Three ways you can help :
- as a registered annotator. If you want full access, you have to register by sending an e-mail to Diego Martinez stating your name, affiliation, chosen username and password, and who introduced you to Chlamy genome annotation. Notify Olivier Vallon so that your name can be added to the mailing list.
- as an occasional contributor, in which case you need to find in the list of annotators (« tasklist.xls ») someone who is ready to enter your data under his/her name. Defaults : O. Vallon or S. Merchant
- as a seed. If you know people not necessarily working with Chlamy but with the skill and expertise needed in a field that is not properly covered yet, ask them to join for free!

In any event, you need to know what is expected in terms of depth of annotation. Read these guidelines carefully, but then find the procedure that suits you best. There are as many ways to annotate as there are annotators. Feel free to emphasize the aspects you deem more important. This is fine as long as the annotation you enter is both accurate and useful to others who are not familiar with this particular type of gene. It need not be as thorough as suggested below: the community welcomes all levels of participation.

You can use the « annotation spreadsheet.xls » to collect and organize your data before or as you are entering it.

You can only annotate the filtered gene models of version 2.0. It is not possible for the moment to enter or modify gene models, so if no model covers your gene, add annotation to the gene model nearby as « next to gene X (coordinates) coding for … ». Your annotation will be carried over into version 3.0.

The annotation process:

Start with genes or gene families you are most familiar with, then work your way into areas you know less about. If your annotation is vague, chances are it will be completed by another user. Do not hesitate to add your annotation to genes already annotated by others. If you think the annotation out there is wrong or misleading, contact its author by email (list of annotators is at this web site : « TaskList.xls ») and discuss changing it.

Start with the protein page for your gene model and check the validity of the automatic annotation provided. Check the gene structure on the browser by comparing it with EST data and alignments. By clicking « annotate this protein », you access the transcript page, which displays the best Smith-Waterman hits (you can change the filter), INTERPRO domains, GO codes, E.C. number, etc. You can add to a field, or edit it if you entered it yourself. Both your annotation and that of previous users will be displayed. Fields you can fill :

Gene name : only if you know enough about that protein to give the gene a meaningful name. Consult the « guidelines » on the Chlamy Center website : the favorite is a 3-letter radical followed by a numeral, but other options are possible. Check that you are not giving a radical already used with another meaning. This can be done through the advanced search page by entering for example ABC* to search for the radical ABC.
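As a rough illustration of the radical check described above, here is a minimal Python sketch. It assumes the preferred form is exactly three uppercase letters followed by a numeral; other forms are allowed by the guidelines and are simply reported as non-standard here. The function names and the list of existing names are hypothetical: in practice the names would come from a search such as ABC* on the advanced search page.

```python
import re

# Assumed preferred pattern: three uppercase letters, then a numeral (e.g. ABC1).
# The guidelines allow other forms, so a non-match is a hint, not an error.
PREFERRED = re.compile(r"([A-Z]{3})(\d+)")

def split_gene_name(name):
    """Return (radical, number) if name follows the preferred form, else None."""
    match = PREFERRED.fullmatch(name.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))

def radical_in_use(radical, existing_names):
    """List the existing gene names that already use this radical."""
    hits = []
    for existing in existing_names:
        parsed = split_gene_name(existing)
        if parsed is not None and parsed[0] == radical:
            hits.append(existing)
    return hits

# Hypothetical names standing in for the results of an ABC* search.
existing = ["ABC1", "ABC2", "XYZ1"]
print(split_gene_name("ABC3"))
print(radical_in_use("ABC", existing))
```

A non-empty result from `radical_in_use` only tells you the radical is taken; whether it is used with the same meaning still has to be checked by hand.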

Description : the most important field. Your entry will be searchable together with the gene name by users at the « advanced search » page, so be as thorough as possible. Suggested order : protein name/function (often you can just copy/paste from a Swissprot entry, and this is the preferred description line) ; alternative names ; alternative gene names (in parentheses, including gene names in other organisms) ; identical to XXXX, if already cloned in Chlamy (or « almost identical » if there are differences) ; subcellular localization (experimental, by sequence similarity…) ; expression pattern, etc. Add as much literature information as needed, in the form of a PMID number (not the whole citation). You can also add negative information, like « not a protein kinase », if the automatic annotation that guided you there is wrong. « Probably not a gene » is also useful. There will be many entries such as « unknown function » or « conserved in eukaryotes ». In general, « unknown » means you believe there is a gene but know nothing of its function, and « hypothetical » means there is no hit and no EST data. As a rule, do not leave a protein model you spent any time on unannotated.
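If you collect your data in a spreadsheet first, the suggested order can be applied mechanically. The following Python sketch is only one possible way to do it: the function name, its parameters, the « ; » separator and the « PMID: » prefix are assumptions made for this example, and the values used are placeholders, not real annotations.

```python
def build_description(protein_name, alt_names=(), alt_gene_names=(),
                      identical_to=None, localization=None,
                      expression=None, pmids=()):
    """Join the pieces of a description line in the suggested order."""
    parts = [protein_name]
    parts.extend(alt_names)
    if alt_gene_names:
        # Alternative gene names go in parentheses.
        parts.append("(" + ", ".join(alt_gene_names) + ")")
    if identical_to:
        parts.append("identical to " + identical_to)
    if localization:
        parts.append(localization)
    if expression:
        parts.append(expression)
    # Literature is given as PMID numbers, not whole citations.
    parts.extend("PMID: " + str(p) for p in pmids)
    return "; ".join(parts)

# Placeholder values only; the PMID below is not a real reference.
line = build_description(
    "example protein",
    alt_gene_names=["GENE1"],
    localization="chloroplast (by sequence similarity)",
    pmids=["00000000"],
)
print(line)
```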

Model notes : will not be searched, but useful for any detailed analysis. Start by choosing from the pull-down menu (for example « no issue », or « sequence gap impacts model »). Indicate if gene structure needs to be modified, e.g. « 6th intron not supported by EST data », or whether you suspect misassembly, e.g. « C-terminus probably represented by C_xxxxxx ». This is where you enter splicing variants / alternative transcription starts (if biologically significant, enter also in the description), or clustering with genes of related function, or overlapping with neighboring genes etc...

EST evidence : choose yes or no. Watch for genomic contaminants, ESTs that match other places in the genome etc... In the comment line, add info for the EST contig(s) best describing the transcript (copy from the feature line)

Gene Ontology : if you feel this is important, either add from the automatic ontology field (domain, GO, EC code) or fill in the User-Assigned Ontology field. If needed, request creation of a new GO code.

Annotation strategies:

Various ways you can choose the genes you annotate: 
- with a keyword or gene name : « advanced search » the annotations, or « simple search » the Smith Waterman (Blast) hits, or use GO or KEGG search pages to retrieve gene models mapped to that function
- from a domain : « simple search » INTERPRO hits for an IP number (but significance is not assessed)
- from a sequence : TblastN on genome (for additional info, you can use bonus scaffolds or non-assembled reads, but there are no gene models there)
- by just hopping to the next unannotated gene on the scaffold…

Possibly useful hints on how to use the annotation tools:
- check significance of alignment by %id and % coverage, by similarity in description and alignment of hits, by broadness of their taxonomic distribution.
- gene families : annotate the complete family. Use the red « tilt » button to fish out gene models matching a particular entry. Repeat with lesser hits until you have saturated and start hitting unrelated sequences. Collate Chlamy and relevant sequences from other organisms in a FASTA Word file, run through ClustalW and choose names according to closest similarity
- multidomain proteins : if only one region is covered by the alignments displayed, increase the number of alignments, or apply filter to reduce taxonomic or database range, or select relevant part of FASTA sequence and Blast on Chlamy genome or NCBI
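To make the first hint concrete, %id and % coverage can be computed from a single gapped pairwise alignment. The Python sketch below is an illustration under stated assumptions, not the formula used by the annotation site: identity is counted over columns where both sequences have a residue (alignment tools differ on whether gap columns are included), and coverage is the fraction of the full query length that falls inside the alignment. The function name and the short example sequences are made up.

```python
def identity_and_coverage(aligned_query, aligned_hit, query_length):
    """Return (% identity, % query coverage) for one gapped pairwise alignment.

    aligned_query and aligned_hit are equal-length strings using '-' for gaps.
    query_length is the length of the full, unaligned query sequence.
    """
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns where both sequences have a residue.
    paired = [(q, h) for q, h in zip(aligned_query, aligned_hit)
              if q != "-" and h != "-"]
    identical = sum(1 for q, h in paired if q.upper() == h.upper())
    # Query residues that take part in the alignment.
    query_residues = sum(1 for q in aligned_query if q != "-")
    pct_identity = 100.0 * identical / len(paired) if paired else 0.0
    pct_coverage = 100.0 * query_residues / query_length if query_length else 0.0
    return pct_identity, pct_coverage

# Made-up alignment: 4 identical residues out of 5 paired columns,
# and 5 of the 10 query residues are aligned.
pct_id, pct_cov = identity_and_coverage("MKT-LV", "MKSALV", 10)
print(pct_id, pct_cov)
```

A high %id over a low % coverage often means that only one domain is shared, which is the situation described in the multidomain hint above.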

Your Most Interesting Findings:
Make a note of your most interesting or bizarre findings; they may be useful when it comes to writing a genome paper. Examples : shortest exon, most alternatively spliced variants, overlapping genes, gene in an intron, unusual gene configurations/fusions, large gene family, evidence for horizontal gene transfer, or whatever strikes you as deserving attention.
 