File Specification

FASTA file

This package assumes that the description lines of a FASTA file use the following format.
>anchor_id direction|start_position..end_position
Here anchor_id is the identifier of the anchor. The direction field gives the transcriptional direction of the anchor if the anchor is a gene: 1 indicates that the transcriptional orientation is identical to the forward orientation of the chromosome, and -1 indicates that the gene is in the reverse orientation. If the anchor is not a gene, direction should always be 1. start_position and end_position are the start and end of the anchor. The following is an example.
>indica5_23853 -1|4369..3375
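The description line format above can be checked with a short parser. The following is a minimal sketch, not code from this package; the names AnchorHeader, HEADER_RE and parse_header are hypothetical and chosen only for illustration.

```python
import re
from typing import NamedTuple


class AnchorHeader(NamedTuple):
    """Fields of one FASTA description line (hypothetical container)."""
    anchor_id: str
    direction: int   # 1 or -1
    start: int
    end: int


# >anchor_id direction|start_position..end_position
HEADER_RE = re.compile(r"^>(\S+)\s+(-?1)\|(\d+)\.\.(\d+)\s*$")


def parse_header(line: str) -> AnchorHeader:
    """Split a description line into its four fields, or raise ValueError."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized description line: {line!r}")
    anchor_id, direction, start, end = match.groups()
    return AnchorHeader(anchor_id, int(direction), int(start), int(end))


print(parse_header(">indica5_23853 -1|4369..3375"))
```

In the example line the start position is larger than the end position, so the sketch does not require start to be smaller than end.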

Pair file

A pair file contains the pairs found between 2 chromosomes or within 1 chromosome. Each line of a pair file is a record of one pair of anchors (positioned genes or markers). The following is an example of a record.
OsIBCD007232 1 26277701 OsIBCD015822 -1 32139191
One record contains 6 fields, separated by one or more whitespace characters. The first 3 fields describe the anchor on the first chromosome of the pair, and the last 3 describe the anchor on the second chromosome. Within each group of 3, the first field is the name of the anchor. The second field gives the transcriptional orientation if the anchor is a gene: 1 indicates that the transcriptional orientation is identical to the forward orientation of the chromosome, and -1 indicates that the gene is in the reverse orientation. If the anchor is not a gene, this field should always be 1. The third field is the position of the anchor, either physical or genetic.
The name of a pair file must contain the numbers of the chromosomes that were compared to extract the pairs. The order of the numbers must correspond to the order of the chromosomes in the records. The numbers can be separated by letters and underscores. Some programs require a file name extension (suffix).
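The record layout and the file naming rule can be sketched as follows. This is an illustration under stated assumptions, not the package's own code: the names Anchor, parse_pair_record and chromosomes_from_name are hypothetical, positions are read as floating point numbers so that both bp and cM values fit, and the file name chr1_chr5.pair is made up for the example.

```python
import re
from pathlib import Path
from typing import List, NamedTuple, Tuple


class Anchor(NamedTuple):
    """One half of a pair record (hypothetical container)."""
    name: str
    orientation: int   # 1 or -1; always 1 for non-gene anchors
    position: float    # physical (bp) or genetic (cM)


def parse_pair_record(line: str) -> Tuple[Anchor, Anchor]:
    """Split one pair record into its two anchors."""
    fields = line.split()  # any run of whitespace separates fields
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    first = Anchor(fields[0], int(fields[1]), float(fields[2]))
    second = Anchor(fields[3], int(fields[4]), float(fields[5]))
    for anchor in (first, second):
        if anchor.orientation not in (1, -1):
            raise ValueError(f"orientation must be 1 or -1: {line!r}")
    return first, second


def chromosomes_from_name(path: str) -> List[int]:
    """Collect the chromosome numbers from a file name, ignoring the extension.

    Assumption: every run of digits in the name is a chromosome number.
    """
    stem = Path(path).stem
    return [int(number) for number in re.findall(r"\d+", stem)]


print(parse_pair_record("OsIBCD007232 1 26277701 OsIBCD015822 -1 32139191"))
print(chromosomes_from_name("chr1_chr5.pair"))  # hypothetical file name
```

If a file name contains other digits, for example in a species label, the digit-run assumption no longer holds and the pattern would need to be tightened.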

Chromosome length file

A chromosome length file contains the chromosome lengths of a species. Each line corresponds to one chromosome of the species and has 2 fields separated by one or more whitespace characters. The first field is the chromosome number and the second is the length (in bp or cM, matching the unit used for the anchors). The following is an example from the O. sativa ssp. indica chromosome length file.
5       31202585
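Reading such a file amounts to splitting each line into two fields. The following is a minimal sketch; parse_chromosome_lengths is a hypothetical name, chromosome numbers are kept as strings, and lengths are read as floating point numbers so that cM values are also accepted.

```python
from typing import Dict, Iterable


def parse_chromosome_lengths(lines: Iterable[str]) -> Dict[str, float]:
    """Map each chromosome number to its length."""
    lengths: Dict[str, float] = {}
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()  # any run of whitespace separates fields
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected 2 fields, got {len(fields)}")
        chromosome, length = fields
        lengths[chromosome] = float(length)
    return lengths


print(parse_chromosome_lengths(["5       31202585"]))
```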

Block file

Predicted colinear segments, referred to as blocks, are collected in block files. Each block file starts with a head (the first line) that shows the mg (maximum gap length) values used to search for the blocks. The following is an example of the head.
+++++++++++++++++ MAXIMUM GAP LENGTH 500000 500000
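The mg values can be read back from the head line by taking the numbers that follow the "MAXIMUM GAP LENGTH" label. This is only a sketch; parse_block_file_head is a hypothetical name, and the sketch makes no assumption about which value belongs to which chromosome.

```python
from typing import List


def parse_block_file_head(line: str) -> List[float]:
    """Return the mg values listed after the 'MAXIMUM GAP LENGTH' label."""
    _, label, values = line.partition("MAXIMUM GAP LENGTH")
    if not label:
        raise ValueError(f"not a block file head: {line!r}")
    return [float(value) for value in values.split()]


print(parse_block_file_head("+++++++++++++++++ MAXIMUM GAP LENGTH 500000 500000"))
```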
Each block record spans multiple lines: a head line giving the block size, one line for each pair contained in the block, and a final line giving the statistical significance of the block as a p-value. Block records are separated by empty lines. The following is a block record.
the 1th path length 15
OsIBCD029268 2.01061e+07 LOC_Os10 1.81952e+07 1
OsIBCD029264 2.00692e+07 LOC_Os10 1.83001e+07 -1
OsIBCD029250 1.99637e+07 LOC_Os10 1.85672e+07 1
OsIBCD029222 1.97822e+07 LOC_Os10 1.89026e+07 -1
OsIBCD029198 1.95817e+07 LOC_Os10 1.92271e+07 -1
OsIBCD029186 1.95031e+07 LOC_Os10 1.95171e+07 -1
OsIBCD029169 1.93285e+07 LOC_Os10 1.96994e+07 -1
OsIBCD029166 1.93003e+07 LOC_Os10 1.97443e+07 -1
OsIBCD044787 1.92408e+07 LOC_Os10 1.97995e+07 -1
OsIBCD029122 1.88742e+07 LOC_Os10 2.01553e+07 1
OsIBCD029076 1.84667e+07 LOC_Os10 2.0168e+07 1
OsIBCD029060 1.83421e+07 LOC_Os10 2.03553e+07 1
OsIBCD028992 1.78464e+07 LOC_Os10 2.04401e+07 1
OsIBCD028983 1.77888e+07 LOC_Os10 2.06978e+07 1
OsIBCD044725 1.73026e+07 LOC_Os10 2.07117e+07 -1
>LOCALE p-value : 7.78723e-15
The pairs in block records differ somewhat from the pair records in pair files. The orientation of each gene on its chromosome is not given (if the anchors are genes). Instead, the last field indicates the relative transcriptional orientation: 1 means that the 2 genes have the same transcriptional orientation, and -1 means that they have opposite orientations. The name of a block file should follow the same format as the name of a pair file (see above).
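A block file in the layout described above can be read record by record. The following is a minimal sketch, not the package's own reader; BlockPair, HEAD_RE, PVALUE_RE and parse_blocks are hypothetical names, and each block is returned as a plain dictionary. The sample text is the head line and block record shown above.

```python
import re
from typing import Dict, Iterable, List, NamedTuple, Optional


class BlockPair(NamedTuple):
    """One pair line of a block record (hypothetical container)."""
    first_name: str
    first_position: float
    second_name: str
    second_position: float
    relative_orientation: int  # 1 = same orientation, -1 = opposite


HEAD_RE = re.compile(r"^the\s+(\d+)\w*\s+path\s+length\s+(\d+)$")
PVALUE_RE = re.compile(r"^>LOCALE\s+p-value\s*:\s*(\S+)$")


def parse_blocks(lines: Iterable[str]) -> List[Dict]:
    """Collect block records; the file head and empty lines are skipped."""
    blocks: List[Dict] = []
    current: Optional[Dict] = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("+"):
            continue  # empty separator line or the file head
        head = HEAD_RE.match(line)
        if head:
            current = {"index": int(head.group(1)), "length": int(head.group(2)),
                       "pairs": [], "p_value": None}
            blocks.append(current)
            continue
        if current is None:
            raise ValueError(f"line outside of a block record: {line!r}")
        p_value = PVALUE_RE.match(line)
        if p_value:
            current["p_value"] = float(p_value.group(1))
            continue
        fields = line.split()
        if len(fields) != 5:
            raise ValueError(f"expected 5 fields in a pair line: {line!r}")
        current["pairs"].append(BlockPair(fields[0], float(fields[1]), fields[2],
                                          float(fields[3]), int(fields[4])))
    return blocks


SAMPLE = """\
+++++++++++++++++ MAXIMUM GAP LENGTH 500000 500000
the 1th path length 15
OsIBCD029268 2.01061e+07 LOC_Os10 1.81952e+07 1
OsIBCD029264 2.00692e+07 LOC_Os10 1.83001e+07 -1
OsIBCD029250 1.99637e+07 LOC_Os10 1.85672e+07 1
OsIBCD029222 1.97822e+07 LOC_Os10 1.89026e+07 -1
OsIBCD029198 1.95817e+07 LOC_Os10 1.92271e+07 -1
OsIBCD029186 1.95031e+07 LOC_Os10 1.95171e+07 -1
OsIBCD029169 1.93285e+07 LOC_Os10 1.96994e+07 -1
OsIBCD029166 1.93003e+07 LOC_Os10 1.97443e+07 -1
OsIBCD044787 1.92408e+07 LOC_Os10 1.97995e+07 -1
OsIBCD029122 1.88742e+07 LOC_Os10 2.01553e+07 1
OsIBCD029076 1.84667e+07 LOC_Os10 2.0168e+07 1
OsIBCD029060 1.83421e+07 LOC_Os10 2.03553e+07 1
OsIBCD028992 1.78464e+07 LOC_Os10 2.04401e+07 1
OsIBCD028983 1.77888e+07 LOC_Os10 2.06978e+07 1
OsIBCD044725 1.73026e+07 LOC_Os10 2.07117e+07 -1
>LOCALE p-value : 7.78723e-15
"""

for block in parse_blocks(SAMPLE.splitlines()):
    print(block["index"], block["length"], len(block["pairs"]), block["p_value"])
```

In the sample record the length on the head line equals the number of pair lines, so a reader could compare the two as a consistency check.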


Copyright © 2006 Center of Bioinformatics, Peking University. All rights reserved.