6 Other Utilities
6.1 RBH
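Gene pairs that are Reciprocal Best Hits (RBH) are generally more reliable than one-way best hits. Here we have integrated a convenient and fast tool to identify RBH. The user only needs to provide two files with three columns each: a gene from one genome, its hit in the other genome, and a score (a higher score indicates a better hit). The second file holds the comparison in the reverse direction.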
command
genetribe RBH -h
# Usage: RBH -a input1 -b input2
# Description: obtain Reciprocal Best Hits (RBH)
#
# Example:
# Input1:
# A1 B1 100
# A1 B2 200
# Input2:
# B1 A1 200
# B2 A1 300
# Output:
# A1 B2
#
# Options:
# -h, --help show this help message and exit
# -a FILE input file1
# -b FILE input file2
example
genetribe RBH -a genome1_genome2.score -b genome2_genome1.score > RBH.out
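To show what the RBH criterion computes, here is a minimal Python sketch; it is not the genetribe implementation. It assumes whitespace-separated three-column input (query gene, target gene, score) as in the help example above, treats a higher score as a better hit (which is what that example implies), and keeps the first-seen hit on ties. The function names and the command-line wrapper are hypothetical.

```python
import sys
from typing import Dict, Iterable, List, Tuple


def best_hits(lines: Iterable[str]) -> Dict[str, str]:
    """Return {query: target}, keeping the highest-scoring target per query.

    Assumes three whitespace-separated columns (query, target, score) and that
    a higher score means a better hit; on ties the first hit seen is kept.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        query, target, score = fields[0], fields[1], float(fields[2])
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(lines_ab: Iterable[str],
                         lines_ba: Iterable[str]) -> List[Tuple[str, str]]:
    """Keep (a, b) only if b is a's best hit AND a is b's best hit."""
    best_ab = best_hits(lines_ab)
    best_ba = best_hits(lines_ba)
    return [(a, b) for a, b in best_ab.items() if best_ba.get(b) == a]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical CLI: python rbh_sketch.py input1 input2 > RBH.out
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        for a, b in reciprocal_best_hits(fa, fb):
            print(f"{a}\t{b}")
```

On the help example, A1's best hit is B2 (200 > 100) and B2's only hit is A1, so the pair A1 B2 is reported; B1 also points back to A1, but A1 does not choose B1, so that pair is dropped.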
6.2 longestcds
A genome assembly generally contains tens of thousands of genes, and the total number of transcripts may reach hundreds of thousands; for some analyses, this many coding sequences are unnecessary. Here, we provide a tool that extracts the longest transcript of each gene as its representative sequence.
command
genetribe longestcds -h
# Usage: longestfasta -i pep.fa -s strsplit
# Description: extract longest protein sequence from protein fasta
#
# Options:
# -h, --help show this help message and exit
# -i FILE input file
# -s STR the string for splitting gene from transcript ID
example
genetribe longestcds -i pep.fa -s .
Output
# >AT2G27490
# MRIVGLTGGIASGKSTVSNLFKASGIPVVDADVVARDVLKKGSGGWKRVVAAFGEEILLP
# SGEVDRPKLGQIVFSSDSKRQLLNKLMAPYISSGIFWEILKQWASGAKVIVVDIPLLFEV
# KMDKWTKPIVVVWVSQETQLKRLMERDGLSEEDARNRVMAQMPLDSKRSKADVVIDNNGS
# LDDLHQQFEKVLIEIRRPLTWIEFWRSRQGAFSVLGSVILGLSVCKQLKIGS
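The selection rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the longestfasta source: the gene ID is taken as everything before the last occurrence of the `-s` string in the transcript ID (so a hypothetical `AT2G27490.1` with `-s .` becomes `AT2G27490`), length is measured in residues, the first-seen sequence wins ties, and output is wrapped at 60 residues per line to match the output shown above. All function names are hypothetical.

```python
import sys
from typing import Dict, Iterable, Iterator, TextIO, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence); the ID is the first word of the header line."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            words = line[1:].split()
            seq_id, chunks = (words[0] if words else ""), []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def longest_per_gene(records: Iterable[Tuple[str, str]], sep: str = ".") -> Dict[str, str]:
    """Keep the longest sequence per gene; gene ID = transcript ID before the last `sep`."""
    best: Dict[str, str] = {}
    for transcript_id, seq in records:
        gene = transcript_id.rsplit(sep, 1)[0]  # assumption about the ID scheme
        if gene not in best or len(seq) > len(best[gene]):
            best[gene] = seq  # strictly longer wins, so ties keep the first seen
    return best


def write_fasta(seqs: Dict[str, str], out: TextIO, width: int = 60) -> None:
    """Write sequences under their gene IDs, wrapped at `width` residues per line."""
    for gene, seq in seqs.items():
        out.write(f">{gene}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical CLI: python longest_sketch.py pep.fa . > longest.pep.fa
    with open(sys.argv[1]) as fh:
        write_fasta(longest_per_gene(read_fasta(fh), sys.argv[2]), sys.stdout)
```

Splitting at the last separator rather than the first keeps gene IDs that themselves contain the separator intact; if your annotation uses a different scheme, adjust that one line.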
6.3 CBS
Some collinearity analysis software, such as MCScan, produces outputs that include the gene pairs within collinear blocks. Here, we provide a tool that scores each collinear block with the genetribe algorithm and reports the block positions, the Collinear Block Score (CBS), and the proportion of homologous genes in each block.
command
genetribe CBS -h
# Usage: coreCBS -i input.anchors -a bed1 -b bed2 -o outname
# Description: calculate Collinear Block Score (CBS)
#
# Options:
# -h, --help show this help message and exit
# -i FILE block obtained from MCScan
# -a FILE bed 1
# -b FILE bed 2
# -o STR prefix name of output file
example
genetribe CBS -i aet.rice.lifted.anchors -a aet.bed -b rice.bed -o test
Output
1. test.block_pos
# Chr1 292378812 306611448 10 20843117 21903586 0.685
# Chr1 238973838 247016656 10 18873531 19475406 0.551
# Chr1 193706848 201208886 10 15621354 16238841 0.485
# Chr1 200618116 212196056 10 16665829 17689465 0.625
# Chr1 232172333 235525430 10 18707781 18869428 0.266
# Chr1 260674561 275221462 10 19962772 20806453 0.633
Column | Description |
---|---|
1 | Chromosome name in genome 1 |
2 | Start position of the collinear block in genome 1 |
3 | End position of the collinear block in genome 1 |
4 | Chromosome name in genome 2 |
5 | Start position of the collinear block in genome 2 |
6 | End position of the collinear block in genome 2 |
7 | CBS score of the collinear block |
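Because `test.block_pos` is a plain whitespace-separated table, it is easy to post-process. The Python sketch below, which is not part of genetribe, reads the seven columns described above and keeps the blocks whose CBS reaches a user-chosen threshold, sorted from highest to lowest CBS. The `BlockPos` class, the function names, and any threshold you pass are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class BlockPos:
    """One row of test.block_pos (columns 1-7 in the table above)."""
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    cbs: float


def parse_block_pos(lines: Iterable[str]) -> List[BlockPos]:
    blocks = []
    for line in lines:
        fields = line.split()
        if len(fields) != 7:
            continue  # skip blank or unexpected lines
        c1, s1, e1, c2, s2, e2, score = fields
        blocks.append(BlockPos(c1, int(s1), int(e1), c2, int(s2), int(e2), float(score)))
    return blocks


def high_confidence(blocks: Iterable[BlockPos], min_cbs: float) -> List[BlockPos]:
    """Blocks with CBS >= min_cbs, highest CBS first (min_cbs is user-chosen)."""
    return sorted((b for b in blocks if b.cbs >= min_cbs),
                  key=lambda b: b.cbs, reverse=True)
```

Applied to the six sample rows above with a threshold of 0.6, this would keep the blocks with CBS 0.685, 0.633 and 0.625, in that order.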
2. test.colinearity_info
# block_13 Chr1:32978403-38610676,5:940494-1061952 84 0.40476190476190477 0.393 AET1Gv20118400,Os05g0120300; AET1Gv20125700,Os05g0120100; AET1Gv20125800,Os05g0120200; AET1Gv20130800,Os05g0119400; AET1Gv20131300,Os05g0119000; AET1Gv20131400,Os05g0118900; AET1Gv20131500,Os05g0118800; AET1Gv20131800,Os05g0118700; AET1Gv20133900,Os05g0118000; AET1Gv20134500,Os05g0117864; AET1Gv20134600,Os05g0117798; AET1Gv20125800,Os05g0120000; AET1Gv20130700,Os05g0119700
Column | Description |
---|---|
1 | ID of the collinear block |
2 | Locations of the collinear block in genome 1 and genome 2 (chromosome:start-end, comma-separated) |
3 | Total number of genes in the collinear block |
4 | Ratio of the number of homologous genes to the total number of genes in the collinear block |
5 | CBS score of the collinear block |
6 | All gene pairs in the collinear block (genome 1 gene,genome 2 gene; pairs separated by semicolons) |
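In the sample line above, the sixth column contains a space after each semicolon, so a plain whitespace split would break it apart. The hypothetical Python parser below splits off only the first five columns and then decodes the location field and the gene-pair list, following the column layout in the table above. It is a sketch based on the sample line, not on the genetribe source, so check it against your own files.

```python
from dataclasses import dataclass
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end)


@dataclass
class CollinearBlock:
    """One line of test.colinearity_info (columns 1-6 in the table above)."""
    block_id: str
    regions: List[Region]  # genome 1 region first, then genome 2
    n_genes: int
    homolog_ratio: float
    cbs: float
    pairs: List[Tuple[str, ...]]  # (gene in genome 1, gene in genome 2)


def parse_region(text: str) -> Region:
    """'Chr1:32978403-38610676' -> ('Chr1', 32978403, 38610676)."""
    chrom, span = text.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


def parse_colinearity_info(line: str) -> CollinearBlock:
    # Split off the first five columns only; column 6 may itself contain spaces.
    block_id, location, n_genes, ratio, cbs, pair_text = line.split(None, 5)
    regions = [parse_region(r) for r in location.split(",")]
    pairs = [tuple(p.strip().split(",")) for p in pair_text.split(";") if p.strip()]
    return CollinearBlock(block_id, regions, int(n_genes), float(ratio), float(cbs), pairs)
```

Applied to the sample line above, it yields 13 gene pairs, the regions `('Chr1', 32978403, 38610676)` and `('5', 940494, 1061952)`, and a CBS of 0.393.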