6 Other Utilities

6.1 RBH

Gene pairs that are Reciprocal Best Hits (RBH) generally have higher accuracy. Here we have integrated a convenient and fast tool to identify RBH. The user only needs to supply two input files, each containing three columns of data (in the help example below: two gene IDs followed by a score).
command

genetribe RBH -h

# Usage: RBH -a input1 -b input2
# Description: obtain Reciprocal Best Hits (RBH)
# 
# Exmaple:
#   Input1:
#     A1    B1    100
#     A1    B2    200
#   Input2
#     B1    A1    200
#     B2    A1    300
#   Output:
#     A1    B2
# 
# Options:
#   -h, --help  show this help message and exit
#   -a FILE     input file1
#   -b FILE     input file2

example

genetribe RBH -a genome1_genome2.score -b genome2_genome1.score > RBH.out
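
To make the RBH idea concrete, the following Python sketch reproduces the small example from the help message. It is not the genetribe implementation: the function names are hypothetical, and it assumes that a higher score in the third column means a better hit and that ties are resolved in favor of the first row seen.

```python
def best_hits(rows):
    """Map each query to its highest-scoring target (first row wins ties)."""
    best = {}
    for query, target, score in rows:
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(rows_ab, rows_ba):
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(rows_ab)
    best_ba = best_hits(rows_ba)
    return [(a, b) for a, b in best_ab.items() if best_ba.get(b) == a]


def parse_rows(text):
    """Parse whitespace-separated 'query target score' lines."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            query, target, score = line.split()[:3]
            rows.append((query, target, float(score)))
    return rows


# Input1 and Input2 from the help message above.
input1 = parse_rows("A1 B1 100\nA1 B2 200")
input2 = parse_rows("B1 A1 200\nB2 A1 300")

rbh_pairs = reciprocal_best_hits(input1, input2)
for a, b in rbh_pairs:
    print(a, b, sep="\t")
```

With the help-message inputs, A1's best hit is B2 and B2's best hit is A1, so the only pair reported is A1 and B2, matching the documented output.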

6.2 longestcds

An assembly generally contains tens of thousands of genes, and the total number of transcripts may reach hundreds of thousands; many of these coding sequences are unnecessary for some analyses. Here, we provide a tool to extract the longest transcript of each gene as its representative sequence.
command

genetribe longestcds -h

# Usage: longestfasta -i pep.fa -s strsplit
# Description: extract longest protein sequence from protein fasta
# 
# Options:
#   -h, --help  show this help message and exit
#   -i FILE     input file
#   -s STR      the string for spliting gene from transcript ID

example

genetribe longestcds -i pep.fa -s .


Output

#  >AT2G27490
#  MRIVGLTGGIASGKSTVSNLFKASGIPVVDADVVARDVLKKGSGGWKRVVAAFGEEILLP
#  SGEVDRPKLGQIVFSSDSKRQLLNKLMAPYISSGIFWEILKQWASGAKVIVVDIPLLFEV
#  KMDKWTKPIVVVWVSQETQLKRLMERDGLSEEDARNRVMAQMPLDSKRSKADVVIDNNGS
#  LDDLHQQFEKVLIEIRRPLTWIEFWRSRQGAFSVLGSVILGLSVCKQLKIGS
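
The selection step can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's code: the function names and the toy IDs and sequences are hypothetical, the gene ID is assumed to be the part of the transcript ID before the first occurrence of the split string, and when two transcripts have equal length the first one encountered is kept.

```python
def read_fasta(lines):
    """Yield (sequence_id, sequence) tuples from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Keep only the ID, dropping any description after whitespace.
            header, chunks = line[1:].split()[0], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_per_gene(records, sep="."):
    """Keep the longest sequence for each gene ID (text before `sep`)."""
    longest = {}
    for transcript_id, seq in records:
        gene_id = transcript_id.split(sep, 1)[0]
        if gene_id not in longest or len(seq) > len(longest[gene_id]):
            longest[gene_id] = seq
    return longest


# Toy input: two transcripts of geneA and one of geneB.
toy_fasta = """\
>geneA.1
MKV
>geneA.2
MKVLLA
GG
>geneB.1
MA
"""

longest = longest_per_gene(read_fasta(toy_fasta.splitlines()), sep=".")
for gene_id, seq in longest.items():
    print(f">{gene_id}\n{seq}")
```

As in the output shown above, each record is written under the gene ID rather than the transcript ID.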

6.3 CBS

Some collinearity analysis software, such as MCscan, produces output that lists the gene pairs within each collinear block. Here, we provide a tool that computes statistics for each collinear block according to the genetribe algorithm and reports detailed evaluation results.
command

genetribe CBS -h

# Usage: coreCBS -i input.anchors -a bed1 -b bed2 -o outname
# Description: calculate Collinear Block Score (CBS)
# 
# Options:
#   -h, --help  show this help message and exit
#   -i FILE     block obtaining from MCScan
#   -a FILE     bed 1
#   -b FILE     bed 2
#   -o STR      prefix name of output file

example

genetribe CBS -i aet.rice.lifted.anchors -a aet.bed -b rice.bed -o test

Output

1. test.block_pos

#  Chr1    292378812       306611448       10      20843117        21903586        0.685
#  Chr1    238973838       247016656       10      18873531        19475406        0.551
#  Chr1    193706848       201208886       10      15621354        16238841        0.485
#  Chr1    200618116       212196056       10      16665829        17689465        0.625
#  Chr1    232172333       235525430       10      18707781        18869428        0.266
#  Chr1    260674561       275221462       10      19962772        20806453        0.633

Column  Description
1       Chromosome name in genome 1
2       Start position of the collinear block in genome 1
3       End position of the collinear block in genome 1
4       Chromosome name in genome 2
5       Start position of the collinear block in genome 2
6       End position of the collinear block in genome 2
7       CBS score of the collinear block
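
For downstream filtering, the seven columns can be loaded into named records. The Python sketch below is only an illustration: the `Block` type and `parse_block_pos` function are hypothetical names, the 0.5 cutoff is an arbitrary example value rather than a recommended threshold, and it assumes the file is whitespace-delimited and that the leading `#` in the listing above is a display convention that is not present in the file itself.

```python
from typing import NamedTuple


class Block(NamedTuple):
    chrom1: str   # column 1
    start1: int   # column 2
    end1: int     # column 3
    chrom2: str   # column 4
    start2: int   # column 5
    end2: int     # column 6
    cbs: float    # column 7


def parse_block_pos(lines):
    """Parse whitespace-delimited block_pos lines, skipping malformed ones."""
    blocks = []
    for line in lines:
        fields = line.split()
        if len(fields) != 7:
            continue
        c1, s1, e1, c2, s2, e2, cbs = fields
        blocks.append(Block(c1, int(s1), int(e1), c2, int(s2), int(e2), float(cbs)))
    return blocks


# Three rows copied from the listing above (without the leading '#').
sample = """\
Chr1 292378812 306611448 10 20843117 21903586 0.685
Chr1 238973838 247016656 10 18873531 19475406 0.551
Chr1 232172333 235525430 10 18707781 18869428 0.266
"""

blocks = parse_block_pos(sample.splitlines())
kept = [b for b in blocks if b.cbs >= 0.5]  # example cutoff only
for b in sorted(kept, key=lambda b: b.cbs, reverse=True):
    print(b.chrom1, b.start1, b.end1, b.chrom2, b.start2, b.end2, b.cbs, sep="\t")
```

To process a real run, the same function could be applied to the lines of `test.block_pos` opened with `open()`.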

2. test.colinearity_info

#  block_13        Chr1:32978403-38610676,5:940494-1061952 84      0.40476190476190477     0.393   AET1Gv20118400,Os05g0120300; AET1Gv20125700,Os05g0120100; AET1Gv20125800,Os05g0120200; AET1Gv20130800,Os05g0119400; AET1Gv20131300,Os05g0119000; AET1Gv20131400,Os05g0118900; AET1Gv20131500,Os05g0118800; AET1Gv20131800,Os05g0118700; AET1Gv20133900,Os05g0118000; AET1Gv20134500,Os05g0117864; AET1Gv20134600,Os05g0117798; AET1Gv20125800,Os05g0120000; AET1Gv20130700,Os05g0119700

Column  Description
1       ID of the collinear block
2       Chromosome locations of the collinear block in genome 1 and genome 2
3       Total number of genes in the collinear block
4       Ratio of the number of homologous genes to the number of all genes in the collinear block
5       CBS score of the collinear block
6       All gene pairs in the collinear block
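
Because column 6 itself contains spaces, a line of this file cannot be split naively on whitespace. The Python sketch below shows one way to parse it, assuming the layout seen in the sample row: whitespace between columns, `region1,region2` in column 2, and gene pairs written as `geneA,geneB` separated by `;`. The function names are hypothetical, and the sample line is abbreviated to the first two gene pairs of the row shown above.

```python
def parse_region(region):
    """Split 'Chr1:32978403-38610676' into (chromosome, start, end)."""
    chrom, span = region.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


def parse_colinearity_line(line):
    """Parse one colinearity_info line into a dictionary."""
    # Split at most 5 times so the gene-pair column (which has spaces) stays whole.
    block_id, location, n_genes, ratio, cbs, pairs = line.strip().split(None, 5)
    region1, region2 = location.split(",")
    gene_pairs = [
        tuple(pair.strip().split(","))
        for pair in pairs.split(";")
        if pair.strip()
    ]
    return {
        "block_id": block_id,
        "region1": parse_region(region1),
        "region2": parse_region(region2),
        "n_genes": int(n_genes),
        "homolog_ratio": float(ratio),
        "cbs": float(cbs),
        "gene_pairs": gene_pairs,
    }


# Abbreviated version of the sample row (first two gene pairs only).
sample_line = (
    "block_13\tChr1:32978403-38610676,5:940494-1061952\t84\t"
    "0.40476190476190477\t0.393\t"
    "AET1Gv20118400,Os05g0120300; AET1Gv20125700,Os05g0120100"
)

block = parse_colinearity_line(sample_line)
print(block["block_id"], block["region1"], block["region2"], block["cbs"])
for gene1, gene2 in block["gene_pairs"]:
    print(gene1, gene2, sep="\t")
```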