GNN via Command Line

A tutorial for creating genome neighborhood networks via the command line

 

The following tutorial outlines how to generate a Genome Neighborhood Network (GNN) from a Sequence Similarity Network (SSN) using the command-line scripts.

 

Step 1: prepare an SSN input

 

Generate an SSN with the web-based EFI-EST tool and transfer the file to your personal Biocluster account via SFTP (for example, with WinSCP or Cyberduck). Alternatively, generate the SSN with the command-line scripts.

 

Step 2: run the makegnn script

 

IN A BIOCLUSTER-CONNECTED TERMINAL:

 

Start an interactive session so that the next commands run on a compute node:

-bash-4.1$ qsub -I -q efi       (that is a capital "i")

qsub: waiting for job 638255.biocluster.igb.illinois.edu to start

qsub: job 638255.biocluster.igb.illinois.edu ready

 
Navigate to the directory containing your SSN input. Load the appropriate module and run the following command to generate a Genome Neighborhood Network ("yourSSNtitle" marks required, network-specific input). Parameters shown in square brackets are optional; omit the brackets when including them:
 
-bash-4.1$ module load efignn
-bash-4.1$ makegnn.pl -ssnin "yourSSNtitle".xgmml -n 10 -gnn "yourSSNtitle"-gnn.xgmml -ssnout "yourSSNtitle"-color.xgmml -incfrac 20 [-stats "yourSSNtitle"-stats.tab] [-nomatch "yourSSNtitle"-nomatch.tab]
 
parse edges and nodes from original xgmml

Fetch Edges

Fetch Nodes

Get Network Name

graph name is EthN-Glu-Ligase-PF00120-InterPro51-SSN-AS120(1)

parse nodes for accessions

parse edges to determine clusters

find neighbors

 

Supernode 4, 607 original accessions, simplenumber 1

 

write out gnn xgmml

write out colored ssn network

makegnn.pl finished

 

 
Replace "yourSSNtitle" with the name of your input file (without the .xgmml extension). Download the resulting networks with SFTP.
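Because every file name in the command above derives from the same title, the command can be assembled from a single variable. The following is a minimal sketch, not part of the EFI scripts: the function name build_makegnn_cmd is hypothetical, and it always includes the optional -stats and -nomatch parameters. It only prints the command, so the result can be checked before it is run on the compute node:

```shell
#!/bin/bash
# Hypothetical helper (not part of the EFI tools): prints the makegnn.pl
# command for a given SSN title, using only the flags shown in this tutorial.
build_makegnn_cmd() {
    local title="$1"          # input file name without the .xgmml extension
    local n="${2:-10}"        # neighborhood size (-n), default 10
    local incfrac="${3:-20}"  # co-occurrence threshold (-incfrac), default 20
    printf 'makegnn.pl -ssnin %s.xgmml -n %s -gnn %s-gnn.xgmml -ssnout %s-color.xgmml -incfrac %s -stats %s-stats.tab -nomatch %s-nomatch.tab\n' \
        "$title" "$n" "$title" "$title" "$incfrac" "$title" "$title"
}

# Example: print the command for an input file named mySSN.xgmml
build_makegnn_cmd mySSN
```

Titles that contain spaces or parentheses would need to be quoted before the printed command is pasted into the terminal.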
 

 

 

Description of Variables

-ssnin    The input SSN xgmml file from which the GNN is created

-ssnout   Name of the colorized SSN xgmml file that will be created

-nomatch   Name of the tab file that will be created listing entries that were not matched in ENA (optional)

-n    User-defined number of neighboring sequences to search on either side (+/-); default is 10

-gnn   Name of the GNN xgmml file that will be created

-incfrac   User-defined % co-occurrence threshold; default is 20, allowable values are integers from 1 to 100

-stats   Name of the tab file that will be created containing GNN statistics (optional)
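The allowable range for -incfrac can be checked before a job is started. The sketch below is an illustration only: valid_incfrac is a hypothetical helper, not part of makegnn.pl, and it assumes the 1-100 integer range stated above is the only constraint.

```shell
# Hypothetical pre-flight check (not part of makegnn.pl): returns 0 when the
# value is an integer in the 1-100 range documented for -incfrac.
valid_incfrac() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 100 ]
}

valid_incfrac 20 && echo "20 is accepted"
valid_incfrac 150 || echo "150 is rejected"
```

Running such a check first avoids waiting for an interactive session only to discover a mistyped value.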