A Primer for Using the EFI's Unix Scripts for Sequence Similarity Networks - from FASTA
If you are new to generating SSNs from the command line, please first review the information here.
This primer is for generating Sequence Similarity Networks using a user-defined FASTA file.
step 0: prepare a FASTA file
Prepare a single text or fasta file containing all sequences that you would like included in the final sequence similarity network. Standard FASTA formatting is accepted.
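If your sequences start out in several files, the sketch below shows one way to combine them into a single file and confirm how many sequences it holds. The file names (example_a.fa, example_b.fa, combined.fa) and the helper count_fasta_records are hypothetical and used only for illustration. They are not part of the EFI scripts.

```shell
# Hypothetical helper: count FASTA records by counting header lines,
# which begin with ">" in standard FASTA formatting.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Hypothetical example inputs, created here only so the sketch runs on its own.
printf '>seq1\nMKT\n>seq2\nMAL\n' > example_a.fa
printf '>seq3\nMSV\n' > example_b.fa

# Concatenate the individual files into the single input file.
cat example_a.fa example_b.fa > combined.fa

# Print the number of sequences in the combined file.
count_fasta_records combined.fa
```

Comparing this count with the number of sequences you expect is a quick check that nothing was lost or duplicated while assembling the file.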
Transfer this file to your personal Biocluster account via SFTP (for example, with WinSCP or Cyberduck).
IN A BIOCLUSTER-CONNECTED TERMINAL:
Start an interactive session so that the next commands run on a compute node:
-bash-4.1$ qsub -I -q efi (that is a capital "i")
qsub: waiting for job 638255.biocluster.igb.illinois.edu to start
qsub: job 638255.biocluster.igb.illinois.edu ready
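Navigate to the directory containing your FASTA file. Run the following commands to convert the line endings of text files generated on non-Linux operating systems (dos2unix handles Windows-style line endings, mac2unix handles classic Mac-style line endings):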
-bash-4.1$ dos2unix "yourfilename".fa
-bash-4.1$ mac2unix "yourfilename".fa
dos2unix: converting file Modified.doro.txt to UNIX format ...
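To check whether a file still contains Windows-style line endings, you can count the lines that include a carriage return, as in the sketch below. The helper count_cr_lines and the example file names are hypothetical. The sketch uses tr as a stand-in for dos2unix so that it runs on systems where dos2unix is not installed. Use dos2unix or mac2unix on your real files, since tr -d '\r' is not suitable for classic Mac files, where the carriage return is the only line separator.

```shell
# Hypothetical helper: print the number of lines containing a carriage return.
# "|| true" keeps the exit status at zero when grep finds no matches.
count_cr_lines() {
    grep -c "$(printf '\r')" "$1" || true
}

# Hypothetical example file with Windows-style (CRLF) line endings.
printf '>seq1\r\nMKT\r\n' > example_dos.fa

# Lines with a carriage return before conversion.
count_cr_lines example_dos.fa

# Stand-in for dos2unix: strip carriage returns into a new file.
tr -d '\r' < example_dos.fa > example_unix.fa

# Lines with a carriage return after conversion.
count_cr_lines example_unix.fa
```

A count of zero after conversion suggests the file is ready for the next step.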
step 1: BLAST and generate plots
Navigate to the directory containing your FASTA and DAT outputs. Run generatedata.pl to BLAST your FASTA sequences and generate statistical plots. At this point, you may supply only the FASTA file as input, or you may combine your FASTA sequences with any Pfam or InterPro identifier by also specifying the -pfam or -ipro option in the generatedata.pl command.
step 3: generate networks
Run the analyzedata.pl program as described here.