*************************************************************************
*****************************  About:  **********************************
*************************************************************************
DSDv0.5 is a diffusion state distance calculation program. It uses global
topological properties of graphs through random walks to compute
proximity in terms of node's funcationality in graphs such as
protein-protein interaction networks.

If you use DSD, please cite:
Cao M, Zhang H, Park J, Daniels NM, Crovella ME, Cowen LJ, Hescott B. 
(2013) Going the Distance for Protein Function Prediction: A New 
Distance Metric for Protein Interaction Networks. 
PLoS ONE 8(10): e76339. doi:10.1371/journal.pone.0076339 .

DSD is licensed under the GNU public license version 2.0. If you would
like to license DSD in an environment where the GNU public license is
unacceptable (such as inclusion in a non-GPL software package) commercial
Matt licensing is available through Tufts offices of Technology Transfer.
Contact cowen@cs.tufts.edu for more information.
Contact mcao01@cs.tufts.edu for issues involving the code.
Address: 161 College Ave., Medford, MA 02155, USA

*************************************************************************
****************************  Installation:  ****************************
*************************************************************************
To run it, simply copy all source files to the directory you want DSD to
run from and type in the command to run it. In order to support other
formats, we also provide a program PPIConvert that converts the matrix
represented PPI networks into PPI list file, which is acceptable to
DSD.

The code requires Python 2.7+ and numpy installed.


*************************************************************************
****************************  Overview:  ********************************
*************************************************************************
DSD takes a PPI file as input. The PPI file must be that each line
contains a PPI and the two interactors separated by comma/tab/space from
the biginning of lines.

It will firstly compute the largest connected component, calculate DSD
for all pairs of nodes in the component, and then output in one of the
following formats (tab delimited):

Type 1, "matrix" -- it contains a N by N DSD matrix and the node IDs
                    are at the first line and the fist row for all N
                    nodes in the largest connected component

Type 2, "list" -- it contains three columns, where the first two columns
                  are interactors from the input file and the third
                  column as the DSD value between the two nodes; NA if
                  either of the two nodes is not in the largest component

Type 3, "top" -- it contains for each node in the largest component one
		 line where the K nodes with lowest DSD are followed.


*************************************************************************
****************************  Command Line ******************************
*************************************************************************
usage: DSDmain.py [-h] [-n NRW] [-o OUTFILE] [-q] [-f] [-m {1,2,3}]
                  [--outformat {matrix,list,top}] [-k NTOP] [-t THRESHOLD]
                  infile

parses PPIs from infile and calculates DSD

positional arguments:
  infile                read PPIs from infile, either a .csv or .tab file that
                        contains a tab/comma/space delimited table with both
                        IDs at first row and first column, or a .list file
                        that contains for each line one interacting pair

optional arguments:
  -h, --help            show this help message and exit
  -c, --converge        calculate converged DSD
  -n NRW, --nRW NRW     length of random walks, 5 by default
  -o OUTFILE, --outfile OUTFILE
                        output DSD file name, tab delimited tables, stdout by
                        default
  -q, --quiet           turn off status message
  -f, --force           calculate DSD for the whole graph despite it is not
                        connected if it is turned on; otherwise, calculate DSD
                        for the largest component
  -m {1,2,3}, --outFMT {1,2,3}
                        the format of output DSD file: type 1 for matrix; type
                        2 for pairs at each line; type 3 for top K proteins
                        with lowest DSD. Type 1 by default
  --outformat {matrix,list,top}
                        the format of output DSD file: 'matrix' for matrix,
                        type 1; 'list' for pairs at each line, type 2; 'top'
                        for top K proteins with lowest DSD, type 3. 'matrix'
                        by default
  -k NTOP, --nTop NTOP  if chosen to output lowest DSD nodes, output at most K
                        nodes with lowest DSD, 10 by default
  -t THRESHOLD, --threshold THRESHOLD
                        threshold for PPIs' confidence score, if applied
                        
                        
*************************************************************************
******************************  Examples ********************************
*************************************************************************
In the downloaded package, two test files are included: 
small.tab and toy.example,
on which you can run using the following command:

$python DSDmain.py small.tab
$python DSDmain.py toy.example

Other files: testAllMatrix.dat, testColMatrix.dat, testRowMatrix.dat,
testOnlyMatrix.dat are matrix represented files with/without node ID
at rows/columns. You can run:

$python PPIconvert.py testAllMatrix.dat -o PPIList1
$python PPIconvert.py testColMatrix.dat -o PPIList2
$python PPIconvert.py testRowMatrix.dat -o PPIList3
$python PPIconvert.py testOnlyMatrix.dat -o PPIList4

And you will have these four files:
PPI1.list
PPI2.list
PPI3.list
PPI4.list
which contains one PPI at a line and you can feed directly into
DSD program