NAME

WWW::PDB - Perl interface to the Protein Data Bank

SYNOPSIS

  use WWW::PDB qw(:all);

  # set directory for caching downloads
  WWW::PDB->cache('/foo/bar');
  
  my $fh = get_structure('2ili');
  print while <$fh>;
  
  my @pdbids = WWW::PDB->keyword_query('carbonic anhydrase');
  for(@pdbids) {
      my $citation = WWW::PDB->get_primary_citation_title($_);
      my @chains   = WWW::PDB->get_chains($_);
      printf("%s\t%s\t[%s]\n", $_, $citation, join(', ', @chains));
  }

  my $seq = q(
      VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTK
      TYFPHFDLSHGSAQVKGHGKKVADALTAVAHVDDMPNAL
  );
  print WWW::PDB->blast($seq, 10.0, 'BLOSUM62', 'HTML');

DESCRIPTION

The Protein Data Bank (PDB) was established in 1971 as a repository of the atomic coordinates of protein structures (Bernstein et al., 1977). It has since outgrown that role, proving invaluable not only to the research community but also to students and educators (Berman et al., 2000).

WWW::PDB is a Perl interface to the Protein Data Bank. It provides functions for retrieving files, optionally caching them locally. Additionally, it wraps the functionality of the PDB's SOAP web services.

FUNCTIONS

CUSTOMIZATION

Let's start with some functions that let you customize how the module does its job. You probably won't use any of these very often (if at all) except for cache, which is recommended for anyone who expects to do extensive work with a set of files: that way you don't waste resources downloading them each time.

WWW::PDB->ftp( [ $HOST ] )

Returns the host name for the PDB FTP archive, first setting it to $HOST if it's specified. Default value is ftp.wwpdb.org.

WWW::PDB->cache( [ $DIR ] )

Returns the local cache directory, first setting it to $DIR if it's specified. If defined, the module will look for files here first and also use the directory to store any downloads.

WWW::PDB->ns( [ $URI ] )

Returns the namespace URI for the PDB web services, first setting it to $URI if it's specified. Default value is http://www.pdb.org/pdb/services/pdbws.

WWW::PDB->proxy( [ $URI ] )

Returns the proxy for the PDB web services, first setting it to $URI if it's specified. Default value is http://www.pdb.org/pdb/services/pdbws.

WWW::PDB->soap( [ $CLIENT ] )

Returns the client SOAP::Lite object used by this module to talk to the PDB's SOAP interface, first setting it to $CLIENT if it's specified. It's best not to access it directly, but if you must, this is how.
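
All five functions above share the same get-or-set calling convention. A minimal sketch of that convention follows. My::Settings is a hypothetical package written for illustration, not the module's actual source; only the ftp default is taken from the text above.

```perl
use strict;
use warnings;

package My::Settings;    # hypothetical stand-in, not part of WWW::PDB

# Default FTP host as documented above; cache has no default.
my %setting = (ftp => 'ftp.wwpdb.org');

# Class-method accessor: store the argument if one was passed,
# then return the current value either way.
sub ftp {
    my ($class, $host) = @_;
    $setting{ftp} = $host if defined $host;
    return $setting{ftp};
}

sub cache {
    my ($class, $dir) = @_;
    $setting{cache} = $dir if defined $dir;
    return $setting{cache};    # undef until a directory has been set
}

package main;

my $host = My::Settings->ftp();                # read the current value
my $dir  = My::Settings->cache('/foo/bar');    # set, then get the new value back
print "host=$host cache=$dir\n";
```

With this convention a single call such as WWW::PDB->cache('/foo/bar') both configures the module and reports the value now in effect.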

FILE RETRIEVAL

Each of the following functions takes a PDB ID as input and returns a file handle (or undef on failure). You can import these into your namespace with the file tag, as in use WWW::PDB qw(:file).

get_structure( $PDBID )

Retrieves the structure in PDB format.

get_structure_factors( $PDBID )

Retrieves the structure factors file.
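
Because these functions return undef on failure, it is worth checking the handle before reading from it. The sketch below shows one way to consume such a handle. count_atom_records is a hypothetical helper, and an in-memory handle with made-up, loosely formatted lines stands in for the handle get_structure would return, so the example runs without network access.

```perl
use strict;
use warnings;

# Hypothetical helper: count ATOM records read from a PDB-format handle.
sub count_atom_records {
    my ($fh) = @_;
    return unless defined $fh;    # propagate the undef-on-failure convention
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /^ATOM  /;
    }
    return $count;
}

# Stand-in data (not real coordinates). With the module loaded, the handle
# would instead come from something like: my $fh = get_structure('2ili');
my $sample = join '',
    "HEADER    EXAMPLE\n",
    "ATOM      1  N   HIS A   4      30.1   1.2   8.9\n",
    "ATOM      2  CA  HIS A   4      31.0   2.3   9.1\n",
    "HETATM    3 ZN    ZN A 262      14.4   0.1  15.3\n",
    "END\n";
open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";

my $atoms = count_atom_records($fh);
print "ATOM records: $atoms\n";
```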

PDB ID STATUS

The following functions deal with the status of PDB IDs. You can import them into your namespace with the status tag: use WWW::PDB qw(:status).

get_status( $PDBID )

Finds the status of the structure with the given $PDBID. Returns one of CURRENT, OBSOLETE, UNRELEASED, MODEL, or UNKNOWN.

is_current( $PDBID )

Checks whether the specified $PDBID corresponds to a current structure. Implemented for orthogonality; all this does is check whether get_status returns CURRENT.

is_obsolete( $PDBID )

Checks whether the specified $PDBID corresponds to an obsolete structure. Unlike the other status checks, this one is defined by the PDB web services interface itself.

is_unreleased( $PDBID )

Checks whether the specified $PDBID corresponds to an unreleased structure. Implemented for orthogonality; all this does is check whether get_status returns UNRELEASED.

is_model( $PDBID )

Checks whether the specified $PDBID corresponds to a model structure. Implemented for orthogonality; all this does is check whether get_status returns MODEL.

is_unknown( $PDBID )

Checks whether the specified $PDBID is unknown. Implemented for orthogonality; all this does is check whether get_status returns UNKNOWN.
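
The relationship between get_status and the predicates derived from it can be sketched as follows. This is an illustration, not the module's source: %STATUS_OF and lookup_status are hypothetical stand-ins for the web service call, the status_is_* names are invented to avoid clashing with the real functions, and the statuses assigned to the placeholder IDs are made up.

```perl
use strict;
use warnings;

# Hypothetical status table standing in for the remote lookup.
my %STATUS_OF = ('2ili' => 'CURRENT', '1abc' => 'OBSOLETE');

sub lookup_status {
    my ($pdbid) = @_;
    my $key = lc $pdbid;
    return exists $STATUS_OF{$key} ? $STATUS_OF{$key} : 'UNKNOWN';
}

# Each predicate only compares the status string against one constant.
sub status_is_current    { lookup_status($_[0]) eq 'CURRENT' }
sub status_is_unreleased { lookup_status($_[0]) eq 'UNRELEASED' }
sub status_is_model      { lookup_status($_[0]) eq 'MODEL' }
sub status_is_unknown    { lookup_status($_[0]) eq 'UNKNOWN' }

for my $id (qw(2ili 1abc 9zzz)) {
    printf "%s\t%s\t%s\n", $id, lookup_status($id),
        status_is_current($id) ? 'current' : 'not current';
}
```

When several of these answers are needed for the same ID, calling get_status once and comparing the result locally avoids repeating the lookup.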

PDB WEB SERVICES

The following methods are the interface to the PDB web services.

blast( $SEQUENCE , $CUTOFF , $MATRIX , $OUTPUT_FORMAT )
blast( $PDBID , $CHAINID, $CUTOFF , $MATRIX , $OUTPUT_FORMAT )
blast( $SEQUENCE , $CUTOFF )
blast( $PDBID , $CHAINID , $CUTOFF )

Performs a BLAST against sequences in the PDB and returns the output of the BLAST program. XML is used if the output format is unspecified.

fasta( $SEQUENCE , $CUTOFF )
fasta( $PDBID , $CHAINID , $CUTOFF )

Takes either a sequence or a PDB ID and chain identifier, and runs FASTA using the specified cutoff. The results are overloaded to give PDB IDs when used as strings, but they can also be explicitly probed for a pdbid or FASTA cutoff:

  printf("%s %s %s\n", $_, $_->pdbid, $_->cutoff)
      for $pdb->fasta("2ili", "A");
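
The string overloading described above could look roughly like the following sketch. My::FastaHit is a hypothetical class written for illustration, not the class the module returns; only the pdbid and cutoff accessor names come from the text, and the values are made up.

```perl
use strict;
use warnings;

package My::FastaHit;    # hypothetical, for illustration only

# Stringify to the PDB ID; fallback lets eq, ne, etc. use the string form.
use overload '""' => sub { $_[0]->pdbid }, fallback => 1;

sub new {
    my ($class, %args) = @_;
    return bless { pdbid => $args{pdbid}, cutoff => $args{cutoff} }, $class;
}

sub pdbid  { $_[0]{pdbid} }
sub cutoff { $_[0]{cutoff} }

package main;

# Made-up hits; real ones would come from a fasta() call.
my @hits = (
    My::FastaHit->new(pdbid => '2ILI', cutoff => 10),
    My::FastaHit->new(pdbid => '1CA2', cutoff => 10),
);

# $_ in string context yields the PDB ID, matching $_->pdbid.
printf("%s %s %s\n", $_, $_->pdbid, $_->cutoff) for @hits;
```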

get_chain_length( $PDBID , $CHAINID )

Returns the length of the specified chain.

get_chains( $PDBID )

Returns a list of all the chain identifiers for a given structure, or a reference to such a list in scalar context.
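
Several functions in this section return a list in list context and a reference to that list in scalar context. A minimal sketch of this convention using Perl's wantarray follows; chain_ids is a hypothetical function with hard-coded data, not the module's implementation.

```perl
use strict;
use warnings;

sub chain_ids {
    my @chains = qw(A B);    # made-up data
    # wantarray is true in list context and false in scalar context.
    return wantarray ? @chains : \@chains;
}

my @list = chain_ids();    # list context: the identifiers themselves
my $ref  = chain_ids();    # scalar context: a reference to the list

print scalar(@list), " chains: @list\n";
print "first chain via reference: $ref->[0]\n";
```

The scalar form avoids copying a long list, which may matter for calls such as get_current_pdbids that can return many IDs.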

get_cif_chain( $PDBID , $CHAINID )

Converts the specified author-assigned chain identifier to its mmCIF equivalent.

get_cif_chain_length( $PDBID , $CHAINID )

Returns the length of the specified chain, just like get_chain_length, except it expects the chain identifier to be the mmCIF version.

get_cif_chains( $PDBID )

Returns a list of all the mmCIF chain identifiers for a given structure, or a reference to such a list in scalar context.

get_cif_residue( $PDBID , $CHAINID , $RESIDUEID )

Converts the specified author-assigned residue identifier to its mmCIF equivalent.

get_current_pdbids( )

Returns a list of the identifiers (PDB IDs) corresponding to "current" structures (i.e. not obsolete, models, etc.), or a reference to such a list in scalar context.

get_ec_nums( @PDBIDS )
get_ec_nums( )

Retrieves the Enzyme Commission (EC) numbers associated with the specified PDB IDs, or with all PDB structures if called with no arguments.

get_entities( $PDBID )

Returns a list of the entity IDs for a given structure, or a reference to such a list in scalar context.

get_genome_details( )

Retrieves genome details for all PDB structures.

get_kabsch_sander( $PDBID , $CHAINID )

Finds secondary structure for the given chain.

get_obsolete_pdbids( )

Returns a list of the identifiers (PDB IDs) corresponding to obsolete structures, or a reference to such a list in scalar context.

get_primary_citation_title( $PDBID )

Finds the title of the specified structure's primary citation (if it has one).

get_pubmed_ids( )

Retrieves the PubMed IDs associated with all PDB structures.

get_pubmed_id( $PDBID )

Retrieves the PubMed ID associated with the specified structure.

get_release_dates( @PDBIDS )

Maps the given PDB IDs to their release dates.

get_sequence( $PDBID , $CHAINID )

Retrieves the sequence of the specified chain.

get_space_group( $PDBID )

Returns the space group of the specified structure (the symmetry.space_group_name_H_M field according to the mmCIF dictionary).

homology_reduction_query( @PDBIDS , $CUTOFF )

Reduces the set of PDB IDs given as input based on sequence homology.
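
Because Perl flattens argument lists, the cutoff travels as the last element after the IDs. The sketch below shows how a function with this signature can separate the two. split_ids_and_cutoff is a hypothetical helper, the cutoff value is arbitrary, and whether the module handles its arguments this way is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: peel the trailing cutoff off a flattened argument list.
sub split_ids_and_cutoff {
    my @args   = @_;
    my $cutoff = pop @args;    # last argument is the cutoff
    return (\@args, $cutoff);
}

my @pdbids = qw(2ili 1ca2 4hhb);
my ($ids, $cutoff) = split_ids_and_cutoff(@pdbids, 90);
print scalar(@$ids), " IDs, cutoff $cutoff\n";
```

In practice this means the cutoff cannot be omitted: without it, the last PDB ID would be taken as the cutoff under this scheme.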

keyword_query( $KEYWORD_EXPR [, $EXACT_MATCH [, $AUTHORS_ONLY ] ] )

Runs a keyword query with the specified expression. Search can be made stricter by requiring an exact match or restricting the search to authors. Both boolean arguments are optional and default to false. Returns a list of PDB IDs or a reference to such a list in scalar context.

pubmed_abstract_query( $KEYWORD_EXPR )

Runs a keyword query on PubMed Abstracts. Returns a list of PDB IDs or a reference to such a list in scalar context.

UNTESTED

The following methods are defined by the PDB web services interface, so they are wrapped here, but they have not been tested.

get_annotations( $STATE_FILE )

Given a string in the format of a ViewState object from Protein Workshop, returns another ViewState object.

get_atom_site( $PDBID )

Returns the first atom site object for a structure.

get_atom_sites( $PDBID )

Returns the atom site objects for a structure.

get_domain_fragments( $PDBID , $CHAINID , $METHOD )

Finds all structural protein domain fragments for a given structure.

get_first_struct_conf( $PDBID )

Finds the first struct_conf for the given structure.

get_first_struct_sheet_range( $PDBID )

Finds the first struct_sheet_range for the given structure.

get_struct_confs( $PDBID )

Finds the struct_confs for the given structure.

get_struct_sheet_ranges( $PDBID )

Finds the struct_sheet_ranges for the given structure.

get_structural_genomics_pdbids( )

Finds info for structural genomics structures.

xml_query( $XML )

Runs a query specified as XML, which covers nearly any query that can be constructed.

REFERENCES

  1. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28(1), 235-242.

  2. Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, Jr., E. F., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). Eur. J. Biochem. 80(2), 319-324.

SEE ALSO

The PDB can be accessed via the web at http://www.pdb.org/. The Java API documentation for the PDB's web services is located at http://www.rcsb.org/robohelp_f/webservices/pdbwebservice.html.

BUGS

Please report them: http://rt.cpan.org/Public/Dist/Display.html?Name=WWW-PDB

AUTHOR

Miorel-Lucian Palii, <mlpalii@gmail.com>

COPYRIGHT AND LICENSE

Copyright (C) 2008-2009 by Miorel-Lucian Palii

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.