Part 2: Antigen Selection Pipeline
2.2.4. Localization Screening
Algorithm Card
Input: non-homologous.fasta; PSORTb results from the public web service, saved in PSORTb-results.txt
Output: localized.fasta - Proteins with promising locations to be analyzed further (extracellular/cell membrane)
Brief Summary: Filter proteins based on probable localization, eliminating proteins confidently classified inside the cell.
Proteins can be found anywhere in a cell, but surface-bound antigens are much more effective because the immune system can reach them directly. It’s time to filter our candidate proteins by their localization. PSORTb is a tool that predicts where a protein is located based on its sequence, returning a classification and a confidence score from 1 to 10. You can upload the previous step’s output file (non-homologous.fasta) to the PSORTb public web service, and the result will be sent to you by email in about 20-30 minutes.
With the result saved to ‘PSORTb-results.txt’, we can use the following script in combination with filter_fasta.py to eliminate proteins that are most likely located inside the cytoplasm or on the cytoplasmic membrane. This will be true for a majority of proteins, so don’t worry if this step eliminates 60-80% of the remaining candidates.
process_psortb_results.py
Note that the constraints, like the previous ones, are fairly loose: instead of keeping only proteins we’re confident are well localized, we’re eliminating proteins we’re confident are located inside the cell. Tool results, even those of the best tools, may be inaccurate, so we’ll review experimentally determined locations for the finalists to compensate for this added flexibility. Defining the criteria this way also lets us keep the many proteins with a localization of ‘unknown’, meaning PSORTb’s heuristics could not determine exactly where they’d be located (or more than one location is probable).
With the script, going through this step is as easy as running two commands:
step3.sh
Only 653 proteins are written to ‘localized.fasta’ and will be analyzed in the next step.
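To make the "eliminate only what we're confident is internal" rule concrete, here is a minimal sketch of that filtering logic. It is not the process_psortb_results.py script itself. It assumes the PSORTb results are tab-separated with one header row and three columns (sequence ID, localization, score), that the internal locations are labeled Cytoplasmic and CytoplasmicMembrane, and that a cutoff of 7.5 counts as "confident". The function name, the cutoff and the sample rows are all hypothetical and should be adjusted to your actual file.

```python
import csv
import io

# Assumed (hypothetical) cutoff; the threshold used by the original script is not shown.
CONFIDENCE_CUTOFF = 7.5
# Assumed labels for locations inside the cell.
INTERNAL_LOCATIONS = {"Cytoplasmic", "CytoplasmicMembrane"}


def ids_to_keep(handle, cutoff=CONFIDENCE_CUTOFF):
    """Return IDs of proteins NOT confidently predicted to be inside the cell.

    Assumes tab-separated rows of: sequence ID, localization, score,
    preceded by a single header row.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    kept = []
    for row in reader:
        if len(row) < 3:
            continue  # ignore blank or malformed lines
        seq_id, location, score = row[0].strip(), row[1].strip(), float(row[2])
        confidently_internal = location in INTERNAL_LOCATIONS and score >= cutoff
        if not confidently_internal:
            # 'Unknown' and low-confidence predictions are kept on purpose
            kept.append(seq_id)
    return kept


# Made-up example rows, for illustration only
sample = io.StringIO(
    "SeqID\tLocalization\tScore\n"
    "protA\tCytoplasmic\t9.97\n"
    "protB\tOuterMembrane\t10.00\n"
    "protC\tUnknown\t2.50\n"
    "protD\tCytoplasmicMembrane\t5.00\n"
)
print(ids_to_keep(sample))  # ['protB', 'protC', 'protD']
```

Only protA is dropped: protD is predicted to be internal, but not confidently, so it stays in the candidate list along with the ‘unknown’ protein. The resulting IDs are what a FASTA filter would then use to write localized.fasta.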
Note: Depending on the bacterium you’re analyzing, DeepLocPro may yield better results. To choose a tool for this step, run different pipelines, each using one of the tools, then see which suggested candidates have experimentally known locations. K. pneumoniae protein localization is better predicted by DeepLocPro, possibly because this bacterium has been studied more, while PSORTb produces much better results for A. baumannii.
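One way to put a number on that comparison is sketched below. It reads the advice above as "check each tool's candidates against experimentally known locations", which is an assumption, and reports what fraction of those candidates is known to be on the surface. The function name, the set of surface labels and the example data are hypothetical and only illustrate the idea.

```python
# Assumed labels for surface-exposed locations
SURFACE_LOCATIONS = {"OuterMembrane", "Extracellular"}


def surface_hit_rate(candidates, known_locations):
    """Fraction of candidates with a known location that are surface-exposed.

    candidates: protein IDs suggested by one pipeline.
    known_locations: dict of protein ID -> experimentally determined location.
    Returns None when no candidate has a known location.
    """
    with_known = [c for c in candidates if c in known_locations]
    if not with_known:
        return None
    hits = sum(1 for c in with_known if known_locations[c] in SURFACE_LOCATIONS)
    return hits / len(with_known)


# Made-up example data, for illustration only
known = {"p1": "OuterMembrane", "p2": "Cytoplasmic", "p3": "Extracellular"}
tool_a_candidates = ["p1", "p3", "p9"]  # p9 has no known location
tool_b_candidates = ["p1", "p2"]

print(surface_hit_rate(tool_a_candidates, known))  # 1.0
print(surface_hit_rate(tool_b_candidates, known))  # 0.5
```

A higher rate suggests the tool suits that species better, but with only a few experimentally characterized proteins the number should be treated as a rough guide.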