Part 2: Antigen Selection Pipeline
2.2.5. Essential proteins and virulence factors
Algorithm Card
Input: localized.fasta
Output: virulence.fasta - Proteins from localized.fasta that have adequate matches in both VFDB and DEG
Brief Summary: Filter remaining proteins based on expected virulence and essentiality.
The last step in the pipeline, which is intended to return a small number of candidates that we can review manually, is to keep only the remaining proteins that are likely to be both essential and virulence factors. This can be done with methods similar to the ones used in [Homologous Protein Removal], this time searching against the Database of Essential Genes (DEG) and the Virulence Factor Database (VFDB):
step4.sh

Note that DEG may have some entries where the sequence of the protein reads “Not available” instead of a valid amino acid sequence. To address that, the script below processes all entries and removes the ones that don’t contain a full sequence, ensuring DIAMOND can read the final file.
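The filtering idea can be illustrated with a minimal sketch. This is not the step4_filter_deg.py script itself, only an approximation under stated assumptions: the DEG file is plain FASTA, a missing sequence appears as the literal text "Not available" on the sequence line, and any record whose sequence contains characters outside the usual amino acid alphabet should be dropped. The function names and the example records (DEG0001, DEG0002) are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

# IUPAC amino acid letters plus ambiguity codes and the stop symbol.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBXZJUO*")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_valid_protein(seq: str) -> bool:
    """True if seq is non-empty, is not the placeholder, and uses only residue letters."""
    if not seq or seq.strip().lower() == "not available":
        return False
    return set(seq.upper()) <= VALID_RESIDUES


def filter_valid(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Keep only the records that carry a usable amino acid sequence."""
    for header, seq in records:
        if is_valid_protein(seq):
            yield header, seq


# Hypothetical records, for illustration only.
example = """>DEG0001 essential protein
MKTAYIAKQR
QISFVKSHFS
>DEG0002 missing sequence
Not available
""".splitlines()

kept = [header for header, _ in filter_valid(read_fasta(example))]
print(kept)  # ['DEG0001 essential protein']
```

For a real DEG download, the same functions could be applied to an open file handle, writing each kept record back out as a header line followed by its sequence.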
step4_filter_deg.py

Parameters for the matches are once again taken from similar pipelines in the literature. The ‘virulence.fasta’ file contains 21 candidate proteins, which need to be analyzed manually.
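Selecting proteins with matches in both databases amounts to intersecting two sets of query IDs. The sketch below shows one way this could be done. It is not the logic of step4.sh itself, and it assumes that each DIAMOND search was written in tabular format (query ID in the first tab-separated column) and that hits were already filtered by the chosen thresholds. All protein IDs, hit lines, and function names are hypothetical.

```python
from typing import Iterable, Iterator, Set


def hit_queries(tsv_lines: Iterable[str]) -> Set[str]:
    """Collect query IDs (first column) from tabular alignment output."""
    queries = set()
    for line in tsv_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        queries.add(line.split("\t", 1)[0])
    return queries


def select_records(fasta_lines: Iterable[str], keep_ids: Set[str]) -> Iterator[str]:
    """Yield the FASTA lines of records whose ID (first header token) is in keep_ids."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0] in keep_ids
        if keep:
            yield line


# Hypothetical hits against each database, for illustration only.
deg_hits = ["protA\tDEG1\t45.0", "protB\tDEG2\t60.1"]
vfdb_hits = ["protB\tVFG1\t52.3", "protC\tVFG2\t70.0"]

# A protein is kept only if it has a hit in both DEG and VFDB.
both = hit_queries(deg_hits) & hit_queries(vfdb_hits)

localized = [">protA desc", "MKT", ">protB desc", "MAA", "GGG", ">protC", "MCC"]
selected = list(select_records(localized, both))
print(selected)  # ['>protB desc', 'MAA', 'GGG']
```

Using a set intersection keeps the "both databases" requirement explicit, so relaxing it to "either database" would only require swapping `&` for `|`.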