Data import

Currently, AlphaStats allows the analysis of five quantitative proteomics software packages: AlphaPept, DIA-NN, FragPipe, MaxQuant and Spectronaut. As the output of these software differs significantly data needs to be loaded in customized loaders.

Imported proteomics data and metadata can be combined in a DataSet, which will be used for the downstream analysis.

import alphastats 

maxquant_data = alphastats.MaxQuantLoader(
    file="testfiles/maxquant_proteinGroups.txt"
)

dataset = alphastats.DataSet(
    loader = maxquant_data, 
    metadata_path="../testfiles/maxquant/metadata.xlsx", 
    sample_column="sample"
)

Importing data from a Proteomics software

To import data from these software the columns describing intensity and Protein/ProteinGroups have to be specified when loading. Each loader has a default for the intensity_columnand the index_column, however, the column naming can vary depending on the version of the software and personalized settings.

As we are dealing with wide data, a column represents the intensity for one sample. Thus the intensity_column must be specified as follow: For MaxQuant "LFQ intensity [sample]" or for AlphaPept "[sample]_LFQ" (this is already set as default).

Upon data import, the proteomics data gets processed in an internal format.

Additional modifications by AlphaStats

When importing the data, AlphaStats will identify potential contaminations based on a contaminant library, created by Frankenfield et al. 2022. This information will be added as an extra column to the imported data and can either be ignored or used for filtering in the preprocessing step.

AlphaPept

Alphapept output can either be imported as results_proteins.csv or results.hdf.

Intensity types AlphaPept either described the raw intensity or the free quantification (LFQ) intensity. By default AlphaStats uses the LFQ-Intensity for the downstream analysis.

Further, AlphaStats will identify “Reverse” - Proteins.

Find more details about the file format here.

import alphastats
alphapept_data = alphastats.AlphaPeptLoader(file="testfiles/alphapept_results_proteins.csv")

MaxQuant

MaxQuant generates multiple files as output. For the downstream analysis the proteinGroups.txt file, containing the aggregated protein intensities is sufficient.

Intensity types MaxQuant annotates different intensity types: raw intensity, label - free quantification (LFQ) intensity and intensity-based absolute quantification (iBAQ) intensity. The default settings are “LFQ intensity [sample]”.

Find more details about the file format here

import alphastats 
maxquant_data = alphastats.MaxQuantLoader(file="testfiles/maxquant_proteinGroups.txt")

DIA-NN

For the analysis of DIA-NN output use report_final.pg_matrix.tsv. Versions before 1.7. are not supported.

Find more details about the file format here.

import alphastats 
diann_data = alphastats.DIANNLoader(file="testfiles/diann_report_final.pg_matrix.tsv")

FragPipe

Find more details about the file format here.

import alphastats 
fragpipe_data = alphastats.FragPipeLoader(file="testfiles/fragpipe_combined_proteins.tsv")

Spectronaut

Find more details about the file format here.

As default alphastats will use “PG.ProteinGroups” and “PG.Quantity” for the analysis. For an ananlysis on a peptide level the “F.PeakArea” and the peptide sequences (“PEP.StrippedSequence”) can be used.

import alphastats
spectronaut_data = alphastats.SpectronautLoader(
    file="testfiles/spectronaut/results.tsv",
    intensity_column = "F.PeakArea",
    index_column = "PEP.StrippedSequence"
    )

mzTab

Find more details about the file format here.

import alphastats
mztab_data = alphastats.mzTabLoader(
    file="testfiles/mztab/test.mztab"
    )

Preparing metadata

To compare samples across various conditions in the downstream analysis, a metadata file in form of a table (excel, csv, tsv) is required. This file should contain a column with the sample IDs (raw file names) matching the sample names annotated in the output file of your proteomics software. Further, information can be provided like disease and various clinical parameters. Examples of metadata files can be found in the testfiles-folder.

Creating a DataSet

The whole downstream analysis can be perforemd on the alphastats.DataSet. To create the DataSet you need to provide the loader object as well as the metadata.

import alphastats 

maxquant_data = alphastats.MaxQuantLoader(
    file="testfiles/maxquant_proteinGroups.txt"
)

dataset = alphastats.DataSet(
    loader = maxquant_data, 
    metadata_path="../testfiles/maxquant/metadata.xlsx", 
    sample_column="sample"
)