Validate

Validate allows users to verify the predictive power of a Polygenic Risk Score (PRS).

Log in to saas.allelica.com to access these functions in the PRS Validate App:

Validate Populations

You may choose to use your own population data or the UK Biobank population dataset. Both paths are detailed below.

Step 1 – Select PRS

There are 2 paths available to users:

1.1 Use the UK Biobank data as the population for comparison.

1.2 Provide your custom population for comparison.

Path 1.1

1.1.1 To use the UK Biobank data as the population for comparison, check the boxes for the disease/s you wish to validate:

1.1.2 Click "Select" to confirm your checkbox selections.

1.1.3 Click "Done".

Path 1.2

Alternatively, you may provide your custom population for comparison by uploading your own dataset.

1.2.1 Verify your data format

To ensure your file is processed successfully, format the data as 3 tab-delimited fields. The file extension must be .csv. The columns must appear in this order:

  1. rs ID

  2. Effective allele

  3. Weight

Matching your data to the expected structure is vital. You must set up your data appropriately before submission.

A tab-delimited file separates these 3 columns with a single tab character, for example:

rs12345	A	0.0003

A comma-delimited file, such as the following, will be rejected:

rs12345,A,0.0003
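The format rules above can be checked locally before uploading. The following is a minimal sketch, not the check the platform itself performs: the function names are hypothetical, and the rs ID pattern ("rs" followed by digits) and the allowed allele letters (A, C, G, T) are assumptions that this guide does not state. Whether a header row is accepted is also not specified, so the sketch treats every non-empty line as data.

```python
import csv
import io

# Assumption: alleles are written with the letters A, C, G, T only.
VALID_ALLELE_LETTERS = set("ACGT")


def check_prs_rows(text):
    """Return a list of (line_number, message) problems found in the file text."""
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            problems.append(
                (line_no, f"expected 3 tab-delimited fields, found {len(fields)}")
            )
            continue
        rs_id, allele, weight = (field.strip() for field in fields)
        # Assumption: an rs ID is "rs" followed by digits.
        if not (rs_id.startswith("rs") and rs_id[2:].isdigit()):
            problems.append((line_no, f"unexpected rs ID: {rs_id!r}"))
        if not allele or set(allele.upper()) - VALID_ALLELE_LETTERS:
            problems.append((line_no, f"unexpected effective allele: {allele!r}"))
        try:
            float(weight)
        except ValueError:
            problems.append((line_no, f"weight is not a number: {weight!r}"))
    return problems


def commas_to_tabs(text):
    """Convert comma-delimited rows to tab-delimited rows."""
    rows = csv.reader(io.StringIO(text))
    return "\n".join("\t".join(cell.strip() for cell in row) for row in rows if row)


if __name__ == "__main__":
    bad = "rs12345,A,0.0003"
    print(check_prs_rows(bad))                  # reports a problem on line 1
    print(check_prs_rows(commas_to_tabs(bad)))  # reports no problems
```

Running the check on the converted text before saving it with the .csv extension gives early feedback on rows the upload would likely reject.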

1.2.2 Once you have your data in the correct format, use our file picker to upload it. Click "Browse". Navigate to your file and upload.

1.2.3 Check the boxes for the disease/s you wish to validate:

1.2.4 Click "Select" to confirm your selections.

1.2.5 Click "Done".

Finalizing Select PRS

1.3 Success is confirmed with a "Task completed" verification providing a date/time stamp.

If you want to revert your data upload or your selections at any time, use the "Reset" option.

Step 2 – Upload Validation

There are 2 paths available to users.

2.1 Use the UK Biobank data as the population for validation.

2.2 Provide your own custom population for validation.

Path 2.1

2.1.1 To use the UK Biobank population dataset as your validation population, click:

2.1.2 The UK Biobank is a large dataset containing epidemiological, biometric, and clinical data from a population sample of approximately 400,000 European individuals. Each member of the UK Biobank population is also linked to Hospital Episode Statistics (HES) data, as well as national death and cancer registries. This volume of data allows you to define both simple and complex phenotypes based on a single biometric parameter or a combination of multiple data sources (e.g. hospital diagnoses and surgical procedures received by the patient).

Each data source in the UK Biobank is identified by a specific Data-Field number. For example, the heights of UK Biobank participants are specified by Data-Field 12144. You can specify any desired phenotype by entering all the phenotype-defining Data-Fields as a comma-separated list. You can browse the Data-Field IDs in the UK Biobank showcase.

Please note that a Data-Field in the UK Biobank may contain data referring to multiple conditions. For example, Data-Field 20002 (non-cancer illness code, self-reported) covers a wide spectrum of self-reported illnesses, each one specified by a different numerical code. In these cases, you must enter the phenotype-defining codes as a comma-separated list enclosed in round brackets after the field of interest.

As an example, to specify a self-reported phenotype of diabetes (illness codes: 1220, 1222, and 1223), you must enter the following Data-Field and codes: 20002 (1220, 1222, 1223). When specifying multiple Data-Fields and codes, separate each entry with a comma placed after the closing bracket of the previous entry.
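The phenotype syntax described above can be assembled programmatically when a definition involves several Data-Fields. The following is a minimal sketch: the function name format_phenotype is hypothetical, and the exact spacing the app accepts is an assumption based on the diabetes example.

```python
def format_phenotype(entries):
    """Build a phenotype string from (data_field, codes) pairs.

    entries: list of (data_field, codes) tuples, where codes is a list of
    numerical codes, or an empty list when the whole Data-Field is used.
    """
    parts = []
    for data_field, codes in entries:
        if codes:
            # Codes go in round brackets after the Data-Field, comma-separated.
            code_list = ", ".join(str(code) for code in codes)
            parts.append(f"{data_field} ({code_list})")
        else:
            parts.append(str(data_field))
    # Entries are comma-separated, following the closing bracket of the previous one.
    return ", ".join(parts)


if __name__ == "__main__":
    # Self-reported diabetes, as in the example above.
    print(format_phenotype([(20002, [1220, 1222, 1223])]))
    # prints: 20002 (1220, 1222, 1223)
```

Passing a Data-Field with an empty code list, for instance (12144, []), emits the bare Data-Field number, matching the plain comma-separated list form.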

2.1.3 Click "Done".

Path 2.2

2.2.1 As in Step 1.2.1, the data you upload must match the expected data format for your model to run successfully.

For example, a VCF must provide the data in 2 tab-delimited columns as previously described.

The phenotype ID must correspond to the one used by the UK Biobank, which provides a search engine for this purpose.

Other data standards that can be parsed include:

  • Oxford genotype format (bgen/sample)

  • PLINK 2 genotype format (pgen/pvar/psam)

  • PLINK binary format (bed/bim/fam)

Null values are not acceptable.

2.2.2 Once you have your data in the correct format, use our file picker to upload it. Click "Browse". Navigate to your file and upload.

2.2.3 Click "Done".

Finalizing Upload Validation Population

2.3 A successful upload is confirmed with an "Actions done" verification providing a date/time stamp.

Step 3 – Upload Testing Population

There are 2 paths available to users.

3.1 You may use the UK Biobank data as the test population for validating the predictive power of the model.

3.2 You may provide your own custom population for comparison.

Path 3.1

3.1.1 To use the UK Biobank population dataset as your testing population, click:

3.1.2 Click "Confirm".

3.1.3 Click "Done".

Path 3.2

3.2.1 As with the previous uploads, your data must match the expected data format for your model to run successfully.

A VCF must be tab-delimited as previously described.

Null values are not acceptable.

3.2.2 Once you have your data in the correct format, use our file picker to upload it. Click "Browse". Navigate to your file and upload.

3.2.3 Click "Done".

Finalizing Upload Testing Population

3.3 A successful upload is confirmed with an "Actions done" verification providing a date/time stamp.

Step 4 – Run the Model

You can return to any of the previous steps at any time to verify or update your choices.

Once you have finalized all your selections, click "Run".

Step 5 – Download the Report

The processing power required to run the analysis is substantial. The main factors that influence your run time are the algorithm selected and your population size. Reports are available to download within 3–5 days, and you will receive an email notification when yours is ready.

Troubleshooting?

If you need assistance, please reach out.
