Gene set scores
The gene set score is used to compute a p-value for each gene set. Because the size of a gene set must also be taken into account, there is no simple relationship between a score and a p-value, though higher scores are better. This has several important implications:
- The gene set score should not be used in isolation to evaluate the significance of a gene set. It is displayed for your information.
- Two gene sets of different sizes will have different p-values for the same score.
- The ranking of gene sets by score will not necessarily be the same as their ranking by p-value.
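To illustrate why the same score can correspond to different p-values, here is a minimal sketch. It assumes an ORA-style count as the score and a hypergeometric tail probability as the p-value; the test the software actually uses may differ. The function name `hypergeom_tail`, the total of 10,000 genes, and the set sizes are hypothetical.

```python
from math import comb

def hypergeom_tail(k, set_size, n_selected, n_total):
    """P(X >= k): chance of seeing at least k set members among the selected genes."""
    upper = min(set_size, n_selected)
    total = comb(n_total, n_selected)
    hits = sum(
        comb(set_size, i) * comb(n_total - set_size, n_selected - i)
        for i in range(k, upper + 1)
    )
    return hits / total

# Same score (3 selected genes in the set), two different gene set sizes.
# All numbers below are made up for illustration.
p_small_set = hypergeom_tail(3, set_size=30, n_selected=50, n_total=10_000)
p_large_set = hypergeom_tail(3, set_size=300, n_selected=50, n_total=10_000)

print(p_small_set, p_large_set)
```

Under these assumptions the smaller gene set gets the smaller p-value, because three hits are much less likely by chance in a set of 30 genes than in a set of 300.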
The meaning of the score depends on the type of analysis (see details of each analysis type for more information). The scores are:
- ORA: The number of genes in the gene set above the threshold you set.
- GSR (Resampling): Either the mean or median of the gene scores for the genes in the gene set, depending on the settings.
- Precision-recall: The average precision for the genes in the set, given the ranking of genes implied by the gene scores.
- ROC: The area under the ROC curve for the genes in the set, given the ranking of genes implied by the gene scores.
- Correlation: The mean value of the absolute value of the correlation between all pairs of genes in the gene set.
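The two ranking-based scores in the list above can be sketched as follows. This is an illustration only, assuming the genes are already sorted from best to worst gene score and that there are no ties; the function names `average_precision` and `roc_auc` and the gene names are hypothetical, and the software's own handling of ties or missing genes may differ.

```python
def average_precision(ranked_genes, gene_set):
    """Mean of the precision values measured at the rank of each gene set member."""
    hits = 0
    precisions = []
    for rank, gene in enumerate(ranked_genes, start=1):
        if gene in gene_set:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def roc_auc(ranked_genes, gene_set):
    """Fraction of (member, non-member) pairs in which the member ranks higher."""
    n_in = sum(gene in gene_set for gene in ranked_genes)
    n_out = len(ranked_genes) - n_in
    if n_in == 0 or n_out == 0:
        return float("nan")
    non_members_seen = 0
    wins = 0
    for gene in ranked_genes:
        if gene in gene_set:
            # This member outranks every non-member not yet seen.
            wins += n_out - non_members_seen
        else:
            non_members_seen += 1
    return wins / (n_in * n_out)

# Hypothetical ranking (best gene score first) and gene set.
ranked = ["g1", "g2", "g3", "g4", "g5"]
members = {"g1", "g3"}

ap = average_precision(ranked, members)
auc = roc_auc(ranked, members)
print(ap, auc)
```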
ORA: Say your gene score threshold is 0.001 and that selects 50 genes. Say some gene set has 30 genes, of which 3 are in the 50 genes you selected. The score displayed will be 3.
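The ORA example can be reproduced with a few lines of set arithmetic. This is a sketch with made-up gene names and p-values, assuming the gene scores are p-values so that a gene passes when its p-value is at or below the threshold.

```python
# Hypothetical gene scores (p-values): gene0..gene49 pass, the rest do not.
threshold = 0.001
p_values = {f"gene{i}": (0.0005 if i < 50 else 0.5) for i in range(1000)}

# Genes selected by the threshold.
selected = {gene for gene, p in p_values.items() if p <= threshold}

# A hypothetical gene set of 30 genes, 3 of which (gene47..gene49) are selected.
gene_set = {f"gene{i}" for i in range(47, 77)}

# The ORA gene set score is the size of the overlap.
ora_score = len(gene_set & selected)
print(len(selected), len(gene_set), ora_score)  # 50 30 3
```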
Resampling: For the same gene set of 30 genes, say the mean of the negated, log10-transformed p-values is 2. That means that the geometric mean p-value is 0.01. The gene set score is 2.
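The link between the mean transformed score and the geometric mean p-value can be checked numerically. The sketch below assumes a base-10 logarithm (which is what makes a score of 2 correspond to 0.01) and uses three made-up p-values rather than a full gene set.

```python
import math
from statistics import mean, median

# Hypothetical p-values for the genes in a gene set.
p_values = [0.1, 0.01, 0.001]

# Gene scores: negated log10 of each p-value (roughly 1, 2 and 3).
gene_scores = [-math.log10(p) for p in p_values]

# The gene set score is the mean or the median, depending on the settings.
mean_score = mean(gene_scores)
median_score = median(gene_scores)

# Undoing the transform on the mean gives the geometric mean p-value.
geometric_mean_p = 10 ** (-mean_score)
print(mean_score, median_score, geometric_mean_p)
```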
Correlation: For that same gene set, we measure the correlation between each pair of genes, a value that can vary from -1 to 1. We use the absolute value, yielding values from 0 to 1. The average of these values is the gene set score. Comparisons of a gene to itself are not included.
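As a sketch of the correlation score, the code below averages the absolute correlation over all distinct gene pairs. It assumes Pearson correlation, which the text above does not specify; the helper names `pearson` and `correlation_score` and the expression profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def correlation_score(profiles):
    """Mean absolute correlation over all distinct pairs of genes."""
    # combinations() never pairs a gene with itself.
    pairs = list(combinations(profiles.values(), 2))
    return sum(abs(pearson(x, y)) for x, y in pairs) / len(pairs)

# Made-up expression profiles for a three-gene set.
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [4, 3, 2, 1],  # perfectly anti-correlated with geneA
    "geneC": [1, 3, 2, 4],
}

score = correlation_score(profiles)
print(score)
```

Because the absolute value is taken, the strongly anti-correlated pair geneA/geneB contributes as much to the score as a perfectly correlated pair would.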