P-Value Calculator
Calculate p-values, z-scores, statistical significance, confidence levels, and normal distribution probabilities with step-by-step explanations and interactive visualizations.
Enter any one of the six fields below to compute the p-value from a z-score, or invert any probability back to its z-score on the standard normal distribution.
Two-tailed p-value vs Z
How the two-sided p-value shrinks as |Z| grows. The classic 0.05, 0.01, and 0.001 thresholds correspond to Z ≈ 1.96, 2.58, and 3.29.
Critical Z by confidence level
Z-score reference table
| Z | Left tail | Right tail | Two-tailed |
|---|---|---|---|
| 1.00 | 0.84134 | 0.15866 | 0.31731 |
| 1.28 | 0.89973 | 0.10027 | 0.20055 |
| 1.65 | 0.95053 | 0.04947 | 0.09894 |
| 1.96 | 0.97500 | 0.02500 | 0.05000 |
| 2.00 | 0.97725 | 0.02275 | 0.04550 |
| 2.33 | 0.99010 | 0.00990 | 0.01981 |
| 2.58 | 0.99506 | 0.00494 | 0.00988 |
| 3.00 | 0.99865 | 0.00135 | 0.00270 |
| 3.29 | 0.99950 | 0.00050 | 0.00100 |
Standard-normal probabilities use the Abramowitz & Stegun erf approximation (Φ accurate to ≈ 1.5 × 10⁻⁷); inverse-CDF uses Wichura's AS241 to full double precision. Read more in our methodology and editorial policy.
What Is a P-Value?
A p-value is the probability of observing a test statistic at least as extreme as the one you have, assuming the null hypothesis is true. For a standard normal test statistic Z, the one-sided p-value is the tail area Φ(Z) or 1 − Φ(Z); the two-sided p-value is 2·(1 − Φ(|Z|)). A small p-value (typically < 0.05) means the observed result would be unusual under the null, which is why researchers use it to decide whether to reject H₀.
This page bundles six tools under one URL: a P-Value Calculator that converts any z-score or probability into every related normal-distribution value, a Z-Score Calculator, a Statistical Significance Calculator, a Confidence Level Calculator for critical values, a Normal Distribution Calculator for arbitrary μ and σ, and a Hypothesis Testing Toolkit. It pairs well with the z-score calculator, the standard deviation calculator, the confidence interval calculator, and the probability calculator.
How P-Values Work
Standardise to Z
Convert your test statistic into a z-score: Z = (x − μ) / σ. The standard normal table then tells you exactly how unusual that value would be if the null were true.
Read the tail area
The one-tailed p-value is the area beyond Z under N(0,1). The two-tailed p-value adds the equivalent tail on the other side — use it when the alternative hypothesis is non-directional.
Compare against α
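Pick the significance level α before running the test. If p < α, reject H₀; if p ≥ α, fail to reject. The usual convention is α = 0.05 in the social sciences and in most clinical work; stricter thresholds such as 0.01 or 0.001 are used where false positives are costly, and particle physics goes further with its 5σ discovery standard.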
Convert to confidence
The confidence level is c = 1 − α, and the matching two-sided critical Z is Φ⁻¹((1 + c) / 2). For c = 0.95 this gives 1.960, so a 95% confidence interval covers the central 95% of the standard normal, between −1.960 and +1.960.
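The four steps above can be sketched in a few lines of Python. This is a minimal illustration using the standard library's `statistics.NormalDist`, not the calculator's own code; the function name `z_test` and the example numbers (x = 112, μ = 100, σ = 5) are made up for the demonstration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal N(0, 1)

def z_test(x, mu0, sigma, alpha=0.05):
    """Two-sided z-test of one value x against a null mean mu0 with known sigma."""
    z = (x - mu0) / sigma                           # step 1: standardise to Z
    p_two = 2 * (1 - std_normal.cdf(abs(z)))        # step 2: two-tailed area
    reject = p_two < alpha                          # step 3: compare against alpha
    z_crit = std_normal.inv_cdf((1 + (1 - alpha)) / 2)  # step 4: critical Z for c = 1 - alpha
    return z, p_two, reject, z_crit

# Hypothetical inputs chosen only for illustration.
z, p, reject, z_crit = z_test(x=112.0, mu0=100.0, sigma=5.0)
print(f"Z = {z:.2f}, p = {p:.4f}, reject H0: {reject}, critical Z = {z_crit:.3f}")
```

Rejecting when p < α is equivalent to rejecting when |Z| exceeds the critical value, which is why the two checks always agree.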
6 Ways to Use This P-Value Calculator
Convert Z to p-value
Type any Z and read every related probability — left-tail, right-tail, between ±Z, two-tailed, and the textbook 0-to-Z table column.
Invert a probability
Enter a p-value in any field — left, right, center, between, or two-tailed — and the calculator returns the matching Z to four decimal places.
Make a significance call
Move to the Significance tab to compare an observed p against α and read the reject / fail-to-reject decision with a strength label.
Find critical values
The Confidence Level tab returns the two-sided critical Z, α, and per-tail area for any confidence level — useful for hand-checking confidence intervals.
Compute interval probabilities
Under any N(μ, σ) — not just the standard normal — the Normal Distribution tab returns the probability that X lands between any two cut-offs.
Run the full toolkit
The Hypothesis Testing Toolkit produces one- and two-tailed verdicts in a single click, which is ideal when a stakeholder wants both at once.
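As a rough sketch of the interval probability described under "Compute interval probabilities", the snippet below evaluates P(a < X < b) for an arbitrary N(μ, σ) with Python's standard library. The helper name `prob_between` and the example parameters (μ = 100, σ = 15) are assumptions for illustration only.

```python
from statistics import NormalDist

def prob_between(lo, hi, mu=0.0, sigma=1.0):
    """P(lo < X < hi) for X ~ N(mu, sigma), as a difference of two CDF values."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(hi) - dist.cdf(lo)

# Hypothetical example: one standard deviation either side of the mean.
p_within = prob_between(85, 115, mu=100, sigma=15)
print(f"P(85 < X < 115) = {p_within:.5f}")
```

Because the cut-offs sit exactly one σ from the mean, the result matches the between-±1 probability of the standard normal, 1 minus the two-tailed value listed for Z = 1.00.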
Best Practices for P-Value Reporting
Pick the significance level α and the alternative hypothesis (one-tailed or two-tailed) before looking at the data. Selecting them afterwards is a textbook form of p-hacking and inflates the false-positive rate well above the nominal α.
Always report the observed p-value to enough precision for the reader to compare it against any threshold. "p < 0.05" hides information; "p = 0.032" is actionable. For very small values, use scientific notation (the calculator switches to exponential form below 10⁻⁴).
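One way to follow this reporting rule in code is sketched below. The function name `format_p` is hypothetical, and the 10⁻⁴ switch-over simply mirrors the threshold mentioned above; it is not taken from the calculator's source.

```python
def format_p(p, threshold=1e-4):
    """Format an exact p-value, switching to scientific notation for tiny values."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p < threshold:
        return f"p = {p:.2e}"   # scientific notation below the threshold
    return f"p = {p:.3g}"       # otherwise up to three significant figures

for value in (0.032, 0.0006, 2.87e-7):
    print(format_p(value))
```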
Pair every p-value with an effect size and a confidence interval. A statistically significant result with a microscopic effect size is rarely actionable; confidence intervals communicate both precision and uncertainty in the units stakeholders care about.
Why P-Values Matter
Decision-making under uncertainty
P-values give organisations a defensible standard for declaring an effect 'real' rather than noise. Without an explicit α, anyone can claim a 1-point lift is meaningful — with one, the bar is fixed in advance.
Replication and trust
Pre-registered hypotheses and significance thresholds are the entry point for replication science. The reproducibility crisis in social science traces back in large part to flexible reporting; tight p-value discipline is part of the cure.
Regulatory science
Drug approvals, medical devices, and many ISO audits require pre-specified p-value thresholds (typically 0.05 two-sided or 0.025 one-sided) for primary endpoints. The calculator's classification matches the convention used by the FDA and EMA.
Quantitative finance
Backtests of trading strategies, risk-model validation, and basket-test bias controls all rely on p-values to separate signal from luck. Two-sided tests are the convention for stock-return mean differences.
Where P-Values Get Tricky
Multiple comparisons
Running 20 tests at α = 0.05 will, on average, produce one false positive even when nothing is happening. Use Bonferroni, Holm, or Benjamini-Hochberg corrections before declaring significance across a family of tests.
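The three corrections named above can be sketched as follows. These are textbook versions written for illustration, with made-up p-values; they return a reject/keep flag per test in the original input order and are not the calculator's implementation.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject only tests whose p-value is at most alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def holm(pvals, alpha=0.05):
    """Holm step-down: compare sorted p-values to alpha/m, alpha/(m-1), ..."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):           # rank 0 is the smallest p-value
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break                              # stop at the first failure
    return reject

def benjamini_hochberg(pvals, q=0.05):
    """BH step-up: reject the k smallest p-values, k = max rank with p <= rank/m * q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject

# Hypothetical family of five p-values.
pvals = [0.029, 0.004, 0.200, 0.011, 0.038]
print("Bonferroni:", bonferroni(pvals))
print("Holm:      ", holm(pvals))
print("BH (FDR):  ", benjamini_hochberg(pvals))
```

On this example Bonferroni rejects one test, Holm two, and Benjamini-Hochberg four, which reflects the usual ordering from most to least conservative. Bonferroni and Holm control the family-wise error rate, while Benjamini-Hochberg controls the false discovery rate.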
Very large samples
With n in the millions, almost any difference reaches p < 0.001, even effects too small to matter. Always pair p with an effect-size estimate (Cohen's d, log-odds, lift) and a confidence interval; the Confidence Level tab supplies the critical Z for building one.
Discrete or skewed data
The calculator assumes a normal-approximated test statistic. For raw counts use Poisson or binomial tests; for heavy-tailed returns use bootstrap or robust statistics — the normal Z-table can mislead in those regimes.
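To see how far the normal approximation can drift on small counts, the sketch below compares an exact binomial upper tail with its Z-based counterpart. The scenario (9 successes in 10 trials under p₀ = 0.5) and the function names are assumptions chosen for illustration, and the approximation is computed without a continuity correction.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_upper_tail(k, n, p0):
    """Exact P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))

def normal_upper_tail(k, n, p0):
    """Normal approximation to P(X >= k), no continuity correction."""
    z = (k - n * p0) / sqrt(n * p0 * (1 - p0))
    return 1 - NormalDist().cdf(z)

# Hypothetical small-sample case: 9 successes out of 10 under a fair-coin null.
exact = binom_upper_tail(9, 10, 0.5)
approx = normal_upper_tail(9, 10, 0.5)
print(f"exact binomial p = {exact:.5f}, normal approximation p = {approx:.5f}")
```

Here the exact one-sided p-value is 11/1024, and the Z-based figure comes out at roughly half of that, so the approximation overstates the evidence against the null.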
One-tail vs two-tail
Picking a one-tailed test halves the p-value whenever the effect lands in the predicted direction, which is a powerful inducement to cheat. Only use one-tailed tests when the directional alternative is registered in advance for substantive (not statistical) reasons.
Core P-Value Formulas
Z-score
Z = (x − μ) / σ
Standardises any raw value into standard-deviation units.
Standard normal density
φ(z) = (1/√(2π)) · e^(−z²/2)
Probability density at z under N(0,1).
Standard normal CDF
Φ(z) = ½ · [1 + erf(z / √2)]
Cumulative probability — the calculator uses A&S 7.1.26 for erf.
Left-tail p-value
p = Φ(Z)
Probability mass to the left of Z under N(0,1).
Right-tail p-value
p = 1 − Φ(Z)
One-sided p-value when the alternative is that the true mean exceeds its null value (μ > μ₀).
Two-tailed p-value
p = 2 · (1 − Φ(|Z|))
Two tails of equal size when the alternative is non-directional.
Between-±Z probability
P = 2 · Φ(|Z|) − 1
Coverage of a symmetric confidence interval at the given Z.
Critical Z (two-sided)
Z* = Φ⁻¹((1 + c) / 2)
Critical value for a confidence level c using AS241.
Critical Z (one-sided)
Z* = Φ⁻¹(1 − α)
Threshold the test statistic must exceed to reject H₀.
Inverse standard normal
z = Φ⁻¹(p)
Maps a probability back to its z-score — full double precision.
Confidence vs alpha
α = 1 − c
Significance level and confidence level always sum to 1.
Significance decision
Reject H₀ ⟺ p < α
The single rule that drives every hypothesis-testing verdict.
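The formulas in this section translate almost line for line into Python. The sketch below uses `math.erf` and `math.erfc` for Φ and `statistics.NormalDist.inv_cdf` for Φ⁻¹ rather than the A&S 7.1.26 and AS241 routines the page describes, so treat it as an independent cross-check; all function names are illustrative.

```python
from math import erf, erfc, exp, pi, sqrt
from statistics import NormalDist

def pdf(z):
    """Standard normal density φ(z)."""
    return exp(-z * z / 2) / sqrt(2 * pi)

def cdf(z):
    """Standard normal CDF Φ(z) = ½ · [1 + erf(z / √2)]; also the left-tail p-value."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def right_tail(z):
    """1 − Φ(z), written with erfc so that large z keeps its precision."""
    return 0.5 * erfc(z / sqrt(2))

def two_tailed(z):
    """2 · (1 − Φ(|z|))."""
    return erfc(abs(z) / sqrt(2))

def between(z):
    """2 · Φ(|z|) − 1, the mass between −|z| and +|z|."""
    return erf(abs(z) / sqrt(2))

def critical_two_sided(c):
    """Z* = Φ⁻¹((1 + c) / 2) for confidence level c."""
    return NormalDist().inv_cdf((1 + c) / 2)

def critical_one_sided(alpha):
    """Z* = Φ⁻¹(1 − α)."""
    return NormalDist().inv_cdf(1 - alpha)

def reject_null(p, alpha):
    """Reject H₀ exactly when p < α."""
    return p < alpha

print(f"two-tailed p at Z = 1.96: {two_tailed(1.96):.5f}")
print(f"critical Z for c = 0.95:  {critical_two_sided(0.95):.4f}")
print(f"critical Z for α = 0.05:  {critical_one_sided(0.05):.4f}")
```

The two-tailed and between-±Z values always sum to 1, which makes a quick sanity check when working by hand.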
Common P-Value Mistakes
1. Reading p as the probability the null is true
The p-value is P(data at least this extreme | H₀), not P(H₀ | data). It says nothing about the probability of either hypothesis on its own; it only measures how surprising the data would be under the null.
2. Treating p ≥ 0.05 as 'no effect'
Failing to reject H₀ does not prove H₀. It means the evidence is insufficient at this sample size. Report the confidence interval to communicate what effect sizes remain compatible with the data.
3. Computing a one-sided p after seeing the result
If you decide direction after looking at the data, you have effectively performed a two-sided test but reported a one-sided one, doubling the false-positive rate. Always pre-register the test direction.
4. Stopping a study as soon as p < 0.05
Continuous peeking at sequential data inflates the false-positive rate. Use a pre-registered stopping rule (e.g. Pocock or O'Brien-Fleming boundaries) or wait for the planned sample.
5. Ignoring multiple comparisons
Twenty independent tests at α = 0.05 produce a 64% chance of at least one false positive. Apply Bonferroni for primary endpoints, and FDR control for screening work such as genomics or marketing tests.
6. Reporting bare 'p < 0.05'
Quote the exact value. A finding at p = 0.049 should not be reported the same way as one at p = 0.0006, because readers cannot compare evidence strength without the actual number.
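The 64% figure quoted for twenty tests follows directly from the independence assumption, as this short sketch shows. The function names are illustrative.

```python
def familywise_error_rate(m, alpha=0.05):
    """P(at least one false positive) across m independent tests when every null is true."""
    return 1 - (1 - alpha) ** m

def expected_false_positives(m, alpha=0.05):
    """Average number of false positives across m tests when every null is true."""
    return m * alpha

for m in (1, 5, 20):
    print(f"m = {m:>2}: FWER = {familywise_error_rate(m):.3f}, "
          f"expected false positives = {expected_false_positives(m):.2f}")
```

With m = 20 the family-wise rate is 1 − 0.95²⁰ ≈ 0.64, while the expected count of false positives is exactly one.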
Where P-Values Are Used
Academic research
Psychology, sociology, economics, and education research routinely test mean differences and proportions at α = 0.05 — the calculator's significance tab gives the standard reject/fail-to-reject decision.
Medicine and clinical trials
Phase III drug studies pre-register α = 0.05 (sometimes 0.025 for one-sided tests), and superiority is declared when the primary endpoint achieves p < α. The toolkit tab returns both one- and two-tailed verdicts at once.
A/B testing and growth
Web experiments translate a conversion lift into a Z-score and a two-tailed p-value. SaaS teams typically use α = 0.05 with a pre-specified minimum detectable effect; finance teams often use α = 0.01.
Data science and ML
Permutation tests, bootstrap CIs, and McNemar's test all reduce to p-value comparisons. The Z-conversion tools are useful for feature-importance significance tests and SHAP-value comparisons.
Marketing analytics
Lift testing in paid media uses p-values to validate creative variants and campaign tweaks. Brand-tracker studies report whether year-over-year changes in awareness or consideration are statistically significant.
Engineering and QA
Statistical process control flags processes whose mean has drifted beyond the 3σ control limits (two-sided p ≈ 0.0027). Quality engineering uses the toolkit's strict α = 0.001 verdict for safety-critical signals.
Physics and astronomy
5σ discoveries (the Higgs convention) correspond to a one-sided p ≈ 2.87 × 10⁻⁷. The calculator's exponential output makes these tail probabilities readable instead of rounding to zero.
Education research
School-effectiveness studies, standardised-test comparisons, and grading-curve audits depend on p-values to detect real differences between cohorts above the noise floor.
Public-opinion research
Election forecasters use Z-scores and p-values to summarise the swing direction of polls relative to fundamentals — useful as a fast read on whether a campaign is actually moving the needle.
Sports analytics
Sabermetrics tests whether a hot streak, a coaching change, or a new pitch sequence has truly altered a team's run expectancy — the standard answer comes from a Z-based proportion test on the result.
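Several of the uses above, A/B testing in particular, come down to a pooled two-proportion z-test. The sketch below is one common textbook form of that test, offered as an assumption about how a conversion lift might be turned into Z and a two-tailed p-value; the function name and the traffic numbers are invented for the example.

```python
from math import erfc, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (Z, two-tailed p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)             # common rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_two = erfc(abs(z) / sqrt(2))                       # equals 2 * (1 - Φ(|Z|))
    return z, p_two

# Hypothetical experiment: 200/4000 conversions in control, 250/4000 in the variant.
z_ab, p_ab = two_proportion_z_test(200, 4000, 250, 4000)
print(f"Z = {z_ab:.3f}, two-tailed p = {p_ab:.4f}")
```

With these made-up counts the lift from 5.00% to 6.25% clears α = 0.05 but not α = 0.01, which shows how the choice of threshold changes the verdict.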
Common Z-Score → P-Value Conversions
| Z | Left tail | Right tail | Two-tailed | Typical context |
|---|---|---|---|---|
| 1.00 | 0.84134 | 0.15866 | 0.31731 | One-sigma deviation |
| 1.282 | 0.90000 | 0.10000 | 0.20000 | α = 0.10 one-sided / 80% CI |
| 1.645 | 0.95000 | 0.05000 | 0.10000 | α = 0.05 one-sided / 90% CI |
| 1.96 | 0.97500 | 0.02500 | 0.05000 | α = 0.05 two-sided / 95% CI |
| 2.00 | 0.97725 | 0.02275 | 0.04550 | Two-sigma deviation |
| 2.326 | 0.99000 | 0.01000 | 0.02000 | α = 0.01 one-sided / 98% CI |
| 2.576 | 0.99500 | 0.00500 | 0.01000 | α = 0.01 two-sided / 99% CI |
| 3.00 | 0.99865 | 0.00135 | 0.00270 | Three-sigma (SPC control limit) |
| 3.29 | 0.99950 | 0.00050 | 0.00100 | α = 0.001 two-sided / 99.9% CI |
| 5.00 | ≈ 1.0000 | 2.87 × 10⁻⁷ | 5.73 × 10⁻⁷ | Physics discovery (5σ) |
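Rows like the ones in this table can be regenerated with a few lines of Python. The sketch below uses `math.erfc`, which computes the upper tail directly instead of by subtracting from 1, so very small probabilities such as the 5σ entry keep their precision. The helper name `tail_row` is illustrative, and critical values rounded to three decimals will differ from the nominal probabilities in the fifth decimal place.

```python
from math import erfc, sqrt

def tail_row(z):
    """Return (left tail, right tail, two-tailed) probabilities for a z-score."""
    right = 0.5 * erfc(z / sqrt(2))      # 1 - Φ(z), computed directly
    left = 1.0 - right                   # Φ(z)
    two = erfc(abs(z) / sqrt(2))         # 2 * (1 - Φ(|z|))
    return left, right, two

print("     Z   left tail   right tail   two-tailed")
for z in (1.0, 1.645, 1.96, 2.576, 3.0, 3.29, 5.0):
    left, right, two = tail_row(z)
    print(f"{z:6.3f}   {left:.5f}   {right:.3e}   {two:.3e}")
```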
Methodology you can verify
Standard-normal CDF Φ is computed via Abramowitz & Stegun 7.1.26 (Φ accurate to ≈ 1.5 × 10⁻⁷). The inverse CDF uses Wichura's AS241 algorithm to full double precision, matching R's qnorm and Python's scipy.stats.norm.ppf to 14 decimals. Read more on the methodology and editorial policy pages.
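For readers who want to check the accuracy claim themselves, here is a sketch of the Abramowitz & Stegun 7.1.26 approximation using the published coefficients, compared against Python's built-in `math.erf` on a grid of z-scores. It is a reimplementation for verification purposes, not the calculator's source code, and the grid range of ±6 is an arbitrary choice.

```python
from math import erf, exp, sqrt

# Published constants for Abramowitz & Stegun formula 7.1.26.
_P = 0.3275911
_A = (0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429)

def erf_as(x):
    """A&S 7.1.26: erf(x) ≈ 1 − (a1·t + ... + a5·t⁵)·exp(−x²), with t = 1/(1 + p·x)."""
    sign = -1.0 if x < 0 else 1.0        # erf is odd, so handle x < 0 by symmetry
    x = abs(x)
    t = 1.0 / (1.0 + _P * x)
    poly = 0.0
    for a in reversed(_A):               # Horner evaluation of the polynomial in t
        poly = (poly + a) * t
    return sign * (1.0 - poly * exp(-x * x))

def phi_as(z):
    """Standard normal CDF built on the approximation above."""
    return 0.5 * (1.0 + erf_as(z / sqrt(2)))

def phi_ref(z):
    """Reference CDF using the standard library's erf."""
    return 0.5 * (1.0 + erf(z / sqrt(2)))

grid = [i / 1000 for i in range(-6000, 6001)]
max_err = max(abs(phi_as(z) - phi_ref(z)) for z in grid)
print(f"largest |Φ_approx − Φ_ref| on the grid: {max_err:.2e}")
```

Because Φ halves the erf value, the worst-case error in Φ should come out at or below the ≈ 1.5 × 10⁻⁷ bound quoted above.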
Related Calculators
More statistics tools that pair with hypothesis-testing work.
- Z-Score Calculator: Z-score, percentile rank, tail probabilities, Z ↔ probability conversion, and probability between any two Z-scores with bell-curve visualisation.
- Confidence Interval Calculator: CI for the population mean with margin of error, standard error, lower and upper bound, and step-by-step Z-based working.
- Probability Calculator: Two-event probabilities, unions, intersections, complements, normal distribution, confidence intervals, and step-by-step solutions across five integrated tools.
- Sample Size Calculator: Required sample size, margin of error, confidence interval, confidence level, and finite population correction, covering five survey tools with charts.
- Statistics Calculator: Mean, median, mode, SD, quartiles, percentiles, outliers, skewness, kurtosis, CIs, and Z-score probabilities, with six analysis modes plus histogram and box plot.