P-Value Calculator

P-values, z-scores, significance, and normal distribution.

Enter any one of the six fields below to compute the p-value from a z-score, or invert any probability back to its z-score on the standard normal distribution.

Z-score: any real number

P(x < Z), left tail: 0–1

P(x > Z), right tail: 0–1

P(0 < x < Z), from center: 0–0.5

P(−Z < x < Z), between: 0–1

P(x < −Z or x > Z), two tails: 0–1

Two-tailed p-value vs Z

How the two-sided p-value shrinks as |Z| grows. The classic 0.05, 0.01, and 0.001 thresholds correspond to Z ≈ 1.96, 2.58, and 3.29.

Critical Z by confidence level

Z-score reference table

Z	Left tail	Right tail	Two-tailed
1.00	0.84134	0.15866	0.31731
1.28	0.89973	0.10027	0.20055
1.645	0.95002	0.04998	0.09997
1.96	0.97500	0.02500	0.05000
2.00	0.97725	0.02275	0.04550
2.33	0.99010	0.00990	0.01981
2.576	0.99500	0.00500	0.01000
3.00	0.99865	0.00135	0.00270
3.29	0.99950	0.00050	0.00100

Standard-normal probabilities use the Abramowitz & Stegun erf approximation (Φ accurate to ≈ 1.5 × 10⁻⁷); inverse-CDF uses Wichura's AS241 to full double precision. Read more in our methodology and editorial policy.

What Is a P-Value?

A p-value is the probability of observing a test statistic at least as extreme as the one you have, assuming the null hypothesis is true. For a standard normal test statistic Z, the one-sided p-value is the tail area Φ(Z) or 1 − Φ(Z); the two-sided p-value is 2·(1 − Φ(|Z|)). A small p-value (typically < 0.05) means the observed result would be unusual under the null, which is why researchers use it to decide whether to reject H₀.

This page bundles six tools onto one URL: a P-Value Calculator that converts any z-score or probability into every related normal-distribution value, a Z-Score Calculator, a Statistical Significance Calculator, a Confidence Level Calculator for critical values, a Normal Distribution Calculator for arbitrary μ and σ, and a Hypothesis Testing Toolkit. Pair it naturally with the z-score calculator, the standard deviation calculator, the confidence interval calculator, and the probability calculator.

How P-Values Work

Standardise to Z

Convert your test statistic into a z-score: Z = (x − μ) / σ. The standard normal table then tells you exactly how unusual that value would be if the null were true.
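As a rough illustration of the standardisation step, the sketch below computes Z for a single raw value. The function name `z_score` and the sample numbers are illustrative assumptions, not part of the calculator.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Return how many standard deviations x lies above (+) or below (-) mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical example: a raw value of 130 on a scale with mean 100 and SD 15.
z = z_score(130, 100, 15)
print(z)  # 2.0
```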

Read the tail area

The one-tailed p-value is the area beyond Z under N(0,1). The two-tailed p-value adds the equivalent tail on the other side — use it when the alternative hypothesis is non-directional.
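The tail areas described above can be sketched in a few lines. This version leans on Python's standard-library `statistics.NormalDist` for Φ, which is an assumption made for brevity and not necessarily how the calculator evaluates it; `tail_p_values` is a hypothetical helper name.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF, N(0, 1)


def tail_p_values(z: float) -> dict:
    """Return left-tail, right-tail, and two-tailed areas for a z-score."""
    left = PHI(z)                            # area below Z
    right = 1.0 - PHI(z)                     # area above Z
    two_tailed = 2.0 * (1.0 - PHI(abs(z)))   # both tails beyond |Z|
    return {"left": left, "right": right, "two_tailed": two_tailed}


p = tail_p_values(1.96)
print(p)
```

For Z = 1.96 the two-tailed area comes out very close to 0.05, in line with the reference table.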

Compare against α

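Pick the significance level α before running the test. If p < α, reject H₀; if p ≥ α, fail to reject. The usual convention is α = 0.05 in the social sciences and most clinical work; stricter thresholds such as 0.01 or 0.001 are used where false positives are costly, and particle physics goes further with its 5σ standard.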
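The comparison rule is simple enough to write down directly. The sketch below is one possible encoding; the function name `decide` and the returned strings are assumptions chosen for illustration.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the rule: reject H0 if and only if p < alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return "reject H0" if p_value < alpha else "fail to reject H0"


print(decide(0.032))        # reject H0
print(decide(0.032, 0.01))  # fail to reject H0
```

Note that a p-value exactly equal to α falls on the fail-to-reject side, because the rule uses a strict inequality.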

Convert to confidence

Confidence level = 1 − α, and the matching critical Z is Φ⁻¹((1 + c) / 2). A 95% confidence interval covers the central 95% of the standard normal between ±1.960.
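A minimal sketch of the conversion from confidence level to critical Z follows. It uses `statistics.NormalDist().inv_cdf` from the Python standard library as a stand-in for Φ⁻¹; the helper name `critical_z` is an assumption.

```python
from statistics import NormalDist


def critical_z(confidence: float) -> float:
    """Two-sided critical Z for a confidence level c, i.e. inv_cdf((1 + c) / 2)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return NormalDist().inv_cdf((1.0 + confidence) / 2.0)


for c in (0.90, 0.95, 0.99):
    alpha = 1.0 - c
    print(f"c = {c:.2f}  alpha = {alpha:.2f}  Z* = {critical_z(c):.4f}")
```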

6 Ways to Use This P-Value Calculator

Convert Z to p-value

Type any Z and read every related probability — left-tail, right-tail, between ±Z, two-tailed, and the textbook 0-to-Z table column.

Invert a probability

Enter a p-value in any field — left, right, center, between, or two-tailed — and the calculator returns the matching Z to four decimal places.
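Each of the five probability fields maps back to Z by first converting it to a left-tail area and then applying Φ⁻¹. The sketch below shows one way to do that; the `kind` labels and the function name are hypothetical, and the standard-library `NormalDist.inv_cdf` stands in for the calculator's inverse CDF.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal N(0, 1)


def z_from_probability(p: float, kind: str) -> float:
    """Invert one of the five probability fields back to a z-score."""
    if kind == "left":          # P(x < Z)
        q = p
    elif kind == "right":       # P(x > Z)
        q = 1.0 - p
    elif kind == "center":      # P(0 < x < Z), valid for 0 < p < 0.5
        q = 0.5 + p
    elif kind == "between":     # P(-Z < x < Z)
        q = (1.0 + p) / 2.0
    elif kind == "two_tailed":  # P(x < -Z or x > Z)
        q = 1.0 - p / 2.0
    else:
        raise ValueError(f"unknown kind: {kind}")
    if not 0.0 < q < 1.0:
        raise ValueError("probability is outside the invertible range")
    return _STD.inv_cdf(q)


print(round(z_from_probability(0.05, "two_tailed"), 4))  # 1.96
```

The last three kinds always return a non-negative Z, since those areas are symmetric in the sign of Z.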

Make a significance call

Move to the Significance tab to compare an observed p against α and read the reject / fail-to-reject decision with a strength label.

Find critical values

The Confidence Level tab returns the two-sided critical Z, α, and per-tail area for any confidence level — useful for hand-checking confidence intervals.

Compute interval probabilities

Under any N(μ, σ) — not just the standard normal — the Normal Distribution tab returns the probability that X lands between any two cut-offs.
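For a general N(μ, σ), the interval probability is the difference of two CDF values. The sketch below assumes Python's `statistics.NormalDist`; the function name and the example parameters are illustrative only.

```python
from statistics import NormalDist


def interval_probability(mu: float, sigma: float, low: float, high: float) -> float:
    """Return P(low < X < high) for X ~ N(mu, sigma)."""
    if low > high:
        raise ValueError("low must not exceed high")
    dist = NormalDist(mu, sigma)
    return dist.cdf(high) - dist.cdf(low)


# Hypothetical example: mean 100, SD 15, cut-offs one SD either side of the mean.
p = interval_probability(100, 15, 85, 115)
print(round(p, 4))  # 0.6827
```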

Run the full toolkit

The Hypothesis Testing Toolkit produces one- and two-tailed verdicts in a single click, which is ideal when a stakeholder wants both at once.

Best Practices for P-Value Reporting

Pick the significance level α and the alternative hypothesis (one-tailed or two-tailed) before looking at the data. Selecting them afterwards is a textbook form of p-hacking and inflates the false-positive rate well above the nominal α.

Always report the observed p-value to enough precision for the reader to compare it against any threshold. "p < 0.05" hides information; "p = 0.032" is actionable. For very small values, use scientific notation (the calculator switches to exponential form below 10⁻⁴).
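One way to follow this reporting rule in code is sketched below. The 10⁻⁴ switch point comes from the text above; the number of digits shown and the name `format_p` are assumptions.

```python
def format_p(p: float, threshold: float = 1e-4) -> str:
    """Format an exact p-value, switching to scientific notation when tiny."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p < threshold:
        return f"p = {p:.2e}"   # e.g. 2.87e-07
    return f"p = {p:.3g}"       # three significant digits


print(format_p(0.032))    # p = 0.032
print(format_p(2.87e-7))  # p = 2.87e-07
```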

Pair every p-value with an effect size and a confidence interval. A statistically significant result with a microscopic effect size is rarely actionable; confidence intervals communicate both precision and uncertainty in the units stakeholders care about.

Why P-Values Matter

Decision-making under uncertainty

P-values give organisations a defensible standard for declaring an effect 'real' rather than noise. Without an explicit α, anyone can claim a 1-point lift is meaningful — with one, the bar is fixed in advance.

Replication and trust

Pre-registered p-values are the entry point for replication science. The reproducibility crisis in social science traces back to flexible reporting; tight p-value discipline is part of the cure.

Regulatory science

Drug approvals, medical devices, and many ISO audits require pre-specified p-value thresholds (typically 0.05 or 0.025) for primary endpoints. The calculator's classification matches the convention used by the FDA and EMA.

Quantitative finance

Backtests of trading strategies, risk-model validation, and basket-test bias controls all rely on p-values to separate signal from luck. Two-sided tests are the convention for stock-return mean differences.

Where P-Values Get Tricky

Multiple comparisons

Running 20 tests at α = 0.05 will, on average, produce one false positive even when nothing is happening. Use Bonferroni, Holm, or Benjamini-Hochberg corrections before declaring significance across a family of tests.
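The sketch below illustrates the family-wise error rate for independent tests and the three corrections named above. It is a plain-Python illustration under the usual textbook definitions (using p ≤ threshold in each step), not the calculator's code; the p-values in the example are made up.

```python
def familywise_error_rate(m: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) across m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni(p_values, alpha=0.05):
    """Reject test i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Step-down Holm procedure: stop at the first p-value that fails."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject


def benjamini_hochberg(p_values, q=0.05):
    """Step-up BH procedure controlling the false discovery rate at q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0  # largest rank k with p_(k) <= (k / m) * q
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


pvals = [0.001, 0.012, 0.028, 0.045, 0.2]  # made-up example values
print(round(familywise_error_rate(20), 4))
print(bonferroni(pvals), holm(pvals), benjamini_hochberg(pvals))
```

On this example Bonferroni is the most conservative and Benjamini-Hochberg the least, which is the usual ordering.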

Very large samples

With n in the millions, almost any difference reaches p < 0.001 — even effects too small to matter. Always pair p with an effect-size estimate (Cohen's d, log-odds, lift) and the calculator's confidence interval tab.
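To see the effect of sample size, the sketch below runs a one-sample z-test on the same tiny standardized difference at two sample sizes. The helper names and the numbers (a difference of 0.01σ) are illustrative assumptions.

```python
import math


def one_sample_z(mean_diff: float, sigma: float, n: int) -> float:
    """Z for an observed mean difference, given known sigma and sample size n."""
    return mean_diff / (sigma / math.sqrt(n))


def two_tailed_p(z: float) -> float:
    """2 * (1 - Phi(|z|)), written with erfc so far tails do not round to zero."""
    return math.erfc(abs(z) / math.sqrt(2.0))


effect = 0.01  # difference in units of sigma (a negligible Cohen's d)
p_small_n = two_tailed_p(one_sample_z(effect, 1.0, 100))
p_large_n = two_tailed_p(one_sample_z(effect, 1.0, 1_000_000))
print(f"n = 100:       p = {p_small_n:.3f}")
print(f"n = 1,000,000: p = {p_large_n:.2e}")
```

The effect size is identical in both runs; only n changes, yet the second p-value is astronomically small.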

Discrete or skewed data

The calculator assumes a normal-approximated test statistic. For raw counts use Poisson or binomial tests; for heavy-tailed returns use bootstrap or robust statistics — the normal Z-table can mislead in those regimes.

One-tail vs two-tail

Picking a one-tailed test halves the p-value whenever the effect lies in the predicted direction, which is a powerful inducement to cheat. Only use one-tailed tests when the directional alternative is registered in advance for substantive (not statistical) reasons.

Core P-Value Formulas

Z-score

Z = (x − μ) / σ

Standardises any raw value into standard-deviation units.

Standard normal density

φ(z) = (1/√(2π)) · e^(−z²/2)

Probability density at z under N(0,1).

Standard normal CDF

Φ(z) = ½ · [1 + erf(z / √2)]

Cumulative probability — the calculator uses A&S 7.1.26 for erf.
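For readers who want to check the approximation by hand, here is a sketch of Abramowitz & Stegun formula 7.1.26 with its published coefficients, compared against Python's built-in `math.erf`. The function names `erf_as` and `phi_as` are hypothetical, and this is not claimed to be the calculator's exact code.

```python
import math

# Published constants for Abramowitz & Stegun formula 7.1.26
_P = 0.3275911
_A = (0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429)


def erf_as(x: float) -> float:
    """Approximate erf(x); the formula is for x >= 0, extended by odd symmetry."""
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    t = 1.0 / (1.0 + _P * x)
    poly = 0.0
    for a in reversed(_A):  # Horner form of a1*t + a2*t^2 + ... + a5*t^5
        poly = (poly + a) * t
    return sign * (1.0 - poly * math.exp(-x * x))


def phi_as(z: float) -> float:
    """Standard normal CDF built on the approximation above."""
    return 0.5 * (1.0 + erf_as(z / math.sqrt(2.0)))


worst = max(abs(erf_as(i / 100) - math.erf(i / 100)) for i in range(-400, 401))
print(f"max |error| on [-4, 4]: {worst:.2e}")
```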

Left-tail p-value

p = Φ(Z)

Probability mass to the left of Z under N(0,1).

Right-tail p-value

p = 1 − Φ(Z)

One-sided p-value when the alternative is μ > μ₀.

Two-tailed p-value

p = 2 · (1 − Φ(|Z|))

Two tails of equal size when the alternative is non-directional.

Between-±Z probability

P = 2 · Φ(|Z|) − 1

Coverage of a symmetric confidence interval at the given Z.

Critical Z (two-sided)

Z* = Φ⁻¹((1 + c) / 2)

Critical value for a confidence level c using AS241.

Critical Z (one-sided)

Z* = Φ⁻¹(1 − α)

Threshold the test statistic must exceed to reject H₀.

Inverse standard normal

z = Φ⁻¹(p)

Maps a probability back to its z-score — full double precision.

Confidence vs alpha

α = 1 − c

Significance level and confidence level always sum to 1.

Significance decision

Reject H₀ ⟺ p < α

The single rule that drives every hypothesis-testing verdict.

Common P-Value Mistakes

1. Reading p as the probability the null is true
The p-value is P(data at least this extreme | H₀), not P(H₀ | data). It says nothing about the probability of either hypothesis on its own; it only measures how surprising the data would be under the null.

2. Treating p ≥ 0.05 as 'no effect'
Failing to reject H₀ does not prove H₀. It means the evidence is insufficient at this sample size. Report the confidence interval to communicate what effect sizes remain compatible with the data.

3. Computing a one-sided p after seeing the result
If you decide direction after looking at the data, you have effectively performed a two-sided test but reported a one-sided one, doubling the false-positive rate. Always pre-register the test direction.

4. Stopping a study as soon as p < 0.05
Continuous peeking at sequential data inflates the false-positive rate. Use a pre-registered stopping rule (e.g. Pocock or O'Brien-Fleming boundaries) or wait for the planned sample.

5. Ignoring multiple comparisons
Twenty independent tests at α = 0.05 produce a 64% chance of at least one false positive. Apply Bonferroni for primary endpoints; FDR for screening genomics or marketing tests.

6. Reporting bare 'p < 0.05'
Quote the exact value. A finding at p = 0.049 should not be reported the same way as one at p = 0.0006, because readers cannot compare evidence strength without the actual number.
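The peeking problem in mistake 4 is easy to demonstrate by simulation. The sketch below generates data under a true null, checks |Z| > 1.96 after every batch, and counts how often any look crosses the line. All parameters (20 looks, batches of 10, 2,000 simulated experiments, a fixed seed) are arbitrary assumptions for illustration.

```python
import math
import random


def peeking_false_positive_rate(n_experiments=2000, looks=20, batch=10,
                                z_crit=1.96, seed=0):
    """Share of null experiments that hit |Z| > z_crit at any interim look."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_experiments):
        total, n = 0.0, 0
        for _ in range(looks):
            for _ in range(batch):
                total += rng.gauss(0.0, 1.0)  # H0 is true: mean 0, SD 1
            n += batch
            z = total / math.sqrt(n)  # sample mean divided by 1 / sqrt(n)
            if abs(z) > z_crit:
                false_positives += 1
                break  # the experimenter stops and declares significance
    return false_positives / n_experiments


single_look = peeking_false_positive_rate(looks=1)
many_looks = peeking_false_positive_rate(looks=20)
print(f"one look:  {single_look:.3f}")
print(f"20 looks:  {many_looks:.3f}")
```

With a single look the rate stays near the nominal 5%; with repeated looks it climbs well above that, which is the inflation a pre-registered stopping rule is meant to prevent.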

Where P-Values Are Used

Academic research

Psychology, sociology, economics, and education research routinely test mean differences and proportions at α = 0.05 — the calculator's significance tab gives the standard reject/fail-to-reject decision.

Medicine and clinical trials

Phase III drug studies pre-register α = 0.05 (sometimes 0.025 for one-sided), and superiority is declared when the primary endpoint achieves p < α. The toolkit tab returns both one- and two-tailed verdicts at once.

A/B testing and growth

Web experiments translate a conversion lift into a Z-score and a two-tailed p-value. SaaS teams typically use α = 0.05 with a minimum detectable effect; finance teams use α = 0.01.
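As an illustration of how a conversion lift becomes a Z-score, the sketch below runs a pooled two-proportion z-test. The pooled standard error is one common choice and is an assumption here, as are the function name and the visitor and conversion counts.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-tailed p) for the difference in conversion rates B - A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under H0: no difference
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        raise ValueError("standard error is zero; no variation in the data")
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Made-up example: 200 of 5,000 visitors convert on A, 250 of 5,000 on B.
z, p_value = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.3f}, two-tailed p = {p_value:.4f}")
```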

Data science and ML

Permutation tests, bootstrap CIs, and McNemar's test all reduce to p-value comparisons. The Z-conversion tools are useful for feature-importance significance tests and SHAP-value comparisons.

Marketing analytics

Lift testing in paid media uses p-values to validate creative variants and campaign tweaks. Brand-tracker studies report whether year-over-year changes in awareness or consideration are statistically significant.

Engineering and QA

Statistical process control flags processes whose mean has drifted beyond the 3σ control limits (two-sided p ≈ 0.0027). Quality engineering uses the toolkit's strict α = 0.001 verdict for safety-critical signals.

Physics and astronomy

5σ discoveries (the Higgs convention) correspond to a one-sided p ≈ 2.87 × 10⁻⁷. The calculator's exponential output makes these tail probabilities readable instead of rounding to zero.
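Far-tail probabilities are where floating-point detail starts to matter. The sketch below contrasts the textbook form 1 − Φ(z) with the complementary error function, which is one common way to keep tiny tails from collapsing to zero. It is an illustration in standard-library Python and makes no claim about how the calculator handles this internally.

```python
import math


def right_tail_naive(z: float) -> float:
    """1 - Phi(z), computed by subtraction; loses everything once Phi rounds to 1."""
    return 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def right_tail_erfc(z: float) -> float:
    """The same tail area via erfc, which stays accurate far into the tail."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


for z in (5.0, 10.0):
    print(f"z = {z:>4}: naive = {right_tail_naive(z):.3e}, erfc = {right_tail_erfc(z):.3e}")
```

At 5σ both forms agree closely; by 10σ the subtraction form returns exactly zero while the erfc form still reports a positive value.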

Education research

School-effectiveness studies, standardised-test comparisons, and grading-curve audits depend on p-values to detect real differences between cohorts above the noise floor.

Public-opinion research

Election forecasters use Z-scores and p-values to summarise the swing direction of polls relative to fundamentals — useful as a fast read on whether a campaign is actually moving the needle.

Sports analytics

Sabermetrics tests whether a hot streak, a coaching change, or a new pitch sequence has truly altered a team's run expectancy — the standard answer comes from a Z-based proportion test on the result.

Common Z-Score → P-Value Conversions

Z	Left tail	Right tail	Two-tailed	Typical context
1.00	0.84134	0.15866	0.31731	One-sigma deviation
1.282	0.90000	0.10000	0.20000	α = 0.10 one-sided / 90% CI
1.645	0.95000	0.05000	0.10000	α = 0.05 one-sided
1.96	0.97500	0.02500	0.05000	α = 0.05 two-sided / 95% CI
2.00	0.97725	0.02275	0.04550	Two-sigma deviation
2.326	0.99000	0.01000	0.02000	α = 0.01 one-sided
2.576	0.99500	0.00500	0.01000	α = 0.01 two-sided / 99% CI
3.00	0.99865	0.00135	0.00270	Three-sigma (SPC control limit)
3.29	0.99950	0.00050	0.00100	α = 0.001 two-sided / 99.9% CI
5.00	≈ 1.0000	2.87 × 10⁻⁷	5.73 × 10⁻⁷	Physics discovery (5σ)

Methodology you can verify

Standard-normal CDF Φ is computed via Abramowitz & Stegun 7.1.26 (Φ accurate to ≈ 1.5 × 10⁻⁷). The inverse CDF uses Wichura's AS241 algorithm to full double precision, matching R's qnorm and Python's scipy.stats.norm.ppf to 14 decimals. Read more on the methodology and editorial policy pages.

Frequently Asked Questions

What is a p-value?

A p-value is the probability of observing a test statistic at least as extreme as the one you have, assuming the null hypothesis is true. For a standard normal test statistic Z, the one-sided p-value is the tail area Φ(Z) or 1 − Φ(Z) and the two-sided p-value is 2·(1 − Φ(|Z|)). It is NOT the probability that the null hypothesis is true: it is P(data at least this extreme | H₀), not P(H₀ | data). A small p-value (typically < 0.05) means the observed result would be unusual under the null, which is the standard reason to reject H₀.

What is a z-score?

A z-score (also called a standard score) measures how many standard deviations a value lies above or below the mean. The formula is z = (x − μ) / σ. A z-score of +1.5 sits 1.5σ above the mean; a z-score of −2 sits 2σ below. Z-scores rescale any normal distribution to the standard normal N(0,1), so the same probability tables apply universally. The Z-Score tab on this page returns the z-score, percentile, and tail probabilities from any raw value.

What does p < 0.05 mean?

p < 0.05 means that data at least as extreme as what you observed would occur less than 5% of the time if the null hypothesis were true. By convention, this is the threshold for declaring a result 'statistically significant' in most social-science and survey research. It does NOT mean the alternative hypothesis is 95% likely, nor that the effect is large or important; it means only that the data are surprising under H₀ at the chosen tolerance level. Always pair the p-value with an effect size and a confidence interval to communicate practical significance.

What is statistical significance?

Statistical significance is the decision that an observed effect is unlikely to be a fluke under the null hypothesis. It is determined by comparing an observed p-value against a pre-registered significance level α (typically 0.05, 0.01, or 0.001). If p < α, reject H₀ and call the result significant; if p ≥ α, fail to reject and conclude there is insufficient evidence. The Significance tab on this page automates that comparison and returns a strength-of-evidence label spanning very weak, weak, moderate, strong, and very strong.

What is a two-tailed test?

A two-tailed test rejects the null hypothesis when the test statistic falls in either tail of the distribution, i.e. when |Z| exceeds the critical value Z* = Φ⁻¹(1 − α/2). It is the right choice when the alternative hypothesis is non-directional (the effect could go either way). For α = 0.05 the critical Z* is ±1.96; for α = 0.01 it is ±2.576. The two-tailed p-value is 2·(1 − Φ(|Z|)), which is twice the smaller one-sided tail area.

What is a one-tailed test?

A one-tailed (or one-sided) test rejects the null hypothesis only when the test statistic falls in a specified single tail. It is the right choice when the alternative hypothesis has a pre-specified direction, for instance 'mean A is greater than mean B' rather than 'mean A differs from mean B'. When the effect lies in the hypothesised direction, the one-tailed p-value is half the two-tailed value, which makes one-tailed tests more powerful but also riskier: choosing the direction after seeing the data inflates the false-positive rate. Always pre-register the direction.

How do I calculate a p-value?

Convert your test statistic to a standard-normal z-score, then look up the appropriate tail area under N(0,1). For a one-tailed test with H₁: μ > μ₀, p = 1 − Φ(Z); for the opposite direction, p = Φ(Z); for a two-tailed test, p = 2·(1 − Φ(|Z|)). The P-Value tab on this page works in both directions: enter Z and read every related probability, or enter any probability and the inverse CDF returns the matching Z.

What is a confidence level?

A confidence level c is the long-run probability that a confidence interval constructed from a random sample contains the true population parameter. A 95% confidence level means about 95 of 100 repeat intervals would cover the true value. The matching significance level is α = 1 − c, and the two-sided critical Z is Z* = Φ⁻¹((1 + c) / 2). The Confidence Level tab returns the critical Z, α, and per-tail area from any confidence level.

What is a normal distribution?

A normal distribution is the bell-shaped continuous distribution defined by its mean μ and standard deviation σ. Its density is f(x) = (1 / (σ√(2π)))·exp(−(x − μ)² / (2σ²)). The standard normal N(0,1) is the special case with μ = 0 and σ = 1. Every normal distribution can be standardised to it via z = (x − μ) / σ, which is why z-tables and the calculator's P-Value tab work universally. Many real-world quantities, such as test scores, heights, and measurement errors, are approximately normal thanks to the Central Limit Theorem.

Where are p-values used?

P-values are the foundation of frequentist hypothesis testing and are used in clinical trials, A/B testing, survey research, quality control, psychology, economics, physics, and many other fields. Specific uses include declaring drug efficacy in regulatory submissions, validating A/B test winners in product development, flagging out-of-control processes in manufacturing (the three-sigma rule corresponds to a two-sided p ≈ 0.0027), and identifying statistically significant features in machine-learning pipelines. The Hypothesis Testing Toolkit on this page covers the standard one- and two-tailed cases in one shot.
