4.4.1 Testing a Correlation
When analysing two associated sample variables, one is often interested in knowing whether the sample provides enough evidence that the respective random variables are correlated. For instance, in data classification, when two variables are correlated and their correlation is high, one may contemplate the possibility of discarding one of the variables, since a highly correlated variable only conveys redundant information.
Let ρ represent the true value of the Pearson correlation mentioned in section 2.3.4. The correlation test is formalised as:

H0: ρ = 0, H1: ρ ≠ 0, for a two-sided test.

For a one-sided test the alternative hypothesis is:

H1: ρ > 0 or H1: ρ < 0.
Let r represent the sample Pearson correlation when the null hypothesis is verified and the sample size is n. Furthermore, assume that the random variables are normally distributed. Then, the (r.v. corresponding to the) following test statistic:
t* = r √(n – 2) / √(1 – r²),    4.6
has a Student’s t distribution with n – 2 degrees of freedom.
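The statistic in equation 4.6 is simple to compute directly. The following sketch (Python, used here purely for illustration; it is not one of the book's tools) implements the formula:

```python
import math

def corr_test_statistic(r, n):
    """Test statistic of equation 4.6: t* = r*sqrt(n - 2)/sqrt(1 - r^2),
    which follows a Student's t distribution with n - 2 degrees of freedom
    under H0 (and assuming normally distributed variables)."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Values of Example 4.5: r = -0.5281, n = 16
print(round(corr_test_statistic(-0.5281, 16), 4))  # about -2.3269
```

Note that the statistic grows with both |r| and n: a weak correlation can still be significant in a large sample.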
The Pearson correlation test can be performed as part of the computation of correlations with SPSS and STATISTICA. It can also be performed using the Correlation Test sheet of Tools.xls (see Appendix F) or the Probability Calculator; Correlations of STATISTICA (see also Commands 4.2).
Example 4.5
Q: Consider the variables PMax and T80 of the meteorological dataset ( Meteo) for the “moderate” category of precipitation (PClass = 2) as defined in 2.1.2. We then have n = 16 measurements of the maximum precipitation and the maximum temperature during 1980, respectively. Is there evidence, at α = 0.05, of a negative correlation between these two variables?
A: The distributions of PMax and T80 for “moderate” precipitation are reasonably well approximated by the normal distribution (see section 5.1). The sample correlation is r = –0.53. Thus, the test statistic is:
r = –0.53, n = 16 ⇒ t* = –2.33.
Since t14,0.05 = –1.76, the value of t* falls in the critical region ]–∞, –1.76]; therefore, the null hypothesis is rejected, i.e., there is evidence of a negative correlation between PMax and T80 at that level of significance. Note that the observed significance of t* is 0.0176, below α.
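The observed significance quoted in the text can be checked by evaluating the Student's t left-tail probability numerically. The sketch below (Python with composite Simpson integration; an illustration, not the book's code) recovers a one-sided p of about 0.018 for t* = –2.3269 with 14 degrees of freedom:

```python
import math

def t_pdf(x, df):
    """Student's t probability density with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf_left(t, df, lo=-40.0, steps=4000):
    """P(T <= t), approximated by Simpson's rule on [lo, t];
    lo is taken far enough in the left tail to be negligible."""
    h = (t - lo) / steps
    s = t_pdf(lo, df) + t_pdf(t, df)
    for i in range(1, steps):
        s += t_pdf(lo + i * h, df) * (4 if i % 2 else 2)
    return s * h / 3

# Example 4.5: t* = -2.3269, 14 degrees of freedom
p_one_sided = t_cdf_left(-2.3269, 14)
print(round(p_one_sided, 3))       # about 0.018 (one-sided)
print(round(2 * p_one_sided, 4))   # about 0.0355 (two-sided, as in the R output)
```

Doubling the one-sided tail gives the two-sided p-value reported by R's cor.test below.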
4 Parametric Tests of Hypotheses
Commands 4.2. SPSS, STATISTICA, MATLAB and R commands used to perform the correlation test.

SPSS        Analyze; Correlate; Bivariate
STATISTICA  Statistics; Basic Statistics and Tables; Correlation Matrices
            Probability Calculator; Correlations
MATLAB      [r,t,tcrit] = corrtest(x,y,alpha)
R           cor.test(x, y, conf.level = 0.95, ...)
As mentioned above, the Pearson correlation test can be performed as part of the computation of correlations with SPSS and STATISTICA. It can also be performed with the Correlations option of the STATISTICA Probability Calculator.
MATLAB does not have a correlation test function. We do provide, however, a function for that purpose, corrtest (see Appendix F). Assuming that the column vectors pmax, t80 and pclass described in 2.1.2.3 are available, Example 4.5 would be solved as:
>> [r,t,tcrit] = corrtest(pmax(pclass==2),t80(pclass==2),0.05)
r =
   -0.5281
t =
   -2.3268
tcrit =
   -1.7613
The correlation test can be performed in R with the function cor.test. In Commands 4.2 we only show the main arguments of this function. As usual, by default conf.level=0.95. Example 4.5 would be solved as:
> cor.test(T80[Pclass==2],Pmax[Pclass==2])

        Pearson’s product-moment correlation

data:  T80[Pclass == 2] and Pmax[Pclass == 2]
t = -2.3268, df = 14, p-value = 0.0355
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.81138702 -0.04385491
sample estimates:
       cor
-0.5280802
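The confidence interval printed by cor.test is based on Fisher's z-transformation of the correlation coefficient: z = atanh(r) is approximately normal with standard error 1/√(n – 3). A minimal sketch of that computation (Python, shown only to illuminate the formula behind R's output):

```python
import math
from statistics import NormalDist

def pearson_ci(r, n, conf_level=0.95):
    """Approximate confidence interval for a Pearson correlation via
    Fisher's z-transformation: atanh(r) ~ Normal(atanh(rho), 1/(n - 3))."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    q = NormalDist().inv_cdf(0.5 + conf_level / 2)  # e.g. 1.96 for 95%
    return math.tanh(z - q * se), math.tanh(z + q * se)

# Example 4.5: r = -0.5280802, n = 16
low, high = pearson_ci(-0.5280802, 16)
print(round(low, 4), round(high, 4))  # about -0.8114 and -0.0439, as in R
```

Since the whole interval lies below zero, it agrees with the rejection of H0 at the 5% level for the two-sided test.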
4.4 Inference on Two Populations
As a final comment, we draw the reader’s attention to the fact that correlation is by no means synonymous with causality. As a matter of fact, when two variables
X and Y are correlated, one of the following situations can happen:
– One of the variables is the cause and the other is the effect. For instance, if X = “nr of forest fires per year” and Y = “area of burnt forest per year”, then one usually finds that X is correlated with Y, since Y is the effect of X.
– Both variables have an indirect cause. For instance, if X = “% of persons daily arriving at a Hospital with yellow-tainted fingers” and Y = “% of persons daily arriving at the same Hospital with pulmonary carcinoma”, one finds that X is correlated with Y, but neither is cause nor effect. Instead, there is another variable that is the cause of both − volume of inhaled tobacco smoke.
– The correlation is fortuitous and there is no causal link. For instance, one may happen to find a correlation between X = “% of persons with blue eyes per household” and Y = “% of persons preferring radio to TV per household”. It would, however, be meaningless to infer causality between the two variables.