Further Considerations on the Correlations of Stellar Characters.

By Winifred Gibson, B.Sc., formerly Jessel Scholar, University College, London, and Karl Pearson, F.R.S., University College, London.

(With Six Diagrams.)

(1) Introductory. In a paper communicated to the Society last year (Monthly Notices, vol. lxvi. p. 445), modern statistical methods were used for the first time to determine the numerical relationships between various star characters. The object of the present paper is to deduce further similar relationships, and to deal with some of the same relationships on the basis of wider data. The general characters with which we have to deal, and which are more or less accurately known for larger or smaller stellar populations, are (1) magnitude, (2) colour, (3) spectral class, (4) proper motion, (5) parallax, (6) position. In any attempt to look upon the stellar universe as an orderly whole, the relationships between these characters must be of fundamental importance. To determine their numerical values is the first stage by which we pass from chaos to an organised and locally differentiated cosmos.

The aid which the statistician may venture to offer the trained astronomer in this respect may, perhaps, be illustrated by reference to some recent work. Since the publication of the first paper above referred to, two memoirs, both of considerable importance, have appeared. The first is that of Messrs. Chase, Smith, and Elkin ("Parallax Investigations on 163 Stars, mainly of large Proper Motion," Transactions, Yale Univ. Observatory, vol. ii. pp. 1-207). This memoir deals with the relationship of proper motion, magnitude, parallax, and spectral class, among much else of great value, but not bearing on the topics we have at present in hand. A second memoir of less scope but of considerable interest is that of Mr. W. S. Franks ("The Relation between Star Colours and Spectra," Monthly Notices, vol. lxvii. pp. 539-42).
Now, these memoirs more than suffice to show that the distribution of star characters is not one of mere random association. The characters occur in a correlated manner, and this in itself is suggestive of the cosmos being a differentiated organisation. Even if we at once admit that we might anticipate that parallax would be related to proper motion, or even to magnitude, or, again, that colour and spectral group would be found in association, it is less obvious that spectral class will be found related to magnitude, proper motion, and parallax. Yet, even when we see these relationships indicated in the above and other memoirs, there appears to be something lacking, which it is, perhaps, possible for modern statistical methods to supply. It would not be possible from the above type of classificatory work to determine the intensity of the relationship between the characters under consideration. For example: Is parallax more closely associated with magnitude or with spectral class? Or, again: Is magnitude more closely related to distance than to chemical composition? To say that the latter is nearly four times as influential as the former is to crystallise at once our general ideas on magnitude.

Accordingly, it seems possible that modern statistical methods may be of some aid in determining the intensity of relationship between various stellar characters; in appreciating what, to adopt a term from biometry, may be spoken of as the organic correlations of the population. It is not suggested that, any more than in the science referred to, the individual must be lost sight of in a cloud of average relationships. But a knowledge of the extent to which stellar characters are correlated may, if properly used, be helpful in indicating the directions of profitable further analysis.

(2) Determination of Correlation. It may not be out of place here to give a brief summary of the constants by aid of which correlation is determined in modern statistical practice.
This is the more important as it is desirable to indicate the limits of their proper application in the case of stellar characters. The methods in use are threefold:

(i) Coefficient of Correlation, usually represented by $r$. This can only be used effectively if both characters are quantitative and actually measured. Let A and B be the two characters, $m_1$ and $m_2$ their mean values, $m_1 + x$, $m_2 + y$ the characters in any pair of individuals, $\sigma_1$ and $\sigma_2$ the standard deviations (square roots of mean square deviations of either group of characters); then if $n$ be the size of the sample taken of the population,

$$r_{12} = \frac{S(xy)}{n\,\sigma_1 \sigma_2}. \qquad \text{(i)}$$

$r$ possesses the following properties: it lies between $-1$ and $+1$, according to the intensity of the relationship; $r_{12}\,\sigma_1/\sigma_2$ is the slope of the best fitting straight line of the average values of A for a given value of B; and $r_{12}\,\sigma_2/\sigma_1$ is the slope of the corresponding line for average values of B for given values of A. $\sigma_1\sqrt{1 - r_{12}^2}$ is the average standard deviation of arrays of A for given values of B, and $\sigma_2\sqrt{1 - r_{12}^2}$ is the average deviation from the straight line of arrays of B for given values of A. When the average values of one character for a given value of a second lie nearly on a straight line the correlation is said to be linear, and in this case the vanishing of $r_{12}$ marks the absolute independence of the two characters. $r_{12}$ is always the same as $r_{21}$. All this is completely independent of the nature of the frequency distribution. Provided the characters are quantitatively measurable and the correlation is approximately linear, the correlation coefficient $r$ is pre-eminently the best suited to express the degree of interdependence of the two variables. For example, it was used in the first paper in determining the relationship between proper motion in R.A. and in declination.

(ii) The Correlation Ratio, usually represented by $\eta$.
If the curve of mean values of A for given values of B depart widely from a straight line, then the vanishing of $r$ does not necessarily mean that the characters A and B are unrelated. It would only signify that the best fitting straight line was horizontal. In order to cover this case the correlation ratio has been introduced. Let $\Sigma_1$ be the standard deviation of the means of arrays of A for given values of B, each array mean being weighted with its total frequency; then

$$\eta_{12} = \Sigma_1/\sigma_1. \qquad \text{(ii)}$$

$\eta$ possesses the following characters: it lies between 0 and 1, being always zero if the characters are independent, and unity if they are absolutely related or causal. $\eta$ always lies between $r$ and 1, and is equal to $r$ when the correlation is linear. $\sigma_1\sqrt{1 - \eta_{12}^2}$ is the mean standard deviation of arrays of A for a given B. $\sigma_1\sqrt{\eta_{12}^2 - r_{12}^2}$ is the root mean square deviation of the curve of means from its best fitting straight line. $\eta_{12}$ is not necessarily equal to $\eta_{21}$. When the correlation is approximately linear, we have $\eta_{12} = \eta_{21} = r_{12}$ very closely, and this relationship holds for a very wide range of physical and organic variables.

$\eta$, the correlation ratio, will clearly be of service when one variable, say B, is not quantitative, but classificatory, because in this case we can determine $\Sigma_1$ and $\sigma_1$, although we have no quantitative measures of B. We ought, however, to have fairly fine groupings of B. Such cases are those of stellar spectra and stellar colour. On the other hand, $\eta_{12}$ and $\eta_{21}$ may be found in order to measure the degree of divergence of the correlation from linearity. As illustration of this, we may consider magnitude. If we replace it by amount of light, this might possibly give a true quantitative scale; but not only does the reduction involve some doubt, but it introduces extremely laborious calculations compared to the simplicity of magnitude proper. Accordingly the $\eta$ method, classifying according to magnitude, seems a suitable method of approaching the problem.
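The definitions of the correlation coefficient and the correlation ratio given above may be illustrated by a minimal Python sketch. The function names and the six-point data set are hypothetical, chosen only so that the best fitting straight line is horizontal while the array means still vary; they are not taken from the stellar data.

```python
import math
from collections import defaultdict


def correlation_coefficient(a, b):
    """r = S(xy) / (n * sigma_1 * sigma_2), x and y being deviations from the means."""
    n = len(a)
    m1, m2 = sum(a) / n, sum(b) / n
    s1 = math.sqrt(sum((v - m1) ** 2 for v in a) / n)
    s2 = math.sqrt(sum((v - m2) ** 2 for v in b) / n)
    sxy = sum((u - m1) * (v - m2) for u, v in zip(a, b))
    return sxy / (n * s1 * s2)


def correlation_ratio(a, b_class):
    """eta of A on B: s.d. of the array means of A (each weighted by its
    frequency) divided by the total s.d. of A. B need only be a class label."""
    n = len(a)
    m1 = sum(a) / n
    s1 = math.sqrt(sum((v - m1) ** 2 for v in a) / n)
    arrays = defaultdict(list)
    for value, label in zip(a, b_class):
        arrays[label].append(value)
    big_sigma_sq = sum(
        len(arr) * (sum(arr) / len(arr) - m1) ** 2 for arr in arrays.values()
    ) / n
    return math.sqrt(big_sigma_sq) / s1


# Hypothetical data: the array means of A are 5, 2, 5 for B = 1, 2, 3.
a = [4, 6, 1, 3, 4, 6]
b = [1, 1, 2, 2, 3, 3]
r = correlation_coefficient(a, b)
eta = correlation_ratio(a, b)
print("r =", r, " eta =", eta)
```

In this made-up case r vanishes although the array means differ, so that only the correlation ratio registers the relationship; with strictly linear data the two functions return the same value.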
(iii) Coefficient of mean square Contingency, usually represented by $C_1$. Let the arrangement of any table of two variables (e.g. Mr. Franks' tables of colour and spectral class) be purely classificatory. Let the frequency of any class a of A in the population N be $n_a$, and of the class b of B be $n_b$; let the frequency of individuals combining both classes be $n_{ab}$; then

$$\phi^2 = \frac{1}{N}\, S\left\{ \frac{(n_{ab} - n_a n_b/N)^2}{n_a n_b/N} \right\}$$

is termed the mean square contingency, and it clearly vanishes if the distribution of the two characters be independent.

$$C_1 = \sqrt{\frac{\phi^2}{1 + \phi^2}} \qquad \text{(iii)}$$

is termed the coefficient of mean square contingency, and it measures the deviation of the two characters from independence. It approaches unity if the characters are causal, and is zero if they are independent.

The whole of the above brief résumé of the usual statistical methods of dealing with correlation is independent of any assumption as to the distributions of frequency following the normal or Gaussian law of variation. Should they do so, however, we have

$$r = \eta = C_1, \qquad \text{(iv)}$$

or all our three methods of determining the intensity of correlation merge theoretically into a single value. We say theoretically, because the truth of (iv) depends upon our replacing summation by integrals, or it is the limit with sufficiently fine grouping. By actually testing (iv) on fairly Gaussian material it will be found that even moderately coarse classifications give us quite close results. Deviations from (iv) arise, not wholly, but chiefly through the characters dealt with being non-Gaussian in distribution of variability. We venture to think that possibly astronomers have been too ready to assume that all types of variability follow the Gaussian law of distribution, and that the assumption that star characters follow it requires statistical justification. It is, perhaps, rather dangerous to start with the doctrine that they must, and then deduce rather sweeping conclusions from the fact that they do not.
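The coefficient of mean square contingency described above may be sketched in Python as follows. This is a minimal illustration only: the function name and the two small tables are hypothetical and are not the colour and spectral class tables referred to in the text.

```python
import math


def contingency_coefficient(table):
    """C1 = sqrt(phi^2 / (1 + phi^2)) for a table of frequencies n_ab.

    phi^2 is the mean square contingency: the sum over all cells of
    (n_ab - n_a * n_b / N)^2 / (n_a * n_b / N), divided by N.
    """
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    phi_sq = 0.0
    for i, row in enumerate(table):
        for j, n_ab in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip cells in empty rows or columns
                phi_sq += (n_ab - expected) ** 2 / expected
    phi_sq /= total
    return math.sqrt(phi_sq / (1.0 + phi_sq))


# Hypothetical tables: the first has proportional rows (independence),
# the second puts every individual on the diagonal (complete association).
independent = [[10, 20], [30, 60]]
diagonal = [[50, 0], [0, 50]]
print(contingency_coefficient(independent))
print(contingency_coefficient(diagonal))
```

For the proportional table the coefficient is zero. For the diagonal 2 × 2 table it reaches only the square root of one half, since with k equally filled diagonal classes the largest attainable value is the square root of (k - 1)/k; the coefficient can therefore approach unity only when the grouping is fairly fine.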
In the present investigation the correlation coefficient, the correlation ratio, and the coefficient of mean square contingency will be used according to the nature of the statistics with which we have to deal. The probable error of $r$ is calculated from

$$\text{p.e. of } r = 0.67449\,(1 - r^2)/\sqrt{N},$$

and of $\eta$ from

$$\text{p.e. of } \eta = 0.67449\,(1 - \eta^2)/\sqrt{N}.$$

These values are not the absolutely correct values, but are sufficiently close for most practical purposes.* The probable error of $C_1$ is troublesome to calculate,† but if $C_1$ be two to three times greater than $0.67449/\sqrt{N}$, this being the maximum value of the probable error, i.e. that when $C_1 = 0$, it will undoubtedly be significant.

* For probable errors of $r$ and $\eta$, see Pearson, "On the General Theory of Skew Correlation," pp. 19 and 20, Drapers' Research Memoirs, Dulau & Co.
† Blakeman and Pearson, "On the Probable Error of Mean Square Contingency," Biometrika, vol. v. pp. 191 et seq.

(3) Correlations with Stellar Colour. (i) Colour and Magnitude. In the earlier memoir stellar colour and magnitude were correlated, using for this purpose the catalogue of star colours contained in vol. ix. of the Annals of the Cape Observatory. The stars included in that list range in magnitude from 4 to 10, being 4 to 9 for visual and 6 to 10 for photographic magnitude. The colours ranged from yellow to red, and contained no blue or green or white. In order to include green and blue stars we used, at the suggestion of Mr. T. W. Backhouse, the star catalogue in vol. xiv. of the Harvard Annals. That list includes stars of photographic magnitude 1.5 to 7, and contains about 3.6 per cent. of blue and green stars. A table of contingency for magnitude and colour in the case of 2834 stars is given below (Table I.). In grouping for contingency we reduced this to a 6 × 9-fold table, but the means were calculated on the basis of Table I.
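The probable-error rules quoted earlier in this section may be sketched in Python, taking the 2834 stars of Table I. as the sample size. The function names are hypothetical, and the value of the contingency coefficient tested at the end is an invented figure used purely for illustration, not a result of the paper.

```python
import math

PE_FACTOR = 0.67449  # factor converting a standard deviation to a probable error


def probable_error_r(r, n):
    """Approximate probable error of r: 0.67449 * (1 - r^2) / sqrt(N)."""
    return PE_FACTOR * (1.0 - r * r) / math.sqrt(n)


def probable_error_eta(eta, n):
    """Approximate probable error of eta: 0.67449 * (1 - eta^2) / sqrt(N)."""
    return PE_FACTOR * (1.0 - eta * eta) / math.sqrt(n)


def max_probable_error_c1(n):
    """Maximum probable error of C1, reached when C1 = 0: 0.67449 / sqrt(N)."""
    return PE_FACTOR / math.sqrt(n)


def c1_clearly_significant(c1, n, multiple=3.0):
    """Rule of thumb from the text: C1 is significant if it exceeds two to
    three times the maximum probable error (three is used by default)."""
    return c1 > multiple * max_probable_error_c1(n)


n_stars = 2834                    # sample size of Table I.
limit = max_probable_error_c1(n_stars)
print("maximum p.e. of C1 for N = 2834:", round(limit, 5))

hypothetical_c1 = 0.20            # invented value, for illustration only
print(c1_clearly_significant(hypothetical_c1, n_stars))
```

With a sample of this size the maximum probable error is a little under 0.013, so that any contingency coefficient above about 0.04 would pass the threefold criterion.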
The results reached were as follows:

Thus while the two groups of stars differ very widely in mean character, we see that the dependence of one character on the other is essentially the same in the two samples.

TABLE I. Contingency Table for Colour and Photographic Magnitude.*
Colour (columns): Blue, Green, White, Yellow, Orange, Red, Totals.
Magnitude (rows, in half-magnitude classes): -1.5 to -1, -1 to -0.5, -0.5 to 0, 0 to 0.5, 0.5 to 1, and so on.
[Table entries not preserved.]

* In the magnitude class, the group includes the first and excludes the second value given. Thus all stars being enumerated to one decimal in magnitude, the group 4 to 4.5 contains 4.0, 4.1, 4.2, 4.3 and 4.4 stars, or centres at 4.2. The blank lines indicate the magnitude groups actually used in deducing the contingency coefficient from a 6 × 9-fold table.

The reduction of the