The Sphere Game in n Dimensions

Document first posted 3/05/2002
Document last revised 5/15/2006
 
 


Al Lehnen
Mathematics Department
Madison Area Technical College
3550 Anderson Street
Madison, WI 53704
(608) 246-6567
my.execpc.com/~aplehnen/al

Gary E Wesenberg
Department of Biochemistry
University of Wisconsin
6606B Biochemistry
433 Babcock Drive
Madison, WI 53706
(608) 263-5923

Abstract:

This work reports a solution to the following problem: "To within a fixed tolerance, what is the most probable straight line distance between a fixed point and a second point picked at random from the surface of a sphere?" The surprising result in two and three dimensions is a value near the diameter! In more than three dimensions, this most probable distance is less than the diameter and begins to approach the more intuitive "equatorial value", the square root of two times the sphere's radius. The analysis presented solves the problem for a sphere in any dimension greater than one. In particular, as the dimension gets large the probability distribution approaches a normal distribution with a mean of the equatorial value and a standard deviation which goes to zero like the reciprocal of the square root of the dimension.
 
 

A Summary of the Results



Imagine fixing a point on the surface of an n-dimensional sphere of radius a (it is assumed that n > 1; the surface is a "hypersphere" if n > 3) and then randomly picking a second point on the same surface. Let L designate the Euclidean distance between these two points. What is the nature of the probability distribution of L? A related problem, the distribution of the distance between two random points picked within the hypersphere, seems to have been first proposed by Robert Deltheil and can be found in his 1926 text, Probabilités Géométriques. Traité du Calcul des Probabilités et de ses Applications: Tome II, Fascicule II. This problem was revisited in greater detail by J. M. Hammersley in 1950. In 1954 R. D. Lord presented an elegant solution using a method of G. N. Watson which employed Bessel functions. Lord's analysis generalized Hammersley's results and also gave the explicit solution if the points are constrained to lie on the hypersphere's surface.

The purpose of this report is to summarize our solution of the problem, which though not as general as Lord's, might be more accessible due to its use of polyspherical coordinates and the resulting elementary methods employed. The principal results are stated below.
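1. The probability distribution of L is

$$P(L) = c(n)\,\frac{L^{n-2}}{a^{n-1}}\left[1 - \frac{L^2}{4a^2}\right]^{(n-3)/2}, \qquad 0 < L < 2a,$$

where the coefficients c(n) are given as follows:

For even n, $c(n) = \dfrac{(n-2)!!}{\pi\,(n-3)!!}$. For odd n, $c(n) = \dfrac{(n-2)!!}{2\,(n-3)!!}$. (Here $0!! = (-1)!! = 1$.)

Both these results can be represented by the single formula:

$$c(n) = \frac{\Gamma\!\left(\frac{n}{2}\right)}{\sqrt{\pi}\;\Gamma\!\left(\frac{n-1}{2}\right)}.$$

Asymptotically for large n,

$$c(n) = \sqrt{\frac{n}{2\pi}}\left[1 - \frac{3}{4n} + O\!\left(n^{-2}\right)\right].$$

2. The distribution is linear for a sphere in three dimensions (n = 3): $P(L) = L/(2a^2)$.
3. In two and three dimensions values of L close to the diameter are most likely.
4. In three or more dimensions the mode of the L distribution is given by $L_{\text{mode}} = 2a\sqrt{\dfrac{n-2}{2n-5}}$.
5. For all values of n the median value of L is the "equatorial value" $\sqrt{2}\,a$.
6. The average value of L, < L >, is given by the following formula:

$$\langle L\rangle = \frac{2^{\,n-1}\,\Gamma\!\left(\frac{n}{2}\right)^{2}}{\sqrt{\pi}\;\Gamma\!\left(n-\frac12\right)}\,a .$$

This can also be evaluated by finite sums, with separate forms for even n and for odd n.
7. The second moment of L, < L^2 >, is $2a^2$.
8. Asymptotically for large n,

$$\langle L\rangle = \sqrt{2}\,a\left[1 - \frac{1}{8n} - \frac{15}{128n^2} - \frac{75}{1024n^3} + O\!\left(n^{-4}\right)\right],$$

$$\sigma^2 = \langle L^2\rangle - \langle L\rangle^2 = \frac{a^2}{2n}\left[1 + \frac{7}{8n} + \frac{15}{32n^2} + O\!\left(n^{-3}\right)\right].$$

9. Asymptotically for large n, the probability distribution for L approaches a normal distribution with a mean of $\sqrt{2}\,a$ and a standard deviation of $a/\sqrt{2n}$.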
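The density in item 1 can be sanity-checked numerically. The following is a minimal sketch, assuming the density has the form $P(L) = c(n)\,L^{n-2}a^{-(n-1)}[1 - L^2/(4a^2)]^{(n-3)/2}$ with $c(n) = \Gamma(n/2)/(\sqrt{\pi}\,\Gamma((n-1)/2))$. The function names (`c`, `pdf`, `integrate`) are illustrative choices and do not come from the original analysis. It checks that the density integrates to one and that the second moment is $2a^2$.

```python
import math


def c(n):
    """Normalization factor, assumed to be Gamma(n/2) / (sqrt(pi) * Gamma((n-1)/2))."""
    return math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2)) / math.sqrt(math.pi)


def pdf(L, n, a=1.0):
    """Assumed density of the distance L on a sphere of radius a in n dimensions."""
    if not 0.0 < L < 2.0 * a:
        return 0.0
    return c(n) * L ** (n - 2) / a ** (n - 1) * (1.0 - (L / (2.0 * a)) ** 2) ** ((n - 3) / 2)


def integrate(f, lo, hi, steps=20_000):
    """Midpoint rule, which never evaluates f at the endpoints of the interval."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    for n in (3, 4, 5, 10):
        total = integrate(lambda L: pdf(L, n), 0.0, 2.0)
        second = integrate(lambda L: L * L * pdf(L, n), 0.0, 2.0)
        print(f"n={n:2d}  integral={total:.6f}  second moment={second:.6f}")
```

The case n = 2 is left out of the loop because the density is singular at L = 2a, where a plain midpoint rule converges slowly.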
 
Graphs of the distribution for different values of n are shown in Figures 1 and 2. In the formula for the probability density function of L the c(n) are normalization factors. From their representation as a ratio of Gamma functions, the c(n) exist for any n > 1. Thus, for L in the domain (0, 2a) the probability distribution can be extended to any real "dimension" greater than one. Different views of a three-dimensional perspective plot of the probability density as a function of L and n are shown in Figures 3 through 6.

 
Figure 1 Figure 2
Figure 3 Figure 4
Figure 5 Figure 6

 
 

Details

The n = 2 Case: A Circle
Surprisingly, the most singular case is for two dimensions. Let the position of the first point on the circle have coordinates (-a, 0). Using polar coordinates, the length L to the second point, picked at random anywhere on the circumference of the circle, is a function of the angle θ. Explicitly, we have

$$L = 2a\cos\frac{\theta}{2}, \qquad -\pi \le \theta \le \pi .$$

However, θ is not a function of L since there are two possible angles, θ and -θ, for each non-zero L value less than the diameter. Imagine generating L values by letting θ range from π to 0. This is illustrated in Figure 7. The probability that a value for the length would occur between L and L + dL would then be given by the following:

$$P(L)\,dL = \frac{-2\,d\theta}{2\pi} = -\frac{d\theta}{\pi} .$$

The factor of two in the numerator is a consequence of the two different angles for each value of L. The minus sign is needed because the change in θ is negative for an increase in L.
Figure 7

This probability can be expressed in terms of L by using the chain rule.

$$P(L)\,dL = -\frac{1}{\pi}\frac{d\theta}{dL}\,dL = \frac{dL}{\pi a \sin(\theta/2)} .$$

This is entirely equivalent to using the fundamental transformation equation for a probability distribution [2, p. 137; 9, pp. 265-268; 14, p. 186],

$$P_L(L) = P_\theta(\theta)\left|\frac{d\theta}{dL}\right| .$$

For θ in the range from π to 0, it follows from elementary trigonometry that

$$\sin\frac{\theta}{2} = \sqrt{1 - \frac{L^2}{4a^2}} = \frac{\sqrt{4a^2 - L^2}}{2a} .$$

Substituting this result gives the probability as an explicit function of L.

$$P(L)\,dL = \frac{2\,dL}{\pi\sqrt{4a^2 - L^2}}, \qquad 0 \le L < 2a .$$

Although the probability density function is singular at an L value equal to the circle's diameter, the function is still in L^1(0, 2a) with an integrated probability of one. Surprisingly, the probability density increases with increasing length, so that the most likely values of L are near the diameter of the circle. Stated more precisely, for any non-zero tolerance ε smaller than a, the probability that L lies between Q - ε and Q + ε is a maximum for Q equal to 2a - ε.
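The tolerance statement for the circle can be illustrated with a short numerical sketch. It assumes the density $2/(\pi\sqrt{4a^2 - L^2})$ on (0, 2a), whose cumulative distribution is $(2/\pi)\arcsin(L/2a)$. The names `circle_cdf` and `window_probability`, and the symbol `eps` for the tolerance, are illustrative only.

```python
import math


def circle_cdf(L, a=1.0):
    """P(distance <= L) on a circle of radius a, assuming CDF = (2/pi) * asin(L / (2a))."""
    L = min(max(L, 0.0), 2.0 * a)  # clamp to the allowed range [0, 2a]
    return (2.0 / math.pi) * math.asin(L / (2.0 * a))


def window_probability(Q, eps, a=1.0):
    """Probability that L falls in the window [Q - eps, Q + eps]."""
    return circle_cdf(Q + eps, a) - circle_cdf(Q - eps, a)


if __name__ == "__main__":
    eps = 0.05
    # Window centers Q from eps to 2a - eps, so every window stays inside [0, 2a].
    centers = [eps + k * (2.0 - 2.0 * eps) / 400 for k in range(401)]
    best = max(centers, key=lambda Q: window_probability(Q, eps))
    print(f"best center Q = {best:.4f}, probability = {window_probability(best, eps):.4f}")
```

Because the density increases with L, the window with the largest probability is the one pushed as far toward the diameter as the tolerance allows.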

Bertrand's Paradox

Inscribe an equilateral triangle of side S in the circle as shown in Figure 8. It then follows that $S = \sqrt{3}\,a$. Let O be the origin and from (-a, 0) generate a chord of length L by choosing a second point at random from the circumference of the circle.
Figure 8

The probability that L would exceed S is given by

$$\int_{\sqrt{3}\,a}^{2a}\frac{2\,dL}{\pi\sqrt{4a^2 - L^2}} = 1 - \frac{2}{\pi}\arcsin\frac{\sqrt{3}}{2} = \frac{1}{3} .$$

This is one of the well-known "solutions" to Bertrand's Paradox [14, pp.161-162, 3]. The probability of a "random" chord is ambiguous, unless the method for generating it is specified. In this analysis the length L is generated by randomly selecting an element of "area" on the surface of the hypersphere. One could imagine that after marking the first point, the hypersphere is placed in an n-dimensional box which is then "shaken". When the hypersphere stops moving, the Euclidean distance between the marked point and the point of tangency of the hypersphere to the hyperplane of the floor of the box is measured.
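The chord-generation method described here (a fixed first point and a second point uniform on the circumference) can be simulated directly. The sketch below is an illustration under that assumption, with a hypothetical function name and an arbitrary seed. It estimates the fraction of chords longer than the side $\sqrt{3}\,a$ of the inscribed equilateral triangle, which should be close to 1/3 for this method.

```python
import math
import random


def fraction_longer_than_side(trials=200_000, a=1.0, seed=12345):
    """Monte Carlo estimate of P(L > sqrt(3) * a) for a uniformly random second point."""
    rng = random.Random(seed)
    side = math.sqrt(3.0) * a
    hits = 0
    for _ in range(trials):
        theta = rng.uniform(-math.pi, math.pi)  # angle of the second point
        # Distance from (-a, 0) to (a cos(theta), a sin(theta)).
        L = math.hypot(a * math.cos(theta) + a, a * math.sin(theta))
        if L > side:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"estimated P(L > S) = {fraction_longer_than_side():.4f}")
```

Other ways of generating a "random" chord (for example, picking a random midpoint) would give different fractions, which is the point of the paradox.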
 
The n > 2 Case: A Hypersphere in n Dimensions:
To treat the problem in n dimensions we will use the following generalization of spherical coordinates.

with the domain. From these definitions it follows that

The element of volume in these coordinates requires calculating the following Jacobian.

Using an induction argument it is shown in Appendix A that

.
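For orientation, one common convention for such coordinates is sketched below. It is an assumption made for illustration: the indexing used by the authors (and in Appendix A) may differ, although the resulting volume and surface area do not depend on that choice.

```latex
\begin{aligned}
x_1 &= r\,\sin\theta_{n-1}\sin\theta_{n-2}\cdots\sin\theta_2\cos\theta_1,\\
x_2 &= r\,\sin\theta_{n-1}\sin\theta_{n-2}\cdots\sin\theta_2\sin\theta_1,\\
x_k &= r\,\sin\theta_{n-1}\cdots\sin\theta_k\cos\theta_{k-1}, \qquad 3 \le k \le n-1,\\
x_n &= r\,\cos\theta_{n-1},
\end{aligned}
\qquad
r \ge 0,\quad 0 \le \theta_1 < 2\pi,\quad 0 \le \theta_k \le \pi \ (k \ge 2).

\sum_{j=1}^{n} x_j^2 = r^2,
\qquad
\left|\frac{\partial(x_1,\ldots,x_n)}{\partial(r,\theta_1,\ldots,\theta_{n-1})}\right|
  = r^{\,n-1}\prod_{k=2}^{n-1}\sin^{k-1}\theta_k .
```

For n = 3 this reduces to ordinary spherical coordinates with Jacobian $r^2\sin\theta_2$. In this convention the angle measured from the $x_n$ axis is $\theta_{n-1}$, and it enters the Jacobian through the factor $\sin^{n-2}\theta_{n-1}$.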

Integrating this expression over the domains of the variables yields the well-known formula for the "volume" of a hypersphere of radius a in n dimensions [10; 11, p. 594; 13; 16]:

$$V_n = \frac{\pi^{n/2}}{\Gamma\!\left(\frac{n}{2} + 1\right)}\,a^n .$$
The element of surface area, dSn, of a hypersphere of radius a is given by the formula:

The total surface area is easily computed from the formula for the volume.

$$S_n = \frac{dV_n}{da} = \frac{2\pi^{n/2}}{\Gamma\!\left(\frac{n}{2}\right)}\,a^{n-1} .$$

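Orient the coordinate system so that the first "marked" point has coordinates $x_n = -a$ and $x_j = 0$ for j < n.

The Euclidean distance L between this point and a second point on the hypersphere is then given by

$$L^2 = \sum_{j=1}^{n-1} x_j^2 + (x_n + a)^2 = 2a^2 + 2a\,x_n = 2a^2(1 + \cos\theta),$$

where θ is the polar angle of the second point measured from the positive $x_n$ axis, so that $x_n = a\cos\theta$. But this simplifies to the same result as on the circle, $L = 2a\cos(\theta/2)$, with the difference that since $0 \le \theta \le \pi$, θ is now a function of L. Just as in two dimensions, for dL to be positive, dθ must be negative. Also, as before, $\sin(\theta/2) = \sqrt{1 - L^2/(4a^2)}$.

The probability that the distance from the first point to the second point is between L and L + dL is the probability that the second point occupies that part of the surface area, dA, within the interval between θ + dθ and θ. This is just the ratio of dA to the total surface area of the hypersphere. In this ratio the integrals of all of the angle variables except θ cancel, so the probability reduces to the following:

$$P(L)\,dL = \frac{-\sin^{n-2}\theta\,d\theta}{\int_0^{\pi}\sin^{n-2}\theta\,d\theta} .$$

Here $\sin\theta = 2\sin(\theta/2)\cos(\theta/2) = (L/a)\sqrt{1 - L^2/(4a^2)}$ and $d\theta = -dL\,/\left(a\sqrt{1 - L^2/(4a^2)}\right)$. Thus, an explicit expression is obtained for the probability density function of L for any value of n.

$$P(L) = c(n)\,\frac{L^{n-2}}{a^{n-1}}\left[1 - \frac{L^2}{4a^2}\right]^{(n-3)/2}, \qquad 0 < L < 2a .$$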

In a definitive analysis published in 1954 R. D. Lord derived this same result [13]. In passing, we note that for a sphere in three dimensions the factor in brackets "magically" disappears and the probability distribution is linear. This has the "non-intuitive" result that in three dimensions, as in two, within a given tolerance the most probable value of L is close to the diameter. The results for a sphere in three dimensions were first brought to our attention in a study by Christopher and Baldwin related to protein structures [8]. However, other researchers [4, 5, 6, 7] had noted them even earlier. The special case of three dimensions was also analyzed by us in detail in a previous publication [12].
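A direct simulation offers an independent check on these statements. The sketch below is an illustration rather than the authors' method: it draws uniform points on the sphere by normalizing vectors of independent Gaussian variates (a standard technique), places the marked point at $x_n = -a$, and returns the sampled distances. The function name and seed are arbitrary.

```python
import math
import random


def sample_distances(n, trials, a=1.0, seed=2002):
    """Distances from the marked point (0, ..., 0, -a) to uniform random points on the sphere."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(trials):
        # A normalized Gaussian vector is uniformly distributed on the unit sphere.
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in g))
        point = [a * x / norm for x in g]
        squared = sum(x * x for x in point[:-1]) + (point[-1] + a) ** 2
        lengths.append(math.sqrt(squared))
    return lengths


if __name__ == "__main__":
    for n in (3, 5, 10):
        lengths = sample_distances(n, 40_000)
        second_moment = sum(L * L for L in lengths) / len(lengths)
        print(f"n={n:2d}  mean of L^2 = {second_moment:.4f}  (expected 2 for a = 1)")
```

For n = 3 the linear density $L/(2a^2)$ implies that a quarter of the sampled distances should fall below a, which the simulated fraction can be compared against.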
 
The coefficients c(n) are given as follows:

$$c(n) = \left[\int_0^{\pi}\sin^{n-2}\theta\,d\theta\right]^{-1}, \qquad c(2) = \frac{1}{\pi}, \quad c(3) = \frac{1}{2}, \quad c(n+2) = \frac{n}{n-1}\,c(n) .$$

From this recursion, one obtains the following:

For even n, $c(n) = \dfrac{(n-2)!!}{\pi\,(n-3)!!}$. For odd n, $c(n) = \dfrac{(n-2)!!}{2\,(n-3)!!}$. (Here $0!! = (-1)!! = 1$.)

For n > 3, setting the derivative equal to zero gives the result that the mode of L is at $L = 2a\sqrt{\dfrac{n-2}{2n-5}}$. This formula also gives the correct answer (the diameter) when n = 3. However, beyond three dimensions, it is no longer true that the diameter is the most probable value for L. As n gets large the most probable value of L approaches the "intuitive" answer of the distance to a point along the "equator", $\sqrt{2}\,a$. It is "obvious" from symmetry and L's dependence on θ that this equatorial value is the median value of L for any n > 1. This can be demonstrated explicitly by making the substitution $L = 2a\cos(\theta/2)$ in the integral of the probability distribution.
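The mode calculation can be written out in a few lines. This is a sketch that assumes the density has the form $P(L) \propto L^{n-2}\,[1 - L^2/(4a^2)]^{(n-3)/2}$ and works with its logarithm:

```latex
\frac{d}{dL}\ln P(L)
  = \frac{n-2}{L} - \frac{n-3}{2}\cdot\frac{L/(2a^2)}{1 - L^2/(4a^2)} = 0
\;\Longrightarrow\;
(n-2)\left(1 - \frac{L^2}{4a^2}\right) = (n-3)\,\frac{L^2}{4a^2}
\;\Longrightarrow\;
L^2 = 4a^2\,\frac{n-2}{2n-5}.
```

As n grows, the ratio (n - 2)/(2n - 5) decreases toward 1/2, so the mode approaches $\sqrt{2}\,a$ from above.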

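Calculating the first moment of the probability distribution of L gives the mean. This integral is a multiple of a Beta function, which by a well-known identity can be expressed as a ratio of Gamma functions [1, p. 258; 15, pp. 253-255].

$$\langle L\rangle = \int_0^{2a} L\,P(L)\,dL = 2^{\,n-1}a\,c(n)\,B\!\left(\tfrac{n-1}{2},\tfrac{n}{2}\right) = \frac{2^{\,n-1}\,\Gamma\!\left(\frac{n}{2}\right)^{2}}{\sqrt{\pi}\;\Gamma\!\left(n-\frac12\right)}\,a .$$

The second moment equals the median value squared for every value of n: $\langle L^2\rangle = 2a^2 = (\sqrt{2}\,a)^2$.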


The results for n = 2 through n = 10 are summarized in Table 1.

Table 1


n Vn Sn Mode of L < L >
2
3
4
5
6
7
8
9
10
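The quantities listed in the column headings of Table 1 can be recomputed from closed forms. The sketch below is a hedged illustration that assumes $V_n = \pi^{n/2}a^n/\Gamma(n/2+1)$, $S_n = 2\pi^{n/2}a^{n-1}/\Gamma(n/2)$, the mode $2a\sqrt{(n-2)/(2n-5)}$ for n > 3 (and the diameter for n = 2 and n = 3), and the mean $2^{n-1}\Gamma(n/2)^2 a/(\sqrt{\pi}\,\Gamma(n-1/2))$. The function names are illustrative, and the printed decimals are not claimed to match the layout of the original table.

```python
import math


def volume(n, a=1.0):
    """Volume of an n-dimensional ball of radius a."""
    return math.pi ** (n / 2) * a ** n / math.gamma(n / 2 + 1)


def surface_area(n, a=1.0):
    """Surface area of the sphere bounding the n-dimensional ball of radius a."""
    return 2.0 * math.pi ** (n / 2) * a ** (n - 1) / math.gamma(n / 2)


def mode(n, a=1.0):
    """Most probable distance; the diameter for n = 2 (singular density) and n = 3."""
    if n <= 3:
        return 2.0 * a
    return 2.0 * a * math.sqrt((n - 2) / (2 * n - 5))


def mean(n, a=1.0):
    """Mean distance, evaluated through log-gamma to avoid overflow for large n."""
    log_mean = ((n - 1) * math.log(2.0) + 2.0 * math.lgamma(n / 2)
                - 0.5 * math.log(math.pi) - math.lgamma(n - 0.5))
    return a * math.exp(log_mean)


if __name__ == "__main__":
    print(f"{'n':>2} {'V_n':>10} {'S_n':>10} {'mode':>10} {'mean':>10}")
    for n in range(2, 11):
        print(f"{n:>2} {volume(n):>10.6f} {surface_area(n):>10.6f} "
              f"{mode(n):>10.6f} {mean(n):>10.6f}")
```

Values are printed for a = 1; volumes scale as a^n, areas as a^(n-1), and the mode and mean scale linearly with a.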

 

An Asymptotic Analysis of the L Distribution for Large n

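The well-known asymptotic formula for the gamma function is [1, p. 257; 15, p. 253; 17]

$$\ln\Gamma(z) \sim \left(z - \tfrac12\right)\ln z - z + \tfrac12\ln(2\pi) + \frac{1}{12z} - \frac{1}{360z^3} + \cdots$$

This equation provides a convenient starting point for a systematic analysis of the probability distribution of L for large n. It is applied once with $z = n/2$ and once with $z = (n-1)/2$ to expand $\ln\Gamma(n/2)$ and $\ln\Gamma((n-1)/2)$. A long, but straightforward calculation that uses the binomial series and the Maclaurin series for ln(1 + x) and exp(x) results in the following expansion.

$$\ln\frac{\Gamma\!\left(\frac{n}{2}\right)}{\Gamma\!\left(\frac{n-1}{2}\right)} = \frac12\ln\frac{n}{2} - \frac{3}{4n} - \frac{1}{2n^2} - \frac{3}{8n^3} + O\!\left(n^{-4}\right).$$

From which it follows that

$$\frac{\Gamma\!\left(\frac{n}{2}\right)}{\Gamma\!\left(\frac{n-1}{2}\right)} = \sqrt{\frac{n}{2}}\left[1 - \frac{3}{4n} - \frac{7}{32n^2} - \frac{9}{128n^3} + O\!\left(n^{-4}\right)\right].$$

Since $c(n) = \Gamma\!\left(\frac{n}{2}\right)\Big/\left[\sqrt{\pi}\,\Gamma\!\left(\frac{n-1}{2}\right)\right]$, an asymptotic formula for c(n) is obtained.

$$c(n) = \sqrt{\frac{n}{2\pi}}\left[1 - \frac{3}{4n} - \frac{7}{32n^2} - \frac{9}{128n^3} + O\!\left(n^{-4}\right)\right].$$

An asymptotic expansion for the mean follows from $\langle L\rangle = 2^{\,n-1}\,\Gamma\!\left(\frac{n}{2}\right)^{2} a \Big/ \left[\sqrt{\pi}\,\Gamma\!\left(n - \frac12\right)\right]$.

$$\langle L\rangle = \sqrt{2}\,a\left[1 - \frac{1}{8n} - \frac{15}{128n^2} - \frac{75}{1024n^3} + O\!\left(n^{-4}\right)\right].$$

The variance of L is computed as $\sigma^2 = \langle L^2\rangle - \langle L\rangle^2 = 2a^2 - \langle L\rangle^2$. This has the following asymptotic form.

$$\sigma^2 = \frac{a^2}{2n}\left[1 + \frac{7}{8n} + \frac{15}{32n^2} + O\!\left(n^{-3}\right)\right].$$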

The accuracy of these asymptotic expressions is demonstrated to ten decimal places in Table 2.

Table 2 (a = 1; "asymptotic" columns use the truncated large-n expansions)


n     < L > exact     < L > asymptotic   variance exact   variance asymptotic
10    1.3947761299    1.3947750312       0.0545995476     0.0546093750
15    1.4016613571    1.4016611894       0.0353454399     0.0353472222
20    1.4049475047    1.4049474597       0.0261225089     0.0261230469
25    1.4068707168    1.4068707004       0.0207147861     0.0207150000
50    1.4106109093    1.4106109086       0.0101768625     0.0101768750
75    1.4118268316    1.4118268315       0.0067449976     0.0067450000
100   1.4124291190    1.4124291190       0.0050439836     0.0050439844
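The entries of Table 2 can be regenerated with a few lines of code. The sketch below rests on two assumptions: that the exact mean is $2^{n-1}\Gamma(n/2)^2 a/(\sqrt{\pi}\,\Gamma(n-1/2))$ with variance $2a^2 - \langle L\rangle^2$, and that the asymptotic columns are the truncations $\sqrt{2}\,a\,[1 - 1/(8n) - 15/(128n^2) - 75/(1024n^3)]$ and $(a^2/2n)[1 + 7/(8n) + 15/(32n^2)]$. These truncations are our reading, chosen because they reproduce the tabulated digits. The function names are illustrative.

```python
import math


def mean_exact(n, a=1.0):
    """Exact mean distance, computed through log-gamma for numerical stability."""
    log_mean = ((n - 1) * math.log(2.0) + 2.0 * math.lgamma(n / 2)
                - 0.5 * math.log(math.pi) - math.lgamma(n - 0.5))
    return a * math.exp(log_mean)


def variance_exact(n, a=1.0):
    """Exact variance, using the second moment 2 a^2."""
    return 2.0 * a * a - mean_exact(n, a) ** 2


def mean_asymptotic(n, a=1.0):
    """Assumed three-term large-n truncation of the mean."""
    return math.sqrt(2.0) * a * (1.0 - 1.0 / (8 * n) - 15.0 / (128 * n ** 2)
                                 - 75.0 / (1024 * n ** 3))


def variance_asymptotic(n, a=1.0):
    """Assumed three-term large-n truncation of the variance."""
    return a * a / (2 * n) * (1.0 + 7.0 / (8 * n) + 15.0 / (32 * n ** 2))


if __name__ == "__main__":
    for n in (10, 15, 20, 25, 50, 75, 100):
        print(f"{n:>3} {mean_exact(n):.10f} {mean_asymptotic(n):.10f} "
              f"{variance_exact(n):.10f} {variance_asymptotic(n):.10f}")
```

The gap between the exact and truncated values shrinks roughly like n^-4, in line with the order of the first neglected term.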

As n increases the distribution of L narrows and becomes more symmetric with the mode converging to the median from above and the mean converging to the median from below. For large n the probability density function of L,

$$P(L) = c(n)\,\frac{L^{n-2}}{a^{n-1}}\left[1 - \frac{L^2}{4a^2}\right]^{(n-3)/2},$$

rapidly approaches zero for L < a and L near 2a. Most of the probability is therefore concentrated near the mode of L, which for large n approaches the median value. To investigate the large n behavior of the probability density function in the neighborhood of the median, we define a new variable z as

$$z = \frac{\sqrt{2n}}{a}\left(L - \sqrt{2}\,a\right).$$

The fundamental transformation equation for a probability distribution then implies the following result.

$$P(z) = P(L)\,\frac{dL}{dz} = \frac{c(n)}{\sqrt{n}}\left[1 + \frac{z}{2\sqrt{n}}\right]^{n-2}\left[1 - \frac{z}{\sqrt{n}} - \frac{z^2}{4n}\right]^{(n-3)/2}.$$

From the Maclaurin series of the logarithm and the asymptotic form of c(n), this results in the following expression.

$$\ln P(z) = -\frac12\ln(2\pi) - \frac{z^2}{2} + O\!\left(n^{-1/2}\right).$$

So, $\lim_{n\to\infty} P(z) = \dfrac{1}{\sqrt{2\pi}}\,e^{-z^2/2}$. Thus, for large n the probability distribution of L approaches a normal distribution with a mean of $\sqrt{2}\,a$ and a standard deviation of $a/\sqrt{2n}$. This result seems to have been first established for two points chosen at random within the hypersphere by Hammersley [10] and then presented in a generalized form by Lord, which included two points chosen at random on the hypersphere [13].
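How quickly the normal limit is reached can be explored numerically. The sketch below is illustrative only. It assumes the density $c(n)\,L^{n-2}a^{-(n-1)}[1 - L^2/(4a^2)]^{(n-3)/2}$ and compares it with a normal density of mean $\sqrt{2}\,a$ and standard deviation $a/\sqrt{2n}$ over a window of two standard deviations on either side of the mean. The function names and the choice of window are assumptions, and the code is meant for n of about 10 or more so that the window stays inside (0, 2a).

```python
import math


def log_pdf(L, n, a=1.0):
    """Logarithm of the assumed exact density; valid for 0 < L < 2a."""
    log_c = math.lgamma(n / 2) - math.lgamma((n - 1) / 2) - 0.5 * math.log(math.pi)
    return (log_c + (n - 2) * math.log(L) - (n - 1) * math.log(a)
            + 0.5 * (n - 3) * math.log(1.0 - (L / (2.0 * a)) ** 2))


def normal_pdf(L, n, a=1.0):
    """Normal density with mean sqrt(2) a and standard deviation a / sqrt(2n)."""
    mu = math.sqrt(2.0) * a
    sigma = a / math.sqrt(2.0 * n)
    return math.exp(-0.5 * ((L - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def max_relative_gap(n, a=1.0, width=2.0, points=81):
    """Largest relative difference between the two densities for |z| <= width."""
    mu = math.sqrt(2.0) * a
    sigma = a / math.sqrt(2.0 * n)
    worst = 0.0
    for i in range(points):
        z = -width + 2.0 * width * i / (points - 1)
        L = mu + z * sigma
        exact = math.exp(log_pdf(L, n, a))
        approx = normal_pdf(L, n, a)
        worst = max(worst, abs(exact - approx) / approx)
    return worst


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(f"n={n:>5}  max relative gap = {max_relative_gap(n):.4f}")
```

The gap shrinks roughly like the reciprocal of the square root of n, consistent with the size of the leading correction to the logarithm of the density.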
 

Conclusion:

Intuitively the median value, $\sqrt{2}\,a$, would seem to be the obvious answer to the question, "What is the most probable straight line distance between a fixed point and a second point picked at random from the surface of a sphere?" The rather humorous, if not surprising, conclusion of this analysis is that it is only in the limit of large dimensions that our intuition is realized! Not only that, but for large dimensions, $\sqrt{2}\,a$ is "almost the only" answer to "What is the straight line distance?".
 
 

Acknowledgements:

The reference to the work of R. D. Lord was kindly communicated to us by Mario N. Berberan-Santos of the Centro de Quimica-Fisica Molecular, Instituto Superior Tecnico in Lisbon, Portugal. More than a decade before our work on the problem he derived the same results, only to discover that R. D. Lord had published a solution. One of us (A. Lehnen) would like to thank a former student, Elizabeth Nack, whose solution of the volume of a hypersphere as a bonus problem in my Calculus III class inspired me to extend the three dimensional case to n dimensions. We would also like to thank Professor David Griffeath of the University of Wisconsin-Madison Mathematics Department for helpful suggestions and Richard Parris, a teacher at Phillips Exeter Academy in Exeter, New Hampshire, for making his wonderful WinPlot program freely available. This graphing utility was used to generate all the figures displayed in this article. The newest version of WinPlot can be downloaded from Parris's web site at http://math.exeter.edu/rparris/winplot.html .

 
 

References:

1. M. Abramowitz and I. Stegun, Handbook of Mathematical Functions, Dover, 1964.

2. T. Apostol, Calculus Vol II, Blaisdell Publishing, 1962.
3. A. Bogomolny, "Bertrand's Paradox." http://www.cut-the-knot.com/bertrand.html.
4. M. Berberan-Santos and M.J.E. Prieto, "Energy transfer in spherical geometry. Application to micelles", Journal of the Chemical Society, Faraday Transactions II 83, 1391, (1987).
5. M.N. Berberan-Santos, M.J.E. Prieto, A.G. Szabo, "Picosecond electronic energy-transfer studies in sodium dodecyl sulfate micelles", Journal of the Chemical Society, Faraday Transactions 88 255, (1992).
6. M. Berberan-Santos, "On the distribution of the nearest neighbor", American Journal of Physics 54, 1139 (1986).
7. M. Berberan-Santos, "Distribution of neighbors other than the nearest", American Journal of Physics 55, 952 (1987).
8. J. A. Christopher and T. O. Baldwin, "Implications of N- and C-Terminal Proximity for Protein Folding", Journal of Molecular Biology 257, 175-187, (1996).
9. J. Freund, Mathematical Statistics, Fifth Edition, Prentice-Hall, 1992.
10. J.M. Hammersley, "The Distribution of Distance in a Hypersphere", Ann. Math. Stat. 21, 447-452, (1950).
11. S. Hassani, Mathematical Physics: A Modern Introduction to its Foundations, Springer, 1999.
12. A. Lehnen and G. Wesenberg, "The Sphere Game", The AMATYC Review 25, 25 (2003).
13. R.D. Lord, "The Distribution of Distance in a Hypersphere", Ann. Math. Stat. 25, 794-798, (1954).
14. S. Ross, A First Course in Probability, Third Edition, Macmillan, 1988.
15. E. T. Whittaker and G. N. Watson, A Course of Modern Analysis, Cambridge University Press, 1969.
16. Wolfram Research: The hypersphere entry at the website http://mathworld.wolfram.com/Hypersphere.html
17. Wolfram Research: The Stirling's Series entry at the website http://mathworld.wolfram.com/StirlingsSeries.html