The Sphere Game in n Dimensions
Document first posted 3/05/2002
Document last revised 5/15/2006
Al Lehnen
Mathematics Department
Madison Area Technical College
3550 Anderson Street
Madison, WI 53704
(608) 246-6567
my.execpc.com/~aplehnen/al

Gary E. Wesenberg
Department of Biochemistry
University of Wisconsin
6606B Biochemistry
433 Babcock Drive
Madison, WI 53706
(608) 263-5923
Abstract:
This work reports a solution to the following problem: "To within a fixed tolerance, what is the most probable straight line distance between a fixed point and a second point picked at random from the surface of a sphere?" The surprising result is that in two and three dimensions the most probable distances are near the diameter! In more than three dimensions, this most probable distance is less than the diameter and begins to approach the more intuitive "equatorial value", the square root of two times the sphere's radius. The analysis presented solves the problem for a sphere in any dimension greater than one. In particular, as the dimension gets large the probability distribution approaches a normal distribution with a mean of the equatorial value and a standard deviation which goes to zero like the reciprocal of the square root of the dimension.
A Summary of the Results
Imagine fixing a point on the surface of an n-dimensional sphere of radius a (it is assumed that n > 1; the surface is a "hypersphere" if n > 3) and then randomly picking a second point on the same surface. Let L designate the Euclidean distance between these two points. What is the nature of the probability distribution of L? A related problem, the distribution of the distance between two random points picked within the hypersphere, seems to have been first proposed by Robert Deltheil and can be found in his 1926 text, Probabilités géométriques, Traité du calcul des probabilités et de ses applications, Tome II, Fascicule II. This problem was revisited in greater detail by J. M. Hammersley in 1950. In 1954 R. D. Lord presented an elegant solution using a method of G. N. Watson which employed Bessel functions. Lord's analysis generalized Hammersley's results and also gave the explicit solution if the points are constrained to lie on the hypersphere's surface.
The purpose of this report is to summarize our solution of the problem, which, though not as general as Lord's, might be more accessible due to its use of polyspherical coordinates and the resulting elementary methods employed. The principal results are stated below.
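1. The probability density of L is
$$p(L) = c(n)\,\frac{L^{n-2}}{a^{n-1}}\left[1-\frac{L^2}{4a^2}\right]^{(n-3)/2}, \qquad 0 < L < 2a,$$
where the coefficients c(n) are given as follows:
For even n, $c(n) = \dfrac{(n-2)!!}{\pi\,(n-3)!!}$.
For odd n, $c(n) = \dfrac{(n-2)!!}{2\,(n-3)!!}$, with the conventions $0!! = (-1)!! = 1$.
Both these results can be represented by the single formula
$$c(n) = \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n-1}{2}\right)}.$$
Asymptotically for large n,
$$c(n) \sim \sqrt{\frac{n}{2\pi}}\left(1 - \frac{3}{4n} - \frac{7}{32n^2} + O(n^{-3})\right).$$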
2. The distribution is linear for a sphere in three dimensions: $p(L) = L/(2a^2)$.
3. In two and three dimensions values of L close to the diameter are most likely.
4. In three or more dimensions the mode of the L distribution is given by
$$L_{\mathrm{mode}} = 2a\sqrt{\frac{n-2}{2n-5}}.$$
5. For all values of n the median value of L is the "equatorial value" $\sqrt{2}\,a$.
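6. The average value of L, $\langle L\rangle$, is given by the following formula:
$$\langle L\rangle = \frac{2^{n-1}\,\Gamma\left(\frac{n}{2}\right)^2}{\sqrt{\pi}\,\Gamma\left(n-\frac{1}{2}\right)}\,a.$$
For integer n this can also be evaluated in closed form:
For even n, $\langle L\rangle = \dfrac{2^{2n-2}\left[\left(\frac{n}{2}-1\right)!\right]^2}{\pi\,(2n-3)!!}\,a$.
For odd n, $\langle L\rangle = \dfrac{2^{n-1}\left[(n-2)!!\right]^2}{(2n-3)!!}\,a$.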
7. The second moment of L is $\langle L^2\rangle = 2a^2$.
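8. Asymptotically for large n,
$$\langle L\rangle \sim \sqrt{2}\,a\left(1 - \frac{1}{8n} - \frac{15}{128n^2} - \frac{75}{1024n^3} + O(n^{-4})\right),$$
$$\sigma^2 = \langle L^2\rangle - \langle L\rangle^2 \sim a^2\left(\frac{1}{2n} + \frac{7}{16n^2} + \frac{15}{64n^3} + O(n^{-4})\right).$$
9. Asymptotically for large n, the probability distribution for L approaches a normal distribution with a mean of $\sqrt{2}\,a$ and a standard deviation of $a/\sqrt{2n}$.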
Graphs of the distribution for different values of n are shown
in Figures 1 and 2. In the formula for the probability density function
of L the c(n) are normalization factors. From their
representation as a ratio of Gamma functions, the c(n) exist
for any n > 1. Thus, for L in the domain (0, 2a) the
probability distribution can be extended to any real "dimension" greater
than one. Different views of a three-dimensional perspective plot of the probability density as a function of L and n are shown in Figures 3 through 6.
[Figure 1] [Figure 2]
[Figure 3] [Figure 4]
[Figure 5] [Figure 6]
Details
The n = 2 Case: A Circle
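Surprisingly, the most singular case is for two dimensions. Let the position of the first point on the circle have coordinates (-a, 0). Using polar coordinates, the second point, picked at random anywhere on the circumference of the circle, is (a cos θ, a sin θ) with -π < θ ≤ π, and the length L between the two points is a function of the angle θ. Explicitly, we have
$$L = \sqrt{(a\cos\theta + a)^2 + (a\sin\theta)^2} = a\sqrt{2(1+\cos\theta)} = 2a\cos\frac{\theta}{2}.$$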
However, θ is not a function of L, since there are two possible angles, θ and -θ, for each non-zero L value less than the diameter. Imagine generating L values by letting θ range from π to 0. This is illustrated in Figure 7. The probability that a value for the length would occur between L and L + dL would then be given by the following:
$$p(L)\,dL = \frac{2\,(-d\theta)}{2\pi}.$$
The factor of two in the numerator is a consequence of the two different angles for each value of L. The minus sign is needed because the change in θ is negative for an increase in L.
Figure 7
This probability can be expressed in terms of L by using the chain rule:
$$p(L)\,dL = -\frac{1}{\pi}\,\frac{d\theta}{dL}\,dL.$$
This is entirely equivalent to using the fundamental transformation equation for a probability distribution [2, p. 137; 9, pp. 265-268; 14, p. 186].
For θ in the range from π to 0, it follows from elementary trigonometry that
$$\frac{dL}{d\theta} = -a\sin\frac{\theta}{2} = -a\sqrt{1-\frac{L^2}{4a^2}}.$$
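Substituting this result gives the probability density as an explicit function of L,
$$p(L) = \frac{1}{\pi a\sqrt{1-\frac{L^2}{4a^2}}} = \frac{2}{\pi\sqrt{4a^2-L^2}}, \qquad 0 \le L < 2a.$$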
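Although the probability density function is singular at an L value equal to the circle's diameter, the function is still in $L^1(0, 2a)$, with an integrated probability of one:
$$\int_0^{2a}\frac{2\,dL}{\pi\sqrt{4a^2-L^2}} = \frac{2}{\pi}\arcsin\frac{L}{2a}\bigg|_0^{2a} = 1.$$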
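The n = 2 result can be checked numerically. The sketch below is a minimal illustration, not the authors' method: it assumes the second point is chosen by drawing θ uniformly from (-π, π], computes L directly from the coordinates, and compares the simulated probabilities of the top and bottom tenths of (0, 2a) with the cumulative distribution $F(L) = \frac{2}{\pi}\arcsin\frac{L}{2a}$ obtained by integrating $p(L) = 2/(\pi\sqrt{4a^2-L^2})$. The function names are hypothetical.

```python
import math
import random

def circle_pdf(L, a=1.0):
    """Density p(L) = 2 / (pi sqrt(4a^2 - L^2)) of the chord length on a circle of radius a."""
    return 2.0 / (math.pi * math.sqrt(4.0 * a * a - L * L))

def circle_cdf(L, a=1.0):
    """P(chord length <= L) = (2/pi) arcsin(L / 2a), the integral of circle_pdf from 0 to L."""
    return (2.0 / math.pi) * math.asin(L / (2.0 * a))

def sample_chord(rng, a=1.0):
    """Chord length from the fixed point (-a, 0) to a point at a uniformly random angle."""
    theta = rng.uniform(-math.pi, math.pi)
    x, y = a * math.cos(theta), a * math.sin(theta)
    return math.hypot(x + a, y)

rng = random.Random(2002)
samples = [sample_chord(rng) for _ in range(200_000)]

# Compare the top tenth (1.8a, 2a) with the bottom tenth (0, 0.2a) of the range, with a = 1.
top = sum(1 for L in samples if L > 1.8) / len(samples)
bottom = sum(1 for L in samples if L < 0.2) / len(samples)
print(f"P(L > 1.8a): simulated {top:.4f}, exact {1.0 - circle_cdf(1.8):.4f}")
print(f"P(L < 0.2a): simulated {bottom:.4f}, exact {circle_cdf(0.2):.4f}")
```

With a = 1 the exact values are about 0.287 and 0.064, so the tenth of the range next to the diameter is roughly four and a half times as likely as the tenth next to the fixed point.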
Surprisingly, the probability density increases with increasing length, so that the most likely values of L are near the diameter of the circle. Stated more precisely, for any non-zero tolerance ε smaller than a, the probability that L lies between Q - ε and Q + ε is a maximum for Q equal to 2a - ε.
Bertrand's Paradox
Inscribe an equilateral triangle of side S in the circle as shown in Figure 8. It then follows that $S = \sqrt{3}\,a$.
Let O be the origin and from (-a, 0) generate a chord of length L by choosing a second point at random from the circumference of the circle.
Figure 8
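The probability that L would exceed S is given by
$$P(L > S) = \int_{\sqrt{3}\,a}^{2a}\frac{2\,dL}{\pi\sqrt{4a^2-L^2}} = \frac{2}{\pi}\left(\frac{\pi}{2} - \frac{\pi}{3}\right) = \frac{1}{3}.$$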
This is one of the well-known "solutions" to Bertrand's Paradox [14, pp. 161-162; 3]. The probability of a "random" chord is ambiguous unless the method for generating it is specified. In this analysis the length L is generated by randomly selecting an element of "area" on the surface of the hypersphere. One could imagine that after marking the first point, the hypersphere is placed in an n-dimensional box which is then "shaken". When the hypersphere stops moving, the Euclidean distance between the marked point and the point of tangency of the hypersphere to the hyperplane of the floor of the box is measured.
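The "shaking" experiment can be imitated by Monte Carlo sampling. The sketch below assumes the standard normalized-Gaussian construction for a uniform random point on a sphere in n dimensions (a technique chosen for illustration, not taken from this article) and places the marked point at $x_n = -a$; for n = 2 this is only a rotation of the circle used above, so Bertrand's probability 1/3 should reappear. It also checks the second moment $\langle L^2\rangle = 2a^2$ for n = 5. Function names are hypothetical.

```python
import math
import random

def random_point_on_sphere(rng, n, a=1.0):
    """Uniform random point on the radius-a sphere in R^n.

    A vector of n independent standard normals, rescaled to length a,
    is uniformly distributed on the sphere (rotational symmetry of the Gaussian).
    """
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:  # guard against the (practically impossible) zero vector
            return [a * x / norm for x in v]

def sample_distance(rng, n, a=1.0):
    """Euclidean distance from the marked point (0, ..., 0, -a) to a random surface point."""
    p = random_point_on_sphere(rng, n, a)
    p[-1] += a  # subtracting the marked point only changes the last coordinate
    return math.sqrt(sum(x * x for x in p))

rng = random.Random(1926)

# Bertrand's chord on the circle (n = 2, a = 1): P(L > S) with S = sqrt(3) a
trials = 100_000
side = math.sqrt(3.0)
hits = sum(1 for _ in range(trials) if sample_distance(rng, 2) > side)
print(f"n = 2: P(L > S) ~ {hits / trials:.4f}  (exact 1/3)")

# The second moment <L^2> = 2a^2 holds in every dimension; check n = 5
samples = 20_000
mean_sq = sum(sample_distance(rng, 5) ** 2 for _ in range(samples)) / samples
print(f"n = 5: <L^2> ~ {mean_sq:.3f}  (exact 2)")
```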
The n > 2 Case: A Hypersphere in n Dimensions
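To treat the problem in n dimensions we will use the following generalization of spherical coordinates:
$$\begin{aligned}
x_1 &= r\cos\theta_1\sin\theta_2\sin\theta_3\cdots\sin\theta_{n-1},\\
x_2 &= r\sin\theta_1\sin\theta_2\sin\theta_3\cdots\sin\theta_{n-1},\\
x_k &= r\cos\theta_{k-1}\sin\theta_k\cdots\sin\theta_{n-1}, \qquad 3 \le k \le n-1,\\
x_n &= r\cos\theta_{n-1},
\end{aligned}$$
with the domain $0 \le r < \infty$, $0 \le \theta_1 < 2\pi$, and $0 \le \theta_j \le \pi$ for $2 \le j \le n-1$.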
From these definitions it follows that $x_1^2 + x_2^2 + \cdots + x_n^2 = r^2$.
The element of volume in these coordinates requires calculating the following Jacobian:
$$J_n = \frac{\partial(x_1, x_2, \ldots, x_n)}{\partial(r, \theta_1, \ldots, \theta_{n-1})}.$$
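Using an induction argument it is shown in Appendix A that
$$|J_n| = r^{n-1}\sin\theta_2\,\sin^2\theta_3\cdots\sin^{n-2}\theta_{n-1} = r^{n-1}\prod_{k=2}^{n-1}\sin^{k-1}\theta_k.$$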
Integrating this expression over the domains of the variables yields the well-known formula for the "volume" of a hypersphere of radius a in n dimensions [10; 11, p. 594; 13; 16]:
$$V_n(a) = \frac{\pi^{n/2}\,a^n}{\Gamma\left(\frac{n}{2}+1\right)}.$$
The element of surface area, $dS_n$, of a hypersphere of radius a is given by the formula
$$dS_n = a^{n-1}\prod_{k=2}^{n-1}\sin^{k-1}\theta_k\;d\theta_1\,d\theta_2\cdots d\theta_{n-1}.$$
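The total surface area is easily computed from the formula for the volume:
$$S_n(a) = \frac{dV_n}{da} = \frac{n\,\pi^{n/2}a^{n-1}}{\Gamma\left(\frac{n}{2}+1\right)} = \frac{2\pi^{n/2}a^{n-1}}{\Gamma\left(\frac{n}{2}\right)}.$$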
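As a small numerical companion to these formulas (a sketch with illustrative function names), the code below evaluates $V_n(a) = \pi^{n/2}a^n/\Gamma(n/2+1)$ with `math.gamma` and obtains $S_n$ as $dV_n/da = nV_n/a$; for n = 2 through 10 and a = 1 these are the values listed symbolically in Table 1.

```python
import math

def hypersphere_volume(n, a=1.0):
    """V_n(a) = pi^(n/2) a^n / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) * a ** n / math.gamma(n / 2 + 1)

def hypersphere_surface(n, a=1.0):
    """S_n(a) = dV_n/da = n V_n(a) / a."""
    return n * hypersphere_volume(n, a) / a

for n in range(2, 11):
    print(f"n = {n:2d}: V_n = {hypersphere_volume(n):.6f} a^n, "
          f"S_n = {hypersphere_surface(n):.6f} a^(n-1)")
```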
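Orient the coordinate system so that the first "marked" point has coordinates $x_n = -a$ and $x_j = 0$ for $j < n$. The Euclidean distance L between this point and a second point on the hypersphere is then given by
$$L = \sqrt{x_1^2 + \cdots + x_{n-1}^2 + (x_n + a)^2} = \sqrt{2a^2 + 2a\,x_n}.$$
But this simplifies to the same result as on the circle!
$$L = a\sqrt{2(1+\cos\theta_{n-1})} = 2a\cos\frac{\theta_{n-1}}{2},$$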
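with the difference that, since $\theta_{n-1}$ ranges only from 0 to π, $\theta_{n-1}$ is now a function of L. Just as in two dimensions, for dL to be positive, $d\theta_{n-1}$ must be negative. Also, as before,
$$\frac{dL}{d\theta_{n-1}} = -a\sin\frac{\theta_{n-1}}{2} = -a\sqrt{1-\frac{L^2}{4a^2}}.$$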
The probability that the distance from the first point to the second point is between L and L + dL is the probability that the second point occupies that part of the surface area, dA, with $\theta_{n-1}$ in the interval $[\theta_{n-1} + d\theta_{n-1},\ \theta_{n-1}]$.
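This is just the ratio of dA to the total surface area of the hypersphere. In this ratio the integrals of all of the angle variables except $\theta_{n-1}$ cancel, so the probability reduces to the following:
$$p(L)\,dL = \frac{dA}{S_n} = \frac{-\sin^{n-2}\theta_{n-1}\,d\theta_{n-1}}{\displaystyle\int_0^{\pi}\sin^{n-2}\theta\,d\theta}.$$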
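Thus, since $\sin\theta_{n-1} = 2\sin\frac{\theta_{n-1}}{2}\cos\frac{\theta_{n-1}}{2} = \frac{L}{a}\sqrt{1-\frac{L^2}{4a^2}}$, an explicit expression is obtained for the probability density function of L for any value of n:
$$p(L) = c(n)\,\frac{L^{n-2}}{a^{n-1}}\left[1-\frac{L^2}{4a^2}\right]^{(n-3)/2}.$$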
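The density can be coded directly. The sketch below (illustrative names) computes c(n) from the Gamma-function form $c(n) = \Gamma(n/2)/(\sqrt{\pi}\,\Gamma((n-1)/2))$ via `math.lgamma`, which equals $1/\int_0^\pi \sin^{n-2}\theta\,d\theta$, and checks numerically that $p(L) = c(n)\,L^{n-2}a^{1-n}[1-L^2/(4a^2)]^{(n-3)/2}$ integrates to one over (0, 2a), including for the non-integer "dimension" n = 2.5. A midpoint rule is used because it never evaluates the integrand at L = 2a, where the density is singular for n < 3.

```python
import math

def c(n):
    """c(n) = Gamma(n/2) / (sqrt(pi) Gamma((n-1)/2)); defined for any real n > 1."""
    return math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2)) / math.sqrt(math.pi)

def pdf(L, n, a=1.0):
    """p(L) = c(n) L^(n-2) / a^(n-1) * [1 - L^2/(4a^2)]^((n-3)/2) for 0 < L < 2a."""
    return c(n) * L ** (n - 2) / a ** (n - 1) * (1.0 - L * L / (4.0 * a * a)) ** ((n - 3) / 2)

def midpoint_integral(f, lo, hi, steps=20_000):
    """Composite midpoint rule; never evaluates f at the endpoints, where p may be singular."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

for n in (2.5, 3, 4, 5, 10):
    total = midpoint_integral(lambda L: pdf(L, n), 0.0, 2.0)
    print(f"n = {n}: total probability = {total:.6f}")
```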
In a definitive analysis published in 1954, R. D. Lord derived this same result [13]. In passing, we note that for a sphere in three dimensions the factor in brackets "magically" disappears and the probability distribution is linear. This has the "non-intuitive" result that in three dimensions, as in two, within a given tolerance the most probable value of L is close to the diameter. The results for a sphere in three dimensions were first brought to our attention in a study by Christopher and Baldwin related to protein structures [8]. However, other researchers [4, 5, 6, 7] had noted them even earlier. The special case of three dimensions was also analyzed by us in detail in a previous publication [12].
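The coefficients c(n) are given as follows:
$$c(n) = \left[\int_0^{\pi}\sin^{n-2}\theta\,d\theta\right]^{-1}, \qquad c(2) = \frac{1}{\pi}, \quad c(3) = \frac{1}{2}, \quad c(n+2) = \frac{n}{n-1}\,c(n),$$
where the recursion follows from integrating by parts. From this recursion, one obtains the following (empty products are equal to 1):
For even n, $c(n) = \dfrac{(n-2)!!}{\pi\,(n-3)!!} = \dfrac{1}{\pi}\cdot\dfrac{2\cdot 4\cdots(n-2)}{1\cdot 3\cdots(n-3)}$.
For odd n, $c(n) = \dfrac{(n-2)!!}{2\,(n-3)!!} = \dfrac{1}{2}\cdot\dfrac{3\cdot 5\cdots(n-2)}{2\cdot 4\cdots(n-3)}$.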
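The recursion $c(2) = 1/\pi$, $c(3) = 1/2$, $c(n+2) = n\,c(n)/(n-1)$ and the two double-factorial closed forms can be cross-checked against the Gamma-function expression $c(n) = \Gamma(n/2)/(\sqrt{\pi}\,\Gamma((n-1)/2))$. A minimal sketch, with illustrative names and the conventions $0!! = (-1)!! = 1$:

```python
import math

def c_recursive(n):
    """c(n) for integer n >= 2 from c(2) = 1/pi, c(3) = 1/2 and c(m+2) = m c(m) / (m-1)."""
    m, value = (2, 1.0 / math.pi) if n % 2 == 0 else (3, 0.5)
    while m < n:
        value *= m / (m - 1)
        m += 2
    return value

def double_factorial(k):
    """k!! with the conventions 0!! = (-1)!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def c_closed(n):
    """(n-2)!! / (pi (n-3)!!) for even n and (n-2)!! / (2 (n-3)!!) for odd n."""
    base = math.pi if n % 2 == 0 else 2.0
    return double_factorial(n - 2) / (base * double_factorial(n - 3))

def c_gamma(n):
    """Gamma(n/2) / (sqrt(pi) Gamma((n-1)/2))."""
    return math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2)) / math.sqrt(math.pi)

for n in range(2, 11):
    print(f"n = {n:2d}: {c_recursive(n):.12f} {c_closed(n):.12f} {c_gamma(n):.12f}")
```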
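For n > 3, setting the derivative dp/dL equal to zero, or equivalently
$$\frac{d\ln p}{dL} = \frac{n-2}{L} - \frac{(n-3)\,L}{4a^2 - L^2} = 0,$$
gives the result that the mode of L is at
$$L_{\mathrm{mode}} = 2a\sqrt{\frac{n-2}{2n-5}}.$$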
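A numerical check of the mode formula, as a sketch: the code maximizes the logarithm of the unnormalized density, $(n-2)\ln L + \frac{n-3}{2}\ln(1-L^2/4a^2)$, with a golden-section search (a generic optimizer chosen for illustration; the log-density is concave for n ≥ 3, so the search is safe) and compares the result with $2a\sqrt{(n-2)/(2n-5)}$. Names are illustrative.

```python
import math

def log_density_unnormalized(L, n, a=1.0):
    """(n-2) ln L + ((n-3)/2) ln(1 - L^2/4a^2); the constant c(n) does not move the mode."""
    return (n - 2) * math.log(L) + 0.5 * (n - 3) * math.log(1.0 - L * L / (4.0 * a * a))

def mode_formula(n, a=1.0):
    """2a sqrt((n-2)/(2n-5)), valid for n >= 3."""
    return 2.0 * a * math.sqrt((n - 2) / (2 * n - 5))

def mode_numeric(n, a=1.0, tol=1e-10):
    """Golden-section search for the maximum of the concave log-density on (0, 2a)."""
    lo, hi = 1e-9 * a, 2.0 * a * (1.0 - 1e-9)  # stay strictly inside the open interval
    g = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, about 0.618
    while hi - lo > tol:
        x1 = hi - g * (hi - lo)
        x2 = lo + g * (hi - lo)
        if log_density_unnormalized(x1, n, a) < log_density_unnormalized(x2, n, a):
            lo = x1  # the maximum lies in [x1, hi]
        else:
            hi = x2  # the maximum lies in [lo, x2]
    return 0.5 * (lo + hi)

for n in (3, 4, 5, 10, 50):
    print(f"n = {n:2d}: numeric {mode_numeric(n):.8f}, formula {mode_formula(n):.8f}")
```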
This formula also gives the correct answer (the diameter) when n = 3. However, beyond three dimensions, the diameter is no longer the most probable value for L. As n gets large the most probable value of L approaches the "intuitive" answer of the distance to a point along the "equator", $\sqrt{2}\,a$.
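It is "obvious" from symmetry and L's dependence on $\theta_{n-1}$ that this equatorial value, which corresponds to $\theta_{n-1} = \pi/2$, is the median value of L for any n > 1. This can be demonstrated explicitly by making the substitution $L \to \sqrt{4a^2 - L^2}$ in the integral of the probability distribution: the substitution maps $(0, \sqrt{2}\,a)$ onto $(\sqrt{2}\,a, 2a)$ and leaves $p(L)\,dL$ unchanged, so each interval carries probability 1/2.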
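Both facts can be checked numerically in a short sketch (illustrative names). The code confirms that the substitution $L \to L' = \sqrt{4a^2-L^2}$ maps $p(L)\,dL$ into itself, i.e. $p(L) = p(L')\,L/L'$, and computes the distribution function by integrating in the angle, using $L = 2a\cos(\theta_{n-1}/2)$ and the density $c(n)\sin^{n-2}\theta_{n-1}$ of $\theta_{n-1}$ on $[0, \pi]$; $P(L \le \sqrt{2}\,a)$ should come out as 1/2.

```python
import math

def c(n):
    """c(n) = Gamma(n/2) / (sqrt(pi) Gamma((n-1)/2))."""
    return math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2)) / math.sqrt(math.pi)

def pdf(L, n, a=1.0):
    """p(L) = c(n) L^(n-2) / a^(n-1) * [1 - L^2/(4a^2)]^((n-3)/2) for 0 < L < 2a."""
    return c(n) * L ** (n - 2) / a ** (n - 1) * (1.0 - L * L / (4.0 * a * a)) ** ((n - 3) / 2)

def reflected_density(L, n, a=1.0):
    """p(L') |dL'/dL| for L' = sqrt(4a^2 - L^2); equals pdf(L, n, a) if p(L) dL is invariant."""
    L_prime = math.sqrt(4.0 * a * a - L * L)
    return pdf(L_prime, n, a) * L / L_prime

def cdf(L, n, a=1.0, steps=20_000):
    """P(distance <= L), integrating c(n) sin^(n-2)(theta) over theta in [theta_L, pi].

    L = 2a cos(theta/2) decreases as theta increases, so distances up to L correspond
    to angles from theta_L = 2 arccos(L / 2a) up to pi. Midpoint rule in theta.
    """
    theta_L = 2.0 * math.acos(L / (2.0 * a))
    h = (math.pi - theta_L) / steps
    return c(n) * h * sum(math.sin(theta_L + (k + 0.5) * h) ** (n - 2) for k in range(steps))

for n in (2, 2.5, 3, 4, 7, 10):
    print(f"n = {n}: P(L <= sqrt(2) a) = {cdf(math.sqrt(2.0), n):.6f}")
```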
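Calculating the first moment of the probability distribution of L gives the mean. This integral is a multiple of a Beta function, which by a well-known identity can be expressed as a ratio of Gamma functions [1, p. 258; 15, pp. 253-255]:
$$\langle L\rangle = \int_0^{2a} L\,p(L)\,dL = 2^{n-1}\,c(n)\,B\left(\frac{n-1}{2},\frac{n}{2}\right)a = \frac{2^{n-1}\,\Gamma\left(\frac{n}{2}\right)^2}{\sqrt{\pi}\,\Gamma\left(n-\frac{1}{2}\right)}\,a.$$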
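The second moment, $\langle L^2\rangle = 2a^2$, equals the median value squared for every value of n. This follows directly from $L^2 = 2a^2(1+\cos\theta_{n-1})$, since the average of $\cos\theta_{n-1}$ over the hypersphere is zero by symmetry.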
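For integer n the Gamma functions in $\langle L\rangle = 2^{n-1}\Gamma(n/2)^2 a/(\sqrt{\pi}\,\Gamma(n-\frac12))$ reduce to factorials and double factorials (using $\Gamma(k+\frac12) = (2k-1)!!\sqrt{\pi}/2^k$), giving exact rational multiples of a or a/π. The sketch below (illustrative names) builds them with `fractions.Fraction` from the forms $2^{2n-2}[(n/2-1)!]^2/(\pi\,(2n-3)!!)$ for even n and $2^{n-1}[(n-2)!!]^2/(2n-3)!!$ for odd n, and compares them with a log-Gamma evaluation; for n = 2 to 10 it reproduces the $\langle L\rangle$ column of Table 1.

```python
import math
from fractions import Fraction

def double_factorial(k):
    """k!! with the conventions 0!! = (-1)!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def mean_gamma(n, a=1.0):
    """<L> = 2^(n-1) Gamma(n/2)^2 a / (sqrt(pi) Gamma(n - 1/2)), evaluated in log space."""
    log_value = ((n - 1) * math.log(2.0) + 2.0 * math.lgamma(n / 2)
                 - 0.5 * math.log(math.pi) - math.lgamma(n - 0.5))
    return a * math.exp(log_value)

def mean_exact(n):
    """Exact <L>/a for integer n >= 2 as (q, over_pi): <L>/a = q/pi if over_pi else q."""
    if n % 2 == 0:
        q = Fraction(2 ** (2 * n - 2) * math.factorial(n // 2 - 1) ** 2, double_factorial(2 * n - 3))
        return q, True
    q = Fraction(2 ** (n - 1) * double_factorial(n - 2) ** 2, double_factorial(2 * n - 3))
    return q, False

for n in range(2, 11):
    q, over_pi = mean_exact(n)
    value = float(q) / math.pi if over_pi else float(q)
    label = f"({q})/pi" if over_pi else str(q)
    print(f"n = {n:2d}: <L>/a = {label:>22} = {value:.10f}  (Gamma form {mean_gamma(n):.10f})")
```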
The results for n = 2 through n = 10 are summarized
in Table 1.
Table 1
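| n | $V_n$ | $S_n$ | Mode of L | $\langle L\rangle$ |
| 2 | $\pi a^2$ | $2\pi a$ | $2a$ | $\frac{4}{\pi}a$ |
| 3 | $\frac{4}{3}\pi a^3$ | $4\pi a^2$ | $2a$ | $\frac{4}{3}a$ |
| 4 | $\frac{1}{2}\pi^2 a^4$ | $2\pi^2 a^3$ | $2a\sqrt{2/3}$ | $\frac{64}{15\pi}a$ |
| 5 | $\frac{8}{15}\pi^2 a^5$ | $\frac{8}{3}\pi^2 a^4$ | $2a\sqrt{3/5}$ | $\frac{48}{35}a$ |
| 6 | $\frac{1}{6}\pi^3 a^6$ | $\pi^3 a^5$ | $2a\sqrt{4/7}$ | $\frac{4096}{945\pi}a$ |
| 7 | $\frac{16}{105}\pi^3 a^7$ | $\frac{16}{15}\pi^3 a^6$ | $2a\sqrt{5/9}$ | $\frac{320}{231}a$ |
| 8 | $\frac{1}{24}\pi^4 a^8$ | $\frac{1}{3}\pi^4 a^7$ | $2a\sqrt{6/11}$ | $\frac{65536}{15015\pi}a$ |
| 9 | $\frac{32}{945}\pi^4 a^9$ | $\frac{32}{105}\pi^4 a^8$ | $2a\sqrt{7/13}$ | $\frac{1792}{1287}a$ |
| 10 | $\frac{1}{120}\pi^5 a^{10}$ | $\frac{1}{12}\pi^5 a^9$ | $2a\sqrt{8/15}$ | $\frac{16777216}{3828825\pi}a$ |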
An Asymptotic Analysis of the L Distribution for Large n
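The well-known asymptotic formula for the gamma function is [1, p. 257; 15, p. 253; 17]
$$\Gamma(z) \sim \sqrt{2\pi}\,z^{z-\frac{1}{2}}e^{-z}\left(1 + \frac{1}{12z} + \frac{1}{288z^2} - \frac{139}{51840z^3} + \cdots\right).$$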
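This equation provides a convenient starting point for a systematic analysis of the probability distribution of L for large n. Applying it to the ratio $\Gamma\left(m+\frac{1}{2}\right)/\Gamma(m)$ for large m, a long, but straightforward calculation that uses the binomial series and the Maclaurin series for ln(1 + x) and exp(x) results in the following expansion:
$$\frac{\Gamma\left(m+\frac{1}{2}\right)}{\Gamma(m)} \sim \sqrt{m}\left(1 - \frac{1}{8m} + \frac{1}{128m^2} + \frac{5}{1024m^3} + \cdots\right).$$
Since
$$c(n) = \frac{1}{\sqrt{\pi}}\,\frac{\Gamma\left(\frac{n-1}{2}+\frac{1}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)},$$
setting m = (n - 1)/2 and re-expanding in powers of 1/n, an asymptotic formula for c(n) is obtained:
$$c(n) \sim \sqrt{\frac{n}{2\pi}}\left(1 - \frac{3}{4n} - \frac{7}{32n^2} + O(n^{-3})\right).$$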
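A quick sketch (illustrative names) comparing the exact c(n), computed from log-Gamma values, with the three-term expansion $\sqrt{n/(2\pi)}\,(1 - 3/(4n) - 7/(32n^2))$; the relative error should shrink roughly like $n^{-3}$.

```python
import math

def c_exact(n):
    """c(n) = Gamma(n/2) / (sqrt(pi) Gamma((n-1)/2)) via log-Gamma (no overflow for large n)."""
    return math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2)) / math.sqrt(math.pi)

def c_asymptotic(n):
    """Three-term expansion sqrt(n / 2 pi) * (1 - 3/(4n) - 7/(32 n^2))."""
    return math.sqrt(n / (2.0 * math.pi)) * (1.0 - 3.0 / (4.0 * n) - 7.0 / (32.0 * n * n))

for n in (10, 25, 50, 100, 1000):
    exact, approx = c_exact(n), c_asymptotic(n)
    print(f"n = {n:4d}: exact {exact:.10f}, asymptotic {approx:.10f}, "
          f"relative error {abs(exact - approx) / exact:.2e}")
```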
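An asymptotic expansion for the mean follows from writing it, with the Legendre duplication formula, as a product of two such ratios:
$$\frac{\langle L\rangle}{a} = \frac{\Gamma(n)}{\Gamma\left(n-\frac{1}{2}\right)}\cdot\frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n+1}{2}\right)} \sim \sqrt{2}\left(1 - \frac{1}{8n} - \frac{15}{128n^2} - \frac{75}{1024n^3} + O(n^{-4})\right).$$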
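The variance of L is computed as $\sigma^2 = \langle L^2\rangle - \langle L\rangle^2 = 2a^2 - \langle L\rangle^2$. This has the following asymptotic form:
$$\sigma^2 \sim a^2\left(\frac{1}{2n} + \frac{7}{16n^2} + \frac{15}{64n^3} + O(n^{-4})\right).$$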
The accuracy of these asymptotic expressions is demonstrated to ten decimal
places in Table 2.
Table 2
| n | $\langle L\rangle/a$ (exact) | $\langle L\rangle/a$ (asymptotic) | $\sigma^2/a^2$ (exact) | $\sigma^2/a^2$ (asymptotic) |
| 10 | 1.3947761299 | 1.3947750312 | 0.0545995476 | 0.0546093750 |
| 15 | 1.4016613571 | 1.4016611894 | 0.0353454399 | 0.0353472222 |
| 20 | 1.4049475047 | 1.4049474597 | 0.0261225089 | 0.0261230469 |
| 25 | 1.4068707168 | 1.4068707004 | 0.0207147861 | 0.0207150000 |
| 50 | 1.4106109093 | 1.4106109086 | 0.0101768625 | 0.0101768750 |
| 75 | 1.4118268316 | 1.4118268315 | 0.0067449976 | 0.0067450000 |
| 100 | 1.4124291190 | 1.4124291190 | 0.0050439836 | 0.0050439844 |
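The entries of Table 2 can be regenerated, as a sketch with a = 1 and illustrative names, from the exact expressions $\langle L\rangle/a = 2^{n-1}\Gamma(n/2)^2/(\sqrt{\pi}\,\Gamma(n-\frac12))$ and $\sigma^2/a^2 = 2 - (\langle L\rangle/a)^2$, and from the truncated series $\sqrt{2}\,(1 - \frac{1}{8n} - \frac{15}{128n^2} - \frac{75}{1024n^3})$ and $\frac{1}{2n} + \frac{7}{16n^2} + \frac{15}{64n^3}$. The three-term truncation is an assumption about which terms were kept; it is chosen here because it reproduces the tabulated asymptotic columns.

```python
import math

def mean_exact(n):
    """<L>/a = 2^(n-1) Gamma(n/2)^2 / (sqrt(pi) Gamma(n - 1/2)), via log-Gamma."""
    return math.exp((n - 1) * math.log(2.0) + 2.0 * math.lgamma(n / 2)
                    - 0.5 * math.log(math.pi) - math.lgamma(n - 0.5))

def mean_asymptotic(n):
    """sqrt(2) (1 - 1/(8n) - 15/(128 n^2) - 75/(1024 n^3))."""
    return math.sqrt(2.0) * (1.0 - 1.0 / (8 * n) - 15.0 / (128 * n ** 2) - 75.0 / (1024 * n ** 3))

def variance_exact(n):
    """sigma^2 / a^2 = <L^2>/a^2 - (<L>/a)^2 = 2 - (<L>/a)^2."""
    return 2.0 - mean_exact(n) ** 2

def variance_asymptotic(n):
    """1/(2n) + 7/(16 n^2) + 15/(64 n^3)."""
    return 1.0 / (2 * n) + 7.0 / (16 * n ** 2) + 15.0 / (64 * n ** 3)

print(f"{'n':>4} {'<L>/a':>14} {'asymptotic':>14} {'var/a^2':>14} {'asymptotic':>14}")
for n in (10, 15, 20, 25, 50, 75, 100):
    print(f"{n:>4} {mean_exact(n):14.10f} {mean_asymptotic(n):14.10f} "
          f"{variance_exact(n):14.10f} {variance_asymptotic(n):14.10f}")
```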
As n increases the distribution of L narrows and becomes more symmetric, with the mode converging to the median from above and the mean converging to the median from below. For large n the probability density function of L, p(L), rapidly approaches zero for L < a and for L near 2a.
Most of the probability is therefore concentrated near the mode of L, which for large n approaches the median value. To investigate the large-n behavior of the probability density function in the neighborhood of the median, we define a new variable z as
$$z = \frac{L - \sqrt{2}\,a}{a}.$$
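The fundamental transformation equation for a probability distribution then implies the following result:
$$p_z(z) = a\,p\left(\sqrt{2}\,a + a z\right) = \sqrt{2}\,c(n)\left(1 + \frac{z}{\sqrt{2}}\right)^{n-2}\left(1 - \sqrt{2}\,z - \frac{z^2}{2}\right)^{(n-3)/2}.$$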
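From the Maclaurin series of the logarithm and the asymptotic form of c(n), this results in the following expression:
$$\ln p_z(z) = \ln\left(\sqrt{2}\,c(n)\right) - n z^2 + \frac{z}{\sqrt{2}} - \frac{n z^3}{\sqrt{2}} + \cdots \approx \frac{1}{2}\ln\frac{n}{\pi} - n z^2,$$
since z is of order $n^{-1/2}$ where the probability is concentrated. So,
$$p(L) = \frac{1}{a}\,p_z\!\left(\frac{L-\sqrt{2}\,a}{a}\right) \approx \frac{\sqrt{n}}{a\sqrt{\pi}}\exp\left[-\frac{n\left(L-\sqrt{2}\,a\right)^2}{a^2}\right].$$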
Thus, for large n the probability distribution of L approaches a normal distribution with a mean of $\sqrt{2}\,a$ and a standard deviation $\sigma = a/\sqrt{2n}$.
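A numerical sketch of this limit (illustrative names): it evaluates the exact density in log space, to avoid overflow in $L^{n-2}$, and measures the largest gap to the normal density with mean $\sqrt{2}\,a$ and standard deviation $a/\sqrt{2n}$ over $\sqrt{2}\,a \pm 4\sigma$, relative to the normal peak. The gap should shrink roughly like $n^{-1/2}$, reflecting the odd-order terms neglected in the expansion above. The grid assumes $\sqrt{2}\,a + 4\sigma < 2a$, which holds for n ≥ 24.

```python
import math

def pdf(L, n, a=1.0):
    """Exact density p(L) for the sphere in n dimensions, evaluated in log space."""
    log_c = math.lgamma(n / 2) - math.lgamma((n - 1) / 2) - 0.5 * math.log(math.pi)
    log_p = (log_c + (n - 2) * math.log(L / a) - math.log(a)
             + 0.5 * (n - 3) * math.log(1.0 - L * L / (4.0 * a * a)))
    return math.exp(log_p)

def normal_limit(L, n, a=1.0):
    """Normal density with mean sqrt(2) a and standard deviation a / sqrt(2n)."""
    mu, sigma = math.sqrt(2.0) * a, a / math.sqrt(2.0 * n)
    return math.exp(-0.5 * ((L - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def max_gap(n, a=1.0, points=801):
    """Largest |p(L) - normal(L)| on mu +/- 4 sigma, divided by the normal peak height."""
    mu, sigma = math.sqrt(2.0) * a, a / math.sqrt(2.0 * n)
    peak = normal_limit(mu, n, a)
    worst = 0.0
    for k in range(points):
        L = mu + sigma * (-4.0 + 8.0 * k / (points - 1))
        worst = max(worst, abs(pdf(L, n, a) - normal_limit(L, n, a)) / peak)
    return worst

for n in (100, 400, 1600):
    print(f"n = {n:5d}: max relative gap = {max_gap(n):.4f}")
```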
This result seems to have been first established for two points chosen at random within the hypersphere by Hammersley [10] and then presented in a generalized form by Lord, which included two points chosen at random on the hypersphere [13].
Conclusion:
Intuitively the median value, $\sqrt{2}\,a$,
would seem to be the obvious answer to the question, "What is the most
probable straight line distance between a fixed point and a second point
picked at random from the surface of a sphere?". The rather humorous, if
not surprising, conclusion of this analysis is that it is only in the limit
of large dimensions that our intuition is realized!
Not only that, but for large dimensions, $\sqrt{2}\,a$ is "almost the only" answer to "What is the straight line distance?".
Acknowledgements:
The reference to the work of R. D. Lord was kindly communicated to us by Mario N. Berberan-Santos of the Centro de Quimica-Fisica Molecular, Instituto Superior Tecnico in Lisbon, Portugal. More than a decade before our work on the problem, he derived the same results, only to discover that R. D. Lord had published a solution. One of us (A. Lehnen) would like to thank a former student, Elizabeth Nack, whose solution of the volume of a hypersphere as a bonus problem in my Calculus III class inspired me to extend the three-dimensional case to n dimensions.
We would also like to thank Professor David Griffeath of the University of Wisconsin-Madison Mathematics Department for helpful suggestions, and Richard Parris, a teacher at Phillips Exeter Academy in Exeter, New Hampshire, for making his wonderful WinPlot program freely available. This graphing utility was used to generate all the figures displayed in this article. The newest version of WinPlot can be downloaded from Parris's web site at http://math.exeter.edu/rparris/winplot.html.
References:
1. M. Abramowitz and I. Stegun, Handbook of Mathematical Functions, Dover, 1964.
2. T. Apostol, Calculus, Vol. II, Blaisdell Publishing, 1962.
4. M. Berberan-Santos and M. J. E. Prieto, "Energy transfer in spherical geometry. Application to micelles", Journal of the Chemical Society, Faraday Transactions II 83, 1391 (1987).
5. M. N. Berberan-Santos, M. J. E. Prieto, A. G. Szabo, "Picosecond electronic energy-transfer studies in sodium dodecyl sulfate micelles", Journal of the Chemical Society, Faraday Transactions 88, 255 (1992).
6. M. Berberan-Santos, "On the distribution of the nearest neighbor", American Journal of Physics 54, 1139 (1986).
7. M. Berberan-Santos, "Distribution of neighbors other than the nearest", American Journal of Physics 55, 952 (1987).
8. J. A. Christopher, T. O. Baldwin, "Implications of N- and C-Terminal Proximity for Protein Folding", Journal of Molecular Biology 257, 175-187 (1996).
9. J. Freund, Mathematical Statistics, Fifth Edition, Prentice-Hall, 1992.
10. J. M. Hammersley, "The Distribution of Distance in a Hypersphere", Ann. Math. Stat. 21, 447-452 (1950).
11. S. Hassani, Mathematical Physics: A Modern Introduction to its Foundations, Springer, 1999.
12. A. Lehnen and G. Wesenberg, "The Sphere Game", The AMATYC Review 25, 25 (2003).
13. R. D. Lord, "The Distribution of Distance in a Hypersphere", Ann. Math. Stat. 25, 794-798 (1954).
14. S. Ross, A First Course in Probability, Third Edition, Macmillan, 1988.
15. E. T. Whittaker and G. N. Watson, A Course of Modern Analysis, Cambridge University Press, 1969.