The Personal Distribution of Income
Josef Steindl
When D. G. Champernowne showed how a stochastic process can
explain why the pattern of the Pareto distribution is found
with such great regularity in various fields, he very
naturally chose as an example the distribution of income,
because that is the classical case. It seems to me that the
approach is more easily applied to firms, towns, or wealth.
The case of income is the hardest, so that the great
pioneering paper (Champernowne 1953), while fully
demonstrating a new method, has not entirely disposed of the
special problem to which it was directed.
Champernowne's Model.
I shall give a simplified version of Champernowne's model
which will throw a new light on its relation to other
models of the Pareto law.
With Champernowne the income of a person is the state of
the system, and its evolution is described by a Markov
chain. The stochastic matrix of probabilities of income
transitions from one year to the next, in desperate
simplification, looks as follows:
income in            income in year t+1
year t          0    1    2    3    4    5    6
   0            q    p
   1            q         p
   2            q              p
   3            q                   p
   4            q                        p
   5            q                             p
We now re-interpret this matrix so that the states of the
system are not the incomes but the stages in a hierarchy
or the seniority, a kind of "age", which, however, as we
shall see later, is linked to the income. Let us assume only
two alternative possibilities of transition for a person in
this system: either a rise from one stage in one year to
the next higher stage in the next year, which has
probability p; or the death of the person, i.e. transition
to the zero class, which has probability q (p + q = 1).
In addition there are new entries from the zero class to
replenish the stock of persons. The entries are supposed
to compensate the exits of persons so that the stock
remains constant.
In this form the model is exactly the same as a renewal
process described by Feller in the following terms (Feller
1968, Vol. I, XV.3, p. 382): The state $E_k$ represents the
age of the system. When the system reaches age k it either
continues to age or it rejuvenates and starts afresh from
age zero. The successive passages through the zero state
represent a recurrent event. The probability that the
recurrence time equals k is $p^{k-1} q$.
We are interested in the question: How many years have
passed, or rather how many steps in the hierarchy have been
mounted, since the last rejuvenation? This is the "spent
waiting time" of a renewal process. Choosing an arbitrary
starting point we can say that in the year n the system
will be in state $E_k$ if and only if the last rejuvenation
occurred in year n-k. Letting n-k increase we obtain in
the limit the steady state probability of the "spent
waiting time" (Feller 1968 Chapter V). It is
proportionate to the tail of the recurrence time
distribution, i.e. to $p^k$.
More directly the vector of steady state probabilities $u_k$
can be derived from the following two conditions:
$$ u_k = p\, u_{k-1}, $$
$$ u_0 = q u_0 + q u_1 + q u_2 + \dots \tag{1} $$
The first condition ensures the invariance of the steady
state; the second ensures that entries to and exits from
the population balance.
It follows that
$$ u_k = p^k u_0, \qquad u_0 = 1 - p, \qquad p < 1. \tag{2} $$
The result is, of course, identical with the distribution
of the spent waiting time derived above.
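
The steady state (2) can be checked numerically. The following is only a minimal sketch and not part of the original argument: the value p = 0.8, the truncation of the chain to 200 states (with the top state sent back to zero) and the function names are assumptions made for illustration.

```python
import numpy as np

def renewal_chain(p, n_states):
    """Truncated transition matrix: from state k rise to k+1 with
    probability p, or return to state 0 with probability q = 1 - p."""
    q = 1.0 - p
    P = np.zeros((n_states, n_states))
    for k in range(n_states):
        P[k, 0] += q
        if k + 1 < n_states:
            P[k, k + 1] += p
        else:
            P[k, 0] += p  # assumption: the top state is reset (truncation)
    return P

def stationary(P, n_iter=2000):
    """Steady state by repeated multiplication u <- u P."""
    u = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(n_iter):
        u = u @ P
    return u

p = 0.8
u = stationary(renewal_chain(p, 200))
expected = (1.0 - p) * p ** np.arange(200)  # u_k = (1 - p) p^k, equation (2)
print(np.max(np.abs(u - expected)))
```

With 200 states the truncation error is of the order of p^200 and can be ignored; for p close to one a longer chain would be needed.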
So far we have described the process without mentioning
income, and have identified the states with a kind of age
(seniority). We have now to define the income in relation
to the class intervals of the matrix. Note that income is
to be measured logarithmically. The lower limit of class 1
is to be taken as the minimum income. We may choose the
income units in such a way as to make the minimum income
equal to unity, i.e. on the logarithmic scale it will be
zero. The income $y_k$ at the lower limit of successive
income classes k is defined by
$$ y_k = e^{kh} \quad \text{or} \quad \ln y_k = kh \tag{3} $$
where h is the size of the class interval on the
logarithmic scale.1 To find the tail of the steady state
distribution $P(y_k)$ we sum (2) from k to $\infty$ and
obtain $p^k$. Thus
$$ \ln P(y_k) = k \ln p = \ln p \,(1/h) \ln y_k $$
and, putting $-h^{-1} \ln p = \alpha$, we have
$$ \ln P(y_k) = -\alpha \ln y_k. \tag{4} $$
Evidently the crucial feature of the model is the
geometric distribution of the recurrence time. This
relates here to the rank in an economic hierarchy linked
to income. As promotion is assumed to be automatic, the
"age" k of the system, or "spent life time", is
geometrically distributed. Since the income is also an
exponential function of k, the Pareto law results from an
elimination of k from the two exponential functions.
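
The elimination of k can be illustrated with a few lines of arithmetic. This is a sketch only; the values p = 0.8 and h = 0.25 and the function names are assumed for illustration and do not come from the text.

```python
import math

def pareto_coefficient(p, h):
    """alpha = -(1/h) ln p, as defined before equation (4)."""
    return -math.log(p) / h

def tail_probability(k, p):
    """Tail of the geometric rank distribution: sum of (1-p) p^j for j >= k."""
    return p ** k

p, h = 0.8, 0.25
alpha = pareto_coefficient(p, h)

rows = []
for k in range(1, 6):
    y_k = math.exp(k * h)                     # equation (3)
    lhs = math.log(tail_probability(k, p))    # ln P(y_k)
    rhs = -alpha * math.log(y_k)              # right-hand side of (4)
    rows.append((k, y_k, lhs, rhs))
    print(k, round(y_k, 4), round(lhs, 6), round(rhs, 6))
```

The two last columns coincide for every k, which is the content of equation (4): on a double logarithmic scale the tail is a straight line with slope -alpha.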
It is natural to object here that Champernowne's model
(Champernowne 1953) has been drastically simplified in the
above argument. In his model there are more
alternatives: People can either rise one step in the ladder
or stay in the same state or fall back to a lower
state, although the possibilities of movement are limited
to a certain range. The steady state solution which
Champernowne obtains for his model is, however, essentially
the same as the simplified case treated above.
Champernowne assumes that the probability of transitions
from state k-v to state k is independent of k and depends
only on v. That means that the transitions depend on the
size of the jump but not on where it starts from or where
it ends. On this basis the following equation for the
steady state is established:
$$ X_k = \sum_{v=-n}^{1} p_v X_{k-v}. \tag{5} $$
In terms of the generating function the equation becomes
$$ \sum_{v=-n}^{1} p_v z^{-v} = 1. \tag{6} $$
The solution of the equation is $X_k = b^k$ (b < 1).
The steady state distribution of the population according
to rank is $(1-b) b^k$ and the tail of the distribution is
$b^k$. The steady state solution in Champernowne's case is
thus equivalent to the simplified case treated further
above, if p is replaced by b.
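
How b follows from the transition probabilities can be sketched numerically. Substituting X_k = b^k into the steady state equation (5) and multiplying by b gives a polynomial in b, of which b = 1 is always a root. The sketch below looks for the root between 0 and 1. The jump probabilities and the function name are hypothetical, and it is assumed that downward movement predominates so that such a root exists.

```python
import numpy as np

def champernowne_root(jump_probs):
    """jump_probs maps the jump size v (1, 0, -1, ..., -n) to its probability.
    Solves sum_v p_v b^(1-v) - b = 0 for the root with 0 < b < 1."""
    n = -min(jump_probs)
    coeffs = np.zeros(n + 2)            # coeffs[i] multiplies b^i
    for v, prob in jump_probs.items():
        coeffs[1 - v] += prob
    coeffs[1] -= 1.0
    roots = np.roots(coeffs[::-1])      # np.roots expects highest power first
    candidates = [r.real for r in roots
                  if abs(r.imag) < 1e-9 and 1e-9 < r.real < 1.0 - 1e-6]
    if len(candidates) != 1:
        raise ValueError("no unique root in (0, 1) for these probabilities")
    return candidates[0]

# Hypothetical example: rise one step 0.3, stay 0.3, fall one step 0.4.
jump_probs = {1: 0.3, 0: 0.3, -1: 0.4}
b = champernowne_root(jump_probs)
print(b)
```

For this example the polynomial is 0.4 b^2 - 0.7 b + 0.3, with roots 1 and 0.75, so the rank distribution has the tail 0.75^k.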
The aim of the preceding considerations is to show that
Champernowne's explanation of the Pareto law is basically
the same as that of Simon (1957) and myself (1965), which
goes back to the model of Yule (1924), who used it to
explain the frequency of species in genera.2 According to
this approach size distribution is a transformed age
distribution, and the pattern of the Pareto law occurs so
often simply because of the empirical importance of
exponential growth, which makes both the age distribution
and the transformation function exponential. Owing to the
conceptual density of Champernowne's model the two
elements of rank in the hierarchy and income as a function
of rank are merged into one.
There is, however, a difference (which relates to the
interpretation rather than to the form) between
Champernowne's model and the others: Since physical persons
sooner or later die, the age or rank in his model is
limited, while in other models, of firms or of wealth, for
example, there is a probability of virtually infinite
life. The steady state in these models is made possible
only by the continuing new entry of small units.
Further developments.
Champernowne's work has inspired another model (Rutherford
1955) in which personal age has been introduced
explicitly. This still leaves important questions
unanswered. Income is not a very suitable variable as the
state of a Markov chain. It does not embody the influence
of the past (Polya's "influence globale") in such a way that
yesterday's state tells you all you need to know about the
past. More important, the model is confined to the life
cycle of an individual from entrance to exit. But the
relevant stochastic process goes far beyond that. When
somebody starts in life his chances of receiving certain
incomes are already settled to a large extent by the
condition of his parents: by their wealth, status,
connections, reputation and the education or training they
have been able to give him.3 In other words the entries
and exits in the life cycle model are linked by
inheritance and similar elements, and the stochastic
process continues over the generations.
The arguments point to an obvious conclusion: We must
relate the chances of getting certain incomes to the
amount of wealth and to other factors which are in a wider
sense "inherited". In this way we can link the income to a
suitable state variable which is evolving in a long run
process through the generations. By following such a path
we shall also be able to answer the question why income
distribution is apparently relatively stable although so
many elements relevant to it are changing day by day; the
explanation is that the stability lies in the distribution
of wealth, education, training, etc., which change only
slowly.
In the present paper we shall confine ourselves to the
consideration of wealth and thus consider only the income
of the wealthy.4
The dependence of income on wealth.
In the following we shall initially consider the income
of the wealthy as flowing from wealth. They have, of
course, not only unearned but also earned income, and the
two are not easy to distinguish even apart from lack of
suitable data. But as a first step we may pretend that all
their income is interest or profit.
Instead of a matrix of income transitions we have now to
consider a matrix wealth-income, which shows for each
amount of wealth the corresponding probability of
different incomes. The basis of the analysis is thus the
conditional distribution of income, given the wealth.
Economically speaking this is the probability of a certain
rate of return to wealth, or profit rate. From this, if we
know it, we can derive the distribution of income, provided
we know the distribution of wealth. But the distribution
of wealth is known: It follows the Pareto law over a
fairly wide range and its pattern can also be explained
theoretically (see the preceding paper in this volume).
Denoting wealth by W, let us write for the density of the
wealth distribution
$$ p^*(W) = c\, W^{-\alpha-1}\, dW $$
or, putting $w = \ln W$,
$$ p(w) = c\, e^{-\alpha w}\, dw \quad \text{for } w > 0, \qquad
   p(w) = 0 \quad \text{for } w < 0. \tag{7} $$
If Y denotes income and $y = \ln Y$, the conditional density
function of income can be represented in the form
f*(y-w), the density of a certain return on wealth. Even
without knowing this function we might manage to derive
the distribution of income from that of wealth provided we
can make certain assumptions about independence. We shall
provisionally assume that the distribution of the rate of
return is independent of the amount of wealth. The method
will be to "mix" the conditional distribution of income
given the wealth (the distribution of return) with the
density function of wealth. For the purposes of this
calculation we shall replace the density f*(y-w) by the
mirror function f(w-y), which is also independent of
wealth. The two functions are symmetric and have the same
value. In fact, the only difference is in the
dimension: while f* refers to the rate of return per year,
f refers to the number of yearly incomes contained in the
wealth (the reciprocal value of the return).
We calculate then the density of income q(y) by mixing the
function f(w-y) with the density of wealth:
$$ q(y) = c \int_0^{\infty} f(w-y)\, e^{-\alpha w}\, dw
        = c\, \varphi(\alpha)\, e^{-\alpha y}
   \quad \text{for } w > y > 0, $$
$$ q(y) = 0 \quad \text{for } w < y, \tag{8} $$
where $\varphi(\alpha)$ is the Laplace transform of f(w).
The above mixture is a Laplace transform of f(w) shifted
to the right by y.
The Laplace transform requires that the argument of the
function f be non-negative. We have therefore to assume
that w > y (we shall show later how this restriction can
be relaxed).
Equation (8) shows that the Pareto pattern of the wealth
distribution is reproduced in the income
distribution, provided the independence condition is
fulfilled and y < w.
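
Equation (8) can be checked by numerical integration for a particular choice of f. The sketch below is an illustration under assumptions: the density f(u) = u exp(-u) for u >= 0 (whose Laplace transform is 1/(1+alpha)^2), the value alpha = 1.5, c = 1 and all function names are chosen for the example and are not taken from the text.

```python
import math
import numpy as np

def f(u):
    """Hypothetical density of u = w - y; zero for negative arguments."""
    return np.where(u >= 0.0, u * np.exp(-u), 0.0)

def income_density(y, alpha, c=1.0, w_max=60.0, n=600_001):
    """Left-hand side of (8): c * integral of f(w - y) exp(-alpha w) dw,
    evaluated with the trapezoidal rule on a fine grid."""
    w = np.linspace(0.0, w_max, n)
    g = f(w - y) * np.exp(-alpha * w)
    return c * float(np.sum((g[1:] + g[:-1]) * np.diff(w)) / 2.0)

def closed_form(y, alpha, c=1.0):
    """Right-hand side of (8) with phi(alpha) = 1 / (1 + alpha)^2."""
    return c / (1.0 + alpha) ** 2 * math.exp(-alpha * y)

alpha = 1.5
checks = [(y, income_density(y, alpha), closed_form(y, alpha))
          for y in (0.5, 1.0, 2.0, 3.0)]
for y, numeric, exact in checks:
    print(y, numeric, exact)
```

Whatever admissible f is chosen, only the constant factor changes; the exponent of the income density remains alpha, which is the point of equation (8).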
We have now to face the fact that the rate of return on
wealth is in reality not independent of wealth. The cross
classifications of wealth and income of wealth owners for
Holland and Sweden show that mean income is a linear
function of wealth, and the regression coefficient is
smaller than one.
Various reasons are responsible for the decline of the
rate of return with increasing wealth. Most important, the
income of wealth owners contains more or less considerable
amounts of earned income which are loosely connected with
the ownership of wealth but which can hardly be separated
even conceptually, quite apart from the lack of data. The
earned income will be less important the greater the
wealth, simply because one can rarely get as much income
from work as from large wealth. In particular, income from
non-incorporated business is to a considerable extent
earned income, and this type of business is less frequently
present the greater the wealth. A number of other factors
also contribute to explain the regression coefficient. The
retained income of corporations will not find expression
in the income of shareholders but it will, at least in
many cases, affect the shareholders' wealth via the market
value. Also speculative capital gains from appreciation of
shares or of real estate will affect the wealth but not
the income, and they will presumably be more important in
the higher wealth classes.
Now the rate of return will be independent of wealth if
its conditional distribution is the same whatever the size
of wealth. It would seem that we might perhaps restore the
condition of independence simply by turning the system of
coordinates in the appropriate way, so that we could reduce
the present case to the former one. If we can make the
covariance of w and w-y zero then the coefficient of
regression of y on w should be one, as in the former case:
$$ \operatorname{Cov}(w, w-y) = \operatorname{Var}(w) - \operatorname{Cov}(w,y) = 0;
   \qquad \frac{\operatorname{Cov}(w,y)}{\operatorname{Var}(w)} = 1. $$
If the regression line of income on wealth is
$$ y = \kappa w + y_0 \qquad (\kappa < 1) $$
and if also the variance and higher moments of the
conditional income distribution are independent of wealth,
then we should use instead of f(w-y) the function
$f(\kappa w + y_0 - y)$, because this distribution will be
independent of wealth.
Although we do not really know anything about these higher
moments we shall nevertheless try to use the above
function and proceed in the same way as before by a
mixture of the conditional distribution:
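$$ q(y) = c \int_0^{\infty} f(\kappa w + y_0 - y)\, e^{-\alpha w}\, dw
        = c\, \frac{1}{\kappa}\, \varphi\!\left(\frac{\alpha}{\kappa}\right)
          \exp\left\{-\frac{\alpha}{\kappa}\,(y - y_0)\right\}
   \quad \text{for } \kappa w > y - y_0, $$
$$ q(y) = 0 \quad \text{for } \kappa w < y - y_0. \tag{9} $$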
The result is now that the Pareto pattern of the wealth
distribution is reproduced in the income distribution, but
with a larger Pareto coefficient, $\alpha/\kappa$ (since
$\kappa < 1$). This is exactly what has to be explained
(income distributions are in fact more "equal" than the
wealth distributions, empirically, in so far as they show a
larger Pareto coefficient). The particular shape of the
rate of return distribution has no influence on the tail of
the income distribution as long as it fulfills the
independence conditions mentioned.
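
The steeper slope implied by equation (9) can be seen numerically by measuring the logarithmic slope of q(y) between two income levels. Again this is only a sketch under assumptions: f(u) = u exp(-u) for u >= 0, alpha = 1.5, kappa = 0.6, y_0 = 0, c = 1 and the function names are all chosen for illustration.

```python
import numpy as np

def f(u):
    """Hypothetical density of u = kappa*w + y0 - y; zero for u < 0."""
    return np.where(u >= 0.0, u * np.exp(-u), 0.0)

def income_density(y, alpha, kappa, y0, w_max=80.0, n=800_001):
    """Integral in (9) with c = 1, by the trapezoidal rule."""
    w = np.linspace(0.0, w_max, n)
    g = f(kappa * w + y0 - y) * np.exp(-alpha * w)
    return float(np.sum((g[1:] + g[:-1]) * np.diff(w)) / 2.0)

def log_slope(alpha, kappa, y0, y_lo, y_hi):
    """Slope of ln q(y) between two incomes above y0."""
    q_lo = income_density(y_lo, alpha, kappa, y0)
    q_hi = income_density(y_hi, alpha, kappa, y0)
    return (np.log(q_hi) - np.log(q_lo)) / (y_hi - y_lo)

alpha, kappa, y0 = 1.5, 0.6, 0.0
slope = log_slope(alpha, kappa, y0, y_lo=1.0, y_hi=3.0)
print(slope, -alpha / kappa)
```

The measured slope agrees with -alpha/kappa = -2.5, against -1.5 for the wealth distribution itself.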
Concerning the restriction $\kappa w > y - y_0$ it should be
remarked that we are free to shift the coordinate system
to any $y_0$ we choose so as to make the above condition
valid, with no consequence except that the conclusion about
the Pareto tail will be confined to incomes in excess of
$y_0$.
$y_0$ must often be more or less high, so that the Pareto
pattern will be confined to a rather narrow range of the
income distribution, while in the case of wealth it usually
extends to almost the whole of the assessed wealth data.
This, it is true, partly results from the fact that the
wealth data are more truncated than the income data, in
view of the underlying tax laws.
Income and Wealth: Empirical Patterns
The following illustrations are based on the cross
classification of income and wealth available in Holland
and in Sweden. These data share certain characteristics
found also in other cross section data on size
distributions given in official publications. The first
feature is that the great bulk of the observations is
concentrated in the corner of the first (the north-east)
quadrant. In other words the distributions are very skew.
A great many of the units are small in either dimension.
The second feature is that the wealth distribution is
heavily truncated (in Sweden, for example, at 150,000
crowns) while the income distribution is given down to
rather low levels. To put it in another way, only a small
proportion of all income earners are assessed for wealth.
If the mean income in the various wealth classes is
calculated, a linear regression of a very regular pattern
is obtained (fig. 1). This "regression of the first kind",
as it is called (Cramer 1946 p. 270), differs from the usual
least squares regression in that it does not assume a
priori a certain mathematical function for the
regression. If the regression of the means turns out to be
linear, as is the case here, then we should expect it to be
the same as the result of a least squares regression on
the basis of the full data (a difference may
arise, however, in so far as we do not take into account
the weights for the means corresponding to the frequencies
in the various wealth classes).
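
The remark about weights can be made concrete with artificial data. In the sketch below everything is assumed for illustration (the class values of log wealth, the class frequencies, the slope 0.6 and the noise): a line fitted to the class means with weights equal to the class frequencies reproduces the least squares line of the full data, while an unweighted fit to the means generally does not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical grouped data: every unit in a class has the same log wealth.
class_w = np.array([5.0, 5.5, 6.0, 6.5, 7.0])
counts = np.array([400, 200, 100, 40, 10])
w = np.repeat(class_w, counts)
y = 1.0 + 0.6 * w + rng.normal(0.0, 0.5, size=w.size)   # log income

# Regression of the first kind: mean log income in each wealth class.
means = np.array([y[w == cw].mean() for cw in class_w])

slope_full, _ = np.polyfit(w, y, 1)                      # least squares, full data
# np.polyfit applies its weights to the unsquared residuals, hence the sqrt.
slope_weighted, _ = np.polyfit(class_w, means, 1, w=np.sqrt(counts))
slope_unweighted, _ = np.polyfit(class_w, means, 1)

print(slope_full, slope_weighted, slope_unweighted)
```

The first two slopes coincide up to rounding error; the third differs because the thinly populated upper classes receive the same weight as the lower ones.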
If the same regression of the first kind is calculated in
the other dimension, that is, if we take the mean wealth
for each of the various income classes, a completely
different picture emerges: The mean wealth in the lowest
income classes does not increase with income at all; for
higher incomes it increases strongly, so that a distinctly
curved regression line results.
The reason - or at least the most important of the reasons
- for the peculiar shape of this regression line lies in
the truncation of the wealth data. In the lower income
classes we find only people who have something like the
minimum of assessed wealth, while all the income receivers
with lower wealth are not included in the data. If we try
to fill in these missing data in our imagination, assuming
fairly low levels of wealth for these people, we could
easily conceive that the regression of wealth on income
would also become linear. The inclusion of wealth below
the tax limit, which is presumably the lower and the more
frequent the lower the income, would reduce the mean wealth
in all income classes, but it would reduce it the more the
lower the income. In other words the mean wealth in the
lower income classes as shown by the data very strongly
overstates the real mean wealth, and this the more the
lower the wealth.
There is no proof, of course, that some curvilinearity would
not remain even if full wealth data were
available, although it would be surprising that the two
regression lines should be so different in character.
The cross classification of wealth and income, available
for the Netherlands and Sweden, will now be discussed in
the light of the theory contained in equation (9). It
would be too much to expect a verification: for one thing
the estimate of the Pareto coefficient for income is
always more or less arbitrary because it depends on the
range of income classes included when fitting a straight
line to it. But any attempt to illustrate an abstract
argument by concrete data is better than speculating in
the void.
The most impressive feature of the data is the linear and
very regular character of the regression of income on
wealth. The regression coefficient is in most cases around
2/3 but it may be as low as 1/2. A considerable defect of
the data is the unequal size of the wealth (as well as the
income) classes. The range of the classes increases with
the wealth. The last but one wealth class has a range about
four times as great as the lower wealth classes. This
makes it very difficult to decide whether the variance and
higher moments of income are independent of the size of
wealth. In the Swedish data the variance increases in the
higher wealth classes. This may, however, be plausibly
explained by the increase in the range of these classes.
The same defect mars the comparison of the conditional
distributions of income in the various wealth classes.
They all have a Pareto tail, but the Pareto coefficient is
markedly lower in the last two or three wealth classes
than in the others. This, again, may be plausibly explained
by the greater range of these high wealth size classes.
Table 1.

                                     SWEDEN 1971         HOLLAND
                                  Married    Single       1962/
                                  couples    persons
Wealth distribution:
  Pareto coefficient
  (whole range, 11 values)          1.78       1.73        1.38
Regression of income on
  wealth, coefficient               0.56       0.65        0.63
Calculated Pareto coefficient
  for income                        3.18       2.66        2.20
Actual Pareto coefficient
  for income (5 values)             2.62       2.14        2.08
Ditto, excluding the open
  size class of wealth              3.12       2.62
Table 1 gives the values for the coefficient of regression
of the means of income on wealth. The correlation is very
high ($r^2$ is more than 0.99) for the mean income values;
for the correlation of the grouped income data it is
modest ($r^2$ = 0.25 in Holland).
From the empirical Pareto coefficient for wealth and the
calculated regression coefficient $\kappa$ we can calculate
the Pareto coefficient for income in accordance with
equation (9). For Holland we obtain 2.20, which compares
with an actual Pareto coefficient for income of wealth
owners of 2.08. The coefficient for all incomes, including
persons with no taxable wealth, is not very different, which
is rather surprising.
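
The calculated coefficients in Table 1 are simply the ratio of the Pareto coefficient for wealth to the regression coefficient. The short sketch below redoes this division from the rounded figures printed in the table; the dictionary layout and the function name are illustrative only.

```python
def income_pareto(alpha_wealth, kappa):
    """Pareto coefficient for income implied by the mixture: alpha / kappa."""
    return alpha_wealth / kappa

# (Pareto coefficient for wealth, regression coefficient, calculated value
#  as printed in Table 1)
table1 = {
    "Sweden 1971, married couples": (1.78, 0.56, 3.18),
    "Sweden 1971, single persons": (1.73, 0.65, 2.66),
    "Holland": (1.38, 0.63, 2.20),
}

results = {}
for group, (alpha_wealth, kappa, printed) in table1.items():
    results[group] = income_pareto(alpha_wealth, kappa)
    print(group, round(results[group], 2), printed)
```

The two Swedish values are reproduced exactly; for Holland the division of the rounded inputs gives 2.19 against the printed 2.20, a difference that presumably comes from rounding in the published coefficients.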
While the Dutch data are not worse than could be expected,
the Swedish data are not so good. The population of wealth
holders is here (on account of the tax laws) split into
four groups: couples where both have wealth, couples where
only one has wealth, single men and single women. The
reduction of the sample impairs the regularity of the data
and I have therefore aggregated the four into two
groups, married couples and single persons. The calculated
Pareto coefficients for income of wealth owners are much
higher than the actual ones. These calculated coefficients
correspond more nearly to those of all income receivers
including the wealthless ones. We get, however, also a
reasonably good correspondence with the actual income data
for wealth owners if we exclude the open size class from
the calculation. This might perhaps be motivated by the
argument that the open size class does not enter the
calculation of $\kappa$ either. The motivation is not
entirely convincing and the results are somewhat
inconclusive.
Since the conditional income distributions in the various
wealth classes have been referred to several times I give
in Table 2 these data for the couples where both husband
and wife have wealth.
Table 2
CONDITIONAL INCOME DISTRIBUTION

Wealth in 000 K      Mean ln income      Pareto coefficient
 150 -  175               4.74                 4.06
 175 -  200               4.76                 3.55
 200 -  250               4.79                 3.83
 250 -  300               4.84                 3.92
 300 -  400               4.88                 3.69
 400 -  500               4.95                 3.34
 500 -  750               5.01                 3.29
 750 - 1000               5.09                 3.00
1000 - 2000               5.18                 3.47
2000 - 5000               5.38                 2.19
5000 -                   (5.74)                1.17

All                                            2.68
All without open wealth class                  3.16
The conditional distributions all have Pareto tails,
although the fit is bad (only 4 values can be used). The
Pareto coefficient is between 3 and 4 in all except the
last two size classes, where it is very low, and it is 2.68
for the whole income distribution. It appears that the
distribution as a whole is decisively influenced by the
last two wealth classes, where the income distribution is
very unequal owing to the wide range of wealth in these
classes. In this way the peculiar result arises that the
Outlook on further developments.
In this paper the idea has been elaborated that the amount
of wealth determines the chances of having certain amounts
of income. But it may be thought that also the inverse
relation - influence of income on wealth - plays a role.
Certainly the increment of wealth per year depends on
income, given the rate of saving out of various incomes. If
we take into account that the present income is usually
strongly correlated with the past incomes of the same
person, or even of his ancestors, then it appears that the
chances of a certain wealth may be determined, indirectly,
by the present income. And we may connect this relation
with the regression line of wealth on income (which in the
Swedish data appears so very distorted on account of the
truncation of the distribution). There are, then, two
theories, and two regression lines. It would be very
convenient if we could regard each of the regression lines
as a true picture of the corresponding theory. This
correspondence is, however, marred by the greater or lesser
dispersion of values round each of the regression lines.
It can easily be seen that the dispersion round one of the
regression lines will influence the shape of the other
regression line. If the rate of return of a given wealth
is widely dispersed then the persons with a high rate of
return will be classified in the high income classes, those
with the same wealth but with a low rate of return among
the small incomes. This will more or less strongly
counteract the tendency of wealth to increase with income;
it will flatten out the regression line.
It seems to me that the joint distribution of two
variables like income and wealth should be approached from
the standpoint of a more elaborate theory. One could
imagine a stochastic process, in the simplest case a Markov
chain, in two stages: One matrix would show for each
amount of wealth at the beginning of the year the
probabilities of various incomes in that year. Another
matrix would show for each of these incomes the
probability of wealth at the end of the year - which
results from the addition of the saving out of the various
incomes to the initial wealth. In this way both
parameters, the rate of return on wealth and the rate of
saving out of income, would play their role in the
process. A multiplication of these matrices would describe
a continuing process of accumulation, starting from
certain initial conditions of wealth distribution. We may
then, under certain conditions, if we allow also for new
entries, derive a steady state of the joint distribution
of wealth and income. This could be used then to
understand and interpret the empirical joint distributions
of income and wealth. For this purpose one would no doubt
require data which are more conveniently arranged than the
published material used above, and for this reason alone
it would go beyond the modest scope of this paper.
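
What such a two-stage chain might look like can be indicated by a toy example. It is purely a sketch under strong assumptions, none of which come from the text: 30 wealth classes on a logarithmic scale, only two income levels (low or high return), a high income lifting wealth by one class with a fixed probability, and a yearly exit probability compensated by new entries in the lowest class.

```python
import numpy as np

K = 30         # number of wealth classes on a log scale (assumed)
r_high = 0.4   # probability of a high return on wealth (assumed)
s = 0.5        # probability that a high income lifts wealth one class (assumed)
d = 0.05       # yearly exit probability; the entrant starts in class 0 (assumed)

# Stage 1: wealth class i -> pair (i, j); j = 0 low income, j = 1 high income.
A = np.zeros((K, 2 * K))
for i in range(K):
    A[i, 2 * i] = 1.0 - r_high
    A[i, 2 * i + 1] = r_high

# Stage 2: pair (i, j) -> wealth class at the end of the year.
B = np.zeros((2 * K, K))
for i in range(K):
    up = min(i + 1, K - 1)       # the top class absorbs further rises
    B[2 * i, 0] += d
    B[2 * i, i] += 1.0 - d       # low income: no saving, wealth unchanged
    B[2 * i + 1, 0] += d
    B[2 * i + 1, up] += (1.0 - d) * s
    B[2 * i + 1, i] += (1.0 - d) * (1.0 - s)

T = A @ B                        # wealth-to-wealth transition over one year

pi = np.full(K, 1.0 / K)         # steady state of wealth by repeated multiplication
for _ in range(3000):
    pi = pi @ T

joint = (pi @ A).reshape(K, 2)   # steady state joint distribution (wealth, income)
print(pi[:5])
```

In this toy case the steady state of wealth is again geometric over the classes, i.e. of Pareto form on the original scale, with the ratio between successive classes determined jointly by the return, saving and exit parameters.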
Footnotes.
1 The difficulties arising from the discrete
representation of a continuous income variable
(Cramer, 1969, p. 62) in the matrix need not concern us
here. They do not exist since we relate the classes of the
matrix to rank in a hierarchy.
2 Champernowne apparently did not know Yule's paper. It
was Simon's merit to have brought it to the attention of
economists. Unfortunately he used it in a form which
obscured its essence, which is the interplay of two
exponential, or geometrical, distributions resulting from
two stochastic processes.
A very different approach to the Pareto law has been
proposed by D. Sahal (1978), who connected it with the
progress function and with the allometric law (the
relation between the growth rates of different parts of an
organism).
3 Champernowne was, of course, aware of these facts, as his
thesis of 1937 shows (Champernowne 1973). His formalised
model of 1953 is, however, not well suited to reflect all
the economic factors so well stated in 1937.
4 As far as the other factors, in other words, the
explanations of earned income, are concerned, I can only
vaguely indicate the direction in which I think a theory
might be developed. The high earned incomes accrue to
managers, professional people, artists and sportsmen. In
most cases these people can be ranked in a hierarchy which
in the case of managers depends on the scope of their
activity, i.e. on the size of the firm, while in the case of
the artists and sportsmen what matters is the size of the
audience which they serve. The modern media have created
ever larger audiences and accordingly greater incomes. A
relevant theory of the high earned incomes (and the Pareto
distribution is relevant only for them) will have to start
from this concept of hierarchies pertaining to scope, which
gives another interpretation of Champernowne's matrix
which, in his own presentation, is more adapted to income
formation in a bureaucratic hierarchy.