4. Income and Wealth Inequality
4.1. Overview
In this section we provide motivation for the techniques deployed in the lecture and import the code libraries needed for our work.
4.1.1. Some history
Many historians argue that inequality played a key role in the fall of the Roman Republic.
After Rome defeated Carthage and invaded Spain, money flowed into the city and greatly enriched those in power.
Meanwhile, ordinary citizens were taken from their farms to fight for long periods, diminishing their wealth.
The resulting growth in inequality caused political turmoil that shook the foundations of the republic.
Eventually, the Roman Republic gave way to a series of dictatorships, starting with Octavian (Augustus) in 27 BCE.
This history is fascinating in its own right, and we can see some parallels with certain countries in the modern world.
Many recent political debates revolve around inequality.
Many economic policies, from taxation to the welfare state, are aimed at addressing inequality.
4.1.2. Measurement
One problem with these debates is that inequality is often poorly defined.
Moreover, debates on inequality are often tied to political beliefs.
This is dangerous for economists because allowing political beliefs to shape our findings reduces objectivity.
To bring a truly scientific perspective to the topic of inequality we must start with careful definitions.
In this lecture we discuss standard measures of inequality used in economic research.
For each of these measures, we will look at both simulated and real data.
We will install the following libraries.
!pip install --upgrade quantecon interpolation
And we use the following imports.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import quantecon as qe
import random as rd
from interpolation import interp
4.2. The Lorenz curve
One popular measure of inequality is the Lorenz curve.
In this section we define the Lorenz curve and examine its properties.
4.2.1. Definition
The Lorenz curve takes a sample \(w_1, \ldots, w_n\) and produces a curve \(L\).
We suppose that the sample \(w_1, \ldots, w_n\) has been sorted from smallest to largest.
To aid our interpretation, suppose that we are measuring wealth, so that \(w_1\) is the wealth of the poorest member of the population and \(w_n\) is the wealth of the richest member of the population.
The curve \(L\) is just a function \(y = L(x)\) that we can plot and interpret.
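To create it we first generate data points \((x_i, y_i)\) according to
\[
    x_i = \frac{i}{n},
    \qquad
    y_i = \frac{\sum_{j \leq i} w_j}{\sum_{j \leq n} w_j},
    \qquad i = 1, \ldots, n
\]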
Now the Lorenz curve \(L\) is formed from these data points using interpolation.
(If we use a line plot in Matplotlib, the interpolation will be done for us.)
The meaning of the statement \(y = L(x)\) is that the lowest \((100 \times x)\)% of people have \((100 \times y)\)% of all wealth.
For example, if \(x=0.5\) and \(y=0.1\), then the bottom 50% of the population owns 10% of the wealth.
In the discussion above we focused on wealth but the same ideas apply to income, consumption, etc.
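The construction described in this section can be sketched with NumPy alone. This is a minimal illustration, assuming the usual convention that \(x_i = i/n\) is the cumulative population share and \(y_i\) is the cumulative share of total wealth held by the poorest \(i\) members; the helper name `lorenz_points` and the toy sample are hypothetical and not part of the lecture code.

```python
import numpy as np

def lorenz_points(w):
    """Return Lorenz data points (x, y) for a 1-D sample of non-negative values.

    Hypothetical helper for illustration only.
    """
    w = np.sort(np.asarray(w, dtype=float))   # sort from smallest to largest
    n = len(w)
    x = np.arange(1, n + 1) / n               # x_i = i / n
    y = np.cumsum(w) / w.sum()                # y_i = share held by the poorest i members
    # Prepend the origin so that the curve starts at (0, 0)
    return np.insert(x, 0, 0.0), np.insert(y, 0, 0.0)

x, y = lorenz_points([6, 1, 2, 1])            # toy wealth sample
print(x)   # cumulative population shares
print(y)   # cumulative wealth shares
```

In this toy sample the point \((0.75, 0.4)\) lies on the curve, so the poorest 75% of members hold 40% of total wealth.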
4.2.2. Lorenz curves of simulated data
Let’s look at some examples and try to build understanding.
In the next figure, we generate \(n=2000\) draws from a lognormal distribution and treat these draws as our population.
The straight line (\(x=L(x)\) for all \(x\)) corresponds to perfect equality.
The lognormal draws produce a less equal distribution.
For example, if we imagine these draws as being observations of wealth across a sample of households, then the dashed lines show that the bottom 80% of households own just over 40% of total wealth.
n = 2000
sample = np.exp(np.random.randn(n))
fig, ax = plt.subplots()
f_vals, l_vals = qe.lorenz_curve(sample)
ax.plot(f_vals, l_vals, label='lognormal sample', lw=2)
ax.plot(f_vals, f_vals, label='equality', lw=2)
ax.legend(fontsize=12)
ax.vlines([0.8], [0.0], [0.43], alpha=0.5, colors='k', ls='--')
ax.hlines([0.43], [0], [0.8], alpha=0.5, colors='k', ls='--')
ax.set_ylim((0, 1))
ax.set_xlim((0, 1))
plt.show()

Fig. 4.1 Lorenz curve of simulated data
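The value marked by the dashed lines can also be read off numerically rather than from the plot. The following sketch builds the Lorenz data points directly with NumPy (instead of calling `qe.lorenz_curve`) and evaluates the curve at \(x = 0.8\) by linear interpolation with `np.interp`. The seed and the variable name `share_bottom_80` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)            # seed chosen only for reproducibility
n = 2000
sample = np.sort(np.exp(rng.standard_normal(n)))   # sorted lognormal draws

# Lorenz data points, including the origin
f_vals = np.arange(n + 1) / n
l_vals = np.concatenate(([0.0], np.cumsum(sample) / sample.sum()))

# Linear interpolation of the Lorenz curve at x = 0.8
share_bottom_80 = np.interp(0.8, f_vals, l_vals)
print(f"Bottom 80% share of total wealth: {share_bottom_80:.2f}")
```

Because the draws are random, the printed share changes from sample to sample; the figure of roughly 0.43 used for the dashed lines is one such realization.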
4.2.3. Lorenz curves for US data
Next let’s look at real data, focusing on income and wealth in the US in 2016.
The following code block imports a subset of the dataset SCF_plus, which is derived from the Survey of Consumer Finances (SCF).
url = 'https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/SCF_plus/SCF_plus_mini.csv'
df = pd.read_csv(url)
df = df.dropna()
df_income_wealth = df
df_income_wealth.head()
|   | year | n_wealth | t_income | l_income | weights | nw_groups | ti_groups |
|---|---|---|---|---|---|---|---|
| 0 | 1950 | 266933.75 | 55483.027 | 0.0 | 0.998732 | 50-90% | 50-90% |
| 1 | 1950 | 87434.46 | 55483.027 | 0.0 | 0.998732 | 50-90% | 50-90% |
| 2 | 1950 | 795034.94 | 55483.027 | 0.0 | 0.998732 | Top 10% | 50-90% |
| 3 | 1950 | 94531.78 | 55483.027 | 0.0 | 0.998732 | 50-90% | 50-90% |
| 4 | 1950 | 166081.03 | 55483.027 | 0.0 | 0.998732 | 50-90% | 50-90% |
The following code block uses data stored in the dataframe df_income_wealth to generate the Lorenz curves.
(The code is somewhat complex because we need to adjust the data according to population weights supplied by the SCF.)
df = df_income_wealth

varlist = ['n_wealth',    # net wealth
           't_income',    # total income
           'l_income']    # labor income

years = df.year.unique()

# Create lists to store Lorenz data
F_vals, L_vals = [], []

for var in varlist:
    # create lists to store Lorenz curve data
    f_vals = []
    l_vals = []
    for year in years:
        # Repeat the observations according to their (rounded) weights
        df_year = df[df['year'] == year]
        counts = df_year['weights'].round().astype(int).to_numpy()
        y = np.asarray(df_year[var].repeat(counts))
        # Shuffle the sequence to improve the plot
        rd.shuffle(y)
        # calculate and store Lorenz curve data
        f_val, l_val = qe.lorenz_curve(y)
        f_vals.append(f_val)
        l_vals.append(l_val)
    F_vals.append(f_vals)
    L_vals.append(l_vals)

f_vals_nw, f_vals_ti, f_vals_li = F_vals
l_vals_nw, l_vals_ti, l_vals_li = L_vals
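The weighting step in the code above may be easier to follow on a toy example. The sketch below uses a small made-up dataframe (the values and the name `toy` are hypothetical, not SCF data) and repeats each observation a number of times equal to its rounded weight, using `Series.repeat`.

```python
import pandas as pd

# Hypothetical toy data: three households with survey weights
toy = pd.DataFrame({'n_wealth': [10.0, 50.0, 200.0],
                    'weights':  [2.6, 1.2, 0.4]})

# Round the weights to whole numbers: 2.6 -> 3, 1.2 -> 1, 0.4 -> 0
counts = toy['weights'].round().astype(int).to_numpy()

# Repeat each wealth observation according to its count
expanded = toy['n_wealth'].repeat(counts).to_numpy()
print(expanded)
```

Note that the household with weight 0.4 drops out entirely after rounding, so this approach is only an approximation to a properly weighted calculation.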
Now we plot Lorenz curves for net wealth, total income and labor income in the US in 2016.
fig, ax = plt.subplots()
ax.plot(f_vals_nw[-1], l_vals_nw[-1], label='net wealth')
ax.plot(f_vals_ti[-1], l_vals_ti[-1], label='total income')
ax.plot(f_vals_li[-1], l_vals_li[-1], label='labor income')
ax.plot(f_vals_nw[-1], f_vals_nw[-1], label='equality')
ax.legend(fontsize=12)
plt.show()

Fig. 4.2 2016 US Lorenz curves
Here all the income and wealth measures are pre-tax.
Total income is the sum of all household income sources, including labor income but excluding capital gains.
One key finding from this figure is that wealth inequality is significantly more extreme than income inequality.
4.3. The Gini coefficient
The Lorenz curve is a useful visual representation of inequality in a distribution.
Another popular measure of income and wealth inequality is the Gini coefficient.
The Gini coefficient is just a number, rather than a curve.
In this section we discuss the Gini coefficient and its relationship to the Lorenz curve.
4.3.1. Definition
As before, suppose that the sample \(w_1, \ldots, w_n\) has been sorted from smallest to largest.
The Gini coefficient is defined for the sample above as
\[
    G := \frac{\sum_{i=1}^n \sum_{j=1}^n |w_j - w_i|}{2n \sum_{i=1}^n w_i}
\]
The Gini coefficient is closely related to the Lorenz curve.
In fact, it can be shown that its value is twice the area between the line of equality and the Lorenz curve (e.g., the shaded area in the figure below).
The idea is that \(G=0\) indicates complete equality, while \(G=1\) indicates complete inequality.
fig, ax = plt.subplots()
f_vals, l_vals = qe.lorenz_curve(sample)
ax.plot(f_vals, l_vals, label='lognormal sample', lw=2)
ax.plot(f_vals, f_vals, label='equality', lw=2)
ax.legend(fontsize=12)
ax.vlines([0.8], [0.0], [0.43], alpha=0.5, colors='k', ls='--')
ax.hlines([0.43], [0], [0.8], alpha=0.5, colors='k', ls='--')
ax.fill_between(f_vals, l_vals, f_vals, alpha=0.06)
ax.set_ylim((0, 1))
ax.set_xlim((0, 1))
ax.text(0.04, 0.5, r'$G = 2 \times$ shaded area', fontsize=12)
plt.show()

Fig. 4.3 Shaded Lorenz curve of simulated data
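The link between the Gini coefficient and the Lorenz curve can be checked numerically. The sketch below assumes the standard sample definition, the sum of all pairwise absolute differences \(|w_j - w_i|\) divided by \(2n \sum_i w_i\), and compares it with twice the area between the line of equality and a piecewise linear Lorenz curve through \((0, 0)\). Both function names are hypothetical helpers written for this illustration, not QuantEcon functions.

```python
import numpy as np

def gini_from_definition(w):
    """Gini coefficient via the double sum over absolute differences."""
    w = np.asarray(w, dtype=float)
    n = len(w)
    diffs = np.abs(w[:, None] - w[None, :]).sum()   # sum_i sum_j |w_j - w_i|
    return diffs / (2 * n * w.sum())

def gini_from_lorenz(w):
    """Gini coefficient as twice the area between equality and the Lorenz curve."""
    w = np.sort(np.asarray(w, dtype=float))
    n = len(w)
    x = np.arange(n + 1) / n                              # 0, 1/n, ..., 1
    y = np.concatenate(([0.0], np.cumsum(w) / w.sum()))   # Lorenz curve from (0, 0)
    # Area under the piecewise linear Lorenz curve (trapezoid rule)
    area_under_lorenz = np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))
    return 2 * (0.5 - area_under_lorenz)

rng = np.random.default_rng(0)             # seed chosen only for reproducibility
sample = np.exp(rng.standard_normal(500))  # lognormal draws
g1, g2 = gini_from_definition(sample), gini_from_lorenz(sample)
print(g1, g2)
```

The two numbers agree up to floating point error. The double sum builds an \(n \times n\) array, so for large samples the Lorenz-based version (or `qe.gini_coefficient`) is the more practical choice.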
4.3.2. Gini coefficient dynamics of simulated data
Let’s examine the Gini coefficient in some simulations.
The following code computes the Gini coefficients for five different populations.
Each of these populations is generated by drawing from a lognormal distribution with parameters \(\mu\) and \(\sigma\), the mean and standard deviation of the underlying normal distribution.
To create the five populations, we vary \(\sigma\) over a grid of length \(5\) between \(0.2\) and \(4\).
In each case we set \(\mu = - \sigma^2 / 2\).
This implies that the mean of the distribution does not change with \(\sigma\).
(You can check this by looking up the expression for the mean of a lognormal distribution.)
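One way to carry out this check uses the standard fact that \(\mathbb{E}[e^{\sigma Z}] = e^{\sigma^2/2}\) when \(Z\) is standard normal. Writing a draw as \(\exp(\mu + \sigma Z)\) and substituting \(\mu = -\sigma^2/2\) gives

\[
    \mathbb{E}\left[\exp(\mu + \sigma Z)\right]
    = \exp\left(\mu + \frac{\sigma^2}{2}\right)
    = \exp\left(-\frac{\sigma^2}{2} + \frac{\sigma^2}{2}\right)
    = 1
\]

So each of the five distributions has a theoretical mean of one, and only the dispersion changes with \(\sigma\).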
k = 5
σ_vals = np.linspace(0.2, 4, k)
n = 2_000
ginis = []
for σ in σ_vals:
    μ = -σ**2 / 2
    y = np.exp(μ + σ * np.random.randn(n))
    ginis.append(qe.gini_coefficient(y))

def plot_inequality_measures(x, y, legend, xlabel, ylabel):
    fig, ax = plt.subplots()
    ax.plot(x, y, marker='o', label=legend)
    ax.set_xlabel(xlabel, fontsize=12)
    ax.set_ylabel(ylabel, fontsize=12)
    ax.legend(fontsize=12)
    plt.show()

plot_inequality_measures(σ_vals,
                         ginis,
                         'simulated',
                         r'$\sigma$',
                         'gini coefficients')

Fig. 4.4 Gini coefficients of simulated data
The plot shows that inequality rises with \(\sigma\), as measured by the Gini coefficient.
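As a cross-check on the simulation, the Gini coefficient of a lognormal distribution has a known closed form, \(G = \operatorname{erf}(\sigma / 2)\), which is increasing in \(\sigma\) and does not depend on \(\mu\). The sketch below compares this expression with simulated values. It uses a hypothetical helper `gini` based on the sorted sample rather than `qe.gini_coefficient`, a larger sample size than the lecture, and moderate values of \(\sigma\), since convergence is slow when the tails are very heavy.

```python
import math
import numpy as np

def gini(w):
    """Sample Gini coefficient computed from the sorted sample (illustrative helper)."""
    w = np.sort(np.asarray(w, dtype=float))
    n = len(w)
    ranks = np.arange(1, n + 1)
    return 2 * np.sum(ranks * w) / (n * w.sum()) - (n + 1) / n

rng = np.random.default_rng(1234)   # seed chosen only for reproducibility
n = 100_000
results = []
for sigma in (0.2, 0.5, 1.0):
    mu = -sigma**2 / 2
    y = np.exp(mu + sigma * rng.standard_normal(n))
    simulated, exact = gini(y), math.erf(sigma / 2)
    results.append((sigma, simulated, exact))
    print(f"sigma = {sigma}: simulated {simulated:.3f}, closed form {exact:.3f}")
```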
4.3.3. Gini coefficient dynamics for US data
Now let’s look at Gini coefficients for US data derived from the SCF.
The following code creates a list called Ginis.
It stores the Gini coefficients computed from the dataframe df_income_wealth using the function gini_coefficient from the QuantEcon library.
varlist = ['n_wealth',   # net wealth
           't_income',   # total income
           'l_income']   # labor income
df = df_income_wealth

# create lists to store Gini for each inequality measure
Ginis = []

for var in varlist:
    # create lists to store Gini
    ginis = []
    for year in years:
        # repeat the observations according to their (rounded) weights
        df_year = df[df['year'] == year]
        counts = df_year['weights'].round().astype(int).to_numpy()
        y = np.asarray(df_year[var].repeat(counts))
        rd.shuffle(y)    # shuffle the sequence
        # calculate and store Gini
        gini = qe.gini_coefficient(y)
        ginis.append(gini)
    Ginis.append(ginis)

ginis_nw, ginis_ti, ginis_li = Ginis
Let’s plot the Gini coefficients for net wealth, labor income and total income.
# use an average to replace an outlier in labor income gini
ginis_li_new = ginis_li.copy()   # copy so that ginis_li itself is left unchanged
ginis_li_new[5] = (ginis_li[4] + ginis_li[6]) / 2
xlabel = "year"
ylabel = "gini coefficient"
fig, ax = plt.subplots()
ax.plot(years, ginis_nw, marker='o')
ax.set_xlabel(xlabel, fontsize=12)
ax.set_ylabel(ylabel, fontsize=12)
plt.show()

Fig. 4.5 Gini coefficients of US net wealth
xlabel = "year"
ylabel = "gini coefficient"
fig, ax = plt.subplots()
ax.plot(years, ginis_li_new, marker='o', label="labor income")
ax.plot(years, ginis_ti, marker='o', label="total income")
ax.set_xlabel(xlabel, fontsize=12)
ax.set_ylabel(ylabel, fontsize=12)
ax.legend(fontsize=12)
plt.show()

Fig. 4.6 Gini coefficients of US income
We see that, by this measure, inequality in wealth and income has risen substantially since 1980.
The wealth time series exhibits a strong U-shape.
4.5. Exercises
Exercise 4.1
Using simulation, compute the top 10 percent shares for the collection of lognormal distributions associated with the random variables \(w_\sigma = \exp(\mu + \sigma Z)\), where \(Z \sim N(0, 1)\) and \(\sigma\) varies over a finite grid between \(0.2\) and \(4\).
As \(\sigma\) increases, so does the variance of \(w_\sigma\).
To focus on volatility, adjust \(\mu\) at each step to maintain the equality \(\mu=-\sigma^2/2\).
For each \(\sigma\), generate 2,000 independent draws of \(w_\sigma\) and calculate the Lorenz curve and Gini coefficient.
Confirm that higher variance generates more dispersion in the sample, and hence greater inequality.
Solution to Exercise 4.1
Here is one solution:
def calculate_top_share(s, p=0.1):
    s = np.sort(s)
    n = len(s)
    index = int(n * (1 - p))
    return s[index:].sum() / s.sum()

k = 5
σ_vals = np.linspace(0.2, 4, k)
n = 2_000
topshares = []
ginis = []
f_vals = []
l_vals = []

for σ in σ_vals:
    μ = -σ ** 2 / 2
    y = np.exp(μ + σ * np.random.randn(n))
    f_val, l_val = qe.lorenz_curve(y)
    f_vals.append(f_val)
    l_vals.append(l_val)
    ginis.append(qe.gini_coefficient(y))
    topshares.append(calculate_top_share(y))

plot_inequality_measures(σ_vals,
                         topshares,
                         "simulated data",
                         r"$\sigma$",
                         r"top $10\%$ share")

Fig. 4.8 Top shares of simulated data
plot_inequality_measures(σ_vals,
                         ginis,
                         "simulated data",
                         r"$\sigma$",
                         "gini coefficient")

Fig. 4.9 Gini coefficients of simulated data
fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1], label="equality")
for i in range(len(f_vals)):
    ax.plot(f_vals[i], l_vals[i], label=rf"$\sigma$ = {σ_vals[i]:.2f}")
plt.legend()
plt.show()

Fig. 4.10 Lorenz curves for simulated data
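To see what `calculate_top_share` from the solution above returns, it can help to apply it to a small sample where the answer is obvious. The function body below is copied from the solution; the toy array is a hypothetical example in which one member holds 11 of the 20 units of wealth.

```python
import numpy as np

def calculate_top_share(s, p=0.1):
    s = np.sort(s)                      # sort from smallest to largest
    n = len(s)
    index = int(n * (1 - p))            # first index belonging to the top group
    return s[index:].sum() / s.sum()    # share of the total held by the top group

toy = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 11])
top_10 = calculate_top_share(toy)          # richest member only: 11 / 20
top_50 = calculate_top_share(toy, p=0.5)   # richest five members: 15 / 20
print(top_10, top_50)
```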
Exercise 4.2
According to the definition of the top shares (4.2) we can also calculate the top percentile shares using the Lorenz curve.
Compute the top shares of US net wealth using the corresponding Lorenz curves data, f_vals_nw and l_vals_nw, and linear interpolation.
Plot the top shares generated from the Lorenz curve and the top shares approximated from the data together.
Solution to Exercise 4.2
Here is one solution:
def lorenz2top(f_val, l_val, p=0.1):
    t = lambda x: interp(f_val, l_val, x)
    return 1 - t(1 - p)

top_shares_nw = []
for f_val, l_val in zip(f_vals_nw, l_vals_nw):
    top_shares_nw.append(lorenz2top(f_val, l_val))

xlabel = "year"
ylabel = r"top $10\%$ share"
fig, ax = plt.subplots()
ax.plot(years, df_topshares["topshare_n_wealth"], marker='o',
        label="net wealth-approx")
ax.plot(years, top_shares_nw, marker='o', label="net wealth-lorenz")
ax.set_xlabel(xlabel, fontsize=12)
ax.set_ylabel(ylabel, fontsize=12)
ax.legend(fontsize=12)
plt.show()

Fig. 4.11 US top shares: approximation vs Lorenz
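The same idea, top share \(= 1 - L(1 - p)\), can be sketched without the `interpolation` package by swapping in NumPy's `np.interp` for the linear interpolation step. The function name `lorenz2top_np` and the toy Lorenz data are hypothetical and used only for illustration.

```python
import numpy as np

def lorenz2top_np(f_val, l_val, p=0.1):
    """Top-p share read off a Lorenz curve as 1 - L(1 - p), using np.interp."""
    return 1 - np.interp(1 - p, f_val, l_val)

# Toy Lorenz data: ten members, one of whom holds 11 of the 20 units of wealth
w = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 11], dtype=float)
f_val = np.arange(len(w) + 1) / len(w)                    # 0, 0.1, ..., 1
l_val = np.concatenate(([0.0], np.cumsum(w) / w.sum()))   # cumulative wealth shares

top_share = lorenz2top_np(f_val, l_val)
print(top_share)
```

Here the bottom 90% hold 9 of the 20 units, so the top 10% share is 0.55, which matches a direct calculation from the sample.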