Much progress has been made in improving the availability, quality and comparability of income and wealth inequality data. Several cross-national databases containing summary inequality statistics are now available. In this note, we review the World Bank’s PovcalNet, the Luxembourg Income Study and Wealth Study Databases (LIS, LWS), the Standardized World Income Inequality Database (SWIID), the World Income Inequality Database (WIID), the World Wealth and Income Database (WID.world), the All the Ginis dataset, the Estimated Household Income Inequality dataset (EHII) and the Global Consumption and Income Project (GCIP).1 With the exception of LIS/LWS, all the databases examined make their data publicly available free of charge.
The databases reviewed differ considerably in purpose, coverage, data sources and indicators provided. Some are simply repositories of estimates compiled from primary and other secondary sources. Others provide original estimates based on microdata, drawn mainly from a growing number of household surveys. Some rely on imputation methods to obtain estimates for years when data are missing, while others do not. As a result, coverage by country and year differs significantly across datasets. Some databases are produced by institutions, while others are developed by individual researchers. Some institutions make data harmonization one of their priorities, while others offer diverse sets of data together with the metadata needed to identify differences across data sources and countries.
Although there is significant agreement among these datasets, the levels and trends of inequality obtained for a given indicator are not always consistent from one database to another. Some of the differences across databases are illustrated below. Overall, there are trade-offs between breadth (coverage) and comparability. Maximizing comparability and quality means focusing on a small number of (developed) countries. It also requires thoroughly harmonizing data, using data from one source or using only a single basis of calculation. Increasing coverage means drawing on less reliable data, using different variables to produce estimates (income is used in practically all developed countries; consumption is often the underlying measure in developing countries), and/or making assumptions to impute values where data are missing.
Among the databases examined, PovcalNet appears to have the most non-imputed estimates for the largest number of countries. It is also the data source used for the international monitoring of SDG target 10.1. 2 On the other hand, LIS is the only source that uses a uniform set of assumptions and definitions on the basis of thoroughly harmonized microdata to maximize comparability. SWIID provides the most complete dataset in terms of coverage, but many of its values are imputed.