Abstract. Methods of regional economic analysis are widely used in regional and urban economics as well as in economic geography. This paper introduces the REAT (Regional Economic Analysis Toolbox) package for the programming environment R, which provides a collection of mathematical regional analysis methods in a user-friendly way. The focus is on the identification of regional inequality, beta and sigma convergence, measurement of agglomerations, point-based measures of clustering and accessibility, as well as regional growth. The theoretical basics of the applications are briefly introduced, while the usage of the most important functions is presented and explained using real data.
Methods of regional economic analysis (or regional analysis) are used frequently in theory-based, empirical studies from regional and urban economics as well as (quantitative) economic geography. These methods aim at analyzing some of the most important issues in the mentioned research fields, including (but not limited to) the existence and evolution of agglomerations, regional economic growth and regional disparities (Capello, Nijkamp, 2009; Dinc, 2015; Farhauer, Kröll, 2014; Schätzl, 2000). In any of the mentioned fields, a growing amount of quantitative data has to be processed when using traditional or novel methods and models of regional analysis. This paper introduces the package (add-on) REAT (Regional Economic Analysis Toolbox) (Wieland, 2019) for the programming environment R (R Core Team, 2018a). The package provides a collection of mathematical regional analysis applications, designed in a relatively user-friendly way.
The main topics in the regional analysis context can be summarized as follows, showing also the structure of the present paper with respect to the presented approaches and their application in REAT:
Note that, in its original form, the open source software R is a command-line environment with many mathematical and statistical features. For the installation of R and its packages as well as the basics of navigation and implemented statistical functions, see the R documentation (R Core Team, 2018b). A good supplement for working with R is RStudio (RStudio Team, 2016). The REAT package deals with several R data types: Most functions require and return numeric vectors but, in some cases, also objects of type matrix, data frame and list, depending on the complexity of the calculation. For a quick introduction to the data types in R and their properties, see e.g. Kabacoff (2017).
Regional disparities are a frequent topic in economic geography and regional economics. Spatial inequality with respect to e.g. regional output, income or employment is an essential element of polarization theory (Myrdal, 1957) and the “New Economic Geography” (Krugman, 1991; Fujita et al., 2001). Regional disparities can be assessed using concentration and dispersion indicators, which belong to univariate descriptive statistics. Apart from regional economics, these measures are used in several contexts, such as competition economics (market concentration of firms) or welfare economics (income inequality). For a review of the most common indicators with respect to regional inequality, see Portnov, Felsenstein (2010). For studies comparing different indicators in the regional economic context using empirical data, see e.g. Gluschenko (2018); Habánik et al. (2013); Huang, Leung (2009); Palan (2017); Petrakos, Psycharis (2016).
Concentration is operationalized as the discrepancy between an empirical distribution of a variable x (e.g. annual turnover, income, gross domestic product [GDP]) with n observations or objects (e.g. competing firms, households, regions) and a (theoretical) equal distribution or a reference distribution (e.g. population distribution). Dispersion indicators measure the deviation from the arithmetic mean of x, x̄. In this context, Portnov, Felsenstein (2005, 2010) distinguish between measures of deprivation and variation.
Typical measures of regional disparities are the Gini coefficient, the Herfindahl-Hirschman index and the coefficient of variation (Lessmann, 2005). The most popular measure of concentration is the Gini coefficient (Gini, 1912) in combination with the Lorenz curve (Lorenz, 1905). There are several calculation approaches for the Gini coefficient, all producing the same result. The Lorenz curve is a graphical indicator, showing the deviation of the empirical shares of the regarded variable x from a (theoretical) equal distribution. Another well-known indicator is the Herfindahl-Hirschman index, which was developed independently by Hirschman (1945) and Herfindahl (1950), both in the context of competition economics. Several other concentration indicators are also applied in the fields of regional economics with respect to regional disparities, such as the Hoover coefficient (Hoover, 1936) and the Theil coefficient (Theil, 1967).
Except for the standard deviation, whose unit is equal to the unit of x, all common indicators are dimensionless. Most of them (except for standard deviation and coefficient of variation) have a fixed value range, normally between zero (indicating complete equality/dispersion) and one (indicating complete inequality/concentration).
Most of the common indicators are mathematically formulated in an unweighted and in a weighted form, while, in the context of regional disparities, the latter is mostly done using the regions’ proportion of the total (e.g. national) population (Doran, Jordan 2013; Lessmann 2014; Mussini 2017; Petrakos, Psycharis 2016; for a critical discussion of weighting these coefficients, see Gluschenko 2018). In the literature, there are different formulations where the weighted coefficients also include a weighted arithmetic mean. Note that, in the case of the population-weighted Gini coefficient, a weighted arithmetic mean is mandatory to keep the indicators’ value range.
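The role of the weighted mean can be illustrated with a small sketch. Python is used here only as neutral notation; the function name `weighted_gini` is hypothetical and is not part of REAT. The sketch assumes the common double-sum form of the Gini coefficient with population shares as weights and a population-weighted arithmetic mean in the denominator.

```python
def weighted_gini(x, pop):
    """Population-weighted Gini coefficient (illustrative sketch, not REAT code).

    x   : regional values, e.g. GDP per capita
    pop : regional population, used to build the weights
    """
    total = float(sum(pop))
    w = [p / total for p in pop]                      # population shares, sum to 1
    mean_w = sum(wi * xi for wi, xi in zip(w, x))     # weighted arithmetic mean
    abs_diff = sum(
        wi * wj * abs(xi - xj)
        for wi, xi in zip(w, x)
        for wj, xj in zip(w, x)
    )
    return abs_diff / (2.0 * mean_w)


# With equal weights the result equals the unweighted Gini coefficient.
print(weighted_gini([0, 0, 0, 4], [1, 1, 1, 1]))
# One small rich region among a large poor one: the value stays below one
# because the mean in the denominator is weighted as well.
print(weighted_gini([0, 10], [9, 1]))
```

Dividing by an unweighted mean instead would no longer guarantee a result between zero and one, which is the point made in the paragraph above.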
When dealing with GDP per capita as an indicator of regional economic output, several recent studies use dispersion measures rather than concentration measures, especially the (weighted) coefficient of variation (e.g. Lessmann 2005, 2014, 2016; Lessmann, Seidel 2017; Petrakos, Psycharis 2016). This dispersion indicator is a dimensionless normalization of the standard deviation. Weighting the coefficient of variation with population shares was introduced by Williamson (1965), which is why this coefficient is also called the Williamson index. As regional incomes or outputs are not normally distributed in most cases, which biases the arithmetic means used in the calculation of dispersion measures, the regarded variable may be log-transformed, which means replacing x_{i} with log(x_{i}) in the calculations.
Table 1 shows the common indicators, including their (population-)weighted and their normalized forms (where these exist) and the corresponding value ranges. The formulae are shown in a way that covers several ways of application. The regarded variable is always named x_{i}, while the (population) weighting is called w_{i}. Some indicators, such as the Hoover or the Coulter coefficient, require a variable representing a reference distribution to which the shares of x_{i} are compared. This reference is not a weighting. However, in many studies, the regional population is also used for the reference distribution. In these cases, reference and weighting are the same data. The reference distribution may also be equal to 1∕n.
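As a minimal sketch of the population-weighted coefficient of variation in the sense of Williamson (1965), assuming population shares as weights and a population-weighted mean: Python serves only as neutral notation, and the function name `williamson` and its `log` switch are illustrative, not the REAT interface.

```python
import math


def williamson(x, pop, log=False):
    """Population-weighted coefficient of variation (illustrative sketch).

    x   : regional values, e.g. GDP per capita
    pop : regional population
    log : if True, replace x_i with log(x_i) before the calculation
    """
    if log:
        x = [math.log(v) for v in x]
    total = float(sum(pop))
    shares = [p / total for p in pop]
    mean_w = sum(s * v for s, v in zip(shares, x))            # weighted mean
    var_w = sum(s * (v - mean_w) ** 2 for s, v in zip(shares, x))
    return math.sqrt(var_w) / mean_w


# Two equally populated regions with values 1 and 3: sd = 1, mean = 2
print(williamson([1, 3], [1, 1]))
```

The optional log transformation mirrors the replacement of x_{i} with log(x_{i}) described above; it requires strictly positive values.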
Several indicators are also used for the analysis of regional specialization or the spatial concentration of industries, such as the Hoover coefficient or the Herfindahl-Hirschman index or its inverse (1∕HHI; also known as the “equivalent number” in the competition context). Other coefficients of concentration and specialization are discussed in Section 4. The last coefficient in Table 1, the mean square successive difference (von Neumann et al., 1941) is a measure for time variability not originating from but also transferable to regional economics.
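Indicator | Unweighted | Weighted | Normalized |
Gini | G = (∑_{i=1}^{n}∑_{j=1}^{n}|x_{i} - x_{j}|) ∕ (2n^{2}x̄) | G^{w} = (1∕(2x̄_{w})) ∑_{i=1}^{n}∑_{j=1}^{n}w_{i}w_{j}|x_{i} - x_{j}| | G^{*} = (n∕(n - 1)) G |
 | 0 ≤ G ≤ 1 - 1∕n | 0 ≤ G^{w} ≤ 1 - 1∕n | 0 ≤ G^{*} ≤ 1 |
HHI | HHI = ∑_{i=1}^{n}(x_{i}∕∑_{i=1}^{n}x_{i})^{2} | | HHI^{*} = (HHI - 1∕n) ∕ (1 - 1∕n) |
 | 1∕n ≤ HHI ≤ 1 | | 0 ≤ HHI^{*} ≤ 1 |
Hoover | HC = (1∕2)[∑_{i=1}^{n}|x_{i}∕∑_{i=1}^{n}x_{i} - r_{i}∕∑_{i=1}^{n}r_{i}|] | HC^{w} = (1∕2)[∑_{i=1}^{n}w_{i}|x_{i}∕∑_{i=1}^{n}x_{i} - r_{i}∕∑_{i=1}^{n}r_{i}|] | |
 | 0 ≤ HC ≤ 1 | 0 ≤ HC^{w} ≤ 1 | |
Theil | TC = (1∕n) ∑_{i=1}^{n} ln(x̄∕x_{i}) | TC^{w} = ∑_{i=1}^{n}w_{i} ln(x̄∕x_{i}) | |
 | 0 ≤ TC ≤ 1 | 0 ≤ TC^{w} ≤ 1 | |
Coulter | CC | | |
 | 0 ≤ CC ≤ 1 | | |
Atkinson | AI = 1 - (1∕x̄)[(1∕n) ∑_{i=1}^{n}x_{i}^{1-ϵ}]^{1∕(1-ϵ)} | | |
 | 0 ≤ AI ≤ 1 | | |
Dalton | δ | | |
 | 0 ≤ δ ≤ ∞ | | |
SD | s = √((1∕n) ∑_{i=1}^{n}(x_{i} - x̄)^{2}) | s^{w} | see CV |
 | 0 ≤ s ≤ ∞ | 0 ≤ s^{w} ≤ ∞ | |
CV | v = s∕x̄ | see Williamson | v^{*} |
 | 0 ≤ v ≤ ∞ | | 0 ≤ v^{*} ≤ 1 |
Williamson | | WI = (1∕x̄) √(∑_{i=1}^{n}(x_{i} - x̄)^{2} w_{i}∕∑_{i=1}^{n}w_{i}) | |
 | | 0 ≤ WI ≤ ∞ | |
MSSD | MSSD = (∑_{i=1}^{n-1}(x_{i+1} - x_{i})^{2}) ∕ (n - 1) | | |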
Compiled from: Charles-Coll (2011); Cracau, Durán Lima (2016); Damgaard, Weiner (2000); Gluschenko (2018); Heinemann (2008); Kohn, Öztürk (2013); Portnov, Felsenstein (2005, 2010); Taylor, Cihon (2004); Schätzl (2000); Störmann (2009)
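To make the value ranges of the two most common concentration measures tangible, the following sketch computes the Gini coefficient and the Herfindahl-Hirschman index together with their normalized forms. It assumes the standard definitions (the double-sum Gini with maximum 1 - 1∕n, and the HHI with minimum 1∕n); Python is used only as neutral notation and the function names are illustrative, not the REAT functions gini() and herf().

```python
def gini(x, normalized=False):
    """Unweighted Gini coefficient via the mean absolute difference (sketch)."""
    n = len(x)
    mean = sum(x) / n
    g = sum(abs(a - b) for a in x for b in x) / (2.0 * n * n * mean)
    # Normalization rescales the maximum 1 - 1/n to exactly 1.
    return g * n / (n - 1) if normalized else g


def herf(x, normalized=False):
    """Herfindahl-Hirschman index: sum of squared shares (sketch)."""
    n = len(x)
    total = float(sum(x))
    hhi = sum((v / total) ** 2 for v in x)
    # Normalization rescales the range [1/n, 1] to [0, 1].
    return (hhi - 1.0 / n) / (1.0 - 1.0 / n) if normalized else hhi


equal = [1, 1, 1, 1]         # complete equality
extreme = [0, 0, 0, 4]       # one region holds everything
print(gini(equal), gini(extreme), gini(extreme, normalized=True))
print(herf(equal), herf(extreme), herf(equal, normalized=True))
```

With four regions, complete concentration gives G = 0.75 = 1 - 1∕4 and a normalized value of one, while complete equality gives HHI = 0.25 = 1∕4 and a normalized value of zero.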
Table 2 shows the functions for concentration and dispersion measures implemented in the REAT package. All functions require at least one argument, a numeric vector of length n, containing the regarded variable x (e.g. income) for n observations (e.g. regions), indexed by i = 1,...,n. This data may be a single vector or a column of a data frame or matrix.
An optional weighting of the vector x can be done using the function argument weighting which is also a numeric vector of length n. By default, the functions remove missing (NA) values. The hoover() function always needs a reference distribution (see the Hoover coefficient formula in Table 1), which is stated via the ref argument, also requiring a numeric vector of length n. If no reference variable is stated (ref = NULL), the reference is set to 1∕n.
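The fallback to an equal reference distribution can be sketched as follows. This is an illustration in Python of the unweighted Hoover coefficient as half the sum of absolute share differences; the function `hoover` below is a stand-in written for this example and does not reproduce the REAT implementation.

```python
def hoover(x, ref=None):
    """Unweighted Hoover coefficient against a reference distribution (sketch).

    x   : regarded variable, e.g. number of practices per district
    ref : reference variable, e.g. population; None means equal shares 1/n
    """
    n = len(x)
    if ref is None:
        ref = [1.0] * n                  # every observation gets the share 1/n
    sum_x, sum_ref = float(sum(x)), float(sum(ref))
    return 0.5 * sum(abs(a / sum_x - r / sum_ref) for a, r in zip(x, ref))


print(hoover([0, 4]))                    # compared to equal shares
print(hoover([1, 3], ref=[10, 30]))      # x distributed exactly like the reference
```

A value of zero therefore means that x follows the reference distribution, which is not necessarily an equal distribution.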
All functions (except for disp()) return the single value of the computed coefficient. In the relevant cases (gini(), gini2(), herf() and cv()), a normalization of the coefficient is possible using the function argument coefnorm = TRUE, returning the normalized coefficient instead of the raw coefficient. The function disp() is a wrapper for all mentioned functions, calculating all coefficients (except for the MSSD) at once for one vector x or a set of variables/columns from a data frame or matrix.
Note that there are two functions for the Gini coefficient, gini() and gini2(), both producing the same result in the unweighted case. The former function is designed for income inequality, where the weighting option is designed for the calculation of the Gini coefficient for groups (e.g. income classes), where the weighting represents the group mean. The function gini2() is designed for the population-weighted analysis of regional inequality.
Indicator | REAT function | Mandatory arguments | Optional arguments | Output |
Gini/Lorenz | gini() | vector x | weighting vector, remove NAs, Lorenz curve, normalization | value: G or G^{*} or G^{w}, optional: plot (LC) |
Gini/Lorenz | gini2() | vector x | weighting vector P_{i}, remove NAs, normalization | value: G or G^{*} or G^{w} |
Gini/Lorenz | lorenz() | vector x | weighting vector, remove NAs | plot LC, value: G or G^{w} and/or G^{*} |
HHI | herf() | vector x | remove NAs, normalization | value: HHI or HHI^{*} or N_{HHI} |
Hoover | hoover() | vector x, reference vector r_{i} | weighting vector P_{i}, remove NAs | value: HC or HC^{w} |
Theil | theil() | vector x | weighting vector P_{i}, remove NAs | value: TC or TC^{w} |
Coulter | coulter() | vector x | weighting vector P_{i}, remove NAs | value: CC |
Atkinson | atkinson() | vector x | remove NAs, epsilon | value: AI |
Dalton | dalton() | vector x | remove NAs | value: δ |
SD | sd2() | vector x | weighting vector, remove NAs, treating as sample | value: s or s^{W} |
CV | cv() | vector x | weighting vector, remove NAs, normalization, treating as sample | value: v or v^{W} or v^{*} |
Williamson | williamson() | vector x, weighting vector P_{i} | remove NAs | value: WI |
MSSD | mssd() | vector x | remove NAs | value: MSSD |
All indicators | disp() | vector x or vectors x_{1},x_{2},... from dataframe | weighting vector P_{i}, remove NAs | matrix with 13 (no weighting) or 19 indicators (incl. weighted) |
Regional inequality with respect to health care providers is a topic of high societal significance. In Germany, the health care planning system (Kassenärztliche Bedarfsplanung) attempts to flatten the disparities of local health care provision (Kassenärztliche Bundesvereinigung, 2013). Here, we analyze small-scale regional disparities in health care provision in two neighboring German counties (Göttingen and Northeim) using the data on medical practices and local population from Wieland, Dittrich (2016). The data is stored in the datasets GoettingenHealth1 and GoettingenHealth2, both included as example datasets in the REAT package. The study area is segmented into 420 districts, representing either city districts of larger cities or villages and hamlets.
The dataset GoettingenHealth2 contains these 420 regions with an individual ID (column district) and geographic coordinates (columns lat and lon, respectively) and the number of general practitioners, psychotherapists and pharmacies located there (columns phys_gen, psych and pharm, respectively) as well as the local population (column pop). First, we load the dataset:
Now, we investigate how the health care providers are dispersed over the whole area. In the first step, we calculate the Gini coefficient for the concentration of general practitioners using the REAT function gini():
The empirical Gini coefficient is equal to 0.839, indicating a relatively strong concentration. If we want to calculate the normalized (unbiased) indicator instead, we use the same function with the optional argument coefnorm = TRUE:
In the same way, we calculate e.g. the Herfindahl-Hirschman index, non-normalized and normalized:
Remember that the minimum of HHI is 1∕n (here: 1∕420 ≈ 0.00238) and the minimum of HHI^{*} is equal to zero.
If we want to inspect the concentration graphically, we could use the Lorenz curve, which can be plotted using either the functions gini() or lorenz(). Here, we use gini(), tell the function to plot the curve (lc = TRUE), and include several graphical parameters (such as lc.col for the color of the Lorenz curve or lcx and lcy for the x/y axes labels). As we want to compare the population distribution to the location distribution, we start by plotting the Lorenz curve for the local population:
Now, we overlay the Lorenz curves of general practitioners and psychotherapists, which means adding two more curves (function argument add.lc = TRUE):
Our commands result in the output of Figure 1, showing three Lorenz curves (population, general practitioners and psychotherapists) and the line of equality (diagonal). All three empirical distributions differ from an equal distribution. In about 72% of the regions, representing about 23% of the whole population (orange curve; G ≈ 0.584), no general practitioner is located (red curve; G ≈ 0.839). But the psychotherapists are more concentrated, as they are located only in about 13% of all districts (blue curve; G ≈ 0.933). As we can see, the physicians are more concentrated than the inhabitants but the psychotherapists are more concentrated than the physicians.
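How such a curve is constructed can be sketched in a few lines: the observations are sorted in ascending order and their cumulative share of the total is set against the cumulative share of observations. The sketch uses Python as neutral notation with made-up numbers; the function names are illustrative and the data is not taken from GoettingenHealth2.

```python
def lorenz_points(x):
    """Return the (cumulative share of units, cumulative share of x) pairs."""
    xs = sorted(x)                       # ascending order, zeros first
    total = float(sum(xs))
    n = len(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for k, value in enumerate(xs, start=1):
        running += value
        points.append((k / n, running / total))
    return points


def share_without(x):
    """Share of units (e.g. districts) where the variable is zero."""
    return sum(1 for v in x if v == 0) / len(x)


practices = [3, 0, 1, 0]                 # hypothetical counts for four districts
print(lorenz_points(practices))
print(share_without(practices))
```

The flat initial segment of the curve corresponds to the units without any location, which is how a statement such as "no general practitioner in about 72% of the regions" can be read from Figure 1.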
Now, we calculate all mentioned concentration and dispersion coefficients at once for all three types of providers using the function disp(), including a population weighting:
Our output is:
We conclude that every concentration/dispersion measure is highest for psychotherapists and lowest for general practitioners, while the values for pharmacies lie between them. The regional disparities with respect to pharmacies are higher than those with respect to general practitioners, while the most unequal distribution is that of psychotherapists. In other words: The pharmacies are more spatially concentrated than the general practitioners, and the psychotherapists are the most concentrated health locations here.
In most cases, population weighting reduces the coefficient values. This is because districts with a large (small) population have a high (low) impact on the resulting coefficient, and the districts without health service providers are also small districts. Furthermore, as the regarded variables contain zero values (which means no health service locations), the Theil coefficient (including the term ln(x̄∕x_{i})) and the Dalton coefficient (including the n-th root) cannot be computed, resulting in an output of NA.
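Why zero values are a problem for the Theil coefficient can be seen from a minimal sketch, assuming the unweighted form (1∕n) ∑ ln(x̄∕x_{i}): for x_{i} = 0 the quotient x̄∕x_{i} is undefined. Python is used only as neutral notation; the function is illustrative, and returning `None` merely stands in for the NA mentioned above.

```python
import math


def theil(x):
    """Unweighted Theil-type coefficient based on ln(mean / x_i) (sketch).

    Returns None when any value is zero or negative, because the
    logarithm of mean / x_i is then undefined.
    """
    if any(v <= 0 for v in x):
        return None
    n = len(x)
    mean = sum(x) / n
    return sum(math.log(mean / v) for v in x) / n


print(theil([2, 2, 2]))      # equal distribution
print(theil([1, 3]))         # some inequality
print(theil([0, 1, 5]))      # contains a zero: not computable
```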
The visible output of any function presented above can be saved in a new R object:
We can simply access our result:
The function disp() returns a matrix with 13 rows (when only unweighted coefficients are computed) or 19 rows (in the case of additional weighted coefficients) and one column for each regarded variable:
We call our results:
Regional convergence is derived from (regional) growth theory (for an extensive survey, see Barro, Sala-i Martin 2004) and means the decline of regional disparities over time. The neoclassical growth model states that a region’s economic output (e.g. GDP per capita) depends on its stock of factors of production, capital and labor (aggregate production function), under the conditions of constant returns to scale and diminishing marginal products of the factor inputs. As a consequence, “poor” regions with a low initial level of factor input grow faster than “rich” regions with a high initial level, which is called beta convergence. It is assumed that all regions converge to the same regional output level (steady-state). Sigma convergence means the decline of regional inequality with respect to regional output over time itself (Allington, McCombie, 2007; Capello, Nijkamp, 2009).
Both types of convergence can be tested empirically, as presented in Table 3. When testing for beta convergence, the natural logarithm of output growth over T time periods in i regions is regressed on the natural logarithm of the initial output values at time t. The original convergence formula was presented by Barro, Sala-i Martin (2004) using a nonlinear least squares (NLS) estimation approach. But in many cases, a linear transformation is used which allows for ordinary least squares (OLS) estimation (Allington, McCombie, 2007; Dapena et al., 2016; Schmidt, 1997; Young et al., 2008). The outcome variable of the convergence equation can be the regional growth between two years (e.g. Young et al. 2008) or the average growth rate per year (e.g. Goecke, Hüther 2016; Puente 2017; Weddige-Haaf, Kool 2017). Significance tests are carried out with t-tests for the regression coefficients and, in the OLS case, the F-test for the significance of R^{2}.
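The linear (OLS) variant for two time periods can be sketched as a simple bivariate regression of ln(Y_{t2}∕Y_{t1}) on ln(Y_{t1}). The code below is an illustration in Python with made-up numbers; the function names are hypothetical, no significance tests are computed, and it does not reproduce betaconv.ols().

```python
import math


def ols(xs, ys):
    """Intercept and slope of a bivariate least-squares regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def beta_convergence(y_start, y_end):
    """Regress log growth between two periods on the log initial level."""
    ln_start = [math.log(v) for v in y_start]
    growth = [math.log(e / s) for s, e in zip(y_start, y_end)]
    return ols(ln_start, growth)          # (alpha, beta)


# Hypothetical GDP per capita of three regions in two years
y_start_demo = [10.0, 20.0, 40.0]
y_end_demo = [12.0, 23.0, 44.0]           # poorer regions grow faster
alpha, beta = beta_convergence(y_start_demo, y_end_demo)
print(alpha, beta)                        # beta < 0 points to convergence
```

A negative slope alone is only descriptive; whether it indicates convergence in the statistical sense depends on the t-test mentioned above.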
The estimated parameter of interest is the slope of the model, here denoted β (that is why the modeled process is called beta convergence): If β < 0 and statistically significant, there is absolute beta convergence. If additional variables (conditional variables) are included into the convergence equation, we have a test for conditional beta convergence. A further interpretation of the β coefficient is possible using the speed of convergence, λ, and H, the so-called half-life, which means the time (measured in the regarded time periods) to reduce the regional disparities by one half (Allington, McCombie, 2007; Schmidt, 1997).
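The link between β, the speed of convergence and the half-life can be sketched as follows, assuming the relation β = -(1 - e^{-λT}) that is common in this literature, so that λ = -ln(1 + β)∕T and H = ln(2)∕λ. Python is used as neutral notation and the function names are illustrative, not those of betaconv.speed().

```python
import math


def convergence_speed(beta, periods):
    """Speed of convergence per period, assuming beta = -(1 - exp(-lambda * T))."""
    return -math.log(1.0 + beta) / periods


def half_life(lam):
    """Number of periods needed to halve the gap at speed lam."""
    return math.log(2.0) / lam


lam = convergence_speed(beta=-0.007, periods=14)   # illustrative input values
print(lam, half_life(lam))
```

Under this assumption, a speed of 0.185% per year corresponds to a half-life of ln(2)∕0.00185, i.e. roughly 375 years.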
Sigma convergence (which is named after the Greek letter for the standard deviation, σ) can be tested in two ways depending on the number of time periods: The regional inequality between all regions at time t is measured using the standard deviation, σ_{t}, or the coefficient of variation, cv_{t}, for the GDP per capita in its original or natural-logged form. If only two years are regarded, the quotient of both parameters is computed. If e.g. σ_{t1} > σ_{t2}, the regional inequality has declined from t1 to t2. A significance test can be applied with a simple ANOVA (analysis of variance), where the test statistic is the quotient of the underlying variances (σ^{2}) (Furceri, 2005; Schmidt, 1997; Young et al., 2008). Within a time series, the dispersion parameter is regressed (and plotted) against time. If the slope coefficient of time is negative, there is sigma convergence (Goecke, Hüther, 2016; Huang, Leung, 2009; Schmidt, 1997).
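Both variants of the sigma convergence test can be sketched with a few helper functions: the quotient of the variances for two years, and the slope of a trend regression of the dispersion measure on time. The sketch assumes the population form of the standard deviation and uses made-up numbers; the function names are illustrative and no p-values are computed.

```python
import math


def pop_sd(values):
    """Standard deviation with divisor n (population form)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def cv(values):
    """Coefficient of variation: standard deviation divided by the mean."""
    return pop_sd(values) / (sum(values) / len(values))


def variance_ratio(y_t1, y_t2):
    """Quotient of the variances; values above one mean less inequality in t2."""
    return pop_sd(y_t1) ** 2 / pop_sd(y_t2) ** 2


def trend_slope(times, values):
    """Slope b of the regression values = a + b * time."""
    n = len(times)
    mean_t, mean_v = sum(times) / n, sum(values) / n
    stt = sum((t - mean_t) ** 2 for t in times)
    stv = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return stv / stt


print(variance_ratio([1.0, 3.0], [1.5, 2.5]))          # inequality declined
yearly = {2000: [1.0, 3.0], 2001: [1.2, 2.8], 2002: [1.4, 2.6]}
years = sorted(yearly)
print(trend_slope(years, [cv(yearly[t]) for t in years]))  # negative slope
```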
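Type of convergence | Two time periods | More than two time periods |
Absolute beta convergence, NLS | ln(Y_{i,t2}∕Y_{i,t1}) = α - [1 - e^{-λT}] ln(Y_{i,t1}) + ϵ | (1∕T) ∑_{t=1}^{T} ln(Y_{i,t+1}∕Y_{i,t}) = α - [1 - e^{-λT}] ln(Y_{i,t1}) + ϵ |
Absolute beta convergence, OLS | ln(Y_{i,t2}∕Y_{i,t1}) = α + β ln(Y_{i,t1}) + ϵ | (1∕T) ∑_{t=1}^{T} ln(Y_{i,t+1}∕Y_{i,t}) = α + β ln(Y_{i,t1}) + ϵ |
Conditional beta convergence, NLS | ln(Y_{i,t2}∕Y_{i,t1}) = α - [1 - e^{-λT}] ln(Y_{i,t1}) + θX_{i} + ϵ | (1∕T) ∑_{t=1}^{T} ln(Y_{i,t+1}∕Y_{i,t}) = α - [1 - e^{-λT}] ln(Y_{i,t1}) + θX_{i} + ϵ |
Conditional beta convergence, OLS | ln(Y_{i,t2}∕Y_{i,t1}) = α + β ln(Y_{i,t1}) + θX_{i} + ϵ | (1∕T) ∑_{t=1}^{T} ln(Y_{i,t+1}∕Y_{i,t}) = α + β ln(Y_{i,t1}) + θX_{i} + ϵ |
Beta convergence if | β < 0 | β < 0 |
Convergence speed | λ = -ln(1 + β)∕T | λ = -ln(1 + β)∕T |
Half-life | H = ln(2)∕λ | H = ln(2)∕λ |
Sigma convergence, measure | σ_{t} = √((1∕n) ∑_{i=1}^{n}(Y_{i,t} - Ȳ_{t})^{2}) or cv_{t} = σ_{t}∕Ȳ_{t} | σ_{t} = √((1∕n) ∑_{i=1}^{n}(Y_{i,t} - Ȳ_{t})^{2}) or cv_{t} = σ_{t}∕Ȳ_{t} |
Sigma convergence if | σ_{t1}∕σ_{t2} > 1 or cv_{t1}∕cv_{t2} > 1 | σ = a + bt + ϵ or cv = a + bt + ϵ with b < 0 |
Test statistic | F = σ_{t1}^{2}∕σ_{t2}^{2} | t-test for b |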
Compiled from: Allington, McCombie (2007); Barro, Sala-i Martin (2004); Furceri (2005); Schmidt (1997)
Table 4 shows the functions for beta and sigma convergence as implemented in REAT. The analysis of beta convergence is provided by the functions betaconv.ols() and betaconv.nls() for OLS and NLS estimation, respectively. Speed of convergence and half-life can be computed with the function betaconv.speed(). The ratio test of sigma convergence for two time periods can be done using the function sigmaconv(), while a trend regression over time is implemented into the function sigmaconv.t(). Both convergence types can be analyzed at once with the function rca(), which is a wrapper for all functions mentioned above.
The functions require (at least) two numeric vectors, containing the regarded variable Y (e.g. GDP per capita) for at least two different time periods, e.g. from the same data frame. Also the start and end time periods (t_{1} and t_{T}) have to be stated. Optionally, a graphical output can be generated (scatterplot for beta convergence, line plot for sigma convergence with respect to longitudinal data). Furthermore, when analyzing sigma convergence, the user can choose whether Y should be log-transformed or not and/or which sigma measure is computed (variance, standard deviation or coefficient of variation; weighted or non-weighted).
Note that, unlike the functions for regional inequality indicators (Section 2), the REAT functions for regional convergence distinguish between a visible and an invisible output. The latter can be saved as a new R object. While the visible output shows the main results, the invisible output goes beyond that: betaconv.ols(), betaconv.nls() and rca() return a list, which is the most flexible data type in R, because it consists of a non-predetermined number of different data objects. Apart from the model results, e.g. the (transformed) regression data is returned in this invisible output.
Convergence | REAT function | Mandatory arguments | Optional arguments | Output |
Beta convergence | betaconv.ols() | vectors Y_{i,t1} and Y_{i,t2},...,Y_{i,T}, t_{1} and t_{T} | conditions, scatterplot | visible: model estimates, invisible: list with model estimates and regression data, optional: plot |
Beta convergence | betaconv.nls() | vectors Y_{i,t1} and Y_{i,t2},...,Y_{i,T}, t_{1} and t_{T} | conditions, scatterplot | visible: model estimates, invisible: list with model estimates and regression data, optional: plot |
Beta convergence | betaconv.speed() | values β and T | | matrix with λ and H |
Sigma convergence (when T = 2) | sigmaconv() | vectors Y_{i,t1} and Y_{i,t2}, t_{1} and t_{T} | sigma measure, log, weighting, normalization | visible: estimates, invisible: matrix with estimates |
Sigma convergence (when T > 2) | sigmaconv.t() | vectors Y_{i,t1} and Y_{i,t2},...,Y_{i,T}, t_{1} and t_{T} | sigma measure, log, weighting, normalization, line plot | visible: model estimates, invisible: matrix with model estimates, optional: plot |
All at once: beta and sigma convergence | rca() | vectors Y_{i,t1} and Y_{i,t2},...,Y_{i,T}, t_{1} and t_{T} | beta estimation, conditions, scatterplot, sigma measure, log, weighting, line plot | visible: model estimates, invisible: list with model estimates and regression data, optional: plot |
In this example, we look at regional convergence in Germany. The REAT package includes the example dataset G.counties.gdp with the GDP (gross domestic product), the population and the GDP per capita for the 402 counties (“Kreise”) in Germany from 1992 to 2014 (complete data only for 2000-2014). First, we load the dataset:
In our case, we prevent scientific notation of numbers in R and set a limit of 4 digits:
We need the columns named gdppcxxxx, containing the GDP per capita for each year, e.g. G.counties.gdp$gdppc2010 contains the GDP per capita for 2010. In the first step, we test absolute beta convergence comparing the years 2010 and 2014 with OLS estimation using the function betaconv.ols():
The output is:
We see that both regression coefficients, α and β, are statistically significant (t ≈ 5.50 and -3.99, respectively, both p < 0.001) and the linear regression model is significant as a whole (F ≈ 15.92, p < 0.001). The negative sign of β shows that, on average, the higher the initial GDP per capita, the lower its growth, which indicates absolute beta convergence. However, the convergence process is very slow: The speed of convergence, represented by λ, shows a harmonization by 0.185% per year. This implies that the output gap will be reduced by 50% in approximately 375 years.
Now we check sigma convergence for the same time using the function sigmaconv(). We choose the coefficient of variation as measure, while using the GDP per capita values in their original form:
The output is:
The coefficient of variation is a little smaller in 2014, which means the spatial inequality declined between 2010 and 2014. The quotient of the variances is slightly above one (F = σ_{2010}^{2}∕σ_{2014}^{2} ≈ 1.04), but not statistically significant (p ≈ 0.71).
When analyzing regional convergence with REAT, it is preferable (and more convenient) to use the wrapper function rca(). Instead of repeating the results above, we test for (absolute) beta and sigma convergence between 2000 and 2014. The analysis of sigma convergence uses trend regression (function argument sigma.type = "trend") for the coefficient of variation (sigma.measure = "cv"). We also want plots for both convergence types (beta.plot = TRUE and sigma.plot = TRUE, respectively) with specific axis labels (e.g. beta.plotX = "Ln (initial GDP p.c.)"). Our code is:
This results in the following output:
This function also produces the plots in Figures 2a and 2b, both showing a declining curve, which is a first indication of both beta and sigma convergence. The beta convergence model is statistically significant (F ≈ 52.06, p < 0.001), as are the coefficients α (t ≈ 9.63, p < 0.001) and β (t ≈ -7.21, p < 0.001). Again, we find evidence for absolute beta convergence because of a negative slope (β ≈ -0.007). The trend regression model for sigma convergence is significant (F ≈ 281.4, p < 0.001). The slope is significant and negative (b ≈ -0.00026, t ≈ 17.91, p < 0.001), which indicates sigma convergence. However, both types of convergence can be regarded as very slow processes: According to the half-life value from the beta convergence model, the regional disparities in GDP per capita will be halved in approximately 1,356 years. When looking at the trend regression, we see that the coefficient of variation declines only by 0.00026 per year. Another aspect is that we only regarded absolute beta convergence, ignoring other spatial effects or the impact of regional policy. The latter is also not considered in neoclassical regional growth theory.
Considering German reunification, we want to test whether there are average growth differences between West Germany and East Germany (former German Democratic Republic), which leads to conditional beta convergence. The dataset G.regions.emp contains the column regional, where the counties are attributed either to West or East Germany, expressed as a character string ("West" or "East"). We need to include our condition in the convergence equation. Thus, we use the REAT function to.dummy() to create dummy variables (1/0) out of (nominally scaled) variables, and add the indicator for West Germany (1, otherwise 0) to our data:
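The idea behind such dummy coding can be sketched independently of REAT: each category of the nominal variable becomes its own 1/0 indicator. The Python function `to_dummies` below is a hypothetical illustration and does not reproduce the interface or the output format of to.dummy().

```python
def to_dummies(labels):
    """Turn a list of category labels into one 1/0 indicator list per category."""
    levels = sorted(set(labels))
    return {
        level: [1 if label == level else 0 for label in labels]
        for level in levels
    }


regional = ["West", "East", "West", "West"]   # made-up attribution of four counties
dummies = to_dummies(regional)
print(dummies["West"])    # indicator for West Germany
print(dummies["East"])    # complementary indicator
```

With two categories, only one of the two indicators enters the regression, since the other one is its exact complement.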
Now, we test for conditional beta and sigma convergence, including the condition “West”, again using the rca() function, but without plots and using the standard deviation (default setting) instead of the cv for sigma convergence. This time, we save the results in an object:
The output is:
In the rca() output, we can compare the results of absolute and conditional beta convergence. In the conditional model, the explained variance increases from R^{2} ≈ 0.12 to R^{2} ≈ 0.18, which indicates an increased explanatory power of the model due to the added condition variable. Both models are statistically significant, and the β values are negative and significant (p < 0.001 in both cases). The condition “West” is significant (t ≈ -5.50, p < 0.001) and negative, which means that, on average, the GDP per capita in West German counties grew slower than in East Germany. These results seem to support the convergence hypothesis from growth theory, but one should not forget that e.g. political aspects (such as the German and/or EU regional policy) are not considered in this simple analysis.
As we have saved the invisible function output, we can access specific parts of our analysis, such as the regression data for the absolute convergence model:
If we want to look at the single sigma values, we can address them via:
Specialization of regions or countries and the spatial concentration of industries or firms are phenomena linked to several research fields in regional economics and economic geography: Specialization is a key point in traditional theories of international trade with respect to comparative advantages (Ricardo, 1821) as well as in the generation of the “New Trade Theory” (introduced by Krugman 1979). Spatial clustering of firms or industries due to agglomeration economies is a perennial issue in all spatial economic fields. It especially reemerged in the context of the “New Economic Geography” (e.g. Krugman 1991; Fujita et al. 2001) as well as through the work of Porter (1990) regarding clusters. The common indicators are broadly discussed in Farhauer, Kröll (2014) or Nakamura, Morrison Paul (2009). For studies comparing some different indicators, see e.g. Goschin et al. (2009); Moga, Constantin (2011); Palan (2017).
When looking at the family of indicators of regional specialization and industry concentration, we have to distinguish between indicators for aggregate data, such as regional employment data, and those requiring individual firm data. The first group, compiled in Table 5, can be differentiated into indicators of specialization and indicators of spatial concentration. As both types of agglomeration are closely linked to each other, so are the corresponding indicators. The empirical basis of all those measures is the employment e in industry i in region j, e_{ij}. This employment stock is compared to some reference, mostly including the total employment in region j, e_{j}, and/or the total employment in industry i, e_{i}, as well as the overall employment e. The individual firm level indicators in Table 6 can be segmented into indicators for agglomeration of one industry due to localization economies and indicators for the coagglomeration of different industries due to urbanization economies.
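As an illustration of how e_{ij}, e_{j}, e_{i} and e enter such measures, the following sketch computes location quotients in their common form, LQ_{ij} = (e_{ij}∕e_{j}) ∕ (e_{i}∕e), for a small made-up employment matrix. Python is used as neutral notation; the function name and the data are hypothetical.

```python
def location_quotients(e):
    """Location quotients for an employment matrix e[i][j] (sketch).

    Rows are industries i, columns are regions j.
    """
    e_i = [float(sum(row)) for row in e]           # total employment per industry
    e_j = [float(sum(col)) for col in zip(*e)]     # total employment per region
    total = sum(e_i)                               # overall employment
    return [
        [(e[i][j] / e_j[j]) / (e_i[i] / total) for j in range(len(e_j))]
        for i in range(len(e_i))
    ]


employment = [
    [10, 30],    # industry 0 in regions 0 and 1
    [30, 30],    # industry 1 in regions 0 and 1
]
for row in location_quotients(employment):
    print(row)
```

A value above one means that the industry is overrepresented in the region compared to the whole economy, a value below one that it is underrepresented.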
Indicator | Specialization of region j | Spatial concentration of industry i |
Hoover/Balassa | LQ_{ij} = (e_{ij}/e_{j}) / (e_{i}/e) ≡ MRCA_{ij} = (e_{ij}/e_{i}) / (e_{j}/e) | |
| LQ_{j} = ∑_{i=1}^{I} LQ_{ij} | LQ_{i} = ∑_{j=1}^{J} LQ_{ij} |
Extensions: | | |
O’Donoghue-Gleave | SLQ_{ij} = (LQ_{ij} - mean(LQ)) / sd(LQ) | |
Tian | SLLQ_{ij} = (log LQ_{ij} - mean(log LQ)) / sd(log LQ) | |
Hoen-Oosterhaven | ARCA_{ij} = e_{ij}/e_{j} - e_{i}/e | |
Hoover | H_{j} = (1/2) [∑_{i=1}^{I} |e_{ij}/e_{j} - e_{i}/e|], 0 ≤ H_{j} ≤ 1 | H_{i} = (1/2) [∑_{j=1}^{J} |e_{ij}/e_{i} - e_{j}/e|], 0 ≤ H_{i} ≤ 1 |
Gini | G_{j} = (2 / (I^{2} \bar{R})) ∑_{i=1}^{I} λ_{i}(R_{i} - \bar{R}), 0 ≤ G_{j} ≤ 1 | G_{i} = (2 / (J^{2} \bar{C})) ∑_{j=1}^{J} λ_{j}(C_{j} - \bar{C}), 0 ≤ G_{i} ≤ 1 |
| where: R_{i} = (e_{ij}/e_{j}) / (e_{i}/e), \bar{R} = (1/I) ∑_{i=1}^{I} R_{i} and λ_{i} = 1,...,I (λ_{i} < λ_{i+1}) | where: C_{j} = (e_{ij}/e_{i}) / (e_{j}/e), \bar{C} = (1/J) ∑_{j=1}^{J} C_{j} and λ_{j} = 1,...,J (λ_{j} < λ_{j+1}) |
Krugman (J = 2, I = 2) | K_{jl} = ∑_{i=1}^{I} |s_{ij}^{s} - s_{il}^{s}|, 0 ≤ K_{jl} ≤ 2 | K_{iu} = ∑_{j=1}^{J} |s_{ij}^{c} - s_{uj}^{c}|, 0 ≤ K_{iu} ≤ 2 |
| where: s_{ij}^{s} = e_{ij}/e_{j} and s_{il}^{s} = e_{il}/e_{l} | where: s_{ij}^{c} = e_{ij}/e_{i} and s_{uj}^{c} = e_{uj}/e_{u} |
Extensions: | | |
Midelfart et al., Vogiatzoglou (J > 2, I > 2) | K_{j} = ∑_{i=1}^{I} |s_{ij}^{s} - \bar{s}_{il}^{s}|, 0 ≤ K_{j} ≤ 2 | K_{i} = ∑_{j=1}^{J} |s_{ij}^{c} - \bar{s}_{uj}^{c}|, 0 ≤ K_{i} ≤ 2 |
| where: s_{ij}^{s} = e_{ij}/e_{j} and \bar{s}_{il}^{s} is the corresponding reference share built from all regions l ≠ j | where: s_{ij}^{c} = e_{ij}/e_{i} and \bar{s}_{uj}^{c} is the corresponding reference share built from all industries u ≠ i |
Duranton-Puga | RDI_{j} = 1 / ∑_{i=1}^{I} |s_{ij}^{s} - s_{i}| | |
| where: s_{ij}^{s} = e_{ij}/e_{j} and s_{i} = e_{i}/e | |
Litzenberger-Sternberg | CI_{ij} = IS_{ij} · ID_{ij} / PS_{ij} | |
| where IS_{ij} = (e_{ij}/p_{j}) / (e_{i}/p), ID_{ij} = (e_{ij}/a_{j}) / (e_{i}/a) and PS_{ij} = (e_{ij}/b_{ij}) / (e_{i}/b_{i}) (p: population, a: area, b: number of firms) | |
Compiled from: Farhauer, Kröll (2014); Hoen, Oosterhaven (2006); Hoffmann et al. (2017); Nakamura, Morrison Paul (2009); O’Donoghue, Gleave (2004); Tian (2013); Schätzl (2000); Störmann (2009)
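Indicator | Agglomeration | Coagglomeration |
Ellison-Glaeser | γ_{i} = [G_{i} - (1 - ∑_{j=1}^{J} s_{j}^{2}) HHI_{i}] / [(1 - ∑_{j=1}^{J} s_{j}^{2}) (1 - HHI_{i})] | γ^{c} = [G / (1 - ∑_{j=1}^{J} s_{j}^{2}) - HHI_{U} - ∑_{i=1}^{U} γ_{i} s_{i}^{2} (1 - HHI_{i})] / [1 - ∑_{i=1}^{U} s_{i}^{2}] |
| where: G_{i} = ∑_{j=1}^{J} (s_{ij}^{c} - s_{j})^{2}, s_{ij}^{c} = e_{ij}/e_{i}, s_{j} = e_{j}/e and HHI_{i} = ∑_{k=1}^{K} (e_{ik}/e_{i})^{2} | where: G = ∑_{j=1}^{J} (x_{j} - s_{j})^{2}, x_{j} = ∑_{i=1}^{U} e_{ij} / ∑_{i=1}^{U} e_{i}, s_{j} = e_{j}/e, s_{i} = e_{i} / ∑_{i=1}^{U} e_{i} and HHI_{U} = ∑_{i=1}^{U} s_{i}^{2} HHI_{i} |
| z-standardization: z_{i} = (G_{i} - E(G_{i})) / √var(G_{i}), with E(G_{i}) = (1 - ∑_{j=1}^{J} s_{j}^{2}) HHI_{i} | |
| where: var(G_{i}) = 2 {HHI_{i}^{2} [∑_{j=1}^{J} s_{j}^{2} - 2 ∑_{j=1}^{J} s_{j}^{3} + (∑_{j=1}^{J} s_{j}^{2})^{2}] - ∑_{k=1}^{K} z_{ik}^{4} [∑_{j=1}^{J} s_{j}^{2} - 4 ∑_{j=1}^{J} s_{j}^{3} + 3 (∑_{j=1}^{J} s_{j}^{2})^{2}]} and z_{ik} = e_{ik}/e_{i} | |
Howard et al. | CL_{ab} = ∑_{k=1}^{K_{a}} ∑_{l=1}^{K_{b}} C_{kl} / (K_{a} K_{b}) | |
| XCL_{ab} = CL_{ab} - CL_{ab}^{RND} | |
| where: C_{kl} = 1 if firms k and l are located in the same region and C_{kl} = 0 otherwise | |
Compiled from: Farhauer, Kröll (2014); Howard et al. (2016); Nakamura, Morrison Paul (2009)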
The most popular indicator is the Location Quotient (LQ), which is attributed to Hoover (1936) and mathematically equivalent to the Revealed Comparative Advantage (RCA) index, developed by Balassa (1965) in the context of international trade. The LQ is utilized in many studies (e.g. Bai et al. 2008; Kim 1995) as well as in the OECD Territorial Reviews (OECD, 2019). Following O’Donoghue, Gleave (2004) and Tian (2013), the original formulation can be extended: As the location quotient is not normalized, there is no cut-off value for defining a cluster, which leads to a standardization of the computed values via z-transformation. Hoen, Oosterhaven (2006) developed an additive alternative to the RCA index. The original LQ provides the main mathematical basis for several indicators developed later, such as the spatial Gini coefficients described below.
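The relationship between the LQ and its extensions can be illustrated with a few lines of base R. The following is a minimal sketch on a made-up employment matrix; it does not call any REAT function, all object names are hypothetical, and the z-standardization is taken over all LQ values at once, which is a simplifying assumption.

```r
# Toy employment matrix e_ij: rows = industries i, columns = regions j
# (made-up numbers, not taken from any REAT dataset)
e <- matrix(c(10, 40,
              30, 20,
              60, 40),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("ind1", "ind2", "ind3"), c("regA", "regB")))

e_i   <- rowSums(e)  # total employment per industry, e_i
e_j   <- colSums(e)  # total employment per region, e_j
e_tot <- sum(e)      # overall employment, e

share_region   <- sweep(e, 2, e_j, "/")  # e_ij / e_j
share_national <- e_i / e_tot            # e_i / e

# Multiplicative indicator: LQ_ij = (e_ij / e_j) / (e_i / e)
# (the vector is recycled down the rows, i.e. per industry)
lq <- share_region / share_national

# Additive indicator: e_ij / e_j - e_i / e
arca <- share_region - share_national

# z-standardized LQ values (assumption: standardized over all values)
slq <- (lq - mean(lq)) / sd(lq)

round(lq, 2)
```

Values of `lq` above one (or `arca` above zero) mark industries that are overrepresented in a region relative to the whole economy.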
Some indicators which are known from the context of regional inequality (see Section 2) are also used for the analysis of agglomeration: A modification of the Gini coefficient is used for the spatial concentration of industries as well as regional specialization (e.g. Ceapraz 2008; Wieland, Fuchs 2018). As we can see in the calculation of R_{i} and C_{j}, respectively, the spatial Gini coefficient is based on the LQ. Another popular option for analyzing agglomeration is the Hoover coefficient, comparing the structure of an industry/a region to a reference structure of all industries/regions (e.g. Dixon, Freebairn 2009; Jiang et al. 2007). Both indicator types range between zero (no specialization/concentration) and one (total specialization/concentration). The Herfindahl-Hirschman index and its derivatives are also used to measure concentration, specialization and diversification (e.g. Duranton, Puga 2000; Goschin et al. 2009; Lehocký, Rusnák 2016).
Another type of specialization/concentration indicator was introduced by Krugman (1991), originally designed for comparing the specialization of two regions. An extension of this indicator was established by Midelfart-Knarvik et al. (2000) for the comparison of regional specialization/industry concentration with respect to the sum or mean of all regions/industries (furthermore used e.g. by Haas, Südekum 2005; Vogiatzoglou 2006). Unlike the Gini- or Hoover-type measures, the Krugman coefficients range between zero (no specialization/concentration) and two (total specialization/concentration).
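The different ranges of the Hoover-type and Krugman-type measures follow directly from their definitions. The sketch below computes both in base R for made-up employment vectors; the object names are hypothetical and REAT itself is not used.

```r
# Toy employment by industry (made-up numbers)
e_region <- c(10, 30, 60)   # e_ij: region j
e_other  <- c(40, 20, 40)   # e_il: comparison region l
e_nation <- c(50, 50, 100)  # e_i: reference distribution

s_j   <- e_region / sum(e_region)  # industry shares in region j
s_l   <- e_other  / sum(e_other)   # industry shares in region l
s_ref <- e_nation / sum(e_nation)  # industry shares in the reference

# Hoover coefficient of specialization: half the sum of absolute share differences
hoover_j <- 0.5 * sum(abs(s_j - s_ref))

# Krugman coefficient for two regions: sum of absolute share differences
krugman_jl <- sum(abs(s_j - s_l))

# Extreme case: two regions with completely disjoint industry structures
krugman_max <- sum(abs(c(1, 0) - c(0, 1)))

c(hoover = hoover_j, krugman = krugman_jl, krugman_max = krugman_max)
```

Because the Krugman coefficient omits the factor 1/2, two completely disjoint structures yield the upper bound of two rather than one.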
The cluster index developed by Litzenberger, Sternberg (2006) goes beyond employment data and includes additional information about the industry-specific firm size, population density and region size. It is composed of three parts: the relative industrial stock with respect to industry i and region j, IS_{ij}, the relative industrial density, ID_{ij}, and the relative firm size, PS_{ij}. All three components are modified location quotients. This is done to control for small and monostructural regions, which are identified as clusters otherwise (which is a problem in the original LQ). The cluster index CI_{ij} has a potential range from zero to infinity. This extended indicator is used e.g. by Hoffmann et al. (2017) for the German food processing industry.
The cluster indicators by Ellison, Glaeser (1997) compare the empirical distribution of firms to an arbitrary location pattern where agglomeration economies are absent (often referred to as a dartboard approach). Ellison, Glaeser (1997) differentiate between the clustering of firms from one industry (agglomeration) due to localization economies and the clustering of multiple industries (coagglomeration) due to urbanization economies. Their indices also take into account the industry-specific structure of the firms by including the Herfindahl-Hirschman index, HHI_{i}, for the employment concentration in industry i. This is the reason why individual firm-level data is required for the computation. The Herfindahl-Hirschman indicator is included to control the raw measures of spatial concentration, G_{i} and G, for firm employment concentration, which occurs especially when there are just a few firms with many employees. The Ellison-Glaeser (EG) index for agglomeration, γ_{i}, is designed for identifying the clustering of industry i, while the coagglomeration index, γ^{c}, aims at the clustering of a set of U industries, where U ≤ I. Values of γ equal to zero imply the absence of agglomeration economies, while values above zero indicate positive effects due to spatial clustering. When γ is negative, firm locations are less spatially concentrated than expected under the dartboard approach, which indicates negative agglomeration economies. The EG index is used in several current regional economic studies (e.g. Dauth et al. 2015, 2018; Yamamura, Goto 2018).
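To make the role of the Herfindahl-Hirschman correction concrete, the following base-R sketch computes γ_{i} for one industry from made-up firm data. It follows the standard formulation of Ellison, Glaeser (1997) and is not the REAT implementation; all object names are hypothetical, and the sketch assumes that every region hosts at least one firm of the industry.

```r
# Toy data for one industry (made-up numbers)
firm_emp    <- c(50, 30, 20, 10, 10)                  # e_ik: employment of firm k
firm_region <- c("r1", "r1", "r2", "r3", "r1")        # region of firm k
region_emp  <- c(r1 = 500, r2 = 300, r3 = 200)        # e_j: total regional employment

# Regional shares of the industry (s_ij^c) and of total employment (s_j)
s_ic <- tapply(firm_emp, firm_region, sum) / sum(firm_emp)
s_j  <- region_emp / sum(region_emp)
s_ic <- s_ic[names(s_j)]                              # align the region order

# Raw geographic concentration G_i
G_i <- sum((s_ic - s_j)^2)

# Firm-level employment concentration HHI_i
hhi_i <- sum((firm_emp / sum(firm_emp))^2)

# EG agglomeration index: G_i corrected for what firm concentration alone would produce
dartboard <- 1 - sum(s_j^2)
gamma_i <- (G_i - dartboard * hhi_i) / (dartboard * (1 - hhi_i))

gamma_i
```

In this toy case a few large firms dominate the industry, so the HHI correction outweighs the raw concentration and γ_{i} turns out negative.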
In contrast, Howard et al. (2016) argue that agglomeration economies should not be analyzed with respect to employment but with respect to the firms themselves. Their colocation index, CL_{ab}, sums the colocation of K_{a} and K_{b} firms from two industries, a and b, controlling for all possible combinations. This colocation measure is compared to a counterfactual location structure constructed via bootstrapping, specifically the arithmetic mean of a number of (e.g. 50) random assignments of the regarded firms to the locations. The value of the resulting excess colocation index, XCL_{ab}, ranges between -1 and 1.
Table 7 shows the REAT functions for agglomeration measures based on aggregate (employment) data. All functions require at least information about the employment in one or more regions j in one or more industries i, e_{ij}. The Herfindahl-Hirschman index (function herf()) for measuring regional diversity is not displayed as it is used exactly in the same way as described in Section 2, replacing x_{i} with e_{ij}.
Indicator | REAT function | Mandatory arguments | Optional arguments | Output |
Hoover LQ/Balassa RCA incl. extensions | locq() | vectors or single values of e_{ij} and e_{i}, single values of e_{j} and e | LQ method, plot | Single value or matrix with LQ_{ij} |
| locq2() | vectors of e_{ij}, industry ID i and region ID j | normalization, output type, remove NAs | matrix or data frame with I * J values of LQ_{ij} |
Hoover specialization/concentration | hoover() (see Section 2) | vectors of e_{ij} and reference vector e_{i} or e_{j} | remove NAs | value: H_{j} or H_{i} |
Gini specialization | gini.spec() | vectors e_{ij} and e_{i} | plot LC | value: G_{j}, optional: LC plot |
Gini concentration | gini.conc() | vectors e_{ij} and e_{j} | plot LC | value: G_{i}, optional: LC plot |
Krugman specialization (regions j and l) | krugman.spec() | vectors e_{ij} and e_{il} | | value: K_{jl} |
Krugman specialization (all J regions) | krugman.spec2() | vector e_{ij} and matrix or data frame e_{il} | | value: K_{j} |
Krugman concentration (industries i and u) | krugman.conc() | vectors e_{ij} and e_{uj} | | value: K_{iu} |
Krugman concentration (all I industries) | krugman.conc2() | vector e_{ij} and matrix or data frame e_{uj} | | value: K_{i} |
All at once: | | | | |
specialization | spec() | vectors of e_{ij}, industry ID i and region ID j | remove NAs | matrix with H_{j}, G_{j} and K_{j} (columns) for J regions (rows) |
concentration | conc() | vectors of e_{ij}, industry ID i and region ID j | remove NAs | matrix with H_{i}, G_{i} and K_{i} (columns) for I industries (rows) |
Duranton-Puga | durpug() | vectors e_{ij} and e_{i} | | value: RDI_{j} |
Litzenberger-Sternberg | litzenberger() | single values of e_{ij}, e_{i}, a_{j}, a, p_{j}, p, b_{ij} and b_{i} | | value: CI_{ij} |
| litzenberger2() | vectors of e_{ij}, industry ID i, region ID j, a_{j}, p_{j} and b_{ij} | output type, remove NAs | matrix or data frame with I * J values of CI_{ij} |
Location quotients for one region and one or more industries are computed by the function locq(), including the option for an additive indicator instead of the multiplicative. When calculating the LQ for a set of J regions and I industries, one can use function locq2(), which is a kind of batch processing extension of locq(). As the dimension of the Litzenberger-Sternberg cluster index is the same as in the LQ (a single value for each combination of region j and industry i), the related functions litzenberger() and litzenberger2() work in the same way. When using locq2() or litzenberger2(), the user may choose the type of function output: either a matrix with I columns and J rows or a data frame with I * J rows.
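The two output shapes described above (a matrix with J rows and I columns versus a data frame with I * J rows) can be mimicked in base R. The following sketch starts from made-up employment data in long format; it only illustrates the data layout and is not the code of locq2(). The input column names `region`, `industry` and `emp` are hypothetical.

```r
# Toy employment data in long format: one row per region-industry pair
emp_long <- data.frame(
  region   = rep(c("regA", "regB"), each = 3),
  industry = rep(c("ind1", "ind2", "ind3"), times = 2),
  emp      = c(10, 30, 60, 40, 20, 40)
)

# Cross-tabulate to a matrix with J regions (rows) and I industries (columns)
e_mat <- tapply(emp_long$emp, list(emp_long$region, emp_long$industry), sum)

# LQ_ij = (e_ij / e_j) / (e_i / e)
share_region   <- e_mat / rowSums(e_mat)      # e_ij / e_j, recycled per row
share_national <- colSums(e_mat) / sum(e_mat) # e_i / e
lq_mat <- sweep(share_region, 2, share_national, "/")

# Same values as a long data frame with I * J rows
lq_df <- as.data.frame(as.table(lq_mat), responseName = "LQ")
names(lq_df)[1:2] <- c("j_region", "i_industry")

dim(lq_mat)
nrow(lq_df)
```

The matrix form is convenient for visual inspection, while the long form is easier to sort and filter.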
The Hoover-, Gini- and Krugman-type indicators require the same kind of input data. The hoover() function was already explained in Section 2, as it can be also used for measuring spatial concentration of industries or the specialization of regions with all-over employment vectors, e_{i} and e_{j}, respectively, as reference distributions. The spatial Gini coefficients are available through functions gini.spec() for regional specialization and gini.conc() for spatial concentration. The Krugman coefficients are divided into functions for the comparison of two regions/industries (krugman.spec() and krugman.conc(), respectively) and for applying all regions/industries as reference (krugman.spec2() and krugman.conc2(), respectively). The functions spec() and conc() are wrapper functions providing a convenient way to compute Hoover, Gini and Krugman coefficients of a given set of J regions and I industries at once, e.g. originating from official statistics on regional employment.
Indicator | REAT function | Mandatory arguments | Optional arguments | Output |
Ellison-Glaeser agglomeration | ellison.a() | vectors of e_{ik}, e_{j} and region ID j | | visible: value γ_{i}, invisible: matrix with γ_{i}, G_{i}, z_{i}, K_{i} and HHI_{i} |
| ellison.a2() | vectors e_{ik}, industry ID i and region ID j | | visible: values γ_{i}, invisible: matrix with γ_{i}, G_{i}, z_{i}, K_{i} and HHI_{i}, for I industries (rows) |
Ellison-Glaeser coagglomeration | ellison.c() | vectors e_{ik}, industry ID i and region ID j | vectors e_{j} and U industries | value: γ^{c} |
| ellison.c2() | vectors e_{ik}, industry ID i and region ID j | vector e_{j} | matrix with γ^{c} for I * I - I industry combinations (rows) |
Howard et al. colocation | howard.cl() | firm ID k, industry ID i and region ID j, industries a and b | | value: CL_{ab} |
Howard et al. excess colocation | howard.xcl() | firm ID k, industry ID i and region ID j, industries a and b, no. of samples | | value: XCL_{ab} |
| howard.xcl2() | firm ID k, industry ID i and region ID j | | matrix with XCL_{ab} for I * I - I industry combinations (rows) |
Table 8 shows the functions operating on the level of individual firm data. The Ellison-Glaeser (EG) indices are available through the functions ellison.a() (agglomeration index for industry i) and ellison.a2() (agglomeration indices for I industries) as well as ellison.c() (coagglomeration index for U industries) and ellison.c2() (coagglomeration indices for I * I - I industry combinations). All functions require the firm size (e.g. no. of employees) for the k-th firm from industry i (numeric vector) and the region j the firm is located in. The functions incorporating more than one industry (all except for ellison.a()) require a vector containing the industry i. The data could e.g. be stored in a data frame with at least three columns (firm size, region, industry). Like some of the convergence functions (see Section 3), the EG agglomeration index functions in REAT also distinguish between a visible and an invisible output: ellison.a() and ellison.a2() show the value(s) of γ_{i} but return an invisible matrix including the raw measure of concentration (G_{i}), the z-standardized results (z_{i}) and the related Herfindahl-Hirschman index for industry-specific firm concentration (HHI_{i}) as well as the number of firms in industry i (K_{i}).
The Howard-Newman-Tarp coagglomeration measure is distributed over the functions howard.cl() (calculation of the colocation index for one pair of industries a and b), howard.xcl() (calculation of the excess colocation index for industries a and b) and howard.xcl2() (calculation of the excess colocation index for I * I - I combinations of I industries). As this cluster index works with firms instead of employment, we only need a vector containing the IDs of the firms k, the corresponding industry i and the region j where the firm is located. When calculating this measure for one pair of industries, the user must state the IDs of industries a and b. Note that calculation time for this index increases heavily with the number of firms and/or industries.
We use the German classification of economic activities (WZ2008) on the level of 21 sections (A-U) for the classification of industries in the following examples (see Table 9).
WZ2008 | |
Code | Title |
A | Agriculture, forestry and fishing |
B | Mining and quarrying |
C | Manufacturing |
D | Electricity, gas, steam and air conditioning supply |
E | Water supply; sewerage, waste management and remediation activities |
F | Construction |
G | Wholesale and retail trade; repair of motor vehicles and motorcycles |
H | Transportation and storage |
I | Accommodation and food service activities |
J | Information and communication |
K | Financial and insurance activities |
L | Real estate activities |
M | Professional, scientific and technical activities |
N | Administrative and support service activities |
O | Public administration and defence; compulsory social security |
P | Education |
Q | Human health and social work activities |
R | Arts, entertainment and recreation |
S | Other service activities |
T | Activities of households as employers; undifferentiated goods- and services-producing activities of households for own use |
U | Activities of extraterritorial organisations and bodies |
Starting with a simple example, we analyze the regional specialization of Göttingen, a city with a population of about 134,000 in Niedersachsen, Germany. The example dataset Goettingen, which is included in REAT, contains the dependent employees in Göttingen and Germany for 2008 to 2017 in industries A to R (rows 2 to 16; row 1 contains the all-over employment). First, we load the data:
Using the REAT function locq(), we calculate a location quotient for Göttingen with respect to the manufacturing industry (“Verarbeitendes Gewerbe”), which is represented by letter C:
The output is simply the LQ value (LQ_{ij}, where i is manufacturing and j is Göttingen). We see that the LQ is very low, indicating that manufacturing is underrepresented in Göttingen as compared to Germany. Now, we calculate LQ values for all industries (A-R), including a simple plot (function argument plot.results = TRUE):
The output is a matrix with one row for each industry:
The result is plotted in Figure 3. The function plots a vertical line at LQ_{ij} = 1 automatically. This is the (only) reference value for the LQ. It indicates a stock of the related industry equal to the whole economy. The highest LQ values can be found for the industries with letters P (education) and Q (health). This is because Göttingen is mainly characterized by a large university (about 30,000 students) with a university hospital with about 7,000 employees.
Now, we want to measure the specialization of Göttingen with a single indicator. First, we simply use the Herfindahl-Hirschman coefficient for both Göttingen and Germany using the function herf():
The HHI for Göttingen is slightly larger than for Germany, which indicates a higher specialization (or lower economic diversity) of the region. To combine this information in one indicator, we calculate the Hoover coefficient of specialization using the function hoover(), where the reference distribution is the German industry structure:
We finish our analysis of Göttingen’s regional specialization by calculating both the Gini and the Krugman coefficient of regional specialization with the same data, using the REAT functions gini.spec() and krugman.spec(), respectively. Note that, here, we use the Krugman coefficient to compare the industry structure of Göttingen to the structure of whole Germany (instead of another region within the country, for which this coefficient was originally formulated):
There seems to be some specialization in Göttingen, but, unfortunately, we do not have any real reference value to interpret the results.
In this example, we will compute indicators of regional specialization and industry concentration for a set of J regions and I industries at once. We load the included test dataset G.regions.industries containing employment and firms on the level of I = 17 industries (WZ2008 codes B-S) and J = 16 regions (“Bundesländer”) in Germany:
The number of employees in the column emp_all includes dependent employees and self-employed persons. The classification code of industries (see Table 9) can be found in column ind_code, while the region code (abbreviation of the region’s official name) is in column region_code. First, we want to detect the spatial concentration of the 17 industries in Germany by calculating Hoover, Gini and Krugman coefficients for all industries at once, applying the REAT function conc() which is a wrapper function for the mentioned indicators. We save our output in the matrix object conc_i:
The output is:
The function returns a matrix with 17 rows (one for each industry) and three columns: H_{i} is the Hoover coefficient, G_{i} is the Gini coefficient and K_{i} is the Krugman coefficient for industry i. We cannot interpret or compare all of these results, but we may pick out some findings: The strongest spatial concentration is found with respect to mining and quarrying (WZ08-B), no matter which indicator is regarded, which may be explained by “natural advantages” due to the spatial distribution of mineral resources in Germany. Services (such as retailing) as well as education and health are least concentrated, as these industries are bound to regional demand and/or their locations are regulated by policy and planning authorities.
At first glance, the three indicators seem to produce similar results. Now, we want to test the similarity between Hoover, Gini and Krugman coefficients of concentration. As we saved our result matrix, we now calculate Pearson correlation coefficients (r) for each pair of indicators using the basic R function cor(), which is implemented in the stats package (included automatically in any R release). The function is applied to the three columns of conc_i, producing a 3 * 3 correlation matrix:
As we can see, each combination of the three indicators shows a strong positive correlation (H_{i} vs. G_{i}: r ≈ 0.97, H_{i} vs. K_{i}: r ≈ 0.95, G_{i} vs. K_{i}: r ≈ 0.97). At least in this context, we may conclude that these indicators are interchangeable. However, we have to recognize that the analysis presented here is on a large-scale regional level (German “Bundesländer”) and all of the mentioned indicators are affected by the modifiable areal unit problem, which means that the results depend on the aggregation unit in the analysis (see e.g. Dapena et al. 2016 for a discussion of this effect).
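The structure of such a correlation check can be reproduced with any numeric matrix. The following sketch applies cor() to a made-up three-column matrix; the values are purely illustrative and are not the coefficients computed from G.regions.industries, and the object name `conc_toy` is hypothetical.

```r
# Toy indicator matrix: one row per industry, one column per indicator
# (made-up values for illustration only)
conc_toy <- cbind(H = c(0.10, 0.25, 0.40, 0.15),
                  G = c(0.12, 0.30, 0.45, 0.20),
                  K = c(0.20, 0.48, 0.85, 0.28))

# Pearson correlation for each pair of columns (3 * 3 matrix)
r_pearson <- cor(conc_toy)

# Rank-based alternative, less sensitive to single extreme values
r_spearman <- cor(conc_toy, method = "spearman")

round(r_pearson, 2)
```

With only a handful of observations, as in the examples of this section, a rank correlation can serve as a simple robustness check for the Pearson coefficients.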
Now, we do exactly the same with respect to regional specialization of the 16 regions, using the same data. Analogously, we use the wrapper function spec() for calculating Hoover, Gini and Krugman coefficients of regional specialization, also saving the resulting matrix:
The output is:
The strongest specialization can be found in the city states Berlin (BE) and Hamburg (HH), while Niedersachsen (NI) and Nordrhein-Westfalen (NW) show the lowest values in all three indicators. As already mentioned in the concentration example, we have to remember the large-scale aggregation unit. If we used smaller scale units (e.g. counties like in Section 3.2.2), our results would surely be more differentiated. Again, we check the correlation between the indicators:
Again, we find a strong positive correlation between the Hoover coefficient and both Gini and Krugman coefficient (H_{j} vs. G_{j}: r ≈ 0.92, H_{j} vs. K_{j}: r ≈ 0.93), while the third Pearson correlation coefficient is a little lower, but still showing the same direction (G_{j} vs. K_{j}: r ≈ 0.79).
Now we check for clusters in a combination of a specific industry and a specific region. First, we calculate location quotients for the dataset G.regions.industries using the REAT function locq2(). Here, the optional function argument LQ.norm could be used for computing z-standardized location quotients according to O’Donoghue, Gleave (2004) (LQ.norm = "OG") or z-standardized values of the natural-logged LQs according to Tian (2013) (LQ.norm = "T"). However, we produce the original LQs, since we need exactly the same columns as in the examples above:
The output is a matrix with J rows and I columns:
These I * J = 17 * 16 = 272 coefficients are too much information. Thus, we calculate them again using the optional argument LQ.output = "df", which produces a data frame with I * J rows and three columns (j_region: ID of region j, i_industry: ID of industry i and LQ: location quotient LQ_{ij}). We save the results in the object lqs:
As we forego an inspection of these single values, the results are not displayed here. Instead, we only deal with the five highest LQs in our results (the “top five”). We sort the resulting data frame in decreasing order of the LQ and take a look at the first five rows:
The highest LQ is found for the arts, entertainment, and recreation sector (WZ08-R) in the German capital Berlin. Note that this result is congruent with several studies about the “creative class”, showing a large stock of “creative” employment in Berlin (e.g. Martin 2015). We also find a strong concentration of mining and quarrying in two Eastern regions, Brandenburg and Sachsen-Anhalt. Note that the LQ is a relative measure with respect to the total regional employment as well as the total industry-specific employment and the employment in the whole economy, not considering other aspects of industry or spatial structure.
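Sorting a long-format result and extracting the top entries takes one line of base R. The sketch below uses a made-up data frame with the same three column names as the locq2() data frame output described above (j_region, i_industry, LQ); the values and labels are hypothetical.

```r
# Toy data frame in the layout described for the data frame output
# (made-up regions, industries and LQ values)
lqs_toy <- data.frame(
  j_region   = c("r1", "r1", "r2", "r2", "r3", "r3", "r4"),
  i_industry = c("a",  "b",  "a",  "b",  "a",  "b",  "a"),
  LQ         = c(2.1,  1.6,  1.9,  1.1,  2.4,  2.0,  1.0)
)

# Order rows by LQ (largest first) and keep the first five
top_five <- head(lqs_toy[order(lqs_toy$LQ, decreasing = TRUE), ], 5)

top_five
```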
These deficiencies should be overcome with the Litzenberger-Sternberg cluster index, also taking into account area, population and firm size. This additional data is also included in our current dataset (columns area_sqkm, pop and firms). The functions litzenberger() and litzenberger2() work equivalently to locq() and locq2(). To compute cluster indices for all I * J combinations, we use the function litzenberger2():
Like in locq2(), the default output is a matrix with J rows and I columns:
Note that there is a value equal to NaN, which means “not a number”, due to a division by zero; this is because there is no mining and quarrying (WZ08-B) in Bremen (HB). However, we take a look at the “top five” again:
Again, we find the largest cluster value for the arts and entertainment sector in Berlin. Also the other four highest indicators are discovered in the largest city states Berlin and Hamburg, especially with respect to the information and communication industry (WZ08-J) and other knowledge-intensive services. Obviously, the results of the Litzenberger-Sternberg index differ in a noticeable way from those of the LQ, which can be attributed to the consideration of other spatial aspects, especially controlling for the size of the regions.
In our last example about agglomerations, we use the Ellison-Glaeser indices and the Howard-Newman-Tarp colocation index, which both require individual firm data. As this kind of micro-data is sensitive and, of course, not available in official statistics, we have to use fictional data from the textbook by Farhauer, Kröll (2014).
At first, we compute the Ellison-Glaeser agglomeration index for one industry i, γ_{i}. We use the REAT function ellison.a(), which is designed for this purpose and requires three vectors: the size (employment) of firm k, e_{ik}, the IDs of the regions j each firm is located in, and the total regional employment, e_{j}. The numerical example in Farhauer, Kröll (2014), Table 14.11, contains ten firms in three regions (Wien, Linz, and Graz). We simply compile the data from the original table into separate vectors:
Now, we apply ellison.a() to this data:
The EG agglomeration index of γ_{i} ≈ 0.06, which is, by the way, the same result as in the textbook, indicates a stronger clustering than expected from a dartboard approach. Since this data is fictional, we refrain from interpreting this result.
The REAT package contains the dataset FK2014_EGC, which is compiled from the numerical example in Farhauer, Kröll (2014), Tables 14.14 to 14.17. There are k = 42 firms from I = 4 industries (clothing trade, forestry, textiles dyeing and textiles trade) in J = 3 regions (1, 2 and 3). We load this example data:
We compute γ_{i} for all industries in the dataset. This can be done with the function ellison.a2(), which requires vectors containing the size of firm k, the corresponding industry i, and region j. We save the results in the object ega:
Here, we see the output of the function:
We see a strong clustering of the forestry industry, which is attributed to localization economies, but spatial avoidance in the three other industries. The visible output of ellison.a2() contains the γ_{i} values only, but the invisible matrix output also includes the other information referring to the EG agglomeration index:
When looking at the forestry industry, we also see a high standardized value (z_{i} ≈ 1.37) and a relatively low firm concentration (HHI_{i} ≈ 0.09).
In the next step, we compute the EG coagglomeration index, γ^{c}, for the same data using the function ellison.c(). This function requires the same information as ellison.a2() plus the total employment in the regarded regions (column emp_region):
Congruent with the calculation in Farhauer, Kröll (2014), the function returns γ^{c} ≈ 12.07. This value is very large, which indicates urbanization economies in this fictional example.
If we want to analyze the coagglomeration of industry pairs instead, we may use the function ellison.c2(), which requires the same data:
The output is a matrix with I * I - I rows (one for each industry pair, omitting the combination of the same industry i):
If we want to focus on firm numbers instead of employment size, we may compute the Howard-Newman-Tarp excess colocation index, which is included in REAT through the functions howard.cl() for one colocation index for one pair of industries, howard.xcl() for the corresponding excess colocation index and howard.xcl2() for all I * I - I industry combinations. Subsequent to the numerical example above, we calculate XCL_{ab} for all industry pairs in the dataset FK2014_EGC, where the firm ID of k is stored in the column firm:
The output has the same structure as the output from ellison.c2():
We see that the index by Howard et al. (2016) is structured differently than the indicators presented above: Although they are based on exactly the same data, the value for forestry and clothing trade (XCL ≈ 0.019) is not equal to the value for clothing trade and forestry (XCL ≈ 0.024). Why? The XCL_{ab} is the difference between the colocation index, CL_{ab}, and the mean of a set of bootstrap samples, CL_{ab}^{RND} (see Table 6). These random samples are drawn anew each time an XCL value is computed. Consequently, the XCL value changes from one computation to the next.
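The dependence of the result on the random draws can be demonstrated with a small base-R sketch. It is not the REAT implementation: the functions `colocation()` and `excess_colocation()` are hypothetical, the colocation index is assumed to be the share of colocated firm pairs among all K_{a} * K_{b} pairs, and the counterfactual is assumed to be a random permutation of the observed firm locations.

```r
# Toy firm data (made up): region and industry of eight firms
region   <- c("r1", "r1", "r2", "r2", "r3", "r1", "r3", "r2")
industry <- c("a",  "a",  "a",  "a",  "b",  "b",  "b",  "b")

# Assumed colocation index: share of firm pairs (k from a, l from b)
# that are located in the same region (C_kl = 1)
colocation <- function(region, industry, a, b) {
  reg_a <- region[industry == a]
  reg_b <- region[industry == b]
  mean(outer(reg_a, reg_b, "=="))
}

# Assumed excess colocation: observed index minus the mean index over
# n_samples random reassignments of the observed locations
excess_colocation <- function(region, industry, a, b, n_samples = 50) {
  cl     <- colocation(region, industry, a, b)
  cl_rnd <- replicate(n_samples, colocation(sample(region), industry, a, b))
  cl - mean(cl_rnd)
}

set.seed(1)
xcl_first  <- excess_colocation(region, industry, "a", "b")
xcl_second <- excess_colocation(region, industry, "a", "b")  # new random draws

set.seed(1)
xcl_repeat <- excess_colocation(region, industry, "a", "b")  # same draws as xcl_first

c(first = xcl_first, second = xcl_second, repeated = xcl_repeat)
```

Fixing the random seed with set.seed() before each computation is one way to make such bootstrap-based results reproducible; increasing the number of samples reduces the variation between runs.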
In this chapter, we mix two different concepts of indicators, accessibility and spatial proximity (see Table 10), both frequently used especially in the context of GIS (geographic information systems). Both concepts are discussed together because they have two aspects in common: 1) they are based on the geographical distance between point locations, in particular, the distance between an origin point i or several origin points (i = 1,...,n) and one or more destination points j (j = 1,...,m), and 2) for the calculation, they require geocoded (with geographical coordinates) individual point data.
Indicator | Non-normalized | Normalized |
Accessibility/Market potential | | |
Harris | M_{j} = ∑_{i=1}^{n} O_{i} d_{ij}^{-1}, 0 ≤ M_{j} ≤ ∞ | |
Hansen | A_{i} = ∑_{j=1, j≠i}^{m} O_{j} f(d_{ij}), 0 ≤ A_{i} ≤ ∞ | A_{i}^{*} = ∑_{j=1, j≠i}^{m} O_{j} f(d_{ij}) / ∑_{j=1, j≠i}^{m} O_{j}, 0 ≤ A_{i}^{*} ≤ 1 |
| where: f(d_{ij}) = d_{ij}^{-λ} or f(d_{ij}) = e^{-λ d_{ij}} or a logistic function of d_{ij} | |
Proximity | | |
Count within buffer | N_{i} = ∑_{j=1, j≠i}^{m} I(d_{ij} ≤ t) | |
Weighted count within buffer | N_{i}^{w} = ∑_{j=1, j≠i}^{m} I(d_{ij} ≤ t) O_{j} | |
Ripley | K_{t} = λ^{-1} ∑_{i=1}^{n} ∑_{j≠i} I(d_{ij} ≤ t) / n, E(K_{t}) = πt^{2} | L_{t} = √(K_{t}/π), E(L_{t}) = t; H_{t} = L_{t} - t, E(H_{t}) = 0 |
| where: λ = n / (area of the study region) | |
Compiled from: Kiskowski et al. (2009); Krider, Putler (2013); Peña Carrera (2002); Pooler (1987); Reggiani et al. (2011); Smith (2016)
One popular indicator of accessibility is the Hansen accessibility, developed by Hansen (1959) in the context of land use theory. The basic idea is that “accessibility” equals the sum of opportunities outgoing from a specific origin i. These opportunities are spread over a set of m locations (j = 1,...,m). The summation is weighted with the distance between i and the j-th location. This distance, no matter how measured (e.g. street distance, Euclidean distance, driving time) is assumed to be perceived in a nonlinear way, which is operationalized by a nonlinear distance decay function (a.k.a. distance impedance function or response function), e.g. power, exponential or logistic. A similar concept was introduced by Harris (1954) attempting to model the market potential of locations. If we replace the inverse distance weighting in the Harris indicator with another type of distance weighting, we see that both concepts are mathematically equivalent. The only difference is that the Harris indicator is conceptualized from the supplier’s perspective j (e.g. market potential of a retail store) and the Hansen accessibility takes the demand location i as a starting point (Pooler, 1987; Reggiani et al., 2011). As these indicators are dimensionless and range from zero to infinity, a normalization with a range from zero to one can be computed by weighting the results with the opportunities without distance correction.
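The effect of the distance decay function and of the normalization can be seen in a small base-R sketch. The function `hansen()` below is hypothetical and not part of REAT; it assumes that origins and destinations are the same set of locations (square distance matrix) and covers only the power and exponential weighting functions.

```r
# Toy distance matrix d_ij between three locations and opportunities O_j
# (made-up numbers)
d <- matrix(c( 0, 10, 20,
              10,  0,  5,
              20,  5,  0), nrow = 3, byrow = TRUE)
O <- c(100, 50, 200)

hansen <- function(d, O, lambda, type = c("power", "exponential"),
                   normalize = FALSE) {
  type <- match.arg(type)
  # Distance decay weights f(d_ij)
  w <- if (type == "power") d^(-lambda) else exp(-lambda * d)
  diag(w) <- 0                      # exclude i == j
  A <- as.vector(w %*% O)           # sum of distance-weighted opportunities
  if (normalize) {
    A <- A / (sum(O) - O)           # divide by unweighted opportunities, j != i
  }
  A
}

A_power <- hansen(d, O, lambda = 1, type = "power")
A_exp   <- hansen(d, O, lambda = 0.1, type = "exponential")
A_norm  <- hansen(d, O, lambda = 0.1, type = "exponential", normalize = TRUE)

A_power
round(A_norm, 3)
```

With the exponential function the weights never exceed one, so the normalized values stay between zero and one; with the power function this only holds if all distances are at least one unit.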
This accessibility/potential concept can be used in the regional economic context, e.g. to quantify the supra-regional job potential (e.g. Wieland, Fuchs 2018) or the clustering of point locations of a specific type, such as retail stores (e.g. Larsson, Öner 2014). The most common applications of these indicators are probably found in transport economics and transport geography (e.g. Albacete et al. 2017).
In the GIS context, spatial proximity can be measured using concentric zones within a radius of t (buffers) around point i, where the number of the j points within this radius is counted (Longley et al., 2005). A systematic analysis of spatial proximity or cluster patterns is possible using Ripley’s K function (Ripley, 1976). It compares empirical point counts with expected values from a random spatial point process based on a Poisson distribution. Ripley’s K computes empirical values for each distance band up to a maximum distance of t, which can be compared to the expected value. A more comprehensible (and linear) interpretation is provided when normalizing the K function in the form of the L or H function. Also, confidence intervals for the expected values can be calculated by bootstrapping (Kiskowski et al., 2009; Smith, 2016). All of these measures are based on a simple indicator function, I(d_{ij} ≤ t), which takes the value I = 1 if point j is within a distance of t from point i and I = 0 otherwise. Originating from the natural sciences, Ripley’s K in particular is frequently used when analyzing location patterns in spatial economic contexts, such as the clustering of retail stores (e.g. Krider, Putler 2013) or other types of firms assumed to be connected in a network (e.g. Espa et al. 2010).
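To make the relation between K, L and H concrete, the following base-R sketch computes a naive Ripley’s K for planar coordinates. It is only an illustration under stated assumptions: the function name ripley_sketch and the toy points are hypothetical, no edge correction is applied, the point density is taken as λ = n∕A, and the normalizations are L_{t} = √(K_{t}∕π) and H_{t} = L_{t} - t, which match the expected values E(L_{t}) = t and E(H_{t}) = 0.

```r
# Naive Ripley's K without edge correction (hypothetical helper, not REAT's ripley()).
# xy:    two-column matrix of planar coordinates (e.g. in meters)
# area:  total area A of the study region (in squared coordinate units)
# t_max: maximum distance, t_sep: number of distance intervals
ripley_sketch <- function(xy, area, t_max, t_sep) {
  n <- nrow(xy)
  d <- as.matrix(dist(xy))   # Euclidean distances d_ij
  diag(d) <- Inf             # exclude pairs with i == j
  lambda <- n / area         # point density
  t <- seq(t_max / t_sep, t_max, length.out = t_sep)
  # K_t = (1 / lambda) * sum_i sum_{j != i} I(d_ij <= t) / n
  K <- sapply(t, function(tt) sum(d <= tt) / (lambda * n))
  L <- sqrt(K / pi)
  data.frame(t = t, K = K, K_exp = pi * t^2, L = L, H = L - t)
}

# Toy pattern: four points on the corners of a unit square
xy <- cbind(x = c(0, 1, 0, 1), y = c(0, 0, 1, 1))
res <- ripley_sketch(xy, area = 1, t_max = 2, t_sep = 2)
print(res)
```

Values of H above zero would indicate more neighbors within distance t than expected under a random pattern, values below zero fewer.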
Table 11 shows the REAT functions for the accessibility and proximity methods described above. A simple Euclidean distance matrix for georeferenced points (data frame with latitude and longitude) can be calculated using the function dist.mat(). The function dist.buf() counts the points within a buffer, where a weighting, O_{j}, can also be summed up (e.g., if the destination points are cities of a given population, one could count the number of cities within 50 kilometers and sum their corresponding population). The latter function uses dist.mat(); thus, it is not necessary to create a distance matrix beforehand.
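The logic of a (weighted) buffer count can be sketched in base R as follows. This is an illustration on planar toy coordinates, not the code of dist.buf(): the helper buffer_count, its arguments and all values are assumptions.

```r
# Count and weighted count of end points within a buffer of radius t
# (hypothetical helper working on planar coordinates, e.g. in meters).
buffer_count <- function(start_xy, end_xy, t, O = NULL) {
  # Distances between every start point i (rows) and end point j (columns)
  dx <- outer(start_xy[, 1], end_xy[, 1], "-")
  dy <- outer(start_xy[, 2], end_xy[, 2], "-")
  inside <- sqrt(dx^2 + dy^2) <= t          # indicator I(d_ij <= t)
  out <- data.frame(count = rowSums(inside))
  # Optional weighting: sum of O_j over all end points inside the buffer
  if (!is.null(O)) out$sum_O <- as.vector((inside * 1) %*% O)
  out
}

# Toy data: one practice, three settlements with made-up population figures
practice    <- rbind(c(0, 0))
settlements <- rbind(c(300, 400), c(600, 800), c(3000, 0))
pop         <- c(1200, 800, 5000)
buf <- buffer_count(practice, settlements, t = 1000, O = pop)
print(buf)
```

The first two settlements lie 500 and 1,000 meters away and are therefore counted completely, while the third one is ignored regardless of its size.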
Indicator | REAT function | Mandatory arguments | Optional arguments | Output |
Distance matrix | dist.mat() | data frame(s) with start points i (ID, lat, lon) and end points j (ID, lat, lon), distance unit | i≠j | data frame with from, to, from-to and distance d_{ij} (distance matrix) |
Buffer | dist.buf() | data frame(s) with start points (ID, lat, lon) and end points (ID, lat, lon), max. distance t, distance unit | i≠j, sum O_{j} at endpoints | list with distance matrix (data frame) and count table (data frame) |
Hansen/Harris | hansen() | distance matrix (data frame with start points i and end points j as well as distance d_{ij} and O_{j}), weighting functions, parameters λ and γ | distance constant, max. distance t, i≠j | data frame with origins i and accessibility A_{i} |
Ripley | ripley() | data frame with points (ID, lat, lon), total area A, max. distance t, number of distance intervals | local K values, confidence intervals, no. of samples, significance level, plot (K, L or H) | visible: matrix with t, K_{t}, E(K_{t}), K_{t} - E(K_{t}), L_{t} and H_{t} for each distance interval, invisible: matrix (as described above) and optional: matrices with local K values and confidence intervals |
The same applies to the function ripley(): it does not require a previously created distance matrix and calculates Ripley’s K function for georeferenced data (data frame with lat/lon) and a given number of distance intervals up to a maximum distance of t. The differences between the empirical values, K_{t}, and the expected values, E(K_{t}), as well as the normalizations (L_{t} and H_{t}) are calculated and returned automatically. Optionally, local K values for each distance interval and corresponding confidence intervals are computed. These confidence intervals are based on bootstrapping with a given number of samples (default: 100) at a given significance level (the default value is α = 0.05, which leads to confidence intervals ranging from α∕2 = 2.5% to 1 - α∕2 = 97.5%). Note that the plot of the K function (or, when desired, the L or H function) provides a graphical and more intuitive interpretation of the analyzed point pattern, especially when including confidence intervals.
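One common way to obtain such intervals is to simulate random point patterns and take the α∕2 and 1 - α∕2 quantiles of the simulated K values for each distance. The base-R sketch below illustrates this idea under strong assumptions: the study area is a simple rectangle, no edge correction is applied, and the helper names k_hat and csr_envelope are hypothetical. Whether ripley() resamples in exactly this way is not stated here, so the sketch should not be read as a description of its internals.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# Naive K estimate for a vector of distances t (no edge correction)
k_hat <- function(xy, area, t) {
  n <- nrow(xy)
  d <- as.matrix(dist(xy))
  diag(d) <- Inf
  sapply(t, function(tt) sum(d <= tt) * area / n^2)
}

# Simulation envelope: n uniformly distributed points in a width x height rectangle.
# t must contain at least two distances so that the simulations form a matrix.
csr_envelope <- function(n, width, height, t, samples = 100, alpha = 0.05) {
  sims <- replicate(samples, {
    xy <- cbind(runif(n, 0, width), runif(n, 0, height))
    k_hat(xy, width * height, t)
  })
  # Rows of sims: distances t, columns: simulated patterns
  data.frame(t = t,
             lower = apply(sims, 1, quantile, probs = alpha / 2),
             upper = apply(sims, 1, quantile, probs = 1 - alpha / 2))
}

env <- csr_envelope(n = 50, width = 1000, height = 1000,
                    t = seq(100, 500, by = 100))
print(env)
```

An empirical K value above the upper bound at a given distance would then be read as clustering that is unlikely to arise from a random pattern.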
When calculating the Hansen accessibility (or the Harris market potential) with hansen(), a distance matrix including the opportunities, O_{j}, is required. This matrix can, of course, be created with dist.mat() (if straight-line distances are sufficient), but also with any other software creating distance matrices (and with any type of transport cost indicator). In hansen(), the user may choose between a power, exponential or logistic distance decay function. Optionally, the normalized Hansen accessibility is returned as well.
In the example in Section 2.2.2, we dealt with small-scale regional inequality in health care in South Lower Saxony, Germany. We have seen that e.g. psychotherapists are more spatially clustered than general practitioners (GPs). Returning to this topic, we want to use proximity and accessibility measures for determining the market potential (in the sense of the Harris model) of these health care locations. Obviously, there are different location patterns of general practitioners and psychotherapists. In the related study, there was evidence that psychotherapists are not just clustered but clustered within some districts of larger cities (Wieland, Dittrich, 2016). In the German health care planning system, the market potential of medical practices is the main determinant of the official authorization to be included in the allocation system of health insurance, while psychotherapists are assumed to need considerably larger market areas than GPs (Kassenärztliche Bundesvereinigung, 2013). Consequently, our research hypothesis is that the population potential of psychotherapists is larger than that of general practitioners.
We use the same test data as in the mentioned example, containing the health locations (GoettingenHealth1) and the corresponding settlements (GoettingenHealth2). We load both R datasets:
The dataset GoettingenHealth1 contains 617 locations, whose ID is stored in the column location. Columns lat and lon contain the latitude and longitude, respectively, while the corresponding location type can be found in column type (phys_gen: general practitioners, psych: psychotherapists, pharm: pharmacies). As the following applications may be time-consuming, we extract the general practitioners from GoettingenHealth1 and draw a random sample of ten doctors’ practices:
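On toy data with the same column layout, this extraction and sampling step could be sketched in base R as follows. The data frame health is a made-up stand-in for GoettingenHealth1, and the seed is arbitrary, so the sample does not correspond to the one underlying the results reported below.

```r
set.seed(123)  # arbitrary seed, only for reproducibility of this sketch

# Toy stand-in with the columns described above (all values are made up)
health <- data.frame(
  location = 1:30,
  lat  = runif(30, 51.3, 51.8),
  lon  = runif(30, 9.6, 10.4),
  type = rep(c("phys_gen", "psych", "pharm"), times = c(15, 8, 7))
)

# Keep the general practitioners only ...
physgen <- health[health$type == "phys_gen", ]
# ... and draw ten of them at random, without replacement
physgen_sample <- physgen[sample(nrow(physgen), 10), ]
print(physgen_sample)
```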
Now, we want to sum up the population potential of these health locations within a 1,000-meter buffer. We apply the function dist.buf() to the sample data physgen_sample and sum up the local population of the districts within this distance (column pop in GoettingenHealth2):
We calculate the arithmetic mean of all ten potentials:
On average, the ten GP practices have a population potential of about 8,028 inhabitants. One problem related to the buffer technique is the lack of distance weighting: All origin points up to a given distance are included completely, while all points beyond 1,000 meters are ignored. Thus, we repeat the estimation of the population potential using the Hansen accessibility. First, we need an origin-destination matrix (distance matrix) from the origin points to the sampled GP locations. We use the function dist.mat() and merge the returned distance matrix with the population values from GoettingenHealth2:
Then, we use the function hansen() to calculate the Hansen accessibility (used in the sense of the Harris market potential model) for each GP location in physgen_od.
The required columns in this dataset are the IDs of the GP locations (to), the IDs of the districts (from) and the population of the districts (pop) as well as the distances calculated above (distance). Finally, we have to set a distance weighting (which has an important influence in all types of spatial interaction models like this). For this purpose, we fall back on the results of a study by Fülöp et al. (2011): Based on patients’ empirical choice of doctor, they estimated distance decay functions in spatial interaction models (Huff model) for several types of physicians. For GPs, an exponential distance decay function with λ = -0.28 was found to fit the empirical data best. To set a distance decay function type and the related weighting(s), the function arguments dtype and lambda must be used. We save the results under the name physgen_hansen:
The output of the hansen() function is:
Again, we calculate the arithmetic mean of the distance-weighted market potentials:
The average population potential of the ten GPs is equal to 23,063 inhabitants.
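The steps of this calculation (merging distances with population, weighting, summing per GP location and averaging) can be retraced in base R on made-up data. The sketch below is not the hansen() call itself: the district and GP identifiers, distances and population values are hypothetical, the distance unit is assumed to be kilometers, and the negative sign is carried by the parameter itself, in line with λ = -0.28 as stated above.

```r
# Toy distance table in long format (column names as in the text, values made up)
od <- data.frame(
  from     = c("D1", "D2", "D3", "D1", "D2", "D3"),   # districts (origins)
  to       = c("GP1", "GP1", "GP1", "GP2", "GP2", "GP2"),  # GP locations
  distance = c(0.5, 2.0, 6.0, 4.0, 1.0, 3.0)          # assumed unit: km
)
districts <- data.frame(from = c("D1", "D2", "D3"), pop = c(1200, 800, 5000))

# Attach the district population to every origin-destination pair
physgen_od <- merge(od, districts, by = "from")

# Exponential distance decay with a negative parameter
lambda <- -0.28
physgen_od$weighted_pop <- physgen_od$pop * exp(lambda * physgen_od$distance)

# Market potential per GP location: sum of distance-weighted population
potential <- sapply(split(physgen_od$weighted_pop, physgen_od$to), sum)
print(potential)
print(mean(potential))   # average potential over all GP locations
```

Unlike in the buffer approach, every district contributes to the potential here, but distant districts contribute only a small fraction of their population.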
As we want to compare the market potential of GPs and psychotherapists, we repeat the same analysis for them, now in the “fast mode”, leaving out most comments, as the functions and commands are exactly the same as above, only applied to psychotherapists.
The calculation of the Hansen accessibility differs from the one for GPs with respect to the assumed distance sensitivity of the (potential) clients: For psychotherapists, Fülöp et al. (2011) found a distance impedance which is considerably smaller than for GPs (and any other type of doctor), resulting in a weighting parameter of λ = -0.11 in the exponential decay function:
We see that the average population potential of the sampled psychotherapists on the 1,000 meters buffer level is equal to 12,246 inhabitants, which is about one half more than for GPs (8,028). The Hansen/Harris market potential of psychotherapists of about 94,911 persons is roughly four times that of the GPs (23,063). We have to remember that the last result is not only a matter of location but also due to a lower assumed distance decay. However, the population potential of the sampled psychotherapists is obviously higher than the potential of the GPs, which can be attributed to a different location pattern, where psychotherapists are more clustered within larger city districts.
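How much of the difference can stem from the decay parameters alone becomes visible when comparing the two exponential weighting functions directly. The following base-R lines are only an illustration: the distances are arbitrary, and their unit is assumed to match the one used when the parameters were estimated.

```r
# Exponential distance weights for both types of practices
d       <- c(1, 5, 10, 20)        # arbitrary distances (unit assumed as in the estimates)
w_gp    <- exp(-0.28 * d)         # general practitioners (lambda = -0.28)
w_psych <- exp(-0.11 * d)         # psychotherapists (lambda = -0.11)

decay <- data.frame(d, w_gp, w_psych, ratio = w_psych / w_gp)
print(round(decay, 3))
```

The ratio of the two weights grows with distance, so identical locations would already yield a larger potential for psychotherapists, because remote population is discounted less strongly.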
We stick to the example of health care locations. As we have found different degrees of regional inequality with respect to suppliers (Section 2.2.2) and different market potentials (Section 5.2.2), we now analyze the clustering patterns of health service providers. In South Lower Saxony there is nearly the same number of psychotherapists (118) and pharmacies (120), but we should not expect their location patterns to be similar or even equal. Following the results above, we hypothesize that psychotherapists are more spatially clustered than pharmacies (as we already know about clustering with respect to districts in the former case and we can expect an avoidance tendency in the latter case due to a high degree of substitutability).
For this analysis, we compute Ripley’s K with the REAT function ripley(). Before going on, we have to prepare two things: First, we load the required dataset. Then, we must calculate the total area of the study area manually (here: in square meters).
Now, we compute Ripley’s K for the pharmacies only, which means processing only those locations in GoettingenHealth1 which are pharmacies (type == "pharm"). We set our maximum search radius equal to t = 30000 (function argument t.max), divided into 300 distance intervals (t.sep), resulting in distance steps of 100 meters. As we want to check for a significant deviation from a random spatial pattern, we instruct the function to construct confidence intervals (ci.boot = TRUE) using the default settings (α = 0.05, 100 bootstrapping samples). We also plot the results (default function argument: K.plot = TRUE) to inspect our results graphically. Here, we plot K_{t}, which is also the default setting (if the user wants to plot L_{t} or H_{t} instead, the function argument Kplot.func has to be changed to “L” or “H”, respectively):
The output is a matrix with six columns and one row for each distance interval, i.e. 300 rows in this case. Thus, we show only a truncated output here:
We repeat the computation of Ripley’s K for the psychotherapists:
The output is (also truncated):
The graphical output is shown in Figures 4a (pharmacies) and 4b (psychotherapists), respectively. The expected value of K_{t} is plotted as a blue line, while the empirical K_{t} values are red and the corresponding confidence intervals are colored in green (these colors are the default values in ripley() and can be changed by the function arguments lcol.exp and lcol.emp, respectively). As we have nearly the same number of points in both cases within the same study area, a direct comparison seems reasonable. Obviously, both types of locations show a significant spatial clustering: Even the pharmacies are more clustered than expected under complete spatial randomness up to a distance of about 15,000 meters. We have to remember that the population is already clustered as well (see Section 2.2.2) and the spatial distribution of pharmacies may follow this pattern. However, the clustering of psychotherapists exceeds this level by far, especially within smaller distances up to about 8,000 meters. In conclusion, the psychotherapists are more spatially clustered than the pharmacies.
Aspects of regional growth have already been discussed in the context of regional convergence in Section 3. The identification of clusters was the topic of Section 4. Combining some aspects of both, this section presents a collection of tools and models concerning regional growth with respect to industries. Like the indicators in Section 4, these techniques are of high significance especially in the context of local economic policy and municipal business promotion activities, aiming at e.g. strengthening a city’s or region’s competitiveness, defining its profile or increasing the number of jobs (Dinc, 2015; Nischwitz et al., 2017). Inspired by Farhauer, Kröll (2014) and congruent with the mathematical formulations in Section 4, all calculations are based on local/regional employment, e_{ij}, which is the number of employees of industry i in region j. Its growth from time t to time t + y can be operationalized as an absolute value (Δe_{ij} = e_{ijt+y} - e_{ijt}), as a relative growth (Δe_{ij}^{rg} = e_{ijt+y}∕e_{ijt}) or as a (percentage) growth rate (Δe_{ij}^{gr} = e_{ijt+y}∕e_{ijt} - 1).
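The three growth measures can be computed directly on employment vectors in base R. The industry labels and employment figures below are made up for illustration.

```r
# Regional employment e_ij by industry at time t and t + y (made-up values)
e_t  <- c(A = 120, B = 300, C = 80)
e_ty <- c(A = 150, B = 270, C = 80)

abs_growth  <- e_ty - e_t        # absolute growth
rel_growth  <- e_ty / e_t        # relative growth
growth_rate <- e_ty / e_t - 1    # growth rate (multiply by 100 for percent)

print(rbind(abs_growth, rel_growth, growth_rate))
```

Industry A gains 30 employees (a growth rate of 0.25), industry B loses 30 (a rate of -0.1), and industry C remains unchanged.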
The first technique described is the regional economic portfolio matrix, originating from the portfolio matrix in marketing, developed by the Boston Consulting Group (BCG) for the identification of growing and declining business fields of firms (Henderson, 1973). However, this technique can also be applied to several regional economic contexts (Baker et al., 2002; Howard, 2007). Here, we present a portfolio matrix which compares the growth in one region with the growth in a superordinate reference region (e.g. whole economy). When using the matrix in this way, it is a plot of the growth rate with respect to industry i in the region (Δe_{ij}^{gr}) on the x axis and the corresponding growth in the reference region (Δe_{i}^{gr}) on the y axis (see Figure 5a). The size of the points for each industry may be the total size of employment in the region (e_{ij}) to reflect the absolute relevance of the i-th industry. The plot is segmented into four quadrants, differentiated with respect to positive or negative growth rates. As implied by the colors of the quadrants, they can be interpreted as follows: Quadrant I (top right) contains the industries growing in both the region and the whole economy (or any other reference region). Quadrant II (top left) shows all industries growing in the whole economy but shrinking in the regarded region, which may indicate significant locational handicaps. Quadrant III (bottom left) includes all industries shrinking in the region as well as in the whole economy. Quadrant IV (bottom right) shows the special case of “star” industries, indicating that these industries grow in the regarded region while shrinking in the whole economy. Note that this segmentation (and the corresponding interpretation) differs from the original BCG matrix.
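The quadrant logic of this growth portfolio can be written down as a small classification rule. The base-R sketch below is an illustration only: the helper portfolio_quadrant and the growth rates are hypothetical, and the treatment of a growth rate of exactly zero as non-negative is an assumption.

```r
# Assign each industry to a quadrant of the growth portfolio matrix
# (x axis: regional growth rate, y axis: growth rate in the reference region).
portfolio_quadrant <- function(g_region, g_nation) {
  ifelse(g_region >= 0 & g_nation >= 0, "I",      # growing in both
  ifelse(g_region <  0 & g_nation >= 0, "II",     # shrinking regionally only
  ifelse(g_region <  0 & g_nation <  0, "III",    # shrinking in both
                                        "IV")))   # growing regionally only
}

g_region <- c(A = 0.25, B = -0.10, C = -0.05, D = 0.08)   # made-up growth rates
g_nation <- c(A = 0.12, B =  0.04, C = -0.03, D = -0.02)
quadrants <- portfolio_quadrant(g_region, g_nation)
print(quadrants)
```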
Another variant of the portfolio matrix, which was developed in the context of designing the REAT package, is shown in Figure 5b. Combining the aspects of regional specialization (see Section 4) and regional growth, we can plot the location quotient as an indicator of local specialization on the x axis, while plotting an industry-specific growth indicator on the y axis. For identifying “growing” industries, there are at least three options of operationalization: We can plot the industry-specific regional growth rate (Δe_{ij}^{gr}) on the y axis (which is on the x axis in the portfolio matrix in Figure 5a) or the industry-specific national rate (Δe_{i}^{gr}) or, if we want to show regional growth in relation to national growth, the quotient of industry-specific regional and national growth rates (Δe_{ij}^{gr}∕Δe_{i}^{gr}). In quadrant I, we see now all industries overrepresented in the region (in terms of the location quotient) as well as growing on the regional/national level. Quadrant II shows all industries underrepresented in the region but growing as well. In quadrants III and IV, we can identify all industries with negative growth rates, which are underrepresented or overrepresented, respectively.
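For the growth-specialization variant, the x coordinate is the location quotient. The following base-R sketch uses the common definition of the location quotient (regional employment share of an industry divided by its national share) and assumes that the x axis is split at a value of one. Both the split and all figures are assumptions for illustration.

```r
# Regional and national employment by industry (made-up values)
e_ij <- c(A = 150, B = 270, C = 80)
e_i  <- c(A = 9000, B = 30000, C = 11000)

# Location quotient: regional industry share relative to national industry share
lq <- (e_ij / sum(e_ij)) / (e_i / sum(e_i))

# Regional growth rates on the y axis (made-up values)
g <- c(A = 0.25, B = -0.10, C = 0)

# Quadrants: I over-represented and growing, II under-represented and growing,
# III under-represented and shrinking, IV over-represented and shrinking
quadrant <- ifelse(lq >= 1 & g >= 0, "I",
            ifelse(lq <  1 & g >= 0, "II",
            ifelse(lq <  1 & g <  0, "III", "IV")))
print(data.frame(lq = round(lq, 3), g, quadrant))
```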
Component | Dunn-type (absolute) | Gerfin-type (index) |
Absolute growth | Δe_{j} = e_{jt+y} - e_{jt} = n_{jt,t+y} + m_{jt,t+y} + c_{jt,t+y} | |
Net total shift | t_{t+y} = e_{jt+y} - e_{jt} - n_{jt,t+y} = m_{jt,t+y} + c_{jt,t+y} | t_{t+y} = m_{jt,t+y} c_{jt,t+y} = (e_{jt+y}∕e_{jt}) ∕ (e_{t+y}∕e_{t}) |
static (two time periods t and t + y) | | |
National share | n_{jt,t+y} = e_{jt}(e_{t+y}∕e_{t}) - e_{jt} | n_{jt,t+y} = 1 (omitted) |
Industrial mix | m_{jt,t+y} = ∑_{i=1}^{I} e_{ijt}(e_{it+y}∕e_{it}) - e_{jt}(e_{t+y}∕e_{t}) | m_{jt,t+y} = ∑_{i=1}^{I} e_{ijt}(e_{it+y}∕e_{it}) ∕ (e_{jt}(e_{t+y}∕e_{t})) |
Regional share | c_{jt,t+y} = ∑_{i=1}^{I} e_{ijt}(e_{ijt+y}∕e_{ijt} - e_{it+y}∕e_{it}) | c_{jt,t+y} = e_{jt+y} ∕ ∑_{i=1}^{I} e_{ijt}(e_{it+y}∕e_{it}) |
dynamic (T time periods, while T > 2) | | |
National share | n_{jt,T} = ∑_{t=1}^{T} (e_{jt}(e_{t+1}∕e_{t}) - e_{jt}) | |
Industrial mix | m_{jt,T} = ∑_{t=1}^{T} (∑_{i=1}^{I} e_{ijt}(e_{it+1}∕e_{it}) - e_{jt}(e_{t+1}∕e_{t})) | |
Regional share | c_{jt,T} = ∑_{t=1}^{T} ∑_{i=1}^{I} e_{ijt}(e_{ijt+1}∕e_{ijt} - e_{it+1}∕e_{it}) | |
industry-specific | | |
National share | n_{jt,t+y}^{i} = e_{ijt}(e_{t+y}∕e_{t}) - e_{ijt} | |
Regional share | c_{jt,t+y}^{i} = e_{ijt}(e_{ijt+y}∕e_{ijt} - e_{it+y}∕e_{it}) | |
prognosis for time period z | | |
Employment | Δe_{ijt+z} = e_{ijt+y}(…)c_{jt,t+y} | |
Compiled from: Farhauer, Kröll (2014); Haynes, Parajuli (2014); Schätzl (2000); Schönebeck (1996); Spiekermann, Wegener (2008); Barff, Knight (1988)
A well-established model of regional growth is the shift-share analysis, which, although developed independently from the portfolio matrix, is closely linked to the concept presented above. The original shift-share analysis was introduced by Dunn Jr. (1960) and given a theoretical foundation by Casler (1989). In parallel and independently, Gerfin (1964) developed a variant of shift-share analysis, which is more popular in German-speaking regional economics. Both concepts have been extended in several ways. Table 12 shows the basics of shift-share analysis with respect to the “Dunn” and “Gerfin” types. As there are several ways of formulating the shift-share formulae and naming the individual elements of the shift-share analysis, the description here is based on the mathematical formulations in Farhauer, Kröll (2014) and the terms used in Haynes, Parajuli (2014).
The basic idea of shift-share analysis is the decomposition of regional growth into components, recognizing that single economic regions are embedded into and influenced by a larger regional system, normally the whole economy, simply called “the nation” hereinafter: The growth (of employment or e.g. gross value added) of industry i in region j from time t to time t + y can be attributed to 1) a national trend, which means the economic climate in the whole system of regions, 2) the overall growth or decline of the regarded industries and 3) the industry-specific performance of the region, which is linked to locational advantages or disadvantages. The first component is called national share and reflects the growth in region j that would have occurred if region j had developed exactly like the nation. The second component is the industrial mix, representing the aggregated industry-specific growth in region j if the regarded industries had developed as in the whole economy, adjusted by the national effect. The third component is the regional share, which is the residual left by the first two components; this share of growth is attributed to locational advantages (or disadvantages), showing the regional growth adjusted by national and industry effects (Farhauer, Kröll, 2014; Haynes, Parajuli, 2014).
The Dunn-type models deal with absolute growth (Δe_{ij} or Δe_{j}), which is the sum of all shift-share components, and a net total shift, which is the sum of the industrial mix and the regional share (as these components are region-specific). Thus, this technique is also called the “difference method”. The Gerfin-type approaches express growth in terms of indices, while the net total shift for region j is the result of a multiplication of the industrial mix index and the regional share index, resulting in the alternative denomination “index method” (Schätzl, 2000).
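The difference method can be retraced with a short base-R function that follows the verbal definitions of the three components given above. This is a sketch on made-up data, not the code of shift(): the helper shift_dunn and its argument names are hypothetical, and all regional employment values are assumed to be larger than zero.

```r
# Dunn-type shift-share decomposition for one region (hypothetical helper).
# e_ij_t, e_ij_ty: regional employment by industry at time t and t + y
# e_i_t,  e_i_ty:  national employment by industry at time t and t + y
shift_dunn <- function(e_ij_t, e_ij_ty, e_i_t, e_i_ty) {
  g_nat <- sum(e_i_ty) / sum(e_i_t)   # national growth over all industries
  g_ind <- e_i_ty / e_i_t             # national growth by industry
  g_reg <- e_ij_ty / e_ij_t           # regional growth by industry
  n  <- sum(e_ij_t) * g_nat - sum(e_ij_t)            # national share
  m  <- sum(e_ij_t * g_ind) - sum(e_ij_t) * g_nat    # industrial mix
  cs <- sum(e_ij_t * (g_reg - g_ind))                # regional share
  c(growth = sum(e_ij_ty) - sum(e_ij_t), national_share = n,
    industrial_mix = m, regional_share = cs, net_total_shift = m + cs)
}

# Made-up example with three industries
shares <- shift_dunn(e_ij_t  = c(120, 300, 80),
                     e_ij_ty = c(150, 270, 80),
                     e_i_t   = c(8000, 32000, 10000),
                     e_i_ty  = c(9000, 31000, 12000))
print(shares)
```

In this toy case, regional employment is unchanged in total, although the national trend alone would have added 20 employees and the industrial mix a further 1.625. The difference shows up as a negative regional share, and the three components add up to the absolute growth.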
Several extensions have been developed for the Dunn-type shift-share analysis (Haynes, Parajuli, 2014). One common application calculates a shift-share analysis for each industry i in region j (instead of computing components for the whole region), while skipping the industrial mix effect. A main contribution was the dynamic shift-share analysis by Barff, Knight (1988), which extended the Dunn model by dealing with growth over a time series of T years. Other extensions of the Dunn-type technique provide a deeper differentiation of the three components, which are regarded as correlated (e.g. Arcelus 1984; Esteban-Marquillas 1972).
Also developed independently in the context of German urban planning, a commercial area prognosis deals with an absolute (assumed) employment growth (Δe_{ij}) over T years, which is used to forecast the required commercial area within a city or region j up to time T. Note that “commercial area” represents the type of urban area which is used by specific economic activities, especially industrial plants, and/or designated for this purpose in municipal land-use plans. This technique is a demand-side approach, as it derives the required commercial area from the (expected) demand for it (Bonny, Kahnert, 2005). See Table 13 for the calculation of two types of commercial area prognosis based on employment growth.
Prognosis | GIFPRO | TBS-GIFPRO |
Employment | e_{ijt}^{A} = e_{ijt0}(…) + e_{ijt0}(…) - e_{ijt0}(…) | e_{ijt}^{A} = e_{ijt}(…) + e_{ijt}(…) - e_{ijt}(…), where: e_{ijt} = f(t) = a + bt or f(t) = at^{b} or f(t) = ae^{bt} or a logistic function of t |
Areal index | pre-defined: ai_{ij} | empirical estimation: ai_{ij} = (…) |
Commercial area | A_{ijt} = e_{ijt}^{A} ai_{ij}; A_{jt} = ∑_{i=1}^{I} A_{ijt}; A_{jT} = ∑_{i=1}^{I} ∑_{t=1}^{T} A_{ijt} | |
Compiled from: Bonny, Kahnert (2005); CIMA Projekt + Entwicklung GmbH et al. (2011); Deutsches Institut für Urbanistik GmbH, Spath + Nagel (GbR) (2010); Planungsgruppe MWM (2009); Mulligan (2006); Vallée et al. (2012)
The basic model called GIFPRO (German abbreviation for “Gewerbe- und Industrieflächenbedarfsprognose”, roughly translated: prognosis of future demand of commercial area) was developed by Stark et al. (1981). The usual procedure is to estimate, starting from the current employment, the future industry-specific employment in region j. This number of employees is weighted by the industry-specific share of workers usually located in commercial areas (a_{i}) and multiplied by a resettlement rate (sq_{ij} percent of employees from industry i are resettled in one time period) and a relocation rate (rq_{ij} percent of employees from industry i are relocated in one time period) as well as a reutilization rate (ru_{ij} percent of employees from industry i will be located at reused commercial area). This “commercial area-relevant” employment is weighted with an areal index, ai_{ij} (commercial area per employee), to compute the commercial area for industry i in region j for one time period t. The expected commercial area is summed over all I industries (A_{jt}) and, finally, over all T years and I industries (A_{jT}) (Bonny, Kahnert, 2005; Planungsgruppe MWM, 2009).
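The last steps of this procedure, weighting with the areal index and summing over industries and years, can be sketched in base R. The sketch starts from an already given “commercial area-relevant” employment, so the rate calculations are deliberately left out. Industry names, employment figures and areal indices (assumed here in square meters per employee) are made up.

```r
# Commercial area-relevant employment (rows: industries i, columns: years t); made-up values
eA <- matrix(c(10, 12, 11,
                4,  5,  6), nrow = 2, byrow = TRUE,
             dimnames = list(c("manufacturing", "logistics"), c("y1", "y2", "y3")))

# Areal index per industry (assumed unit: square meters per employee)
ai <- c(manufacturing = 200, logistics = 250)

A_ijt <- eA * ai          # area per industry and year (ai is applied row by row)
A_jt  <- colSums(A_ijt)   # area per year, summed over all industries
A_jT  <- sum(A_ijt)       # total area over all industries and years

print(A_ijt)
print(A_jt)
print(A_jT)
```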
A significant extension was developed in the context of establishing a land-use plan for Dresden: the TBS-GIFPRO technique (German abbreviation for “Trendbasierte und standortspezifische Gewerbe- und Industrieflächenbedarfsprognose”, roughly translated: trend-based and location-specific prognosis of future demand of commercial area) (Deutsches Institut für Urbanistik GmbH, Spath + Nagel (GbR), 2010). It includes a stochastic approach for forecasting employment as well as other region-specific data. The employment prognosis is done using a trend regression model (employment against time) based on past empirical employment data for region j (mostly from official employment statistics), which is used for forecasting future employment. For each industry i, a single regression model is estimated, where the function type is not pre-defined but chosen e.g. based on the explained variance (R^{2}) and/or plausibility considerations.
The function may be linear (which seems unrealistic) or not: Deutsches Institut für Urbanistik GmbH, Spath + Nagel (GbR) (2010) use linear and exponential functions. However, from the growth perspective, also a logistic function may be applied (see Mulligan 2006 for a discussion of logistic growth with respect to population). If possible, the areal index and, maybe, other parameters are also estimated empirically for the specific region j (e.g. via firm-level surveys and/or official statistical data).
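The idea of choosing a trend function per industry can be illustrated with ordinary least squares in base R. The sketch below fits a linear, a power and an exponential trend to a made-up employment series, the latter two in linearized (logarithmic) form. This is an assumption of the sketch and not necessarily how the REAT functions estimate them, and the logistic variant is left out.

```r
# Made-up employment series for one industry over ten years
years <- 1:10
emp   <- c(100, 104, 109, 113, 119, 124, 130, 136, 142, 149)

fit_lin <- lm(emp ~ years)               # f(t) = a + b t
fit_pow <- lm(log(emp) ~ log(years))     # f(t) = a t^b, linearized
fit_exp <- lm(log(emp) ~ years)          # f(t) = a e^(b t), linearized

# Explained variance of each model (log-scale for the linearized fits)
r2 <- c(linear      = summary(fit_lin)$r.squared,
        power       = summary(fit_pow)$r.squared,
        exponential = summary(fit_exp)$r.squared)
print(round(r2, 4))

# Employment forecast for year 12, based on the exponential trend
pred_exp <- exp(predict(fit_exp, newdata = data.frame(years = 12)))
print(pred_exp)
```

Since the R^{2} values of the linearized models refer to the logarithmic scale, they are not strictly comparable to the linear fit, which is one reason why plausibility considerations matter in addition to the explained variance.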
Table 14 shows the functions for the analysis of regional growth as implemented in REAT. Table 15 presents the functions related to commercial area prognosis. All of these functions require at least current employment data for each industry in the regarded region j, e_{ij}, which may be a single numeric vector or the column of a data frame or matrix. Another similarity of all mentioned functions is the optional argument of the industry names (or codes). If no industry names are stated by the user (default function argument: industry.names = NULL), the industries are numbered consecutively. With respect to the function output, all regional growth functions distinguish between a visible and an invisible output (see e.g. Section 3), where the main results are returned automatically and the details are included in the invisible output (mostly a list with several entries of type matrix).
Model | REAT function | Mandatory arguments | Optional arguments | Output |
Growth portfolio matrix | portfolio() | vectors of e_{ijt} and e_{ijt+y} and vectors of e_{it} and e_{it+y} or matrix/data frame with e_{ijt} and e_{it} for T years, point size (e.g. e_{ijt+y}) | point size factor, industry names | visible: plot, invisible: growth rates (matrix) |
Growth and specialization portfolio matrix | locq.growth() | vectors of e_{ijt} and e_{ijt+y} and vectors of e_{it} and e_{it+y} or matrix/data frame with e_{ijt} and e_{it} for T years, point size (e.g. e_{ijt+y}) | point size factor, industry names | visible: plot, invisible: list with portfolio data (matrix), LQ_{ij} (matrix) and growth rates (matrix) |
Shift-share analysis | shift() | vectors of e_{ijt} and e_{ijt+y}, vectors of e_{it} and e_{it+y} | shift-share method (default: Dunn), industry names, plot components, plot portfolio | visible: matrix with components, invisible: list with components (matrix), growth (matrix) and shift method (char), optional: plot(s) |
dynamic | shiftd() | vectors of e_{ijt0} and e_{it0}, matrix/data frame with e_{ijt} and e_{it} for T years | shift-share method (default: Dunn), industry names, plot components, plot portfolio | visible: matrix with annual components, invisible: list with components (matrix), annual components (matrix), growth (matrix) and shift method (char), optional: plot(s) |
industry-specific | shifti() | vectors of e_{ijt} and e_{ijt+y}, vectors of e_{it} and e_{it+y} | shift-share method (default: Dunn), industry names, plot components, plot portfolio | visible: matrix with industry components, invisible: list with components (matrix), industry components (matrix), growth (matrix) and shift method (char), optional: plot(s) |
industry-specific and dynamic | shiftid() | vectors of e_{ijt0} and e_{it0}, matrix/data frame with e_{ijt} and e_{it} for T years | shift-share method (default: Dunn), industry names, plot components, plot portfolio | visible: matrix with industry components, invisible: list with components (matrix), industry components (matrix), growth (matrix) and shift method (char), optional: plot(s) |
prognosis | shiftp() | vectors of e_{ijt} and e_{ijt+y}, vectors of e_{it} and e_{it+y}, vector of e_{it+z}^{P} | industry names, plot | visible: matrix with industry components, invisible: list with industry employment prognosis (matrix), components (matrix), industry components (matrix), growth (matrix) and shift method (char), optional: plots |
Model | REAT function | Mandatory arguments | Optional arguments | Output |
GIFPRO | gifpro() | vectors of e_{ij}, a_{i}, sq_{ij}, rq_{ij} and ai_{ij}, time interval, time base | vector of ru_{ij}, industry names, type of output | visible: total commercial area and (optional) annual values, invisible: list with components (matrices), annual and all-over results (list with two matrices) |
TBS-GIFPRO | gifpro.tbs() | vectors of e_{ijt} for T years, a_{i}, sq_{ij}, rq_{ij} and ai_{ij}, time interval, time base, trend function types | vector of ru_{ij}, industry names, type of output, employment forecast only | visible: total commercial area and (optional) annual values, invisible: list with components (matrices), annual and all-over results (list with two matrices), industry-specific forecast model results (list with I matrices) |
The portfolio matrix (growth portfolio and growth-specialization portfolio, respectively) can be plotted using the functions portfolio() and locq.growth(), respectively. The different techniques of shift-share analysis are distributed over five functions (shift(), shiftd(), shifti(), shiftid() and shiftp()). The usage of portfolio and shift-share functions is similar: In any case, the user needs industry-specific employment data for the regarded region and the reference region (e.g. whole economy) for at least two time periods (e.g. years).
All functions for shift-share analysis (except for shift-share prognosis with shiftp()) provide three variants of calculation of the components: The classical Dunn method (default function argument shift.method="Dunn"), the Dunn extension by Esteban-Marquillas (1972) (shift.method="Esteban") producing four components instead of three, and the Gerfin method (shift.method="Gerfin"). When calculating a dynamic shift-share analysis, the user must choose the function shiftd(). Industry-specific components are returned by the function shifti(). With shiftid() one can combine both approaches. Here, it is important to recognize that the function structure allows a combination of e.g. industry-specific and dynamic components while calculating the components from the Esteban-Marquillas extension of shift-share analysis. Additionally, the shift-share functions may plot a portfolio matrix (function argument plot.portfolio = TRUE), allowing portfolio and shift-share analysis at once.
Both functions for commercial area prognosis (gifpro() and gifpro.tbs()) require vectors of employment data as well as the coefficients for resettlement etc. When forecasting commercial area using the trend-specific technique with gifpro.tbs(), the user needs time series data of previous industry-specific employment and has to specify a trend function type (linear, power, exponential or logistic) for each industry. The “best” function type may be assessed visually by inspecting the employment forecasting output (optional function argument prog.plot = TRUE) and the related R^{2} values, which are part of the invisible function output. Note that this function uses the REAT function curvefit(), which is a simple tool for bivariate regression, similar to the curve fitting functions in spreadsheet or statistics software.
Referring to the example in Section 4.2.2, we perform a regional growth analysis for the German city of Göttingen. We use the same dataset Goettingen as before, which contains industry-specific employment data for Göttingen and Germany from 2008 to 2017. We load our example data:
In the first step, we want to examine the industry-specific growth in Göttingen visually. Using the function portfolio(), we plot a regional growth matrix with respect to the 15 industries (rows 2 to 16). We also set a plot title (argument pmtitle) and axis labels (arguments pmx and pmy, respectively) as well as industry-specific colors (argument pcol):
Similarly, we plot a growth-specialization portfolio matrix using locq.growth() with the same options (colors etc.). On the y axis, we put the industry-specific regional growth, which is requested by the function argument y.axis = "r" (to see the national growth instead, we would have to set y.axis = "n"; for the quotient of regional and national growth, use y.axis = "rn"):
The resulting growth portfolio matrix is shown in Figure 6a, the growth-specialization portfolio in Figure 6b. The size of the points (or bubbles) represents the current industry-specific employment (e_{ij}) for 2017 (rows 2 to 16 of column Goettingen2017 in the example data), normalized with respect to a maximum point size of 15 (argument psize.factor = 15). As we can see, the health sector (industry code Q, green bubble) has the highest absolute relevance, which can be attributed to the local university hospital (see Section 4.2.2). The axes in the growth portfolio are segmented at x = 0 and y = 0, respectively, which separates positive from negative growth. Most industries have grown from 2008 to 2017 in both the region and the whole economy (see quadrant I) with similar growth rates. There is one outlier: Industry R (arts, entertainment, and recreation) shows a regional growth of more than 75 percent, while the national growth is about 10 percent. Note that we see percentage growth rates over the whole period from 2008 to 2017 here (if average growth rates are desired, use the function argument time.periods).
Looking at the growth-specialization portfolio, we can identify absolute relevance and growth rate as well as regional specialization of the industries (the colors and bubble sizes are the same as in Figure 6a). In quadrant I, we find the industries which are overrepresented in Göttingen (specialization) and growing at the regional level. As expected in this university city and in line with our results in Section 4.2.2, the “stars” in Göttingen are education (code P), health (code Q) and professional, scientific and technical services (code M).
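The logic behind the growth-specialization portfolio can be sketched in a few lines of base R. The following is only an illustration under stated assumptions: the employment figures are hypothetical (not the Goettingen data), the three industries stand in for the whole economy, and the quadrant rule (location quotient above one and positive regional growth) is our reading of quadrant I rather than code taken from locq.growth().

```r
# Hypothetical employment figures for three industries (NOT the Goettingen data)
industry <- c("C", "P", "Q")
e_ij_t0  <- c(1000, 400, 800)       # region, first year
e_ij_t1  <- c(900, 480, 1000)       # region, last year
e_i_t1   <- c(50000, 10000, 20000)  # reference economy, last year

# Regional growth over the whole period, in percent
growth_r <- (e_ij_t1 - e_ij_t0) / e_ij_t0 * 100

# Location quotient: regional industry share relative to the national industry share
lq <- (e_ij_t1 / sum(e_ij_t1)) / (e_i_t1 / sum(e_i_t1))

# Assumed rule for quadrant I: overrepresented AND growing in the region
quadrant_I <- lq > 1 & growth_r > 0

portfolio_tab <- data.frame(industry, growth_r, lq, quadrant_I)
print(portfolio_tab)
```

In this toy example, industries P and Q would fall into quadrant I, while industry C is both underrepresented and shrinking.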
While the portfolio matrix analysis tells us about the industry-specific growth, the shift-share analysis decomposes this growth into the national, industrial and regional components. In the first step, we perform a static shift-share analysis in the sense of Dunn Jr. (1960) for the same data as in the portfolio analysis by applying the function shift():
This is our (visible) output:
In this cross-sectional analysis, we see that the overall employment in Göttingen increased by 10,411 persons from 2008 to 2017. However, a large share of this growth is due to the growth in the national economy (n_{jt,t+y} ≈ 9,178 employees); that is, national growth was only slightly lower than growth in Göttingen. The industrial mix component (m_{jt,t+y}) shows that approximately 2,205 additional employees must be attributed to an overrepresentation of growing industries in Göttingen. The regional share is negative (c_{jt,t+y} ≈ -972), which indicates locational disadvantages. When interpreting the industrial mix also as a regional aspect (which seems plausible), we can look at the sum of the industrial mix and the regional share: The net total shift (t_{t+y}) is equal to 1,233 employees, representing the growth difference between the region and the whole economy.
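To make the decomposition tangible, the following base R sketch computes the three Dunn-type components in their common textbook form. It is not the implementation used in shift(); the helper name dunn_shift_share() and the toy data are hypothetical, and REAT's internal formulas may differ in detail. The last line only checks that the rounded values reported above add up.

```r
# Textbook Dunn decomposition (illustrative helper, not part of REAT)
dunn_shift_share <- function(e_ij1, e_ij2, e_i1, e_i2) {
  nat_factor <- sum(e_i2) / sum(e_i1)  # growth factor of the whole economy
  ind_factor <- e_i2 / e_i1            # national growth factor of each industry
  national <- sum(e_ij1) * (nat_factor - 1)
  mix      <- sum(e_ij1 * (ind_factor - nat_factor))
  regional <- sum(e_ij2 - e_ij1 * ind_factor)
  c(national = national, mix = mix, regional = regional,
    net_total_shift = mix + regional,
    total_change = sum(e_ij2) - sum(e_ij1))
}

# Hypothetical toy data for two industries (NOT the Goettingen data)
res <- dunn_shift_share(e_ij1 = c(100, 200), e_ij2 = c(140, 205),
                        e_i1 = c(1000, 4000), e_i2 = c(1300, 4200))
print(round(res, 2))

# Consistency check on the rounded values reported in the text
stopifnot(9178 + 2205 - 972 == 10411, 2205 - 972 == 1233)
```

By construction, the national share, the industrial mix and the regional share sum to the total employment change of the region.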
We confirm our results using the Gerfin technique. We request it by setting the argument shift.method of the shift() function equal to "Gerfin":
The output is:
In the index method, there is no national share component (implicitly, it is equal to one); thus, we only take a look at the industrial mix and the regional share as well as the net total shift. The industrial mix component is above one (m_{jt,t+y} ≈ 1.03), showing a more advantageous sector structure in Göttingen compared to Germany. Consistent with the negative regional share in the Dunn-type shift-share analysis, this component in the Gerfin analysis is slightly below one (c_{jt,t+y} ≈ 0.99), indicating locational disadvantages.
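The index logic can be illustrated with a short base R sketch. It uses one common index formulation associated with Gerfin (1964), in which values above one indicate an advantage and the net total shift is the product of the two other indices. Whether shift() uses exactly these expressions is an assumption, and the data are hypothetical.

```r
# Hypothetical toy data for two industries (NOT the Goettingen data)
e_ij1 <- c(100, 200)    # region, first year
e_ij2 <- c(140, 205)    # region, last year
e_i1  <- c(1000, 4000)  # reference economy, first year
e_i2  <- c(1300, 4200)  # reference economy, last year

nat_factor <- sum(e_i2) / sum(e_i1)

# Regional employment expected if every industry had grown at its national rate
expected <- sum(e_ij1 * e_i2 / e_i1)

mix_index      <- (expected / sum(e_ij1)) / nat_factor    # sector structure effect
regional_index <- sum(e_ij2) / expected                   # locational effect
net_index      <- (sum(e_ij2) / sum(e_ij1)) / nat_factor  # net total shift

print(round(c(mix = mix_index, regional = regional_index, net = net_index), 4))
```

In this formulation the components are linked multiplicatively instead of additively, which is why a value of one plays the role that zero plays in the Dunn method.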
These traditional techniques only regard the overall growth with respect to cross-sectional data. To gain a deeper insight and also take into account seasonal effects, we perform a dynamic shift-share analysis in the sense of Barff, Knight (1988) which distinguishes between the 15 industries simultaneously. This can be done via the REAT function shiftid(), requiring data for the initial time period and for at least two following periods. In the Goettingen dataset, the rows 2 to 16 represent the industries and the columns represent the years (2008 to 2017). Data for the regarded region and the whole economy is arranged successively. We also use the industry codes in column WZ2008_Code. In this function, we have to define the start and end periods explicitly:
The result is:
The visible output is a matrix containing one row for each component (the number of components depends on the selected shift-share method, here: Dunn) and I columns (one for each industry). As we calculate industry-specific components, there is no industrial mix effect, which means that the calculations are on the level of single industries. Again, we detect large absolute growth for industries P (education) and Q (health) (see Table 9). Interestingly, this growth can be mainly attributed to effects in the whole economy. The corresponding regional shares are small but positive, showing locational advantages with respect to these industries in Göttingen.
The logic of shift-share analysis can also be illustrated by two further examples: If industry C (manufacturing) had developed in line with the national trend, the absolute growth in Göttingen would be equal to 255 employees. In fact, there was a decline of 1,117 employees, resulting in a negative regional share of -1,372 employees, indicating locational disadvantages with respect to the manufacturing sector. The opposite is true for the industries with code BDE (including electricity, gas, water supply, etc.): The absolute growth of 29 employees would not have occurred if this sector had developed as in the whole economy (negative national share equal to -9 employees). The residual (regional share) is equal to 38 employees, indicating a trend contrary to the national one.
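The arithmetic behind these two examples can be reproduced directly. The snippet below uses only the values reported in the text; the vector names are ours.

```r
# Values reported in the text for Göttingen, 2008 to 2017
actual_change  <- c(C = -1117, BDE = 29)  # observed change in employment
national_share <- c(C = 255, BDE = -9)    # change expected under national development

# With industry-specific components there is no industrial mix effect,
# so the regional share is simply the residual
regional_share <- actual_change - national_share
print(regional_share)
```

The result is -1,372 employees for manufacturing and 38 employees for BDE, matching the regional shares discussed above.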
Using the same data, we now perform a commercial area prognosis for Göttingen. We load our data:
When using the GIFPRO-based commercial area prognosis techniques, several parameters have to be defined (employment shares in commercial areas a_{i}, resettlement rate sq_{ij}, relocation rate rq_{ij} and areal index ai_{ij}; a reutilization rate ru_{ij} is optional, thus, we ignore the reutilization of commercial area in this example). These parameters have to be defined for each industry. In our example, we use the employment shares as well as the resettlement and relocation rates from Deutsches Institut für Urbanistik GmbH, Spath + Nagel (GbR) (2010). Note that some sectors are, per definition, not located within commercial areas (e.g. agriculture), resulting in an employment share of a_{i} = 0. As we want to reuse the sets of parameters, we save them as single numeric vectors:
Now, we compute the traditional commercial area prognosis using the gifpro() function and the Goettingen data as well as the parameters defined above. We forecast the commercial area for five years (tinterval = 5). Our base is 2017 (time.base = 2017), as this is the last year empirical data is available for. We save the (invisible) output in the list object gifpro_goettingen:
As we have set output = "full", the visible function output contains overall as well as annual values:
In all 15 industries, 1,114 new employees are predicted for the year 2022, resulting in 212,928 square meters required for new commercial area. As the employment prognosis is not based on (nonlinear) trend regression but on constant growth, the absolute employment growth and the required commercial area are equal in each year (223 employees and 42,596 sqm, respectively).
The object gifpro_goettingen contains a list called components containing the single components of prognosis as well as the results already shown in the visible output (results). To understand the GIFPRO technique and the related REAT function, we take a look at the single components:
As we defined some industries as not relevant for commercial areas (a_{i} = 0), they contribute neither resettled nor relocated employees (such as A - agriculture, B - mining and quarrying or R - arts, entertainment, and recreation). We see that e.g. in the manufacturing sector (code C), there is an annual increase of about 12 employees attributed to resettlement and 55 employees related to relocation (see row 3 in resettlement and relocation, respectively). As we ignored the reutilization of commercial area, the matrix containing the commercial area-relevant employment related to reutilization (reuse) contains only zeros. The sum of all three components is stored in the fourth matrix, employment. There is an annual increase of nearly 67 employees in the manufacturing sector. The contents of the results list are the same as shown in the visible output.
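How such components arise can be sketched for a single industry in base R. This is a simplified illustration, not the code of gifpro(): all parameter values are hypothetical, the rates are assumed to be expressed per 100 commercial area-relevant employees and year, and reutilization is left out, as in the example above.

```r
# Hypothetical parameters for one industry (NOT the values used in this paper)
e_ij      <- 8000  # employment in the industry
a_i       <- 100   # share of employment located in commercial areas, in percent
sq_ij     <- 0.15  # resettlement rate (assumed: per 100 relevant employees and year)
rq_ij     <- 0.70  # relocation rate (assumed: per 100 relevant employees and year)
ai_ij     <- 200   # areal index in sqm per employee
tinterval <- 5     # forecast horizon in years

relevant     <- e_ij * a_i / 100        # commercial area-relevant employment
resettlement <- relevant * sq_ij / 100  # resettled employees per year
relocation   <- relevant * rq_ij / 100  # relocated employees per year

employment_annual <- resettlement + relocation      # reutilization ignored
area_annual       <- employment_annual * ai_ij      # sqm required per year
area_total        <- area_annual * tinterval        # sqm over the whole horizon

print(c(resettlement = resettlement, relocation = relocation,
        employment_annual = employment_annual,
        area_annual = area_annual, area_total = area_total))
```

Because the rates are applied to a constant employment base, the annual values are identical in every forecast year, which mirrors the constant annual results of the traditional GIFPRO model described above.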
In the next step, we apply the trend-based commercial area prognosis (TBS-GIFPRO) to the Goettingen data. In the gifpro.tbs() function, we use the employment data from 2008 to 2017 (columns 3 to 12), and assume an exponential function for employment prognosis (function argument prog.func, repeating the argument "exp" for each industry). The employment prognosis is plotted (prog.plot = TRUE), showing all 15 plots in one (plot.single = FALSE):
The visible function output is similar to the output above:
The resulting plot containing the employment forecasting functions is shown in Figure 7. The black vertical lines divide the plots into the estimation segment (2008 to 2017) and the prognosis segment (2018 to 2022). Four function types are supplied: linear (blue), power (green), exponential (yellow) and logistic (red). Note that a linear trend seems unrealistic as it implies continuous growth and may result in negative employment if the slope is negative. At this point, we should normally discuss and find the “best” forecasting model for each industry and rerun our analysis a few times. In our example, we skip this step and just take a look at the prognosis functions: In most cases, an exponential growth (or decline) seems to be an appropriate approximation. The power functions (green lines) are barely visible as their data fit is nearly the same as that of the exponential functions. Thus, we could choose them instead. In our case, the exponential function seems sufficient.
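The comparison of trend functions can be mimicked with base R alone. The sketch below fits a linear, an exponential and a power trend to a hypothetical employment series via lm() and extrapolates the exponential trend to 2022. It does not call curvefit() or gifpro.tbs(), and the logistic type is omitted because it requires an additional assumption about the saturation level.

```r
# Hypothetical employment series (NOT taken from the Goettingen dataset)
years    <- 2008:2017
emp      <- c(1000, 1030, 1055, 1090, 1120, 1160, 1190, 1230, 1265, 1300)
time_idx <- years - min(years) + 1   # time index 1, ..., 10

fit_lin <- lm(emp ~ time_idx)
fit_exp <- lm(log(emp) ~ time_idx)       # exponential trend, linearized by logs
fit_pow <- lm(log(emp) ~ log(time_idx))  # power trend, linearized by logs

# R^2 of each fit; the log-based fits are measured on the log scale
r2 <- c(linear      = summary(fit_lin)$r.squared,
        exponential = summary(fit_exp)$r.squared,
        power       = summary(fit_pow)$r.squared)
print(round(r2, 3))

# Extrapolate the exponential trend to the prognosis segment 2018 to 2022
new_idx      <- data.frame(time_idx = 11:15)
forecast_exp <- exp(predict(fit_exp, newdata = new_idx))
names(forecast_exp) <- 2018:2022
print(round(forecast_exp))
```

Since the R^{2} values of the linearized fits refer to log employment, they are not strictly comparable with the R^{2} of the linear fit, which is one more reason to inspect the plotted curves as well.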
As expected, a nonlinear industry growth results in a nonlinear overall employment growth and, consequently, the commercial area-relevant employment also grows in a nonlinear way. As we can see from the gifpro.tbs() output, employment increases by about 228 employees per year on average and by about 1,140 employees over the five years regarded (2018 to 2022). The annual commercial area required ranges from 42,946 sqm (2018) to 43,499 sqm (2022), all in all 216,012 sqm up to 2022. In our case, the estimated commercial area exceeds the prognosis derived from the simple GIFPRO analysis, which can be attributed to the positive differences between the exponential prognosis and a linear prognosis (see Figure 7). We skip the inspections of the components, which could be addressed by saving the results in an object (list), as we did in the first GIFPRO example.
This paper has shown how R and specifically the package REAT can be used for regional economic analysis. It should be noted that this package aims at breadth with respect to the treated analysis subjects rather than depth. The subsections provide the basic analysis methods regarded as most important from the package developer’s point of view (with respect to usage in current papers and discussion in current textbooks as well as application in own research projects), while there are several other approaches as well as extensions of the basic methods. A more detailed survey of the common methods can be found in the cited literature, especially in review articles (e.g. Nakamura, Morrison Paul 2009; Portnov, Felsenstein 2010) and textbooks (e.g. Farhauer, Kröll 2014).
Finally, we have to keep in mind that this package (like nearly any other free software) was developed in a non-commercial context (and published under the GNU General Public License). All functions have been tested several times using various real data, and single functions have already been used in a few research projects. However, there is no warranty that all functions always work perfectly. Like nearly any other R package, REAT is continuously refined, which means extending functions as well as correcting errors. This requires attentive usage and, of course, constructive feedback from the package users. Feedback can easily be submitted using the contact information on the CRAN package website.
Albacete X, Olaru D, Paül V, Biermann S (2017) Measuring the accessibility of public transport: A critical comparison between methods in Helsinki. Applied Spatial Analysis and Policy 10[2]: 161–188. CrossRef.
Allington NF, McCombie J (2007) Economic Growth and Beta-Convergence in the East European Transition Economies. In: Arestis P, Baddeley M, McCombie J (eds), Economic Growth. Edward Elgar Publishing, Cheltenham, 200–222
Arcelus FJ (1984) An extension of shift-share analysis. Growth and Change 15[1]: 3–8. CrossRef.
Bai CE, Tao Z, Tong YS (2008) Bureaucratic integration and regional specialization in China. China Economic Review 19[2]: 308–319. CrossRef.
Baker P, von Kirchbach F, Mimouni M, Pasteels JM (2002) Analytical Tools for Enhancing the Participation of Developing Countries in the Multilateral Trading System in the Context of the Doha Development Agenda. Aussenwirtschaft 57[3]: 343–372. https://EconPapers.repec.org/RePEc:usg:auswrt:2002:57:03:343-372
Balassa B (1965) Trade Liberalisation and “Revealed” Comparative Advantage. The Manchester School 33[2]: 99–123. CrossRef.
Barff RA, Knight PL (1988) Dynamic shift-share analysis. Growth and Change 19[2]: 1–10. CrossRef.
Barro RJ, Sala-i-Martin X (2004) Economic Growth (2nd ed.). MIT Press
Bonny HW, Kahnert R (2005) Zur Ermittlung des Gewerbeflächenbedarfs. Raumforschung und Raumordnung 63[3]: 232–240
Capello R, Nijkamp P (2009) Introduction: Regional growth and development theories in the twenty-first century - recent theoretical advances and future challenges. In: Capello R, Nijkamp P (eds), Handbook of Regional Growth and Development Theories. 1–18
Casler SD (1989) A Theoretical Context for Shift and Share Analysis. Regional Studies 23[1]: 43–48. CrossRef.
Ceapraz IL (2008) The Concepts of Specialisation and Spatial Concentration and the Process of Economic Integration: Theoretical Relevance and Statistical Measures. The Case of Romania’s Regions. Romanian Journal of Regional Science 2[1]: 68–93
Charles-Coll JA (2011) Understanding Income Equality: Concept, Causes and Management. International Journal of Economics and Management Science 1[3]: 17–28
CIMA Projekt + Entwicklung GmbH, NIW Niedersächsisches Institut für Wirtschaftsforschung, NORD/LB Regionalwirtschaft, Planquadrat Dortmund GbR (2011) Gewerbeflächenkonzeption für die Metropolregion Hamburg (GEFEK). Research report
Cracau D, Durán Lima JE (2016) On the Normalized Herfindahl-Hirschman Index: A Technical Note. International Journal on Food System Dynamics 7[4]: 382–386
Damgaard C, Weiner J (2000) Describing inequality in plant size or fecundity. Ecology 81[4]: 1139–1142. CrossRef.
Dapena AD, Fernández Vázquez E, Rubiera Morollón F (2016) The role of spatial scale in regional convergence: the effect of MAUP in the estimation of β-convergence equations. The Annals of Regional Science 56[2]: 473–489. CrossRef.
Dauth W, Fuchs M, Otto A (2015) Standortmuster in Westdeutschland: Nur wenige Branchen sind räumlich stark konzentriert. IAB Kurzbericht 16/2015, Institut für Arbeitsmarkt- und Berufsforschung. http://doku.iab.de/kurzber/2015/kb1615.pdf
Dauth W, Fuchs M, Otto A (2018) Long-run processes of geographical concentration and dispersion: Evidence from Germany. Papers in Regional Science 97[3]: 569–593. CrossRef.
Deutsches Institut für Urbanistik GmbH, Spath + Nagel (GbR) (2010) Stadtentwicklungskonzept Gewerbe für die Landeshauptstadt Potsdam. Research report, Landeshauptstadt Potsdam. https://www.potsdam.de/sites/default/files/documents/STEK_Gewerbe_Langfassung_2010.pdf
Dinc M (2015) Introduction to Regional Economic Development. Major Theories and Basic Analytical Tools. Elgar
Dixon R, Freebairn J (2009) Trends in Regional Specialisation in Australia. Australasian Journal of Regional Studies 15[3]: 281–296
Doran J, Jordan D (2013) Decomposing European NUTS2 regional inequality from 1980 to 2009: National and European policy implications. Journal of Economic Studies 40[1]: 22–38. CrossRef.
Dunn Jr. ES (1960) A Statistical and Analytical Technique for Regional Analysis. Papers in Regional Science 6[1]: 97–112. CrossRef.
Duranton G, Puga D (2000) Diversity and Specialisation in Cities: Why, Where and When Does it Matter? Urban Studies 37[3]: 533–555. CrossRef.
Ellison G, Glaeser E (1997) Geographic Concentration in U.S. Manufacturing Industries: A Dartboard Approach. Journal of Political Economy 105[5]: 889–927
Espa G, Arbia G, Giuliani D (2010) Measuring industrial agglomeration with inhomogeneous K-function: the case of ICT firms in Milan (Italy). Department of Economics Working Papers 1014, Department of Economics, University of Trento, Italia
Esteban-Marquillas JM (1972) I. A reinterpretation of shift-share analysis. Regional and Urban Economics 2[3]: 249 – 255. CrossRef.
Farhauer O, Kröll A (2014) Standorttheorien. Regional- und Standortökonomik in Theorie und Praxis (2nd ed.). Springer, Heidelberg
Fujita M, Krugman P, Venables A (2001) The Spatial Economy: Cities, Regions, and International Trade (1st ed.), Volume 1. The MIT Press
Fülöp G, Kopetsch T, Schöpe P (2011) Catchment areas of medical practices and the role played by geographical distance in the patient’s choice of doctor. The Annals of Regional Science 46[3]: 691–706. CrossRef.
Furceri D (2005) Beta and sigma convergence: A mathematical relation of causality. Economics Letters 89[2]: 212–215. CrossRef.
Gerfin H (1964) Gesamtwirtschaftliches Wachstum und regionale Entwicklung. Kyklos 17[4]: 565–593. CrossRef.
Gini C (1912) Variabilità e Mutuabilità. Contributo allo Studio delle Distribuzioni e delle Relazioni Statistiche. Cuppini
Gluschenko K (2018) Measuring regional inequality: to weight or not to weight? Spatial Economic Analysis 13[1]: 36–59. CrossRef.
Goecke H, Hüther M (2016) Regional Convergence in Europe. Intereconomics 51[3]: 165–171. CrossRef.
Goschin Z, Constantin D, Roman M, Ileanu B (2009) Regional specialization and geographic concentration of industries in Romania. South-Eastern Europe Journal of Economics 1[1]: 99–113. https://ojs.lib.uom.gr/index.php/seeje/article/view/5536
Haas A, Südekum J (2005) Spezialisierung und Branchenkonzentration in Deutschland: Regionalanalyse. IAB-Kurzbericht 1/2005. http://hdl.handle.net/10419/158181
Habánik J, Hošták P, Kútik J (2013) Economic and social disparity development within regional development of the Slovak Republic. Economics and Management 18[3]: 457–464. CrossRef.
Hansen WG (1959) How Accessibility Shapes Land Use. Journal of the American Institute of Planners 25[2]: 73–76. CrossRef.
Harris CD (1954) The Market as a Factor in the Localization of Industry in the United States. Annals of the Association of American Geographers 44[4]: 315–348
Haynes KE, Parajuli J (2014) Shift-share analysis: decomposition of spatially integrated systems. In: Handbook of Research Methods and Applications in Spatially Integrated Social Science. Elgar, 315–344. CrossRef.
Heinemann M (2008) Messung und Darstellung von Ungleichheit. Working Paper Series in Economics 108, University of Lüneburg, Institute of Economics. https://EconPapers.repec.org/RePEc:lue:wpaper:108
Henderson BD (1973) The Experience Curve - Reviewed. IV. The Growth Share Matrix or The Product Portfolio. Reprint 135. https://www.bcg.com/documents/file13904.pdf
Herfindahl OC (1950) Concentration in the U.S. Steel Industry. Columbia University Press
Hirschman AO (1945) National Power and the Structure of Foreign Trade. Publications of the Bureau of Business and Economic Research. University of California Press
Hoen AR, Oosterhaven J (2006) On the measure of comparative advantage. The Annals of Regional Science 40[3]: 677–691. CrossRef.
Hoffmann J, Hirsch S, Simons J (2017) Identification of spatial agglomerations in the German food processing industry. Papers in Regional Science 96[1]: 139–162. CrossRef.
Hoover EM (1936) The Measurement of Industrial Localization. The Review of Economics and Statistics 18[4]: 162–171
Howard D (2007) A regional economic performance matrix – an aid to regional economic policy development. Journal of Economic and Social Policy 11[2]: article 4
Howard E, Newman C, Tarp F (2016) Measuring industry coagglomeration and identifying the driving forces. Journal of Economic Geography 16[5]: 1055–1078
Huang Y, Leung Y (2009) Measuring Regional Inequality: A Comparison of Coefficient of Variation and Hoover Concentration Index. The Open Geography Journal 2[1]: 25–34. CrossRef.
Jiang L, Guan M, Tian J (2007) On Chinese Regional Specialization and Industry Concentration. In: 2007 International Conference on Machine Learning and Cybernetics, Volume 6, 3396–3400
Kabacoff RI (2017) Quick-R: Data Types. Manual. https://www.statmethods.net/input/datatypes.html
Kassenärztliche Bundesvereinigung (2013) Die neue Bedarfsplanung. Grundlagen, Instrumente und regionale Möglichkeiten. Brochure. https://www.kbv.de/media/sp/Instrumente_Bedarfsplanung_Broschuere.pdf
Kim S (1995) Expansion of Markets and the Geographic Distribution of Economic Activities: The Trends in U. S. Regional Manufacturing Structure, 1860–1987. The Quarterly Journal of Economics 110[4]: 881–908. CrossRef.
Kiskowski MA, Hancock JF, Kenworthy A (2009) On the Use of Ripley’s K-function and its Derivatives to Analyze Domain Size. Biophysical Journal 97[4]: 1095–1103. CrossRef.
Kohn W, Öztürk R (2013) Statistik für Ökonomen. Datenanalyse mit R und SPSS (2nd ed.). Springer Gabler
Krider R, Putler DS (2013) Which Birds of a Feather Flock Together? Clustering and Avoidance Patterns of Similar Retail Outlets. Geographical Analysis 45[2]: 123–149. CrossRef.
Krugman P (1979) Increasing returns, monopolistic competition, and international trade. Journal of International Economics 9[4]: 469–479. CrossRef.
Krugman P (1991) Geography and trade. MIT Press
Larsson JP, Öner Ö (2014) Location and co-location in retail: a probabilistic approach using geo-coded data for metropolitan retail markets. The Annals of Regional Science 52[2]: 385–408. CrossRef.
Lehocký F, Rusnák J (2016) Regional specialization and geographic concentration: experiences from Slovak industry. Miscellanea Geographica – Regional Studies on Development 20[3]: 5–13. https://www.degruyter.com/downloadpdf/j/mgrsd.2016.20.issue-3/mgrsd-2016-0011/mgrsd-2016-0011.pdf
Lessmann C (2005) Regionale Disparitäten in Deutschland und ausgesuchten OECD-Staaten im Vergleich. ifo Dresden berichtet 3/2005: 25–33
Lessmann C (2014) Spatial inequality and development - Is there an inverted-U relationship? Journal of Development Economics 106: 35–51. CrossRef.
Lessmann C (2016) Regional inequality and internal conflict. German Economic Review 17[2]: 157–191. CrossRef.
Lessmann C, Seidel A (2017) Regional inequality, convergence, and its determinants – a view from outer space. European Economic Review 92: 110–132. CrossRef.
Litzenberger T, Sternberg R (2006) Der Clusterindex – eine Methodik zur Identifizierung regionaler Cluster am Beispiel deutscher Industriebranchen. Geographische Zeitschrift 94[2]: 209–224
Longley PA, Goodchild MF, Maguire DJ, Rhind DW (2005) Geographical Information Systems and Science (2nd ed.). Wiley
Lorenz MO (1905) Methods of measuring the concentration of wealth. Publications of the American Statistical Association 9[70]: 209–219. CrossRef.
Martin C (2015) Kreative Klasse 2015. Kreativität als entscheidender Faktor für wirtschaftlichen Erfolg: Entwicklungen und Ausprägungen in Deutschland. Research report. https://www.kreativ-sta.de/wp-content/uploads/2017/10/agiplan_Kreative_Klasse_2015_Studie.pdf
Midelfart-Knarvik K, Overman H, Redding S, Venables A (2000) The Location of European Industry. European Economy - Economic Papers 142
Moga LM, Constantin DL (2011) Specialization and Geographic Concentration of the Economic Activities in the Romanian Regions. Journal of Applied Quantitative Methods 6[2]: 12–21. https://pdfs.semanticscholar.org/aa9d/365d6a8ef4c3585595c8ba03fe373ab02010.pdf
Mulligan GF (2006) Logistic Population Growth in the World’s Largest Cities. Geographical Analysis 38[4]: 344–370. CrossRef.
Mussini M (2017) Decomposing Changes in Inequality and Welfare Between EU Regions: The Roles of Population Change, Re-Ranking and Income Growth. Social Indicators Research 130[2]: 455–478. CrossRef.
Myrdal G (1957) Economic theory and under-developed regions. G. Duckworth
Nakamura R, Morrison Paul C (2009) Measuring agglomeration. In: Capello R, Nijkamp P (eds), Handbook of Regional Growth and Development Theories. Elgar, 305–328
Nischwitz G, Böhme R, Fortmann F (2017) Kommunale Wirtschaftsförderung in Bremen: Handlungsrahmen, Programme und Wirkungen. Schriftenreihe Institut Arbeit und Wirtschaft 23/2017. http://hdl.handle.net/10419/172756
O’Donoghue D, Gleave B (2004) A Note on Methods for Measuring Industrial Agglomeration. Regional Studies 38[4]: 419–427. CrossRef.
OECD (2019) OECD Territorial Reviews. Website. https://www.oecd-ilibrary.org/fr/urban-rural-and-regional-development/oecd-territorial-reviews_19900759
Palan N (2017) Konzentrations- und Ungleichheitsindizes: ein methodischer Überblick sowie ein empirischer Vergleich anhand der Textilindustrie. Zeitschrift für Wirtschaftsgeographie 61[3-4]: 135–155. CrossRef.
Peña Carrera L (2002) Tracing accessibility over time: two swiss case studies. Technical report. http://hdl.handle.net/2099.1/6327
Petrakos G, Psycharis Y (2016) The spatial aspects of economic crisis in Greece. Cambridge Journal of Regions, Economy and Society 9[1]: 137–152. CrossRef.
Planungsgruppe MWM (2009) Flächennutzungsplanung Gemeinde Wachtberg - Fachbeitrag Arbeiten. Report. http://www.wachtberg.de/imperia/md/content/cms127/gemeindeentwicklung/fnp-fb-arbeiten-24-02-2009.pdf
Pooler J (1987) Measuring geographical accessibility: a review of current approaches and problems in the use of population potentials. Geoforum 18[3]: 269 – 289. CrossRef.
Porter ME (1990) The Competitive Advantage of Nations. Free Press
Portnov BA, Felsenstein D (2005) Measures of Regional Inequality for Small Countries. In: Felsenstein D, Portnov B (eds), Regional Disparities in Small Countries. 47–62. CrossRef.
Portnov BA, Felsenstein D (2010) On the suitability of income inequality measures for regional analysis: Some evidence from simulation analysis and bootstrapping tests. Socio-Economic Planning Sciences 44[4]: 212–219. CrossRef.
Puente S (2017) Regional convergence in Spain: 1980-2015. Research report. Economic Bulletin 3/2017, Banco de Espana
R Core Team (2018a) R: A Language and Environment for Statistical Computing. Software, Vienna, Austria. https://www.R-project.org/
R Core Team (2018b) The R Manuals. Manual. https://cran.r-project.org/manuals.html
Reggiani A, Bucci P, Russo G (2011) Accessibility and Impedance Forms: Empirical Applications to the German Commuting Network. International Regional Science Review 34[2]: 230–252. CrossRef.
Ricardo D (1821) On the Principles of Political Economy and Taxation (3rd ed.). McMaster University Archive for the History of Economic Thought
Ripley BD (1976) The second-order analysis of stationary point processes. Journal of Applied Probability 13[2]: 255–266. CrossRef.
RStudio Team (2016) RStudio: Integrated Development Environment for R. Software, RStudio, Inc., Boston, MA. http://www.rstudio.com/
Schmidt H (1997) Konvergenz wachsender Volkswirtschaften. Theoretische und empirische Konzepte sowie eine Analyse der Produktivitätsniveaus westdeutscher Regionen, Volume 152 of Wirtschaftswissenschaftliche Beiträge. Springer
Schönebeck C (1996) Wirtschaftsstruktur und Regionalentwicklung : theoretische und empirische Befunde für die Bundesrepublik Deutschland, Volume 75 of Dortmunder Beiträge zur Raumplanung Blaue Reihe. IRPUD
Schätzl L (2000) Wirtschaftsgeographie 2: Empirie (3rd ed.). Schöningh
Smith TE (2016) Notebook on Spatial Data Analysis. Technical report. http://www.seas.upenn.edu/~ese502/#notebook
Spiekermann K, Wegener M (2008) Modelle in der Raumplanung I: 4. Input-Output-Modelle. Presentation, Lecture “Modelle in der Raumplanung” WS 2008/2009. http://www.spiekermann-wegener.de/mir/pdf/MIR1_4_111108.pdf
Stark KD, Velsinger P, Bauer M, Bonny HW, Kricke J, Schwetlick D, Striedel HD (1981) Flächenbedarfsberechnung für Gewerbe- und Industrieansiedlungsbereiche: GIFPRO. Number 4.029 in Schriftenreihe Landes- und Stadtentwicklungsforschung des Landes Nordrhein-Westfalen. ILS, Dortmund
Statistisches Bundesamt (2008) German Classification of Economic Activities, Edition 2008. Dataset (XLS). https://www.destatis.de/DE/Methoden/Klassifikationen/GueterWirtschaftklassifikationen/klassifikationWZ08englisch.xls
Störmann W (2009) Regionalökonomik. Theorie und Praxis. Oldenbourg, Munich
Taylor JK, Cihon C (2004) Statistical Techniques for Data Analysis (2nd ed.). Taylor and Francis
Theil H (1967) Economics and information theory. North-Holland
Tian Z (2013) Measuring agglomeration using the standardized location quotient with a bootstrap method. Journal of Regional Analysis and Policy 43[2]: 186–197
Vallée D, Witte A, Brandt T, Bischof T (2012) Bedarfsberechnung für die Darstellung von Allgemeinen Siedlungsbereichen (ASB) und Gewerbe- und Industrieansiedlungsbereichen (GIB) in Regionalplänen. Research report, Staatskanzlei des Landes Nordrhein-Westfalen. https://www.wirtschaft.nrw/sites/default/files/asset/document/lep_nrw_flaechenbedarf_endbericht_endfassung_04122012.pdf
Vogiatzoglou K (2006) Increasing agglomeration or dispersion? Industrial specialization and geographic concentration in NAFTA. Journal of Economic Integration 21[2]: 379–396
von Neumann J, Kent RH, Bellinson HR, Hart BI (1941) The Mean Square Successive Difference. The Annals of Mathematical Statistics 12[2]: 153–162. CrossRef.
Weddige-Haaf K, Kool C (2017) Determinants of regional growth and convergence in Germany. Discussion paper. Discussion Paper Series 17-12, Utrecht University School of Economics
Wieland T (2019) REAT: Regional Economic Analysis Toolbox. R package version 3.0.1. Software. https://CRAN.R-project.org/package=REAT
Wieland T, Dittrich C (2016) Bestands- und Erreichbarkeitsanalyse regionaler Gesundheitseinrichtungen in der Gesundheitsregion Göttingen. Research report, Georg-August-Universität Göttingen, Geographisches Institut, Abteilung Humangeographie. http://webdoc.sub.gwdg.de/pub/mon/2016/3-wieland.pdf
Wieland T, Fuchs H (2018) Regionalökonomische Disparitäten im Spiegel von Raumtypisierungen. Ein Konzept zur Identifikation strukturell benachteiligter Gebiete in Südtirol (Italien). Standort - Zeitschrift für Angewandte Geographie 42[3]: 152–163. CrossRef.
Williamson JG (1965) Regional Inequality and the Process of National Development: A Description of the Patterns. Economic Development and Cultural Change 13[4]: 1–84
Yamamura S, Goto H (2018) Location patterns and determinants of knowledge-intensive industries in the Tokyo Metropolitan Area. Japan Architectural Review 1[4]: 443–456. CrossRef.
Young AT, Higgins MJ, Levy D (2008) Sigma Convergence versus Beta Convergence: Evidence from U.S. County-Level Data. Journal of Money, Credit and Banking 40[5]: 1083–1093. CrossRef.