| Title: | Computed ABC Analysis |
|---|---|
| Description: | Identify the most relevant data points by dividing a numeric data set into three classes A, B, and C, where class A items are the "important few", class C items are the "trivial many", and class B items fall in between, following the idea of the Pareto principle. The classification uses an ABC curve, which plots cumulative "Yield" against "Effort", similar to a Lorenz curve. Class borders are defined mathematically on that curve, which aids interpretation. Based on: Ultsch A, Lotsch J (2015) "Computed ABC Analysis for rational Selection of most informative Variables in multivariate Data". PLoS ONE 10(6): e0129767. <doi:10.1371/journal.pone.0129767>. |
| Authors: | Jorn Lotsch [aut] (ORCID: <https://orcid.org/0000-0002-5818-6958>), André Himmelspach [aut, cre] (ORCID: <https://orcid.org/0009-0009-9857-227X>) |
| Maintainer: | André Himmelspach <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0 |
| Built: | 2026-06-05 06:13:58 UTC |
| Source: | https://github.com/andrehdev/cabc_analysis |
Divides a numeric dataset into three classes (A, B, and C) using ABC analysis. The classification is based on geometric properties of the ABC curve and identifies regions of high, balanced, and low efficiency. Class interpretation:
| A: | Low effort, high yield (Pareto items) |
| B: | Balanced effort and yield |
| C: | High effort, low yield (submarginal items) |
cABC_analysis(Data, PlotIt = FALSE, useGGPlot = TRUE)
Data |
Positive numeric vector which is not uniformly distributed. If matrix or dataframe then the first column will be used. |
PlotIt |
Logical. If |
useGGPlot |
Logical, default TRUE. |
Calculation of Boundaries is done on the ABC Curve
(see cABC_curve) with:
| Pareto Point: | The point with minimal distance to (0,1); defines the A/B boundary |
| Breakeven Point: | The point where the slope equals 1 |
| Juren Point: | The point with minimal distance to (BreakevenPoint_x,1); defines the B/C boundary |
For more calculation details see: Ultsch A, Lotsch J (2015) "Computed ABC Analysis for rational Selection of most informative Variables in multivariate Data". PLoS ONE 10(6): e0129767. <doi:10.1371/journal.pone.0129767>.
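The three points can be illustrated with a small base-R sketch. It is not the package's implementation: it works on a piecewise-linear empirical curve rather than the interpolated curve from cABC_curve, it approximates the slope by finite differences, and the helper name `abc_points_sketch` is hypothetical.

```r
# Hypothetical helper, not part of the package: locates the three points
# on a piecewise-linear empirical ABC curve (no interpolation).
abc_points_sketch <- function(x) {
  x <- sort(x, decreasing = TRUE)            # largest items first
  effort <- c(0, seq_along(x) / length(x))   # fraction of items used
  yield  <- c(0, cumsum(x) / sum(x))         # fraction of the total reached

  # Pareto point: curve point closest to the ideal corner (0, 1)
  pareto <- which.min(effort^2 + (1 - yield)^2)

  # Break-even point: start of the first segment whose slope is <= 1
  slope <- diff(yield) / diff(effort)
  breakeven <- which(slope <= 1)[1]

  # Juren point: curve point closest to (break-even x, 1)
  juren <- which.min((effort - effort[breakeven])^2 + (1 - yield)^2)

  rbind(pareto    = c(x = effort[pareto],    y = yield[pareto]),
        breakeven = c(x = effort[breakeven], y = yield[breakeven]),
        juren     = c(x = effort[juren],     y = yield[juren]))
}

set.seed(1)
pts <- abc_points_sketch(rlnorm(500, sdlog = 1.5))
pts
```

Each row of `pts` holds the (effort, yield) coordinates of one point. Because the items are sorted in decreasing order, the curve is concave and every point lies on or above the diagonal.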
Data cleaning: Before classification, non-numeric values and
NAs are coerced to 0, and negative values are set to 0.
A warning is issued when items are converted. If a matrix or data frame is
supplied, only the first column is used.
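The cleaning rules can be mimicked in a few lines of base R. This is a sketch of the documented behavior only; the helper name `clean_abc_input_sketch` and the wording of its warning are assumptions, and the package's internal code may differ.

```r
# Hypothetical helper illustrating the documented cleaning rules;
# the package's internal code may differ.
clean_abc_input_sketch <- function(Data) {
  # Matrices and data frames: keep only the first column
  if (is.matrix(Data) || is.data.frame(Data)) Data <- Data[, 1]

  # Non-numeric entries become NA during coercion
  values <- suppressWarnings(as.numeric(as.character(Data)))

  # NAs (including failed coercions) and negative values are set to 0
  changed <- is.na(values) | values < 0
  values[changed] <- 0

  if (any(changed)) {
    warning(sum(changed), " item(s) were set to 0 during cleaning")
  }
  values
}

raw <- data.frame(size = c("12", "abc", NA, "-3", "7.5"), other = 1:5)
cleaned <- suppressWarnings(clean_abc_input_sketch(raw))
cleaned
```

Only the `size` column is used here; the text entry, the missing value, and the negative value all end up as 0.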
Degenerate inputs (single point, all-identical values, very small datasets)
are caught before curve fitting, see cABC_handle_specials for
the full behavior. Boundary duplicate values that span two classes after
classification are resolved by cABC_postprocess_classes.
In both cases a warning is issued when a special case is triggered.
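To anticipate these warnings, the input can be checked before the analysis is run. The helper below is a hypothetical pre-check, not the package's cABC_handle_specials; it covers only the single-point and all-identical cases, because this section does not state the size threshold for "very small datasets".

```r
# Hypothetical pre-check, not the package's cABC_handle_specials():
# flags two of the degenerate input shapes named above.
describe_input_sketch <- function(x) {
  if (length(x) == 1L) return("single point")
  if (length(unique(x)) == 1L) return("all identical")
  "no special case detected"
}

single    <- describe_input_sketch(42)
identical_values <- describe_input_sketch(c(5, 5, 5, 5))
regular   <- describe_input_sketch(c(10, 3, 1, 1))
c(single, identical_values, regular)
```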
A list containing:
Aind, Bind, Cind: Integer vectors of indices (into the original
Data) for items assigned to classes A, B, and C respectively.
In special-case returns (single point or all-identical), only
Aind is populated; Bind and Cind are
integer(0).
Logical; TRUE if the Pareto point and
Break-even point were swapped to maintain coordinate logic (i.e. the
Break-even point was to the left of the Pareto point on the curve).
c(x, y) coordinates for the Pareto point (A),
the Break-even point (B), and the Submarginal point (C).
NULL in special-case returns.
Cumulative yield at the boundary of Class A.
NULL in special-case returns.
Cumulative yield at the boundary of Class B.
NULL in special-case returns.
Index of the A boundary in the
interpolated [p, ABC] curve. NULL in special-case
returns.
Index of the C boundary in the
interpolated [p, ABC] curve. NULL in special-case
returns.
Numeric vector of effort values (x-axis) of the interpolation
curve. NULL in special-case returns.
Numeric vector of yield values (y-axis) of the interpolation
curve. NULL in special-case returns.
Data value closest to the threshold separating Class A
from Class B. NULL in special-case returns.
Data value closest to the threshold separating Class B
from Class C. NULL in special-case returns.
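The index vectors can be converted into one class label per item. In the sketch below, `res` is a hand-written stand-in that mimics only the documented `Aind`, `Bind`, and `Cind` elements so that the snippet runs without the package; its values are invented for illustration.

```r
values <- c(50, 3, 20, 1, 8, 2)

# Stand-in for a cABC_analysis() result; indices are made up for illustration
res <- list(Aind = c(1L, 3L), Bind = 5L, Cind = c(2L, 4L, 6L))

# One label per item, in the original order of the data
classes <- rep(NA_character_, length(values))
classes[res$Aind] <- "A"
classes[res$Bind] <- "B"
classes[res$Cind] <- "C"
classes <- factor(classes, levels = c("A", "B", "C"))

# Share of the total sum contributed by each class
yield_share <- tapply(values, classes, sum) / sum(values)
table(classes)
yield_share
```

The same assignments also work for special-case returns, where `Bind` and `Cind` are `integer(0)`: assigning through an empty index leaves `classes` unchanged.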
André Himmelspach (01/2026)
    data("SwissInhabitants")
    abc <- cABC_analysis(SwissInhabitants, PlotIt = TRUE)

    # Extract the data belonging to each class
    A <- abc$Aind; B <- abc$Bind; C <- abc$Cind
    Agroup <- SwissInhabitants[A]
    Bgroup <- SwissInhabitants[B]
    Cgroup <- SwissInhabitants[C]
Number of inhabitants in the 2896 communes of Switzerland in the year 1900.
data("SwissInhabitants")
A numeric vector of length 2896 containing the population counts.
This data set consists of the number of inhabitants in the 2896 communes (cities and villages) in Switzerland in 1900. The data is unordered for anonymity reasons.
Schuler, M., Ullmann, D. (2002). Eidgenossische Volkszahlung: Bevoelkerungsentwicklung der Gemeinden. Bundesamt fur Statistik, Neuchatel, Switzerland.
Behnisch, M., Ultsch, A. (2010). Population Patterns in Switzerland 1850-2000. In: Gaul, W. et al (Eds), Advances in Data Analysis, Data Handling and Business Intelligence, Springer, Heidelberg, pp. 163-173.
    data(SwissInhabitants)
    summary(SwissInhabitants)