Most things come in stocks. Earth itself is a set of stocks: one of land, another of water, still another of air. Thus, a wide range of phenomena can be viewed as processes of stock depletion: an epidemic spreading through a stock of healthy, infectable people; economies extracting oil, copper or phosphate from finite stocks of minable resources; the adoption of a technology by a market made up of a stock of enterprises or consumers.
Although stocks may increase, they are not limitless unless the inflows that feed them persistently exceed the outflows that drain them. Unfortunately, this is the case neither for the earth nor for the savings account of the common person. Furthermore, some stocks, such as oil, are replenished naturally only on a geological time scale.
Limited stocks place a ceiling on growth, a phenomenon that we can observe in numerous areas. Our planet cannot accommodate ever-growing billions of people. An epidemic will stop spreading and will recede once its potential victims have succumbed or become immune. Sales of cars, mobile phones, watches, computers, or what have you will eventually slow down and reach a plateau once potential buyers have bought what they wanted and only buy new items to replace the old. We cannot pump oil or freshwater from the ground forever: some day the wells will run dry. Important questions then arise: how fast does a stock deplete over time? Can one forecast how much of a stock is left?
Such growth (or, alternatively, stock depletion) processes cannot be represented mathematically by the constant-growth or linear functions commonly used in econometric models. Instead, they require a varying, non-linear growth function; the logistic function (also called the sigmoid or S-curve) has been found well suited to describing such phenomena. In the typical example, a mold that initially spreads in a culture at a rate of 100% (i.e. it doubles in size each period of time) would, in a constant growth model, cover its environment in 7 periods. The S-curve shows that it will do so only at the 15th period.
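The contrast between the two models can be checked numerically. The sketch below is only an illustration under stated assumptions: the environment is taken to be 128 times the initial size of the mold (the size a doubling colony reaches after 7 periods), the initial size is 1 in arbitrary units, and the logistic curve uses a continuous growth coefficient of ln 2. The exact period at which the S-curve "covers" the environment depends on how coverage is defined, so the sketch prints shares of capacity rather than a single date.

```python
import math

CAPACITY = 2 ** 7     # assumed: environment is 128 times the initial mold size
START = 1.0           # assumed initial size, arbitrary unit
RATE = math.log(2)    # doubling each period, expressed as a continuous rate

def constant_growth(t):
    """Size after t periods if the mold keeps doubling without any limit."""
    return START * 2 ** t

def logistic_growth(t):
    """Size after t periods when growth slows as free space runs out."""
    a = (CAPACITY - START) / START
    return CAPACITY / (1 + a * math.exp(-RATE * t))

# Compare both models as a share of the environment at a few periods.
for period in (5, 7, 10, 15):
    share_constant = min(constant_growth(period), CAPACITY) / CAPACITY
    share_logistic = logistic_growth(period) / CAPACITY
    print(f"period {period:2d}: constant {share_constant:6.1%}  logistic {share_logistic:6.1%}")
```

Under these assumptions the constant growth model fills the environment at period 7, when the logistic curve has covered only about half of it; the logistic curve needs roughly twice as long to come within 1% of full coverage.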
The logistic function shows near-exponential growth at first, up to the inflection point. From then on growth decelerates, and the remaining gap to the upper asymptote shrinks exponentially until the stock is exhausted. Hence the typical S-shaped pattern which gave the curve its popular name.
Obviously, the S-curve tool is not a crystal ball: it neither foretells the future nor offers a fully dependable representation of reality, since non-linearity does not always yield to the elegance of the simple logistic function. It is, however, a practical help in understanding how things might develop and what possible futures may look like.
S-curve calculator : 1 parameter estimate
The solution of the simple logistic curve is given by the formula :
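In its standard closed form, and using the parameters M, c, n₀ and t, the simple logistic curve can be written as follows. This is a reconstruction resting on two assumptions: time is counted from the first observation, so that the curve equals n₀ at t = 0, and the symbol n_t for the value at time t is borrowed from the 3-parameter section.

```latex
n_t = \frac{M}{1 + \dfrac{M - n_0}{n_0}\, e^{-c\,t}}
```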
The parameters are: the upper asymptote M (i.e. maximum stock, saturation, carrying capacity), the growth coefficient c, the initial value n₀ (i.e. the stock at the start of the series), and time t.
Only one parameter, the saturation value M, has to be estimated: t and n₀ are provided by the known data, and the calculator itself computes c by the ordinary least squares method. Try areppim's S-Curve Calculator.
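One way a least-squares estimate of c can work once M is fixed: dividing n by (M − n) and taking logarithms turns the logistic curve into a straight line in t, whose slope is c. The sketch below illustrates that general technique; it is an assumption about how such an estimate may be obtained, not a description of the calculator's internal code, and the function names and sample data are hypothetical.

```python
import math

def fit_growth_rate(times, values, saturation):
    """Estimate c by ordinary least squares on the linearised logistic.

    With M fixed, ln(n / (M - n)) = intercept + c * t is a straight line,
    so the slope of the fitted line is the growth coefficient c.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) pairs")
    y = []
    for n in values:
        if not 0 < n < saturation:
            raise ValueError("values must lie strictly between 0 and M")
        y.append(math.log(n / (saturation - n)))
    t_mean = sum(times) / len(times)
    y_mean = sum(y) / len(y)
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("times must not all be identical")
    sxy = sum((t - t_mean) * (yi - y_mean) for t, yi in zip(times, y))
    c = sxy / sxx
    intercept = y_mean - c * t_mean
    return c, intercept

def logistic(t, saturation, c, intercept):
    """Logistic curve written with the fitted slope and intercept."""
    return saturation / (1 + math.exp(-(c * t + intercept)))

# Hypothetical data generated from a known curve, to check the fit recovers c.
sample_times = [0, 1, 2, 3, 4, 5]
sample_values = [logistic(t, 100.0, 0.5, -3.0) for t in sample_times]
c_hat, intercept_hat = fit_growth_rate(sample_times, sample_values, 100.0)
print(f"estimated c = {c_hat:.4f}, intercept = {intercept_hat:.4f}")
```

The transformation is exact only when M is right; a poor guess for M bends the transformed data away from a straight line, which is one reason the estimate of M matters so much.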
Example
After you saw a TV ad for a magnificent deep blue water lily with large floating leaves capable of multiplying very rapidly, you decided to have a pond built in your garden to grow water lilies. You started with 2 plants, which became 6 by the end of 8 days. Each lily spreads on average to a diameter of 60 cm, thus covering an area (π·r², with r = 30 cm) of 2,827 cm². The pond being 4 m × 3 m (12 m² or 120,000 cm²), it can carry 42 lilies (120,000 ÷ 2,827 ≈ 42). To determine when you should invite your friends to admire the wondrous pond, enter the following in areppim's S-Curve calculator:
Start time : 1
End time : 8
Value at the beginning : 2
Value at the end : 6
Forecast horizon : 45 (or whatever horizon you wish to test)
Estimated maximum value : 42 (carrying capacity of the pond)
The calculator produces the S-Curve values that are represented graphically by the chart :
Lilies multiply almost exponentially, at about 17% per day at first, until they reach the midpoint (21 lilies) around day 19, then slow down as they approach the limit (carrying capacity). The view of the pond will dazzle your guests after only 3 weeks, provided of course that such water lilies exist.
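The example's arithmetic can be retraced in a few lines. This is a sketch of the calculation, not the calculator's own code: it assumes the curve passes exactly through the two known points (with only two observations, a least-squares line on the transformed data does so), and the variable names are illustrative.

```python
import math

# Figures taken from the example.
lily_diameter_cm = 60
pond_area_cm2 = 400 * 300          # 4 m x 3 m
t_start, n_start = 1, 2            # 2 lilies on day 1
t_end, n_end = 8, 6                # 6 lilies on day 8

# Carrying capacity: how many whole lilies fit on the pond surface.
lily_area_cm2 = math.pi * (lily_diameter_cm / 2) ** 2
capacity = math.floor(pond_area_cm2 / lily_area_cm2)

def logit(n):
    """Transform that turns the logistic curve into a straight line in t."""
    return math.log(n / (capacity - n))

# Slope of the straight line through the two transformed observations.
c = (logit(n_end) - logit(n_start)) / (t_end - t_start)
# Day on which the curve reaches half the capacity (logit equals zero).
midpoint = t_start - logit(n_start) / c

def lilies(day):
    """Number of lilies predicted by the logistic curve on a given day."""
    return capacity / (1 + math.exp(-c * (day - midpoint)))

print(f"capacity : {capacity} lilies")
print(f"growth c : {c:.3f} per day")
print(f"midpoint : day {midpoint:.1f}")
for day in (8, 21, 30, 45):
    print(f"day {day:2d}   : {lilies(day):.1f} lilies")
```

Run as is, it reports a capacity of 42 lilies, a growth coefficient of about 0.17 per day and a midpoint shortly after day 18, in line with the figures quoted above.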
S-curve calculator : 3 parameter estimates
The 1-parameter simple logistic solution is fast and easy, but it suffers from the limitations inherent in fitting a nonlinear function by the ordinary least squares method, which is so useful for linear models. The procedure can produce a mediocre fit of the calculated to the observed data, leaving a wide dispersion of residuals. In the case of the S-curve, it raises the risk of overestimating the saturation value and of misplacing the inflection point of the series.
To mitigate the risk, we can scale the saturation down to, say, 90%, and estimate the time needed to grow from 10% to 90% of the saturation. The solution is given by:
where M is the saturation point; Δt is the interval during which growth progresses from 10% to 90% of the saturation; and tₘ is the midpoint, i.e. the time when nₜ = M ÷ 2.
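With the three parameters just defined, a standard way of writing this form of the logistic curve is shown below. It is offered as a reconstruction, not as the page's original formula. The constant ln 81 follows from the definition of Δt: the curve equals 10% of M when the exponent is +ln 9 and 90% of M when it is −ln 9, so the two points are separated by 2·ln 9 = ln 81 in units of the exponent.

```latex
n_t = \frac{M}{1 + \exp\!\left[-\dfrac{\ln 81}{\Delta t}\,(t - t_m)\right]}
```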
Small deviations around the midpoint can heavily impact the estimation of the saturation point. Since the nonlinear model has no closed-form least-squares solution for its coefficients, we must approximate it by a succession of iterations of the linear one. Beginning with a tentative solution, we refine the parameters to improve the fit until further improvement becomes negligible. To achieve this, we may choose among various procedures:
Trial and error: the parameters of the model are adjusted by hand until the fit between observed and calculated data is satisfactory.
Mathematical methods: e.g. resampling (for example, applying Monte Carlo to the distribution of the residuals from the least squares fit), Bayesian parameter estimation, or others. Try areppim's S-Curve Calculator.
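A simple way to picture the "succession of iterations of the linear one" is to try a series of candidate saturation values, solve the linear least-squares problem for each, and keep the candidate whose curve lies closest to the data. The sketch below does exactly that; it is a swapped-in illustration, neither the resampling nor the Bayesian procedure mentioned above, nor areppim's own method. It assumes the 3-parameter form with the constant ln 81, strictly increasing data, and hypothetical function names and test data.

```python
import math

LN81 = math.log(81)

def logistic3(t, m, tm, dt):
    """3-parameter logistic: saturation m, midpoint tm, 10%-90% growth time dt."""
    return m / (1 + math.exp(-LN81 / dt * (t - tm)))

def linear_fit(times, values, m):
    """For a fixed saturation m, fit tm and dt by OLS on ln(n / (m - n))."""
    y = [math.log(n / (m - n)) for n in values]
    t_mean = sum(times) / len(times)
    y_mean = sum(y) / len(y)
    slope = (sum((t - t_mean) * (yi - y_mean) for t, yi in zip(times, y))
             / sum((t - t_mean) ** 2 for t in times))
    if slope <= 0:
        raise ValueError("data must be increasing for a growth curve")
    intercept = y_mean - slope * t_mean
    return -intercept / slope, LN81 / slope      # midpoint, growth time

def sum_squared_errors(times, values, m, tm, dt):
    """Misfit measured on the original (untransformed) scale."""
    return sum((n - logistic3(t, m, tm, dt)) ** 2 for t, n in zip(times, values))

def scan_saturation(times, values, candidates):
    """Try each candidate saturation and keep the best-fitting parameter set."""
    best = None
    for m in candidates:
        if m <= max(values):
            continue                              # saturation must exceed the data
        tm, dt = linear_fit(times, values, m)
        err = sum_squared_errors(times, values, m, tm, dt)
        if best is None or err < best[0]:
            best = (err, m, tm, dt)
    return best

# Hypothetical noise-free series generated with M = 100, tm = 10, dt = 8.
times = list(range(15))
values = [logistic3(t, 100.0, 10.0, 8.0) for t in times]
err, m, tm, dt = scan_saturation(times, values, range(91, 201))
print(f"M = {m}, midpoint = {tm:.2f}, growth time = {dt:.2f}")
```

With real, noisy data the scan would normally be repeated on a finer grid around the best candidate, stopping once the misfit no longer improves noticeably.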
Example
Available historical data show how the number of mobile phone devices grew worldwide from 0.02 million in 1980 to 5,972 million devices in 2011. By successive iterations we estimate the 3 parameters as follows :
Saturation M = 9,100 million.
Midpoint tₘ = 2009.
Growth time Δt = 15 years.
The calculator returns the S-Curve values that are represented graphically by the chart :
The fit between the observed and the model data is good. As regards the forecast, we can consider that, provided the future behaves like the past, which may or may not be the case, growth is already decelerating and saturation should be reached by the early 2020s.
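The three estimates can be plugged into the curve to see what they imply. The sketch below assumes the 3-parameter logistic form with the constant ln 81 (so that Δt spans the 10% to 90% interval); the function names are illustrative, and the only observed figure used is the 2011 value quoted in the example.

```python
import math

M = 9100.0              # saturation, million devices (from the example)
T_MID = 2009.0          # midpoint year (from the example)
DELTA_T = 15.0          # years to grow from 10% to 90% (from the example)
OBSERVED_2011 = 5972.0  # million devices observed in 2011 (from the example)

def devices(year):
    """Model value, in million devices, for a given year."""
    return M / (1 + math.exp(-math.log(81) / DELTA_T * (year - T_MID)))

def year_reaching(share):
    """Year at which the curve reaches a given share (0 < share < 1) of M."""
    return T_MID + DELTA_T / math.log(81) * math.log(share / (1 - share))

model_2011 = devices(2011)
print(f"model 2011    : {model_2011:,.0f} million")
print(f"observed 2011 : {OBSERVED_2011:,.0f} million")
print(f"gap           : {(model_2011 - OBSERVED_2011) / OBSERVED_2011:+.1%}")
for share in (0.10, 0.50, 0.90, 0.99):
    print(f"{share:4.0%} of saturation reached in {year_reaching(share):.1f}")
```

Under these assumptions the model lands within a few percent of the 2011 observation, passes 90% of saturation in the mid-2010s and 99% in the first half of the 2020s.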
Bi-logistic (double S-curve) and multi-logistic regression
Up to this point, we considered the logistic growth of one single process, from its inception, through its exponential rise to the midpoint, and the ensuing deceleration towards the saturation plateau.
However, many growth processes may be more complex. We can think of some familiar instances, such as the growth of the market for mobile phones. The single logistic regression provides a good fit to the actual data until 2016. Then, an upward surge suggests a fresh exponential thrust, maybe the take-off of a sequential S-curve, likely caused by the market success of the new gadget-loaded smartphones. Sequential S-curves are the dream of all product managers, who crave extended life cycles for the products under their responsibility.
Another case of multiple growth is illustrated by the spread of an epidemic. When the rate of new infections seems to slow down and approach its asymptote, suggesting that the situation is coming under control, a new surge triggers an upturn in the number of contagions, signaling a further dissemination of the disease. The situation may be prompted by a mutation of the pathogen strain, by inefficient efforts to contain the disease, by the collapse of the health care system, by other factors, or by a combination thereof. The chart of the Covid-19 pandemic cases in Switzerland shows such an upturn by mid-June 2020.
We can analyze such growth cases by building a model consisting of the sum of two 3-parameter logistic curves. The parameters for each component logistic curve may be estimated from the time series, as suggested earlier for the case of a single logistic function:
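A sum of two 3-parameter logistic curves can be sketched as follows. The component form (with the constant ln 81) is assumed, and the parameter values are hypothetical, chosen only so that the second wave starts well after the first has flattened; they are not estimates for any real series.

```python
import math

def logistic3(t, m, tm, dt):
    """3-parameter logistic: saturation m, midpoint tm, 10%-90% growth time dt."""
    return m / (1 + math.exp(-math.log(81) / dt * (t - tm)))

def bi_logistic(t, first, second):
    """Sum of two logistic waves; each wave is a (m, tm, dt) tuple."""
    return logistic3(t, *first) + logistic3(t, *second)

# Hypothetical waves: (saturation, midpoint, growth time).
first_wave = (1000.0, 20.0, 10.0)
second_wave = (500.0, 60.0, 12.0)

for t in range(0, 101, 10):
    total = bi_logistic(t, first_wave, second_wave)
    print(f"t = {t:3d}  total = {total:8.1f}")
```

The printed series climbs towards the first plateau near 1,000, stalls, then rises again towards the combined saturation of 1,500, which is the double-S pattern described above.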
For even more complex cases, where we detect several waves of change, we can extend the method to a multi-logistic function, by adding the appropriate number of single logistic functions, say three or four, to achieve the best fit of the curve to the actual data. The solution is given by the equation:
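Assuming each component keeps the 3-parameter form, with its own saturation, midpoint and growth time, the multi-logistic function can be written as the sum below. The index i and the number of components k are notation introduced here for illustration; with k = 2 the expression reduces to the bi-logistic case.

```latex
n_t = \sum_{i=1}^{k} \frac{M_i}{1 + \exp\!\left[-\dfrac{\ln 81}{\Delta t_i}\,(t - t_{m,i})\right]}
```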
The estimation of the different sets of 3 parameters becomes more cumbersome and therefore requires dedicated tools.