JonasMoss/kdensity

An R package for kernel density estimation with parametric starts and asymmetric kernels.

kdensity

An R package for univariate kernel density estimation with parametric
starts and asymmetric kernels.

News

kdensity is now linked to univariateML, meaning it supports the more
than 30 parametric starts from that package!

Overview

kdensity is an implementation of univariate kernel density estimation
with support for parametric starts and asymmetric kernels. Its main
function is kdensity, which has approximately the same syntax as
stats::density. Its new functionality is:

  • kdensity has built-in support for many parametric starts, such
    as normal and gamma, but you can also supply your own. For a
    list of supported parametric starts, see the readme of
    univariateML.
  • It supports several asymmetric kernels, such as the gcopula and
    gamma kernels, as well as the common symmetric ones. In addition,
    you can supply your own kernels.
  • It offers a selection of choices for the bandwidth function bw,
    again including an option to specify your own.
  • The returned value is a density function. This can be used for
    e.g. numerical integration, numerical differentiation, and point
    evaluations.

A reason to use kdensity is to avoid boundary bias when estimating
densities on the unit interval or the positive half-line. Asymmetric
kernels such as gamma and gcopula are designed for this purpose. The
support for parametric starts allows you to easily use a method that is
often superior to ordinary kernel density estimation.
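
To make the boundary bias concrete, the following base R sketch applies stats::density (not kdensity) to simulated exponential data, whose true density lives on the positive half-line. The simulated data, the seed, and the variable names leaked and at_zero are assumptions chosen purely for illustration.

```r
set.seed(1)
x <- rexp(500)   # simulated data; the true density is supported on [0, Inf)
d <- density(x)  # ordinary estimate with the default Gaussian kernel

# A Gaussian kernel estimate is an equal-weight mixture of N(x_i, bw^2)
# densities, so the probability it assigns to the impossible region x < 0
# can be computed exactly from the normal distribution function.
leaked <- mean(pnorm(0, mean = x, sd = d$bw))
leaked

# The true density at 0 is dexp(0) = 1. The kernel estimate at 0 typically
# falls well short of that, because part of the mass is smoothed below 0.
at_zero <- mean(dnorm(0, mean = x, sd = d$bw))
at_zero
```

A strictly positive value of leaked is the boundary bias in action: the symmetric kernel places probability where no data can occur, which in turn depresses the estimate just inside the boundary.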

Several R packages deal with kernel estimation. For an overview, see
Deng & Wickham (2011). While no other R package handles density
estimation with parametric starts, several packages support methods that
handle boundary bias. evmix provides a variety of boundary bias
correction methods in the bckden function. kde1d corrects for boundary
bias using transformed univariate local polynomial kernel density
estimation. logKDE corrects for boundary bias on the half line using a
logarithmic transform. ks supports boundary correction through the
kde.boundary function, while Ake corrects for boundary bias using
tailored kernel functions.

Installation

From inside R, use one of the following commands:

# For the CRAN release
install.packages("kdensity")
# For the development version from GitHub:
# install.packages("devtools")
devtools::install_github("JonasMoss/kdensity")

Usage Example

Load the package with library and use kdensity just like stats::density,
but with optional additional arguments.

library("kdensity")
plot(kdensity(mtcars$mpg, start = "normal"))

Description

Kernel density estimation with a parametric start was introduced by
Hjort and Glad in Nonparametric Density Estimation with a Parametric
Start (1995). The idea is to start out with a parametric density before
you do your kernel
density estimation, so that your actual kernel density estimation will
be a correction to the original parametric estimate. The resulting
estimator will outperform the ordinary kernel density estimator in terms
of asymptotic integrated mean squared error whenever the true density is
close to your suggestion; and the estimator can be superior to the
ordinary kernel density estimator even when the suggestion is pretty far
off.
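
The idea can be sketched in a few lines of base R. The helper hjort_glad below is hypothetical and is not how kdensity is implemented; it assumes a normal start fitted by maximum likelihood and a Gaussian kernel, and it skips the normalization step. The estimate is the fitted parametric density multiplied by a kernel-smoothed correction factor.

```r
# hjort_glad() is a hypothetical helper written for illustration only.
hjort_glad <- function(data, bw) {
  # Step 1: fit the parametric start (here a normal) by maximum likelihood.
  mu    <- mean(data)
  sigma <- sqrt(mean((data - mu)^2))  # ML estimate: divides by n, not n - 1
  start <- function(x) dnorm(x, mean = mu, sd = sigma)

  # Step 2: return start(x) times the kernel-smoothed correction
  # (1 / n) * sum_i K_h(x - X_i) / start(X_i).
  function(x) {
    vapply(x, function(x0) {
      start(x0) * mean(dnorm((x0 - data) / bw) / bw / start(data))
    }, numeric(1))
  }
}

set.seed(1)
x    <- rnorm(200, mean = 5, sd = 2)   # simulated data
fhat <- hjort_glad(x, bw = bw.nrd0(x))

fhat(c(3, 5, 7))  # point evaluations
# Without normalization the total mass is close to, but not exactly, 1.
total <- integrate(fhat, lower = min(x) - 10, upper = max(x) + 10)$value
total
```

When the start is a good guess, the correction factor is close to 1 everywhere, so the kernel step only has to smooth a nearly flat function; that is the intuition behind the improvement described above.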

In addition to parametric starts, the package implements some
asymmetric kernels. These kernels are useful when modelling data with
sharp boundaries, such as data supported on the positive half-line or
the unit interval. The asymmetric kernels currently supported include
gamma and gcopula.

These features can be combined to make asymmetric kernel density
estimators with parametric starts; see the example below. The package
contains only one function, kdensity, in addition to the generics
plot, points, lines, summary, and print.

Usage

The function kdensity takes some data, a kernel (argument kernel) and a
parametric start (argument start). You can optionally specify the
support argument, which is used to find the normalizing constant.
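
To illustrate why a normalizing constant over the support is needed, here is a base R sketch of a gamma-kernel estimator in the style of Chen (2000). The helper gamma_kde and the smoothing value b = 0.2 are hypothetical choices for illustration and need not match the kernel as implemented in kdensity. The raw estimate does not integrate to exactly 1 over the positive half-line, so it is divided by its integral over the support.

```r
# gamma_kde() is a hypothetical helper, not kdensity's implementation.
# At each point x0 it averages gamma densities with shape x0 / b + 1 and
# scale b, evaluated at the observations.
gamma_kde <- function(data, b) {
  function(x) {
    vapply(x, function(x0) {
      mean(dgamma(data, shape = x0 / b + 1, scale = b))
    }, numeric(1))
  }
}

set.seed(1)
x   <- rgamma(300, shape = 2, rate = 1)  # simulated positive data
raw <- gamma_kde(x, b = 0.2)

# Normalizing constant: the integral of the raw estimate over the support.
const      <- integrate(raw, lower = 0, upper = Inf)$value
normalized <- function(t) raw(t) / const

const
integrate(normalized, lower = 0, upper = Inf)$value
```

Changing the integration limits changes const, which is why the support has to be known before the estimate can be normalized.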

The following example uses the airquality data set. The black curve is a
gamma-kernel density estimate with a gamma start, the red curve a fully
parametric gamma density, and the blue curve an ordinary density
estimate. Notice the boundary bias of the ordinary density estimator.
The underlying parameter estimates are always maximum likelihood.

library("kdensity")
kde = kdensity(airquality$Wind, start = "gamma", kernel = "gamma")
plot(kde, main = "Wind speed (mph)")
lines(kde, plot_start = TRUE, col = "red")
lines(density(airquality$Wind, adjust = 2), col = "blue")
rug(airquality$Wind)

Since the return value of kdensity is a function, kde is callable
and can be used as any density function in R (such as stats::dnorm).
For example, you can do:

kde(10)
#> [1] 0.09980471
integrate(kde, lower = 0, upper = 1) # The cumulative distribution up to 1.
#> 1.27532e-05 with absolute error < 2.2e-19

You can access the parameter estimates by using coef. You can also
access the log likelihood (logLik), AIC and BIC of the parametric
start distribution.

coef(kde)
#> Maximum likelihood estimates for the Gamma model 
#>  shape    rate  
#> 7.1873  0.7218
logLik(kde)
#> 'log Lik.' 12.33787 (df=2)
AIC(kde)
#> [1] -20.67574
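
As a sanity check, the reported AIC can be reproduced by hand from the log likelihood above with the standard formulas AIC = 2k - 2 logLik and BIC = k log(n) - 2 logLik. The variable names below are illustrative, and the log-likelihood value is copied from the output above rather than recomputed.

```r
loglik <- 12.33787                 # value reported by logLik(kde) above
k      <- 2                        # estimated parameters: shape and rate
n      <- length(airquality$Wind)  # sample size

aic <- 2 * k - 2 * loglik          # -20.67574, matching AIC(kde) above
bic <- log(n) * k - 2 * loglik     # BIC uses log(n) in place of 2 as penalty

aic
bic
```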

How to Contribute or Get Help

If you encounter a bug, have a feature request or need some help, open a
GitHub issue. Create a pull request to contribute. This project follows
a Contributor Code of Conduct.

Created February 7, 2018
Updated March 4, 2025