Reference¶
This page documents the public spur-python API.
Overview¶
| Function | Description |
|---|---|
spur() |
Run the full SPUR workflow and return diagnostics plus levels and transformed branches |
spurtest() |
Wrapper to run one of the four SPUR diagnostic tests |
spurtest_i0() |
Variable-level I(0) test |
spurtest_i1() |
Variable-level I(1) test |
spurtest_i0resid() |
Residual-based I(0) test for a regression formula |
spurtest_i1resid() |
Residual-based I(1) test for a regression formula |
spurtransform() |
Transform all variables referenced in a formula |
spurhalflife() |
Estimate a spatial half-life confidence interval |
load_chetty_data() |
Load the packaged Chetty dataset |
standardize() |
Standardize selected columns in a DataFrame |
Conventions¶
Coordinates¶
Functions that require spatial coordinates accept exactly one of:
lonandlatfor geographic coordinates in degreescoords_euclideanfor planar coordinates
Do not pass both specifications in the same call.
Simulation Controls¶
The SPUR tests and half-life interval are simulation-based.
q: number of low-frequency weights used in the statistic; must satisfy1 <= q < nnrep: number of Monte Carlo draws used to simulate the reference distributionseed: random seed used for reproducibility
Formula Inputs¶
- Variable-level diagnostics accept either a bare variable name such as
"am"or a one-variable formula such as"am ~ 1" - Residual diagnostics, transformation, and the full pipeline accept a
two-sided formula such as
"am ~ gini + fracblack"
Full Workflow¶
spur()¶
Run the full SPUR workflow.
Signature
spur(
formula,
data,
*,
lon=None,
lat=None,
coords_euclidean=None,
q=15,
nrep=100000,
seed=42,
avc=0.03,
uncond=False,
cvs=False,
)
Parameters
formula(str): two-sided regression formula. The left-hand side is the dependent variable. The right-hand side defines the regressors used in both the levels and transformed branches.data(pandas.DataFrame): input data containing the variables referenced informulaand the coordinate columns.lon,lat(str | None): longitude and latitude column names for geographic distance calculations.coords_euclidean(Sequence[str] | None): Euclidean coordinate column names. Use instead oflonandlat.q,nrep,seed: SPUR diagnostic simulation controls.avc(float): upper bound on average pairwise correlation passed through toscpc().uncond(bool): passed through toscpc(). IfTrue, use the unconditional SCPC branch.cvs(bool): passed through toscpc(). IfTrue, store additional critical values in the SCPC result.
Returns
PipelineResult: full pipeline output.tests:Testsobject containingi0,i1,i0resid, andi1residfits:Fitsobject containinglevelsandtransformed- each fit is a
RegressionResultwithmodelandscpc
Diagnostics¶
spurtest()¶
Dispatch to one of the four public SPUR tests.
Signature
spurtest(
formula,
data,
*,
test,
lon=None,
lat=None,
coords_euclidean=None,
q=15,
nrep=100000,
seed=42,
)
Parameters
formula(str): variable name or formula, depending ontest.- for
test="i0"andtest="i1":"am"or"am ~ 1" - for
test="i0resid"andtest="i1resid": two-sided regression formula data(pandas.DataFrame): input data.test(Literal["i0", "i1", "i0resid", "i1resid"]): diagnostic to run.lon,lat,coords_euclidean,q,nrep,seed: as above.
Returns
TestResult
spurtest_i0()¶
Variable-level I(0) test.
Signature
spurtest_i0(
var,
data,
*,
lon=None,
lat=None,
coords_euclidean=None,
q=15,
nrep=100000,
seed=42,
)
Parameters
var(str): variable to test.data(pandas.DataFrame): input data containingvarand the coordinate columns.lon,lat,coords_euclidean,q,nrep,seed: as above.
Returns
TestResulttest_typeLRpvaluecvha_param
spurtest_i1()¶
Variable-level I(1) test.
Signature
spurtest_i1(
var,
data,
*,
lon=None,
lat=None,
coords_euclidean=None,
q=15,
nrep=100000,
seed=42,
)
Parameters
var(str): variable to test.data(pandas.DataFrame): input data containingvarand the coordinate columns.lon,lat,coords_euclidean,q,nrep,seed: as above.
Returns
TestResult
spurtest_i0resid()¶
Residual-based I(0) test for a regression formula.
Signature
spurtest_i0resid(
formula,
data,
*,
lon=None,
lat=None,
coords_euclidean=None,
q=15,
nrep=100000,
seed=42,
)
Parameters
formula(str): two-sided regression formula defining the residual process to be tested.data(pandas.DataFrame): input data.lon,lat,coords_euclidean,q,nrep,seed: as above.
Returns
TestResult
spurtest_i1resid()¶
Residual-based I(1) test for a regression formula.
Signature
spurtest_i1resid(
formula,
data,
*,
lon=None,
lat=None,
coords_euclidean=None,
q=15,
nrep=100000,
seed=42,
)
Parameters
formula(str): two-sided regression formula defining the residual process to be tested.data(pandas.DataFrame): input data.lon,lat,coords_euclidean,q,nrep,seed: as above.
Returns
TestResult
Transformation¶
spurtransform()¶
Transform all variables referenced in a formula and append the transformed columns to a copy of the input data.
Signature
spurtransform(
formula,
data,
*,
lon=None,
lat=None,
coords_euclidean=None,
prefix="h_",
transformation="lbmgls",
radius=None,
clustvar=None,
)
Parameters
formula(str): formula identifying the variables to transform. Every variable referenced in the formula is transformed.data(pandas.DataFrame): input data.lon,lat,coords_euclidean: coordinate specification for distance-based transformations.prefix(str): prefix applied to transformed variable names. Default:h_.transformation(str): transformation mode."lbmgls": GLS-style transformation; default branch used byspur()"nn": nearest-neighbor differencing"iso": radius-based differencing"cluster": within-cluster demeaningradius(float | None): radius used by the isotropic transformation; required whentransformation="iso".clustvar(str | None): cluster label column; required whentransformation="cluster".
Returns
pandas.DataFrame: copy of the input data with transformed columns appended
Persistence¶
spurhalflife()¶
Estimate a confidence interval for the spatial half-life of a variable.
Signature
spurhalflife(
var,
data,
*,
lon=None,
lat=None,
coords_euclidean=None,
q=15,
nrep=100000,
level=95,
normdist=False,
seed=42,
)
Parameters
var(str): variable to analyze.data(pandas.DataFrame): input data containingvarand the coordinate columns.lon,lat,coords_euclidean,q,nrep,seed: as above.level(float): confidence level in percent.normdist(bool): ifTrue, report the interval as a fraction of the maximum pairwise distance; otherwise report it in distance units.
Returns
HalfLifeResultci_lowerci_uppermax_distlevelnormdist
Helpers¶
load_chetty_data()¶
Load the packaged Chetty dataset.
Signature
load_chetty_data()
Returns
pandas.DataFrame
standardize()¶
Standardize selected columns and return a new DataFrame.
Signature
standardize(
df,
vars,
appendix=None,
)
Parameters
df(pandas.DataFrame): input data.vars(list[str]): columns to standardize.appendix(str | None): suffix applied to the standardized columns. IfNone, overwrite the original columns.
Returns
pandas.DataFrame: copy of the input data with standardized columns
Return Objects¶
TestResult¶
Returned by spurtest(), spurtest_i0(), spurtest_i1(),
spurtest_i0resid(), and spurtest_i1resid().
Fields
test_typeLRpvaluecvha_param
Methods
summary()
HalfLifeResult¶
Returned by spurhalflife().
Fields
ci_lowerci_uppermax_distlevelnormdist
Methods
summary()
PipelineResult¶
Returned by spur().
Fields
testsfits
tests contains:
i0i1i0residi1resid
fits contains:
levelstransformed
Each fit is a RegressionResult with:
modelscpc
Methods
summary()