fet.explorer

Features explorer.

class fet.explorer.Explorer(y=None)

Bases: object

Dataset explorer.

Parameters

y (str, optional) – Target/dependent variable. Defaults to None.

boxenplot(cols=None, return_fig=False, **kwargs)

Plot boxen plots (enhanced box plots).

Parameters
  • cols (list, optional) – List of columns to include. Defaults to None - which includes all feature columns.

  • return_fig (bool, optional) – Return matplotlib figure. Defaults to False.

  • **kwargs – Other keyword arguments for seaborn.boxenplot.

Returns

Returns figure if return_fig=True.

Return type

matplotlib.figure.Figure

boxplot(cols=None, return_fig=False, **kwargs)

Plot box plots.

Parameters
  • cols (list, optional) – List of columns to include. Defaults to None - which includes all feature columns.

  • return_fig (bool, optional) – Return matplotlib figure. Defaults to False.

  • **kwargs – Other keyword arguments for seaborn.boxplot.

Returns

Returns figure if return_fig=True.

Return type

matplotlib.figure.Figure

correlated_features(threshold=0.95)

Display correlated features and their correlation coefficient.

Parameters

threshold (float, optional) – Absolute correlation threshold. Defaults to 0.95.
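To make the role of the threshold concrete, here is a minimal pure-Python sketch of flagging feature pairs by absolute Pearson correlation. It is not the library's implementation: the helper names pearson and correlated_pairs, the toy data, and the use of >= rather than > for the comparison are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (a constant column has zero
    variance and would cause a division by zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.95):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    `columns` maps a feature name to its list of values. Comparing with
    >= (rather than >) is an assumption of this sketch.
    """
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, r))
    return pairs


# Hypothetical toy data: "b" is an exact negative multiple of "a".
data = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [-2.0, -4.0, -6.0, -8.0],
    "c": [1.0, -1.0, 1.0, -1.0],
}
pairs = correlated_pairs(data)
```

Because the threshold applies to the absolute value, the strongly negative pair ("a", "b") is reported just as a strongly positive pair would be.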

correlation_matrix()

Plot correlation matrix of feature columns.

ecdfplot(cols=None, return_fig=False, **kwargs)

Plot empirical cumulative distribution functions.

Parameters
  • cols (list, optional) – List of columns to include. Defaults to None - which includes all feature columns.

  • return_fig (bool, optional) – Return matplotlib figure. Defaults to False.

  • **kwargs – Other keyword arguments for seaborn.ecdfplot.

Returns

Returns figure if return_fig=True.

Return type

matplotlib.figure.Figure
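For readers unfamiliar with the quantity being plotted, the following sketch computes an empirical cumulative distribution function by hand. It only illustrates the definition (the fraction of observations less than or equal to x); the helper name ecdf and the sample values are hypothetical, and the method itself delegates the drawing to seaborn.ecdfplot.

```python
from bisect import bisect_right


def ecdf(values):
    """Return a function F where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return F


# Hypothetical sample of a single feature column.
F = ecdf([3, 1, 2, 2])
```

The resulting step function rises by 1/n at each observation, so repeated values produce a larger jump.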

feature_importances(clf=None)

Evaluate feature importances using a classifier.

Parameters

clf (object, optional) – Instantiated classifier. Defaults to None - which uses ExtraTreesClassifier.

Raises

ValueError – If no target variable is set, since there is nothing to classify.

Returns

Sorted list of tuples (feature, importance).

Return type

list

feature_scores(score_func=None)

Evaluate feature scores using a scoring function.

Parameters

score_func (callable, optional) – Scoring function. Defaults to None - which uses f_classif.

Raises

ValueError – If no target variable is set, since there is nothing to classify.

Returns

Sorted list of tuples (feature, score).

Return type

list
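The default scoring function, f_classif, scores each feature by its one-way ANOVA F-value across the target classes. The sketch below computes that statistic in plain Python to show what a high score means; it is an illustration rather than the library's code. The helper name anova_f, the toy data, and the descending sort order of the result are assumptions.

```python
def anova_f(values, labels):
    """One-way ANOVA F-statistic of one feature across the classes in labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    n = len(values)
    k = len(groups)
    grand_mean = sum(values) / n

    # Between-class sum of squares: how far class means sit from the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Within-class sum of squares: spread of values around their class mean.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical toy data: two classes, one well-separated and one noisy feature.
y = [0, 0, 0, 1, 1, 1]
features = {
    "noisy": [1, 5, 3, 2, 4, 6],
    "separated": [1, 2, 3, 7, 8, 9],
}

# Sorted list of (feature, score) tuples; descending order is assumed here.
scores = sorted(
    ((name, anova_f(col, y)) for name, col in features.items()),
    key=lambda item: item[1],
    reverse=True,
)
```

A feature whose class means differ widely relative to its within-class spread receives a large F-value, which is why "separated" ranks first here.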

fit(df, remove_low_variance=True, module='pstats')

Fit DataFrame to Explorer.

Parameters
  • df (pandas.DataFrame) – DataFrame to explore.

  • remove_low_variance (bool, optional) – Remove low variance features. Defaults to True.

  • module (str, optional) – Feature extraction module. Defaults to 'pstats'.

histplot(cols=None, return_fig=False, **kwargs)

Plot univariate histograms.

Parameters
  • cols (list, optional) – List of columns to include. Defaults to None - which includes all feature columns.

  • return_fig (bool, optional) – Return matplotlib figure. Defaults to False.

  • **kwargs – Other keyword arguments for seaborn.histplot.

Returns

Returns figure if return_fig=True.

Return type

matplotlib.figure.Figure

kbest(k, score_func=None)

Select the k highest-scoring features according to a scoring function.

Parameters
  • k (int) – Number of top features to select.

  • score_func (callable, optional) – Scoring function. Defaults to None - which uses f_classif.

Raises

ValueError – If no target variable is set, since there is nothing to classify.

Returns

Unsorted list of k best features.

Return type

list
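As a rough illustration of top-k selection, the sketch below picks the k highest-scoring features from precomputed scores and returns them without ranking them. The helper name k_best and the score values are hypothetical, and reading "unsorted" as "kept in original column order" is an assumption.

```python
def k_best(scores, k):
    """Return the k highest-scoring feature names, kept in column order.

    `scores` maps a feature name to its score. The result is not ordered
    by score, matching the idea of an unsorted selection.
    """
    top = set(sorted(scores, key=scores.get, reverse=True)[:k])
    return [name for name in scores if name in top]


# Hypothetical precomputed scores, listed in column order.
scores = {"f1": 0.4, "f2": 12.5, "f3": 3.1, "f4": 7.8}
best = k_best(scores, 2)
```

If the ranking itself matters, the sorted (feature, score) list from feature_scores is the better starting point.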

kdeplot(cols=None, return_fig=False, **kwargs)

Plot univariate kernel density estimations.

Parameters
  • cols (list, optional) – List of columns to include. Defaults to None - which includes all feature columns.

  • return_fig (bool, optional) – Return matplotlib figure. Defaults to False.

  • **kwargs – Other keyword arguments for seaborn.kdeplot.

Returns

Returns figure if return_fig=True.

Return type

matplotlib.figure.Figure

pairplot(cols)

Plot pairwise relationships between feature columns (and the target variable if present).

Parameters

cols (list) – List of columns to include.

plot_feature_importances(clf=None)

Plot feature importances using a classifier.

Parameters

clf (object, optional) – Instantiated classifier. Defaults to None - which uses ExtraTreesClassifier.

plot_feature_scores(clf=None)

Plot feature scores using scoring function.

Parameters

clf (object, optional) – Instantiated classifier. Defaults to None - which uses ExtraTreesClassifier.

plot_pca()

Plot scatterplot of the first two principal components.

remove_features(cols)

Remove features from feature vector.

Parameters

cols (list) – List of column names to remove.

remove_low_variance(threshold=0)

Remove low variance features.

Parameters

threshold (int, optional) – Variance threshold. Defaults to 0.
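The effect of the default threshold of 0 can be sketched as follows: only features that never vary are dropped. This is an illustration under stated assumptions, not the library's implementation. The helper name drop_low_variance and the toy data are hypothetical, and the choices of population variance and a strict "greater than" comparison are assumed.

```python
from statistics import pvariance


def drop_low_variance(columns, threshold=0):
    """Keep only the columns whose population variance exceeds threshold.

    `columns` maps a feature name to its list of values.
    """
    return {
        name: values
        for name, values in columns.items()
        if pvariance(values) > threshold
    }


# Hypothetical toy data: one constant column and one that varies.
data = {"constant": [5, 5, 5, 5], "varied": [1, 2, 3, 4]}
kept = drop_low_variance(data)
```

Raising the threshold also removes near-constant features; in this toy data, "varied" has a population variance of 1.25 and would be dropped at a threshold of 2.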

stripplot(cols=None, return_fig=False, **kwargs)

Plot strip plots (categorical scatter plots).

Parameters
  • cols (list, optional) – List of columns to include. Defaults to None - which includes all feature columns.

  • return_fig (bool, optional) – Return matplotlib figure. Defaults to False.

  • **kwargs – Other keyword arguments for seaborn.stripplot.

Returns

Returns figure if return_fig=True.

Return type

matplotlib.figure.Figure

violinplot(cols=None, return_fig=False, **kwargs)

Plot violin plots - a combination of a box plot and a kernel density estimate.

Parameters
  • cols (list, optional) – List of columns to include. Defaults to None - which includes all feature columns.

  • return_fig (bool, optional) – Return matplotlib figure. Defaults to False.

  • **kwargs – Other keyword arguments for seaborn.violinplot.

Returns

Returns figure if return_fig=True.

Return type

matplotlib.figure.Figure