fet.explorer
Features explorer.
-
class fet.explorer.Explorer(y=None)
Bases: object
Dataset explorer.
- Parameters
y (str, optional) – Target/dependent variable. Defaults to None.
-
boxenplot(cols=None, return_fig=False, **kwargs)
Plot boxen plots (enhanced box plots).
- Parameters
cols (list, optional) – List of columns to include. Defaults to None - which includes all feature columns.
return_fig (bool, optional) – Return matplotlib figure. Defaults to False.
**kwargs – Other keyword arguments for seaborn.boxenplot.
- Returns
Returns figure if return_fig=True.
- Return type
matplotlib.figure.Figure
-
boxplot(cols=None, return_fig=False, **kwargs)
Plot box plots.
- Parameters
cols (list, optional) – List of columns to include. Defaults to None - which includes all feature columns.
return_fig (bool, optional) – Return matplotlib figure. Defaults to False.
**kwargs – Other keyword arguments for seaborn.boxplot.
- Returns
Returns figure if return_fig=True.
- Return type
matplotlib.figure.Figure
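The cols / return_fig / **kwargs pattern shared by the plotting methods can be sketched roughly as follows. This is not the library's implementation: the `boxplots` helper, the one-subplot-per-column layout, and the sample data are assumptions made for illustration. Only `seaborn.boxplot` and the matplotlib calls are real APIs.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.figure import Figure


def boxplots(df, cols=None, return_fig=False, **kwargs):
    """Hypothetical helper: one box plot per column, optionally returning the figure."""
    cols = list(df.columns) if cols is None else list(cols)
    fig, axes = plt.subplots(1, len(cols), figsize=(3 * len(cols), 3), squeeze=False)
    for ax, col in zip(axes[0], cols):
        # Extra keyword arguments are forwarded to seaborn.boxplot unchanged.
        sns.boxplot(y=df[col], ax=ax, **kwargs)
        ax.set_title(col)
    fig.tight_layout()
    if return_fig:
        return fig
    plt.show()


rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(50, 3)), columns=["f1", "f2", "f3"])
fig = boxplots(data, return_fig=True, linewidth=1)
```

Returning the figure instead of showing it is convenient when the plot needs to be saved with `fig.savefig(...)` or embedded in a report.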
Display correlated features and their correlation coefficient.
- Parameters
threshold (float, optional) – Absolute correlation threshold. Defaults to 0.95.
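To illustrate what an absolute correlation threshold means, here is a minimal pandas sketch that lists feature pairs whose absolute Pearson correlation reaches the threshold. The `correlated_pairs` helper, the use of `>=`, and the sample columns are assumptions for illustration, not the library's code.

```python
import numpy as np
import pandas as pd


def correlated_pairs(df, threshold=0.95):
    """Hypothetical helper: (feature_a, feature_b, coefficient) for strongly correlated pairs."""
    corr = df.corr()
    cols = corr.columns
    pairs = []
    # Walk the upper triangle only, so each pair is reported once.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs


rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.01, size=200),  # nearly a scaled copy of "a"
    "c": rng.normal(size=200),                      # independent noise
})
pairs = correlated_pairs(df, threshold=0.95)
```

Because the threshold is applied to the absolute value, strongly negatively correlated features are reported as well.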
-
correlation_matrix()
Plot correlation matrix of feature columns.
-
ecdfplot(cols=None, return_fig=False, **kwargs)
Plot empirical cumulative distribution functions.
- Parameters
cols (list, optional) – List of columns to include. Defaults to None - which includes all feature columns.
return_fig (bool, optional) – Return matplotlib figure. Defaults to False.
**kwargs – Other keyword arguments for seaborn.ecdfplot.
- Returns
Returns figure if return_fig=True.
- Return type
matplotlib.figure.Figure
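For readers unfamiliar with the term, an empirical cumulative distribution function gives, for each observed value, the fraction of observations less than or equal to it. A minimal NumPy sketch of that computation follows. The `ecdf` helper is hypothetical and is not how seaborn or this library computes it internally.

```python
import numpy as np


def ecdf(values):
    """Hypothetical helper: sorted values and their cumulative proportions."""
    x = np.sort(np.asarray(values, dtype=float))
    # The i-th smallest value (1-based) has cumulative proportion i / n.
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


x, y = ecdf([3.0, 1.0, 2.0, 2.0])
```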
-
feature_importances(clf=None)
Evaluate feature importances using a classifier.
- Parameters
clf (object, optional) – Instantiated classifier. Defaults to None - which uses ExtraTreesClassifier.
- Raises
ValueError – If no target variable is set, so there is nothing to classify.
- Returns
Sorted list of tuples (feature, importance).
- Return type
list
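A sorted list of (feature, importance) tuples can be produced with scikit-learn roughly as below. This is a sketch under assumptions: the synthetic data, the feature names, the estimator settings, and the descending sort order are illustrative choices, not taken from the library.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic data standing in for a fitted DataFrame (assumption).
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=2, n_redundant=0, random_state=0
)
names = [f"f{i}" for i in range(X.shape[1])]

clf = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, y)

# Pair each feature name with its importance, highest first (assumed order).
importances = sorted(
    zip(names, clf.feature_importances_), key=lambda item: item[1], reverse=True
)
```

Any classifier exposing `feature_importances_` after fitting, such as a random forest, could be substituted for `ExtraTreesClassifier` in this sketch.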
-
feature_scores(score_func=None)
Evaluate feature scores using a scoring function.
- Parameters
score_func (callable, optional) – Scoring function. Defaults to None - which uses f_classif.
- Raises
ValueError – If no target variable is set, so there is nothing to classify.
- Returns
Sorted list of tuples (feature, score).
- Return type
list
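The default scoring function named above, `f_classif`, computes an ANOVA F-statistic per feature. A minimal sketch of turning its output into a sorted list of (feature, score) tuples follows. The synthetic data, the feature names, and the descending order are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif

# Synthetic data standing in for a fitted DataFrame (assumption).
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=2, n_redundant=0, random_state=0
)
names = [f"f{i}" for i in range(X.shape[1])]

# f_classif returns two arrays: F-statistics and p-values, one entry per feature.
f_stats, p_values = f_classif(X, y)

# Highest score first (assumed order).
scores = sorted(zip(names, f_stats), key=lambda item: item[1], reverse=True)
```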
-
fit(df, remove_low_variance=True, module='pstats')
Fit DataFrame to Explorer.
- Parameters
df (pandas.DataFrame) – DataFrame to explore.
remove_low_variance (bool, optional) – Remove low variance features. Defaults to True.
module (str, optional) – Features extraction module. Defaults to 'pstats'.
-
histplot(cols=None, return_fig=False, **kwargs)
Plot univariate histograms.
- Parameters
cols (list, optional) – List of columns to include. Defaults to None - which includes all feature columns.
return_fig (bool, optional) – Return matplotlib figure. Defaults to False.
**kwargs – Other keyword arguments for seaborn.histplot.
- Returns
Returns figure if return_fig=True.
- Return type
matplotlib.figure.Figure
-
kbest(k, score_func=None)
Select the k highest-scoring features according to the scoring function.
- Parameters
k (int) – Number of top features to select.
score_func (callable, optional) – Scoring function. Defaults to None - which uses f_classif.
- Raises
ValueError – If no target variable is set, so there is nothing to classify.
- Returns
Unsorted list of k best features.
- Return type
list
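Selecting the k best features by score can be sketched with scikit-learn's `SelectKBest`. Whether the library uses this class internally is an assumption, as are the synthetic data and the feature names. Note that the result follows column order rather than score order, which matches the "unsorted" wording above.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data standing in for a fitted DataFrame (assumption).
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=2, n_redundant=0, random_state=0
)
names = [f"f{i}" for i in range(X.shape[1])]

selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)

# get_support() returns a boolean mask over the columns; keep the selected names.
best = [name for name, keep in zip(names, selector.get_support()) if keep]
```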
-
kdeplot(cols=None, return_fig=False, **kwargs)
Plot univariate kernel density estimates.
- Parameters
cols (list, optional) – List of columns to include. Defaults to None - which includes all feature columns.
return_fig (bool, optional) – Return matplotlib figure. Defaults to False.
**kwargs – Other keyword arguments for seaborn.kdeplot.
- Returns
Returns figure if return_fig=True.
- Return type
matplotlib.figure.Figure
-
pairplot(cols)
Plot pairwise plot of feature columns (and target variable if present).
- Parameters
cols (list) – List of columns to include.
-
plot_feature_importances(clf=None)
Plot feature importances using a classifier.
- Parameters
clf (object, optional) – Instantiated classifier. Defaults to None - which uses ExtraTreesClassifier.
-
plot_feature_scores(clf=None)
Plot feature scores using scoring function.
- Parameters
clf (object, optional) – Instantiated classifier. Defaults to None - which uses ExtraTreesClassifier.
-
plot_pca()
Plot scatterplot of the first two principal components.
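A scatter plot of the first two principal components can be sketched as follows. The iris dataset, the standardization step, and coloring points by class are assumptions made for illustration; the library may preprocess and style the plot differently.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Standardize first so no single feature dominates the components (assumption).
X_scaled = StandardScaler().fit_transform(X)
components = PCA(n_components=2).fit_transform(X_scaled)

fig, ax = plt.subplots()
ax.scatter(components[:, 0], components[:, 1], c=y)  # color by target class
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
```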
-
remove_features(cols)
Remove features from feature vector.
- Parameters
cols (list) – List of column names to remove.
-
remove_low_variance(threshold=0)
Remove low variance features.
- Parameters
threshold (int, optional) – Variance threshold. Defaults to 0.
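With a threshold of 0, variance filtering amounts to dropping constant columns. A minimal pandas sketch, assuming the same strictly-greater-than rule that scikit-learn's `VarianceThreshold` applies (whether the library uses that class is an assumption), with a hypothetical helper and sample data:

```python
import pandas as pd


def drop_low_variance(df, threshold=0):
    """Hypothetical helper: keep columns whose variance exceeds the threshold."""
    # Population variance (ddof=0) is what scikit-learn's VarianceThreshold computes.
    variances = df.var(ddof=0)
    return df.loc[:, variances > threshold]


df = pd.DataFrame({
    "constant": [1, 1, 1, 1],  # zero variance, carries no information
    "varied": [0, 1, 2, 3],
})
kept = drop_low_variance(df, threshold=0)
```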
-
stripplot(cols=None, return_fig=False, **kwargs)
Plot strip plots (categorical scatter plots).
- Parameters
cols (list, optional) – List of columns to include. Defaults to None - which includes all feature columns.
return_fig (bool, optional) – Return matplotlib figure. Defaults to False.
**kwargs – Other keyword arguments for seaborn.stripplot.
- Returns
Returns figure if return_fig=True.
- Return type
matplotlib.figure.Figure
-
violinplot(cols=None, return_fig=False, **kwargs)
Plot violin plots, a combination of a box plot and a kernel density estimate.
- Parameters
cols (list, optional) – List of columns to include. Defaults to None - which includes all feature columns.
return_fig (bool, optional) – Return matplotlib figure. Defaults to False.
**kwargs – Other keyword arguments for seaborn.violinplot.
- Returns
Returns figure if return_fig=True.
- Return type
matplotlib.figure.Figure