eda_report.bivariate
- class eda_report.bivariate.Dataset(data: Iterable)
Analyze two-dimensional datasets to obtain descriptive statistics and correlation information.
Input data is stored as a pandas.DataFrame in order to leverage pandas’ built-in statistical methods.
- Parameters:
data (Iterable) – The data to analyze.
Example
>>> Dataset(iris_data)
Summary Statistics for Numeric features (4)
-------------------------------------------
              count     avg  stddev  min  25%   50%  75%  max  skewness  kurtosis
sepal_length    150  5.8433  0.8281  4.3  5.1  5.80  6.4  7.9    0.3149   -0.5521
sepal_width     150  3.0573  0.4359  2.0  2.8  3.00  3.3  4.4    0.3190    0.2282
petal_length    150  3.7580  1.7653  1.0  1.6  4.35  5.1  6.9   -0.2749   -1.4021
petal_width     150  1.1993  0.7622  0.1  0.3  1.30  1.8  2.5   -0.1030   -1.3406

Summary Statistics for Categorical features (1)
-----------------------------------------------
         count  unique     top  freq  relative freq
species    150       3  setosa    50         33.33%

Pearson's Correlation (Top 20)
------------------------------
petal_length & petal_width -> very strong positive correlation (0.96)
sepal_length & petal_length -> very strong positive correlation (0.87)
sepal_length & petal_width -> very strong positive correlation (0.82)
sepal_width & petal_length -> moderate negative correlation (-0.43)
sepal_width & petal_width -> weak negative correlation (-0.37)
sepal_length & sepal_width -> very weak negative correlation (-0.12)
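The correlation listing above can be approximated with plain pandas. The following is a minimal sketch, not eda_report's actual implementation: the helper names `describe_strength` and `rank_correlations` are hypothetical, and the strength cut-offs (0.2, 0.4, 0.6, 0.8) are an assumption chosen only because they agree with the labels shown in the example output.

```python
from itertools import combinations

import pandas as pd


def describe_strength(r: float) -> str:
    """Label a correlation coefficient (assumed cut-offs, not eda_report's)."""
    size = abs(r)
    if size < 0.2:
        strength = "very weak"
    elif size < 0.4:
        strength = "weak"
    elif size < 0.6:
        strength = "moderate"
    elif size < 0.8:
        strength = "strong"
    else:
        strength = "very strong"
    direction = "positive" if r >= 0 else "negative"
    return f"{strength} {direction} correlation ({r:.2f})"


def rank_correlations(data, top: int = 20) -> list[str]:
    """Return up to `top` numeric column pairs, strongest correlation first."""
    frame = pd.DataFrame(data)
    # Only numeric columns take part in Pearson's correlation.
    numeric = frame.select_dtypes(include="number")
    corr = numeric.corr(method="pearson")
    # Each unordered pair of columns appears exactly once.
    pairs = [(a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)]
    pairs.sort(key=lambda pair: abs(pair[2]), reverse=True)
    return [f"{a} & {b} -> {describe_strength(r)}" for a, b, r in pairs[:top]]


# Toy data (made up for illustration, not the iris dataset).
toy = {
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],
    "z": [5, 4, 3, 1, 2],
    "label": ["a", "b", "a", "b", "a"],  # non-numeric, ignored
}
lines = rank_correlations(toy)
for line in lines:
    print(line)
```

Sorting by absolute value keeps strong negative relationships near the top of the list, which matches the ordering seen in the example output.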