eda_report.univariate

class eda_report.univariate.Variable(data: Iterable, *, name: str = None)

Obtain summary statistics and properties, such as data type, missing-value information and cardinality, from a one-dimensional dataset.

Parameters:
  • data (Iterable) – The data to analyze.

  • name (str, optional) – The name to assign the variable. Defaults to None.

Examples

>>> from eda_report.univariate import Variable
>>> Variable(range(1, 51), name="1 to 50")

Name: 1 to 50
Type: numeric
Non-null Observations: 50
Unique Values: 50 -> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, [...]
Missing Values: None

                  Summary Statistics
                  ------------------
        Average:                      25.5000
        Standard Deviation:           14.5774
        Minimum:                       1.0000
        Lower Quartile:               13.2500
        Median:                       25.5000
        Upper Quartile:               37.7500
        Maximum:                      50.0000
        Skewness:                      0.0000
        Kurtosis:                     -1.2000

                  Tests for Normality
                  -------------------
                               p-value Conclusion at α = 0.05
D'Agostino's K-squared test  0.0015981  Unlikely to be normal
Kolmogorov-Smirnov test      0.0000000  Unlikely to be normal
Shapiro-Wilk test            0.0580895        Possibly normal
>>> Variable(["mango", "apple", "pear", "mango", "pear", "mango"], name="fruits")

Name: fruits
Type: categorical
Non-null Observations: 6
Unique Values: 3 -> ['apple', 'mango', 'pear']
Missing Values: None
Mode (Most frequent): mango
Maximum frequency: 3

                Most Common Items
                -----------------
                   mango: 3 (50.00%)
                    pear: 2 (33.33%)
                   apple: 1 (16.67%)
>>> import pandas as pd
>>> dt = pd.date_range("2022-03-08", periods=20, freq="D")
>>> Variable(dt, name="dttm")

Name: dttm
Type: datetime
Non-null Observations: 20
Unique Values: 20 -> [Timestamp('2022-03-08 00:00:00'), [...]
Missing Values: None

                  Summary Statistics
                  ------------------
        Average:              2022-03-17 12:00:00
        Minimum:              2022-03-08 00:00:00
        Lower Quartile:       2022-03-12 18:00:00
        Median:               2022-03-17 12:00:00
        Upper Quartile:       2022-03-22 06:00:00
        Maximum:              2022-03-27 00:00:00
missing

The number of missing values, in the form "number (% of total count)", e.g. “4 (16.67%)”.

Type:

str
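
The "number (% of total count)" format described above can be reproduced in plain Python. The sketch below is only an illustration of the string layout: the helper name `format_missing` is hypothetical and is not part of eda_report, and treating `None` as the missing marker is an assumption made for the example.

```python
def format_missing(num_missing: int, total: int) -> str:
    """Render a count as 'number (% of total count)'.

    Hypothetical helper for illustration; assumes total > 0.
    """
    return f"{num_missing} ({num_missing / total:.2%})"


# 24 observations, 4 of which are missing (None is the assumed marker).
values = list(range(20)) + [None] * 4
num_missing = sum(value is None for value in values)

summary = format_missing(num_missing, len(values))
print(summary)
```

With 4 missing values out of 24, the result matches the “4 (16.67%)” example given in the attribute description.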

name

The variable’s name. If no name is specified, the name will be set to the value of the name attribute of the input data, or None.

Type:

str

num_unique

The number of unique values present in the variable.

Type:

int

rename(name: str) → None

Update the variable’s name.

Parameters:

name (str) – New name.

summary_stats

Descriptive statistics for the variable.

Type:

dict
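
As a cross-check, the figures shown under “Summary Statistics” for the `range(1, 51)` example can be recomputed with the standard library. This is a sketch, not eda_report's implementation: the dictionary labels are copied from the printed report and are not necessarily the keys of `summary_stats`, and the choice of the sample standard deviation and the "inclusive" quantile method is an assumption that happens to reproduce the displayed values.

```python
import statistics

data = list(range(1, 51))

# Quartiles with linear interpolation between the closest ranks.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Labels mirror the printed report; they are illustrative, not API keys.
stats = {
    "Average": statistics.mean(data),
    "Standard Deviation": statistics.stdev(data),  # sample (n - 1) version
    "Minimum": min(data),
    "Lower Quartile": q1,
    "Median": median,
    "Upper Quartile": q3,
    "Maximum": max(data),
}

for label, value in stats.items():
    print(f"{label + ':':<22}{value:>10.4f}")
```

Rounded to four decimal places, these values agree with the report above (for example, a standard deviation of 14.5774 and quartiles of 13.25 and 37.75).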

unique_values

The unique values present in the variable.

Type:

list
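
For the "fruits" example above, the unique values, their count, and the “Most Common Items” table can be derived with `collections.Counter`. This is a minimal standard-library sketch of the same quantities; it does not claim to show how eda_report computes them, and the variable names are chosen only to echo the attribute names.

```python
from collections import Counter

fruits = ["mango", "apple", "pear", "mango", "pear", "mango"]
counts = Counter(fruits)

# Sorted unique values and their count, echoing unique_values / num_unique.
unique_values = sorted(counts)
num_unique = len(unique_values)

# Items ordered by frequency, with each share of the total as a percentage.
most_common = [
    (item, n, f"{n / len(fruits):.2%}") for item, n in counts.most_common()
]

print(unique_values, num_unique)
for item, n, share in most_common:
    print(f"{item:>8}: {n} ({share})")
```

The output lists mango, pear, and apple with shares of 50.00%, 33.33%, and 16.67%, as in the report.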

var_type

The type of variable: one of “boolean”, “categorical”, “datetime”, “numeric” or “numeric (<=10 levels)”.

Type:

str