eda_report.univariate
- class eda_report.univariate.Variable(data: Iterable, *, name: str = None)
Obtain summary statistics and properties, such as the data type, missing-value information and cardinality, from one-dimensional datasets.
- Parameters:
data (Iterable) – The data to analyze.
name (str, optional) – The name to assign to the variable. Defaults to None.
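Because data can be any Iterable, the variable's type has to be inferred from its values. The examples below show three types: numeric, categorical and datetime. The sketch uses a hypothetical infer_type helper built on the standard library only, and it is not eda_report's implementation. Its rules are assumptions: None is ignored, bool is not treated as numeric, and mixed data falls back to categorical.

```python
from datetime import date, datetime
from numbers import Number
from typing import Iterable


def infer_type(values: Iterable) -> str:
    """Hypothetical helper: classify 1-D data as numeric, datetime or categorical.

    Assumed rules (not eda_report's): None counts as missing and is ignored,
    bool is not numeric, and anything not uniformly numeric or datetime
    is reported as categorical.
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "categorical"
    # datetime is a subclass of date, so this check covers both
    if all(isinstance(v, date) for v in non_null):
        return "datetime"
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in non_null):
        return "numeric"
    return "categorical"


print(infer_type(range(1, 51)))                  # numeric
print(infer_type(["mango", "apple", "pear"]))    # categorical
print(infer_type([datetime(2022, 3, 8), None]))  # datetime
```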
Examples
>>> from eda_report.univariate import Variable
>>> Variable(range(1, 51), name="1 to 50")
Name: 1 to 50
Type: numeric
Non-null Observations: 50
Unique Values: 50 -> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, [...]
Missing Values: None
Summary Statistics
------------------
Average: 25.5000
Standard Deviation: 14.5774
Minimum: 1.0000
Lower Quartile: 13.2500
Median: 25.5000
Upper Quartile: 37.7500
Maximum: 50.0000
Skewness: 0.0000
Kurtosis: -1.2000
Tests for Normality
-------------------
                            p-value    Conclusion at α = 0.05
D'Agostino's K-squared test 0.0015981  Unlikely to be normal
Kolmogorov-Smirnov test     0.0000000  Unlikely to be normal
Shapiro-Wilk test           0.0580895  Possibly normal
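The standard library can reproduce the summary statistics in the numeric example. This relies on a few assumptions: the standard deviation uses the sample (n - 1) denominator, the quartiles use linear interpolation, and skewness and kurtosis use the bias-adjusted estimators. numeric_summary is a hypothetical helper, and eda_report may compute these values differently. The normality tests are left out because they need SciPy.

```python
import math
import statistics
from typing import Dict, Iterable


def numeric_summary(values: Iterable[float]) -> Dict[str, float]:
    """Hypothetical helper mirroring the "Summary Statistics" block above."""
    data = sorted(float(v) for v in values)
    n = len(data)  # the bias-adjusted estimators below need n > 3
    mean = statistics.fmean(data)
    # Quartiles by linear interpolation between order statistics
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # Biased central moments
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    g1 = m3 / m2 ** 1.5    # biased skewness
    g2 = m4 / m2 ** 2 - 3  # biased excess kurtosis
    return {
        "Average": mean,
        "Standard Deviation": statistics.stdev(data),  # sample (n - 1) denominator
        "Minimum": data[0],
        "Lower Quartile": q1,
        "Median": median,
        "Upper Quartile": q3,
        "Maximum": data[-1],
        # Bias-adjusted skewness and excess kurtosis
        "Skewness": math.sqrt(n * (n - 1)) / (n - 2) * g1,
        "Kurtosis": ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3)),
    }


for label, value in numeric_summary(range(1, 51)).items():
    print(f"{label}: {value:.4f}")
```

With these assumptions, the printed values match the example (for instance, Standard Deviation: 14.5774 and Kurtosis: -1.2000).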
>>> Variable(["mango", "apple", "pear", "mango", "pear", "mango"], name="fruits")
Name: fruits
Type: categorical
Non-null Observations: 6
Unique Values: 3 -> ['apple', 'mango', 'pear']
Missing Values: None
Mode (Most frequent): mango
Maximum frequency: 3
Most Common Items
-----------------
mango: 3 (50.00%)
pear: 2 (33.33%)
apple: 1 (16.67%)
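The categorical summary comes down to frequency counting. Here is a minimal sketch using collections.Counter. most_common_items is a hypothetical helper, and its default of ten rows is an arbitrary assumption. Ties are broken by first appearance, which is Counter's behaviour, but eda_report's own ordering and row limit may differ.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def most_common_items(values: Iterable[Hashable], top: int = 10) -> List[str]:
    """Hypothetical helper: format counts as 'item: count (percent)'."""
    counts = Counter(values)
    total = sum(counts.values())
    return [
        f"{item}: {count} ({count / total:.2%})"
        for item, count in counts.most_common(top)
    ]


fruits = ["mango", "apple", "pear", "mango", "pear", "mango"]
counts = Counter(fruits)
print(sorted(counts))            # unique values: ['apple', 'mango', 'pear']
print(counts.most_common(1)[0])  # mode and its frequency: ('mango', 3)
print("\n".join(most_common_items(fruits)))
```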
>>> import pandas as pd
>>> dt = pd.date_range("2022-03-08", periods=20, freq="D")
>>> Variable(dt, name="dttm")
Name: dttm
Type: datetime
Non-null Observations: 20
Unique Values: 20 -> [Timestamp('2022-03-08 00:00:00'), [...]
Missing Values: None
Summary Statistics
------------------
Average: 2022-03-17 12:00:00
Minimum: 2022-03-08 00:00:00
Lower Quartile: 2022-03-12 18:00:00
Median: 2022-03-17 12:00:00
Upper Quartile: 2022-03-22 06:00:00
Maximum: 2022-03-27 00:00:00
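Datetime summaries can be computed by working with time offsets from the earliest value. In this sketch, a plain list of datetime objects replaces pandas' date_range so that only the standard library is needed. datetime_summary is a hypothetical helper. Linear-interpolation quartiles are an assumption, although they reproduce the values in the example.

```python
import statistics
from datetime import datetime, timedelta
from typing import Dict, Iterable


def datetime_summary(values: Iterable[datetime]) -> Dict[str, datetime]:
    """Hypothetical helper mirroring the datetime "Summary Statistics" block."""
    data = sorted(values)
    start = data[0]
    # Use offsets (in seconds) from the earliest value; this avoids the local
    # timezone dependence that datetime.timestamp() has for naive datetimes.
    offsets = [(d - start).total_seconds() for d in data]
    q1, median, q3 = statistics.quantiles(offsets, n=4, method="inclusive")

    def at(seconds: float) -> datetime:
        # Convert an offset back into a datetime
        return start + timedelta(seconds=seconds)

    return {
        "Average": at(statistics.fmean(offsets)),
        "Minimum": data[0],
        "Lower Quartile": at(q1),
        "Median": at(median),
        "Upper Quartile": at(q3),
        "Maximum": data[-1],
    }


days = [datetime(2022, 3, 8) + timedelta(days=i) for i in range(20)]
for label, value in datetime_summary(days).items():
    print(f"{label}: {value}")
```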
- missing
The number of missing values in the form number (% of total count), e.g. “4 (16.67%)”.
- Type:
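The "number (% of total count)" format can be produced with a percentage format specifier. format_missing is a hypothetical helper that rests on two assumptions. First, both None and float NaN count as missing. Second, it returns None when nothing is missing, which the "Missing Values: None" lines in the examples suggest.

```python
import math
from typing import Iterable, Optional


def format_missing(values: Iterable) -> Optional[str]:
    """Hypothetical helper: report missing values as 'number (% of total count)'."""
    data = list(values)
    # Assumption: None and float NaN both count as missing
    n_missing = sum(
        1 for v in data if v is None or (isinstance(v, float) and math.isnan(v))
    )
    if n_missing == 0:
        return None  # assumed, matching "Missing Values: None" in the examples
    return f"{n_missing} ({n_missing / len(data):.2%})"


print(format_missing([1.0] * 20 + [None, float("nan"), None, None]))  # 4 (16.67%)
print(format_missing(range(10)))                                      # None
```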
- name
The variable’s name. If no name is specified, it is set to the value of the input data’s name attribute, or to None if the data has no such attribute.
- Type:
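The fallback described above can be expressed as a getattr lookup with a default. resolve_name and NamedSeries are hypothetical. NamedSeries stands in for inputs such as a pandas Series, which carry a name attribute.

```python
from typing import Iterable, Optional


def resolve_name(data: Iterable, name: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: an explicit name wins, else fall back to data.name."""
    if name is not None:
        return name
    return getattr(data, "name", None)


class NamedSeries(list):
    """Stand-in for an input with a `name` attribute, such as a pandas Series."""

    def __init__(self, values, name):
        super().__init__(values)
        self.name = name


print(resolve_name(NamedSeries([1, 2, 3], name="scores")))           # scores
print(resolve_name(NamedSeries([1, 2, 3], name="scores"), "marks"))  # marks
print(resolve_name([1, 2, 3]))                                       # None
```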