eda_report

eda_report.get_word_report(data: Iterable, *, title: str = 'Exploratory Data Analysis Report', graph_color: str = 'cyan', groupby_variable: str | int = None, output_filename: str = 'eda-report.docx', table_style: str = 'Table Grid') → ReportDocument

Analyze data and generate a report document in Word (.docx) format.

Parameters:
  • data (Iterable) – The data to analyze.

  • title (str, optional) – The title to give the report. Defaults to “Exploratory Data Analysis Report”.

  • graph_color (str, optional) – The color to apply to the graphs. Defaults to “cyan”.

  • groupby_variable (Union[str, int], optional) – The label or index of the column by which to group values. Defaults to None.

  • output_filename (str, optional) – The file name or path under which to save the report document. Defaults to “eda-report.docx”.

  • table_style (str, optional) – The style to apply to the tables in the report. Defaults to “Table Grid”.
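
The groupby_variable parameter accepts either a str or an int. The helper below is a hypothetical sketch of one way such a key could be resolved against a table's column names: an exact label match is tried first, and otherwise an integer is treated as a column position. The name resolve_groupby_column and this resolution order are assumptions for illustration only; they are not eda_report's code or API.

```python
from collections.abc import Hashable, Sequence
from typing import Optional


def resolve_groupby_column(
    columns: Sequence[Hashable], key: Optional[Hashable]
) -> Optional[Hashable]:
    """Hypothetical helper: map a groupby key to a column label.

    Not part of eda_report; it only illustrates one reading of "label/index".
    """
    if key is None:
        return None  # no grouping requested
    if key in columns:
        return key  # exact column label (a str, or an int for integer-labelled columns)
    if isinstance(key, int) and not isinstance(key, bool):
        return columns[key]  # otherwise treat an int as a position (IndexError if out of range)
    raise KeyError(f"{key!r} is not one of the columns {list(columns)}")


iris_columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
print(resolve_groupby_column(iris_columns, "species"))  # species
print(resolve_groupby_column(iris_columns, 4))  # species
```

Checking labels before positions means that data whose columns are themselves integers (for example, a table built from a plain 2-D list) is still grouped by the intended column.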

Returns:

A document object containing the analysis results.

Return type:

ReportDocument

Example

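>>> import eda_report
>>> import seaborn as sns  # assumption: Iris data loaded via seaborn for illustration
>>> iris_data = sns.load_dataset("iris")
>>> eda_report.get_word_report(iris_data)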
Analyze variables:  100%|███████████████████████████████████| 5/5
Plot variables:     100%|███████████████████████████████████| 5/5
Bivariate analysis: 100%|███████████████████████████████████| 6/6 pairs.
[INFO 16:14:53.648] Done. Results saved as 'eda-report.docx'
<eda_report.document.ReportDocument object at 0x7f196753bd60>

eda_report.summarize(data: Iterable) → Variable | Dataset

Get summary statistics for the supplied data.

Parameters:

data (Iterable) – The data to analyze.

Returns:

Analysis results.

Return type:

Union[Variable, Dataset]
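
The numeric and categorical tables in the example below report familiar descriptive statistics. The following standard-library sketch computes the same kinds of values for a single column. It is an illustration, not eda_report's implementation: the helper names numeric_summary and categorical_summary are hypothetical, no missing values are assumed, and the conventions chosen (linear-interpolation quartiles, sample standard deviation, bias-corrected skewness and excess kurtosis, so normally distributed data scores about 0) are assumptions about how the table's columns are defined.

```python
from collections import Counter
from math import sqrt
from statistics import mean, quantiles, stdev


def numeric_summary(values: list[float]) -> dict[str, float]:
    """Describe one numeric column (sketch; needs >= 4 values, not all equal)."""
    n = len(values)
    avg = mean(values)
    # Quartiles by linear interpolation between sorted values.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    # Sums of squared, cubed and fourth-power deviations from the mean.
    s2 = sum((x - avg) ** 2 for x in values)
    s3 = sum((x - avg) ** 3 for x in values)
    s4 = sum((x - avg) ** 4 for x in values)
    # Adjusted Fisher-Pearson sample skewness (G1).
    g1 = (s3 / n) / (s2 / n) ** 1.5
    skewness = sqrt(n * (n - 1)) / (n - 2) * g1
    # Bias-corrected excess kurtosis (G2).
    denom = (n - 2) * (n - 3)
    kurtosis = (n + 1) * n * (n - 1) * s4 / (denom * s2**2) - 3 * (n - 1) ** 2 / denom
    return {
        "count": n,
        "avg": avg,
        "stddev": stdev(values),  # sample standard deviation (n - 1 denominator)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


def categorical_summary(values: list[str]) -> dict[str, object]:
    """Describe one categorical column (sketch; assumes no missing values)."""
    counts = Counter(values)
    top, freq = counts.most_common(1)[0]  # ties go to the value seen first
    return {
        "count": len(values),
        "unique": len(counts),
        "top": top,
        "freq": freq,
        "relative freq": f"{freq / len(values):.2%}",
    }


species = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
print(categorical_summary(species))
# {'count': 150, 'unique': 3, 'top': 'setosa', 'freq': 50, 'relative freq': '33.33%'}
```

The categorical call mirrors the species row of the example output: three equally frequent classes, so the first one encountered is reported as the top value with a relative frequency of 33.33%.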

Example

>>> eda_report.summarize(iris_data)

                  Summary Statistics for Numeric features (4)
                  -------------------------------------------
                count     avg  stddev  min  25%   50%  75%  max  skewness  kurtosis
  sepal_length    150  5.8433  0.8281  4.3  5.1  5.80  6.4  7.9    0.3149   -0.5521
  sepal_width     150  3.0573  0.4359  2.0  2.8  3.00  3.3  4.4    0.3190    0.2282
  petal_length    150  3.7580  1.7653  1.0  1.6  4.35  5.1  6.9   -0.2749   -1.4021
  petal_width     150  1.1993  0.7622  0.1  0.3  1.30  1.8  2.5   -0.1030   -1.3406

                Summary Statistics for Categorical features (1)
                -----------------------------------------------
                    count unique     top freq relative freq
            species   150      3  setosa   50        33.33%


                        Pearson's Correlation (Top 20)
                        ------------------------------
      petal_length & petal_width -> very strong positive correlation (0.96)
     sepal_length & petal_length -> very strong positive correlation (0.87)
      sepal_length & petal_width -> very strong positive correlation (0.82)
      sepal_width & petal_length -> moderate negative correlation (-0.43)
       sepal_width & petal_width -> weak negative correlation (-0.37)
      sepal_length & sepal_width -> very weak negative correlation (-0.12)