| Title: | Turnkey Visualisations for Exploratory Data Analysis |
|---|---|
| Description: | Provides interactive visualisations for exploratory data analysis of high-dimensional datasets. Includes parallel coordinate plots for exploring large datasets with mostly quantitative features, but also stacked one-dimensional visualisations that more effectively show missingness and complex categorical relationships in smaller datasets. |
| Authors: | Sam El-Kamand [aut, cre] (ORCID: <https://orcid.org/0000-0003-2270-8088>), Children's Cancer Institute Australia [cph] |
| Maintainer: | Sam El-Kamand <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.0.9000 |
| Built: | 2026-05-25 06:28:50 UTC |
| Source: | https://github.com/CCICB/ggEDA |
An artificially generated dataset describing basic demographics and accessorization choices of baseball fans as part of a hypothetical market research study from stadium merchandise vendors. None of the data are real; they were produced for illustrative and testing purposes only.
baseballfans
A data frame with 19 rows and 10 columns:
Unique integer identifier for each individual.
Age in years at time of observation.
Self‐reported gender (“Male” or “Female”).
Eye color (“Brown”, “Green”, “Blue”), or missing (NA) if not recorded.
Height in centimeters; missing (NA) if not recorded.
Hair color (“Black”, “Blond”, “Red”, “Brown”).
Logical flag (TRUE/FALSE) indicating whether the individual wears glasses.
Logical flag (TRUE/FALSE) indicating whether the individual is wearing a hat.
Type of hat worn, if any (e.g., “baseball cap”, “stetson”, “fedora”, “top hat”); empty when WearingHat == FALSE.
Date of observation in day/month/year format (e.g., 9/05/2023). Stored as a character vector.
Synthetic data; no real persons were observed.
This mock dataset was created to demonstrate ggEDA functionality. All entries are fictional.
Takes an input string and 'beautify' by converting underscores to spaces and
beautify(string, autodetect_units = TRUE)
string |
input string |
autodetect_units |
automatically detect units (e.g. mm, kg, etc) and wrap in brackets. |
string
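As a rough illustration of the kind of transformation described above, the sketch below uses base R only. `beautify_sketch()` is a hypothetical helper and not the package's implementation: the title-casing rule is inferred from the 'my_title' to 'My Title' example elsewhere in this manual, and the short `units` list is an assumption made for illustration.

```r
# Hypothetical helper: NOT ggEDA's beautify(), just a base-R sketch.
beautify_sketch <- function(string, autodetect_units = TRUE,
                            units = c("mm", "cm", "kg")) {
  # Convert underscores to spaces
  out <- gsub("_", " ", string, fixed = TRUE)
  # Capitalise the first letter of each word (assumed title-casing rule)
  out <- gsub("\\b([a-z])", "\\U\\1", out, perl = TRUE)
  if (autodetect_units) {
    # Wrap a trailing unit (from the assumed `units` list) in brackets,
    # restoring its lower-case spelling
    pattern <- paste0(" (", paste(units, collapse = "|"), ")$")
    out <- sub(pattern, " (\\L\\1)", out, ignore.case = TRUE, perl = TRUE)
  }
  out
}

beautify_sketch("my_title")
beautify_sketch("height_cm")
beautify_sketch("height_cm", autodetect_units = FALSE)
```

A function with this shape (string in, string out) is the kind of object the `beautify_function` option expects.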
Parse a tibble and ensure it meets standards
column_info_table( data, maxlevels = 6, col_id = NULL, cols_to_plot, tooltip_column_suffix = "_tooltip", ignore_column_regex = "_ignore$", palettes, colours_default, colours_default_logical, verbose )
data |
data.frame to autoplot (data.frame) |
maxlevels |
for categorical variables, what is the maximum number of distinct values to allow (too many will make it hard to find a palette that suits). (number) |
col_id |
name of column to use as an identifier. If null, artificial IDs will be created based on row-number. |
cols_to_plot |
names of columns in data that should be plotted. By default plots all valid columns (character) |
tooltip_column_suffix |
the suffix added to a column name that indicates column should be used as a tooltip (string) |
ignore_column_regex |
a regex string that, if matches a column name, will cause that column to be excluded from plotting (string). If NULL no regex check will be performed. (default: "_ignore$") |
palettes |
A list of named vectors. List names correspond to data column names (categorical only). Vector names correspond to levels of those columns, and vector values are colours; the vector names are used to map values in data to a colour. |
colours_default |
Default colors for categorical variables without a custom palette. |
colours_default_logical |
Colors for binary variables: a vector of three colors representing |
verbose |
Numeric value indicating the verbosity level:
|
tibble with the following columns:
colnames
coltype (categorical/numeric/tooltip/invalid)
ndistinct (number of distinct values)
plottable (should this column be plotted)
tooltip_col (the name of the column to use as the tooltip) or NA if no obvious tooltip column found
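The column-naming conventions controlled by `tooltip_column_suffix` and `ignore_column_regex` can be previewed with base R before passing a data frame to the package. The sketch below is an illustration under stated assumptions, not the package's internal logic: the data frame `df` and its column names are invented, and the pairing rule (a column named `<column><suffix>` supplies the tooltip for `<column>`) is an assumption based on the argument descriptions above.

```r
# Hypothetical data frame; column names are invented for illustration.
df <- data.frame(
  id = 1:3,
  group = c("a", "b", "a"),
  group_tooltip = c("Group A", "Group B", "Group A"),
  notes_ignore = c("x", "y", "z")
)

tooltip_column_suffix <- "_tooltip"
ignore_column_regex <- "_ignore$"

# Columns whose names match the ignore regex
ignored <- grepl(ignore_column_regex, names(df))

# Columns whose names end with the tooltip suffix
is_tooltip <- endsWith(names(df), tooltip_column_suffix)

# Assumed pairing rule: strip the suffix to find the column being annotated
tooltip_for <- sub(paste0(tooltip_column_suffix, "$"), "", names(df)[is_tooltip])

names(df)[ignored]
names(df)[is_tooltip]
tooltip_for
```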
Visualize relationships between numeric variables and categorical groupings using parallel coordinate plots.
ggparallel( data, col_id = NULL, col_colour = NULL, highlight = NULL, interactive = TRUE, order_columns_by = c("appearance", "random", "auto"), order_observations_by = c("frequency", "original"), verbose = TRUE, palette_colour = palette.colors(palette = "Set2"), palette_highlight = c("red", "grey90"), convert_binary_numeric_to_factor = TRUE, scaling = c("uniminmax", "none"), return = c("plot", "data"), options = ggparallel_options() )
data |
A data frame containing the variables to plot. |
col_id |
The name of the column to use as an identifier. If |
col_colour |
Name of the column to use for coloring lines in the plot. If |
highlight |
A level from |
interactive |
Produce interactive ggiraph visualisation (flag) |
order_columns_by |
Strategy for ordering columns in the plot. Options include:
|
order_observations_by |
Strategy for ordering lines in the plot. Options include:
Ignored if |
verbose |
Logical; whether to display informative messages during execution. (default: |
palette_colour |
A named vector of colors for categorical levels in |
palette_highlight |
A two-color vector for highlighting ( |
convert_binary_numeric_to_factor |
Logical; whether to convert numeric columns containing only 0, 1, and NA to factors. (default: |
scaling |
Method for scaling numeric variables. Options include:
|
return |
What to return. Options include:
|
options |
A list of additional visualization parameters created by |
A ggplot object or a processed data frame, depending on the return parameter.
ggparallel(data = minibeans, col_colour = "Class", order_columns_by = "auto")
ggparallel(data = minibeans, col_colour = "Class", highlight = "DERMASON", order_columns_by = "auto")
# Customise appearance using options argument
ggparallel(data = minibeans, col_colour = "Class", order_columns_by = "auto", options = ggparallel_options(show_legend = FALSE))
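The rule behind `convert_binary_numeric_to_factor` (numeric columns containing only 0, 1, and NA are treated as categorical) can be sketched in base R. `is_binary_numeric()` is a hypothetical helper and the data frame is invented; this shows the rule as documented, not how the package implements it.

```r
# Base-R sketch of the rule described for convert_binary_numeric_to_factor.
# is_binary_numeric() is a hypothetical helper, not part of ggEDA.
is_binary_numeric <- function(x) {
  is.numeric(x) && all(x %in% c(0, 1, NA))
}

# Invented example data
df <- data.frame(
  flag = c(0, 1, NA, 1),
  score = c(0.2, 1, 3.5, NA)
)

# Convert only the columns that satisfy the rule
binary_cols <- vapply(df, is_binary_numeric, logical(1))
df[binary_cols] <- lapply(df[binary_cols], factor)

vapply(df, class, character(1))
```

Here `flag` becomes a factor while `score` stays numeric, because `score` contains values other than 0, 1, and NA.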
Configures aesthetic and layout settings for plots generated by ggparallel.
ggparallel_options( show_legend = TRUE, show_legend_titles = FALSE, legend_position = c("bottom", "right", "left", "top"), legend_title_position = c("left", "top", "bottom", "right"), legend_nrow = NULL, legend_ncol = NULL, legend_key_size = 1, beautify_text = TRUE, beautify_values = FALSE, beautify_function = beautify, max_digits_bounds = 1, x_axis_text_angle = 90, x_axis_text_hjust = 0, x_axis_text_vjust = 0.5, fontsize_x_axis_text = 12, show_column_names = TRUE, show_points = FALSE, show_bounds_labels = FALSE, show_bounds_rect = FALSE, expand_x = ggplot2::waiver(), line_alpha = 0.5, line_width = 0.5, line_type = 1, x_axis_gridlines = ggplot2::element_line(colour = "black"), interactive_svg_width = NULL, interactive_svg_height = NULL )
show_legend |
Display the legend on the plot (flag). |
show_legend_titles |
Display titles for legends (flag). |
legend_position |
Position of the legend ("right", "left", "bottom", "top"). |
legend_title_position |
Position of the legend title ("top", "bottom", "left", "right"). |
legend_nrow |
Number of rows in the legend (number). |
legend_ncol |
Number of columns in the legend. If set, |
legend_key_size |
Size of the legend key symbols. (number). |
beautify_text |
Beautify y-axis text and legend titles to more human-readable forms (e.g. converting 'my_title' to 'My Title') (flag). |
beautify_values |
Beautify legend values to more human-readable forms (e.g. converting 'my_value' to 'My Value') (flag) |
beautify_function |
a function that takes a string and returns a nicely formatted string. Use to beautify axis & legend titles when |
max_digits_bounds |
Number of digits to round the axis bounds label text to (number) |
x_axis_text_angle |
Angle of the x axis text describing column names (number) |
x_axis_text_hjust |
Horizontal Justification of the x axis text describing column names (number) |
x_axis_text_vjust |
Vertical Justification of the x axis text describing column names (number) |
fontsize_x_axis_text |
fontsize of the x-axis text describing column names (number) |
show_column_names |
Show column names as x axis text (flag) |
show_points |
Show points (flag) |
show_bounds_labels |
Show bounds (min and max value) of each feature with labels above / below the axes (flag) |
show_bounds_rect |
Show bounds (min and max value) of each feature with a rectangular graphic (flag) |
expand_x |
A vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function |
line_alpha |
Alpha of line geom (number) |
line_width |
Width of the line geom (number) |
line_type |
Type of line geom (number or string. see |
x_axis_gridlines |
Customise look of x axis gridlines. Must be either a call to |
interactive_svg_width, interactive_svg_height
|
Width and height of the interactive graphic region (in inches). Only used when |
A list of visualization parameters for ggparallel.
ggparallel(data = minibeans, col_colour = "Class", order_columns_by = "auto")
ggparallel(data = minibeans, col_colour = "Class", highlight = "DERMASON", order_columns_by = "auto")
# Customise appearance using options argument
ggparallel(data = minibeans, col_colour = "Class", order_columns_by = "auto", options = ggparallel_options(show_legend = FALSE))
Visualize all columns in a data frame with ggEDA's vertically aligned plots and automatic plot selection based on variable type. Plots are fully interactive, and custom tooltips can be added.
ggstack( data, col_id = NULL, col_sort = NULL, order_matches_sort = TRUE, maxlevels = 7, verbose = 2, drop_unused_id_levels = FALSE, interactive = TRUE, return = c("plot", "column_info", "data"), palettes = NULL, sort_type = c("frequency", "alphabetical"), desc = TRUE, limit_plots = TRUE, max_plottable_cols = 10, cols_to_plot = NULL, tooltip_column_suffix = "_tooltip", ignore_column_regex = "_ignore$", convert_binary_numeric_to_factor = TRUE, options = ggstack_options(show_legend = !interactive) )
data |
data.frame to autoplot (data.frame) |
col_id |
name of column to use as an identifier. If null, artificial IDs will be created based on row-number. |
col_sort |
name of columns to sort on. To do a hierarchical sort, supply a vector of column names in the order they should be sorted (character). |
order_matches_sort |
should the column plots be stacked top-to-bottom in the order they appear in |
maxlevels |
for categorical variables, what is the maximum number of distinct values to allow (too many will make it hard to find a palette that suits). (number) |
verbose |
Numeric value indicating the verbosity level:
|
drop_unused_id_levels |
if col_id is a factor with unused levels, should these be dropped or included in visualisation |
interactive |
produce interactive ggiraph visualisation (flag) |
return |
a string describing what this function should return. Options include:
|
palettes |
A list of named vectors. List names correspond to data column names (categorical only). Vector names correspond to levels of those columns, and vector values are colours; the vector names are used to map values in data to a colour. |
sort_type |
controls how categorical variables are sorted.
Numerical variables are always sorted in numerical order irrespective of the value given here.
Options are |
desc |
sort in descending order (flag) |
limit_plots |
throw an error when there are > |
max_plottable_cols |
maximum number of columns that can be plotted (default: 10) (number) |
cols_to_plot |
names of columns in data that should be plotted. By default plots all valid columns (character) |
tooltip_column_suffix |
the suffix added to a column name that indicates column should be used as a tooltip (string) |
ignore_column_regex |
a regex string that, if matches a column name, will cause that column to be excluded from plotting (string). If NULL no regex check will be performed. (default: "_ignore$") |
convert_binary_numeric_to_factor |
If a numeric column contains only values 0, 1, & NA, then automatically convert to a factor. |
options |
a list of additional visual parameters created by calling |
ggiraph interactive visualisation
# Create Basic Plot
ggstack(baseballfans, col_id = "ID", col_sort = "Glasses")
# Configure plot ggstack_options()
ggstack(
  lazy_birdwatcher,
  col_sort = "Magpies",
  palettes = list(
    Birdwatcher = c(Robert = "#E69F00", Catherine = "#999999"),
    Day = c(Weekday = "#999999", Weekend = "#009E73")
  ),
  options = ggstack_options(
    show_legend = TRUE,
    fontsize_barplot_y_numbers = 12,
    legend_text_size = 16,
    legend_key_size = 1,
    legend_nrow = 1
  )
)
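The hierarchical sort described for `col_sort` (sort on the first column, break ties with the second, and so on) follows the same idea as base R's `order()` with several keys. The sketch below uses an invented data frame and is only an illustration of that ordering idea; it sorts categorical values alphabetically and does not reproduce `sort_type = "frequency"` or any other package internals.

```r
# Hypothetical data; not one of the package datasets.
df <- data.frame(
  id = c("a", "b", "c", "d"),
  group = c("x", "y", "x", "y"),
  value = c(2, 5, 9, 1)
)

# Columns to sort on, in priority order (analogous to col_sort)
col_sort <- c("group", "value")

# order() breaks ties in the first key using the second key, and so on.
# decreasing = TRUE plays the role of desc = TRUE.
keys <- unname(as.list(df[col_sort]))
ord <- do.call(order, c(keys, list(decreasing = TRUE)))

df[ord, ]
```

Rows are grouped by `group` first, and `value` only decides the order within each group.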
Configures aesthetic and layout settings for plots generated by ggstack.
ggstack_options( colours_default = c("#66C2A5", "#FC8D62", "#8DA0CB", "#E78AC3", "#A6D854", "#FFD92F", "#E5C494"), colours_default_logical = c(`TRUE` = "#648fff", `FALSE` = "#dc267f"), colours_missing = "grey90", show_legend_titles = FALSE, legend_title_position = c("top", "bottom", "left", "right"), legend_nrow = 4, legend_ncol = NULL, legend_title_size = NULL, legend_text_size = NULL, legend_key_size = 0.3, legend_orientation_heatmap = c("horizontal", "vertical"), show_legend = TRUE, legend_position = c("right", "left", "bottom", "top"), na_marker = "!", na_marker_size = 8, na_marker_colour = "black", show_na_marker_categorical = FALSE, show_na_marker_heatmap = FALSE, colours_heatmap_low = "purple", colours_heatmap_high = "seagreen", transform_heatmap = c("identity", "log10", "log2"), fontsize_values_heatmap = 3, show_values_heatmap = FALSE, colours_values_heatmap = "white", vertical_spacing = 0, numeric_plot_type = c("bar", "heatmap"), y_axis_position = c("left", "right"), width = 0.9, relative_height_numeric = 4, cli_header = "Running ggstack", inter_plot_spacing = 10, expand_x = ggplot2::waiver(), interactive_svg_width = NULL, interactive_svg_height = NULL, fontsize_barplot_y_numbers = 8, max_digits_barplot_y_numbers = 3, fontsize_y_title = 12, fontface_y_title = c("plain", "italic", "bold", "bold.italic"), beautify_text = TRUE, beautify_values = FALSE, beautify_function = beautify, margin_y_title = NULL, margin_y_numbers = NULL )
colours_default |
Default colors for categorical variables without a custom palette. |
colours_default_logical |
Colors for binary variables: a vector of three colors representing |
colours_missing |
Color for missing ( |
show_legend_titles |
Display titles for legends (flag). |
legend_title_position |
Position of the legend title ("top", "bottom", "left", "right"). |
legend_nrow |
Number of rows in the legend (number). |
legend_ncol |
Number of columns in the legend. If set, |
legend_title_size |
Size of the legend title text (number). |
legend_text_size |
Size of the text within the legend (number). |
legend_key_size |
Size of the legend key symbols (number). |
legend_orientation_heatmap |
should legend orientation be "horizontal" or "vertical". |
show_legend |
Display the legend on the plot (flag). |
legend_position |
Position of the legend ("right", "left", "bottom", "top"). |
na_marker |
Text used to mark |
na_marker_size |
Size of the text marker for |
na_marker_colour |
Color of the |
show_na_marker_categorical |
Show a marker for |
show_na_marker_heatmap |
Show a marker for |
colours_heatmap_low |
Color for the lowest value in heatmaps (string). |
colours_heatmap_high |
Color for the highest value in heatmaps (string). |
transform_heatmap |
Transformation to apply before visualizing heatmap values ("identity", "log10", "log2"). |
fontsize_values_heatmap |
Font size for heatmap values (number). |
show_values_heatmap |
Display numerical values on heatmap tiles (flag). |
colours_values_heatmap |
Color for heatmap values (string). |
vertical_spacing |
Space between each data row in points (number). |
numeric_plot_type |
Type of visualization for numeric data: "bar" or "heatmap". |
y_axis_position |
Position of the y-axis ("left" or "right"). |
width |
controls how much space is present between bars and tiles within each plot. Can be 0-1 where values of 1 makes bars/tiles take up 100% of available space (no gaps between bars). |
relative_height_numeric |
how many times taller should numeric plots be relative to categorical tile plots. Only taken into account if numeric_plot_type == "bar" (number) |
cli_header |
Text used for h1 header. Included so it can be tweaked by packages that use ggstack, so they can customise how the info messages appear. |
inter_plot_spacing |
How much vertical space to add between plots. Measured in pts (numeric) |
expand_x |
A vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function |
interactive_svg_width, interactive_svg_height
|
width and height of the interactive graphic region (in inches). Only used when |
fontsize_barplot_y_numbers |
fontsize of the text describing numeric barplot max & min values (number). |
max_digits_barplot_y_numbers |
Number of digits to round the numeric barplot max and min values to (number). |
fontsize_y_title |
Font size of the y axis titles (a.k.a the data.frame column names) (number). |
fontface_y_title |
Font face of the y axis titles (a.k.a the data.frame column names). One of "plain", "italic", "bold", "bold.italic". |
beautify_text |
Beautify y-axis text and legend titles to more human-readable forms (e.g. converting 'my_title' to 'My Title') (flag). |
beautify_values |
Beautify legend values to more human-readable forms (e.g. converting 'my_value' to 'My Value') (flag) |
beautify_function |
a function that takes a string and returns a nicely formatted string. Use to beautify axis & legend titles when |
margin_y_title |
Margin of y axis titles for discrete properties (or numeric properties when numeric_plot_type = "heatmap"). Expects NULL, or a call to |
margin_y_numbers |
Margin of the y axis numbers and titles of numeric properties (when numeric_plot_type = "bar"). Expects NULL, or a call to |
A list of visualization parameters for ggstack.
# Create Basic Plot
ggstack(baseballfans, col_id = "ID", col_sort = "Glasses")
# Configure plot ggstack_options()
ggstack(
  lazy_birdwatcher,
  col_sort = "Magpies",
  palettes = list(
    Birdwatcher = c(Robert = "#E69F00", Catherine = "#999999"),
    Day = c(Weekday = "#999999", Weekend = "#009E73")
  ),
  options = ggstack_options(
    show_legend = TRUE,
    fontsize_barplot_y_numbers = 12,
    legend_text_size = 16,
    legend_key_size = 1,
    legend_nrow = 1
  )
)
A simulated dataset describing the number of magpies observed by two birdwatchers.
lazy_birdwatcher
A data frame with 45 rows and 3 columns:
Number of magpies observed
Was the day of observation a weekday or a weekend?
Name of the birdwatcher
A subsample of the Koklu & Ozkan (2020) dry beans dataset produced by imaging a total of 13,611 grains from 7 varieties of dry beans. The original dataset contains 13,611 observations, but here we include a random subsample of 1000.
minibeans
A data frame with 1000 rows and 17 columns:
The area of a bean zone and the number of pixels within its boundaries.
Bean circumference is defined as the length of its border.
The distance between the ends of the longest line that can be drawn from a bean.
The longest line that can be drawn from the bean while standing perpendicular to the main axis.
Defines the relationship between L and l.
Eccentricity of the ellipse having the same moments as the region.
Number of pixels in the smallest convex polygon that can contain the area of a bean seed.
The diameter of a circle having the same area as a bean seed area.
The ratio of the pixels in the bounding box to the bean area.
Also known as convexity. The ratio of the pixels in the convex shell to those found in beans.
Calculated with the following formula: (4πA)/(P^2).
Measures the roundness of an object: Ed/L.
Shape factor 1.
Shape factor 2.
Shape factor 3.
Shape factor 4.
Seker, Barbunya, Bombay, Cali, Dermason, Horoz, and Sira.
Koklu, M, and IA Ozkan. 2020. Multiclass Classification of Dry Beans Using Computer Vision and Machine Learning Techniques. Computers and Electronics in Agriculture, 174: 105507. doi: 10.1016/j.compag.2020.105507, https://doi.org/10.24432/C50S4B
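Two of the shape descriptors above are given as formulas, (4πA)/(P^2) and Ed/L. As a sanity check, both evaluate to 1 for a perfect circle. The sketch below assumes the usual reading of the symbols (A is area, P is perimeter, L is the length of the longest line through the shape, Ed is the diameter of a circle with the same area); the radius `r` is an arbitrary illustrative value.

```r
# Sanity check of two shape descriptors on a circle of radius r.
# Symbol meanings are assumed: A = area, P = perimeter,
# L = longest line through the shape, Ed = equivalent diameter.
r <- 3
A <- pi * r^2             # area of the circle
P <- 2 * pi * r           # perimeter (circumference)
L <- 2 * r                # longest line through a circle is its diameter
Ed <- sqrt(4 * A / pi)    # diameter of a circle with the same area

ratio_area_perimeter <- (4 * pi * A) / P^2  # 1 for a circle, up to floating point error
ratio_ed_l <- Ed / L                        # 1 for a circle, up to floating point error

c(ratio_area_perimeter, ratio_ed_l)
```

Shapes that are elongated or have ragged outlines give values below 1 for both ratios, which is what makes them useful for telling bean varieties apart.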
Computes mutual information between each feature in the features data frame and the target vector.
The features are discretized using the "equalfreq" method from infotheo::discretize().
mutinfo(features, target, return_colnames = FALSE)
features |
A data frame of features. These will be discretized using the "equalfreq" method
(see |
target |
A vector (character or factor) representing the variable to compute mutual information with. |
return_colnames |
Logical; if |
If return_colnames = FALSE, a named numeric vector of mutual information scores is returned (one for each column in features), sorted in descending order.
The names of the vector correspond to the column names of features.
If return_colnames = TRUE, only the ordered column names of features are returned.
data(iris)
# Compute mutual information scores
mutinfo(iris[1:4], iris[[5]])
# Get column names ordered by mutual information with target column (most mutual info first)
mutinfo(iris[1:4], iris[[5]], return_colnames = TRUE)
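To make the two steps described above concrete (equal-frequency discretization, then mutual information with the target), here is a minimal base-R sketch. It is not the code used by `mutinfo()` or infotheo: `equalfreq_bins()` and `mutual_information()` are hypothetical helpers, the choice of 3 bins is arbitrary, the result is in nats, and the scores will generally differ from those returned by the package.

```r
# Base-R sketch; NOT the code used by mutinfo() or infotheo.
# Equal-frequency binning via ranks (assumes no missing values).
equalfreq_bins <- function(x, nbins = 3) {
  ceiling(rank(x, ties.method = "first") * nbins / length(x))
}

# Mutual information (in nats) from a contingency table of two
# discrete vectors: sum over cells of p(x,y) * log(p(x,y) / (p(x) p(y)))
mutual_information <- function(x, y) {
  counts <- table(x, y)
  joint <- counts / sum(counts)                   # joint probabilities
  indep <- outer(rowSums(joint), colSums(joint))  # product of marginals
  nonzero <- joint > 0                            # skip empty cells (0 * log 0 = 0)
  sum(joint[nonzero] * log(joint[nonzero] / indep[nonzero]))
}

# Score each feature against the target, highest first
scores <- vapply(
  iris[1:4],
  function(col) mutual_information(equalfreq_bins(col), iris[[5]]),
  numeric(1)
)
sort(scores, decreasing = TRUE)
```

Taking `names()` of the sorted vector gives the ordered column names, which mirrors what `return_colnames = TRUE` is documented to return.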
Find sensible values to add 2 breaks at for a ggplot2 axis
sensible_2_breaks(vector)
vector |
vector fed into ggplot axis you want to define sensible breaks for |
Vector of length 2. The first element describes the upper break position, the second describes the lower break position.