fixr

Fixing Data Made Easy for Statistical Analysis

Developed by Ambu Vijayan and Dr. J. Sreekumar

fixr is an R package that provides an easy way to do basic data manipulations for statistical analysis. The package contains various functions that can help you check and fix issues with your data, such as missing values, outliers, and consistency problems.

Installation

You can install fixr from GitHub using the following command:

devtools::install_github("ambuvjyn/fixr")

🔢 Version 0.1.0

#️⃣ Commands :

Usage

Here is a list of functions provided by fixr:

📂 Data consistency checks :

  • ▶️ check_data_consistency: Checks if the data is consistent across variables and time periods.

📂 Data distribution checks :

  • ▶️ check_data_distribution: Checks if the data is distributed normally or if it has any skewness.

📂 Data quality checks :

  • ▶️ check_data_quality: Checks the overall quality of the data based on various criteria.

📂 Data reliability checks :

  • ▶️ check_data_reliability: Checks if the data is reliable and if there are any sources of error.

📂 Data structure checks :

  • ▶️ check_data_structure: Checks if the data is in the correct format and structure.

📂 Data value checks :

  • ▶️ check_for_negative_values: Checks if there are any negative values in the data.

📂 Missing data checks :

  • ▶️ check_missing_values: Checks if there are any missing values in the data.

📂 Outlier checks :

  • ▶️ check_outliers: Checks if there are any outliers in the data.

📂 Sample size checks :

  • ▶️ check_sample_size: Checks if the sample size is adequate for the analysis.

📂 Package utilities :

  • ▶️ find.packages: Finds packages that are not installed but are required for a particular function.

  • ▶️ find.packages_path: Finds the path of a package.

📂 Data Duplication Check :

  • ▶️ find_duplicate_cols: Finds duplicate columns in the data.

  • ▶️ find_duplicate_rows: Finds duplicate rows in the data.

📂 Data cleaning and manipulation :

  • ▶️ fix.data: Fixes any data-related issues.

  • ▶️ fix_blanks_with_na: Fixes blank spaces in the data.

📂 Fix Name Issues :

  • ▶️ fix_column_names: Fixes column names.

  • ▶️ fix_data_names: Fixes data names.

  • ▶️ fix_row_names: Fixes row names.

📂 Fix Spaces :

  • ▶️ fix_row_spaces: Fixes spaces in row names.

  • ▶️ fix_col_spaces: Fixes spaces in column names.

📂 Fix Missing Values :

  • ▶️ fix_missing_alphanumeric_values: Fixes missing alphanumeric values.

  • ▶️ fix_missing_numeric_values: Fixes missing numeric values.

📂 Fix Special Characters :

  • ▶️ fix_special_characters_in_data: Fixes special characters in the data.

  • ▶️ fix_special_characters_in_names: Fixes special characters in the column and row names.

📂 Fix Data Duplication :

  • ▶️ fix_duplicate_cols: Fixes duplicate columns in the data.

  • ▶️ fix_duplicate_rows: Fixes duplicate rows in the data.

📂 Fix Outliers :

  • ▶️ fix_outliers: Fixes outliers in the data.

Each command is explained below.

check_data_consistency: This function checks whether the data is consistent, that is, uniform and following the expected pattern or structure. It helps identify inconsistencies that can cause problems in statistical analysis.

check_data_distribution: This function checks the distribution of the data, for example whether it is approximately normal or skewed. It is useful for spotting anomalies that might affect the results of statistical analysis.
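
As a rough illustration of what a distribution check can involve, the sketch below uses only base R. It is not fixr's implementation; the sample data and variable names are made up for the example.

```r
# Base R sketch (not fixr's code): test normality and measure skewness.
x <- exp(seq(0, 5, length.out = 50))   # deliberately right-skewed values

# Shapiro-Wilk test: a small p-value is evidence against normality.
sw <- shapiro.test(x)

# Moment-based sample skewness: about 0 for symmetric data,
# positive for a long right tail, negative for a long left tail.
centered <- x - mean(x)
skewness <- mean(centered^3) / mean(centered^2)^1.5

print(sw$p.value)
print(skewness)
```

A positive skewness together with a small Shapiro-Wilk p-value suggests the values are not normally distributed.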

check_data_quality: This function checks the quality of the data, that is, whether it is accurate, complete, and relevant. It helps identify errors or discrepancies that can affect the validity of statistical analysis.

check_data_reliability: This function checks the reliability of the data, that is, whether it is consistent and produces the same results when repeated. It helps identify sources of error or inconsistency that can affect the accuracy of statistical analysis.

check_data_structure: This function checks the structure of the data, that is, whether it is organized and formatted correctly. It helps identify formatting or structural issues that can affect the accuracy of statistical analysis.

check_for_negative_values: This function checks for negative values in the data so that you can spot them where they should not occur. It helps identify errors or discrepancies that can affect the validity of statistical analysis.
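
A minimal base R sketch of the same idea follows. It is not fixr's code, and the data frame and column names are hypothetical.

```r
# Base R sketch (not fixr's code): count negative values per numeric column.
df <- data.frame(
  weight = c(2.5, -1.0, 3.2),
  count  = c(4L, 7L, -2L),
  label  = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Only numeric columns can meaningfully be compared with zero.
numeric_cols <- vapply(df, is.numeric, logical(1))

# Number of negative entries in each numeric column (NAs are ignored).
neg_counts <- colSums(df[numeric_cols] < 0, na.rm = TRUE)
print(neg_counts)
```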

check_missing_values: This function checks for missing values in the data so that you can spot them where they should not occur. It helps identify gaps that can affect the validity of statistical analysis.
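
For comparison, a missing-value check can be sketched in base R as follows. This is not fixr's implementation; the data frame is invented for the example.

```r
# Base R sketch (not fixr's code): locate missing values.
df <- data.frame(
  id    = 1:5,
  yield = c(12.1, NA, 14.3, NA, 11.8),
  site  = c("A", "B", NA, "A", "B"),
  stringsAsFactors = FALSE
)

# Number of NA entries in each column.
na_per_column <- colSums(is.na(df))

# Row indices that contain at least one NA.
rows_with_na <- which(!complete.cases(df))

print(na_per_column)
print(rows_with_na)
```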

check_outliers: This function checks for outliers in the data, that is, extreme values that can skew the results of statistical analysis.
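
The README does not say which outlier rule fixr applies. One common choice, shown here purely as an assumption, is Tukey's 1.5 × IQR rule in base R:

```r
# Base R sketch (not fixr's code): flag outliers with the 1.5 * IQR rule.
x <- c(10, 12, 11, 13, 12, 11, 95, 10, 12, -40)

# First and third quartiles (R's default quantile type 7).
q <- quantile(x, c(0.25, 0.75), names = FALSE)
fence <- 1.5 * (q[2] - q[1])

# A value is flagged if it lies beyond either fence.
is_outlier <- x < q[1] - fence | x > q[2] + fence
print(x[is_outlier])
```

Other rules (z-scores, median absolute deviation) would flag different points, so the threshold is a modelling decision rather than a fixed fact.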

check_sample_size: This function checks whether the sample size of the data is sufficient for statistical analysis. It helps identify sample-size issues that can affect the validity of the results.
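
What counts as an adequate sample depends on the planned analysis, and the README does not state fixr's criterion. As one hedged illustration, the sketch below uses `power.t.test()` from base R's stats package, assuming a two-sample t-test, an effect of one standard deviation, and 80% power. All numbers are assumptions chosen for the example.

```r
# Base R sketch (not fixr's code): compare available n with a power-based target.
# Assumptions: two-sample t-test, effect size of 1 SD, alpha 0.05, power 0.80.
res <- power.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.80)

n_required  <- ceiling(res$n)   # observations needed per group
n_available <- 12               # hypothetical observations per group

adequate <- n_available >= n_required
print(n_required)
print(adequate)
```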

find.packages: This function looks for packages in the R environment and helps identify any that are required but not installed.

find.packages_path: This function finds the path of packages in the R environment, which helps confirm that the required packages are available and where they are installed.
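
Base R offers related helpers, sketched below. Note that base R's `find.package()` (singular) is a separate function from fixr's `find.packages`; the second package name in the example is deliberately fictitious.

```r
# Base R sketch (not fixr's code): find missing packages and a package path.
required <- c("stats", "aPackageThatDoesNotExist123")

# requireNamespace() returns FALSE (instead of erroring) when a package is absent.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

# Installation directory of a package that is present.
pkg_path <- find.package("stats")

print(missing_pkgs)
print(pkg_path)
```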

find_duplicate_cols: This function finds duplicate columns in the data. Duplicate columns are a structural issue that can affect the accuracy of statistical analysis.

find_duplicate_rows: This function finds duplicate rows in the data. Duplicate rows are a structural issue that can affect the accuracy of statistical analysis.
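
Duplicate detection of this kind can be sketched in base R with `duplicated()`. This is not fixr's implementation, and the data frame is made up for the example.

```r
# Base R sketch (not fixr's code): find duplicate rows and columns.
df <- data.frame(
  a      = c(1, 2, 2, 3),
  b      = c("x", "y", "y", "z"),
  a_copy = c(1, 2, 2, 3),
  stringsAsFactors = FALSE
)

# Indices of rows that repeat an earlier row.
dup_rows <- which(duplicated(df))

# Names of columns whose contents repeat an earlier column.
dup_cols <- names(df)[duplicated(as.list(df))]

print(dup_rows)
print(dup_cols)
```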

fix.data: This function fixes any data-related issues, ensuring that the data is consistent, accurate, complete, and relevant. This function helps resolve any errors or discrepancies in the data that can affect the validity of statistical analysis.

fix_blanks_with_na: This function fixes blank spaces in the data by replacing them with NA values. This function helps resolve any formatting or structural issues in the data that can affect the accuracy of statistical analysis.
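
The replacement can be sketched in base R as follows. This is not fixr's code; treating whitespace-only strings as blank is an assumption made for the example.

```r
# Base R sketch (not fixr's code): turn blank strings into NA.
df <- data.frame(
  name  = c("cassava", "", "  ", "yam"),
  score = c("4", "7", "", "9"),
  stringsAsFactors = FALSE
)

# Assumption: a cell is "blank" if it is empty after trimming whitespace.
df[] <- lapply(df, function(col) {
  if (is.character(col)) {
    col[!is.na(col) & trimws(col) == ""] <- NA
  }
  col
})

print(df)
```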

fix_col_spaces: This function fixes spaces in column names by replacing them with underscores. This function helps resolve any formatting or structural issues in the data that can affect the accuracy of statistical analysis.
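
The same transformation in base R, shown only as an illustration with hypothetical column names:

```r
# Base R sketch (not fixr's code): replace spaces in column names with underscores.
df <- data.frame(
  `plant height` = c(1.2, 1.5),
  `leaf count`   = c(8, 11),
  check.names = FALSE   # keep the spaces so there is something to fix
)

names(df) <- gsub(" ", "_", names(df), fixed = TRUE)
print(names(df))
```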

fix_column_names: This function fixes column names by ensuring that they are properly formatted and follow a certain structure. This function helps resolve any formatting or structural issues in the data that can affect the accuracy of statistical analysis.

fix_data_names: This function fixes the names of data in a dataset. Often, dataset names are poorly formatted, contain special characters, or are too long to be easily interpreted. This function renames the data in a dataset to make it more readable and consistent.

fix_duplicate_cols: This function identifies duplicate columns in a dataset and provides an option to remove or rename the duplicate columns. Duplicate columns can occur due to human error, data entry issues, or other factors, and can negatively impact data analysis.

fix_duplicate_rows: This function identifies duplicate rows in a dataset and provides an option to remove or rename the duplicate rows. Duplicate rows can occur due to human error, data entry issues, or other factors, and can negatively impact data analysis.
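
Removing duplicates can be sketched in base R as shown below. This covers only the removal option and is not fixr's implementation; the data frame is invented for the example.

```r
# Base R sketch (not fixr's code): drop duplicate rows, then duplicate columns.
df <- data.frame(
  a      = c(1, 2, 2, 3),
  b      = c("x", "y", "y", "z"),
  a_copy = c(1, 2, 2, 3),
  stringsAsFactors = FALSE
)

# Keep the first occurrence of each row.
dedup_rows <- df[!duplicated(df), ]

# Keep the first occurrence of each column (compared by content, not by name).
dedup_both <- dedup_rows[!duplicated(as.list(dedup_rows))]

print(dedup_both)
```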

fix_missing_alphanumeric_values: This function replaces missing alphanumeric values in a dataset with NA values. This is useful for data cleaning and analysis, as missing values can impact the results of data analysis.

fix_missing_numeric_values: This function replaces missing numeric values in a dataset with NA values. This is useful for data cleaning and analysis, as missing values can impact the results of data analysis.

fix_outliers: This function identifies outliers in a dataset and provides an option to remove or replace them. Outliers can occur due to data entry errors, measurement errors, or other factors, and can negatively impact data analysis.
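
The two options, removing or replacing, can be sketched in base R. The 1.5 × IQR fences and the capping strategy are assumptions for illustration; the README does not specify which rule or replacement fixr uses.

```r
# Base R sketch (not fixr's code): remove outliers or cap them at the fences.
x <- c(10, 12, 11, 13, 12, 11, 95, 10, 12, -40)

q <- quantile(x, c(0.25, 0.75), names = FALSE)
lower <- q[1] - 1.5 * diff(q)
upper <- q[2] + 1.5 * diff(q)

# Option 1: remove values outside the fences.
removed <- x[x >= lower & x <= upper]

# Option 2: replace them with the nearest fence value (capping).
capped <- pmin(pmax(x, lower), upper)

print(removed)
print(capped)
```

Removal shortens the vector, while capping keeps its length, which matters when the values must stay aligned with other columns.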

fix_row_names: This function fixes the names of rows in a dataset. Often, row names are poorly formatted, contain special characters, or are too long to be easily interpreted. This function renames the rows in a dataset to make it more readable and consistent.

fix_row_spaces: This function removes spaces in row names in a dataset. Spaces in row names can cause issues in data analysis and can be difficult to work with.

fix_special_characters_in_data: This function removes special characters in the data of a dataset. Special characters can be difficult to work with and can cause issues in data analysis.

fix_special_characters_in_names: This function removes special characters in the names of columns and rows in a dataset. Special characters in column and row names can be difficult to work with and can cause issues in data analysis.
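
A base R sketch of stripping special characters from names is shown below. Which characters count as special is an assumption here (anything other than letters, digits, and underscores), and the column names are hypothetical.

```r
# Base R sketch (not fixr's code): strip special characters from column names.
df <- data.frame(
  `yield (kg/ha)` = c(3.1, 2.8),
  `site#id`       = c("A1", "B2"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Assumption: keep only letters, digits, and underscores.
names(df) <- gsub("[^A-Za-z0-9_]", "", names(df))
print(names(df))
```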

Links : 🔗 fixr version 0.1.0 : fixr: Fixing Data Made Easy for Statistical Analysis

Authors :

Ambu Vijayan Young Professional, ICAR - Central Tuber Crops Research Institute

Dr. J. Sreekumar Principal Scientist, ICAR - Central Tuber Crops Research Institute

Maintainer : Ambu Vijayan