rcausim is an R package designed to generate causally-simulated data to
serve as ground truth for evaluating methods in causal discovery and
effect estimation. This is particularly useful for researchers in fields
such as artificial intelligence, statistics, biology, medicine,
epidemiology, economics, and social sciences who are developing general
or domain-specific methods to discover causal structures and estimate
causal effects.
Define Functions and Edges: Set up functions based on specified edges and, conversely, set up edges based on functions.
Data Simulation: Generate data according to predefined functions and network structures, adhering to principles of structural causal modeling.
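The structural causal modeling principle behind the simulation can be sketched in base R without rcausim: each vertex is generated by a function of its parents, and vertices are evaluated in causal order. The vertices A and B and their functions mirror the example used later in this README; the variable names `n`, `A`, `B`, and `sim` are illustrative only, and this sketch is not a description of how rcausim works internally.

```r
# A minimal base-R sketch of a two-vertex structural causal model, B -> A.
# This is an illustration only; it does not call rcausim.
set.seed(1)
n <- 100

# Root vertex B has no parents, so it depends only on the sample size.
B <- rnorm(n, mean = 90, sd = 5)

# Vertex A is a deterministic function of its parent B.
A <- ifelse(B >= 95, 1, 0)

# Collect the simulated vertices into one data frame.
sim <- data.frame(A = A, B = B)
head(sim)
```

Because A is computed from B, B has to be generated first; this ordering requirement is why the causal structure must be acyclic.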
You can install the development version of rcausim from GitHub with:
# install.packages("devtools")
devtools::install_github("herdiantrisufriyana/rcausim")
Start by defining the causal structure as a data frame of edges:
library(rcausim)
# Load predefined edge data
data(edges)
print(edges)
Use these edges to assist in setting up the functions:
# Generate function setups from edge definitions
functions <- function_from_edge(edges)
print(functions)
Define specific functions:
# Define a function for vertex B
function_B <- function(n){ rnorm(n, mean = 90, sd = 5) }
functions <- define(functions, 'B', function_B)
print(functions)
You can also start by defining functions directly:
# Define a function for vertex B
function_B <- function(n){ rnorm(n, mean = 90, sd = 5) }
# Define a function for vertex A
function_A <- function(B){ ifelse(B >= 95, 1, 0) }
# Combine functions in a list
functions <- list(A = function_A, B = function_B)
functions <- function_from_user(functions)
Ensure the causal structure is a directed acyclic graph (DAG):
library(igraph)
# Set up edges based on functions
edges <- edge_from_function(functions)
# Check if the resulting edges form a DAG
g <- graph_from_data_frame(edges, directed = TRUE)
is_dag(g)
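To see what this check guards against, the sketch below builds two small hypothetical edge lists, one acyclic and one containing a cycle, and passes both to igraph. The data frames `acyclic_edges` and `cyclic_edges`, their `from`/`to` column names, and the vertex C are assumptions made for illustration; they are not taken from rcausim's own data.

```r
library(igraph)

# Hypothetical edge lists: the first two columns are read as source and target.
acyclic_edges <- data.frame(from = c("B", "B", "A"), to = c("A", "C", "C"))
cyclic_edges <- data.frame(from = c("B", "A"), to = c("A", "B"))

g_acyclic <- graph_from_data_frame(acyclic_edges, directed = TRUE)
g_cyclic <- graph_from_data_frame(cyclic_edges, directed = TRUE)

# TRUE only when the directed graph contains no cycle.
is_dag(g_acyclic)
is_dag(g_cyclic)

# For a DAG, a topological order lists every vertex after all of its parents.
as_ids(topo_sort(g_acyclic))
```

A cyclic edge list cannot be simulated in a single pass, since no vertex in the cycle can be generated before its parents.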
Generate simulated data based on the predefined functions:
# Assume completed functions setup
data(functions)
# Generate simulated data
set.seed(1)
simulated_data <- data_from_function(functions, n = 100)
print(simulated_data)
Explore detailed examples and methodologies in the following vignettes:
Quick Start: Get started quickly with the rcausim package. This vignette offers a practical introduction to the essential features, providing a quick glimpse into how you can effectively use the package to generate causally-simulated data.
Causal Simulation Exemplar: A guide through basic causal simulation scenarios demonstrating how to use rcausim to set up and simulate data.
Reference Manual: Comprehensive documentation of all functions and features available in rcausim. Ideal for detailed reference and advanced use cases.
rcausim is licensed under the GNU General Public License v3.0 (GPL-3),
which ensures that all derivatives of the software are free to use under
the same terms. See the LICENSE file for more details.
If you use rcausim in your research, please consider citing it:
@misc{rcausim2024,
author = {Herdiantri Sufriyana and Emily Chia-Yu Su},
title = {rcausim: An R package to generate causally-simulated data},
year = {2024},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/herdiantrisufriyana/rcausim}}
}
For questions or support, please contact herdi[at]tmu.edu.tw.