add new data by columns into duckdb out of memory #85

Closed
mytarmail opened this issue Nov 18, 2023 · 2 comments

@mytarmail

I have incoming data that I want to store on disk in a database or something. The data looks something like this

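# Simulate one incoming batch: 100 random integers spread over `ncol` columns
incoming_data <- function(ncol = 5) {
  dat <- sample(1:10, 100, replace = TRUE) |> matrix(ncol = ncol) |> as.data.frame()
  # Random column names such as "k42" (not guaranteed to be unique)
  random_names <- sapply(seq_len(ncol(dat)), \(x) paste0(sample(letters, 1), sample(1:100, 1)))
  colnames(dat) <- random_names
  dat
}
incoming_data()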

This incoming_data is just an example. In reality, one incoming_data set will have about 5k rows and about 50k columns, and the entire final file will be about 200-400 gigabytes.

My question is: how can I add new data as columns to the database without loading the whole file into RAM?

# your way
path <- "D:\\R_scripts\\new\\duckdb\\data\\DB.duckdb"
library(duckdb)
library(duckplyr)
con <- dbConnect(duckdb(), dbdir = path, read_only = FALSE)
#  write one piece of data in DB
dbWriteTable(con, "my_dat", incoming_data())


#### how to make something like this ####
my_dat <- cbind("my_dat", incoming_data())
@krlmlr
Member

krlmlr commented Nov 18, 2023

Thanks. This is a very broad question, and not a good fit for this issue tracker. Either way, 50k columns sounds like way too many. Any chance you can make the data "longer"?

@krlmlr krlmlr closed this as completed Nov 18, 2023
@mytarmail
Author

mytarmail commented Nov 18, 2023

Thanks for your lightning fast response!
Yes, I can keep the data "longer".

I understand that my question doesn't really fit the format and I apologize for that, but I would be very grateful for your help.
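
For reference, here is a minimal sketch of what storing the data "longer" could look like. It was not posted in the thread and rests on several assumptions: rows in every batch line up by position (so a `row_id` can stand in for the `cbind` alignment), column names are made unique with a hypothetical `b<batch>_` prefix, the generator uses fixed names `v1..v5` instead of the random ones above, and the database lives in a temporary file. Each batch is reshaped to one row per (row, variable) pair and appended with `dbAppendTable()`, so only the current batch needs to be in R's memory.

```r
library(DBI)
library(duckdb)

# Simplified stand-in for the generator in the question: fixed column names
# v1..v5 instead of random ones (an assumption made for this sketch).
incoming_data <- function(ncol = 5) {
  dat <- as.data.frame(matrix(sample(1:10, 100, replace = TRUE), ncol = ncol))
  colnames(dat) <- paste0("v", seq_len(ncol))
  dat
}

# Reshape one wide batch into long form: one row per (row_id, variable).
# The "b<batch>_" prefix is a hypothetical naming scheme that keeps
# variable names unique across batches.
to_long <- function(dat, batch) {
  data.frame(
    row_id   = rep(seq_len(nrow(dat)), times = ncol(dat)),
    variable = rep(paste0("b", batch, "_", colnames(dat)), each = nrow(dat)),
    value    = unlist(dat, use.names = FALSE)
  )
}

# On-disk database in a temporary file (replace with a real path as needed).
con <- dbConnect(duckdb(), dbdir = tempfile(fileext = ".duckdb"))

dbExecute(con, "CREATE TABLE my_dat (row_id INTEGER, variable VARCHAR, value INTEGER)")

# Append batches one at a time; earlier batches are never read back into R.
for (batch in 1:3) {
  dbAppendTable(con, "my_dat", to_long(incoming_data(), batch))
}

n_rows <- dbGetQuery(con, "SELECT count(*) AS n FROM my_dat")$n
n_vars <- dbGetQuery(con, "SELECT count(DISTINCT variable) AS n FROM my_dat")$n

dbDisconnect(con, shutdown = TRUE)
```

With this layout, a new "column" is just more rows, so the table schema never changes. A subset of variables can then be fetched with a query such as `SELECT * FROM my_dat WHERE variable IN (...)`, and only that result is returned to R.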
