Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create embulk config subcommand #117

Merged
merged 17 commits into from
Oct 29, 2015
Merged

Conversation

takkanm
Copy link
Contributor

@takkanm takkanm commented Aug 12, 2015

ref #80

I implemented connector:init that creates a configuration file of 'connector:guess' from options similar to 'import:prepare', 'import:upload'.

Now, supoprt format is S3 and mysql.

$ td connector:init "s3://access_key:secret_key@/my_bucket/path/to/*.csv" -o seed.yml
initialize configuration:

---
in:
  type: s3
  access_key_id: access_key
  secret_access_key: secret_key
  bucket: my_bucket
  path_prefix: path/to/
out:
  mode: append

Created seed.yml file.
If you need to change, please modify seed.yml.
Use 'td connector:guess seed.yml' to create bulk load configuration.

$ td connector:init mytable --format mysql --db-url jdbc:mysql://localhost/mydb --db-user myuser --db-password mypass
initialize configuration:

---
in:
  type: mysql
  host: localhost
  port: 3306
  database: mydb
  user: myuser
  password: mypass
  table: mytable
  select: "*"
out:
  mode: append

Created td-bulkload.yml file.
If you need to change, please modify td-bulkload.yml.
Use 'td connector:guess td-bulkload.yml' to create bulk load configuration.

@takkanm takkanm force-pushed the create_embulk_config_subcommand branch from a954a6e to 864d705 Compare August 19, 2015 08:18
@takkanm
Copy link
Contributor Author

takkanm commented Aug 19, 2015

@nahi @muga pleaze, review it.

@frsyuki
Copy link
Member

frsyuki commented Aug 19, 2015

I think subcommand should not be in connector: group because it's a migration tool for legacy a tool where new users don't have to be aware of it.
Why don't you add it to import: group? For example (I don't say this is the best):

$ td import:config logs/*.csv --format csv --columns time,uid,price,count --time-column time -o parts/
Generating seed.yml...
    Using local file input
    Using CSV parser plugin
    Using Treasure Data output
Done. Please use embulk to load the files.
Next steps:

    # install embulk
    $ embulk gem install embulk-output-td
    $ embulk guess seed.yml -o config.yml
    $ embulk preview config.yml
    $ embulk run config.yml

$ td import:config "s3://<s3_access_key>:<s3_secret_key>@/my_bucket/path/to/*.csv" --format csv --column-header --time-column date_time -o parts/
Generating seed.yml...
    Using S3 input
    Using CSV parser plugin
    Using Treasure Data data connector
Done. Please use connector:guess and connector:run to load the files.
Next steps:

    $ td connector:guess seed.yml -o config.yml
    $ td connector:preview config.yml
    $ td connector:run config.yml

As you notice, import from local files and most of mysql servers are not supported by data connector. In this case, we need to use embulk + embulk-output-td.

@nahi @muga any thoughts?

@muga
Copy link
Member

muga commented Aug 19, 2015

I think subcommand should not be in connector: group because it's a migration tool for legacy a tool where new users don't have to be aware of it.

Agree. it's better that this feature should be moved to import: group.

As you notice, import from local files and most of mysql servers are not supported by data connector. In this case, we need to use embulk + embulk-output-td.

Nice. Since most S3 buckets can be accessed by data connector, it can be migrated to data connector.

  • local files -> embulk + embulk-output-td
  • mysql -> embulk + embulk-output-td
  • s3 files -> data connector

@takkanm
Copy link
Contributor Author

takkanm commented Aug 20, 2015

Thank you for reviewing.
I think that it is trying to respond as follows.

  • command name is import:config
  • s3 files is data connector
  • local files and mysql is embulk + embulk-output-td

@toru-takahashi
Copy link
Contributor

import:config looks good.

Also, please check the following cases.

  1. If user don't have time column, they uses --time-value TIME,HOURS option.
  2. If user uses --column-header and --columns option. It should be used --columns name without reading header line.

@muga
Copy link
Member

muga commented Aug 22, 2015

@toru-takahashi Right. That would help users' migration.

@takkanm
Copy link
Contributor Author

takkanm commented Aug 25, 2015

I have one question. Should td import:config support --column option ?

td import:config create configuration file for guess commands.
I am understanding that guess commands create new column option from source file.
Therefore, I will leave it to guess about the column, I think that it is good for the user to edit.
How about you ?

@muga muga closed this Aug 25, 2015
@muga muga reopened this Aug 25, 2015
@takkanm takkanm force-pushed the create_embulk_config_subcommand branch 2 times, most recently from da83c8c to 362c4a7 Compare August 25, 2015 08:24
@muga
Copy link
Member

muga commented Aug 25, 2015

@takkanm That's right.

I thought that columns option should be migrated to rename filter's config. But the config requires existing column names. In seed.yml, existing column names are not defined.

column-header option could be ignored because, you know, import:guess command just generates seed.yml only.

@takkanm
Copy link
Contributor Author

takkanm commented Aug 26, 2015

@muga Thank you. So, I think I try to issue a warning message if user set --column and --column-header option.
How about you ?

@takkanm takkanm force-pushed the create_embulk_config_subcommand branch from e2753f9 to e57d900 Compare August 26, 2015 02:21
@muga
Copy link
Member

muga commented Aug 26, 2015

@takkanm sounds perfect. please go ahead.

takkanm added a commit that referenced this pull request Oct 29, 2015
@takkanm takkanm merged commit 52c2323 into master Oct 29, 2015
@takkanm takkanm deleted the create_embulk_config_subcommand branch October 29, 2015 01:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants