tap-postgres does initial_full_sync even after a successful Fastsync #990

Open · halilduygulu opened this issue Jul 11, 2022 · 2 comments
Labels: bug (Something isn't working)

@halilduygulu

Hi, I am trying to run tap-postgres with target-redshift via the following command, using the latest version installed with make.

pipelinewise run_tap --tap postgres_db_logical --target redshift_dwh

I am getting weird results: first fast_sync completes, then the singer sync starts immediately and does a logical_initial full table copy of every table that fast_sync already copied.

logger_name=tap_postgres log_level=INFO message=Beginning sync of stream(public-xxxxx) with sync method(logical_initial)
logger_name=tap_postgres log_level=INFO message=Performing initial full table sync

After the initial load finished, pipelinewise reported:

TAP RUN SUMMARY
-------------------------------------------------------
    Status  : SUCCESS
    Runtime : 0:01:58.969750

I ran the same command above again (as multiple scheduled runs would), and I get the error below:

pipelinewise run_tap --tap postgres_db_logical --target redshift_dwh

time=2022-07-11 16:29:29 logger_name=pipelinewise log_level=INFO message=Profiling mode not enabled
time=2022-07-11 16:29:29 logger_name=pipelinewise.cli.pipelinewise log_level=INFO message=Running postgres_db_logical tap in redshift_dwh target
time=2022-07-11 16:29:29 logger_name=pipelinewise.cli.pipelinewise log_level=INFO message=No table available that needs to be sync by fastsync
time=2022-07-11 16:29:29 logger_name=pipelinewise.cli.pipelinewise log_level=INFO message=Table(s) selected to sync by singer: ['public-xxx', 'public-xx', 'public-xxx', 'public-xxx', 'public-xxx', 'public-xxx', 'public-xxxx']
time=2022-07-11 16:29:29 logger_name=pipelinewise.cli.commands log_level=INFO message=Writing output into /root/.pipelinewise/redshift_dwh/postgres_db_logical/log/redshift_dwh-postgres_db_logical-20220711_162929.singer.log
time=2022-07-11 16:29:29 logger_name=pipelinewise.cli.pipelinewise log_level=ERROR message=Command failed. Return code: 1
Error(s) found:
time=2022-07-11 16:29:29 logger_name=tap_postgres log_level=CRITICAL message='last_replication_method'

Full log: /root/.pipelinewise/redshift_dwh/postgres_db_logical/log/redshift_dwh-postgres_db_logical-20220711_162929.singer.log.failed
Traceback (most recent call last):
  File "/home/ec2-user/pipelinewise/pipelinewise/cli/pipelinewise.py", line 1307, in run_tap
    stream_buffer_size=stream_buffer_size,
  File "/home/ec2-user/pipelinewise/pipelinewise/cli/pipelinewise.py", line 1082, in run_tap_singer
    commands.run_command(command, self.tap_run_log_file, update_state_file)
  File "/home/ec2-user/pipelinewise/pipelinewise/cli/commands.py", line 534, in run_command
    f'Command failed. Return code: {proc_rc}\n'
pipelinewise.cli.commands.RunCommandException: Command failed. Return code: 1
Error(s) found:
time=2022-07-11 16:29:29 logger_name=tap_postgres log_level=CRITICAL message='last_replication_method'

Full log: /root/.pipelinewise/redshift_dwh/postgres_db_logical/log/redshift_dwh-postgres_db_logical-20220711_162929.singer.log.failed
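
That CRITICAL message looks like a bare Python KeyError surfacing in the log. A minimal sketch of what I suspect happens, assuming the tap reads the key directly off a fastsync-written bookmark (illustrative names, not the actual tap code):

# A bookmark written by fastsync only has 'lsn' and 'version'
fastsync_bookmark = {"lsn": 568720577528040, "version": 1}

# A direct lookup therefore raises KeyError('last_replication_method'),
# which is logged as message='last_replication_method' above
method = fastsync_bookmark["last_replication_method"]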

My state file looks like this after fast_sync:

{
    "bookmarks": {
        "mydbname-public-xxxxxx": {
            "lsn": 568720577528040,
            "version": 1
        },
        "mydbname-public-xxxxxx": {
            "lsn": 568716359188480,
            "version": 1
        },
        .....
    }
}

And after the fast+singer sync (I removed many tables, but basically some entries have last_replication_method and some don't):

cat state.json
{
   "bookmarks":{
      "mydbname-public-xxxxxx":{
         "lsn":568757755840832,
         "version":1
      },
      "mydbname-public-xxxxxxx":{
         "lsn":568757755840832,
         "version":1
      },
      "public-xxxx":{
         "last_replication_method":"LOG_BASED",
         "lsn":568757755840832,
         "version":1657556819603,
         "xmin":null
      },
      "public-xxxxxx":{
         "last_replication_method":"LOG_BASED",
         "lsn":568757755840832,
         "version":1657556852981,
         "xmin":null
      }
   },
   "currently_syncing":null
}

So there are two problems here:

  • why does tap-postgres start a full singer sync after fast_sync?
  • why does the state file have the dbname in fastsync state entries but not in singer entries?

I am using the latest version from master, installed on Friday.

@halilduygulu (Author) commented Jul 12, 2022

I modified the state.json file manually to understand the problem, and now consecutive runs work, but I cannot understand how this passes the tests or how other people use this project.

The state file looks like this after fast_sync:

{
    "bookmarks": {
        "mysbname-public-xxxx": {
            "lsn": 568778962241816,
            "version": 1
        },
        .....
    },
    "currently_syncing": null
}

The state file after the singer sync that immediately follows fast_sync within a single run_tap command:

{
    "bookmarks": {
        "mydb_name-public-xxxx": {
            "lsn": 568778962241816,
            "version": 1
        },
        "public-xxxx": {
            "last_replication_method": "LOG_BASED",
            "lsn": 568779997835968,
            "version": 1657612174694,
            "xmin": null
        },
        .....
    },
    "currently_syncing": null
}

Notice the database name prefix on the fast_sync state entries. I deleted that prefix and added "last_replication_method": "LOG_BASED" to each table; after that, a run_tap command runs as expected using singer, without the last_replication_method key error above.
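
Roughly, my manual edit is equivalent to this sketch (the prefix value and the LOG_BASED default reflect my setup, they are not part of pipelinewise):

import json

DB_PREFIX = "mydbname-"  # the database-name prefix fastsync wrote into my state keys

with open("state.json") as f:
    state = json.load(f)

fixed = {}
for key, bookmark in state["bookmarks"].items():
    # strip the database-name prefix so the key matches the singer format
    new_key = key[len(DB_PREFIX):] if key.startswith(DB_PREFIX) else key
    # add the field that tap-postgres expects on every bookmark
    bookmark.setdefault("last_replication_method", "LOG_BASED")
    fixed[new_key] = bookmark
state["bookmarks"] = fixed

with open("state.json", "w") as f:
    json.dump(state, f, indent=4)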

postgres_to_redshift.py is the only file that includes the database name in the write-state function call :/
[Screenshot: the state-writing call in postgres_to_redshift.py, 2022-07-12 at 10:32:22]
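
Put differently, fastsync writes keys like {dbname}-{schema}-{table} while the singer tap expects {schema}-{table} plus a last_replication_method field. This sketch shows the key format and bookmark shape I would expect fastsync to write instead (illustrative only, not the actual pipelinewise code):

def make_bookmark(schema, table, lsn):
    # no database-name prefix, so singer runs find the same entry
    key = f"{schema}-{table}"
    bookmark = {
        "last_replication_method": "LOG_BASED",
        "lsn": lsn,
        "version": 1,
        "xmin": None,
    }
    return key, bookmark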

@halilduygulu (Author) commented

For anyone facing the same situation, it works after these changes:
v0.47.1...halilduygulu:pipelinewise:master
