Support traffic replay #642

Open
18 of 43 tasks
djshow832 opened this issue Aug 26, 2024 · 0 comments

Feature Request

Describe your feature request related problem

Users sometimes want to capture the traffic on a production cluster and replay it on a testing cluster:

  • A new TiDB version may introduce compatibility-breaking changes, such as statements failing, running slower, or returning different query results.
  • When the cluster behaves unexpectedly, users want to capture the traffic so that they can investigate the problem later by replaying it.
  • Users want to test the maximum throughput of a scaled-up or scaled-down cluster using the real workload instead of Sysbench or TPC-C.

Several traffic replay tools are widely used, including tcpcopy, mysql-replay, and query-playback. Tcpcopy and mysql-replay capture network packets the way tcpdump does, while query-playback is based on slow logs. Although some of them are built for MySQL, deploying them on the proxy instance also works.

However, they have some limitations:

  • Users need to learn how to deploy and use these tools.
  • Tcpcopy only captures new connections, so it misses traffic on persistent connections that were opened before the capture started. Mysql-replay can capture existing connections, but it loses session states such as prepared statements and session variables, which may make the replay fail.
  • The tools replay the traffic with one username and one current schema, which requires modifying the testing cluster.
  • Tcpcopy and mysql-replay don't support TLS because they can't decode the encrypted data.
  • Users need to verify the results and performance manually.

Describe the feature you'd like

Capture traffic on one TiDB cluster and replay it on a new TiDB cluster to verify the SQL compatibility and performance of the new cluster.

Tasks

Design

Traffic Capture

P1:

P2:

  • Output execution duration to traffic files
  • Output the query result information to the traffic files
  • Order the statements by start time
  • Support writing huge commands
  • Log a command without query result information if it runs too long
  • Support more command types
  • Memory control of traffic capture

P3:

  • Output traffic files with encryption
  • Output traffic files to a remote address
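
As a rough illustration of the P2 capture items above (execution duration, query result information, ordering by start time), the sketch below serializes captured commands as one JSON object per line. The record layout, the field names, and the `to_traffic_lines` helper are assumptions made for this example; they are not the actual traffic file format.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CapturedCommand:
    """One captured command. All field names here are hypothetical."""
    conn_id: int
    start_ts: float                  # when the command started (seconds)
    duration_ms: float               # execution duration on the source cluster
    sql: str
    row_count: Optional[int] = None  # result information; None if unknown


def to_traffic_lines(commands: List[CapturedCommand]) -> List[str]:
    """Return one JSON line per command, ordered by start time.

    Commands on different connections finish out of order, so they are
    sorted by start time (ties broken by connection ID) before being written.
    """
    ordered = sorted(commands, key=lambda c: (c.start_ts, c.conn_id))
    return [
        json.dumps(
            {
                "conn_id": c.conn_id,
                "start_ts": c.start_ts,
                "duration_ms": c.duration_ms,
                "sql": c.sql,
                "row_count": c.row_count,
            }
        )
        for c in ordered
    ]
```

Sorting a whole batch in memory is only workable for small buffers; a real capture path would also need the memory control listed above.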

Traffic Replay

P1:

P2:

  • Streaming reading and sending large commands
  • Support statement filter for replay

P3:

  • Support more auth plugins
  • Support replaying traffic as fast as possible
  • Support reading from MySQL slow logs
  • Support reading from tcpdump files

Comparison and Report

P1:

P2:

  • Write failed statements into tables and then update their counts
  • Output slower statements and execution time
  • Output mismatched statements and their row counts and result set sizes

P3:

  • Compare metrics like CPU, memory, QPS, duration
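
The P2 report items above (failed statements, slower statements, mismatched row counts) amount to a per-statement comparison between the captured and the replayed outcome. The sketch below shows one possible shape of that comparison; the `Outcome` type, the `compare` function, and the `slow_ratio` threshold are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Outcome:
    """Result of one statement on one cluster. Field names are hypothetical."""
    duration_ms: float
    row_count: int = 0
    error: Optional[str] = None


def compare(
    captured: Dict[str, Outcome],
    replayed: Dict[str, Outcome],
    slow_ratio: float = 2.0,
) -> Dict[str, List[str]]:
    """Classify replayed statements as failed, mismatched, or slower.

    Both dicts are keyed by statement text. A statement counts as "slower"
    when its replay duration exceeds slow_ratio times the captured duration.
    """
    report: Dict[str, List[str]] = {"failed": [], "mismatched": [], "slower": []}
    for sql, before in captured.items():
        after = replayed.get(sql)
        if after is None:
            continue  # statement was not replayed (for example, filtered out)
        if after.error is not None and before.error is None:
            report["failed"].append(sql)
            continue
        if after.row_count != before.row_count:
            report["mismatched"].append(sql)
        if after.duration_ms > slow_ratio * before.duration_ms:
            report["slower"].append(sql)
    return report
```

Keying by raw statement text is a simplification; grouping by a normalized statement digest would keep the report small when literals differ between executions.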

User Interface

P1:

P2:

  • Support SQL to capture and replay traffic
  • Add a --force argument to forcibly remove the traffic file or report if it already exists
  • Show the estimated disk usage of the traffic files

P3:

  • Support capturing and replaying traffic from TiDB Dashboard
  • Support showing the report in TiDB Dashboard