Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new Python API SBCommandInterpreter::GetTranscript() #90703

Merged
merged 19 commits into from
May 20, 2024

Conversation

royitaqi
Copy link
Contributor

@royitaqi royitaqi commented May 1, 2024

Motivation

Currently, the user can already get the "transcript" (for "what is the transcript", see CommandInterpreter::SaveTranscript). However, the only way to obtain the transcript data as a user is to first destroy the debugger, then read the save directory. Note that destroy-callbacks cannot be used, because 1\ transcript data is private to the command interpreter (see CommandInterpreter.h), and 2\ the writing of the transcript is after the invocation of destory-callbacks (see Debugger::Destroy).

So basically, there is no way to obtain the transcript:

  • during the lifetime of a debugger (including the destroy-callbacks, which often performs logging tasks, where the transcript can be useful)
  • without relying on external storage

In theory, there are other ways for user to obtain transcript data during a debugger's life cycle:

  • Use Python API and intercept commands and results.
  • Use CLI and record console input/output.

However, such ways rely on the client's setup and are not supported natively by LLDB.

Proposal

Add a new Python API SBCommandInterpreter::GetTranscript().

Goals:

  • It can be called at any time during the debugger's life cycle, including in destroy-callbacks.
  • It returns data in-memory.

Structured data:

  • To make data processing easier, the return type is SBStructuredData. See comments in code for how the data is organized.
  • In the future, SaveTranscript can be updated to write different formats using such data (e.g. JSON). This is probably accompanied by a new setting (e.g. interpreter.save-session-format).

Alternatives

The return type can also be std::vector<std::pair<std::string, SBCommandReturnObject>>. This will make implementation easier, without having to translate it to SBStructuredData. On the other hand, SBStructuredData can convert to JSON easily, so it's more convenient for user to process.

Privacy

Both user commands and output/error in the transcript can contain privacy data. However, as mentioned, the transcript is already available to the user. The addition of the new API doesn't increase the level of risk. In fact, it lowers the risk of privacy data being leaked later on, by avoiding writing such data to external storage.

Once the user (or their code) gets the transcript, it will be their responsibility to make sure that any required privacy policies are guaranteed.

Tests

bin/llvm-lit -sv ../external/llvm-project/lldb/test/API/python_api/interpreter/TestCommandInterpreterAPI.py
bin/llvm-lit -sv ../external/llvm-project/lldb/test/API/commands/session/save/TestSessionSave.py

Copy link

github-actions bot commented May 1, 2024

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the lldb label May 1, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented May 1, 2024

@llvm/pr-subscribers-lldb

Author: None (royitaqi)

Changes

Motivation

Currently, the user can already get the "transcript" (for "what is the transcript", see CommandInterpreter::SaveTranscript()). However, the only way to obtain the transcript data as a user is to first destroy the debugger, then read the save directory. Note that destroy-callbacks cannot be used, because 1\ transcript data is private to the command interpreter (see CommandInterpreter.h), and 2\ the writing of the transcript is after the invocation of destory-callbacks (see Debugger::Destroy()).

So basically, there is no way to obtain the transcript:

  • during the lifetime of a debugger (including the destroy-callbacks, which is often performs logging tasks)
  • without relying on external storage

Proposal

Add a new Python API SBCommandInterpreter::GetTranscript().

Goals:

  • It can be called at any time during the debugger's life cycle, including in destroy-callbacks.
  • It returns data in-memory.

Structured data:

  • To make data processing easier, the return type is SBStructuredData. See comments in code for how the data is organized.
  • In the future, SaveTranscript can be updated to write different formats using such data (e.g. JSON). This is probably accompanied by a new setting (e.g. interpreter.save-session-format).

Alternatives

The return type can also be std::vector&lt;std::pair&lt;std::string, SBCommandReturnObject&gt;&gt;. This will make implementation easier, without having to translate it to SBStructuredData. On the other hand, SBStructuredData can convert to JSON easily, so it's more convenient for user to process.


Full diff: https://github.com/llvm/llvm-project/pull/90703.diff

2 Files Affected:

  • (modified) lldb/include/lldb/API/SBCommandInterpreter.h (+9-3)
  • (modified) lldb/source/API/SBCommandInterpreter.cpp (+6-1)
diff --git a/lldb/include/lldb/API/SBCommandInterpreter.h b/lldb/include/lldb/API/SBCommandInterpreter.h
index ba2e049204b8e6..d65f06d676f91f 100644
--- a/lldb/include/lldb/API/SBCommandInterpreter.h
+++ b/lldb/include/lldb/API/SBCommandInterpreter.h
@@ -247,13 +247,13 @@ class SBCommandInterpreter {
                                        lldb::SBStringList &matches,
                                        lldb::SBStringList &descriptions);
 
-  /// Returns whether an interrupt flag was raised either by the SBDebugger - 
+  /// Returns whether an interrupt flag was raised either by the SBDebugger -
   /// when the function is not running on the RunCommandInterpreter thread, or
   /// by SBCommandInterpreter::InterruptCommand if it is.  If your code is doing
-  /// interruptible work, check this API periodically, and interrupt if it 
+  /// interruptible work, check this API periodically, and interrupt if it
   /// returns true.
   bool WasInterrupted() const;
-  
+
   /// Interrupts the command currently executing in the RunCommandInterpreter
   /// thread.
   ///
@@ -318,6 +318,12 @@ class SBCommandInterpreter {
 
   SBStructuredData GetStatistics();
 
+  /// Returns a list of handled commands, output and error. Each element in
+  /// the list is a dictionary with three keys: "command" (string), "output"
+  /// (list of strings) and optionally "error" (list of strings). Each string
+  /// in "output" and "error" is a line (without EOL characteres).
+  SBStructuredData GetTranscript();
+
 protected:
   friend class lldb_private::CommandPluginInterfaceImplementation;
 
diff --git a/lldb/source/API/SBCommandInterpreter.cpp b/lldb/source/API/SBCommandInterpreter.cpp
index 83c0951c56db60..242b3f8f09c48a 100644
--- a/lldb/source/API/SBCommandInterpreter.cpp
+++ b/lldb/source/API/SBCommandInterpreter.cpp
@@ -150,7 +150,7 @@ bool SBCommandInterpreter::WasInterrupted() const {
 
 bool SBCommandInterpreter::InterruptCommand() {
   LLDB_INSTRUMENT_VA(this);
-  
+
   return (IsValid() ? m_opaque_ptr->InterruptCommand() : false);
 }
 
@@ -571,6 +571,11 @@ SBStructuredData SBCommandInterpreter::GetStatistics() {
   return data;
 }
 
+SBStructuredData SBCommandInterpreter::GetTranscript() {
+  LLDB_INSTRUMENT_VA(this);
+  return SBStructuredData();
+}
+
 lldb::SBCommand SBCommandInterpreter::AddMultiwordCommand(const char *name,
                                                           const char *help) {
   LLDB_INSTRUMENT_VA(this, name, help);

Copy link
Collaborator

@clayborg clayborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need to hook this up so it actually works, and then add a test case to verify this works as expected.

@medismailben
Copy link
Member

I think you forgot to include the implementation.

lldb/include/lldb/Utility/StructuredData.h Outdated Show resolved Hide resolved
lldb/source/Interpreter/CommandInterpreter.cpp Outdated Show resolved Hide resolved
lldb/source/Utility/StructuredData.cpp Outdated Show resolved Hide resolved
lldb/source/Utility/StructuredData.cpp Outdated Show resolved Hide resolved
lldb/source/Utility/StructuredData.cpp Outdated Show resolved Hide resolved
Copy link
Contributor Author

@royitaqi royitaqi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated according to @medismailben 's suggestions.

lldb/source/Interpreter/CommandInterpreter.cpp Outdated Show resolved Hide resolved
lldb/source/Utility/StructuredData.cpp Outdated Show resolved Hide resolved
lldb/include/lldb/Utility/StructuredData.h Outdated Show resolved Hide resolved
lldb/source/Utility/StructuredData.cpp Outdated Show resolved Hide resolved
lldb/include/lldb/API/SBCommandInterpreter.h Outdated Show resolved Hide resolved
lldb/include/lldb/Interpreter/CommandInterpreter.h Outdated Show resolved Hide resolved
lldb/include/lldb/Utility/StructuredData.h Outdated Show resolved Hide resolved
lldb/include/lldb/Utility/StructuredData.h Outdated Show resolved Hide resolved
Copy link
Member

@bulbazord bulbazord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking better. Thanks for sticking with us through the review! :)

lldb/source/Utility/StructuredData.cpp Outdated Show resolved Hide resolved
@royitaqi royitaqi requested a review from bulbazord May 6, 2024 17:51
@royitaqi
Copy link
Contributor Author

royitaqi commented May 6, 2024

I believe I have addressed all your comments, @bulbazord . Feel free to LMK if you notice anything else.

I appreciate your comments so far - they all make sense and are great comments to help me improve this PR. Thank you for spending time on this. Also thank you for working with me patiently - I'm learning a lot through this experience.

lldb/include/lldb/Interpreter/CommandInterpreter.h Outdated Show resolved Hide resolved
-1, false));
transcript_item->AddItem(
"error", StructuredData::Array::SplitString(result.GetErrorData(), '\n',
-1, false));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we time the "cmd_obj->Execute()" calls and add this as a key/value pair?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my rely to your other comment about I plan to do it as a separate, follow-up PR.

@jimingham
Copy link
Collaborator

Sorry to come in a little late on this, but I think we need a setting to turn this off as well. If you aren't planning to use the transcript, you shouldn't have to pay the cost for it.

@royitaqi
Copy link
Contributor Author

royitaqi commented May 7, 2024

Hi @jimingham ,

Sorry to come in a little late on this

No worries. Thanks for chiming in.

I think we need a setting to turn this off as well. If you aren't planning to use the transcript, you shouldn't have to pay the cost for it.

Could you elaborate about why this is necessary? If we really need this, can I add the setting in a follow-up PR?

--

FWIW, to help make the conversation faster (reduce rounds of communication), I will write down my unverified thoughts below. Please feel free to point out where we diverge (where I am probably wrong).

First, why I think the added cost is negligible:

  1. The baseline is that there is already a m_transcript_stream, which doesn't have any settings to turn off. This PR just doubles that cost (plus some string split operations). It doesn't add huge cost on top of this baseline.
  2. Maintaining transcript is a per-command operation, and commands are relatively infrequently invoked. Here I assume the common use cases run tens or at most hundreds of commands in a debug session. The above cost times 10x ~ 100x isn't much.
  3. The cost of the commands themselves (and the process) are much higher than that of the transcript. E.g. resolving breakpoints and printing variables all need to do actual work, rather than just recording info.

Secondly, if we are going to add settings, there needs to be product design considerations based on user usage/feedback, so we probably need more time to gather data points after shipping this feature.

  1. Prematurely add settings (which we cannot remove, I assume) can lead us down to a more difficult road.
  2. I think it will be the norm that we record transcripts, like bash/zsh records command histories, so by default this should be on. In fact it is already on by default, as in m_transcript_stream.
  3. Besides just a binary on/off, settings can also allow more granular control. E.g. verbosity; only log transcript for a subset of commands; limit the memory footprint by using a cyclic buffer; etc. If we add these settings, they will cause the on/off setting to be redundant and needs to be removed.
  4. (As mentioned in "Secondly") The above are all possibilities. We need user usage/feedback to guide the design of these settings. So I think we should ship this PR first then gather data points.

@jimingham
Copy link
Collaborator

jimingham commented May 7, 2024 via email

@royitaqi
Copy link
Contributor Author

royitaqi commented May 7, 2024

Hi @jimingham,

Thank you so much for the detailed explanation about the context and your thoughts. I appreciate it.

Good to know about the automated scripts and that you personally never used the existing transcript. It makes me feel that there are distinctively different use cases, and for the ones where the user know they won't need the transcript, they should be able to turn them off (and yes the text output can be large if someone opt to print e.g. statistics dump).

As a matter of fact, that's a better design since then I can disable the feature temporarily without having to mess up the settings I want when I do do it.

This is also an interesting design aspect that I never thought about. This is good learning.

--

Overall I feel what you said makes sense. I will open a follow-up PR specifically for the settings, where we can discuss more (e.g. default on or off; should the setting turn off the existing transcript; should we add a interpreter.save-session-format; etc).

@jimingham
Copy link
Collaborator

jimingham commented May 8, 2024 via email

Copy link
Collaborator

@clayborg clayborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a few changes. See inline comments.

lldb/source/Interpreter/CommandInterpreter.cpp Outdated Show resolved Hide resolved
lldb/include/lldb/Utility/StructuredData.h Outdated Show resolved Hide resolved
lldb/source/Utility/StructuredData.cpp Outdated Show resolved Hide resolved
@royitaqi royitaqi requested a review from clayborg May 10, 2024 20:35
lldb/include/lldb/API/SBCommandInterpreter.h Outdated Show resolved Hide resolved
lldb/include/lldb/Interpreter/CommandInterpreter.h Outdated Show resolved Hide resolved
lldb/source/API/SBCommandInterpreter.cpp Outdated Show resolved Hide resolved
lldb/source/Interpreter/CommandInterpreter.cpp Outdated Show resolved Hide resolved
@@ -1891,6 +1893,12 @@ bool CommandInterpreter::HandleCommand(const char *command_line,

m_transcript_stream << "(lldb) " << command_line << '\n';

// The same `transcript_item` will be used below to add output and error of
// the command.
auto transcript_item = std::make_shared<StructuredData::Dictionary>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are not sharing it. Use std::make_unique

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two reasons for using the shared_ptr:

  1. m_transcript holds a shared_ptr anyways. The m_transcript->AddItem(transcript_item) call below this line also requires a shared_ptr.
  2. Many lines below this line, the transcript_item is used again in order to add the command's output and error.

If we use unique_ptr instead, then the code will look like:

  auto transcript_item = std::make_unique<StructuredData::Dictionary>();
  transcript_item->AddStringItem("command", command_line);
  m_transcript->AddItem(std::move(transcript_item));

  ...

  auto last_transcript_item = m_transcript.last();
  last_transcript_item->AddStringItem("output", result.GetOutputData());
  last_transcript_item->AddStringItem("error", result.GetErrorData());

I feel the last part is weird (i.e. first creating transcript_item and move into m_transcript, then getting it again from m_transcript). I feel a shared_ptr is more readable and performant (since m_transcript holds a shared_ptr anyways, rather than adding then getting).

WDYT?

Copy link
Contributor

@jeffreytan81 jeffreytan81 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General looks good to me. I do agree that we should hide this behind a setting to avoid memory leak in long running lldb sessions.

Copy link
Collaborator

@clayborg clayborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments to fixup as mentioned and this is good to go.

Copy link

⚠️ Python code formatter, darker found issues in your code. ⚠️

You can test this locally with the following command:
darker --check --diff -r 7ecdf620330d8e044a48b6f59f8eddd2f88f01d4...542615a18949f6f325c03761104b2f0530270065 lldb/test/API/commands/session/save/TestSessionSave.py lldb/test/API/python_api/interpreter/TestCommandInterpreterAPI.py
View the diff from darker here.
--- python_api/interpreter/TestCommandInterpreterAPI.py	2024-05-20 22:05:58.000000 +0000
+++ python_api/interpreter/TestCommandInterpreterAPI.py	2024-05-20 22:39:44.633426 +0000
@@ -140,27 +140,31 @@
         # 1. Some of the asserts rely on the exact output format of the
         #    commands. Hopefully we are not changing them any time soon.
         # 2. We are removing the "seconds" field from each command, so that
         #    some of the validations below can be easier / more readable.
         for command in transcript:
-            del(command["seconds"])
+            del command["seconds"]
 
         # (lldb) version
         self.assertEqual(transcript[0]["command"], "version")
         self.assertIn("lldb version", transcript[0]["output"])
         self.assertEqual(transcript[0]["error"], "")
 
         # (lldb) an-unknown-command
-        self.assertEqual(transcript[1],
+        self.assertEqual(
+            transcript[1],
             {
                 "command": "an-unknown-command",
                 "output": "",
                 "error": "error: 'an-unknown-command' is not a valid command.\n",
-            })
+            },
+        )
 
         # (lldb) breakpoint set -f main.c -l <line>
-        self.assertEqual(transcript[2]["command"], "breakpoint set -f main.c -l %d" % self.line)
+        self.assertEqual(
+            transcript[2]["command"], "breakpoint set -f main.c -l %d" % self.line
+        )
         # Breakpoint 1: where = a.out`main + 29 at main.c:5:3, address = 0x0000000100000f7d
         self.assertIn("Breakpoint 1: where = a.out`main ", transcript[2]["output"])
         self.assertEqual(transcript[2]["error"], "")
 
         # (lldb) r
@@ -169,16 +173,18 @@
         self.assertIn("Process", transcript[3]["output"])
         self.assertIn("launched", transcript[3]["output"])
         self.assertEqual(transcript[3]["error"], "")
 
         # (lldb) p a
-        self.assertEqual(transcript[4],
+        self.assertEqual(
+            transcript[4],
             {
                 "command": "p a",
                 "output": "(int) 123\n",
                 "error": "",
-            })
+            },
+        )
 
         # (lldb) statistics dump
         statistics_dump = json.loads(transcript[5]["output"])
         # Dump result should be valid JSON
         self.assertTrue(statistics_dump is not json.JSONDecodeError)
@@ -191,11 +197,14 @@
     def test_save_transcript_setting_default(self):
         ci = self.buildAndCreateTarget()
         res = lldb.SBCommandReturnObject()
 
         # The setting's default value should be "false"
-        self.runCmd("settings show interpreter.save-transcript", "interpreter.save-transcript (boolean) = false\n")
+        self.runCmd(
+            "settings show interpreter.save-transcript",
+            "interpreter.save-transcript (boolean) = false\n",
+        )
         # self.assertEqual(res.GetOutput(), )
 
     def test_save_transcript_setting_off(self):
         ci = self.buildAndCreateTarget()
 
@@ -237,19 +246,39 @@
         # Run commands and get the transcript as structured data
         self.runCmd("version")
         structured_data_1 = ci.GetTranscript()
         self.assertTrue(structured_data_1.IsValid())
         self.assertEqual(structured_data_1.GetSize(), 1)
-        self.assertEqual(structured_data_1.GetItemAtIndex(0).GetValueForKey("command").GetStringValue(100), "version")
+        self.assertEqual(
+            structured_data_1.GetItemAtIndex(0)
+            .GetValueForKey("command")
+            .GetStringValue(100),
+            "version",
+        )
 
         # Run some more commands and get the transcript as structured data again
         self.runCmd("help")
         structured_data_2 = ci.GetTranscript()
         self.assertTrue(structured_data_2.IsValid())
         self.assertEqual(structured_data_2.GetSize(), 2)
-        self.assertEqual(structured_data_2.GetItemAtIndex(0).GetValueForKey("command").GetStringValue(100), "version")
-        self.assertEqual(structured_data_2.GetItemAtIndex(1).GetValueForKey("command").GetStringValue(100), "help")
+        self.assertEqual(
+            structured_data_2.GetItemAtIndex(0)
+            .GetValueForKey("command")
+            .GetStringValue(100),
+            "version",
+        )
+        self.assertEqual(
+            structured_data_2.GetItemAtIndex(1)
+            .GetValueForKey("command")
+            .GetStringValue(100),
+            "help",
+        )
 
         # Now, the first structured data should remain unchanged
         self.assertTrue(structured_data_1.IsValid())
         self.assertEqual(structured_data_1.GetSize(), 1)
-        self.assertEqual(structured_data_1.GetItemAtIndex(0).GetValueForKey("command").GetStringValue(100), "version")
+        self.assertEqual(
+            structured_data_1.GetItemAtIndex(0)
+            .GetValueForKey("command")
+            .GetStringValue(100),
+            "version",
+        )

@clayborg clayborg merged commit e8dc8d6 into llvm:main May 20, 2024
3 of 4 checks passed
Copy link

@royitaqi Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested
by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as
the builds can include changes from many authors. It is not uncommon for your
change to be included in a build that fails due to someone else's changes, or
infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself.
This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

zhyty added a commit that referenced this pull request Oct 5, 2024
…109020)

Somewhat recently, we made the change to hide the behavior to save LLDB
session history to the transcript buffer behind the flag
`interpreter.save-transcript`. By default, `interpreter.save-transcript`
is false. See #90703 for context.

I'm making a small update here to our `session save` messaging and some
help docs to clarify for users that aren't aware of this change. Maybe
`interpreter.save-transcript` could be true by default as well. Any
feedback welcome.

# Tests
```
bin/lldb-dotest -p TestSessionSave
```

---------

Co-authored-by: Tom Yang <[email protected]>
Kyvangka1610 pushed a commit to Kyvangka1610/llvm-project that referenced this pull request Oct 5, 2024
…lvm#109020)

Somewhat recently, we made the change to hide the behavior to save LLDB
session history to the transcript buffer behind the flag
`interpreter.save-transcript`. By default, `interpreter.save-transcript`
is false. See llvm#90703 for context.

I'm making a small update here to our `session save` messaging and some
help docs to clarify for users that aren't aware of this change. Maybe
`interpreter.save-transcript` could be true by default as well. Any
feedback welcome.

# Tests
```
bin/lldb-dotest -p TestSessionSave
```

---------

Co-authored-by: Tom Yang <[email protected]>
@royitaqi royitaqi deleted the new-interpreter-api branch October 24, 2024 21:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants