Add new Python API SBCommandInterpreter::GetTranscript()
#90703
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord, or on the forums.
@llvm/pr-subscribers-lldb

Author: None (royitaqi)

Changes

Motivation: Currently, the user can already get the "transcript" (for "what is the transcript", see `CommandInterpreter::SaveTranscript`), but only by destroying the debugger and then reading the save directory. So basically, there is no way to obtain the transcript during the debugger's life cycle.

Proposal: Add a new Python API `SBCommandInterpreter::GetTranscript()`.

Structured data: the transcript is returned as an `SBStructuredData`.

Alternatives: The return type can also be `std::vector<std::pair<std::string, SBCommandReturnObject>>`.

Full diff: https://github.com/llvm/llvm-project/pull/90703.diff

2 Files Affected:
diff --git a/lldb/include/lldb/API/SBCommandInterpreter.h b/lldb/include/lldb/API/SBCommandInterpreter.h
index ba2e049204b8e6..d65f06d676f91f 100644
--- a/lldb/include/lldb/API/SBCommandInterpreter.h
+++ b/lldb/include/lldb/API/SBCommandInterpreter.h
@@ -247,13 +247,13 @@ class SBCommandInterpreter {
lldb::SBStringList &matches,
lldb::SBStringList &descriptions);
- /// Returns whether an interrupt flag was raised either by the SBDebugger -
+ /// Returns whether an interrupt flag was raised either by the SBDebugger -
/// when the function is not running on the RunCommandInterpreter thread, or
/// by SBCommandInterpreter::InterruptCommand if it is. If your code is doing
- /// interruptible work, check this API periodically, and interrupt if it
+ /// interruptible work, check this API periodically, and interrupt if it
/// returns true.
bool WasInterrupted() const;
-
+
/// Interrupts the command currently executing in the RunCommandInterpreter
/// thread.
///
@@ -318,6 +318,12 @@ class SBCommandInterpreter {
SBStructuredData GetStatistics();
+ /// Returns a list of handled commands, output and error. Each element in
+ /// the list is a dictionary with three keys: "command" (string), "output"
+ /// (list of strings) and optionally "error" (list of strings). Each string
+ /// in "output" and "error" is a line (without EOL characters).
+ SBStructuredData GetTranscript();
+
protected:
friend class lldb_private::CommandPluginInterfaceImplementation;
diff --git a/lldb/source/API/SBCommandInterpreter.cpp b/lldb/source/API/SBCommandInterpreter.cpp
index 83c0951c56db60..242b3f8f09c48a 100644
--- a/lldb/source/API/SBCommandInterpreter.cpp
+++ b/lldb/source/API/SBCommandInterpreter.cpp
@@ -150,7 +150,7 @@ bool SBCommandInterpreter::WasInterrupted() const {
bool SBCommandInterpreter::InterruptCommand() {
LLDB_INSTRUMENT_VA(this);
-
+
return (IsValid() ? m_opaque_ptr->InterruptCommand() : false);
}
@@ -571,6 +571,11 @@ SBStructuredData SBCommandInterpreter::GetStatistics() {
return data;
}
+SBStructuredData SBCommandInterpreter::GetTranscript() {
+ LLDB_INSTRUMENT_VA(this);
+ return SBStructuredData();
+}
+
lldb::SBCommand SBCommandInterpreter::AddMultiwordCommand(const char *name,
const char *help) {
LLDB_INSTRUMENT_VA(this, name, help);
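To make the schema documented in the new header comment concrete, here is a small, self-contained Python sketch. The transcript entries below are made up for illustration (there is no real `frob` command); they simply follow the documented shape of a list of dicts with "command", "output", and optionally "error":

```python
# Made-up transcript matching the documented schema: each element is a dict
# with "command" (string), "output" (list of lines) and optionally "error"
# (list of lines); each line is stored without its EOL character.
transcript = [
    {"command": "version", "output": ["lldb version 19.0.0"]},
    {
        "command": "frob",  # hypothetical unknown command
        "output": [],
        "error": ["error: 'frob' is not a valid command."],
    },
]

# Example post-processing: collect the commands that produced error lines.
failed = [entry["command"] for entry in transcript if entry.get("error")]
print(failed)  # ['frob']
```

This is the kind of programmatic consumption the API enables without having to destroy the debugger and read the saved transcript file.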
You will need to hook this up so it actually works, and then add a test case to verify this works as expected.
I think you forgot to include the implementation.
Updated according to @medismailben's suggestions.
Looking better. Thanks for sticking with us through the review! :)
I believe I have addressed all your comments, @bulbazord. Feel free to let me know if you notice anything else. I appreciate your comments so far; they all make sense and have helped me improve this PR. Thank you for spending time on this, and for working with me patiently. I'm learning a lot through this experience.
                                                 -1, false));
transcript_item->AddItem(
    "error", StructuredData::Array::SplitString(result.GetErrorData(), '\n',
                                                -1, false));
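For reference, the `SplitString(..., '\n', -1, false)` semantics here (each line kept, without its EOL character) roughly correspond to Python's `str.splitlines` for newline-terminated text. A sketch with made-up error output:

```python
# Made-up error text as a command result might produce it.
error_data = "error: first problem\nerror: second problem\n"

# splitlines() drops the EOL characters and does not emit a trailing
# empty element for the final newline, matching the split described above.
lines = error_data.splitlines()
print(lines)  # ['error: first problem', 'error: second problem']
```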
Can we time the "cmd_obj->Execute()" calls and add this as a key/value pair?
See my reply to your other comment: I plan to do it as a separate, follow-up PR.
Sorry to come in a little late on this, but I think we need a setting to turn this off as well. If you aren't planning to use the transcript, you shouldn't have to pay the cost for it.
Hi @jimingham ,
No worries. Thanks for chiming in.
Could you elaborate about why this is necessary? If we really need this, can I add the setting in a follow-up PR?

FWIW, to help make the conversation faster (reduce rounds of communication), I will write down my unverified thoughts below. Please feel free to point out where we diverge (where I am probably wrong).

First, why I think the added cost is negligible:
Secondly, if we are going to add settings, there need to be product design considerations based on user usage/feedback, so we probably need more time to gather data points after shipping this feature.
Jim Ingham replied inline on May 7, 2024 (royitaqi's message quoted with ">"):

> Hi @jimingham,
>
> > Sorry to come in a little late on this
>
> No worries. Thanks for chiming in.
>
> > I think we need a setting to turn this off as well. If you aren't planning to use the transcript, you shouldn't have to pay the cost for it.
>
> Could you elaborate about why this is necessary? If we really need this, can I add the setting in a follow-up PR?
>
> FWIW, to help make the conversation faster (reduce rounds of communication), I will write down my gut feeling below. Please kindly correct me if I'm wrong.
>
> First, why I think the added cost is negligible:
>
> The baseline is that there is already an `m_transcript_stream`, which doesn't have any setting to turn it off. This PR just doubles that cost (plus some string split operations). It doesn't add huge cost on top of this baseline.

That's equally an argument for allowing me to turn that off as well: if I never plan to use `session save`, then it's silly to take up memory for that either.

> Maintaining the transcript is a per-command operation, and commands are invoked relatively infrequently. Here I assume the common use cases run tens or at most hundreds of commands in a debug session. The above cost times 10x to 100x isn't much.

People have automated scripts that run debug sessions, and you are also storing output text, so this can get fairly big. Plus, if you start adding timestamps, we're stopping to get the system clock on every command, so it gets more expensive. For people running lldb on big beefy machines this is unlikely to matter, but people also run lldb on low-memory devices, where resources are tighter.

> The cost of the commands themselves (and the process) is much higher than that of the transcript. E.g. resolving breakpoints and printing variables all need to do actual work, rather than just recording info.

I don't think "unrelated thing X is bigger" is a useful data point when considering whether it's good to do Y.

> Secondly, if we are going to add settings, there need to be product design considerations based on user usage/feedback, so we probably need more time to gather data points after shipping this feature.
>
> Prematurely adding settings (which we cannot remove, I assume) can lead us down a more difficult road.

Not sure why a setting to turn off recording transcripts would ever be controversial.

> I think it will be the norm that we record transcripts, like bash/zsh record command histories, so by default this should be on. In fact it is already, via `m_transcript_stream`.

I guess. I must admit I've used lldb since it existed and have never once used `session save`. After all, my Terminal has a nice big buffer... For my workflow that's all wasted memory.

> Besides just a binary on/off, settings can also allow more granular control. E.g. verbosity; only log the transcript for a subset of commands; limit the memory footprint by using a cyclic buffer; etc. If we add these settings, they will make the on/off setting redundant, and it will need to be removed.

I don't see the force of this argument at all. Having a control for "whether to do X at all" and then another set for "if I were to do X, how should I do it" is a pretty standard approach. As a matter of fact, that's a better design, since then I can disable the feature temporarily without having to mess up the settings I want when I do do it.

> (As mentioned in "Secondly") The above are all possibilities. We need user usage/feedback to guide the design of these settings. So I think we should ship this PR first and then gather data points.

I really don't think this is controversial, and I'm not sure how you would gather data points. If it helps to get started, however, I vote there should be a way to turn this off. But if you are running out of time for this feature and really need to get it off your plate, deferring the ability to turn it off to a subsequent PR is okay.

Jim
Hi @jimingham,

Thank you so much for the detailed explanation about the context and your thoughts. I appreciate it. Good to know about the automated scripts, and that you personally never used the existing transcript. It makes me feel that there are distinctly different use cases, and for the ones where users know they won't need the transcript, they should be able to turn it off (and yes, the text output can be large if someone opts to print e.g. a statistics dump).

This is also an interesting design aspect that I never thought about. This is good learning.

Overall I feel what you said makes sense. I will open a follow-up PR specifically for the settings, where we can discuss more (e.g. default on or off; should the setting turn off the existing transcript; should we add an `interpreter.save-session-format` setting; etc.).
Sounds good to me.
Jim
Needs a few changes. See inline comments.
@@ -1891,6 +1893,12 @@ bool CommandInterpreter::HandleCommand(const char *command_line,

  m_transcript_stream << "(lldb) " << command_line << '\n';

  // The same `transcript_item` will be used below to add output and error of
  // the command.
  auto transcript_item = std::make_shared<StructuredData::Dictionary>();
You are not sharing it. Use std::make_unique
Two reasons for using the `shared_ptr`:

1. `m_transcript` holds a `shared_ptr` anyways. The `m_transcript->AddItem(transcript_item)` call below this line also requires a `shared_ptr`.
2. Many lines below this line, the `transcript_item` is used again in order to add the command's output and error.

If we use `unique_ptr` instead, the code will look like:

auto transcript_item = std::make_unique<StructuredData::Dictionary>();
transcript_item->AddStringItem("command", command_line);
m_transcript->AddItem(std::move(transcript_item));
...
auto last_transcript_item = m_transcript.last();
last_transcript_item->AddStringItem("output", result.GetOutputData());
last_transcript_item->AddStringItem("error", result.GetErrorData());

I feel the last part is weird (i.e. first creating `transcript_item` and moving it into `m_transcript`, then getting it again from `m_transcript`). I feel a `shared_ptr` is more readable and performant (since `m_transcript` holds `shared_ptr`s anyways, this avoids adding and then getting).

WDYT?
Generally looks good to me. I do agree that we should hide this behind a setting, to avoid unbounded memory growth in long-running lldb sessions.
Just a few comments to fix up as mentioned, and this is good to go.
Co-authored-by: Med Ismail Bennani <[email protected]>
You can test this locally with the following command:

darker --check --diff -r 7ecdf620330d8e044a48b6f59f8eddd2f88f01d4...542615a18949f6f325c03761104b2f0530270065 lldb/test/API/commands/session/save/TestSessionSave.py lldb/test/API/python_api/interpreter/TestCommandInterpreterAPI.py

View the diff from darker here.

--- python_api/interpreter/TestCommandInterpreterAPI.py 2024-05-20 22:05:58.000000 +0000
+++ python_api/interpreter/TestCommandInterpreterAPI.py 2024-05-20 22:39:44.633426 +0000
@@ -140,27 +140,31 @@
# 1. Some of the asserts rely on the exact output format of the
# commands. Hopefully we are not changing them any time soon.
# 2. We are removing the "seconds" field from each command, so that
# some of the validations below can be easier / more readable.
for command in transcript:
- del(command["seconds"])
+ del command["seconds"]
# (lldb) version
self.assertEqual(transcript[0]["command"], "version")
self.assertIn("lldb version", transcript[0]["output"])
self.assertEqual(transcript[0]["error"], "")
# (lldb) an-unknown-command
- self.assertEqual(transcript[1],
+ self.assertEqual(
+ transcript[1],
{
"command": "an-unknown-command",
"output": "",
"error": "error: 'an-unknown-command' is not a valid command.\n",
- })
+ },
+ )
# (lldb) breakpoint set -f main.c -l <line>
- self.assertEqual(transcript[2]["command"], "breakpoint set -f main.c -l %d" % self.line)
+ self.assertEqual(
+ transcript[2]["command"], "breakpoint set -f main.c -l %d" % self.line
+ )
# Breakpoint 1: where = a.out`main + 29 at main.c:5:3, address = 0x0000000100000f7d
self.assertIn("Breakpoint 1: where = a.out`main ", transcript[2]["output"])
self.assertEqual(transcript[2]["error"], "")
# (lldb) r
@@ -169,16 +173,18 @@
self.assertIn("Process", transcript[3]["output"])
self.assertIn("launched", transcript[3]["output"])
self.assertEqual(transcript[3]["error"], "")
# (lldb) p a
- self.assertEqual(transcript[4],
+ self.assertEqual(
+ transcript[4],
{
"command": "p a",
"output": "(int) 123\n",
"error": "",
- })
+ },
+ )
# (lldb) statistics dump
statistics_dump = json.loads(transcript[5]["output"])
# Dump result should be valid JSON
self.assertTrue(statistics_dump is not json.JSONDecodeError)
@@ -191,11 +197,14 @@
def test_save_transcript_setting_default(self):
ci = self.buildAndCreateTarget()
res = lldb.SBCommandReturnObject()
# The setting's default value should be "false"
- self.runCmd("settings show interpreter.save-transcript", "interpreter.save-transcript (boolean) = false\n")
+ self.runCmd(
+ "settings show interpreter.save-transcript",
+ "interpreter.save-transcript (boolean) = false\n",
+ )
# self.assertEqual(res.GetOutput(), )
def test_save_transcript_setting_off(self):
ci = self.buildAndCreateTarget()
@@ -237,19 +246,39 @@
# Run commands and get the transcript as structured data
self.runCmd("version")
structured_data_1 = ci.GetTranscript()
self.assertTrue(structured_data_1.IsValid())
self.assertEqual(structured_data_1.GetSize(), 1)
- self.assertEqual(structured_data_1.GetItemAtIndex(0).GetValueForKey("command").GetStringValue(100), "version")
+ self.assertEqual(
+ structured_data_1.GetItemAtIndex(0)
+ .GetValueForKey("command")
+ .GetStringValue(100),
+ "version",
+ )
# Run some more commands and get the transcript as structured data again
self.runCmd("help")
structured_data_2 = ci.GetTranscript()
self.assertTrue(structured_data_2.IsValid())
self.assertEqual(structured_data_2.GetSize(), 2)
- self.assertEqual(structured_data_2.GetItemAtIndex(0).GetValueForKey("command").GetStringValue(100), "version")
- self.assertEqual(structured_data_2.GetItemAtIndex(1).GetValueForKey("command").GetStringValue(100), "help")
+ self.assertEqual(
+ structured_data_2.GetItemAtIndex(0)
+ .GetValueForKey("command")
+ .GetStringValue(100),
+ "version",
+ )
+ self.assertEqual(
+ structured_data_2.GetItemAtIndex(1)
+ .GetValueForKey("command")
+ .GetStringValue(100),
+ "help",
+ )
# Now, the first structured data should remain unchanged
self.assertTrue(structured_data_1.IsValid())
self.assertEqual(structured_data_1.GetSize(), 1)
- self.assertEqual(structured_data_1.GetItemAtIndex(0).GetValueForKey("command").GetStringValue(100), "version")
+ self.assertEqual(
+ structured_data_1.GetItemAtIndex(0)
+ .GetValueForKey("command")
+ .GetStringValue(100),
+ "version",
+ )
@royitaqi Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. If you don't get any reports, no action is required from you. Your changes are working as expected, well done!
(#109020) Somewhat recently, we made the change to hide the behavior to save LLDB session history to the transcript buffer behind the flag `interpreter.save-transcript`. By default, `interpreter.save-transcript` is false. See #90703 for context. I'm making a small update here to our `session save` messaging and some help docs to clarify for users that aren't aware of this change. Maybe `interpreter.save-transcript` could be true by default as well. Any feedback welcome.

# Tests

```
bin/lldb-dotest -p TestSessionSave
```

Co-authored-by: Tom Yang <[email protected]>
Motivation
Currently, the user can already get the "transcript" (for "what is the transcript", see
`CommandInterpreter::SaveTranscript`). However, the only way to obtain the transcript data as a user is to first destroy the debugger, then read the save directory. Note that destroy-callbacks cannot be used, because 1) transcript data is private to the command interpreter (see `CommandInterpreter.h`), and 2) the writing of the transcript happens after the invocation of destroy-callbacks (see `Debugger::Destroy`).

So basically, there is no way to obtain the transcript.
In theory, there are other ways for the user to obtain transcript data during a debugger's life cycle. However, such ways rely on the client's setup and are not supported natively by LLDB.
Proposal

Add a new Python API `SBCommandInterpreter::GetTranscript()`.

Goals:
Structured data: the transcript is returned as an `SBStructuredData`. See comments in the code for how the data is organized. Later, `SaveTranscript` can be updated to write different formats using such data (e.g. JSON), probably accompanied by a new setting (e.g. `interpreter.save-session-format`).

Alternatives
The return type can also be `std::vector<std::pair<std::string, SBCommandReturnObject>>`. This would make the implementation easier, without having to translate it to `SBStructuredData`. On the other hand, `SBStructuredData` can convert to JSON easily, so it's more convenient for users to process.

Privacy
Both user commands and the output/error in the transcript can contain private data. However, as mentioned, the transcript is already available to the user. The addition of the new API doesn't increase the level of risk. In fact, it lowers the risk of private data being leaked later on, by avoiding writing such data to external storage.
Once the user (or their code) gets the transcript, it will be their responsibility to make sure that any required privacy policies are guaranteed.
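To illustrate the JSON convenience argued for under Alternatives, here is a small pure-Python sketch. The transcript entries are hypothetical, shaped like the final per-command schema used in the tests (plain strings for "output" and "error"):

```python
import json

# Hypothetical transcript entries, shaped like the structured transcript.
transcript = [
    {"command": "version", "output": "lldb version 19.0.0\n", "error": ""},
    {"command": "p a", "output": "(int) 123\n", "error": ""},
]

# Once the data is structured, the whole transcript serializes in one call...
text = json.dumps(transcript, indent=2)

# ...and round-trips back for programmatic processing.
assert json.loads(text) == transcript
```

A raw `std::vector<std::pair<...>>` return type would leave this serialization step to each caller, which is the trade-off the Alternatives section weighs.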
Tests