Multiple quality of life improvements #20

Open
wants to merge 32 commits into master

Changes from 10 commits
Commits (32)
26657c2
Removed spooky stacktrace
vadim0x60 Mar 8, 2024
88cb03c
new programlib
vadim0x60 Mar 8, 2024
26732d9
Revert "Process empty messages from LLMs while extracting code"
vadim0x60 Mar 21, 2024
34c5b83
RL eval
vadim0x60 Mar 25, 2024
18f77fd
Version bump
vadim0x60 Mar 25, 2024
14e6a1e
Fix
vadim0x60 Mar 27, 2024
2834c0d
fix
vadim0x60 Mar 27, 2024
35798f0
Require new programlib
vadim0x60 Mar 27, 2024
cdf2db2
program early shutdown fix
vadim0x60 Apr 5, 2024
1f304bc
RLeval fix
vadim0x60 Apr 9, 2024
31f7ede
RL error fix
vadim0x60 Apr 26, 2024
07cf6e5
Gymnasium eval fix
vadim0x60 May 2, 2024
fae86af
fix
vadim0x60 May 2, 2024
bc6bce6
RLEval fix
vadim0x60 May 31, 2024
3ec24b4
tunable error reward in RLEval
vadim0x60 May 31, 2024
051010b
fix
vadim0x60 May 31, 2024
3d64b0f
clean up processes
vadim0x60 Jun 4, 2024
1c4dbc4
fix
vadim0x60 Jun 4, 2024
560f005
version bump
vadim0x60 Jun 4, 2024
22edaff
editable openai base
vadim0x60 Jun 18, 2024
53b7d26
Version bump
vadim0x60 Jun 18, 2024
d7905b4
fix
vadim0x60 Jun 18, 2024
e6fc81a
version bump
vadim0x60 Jun 18, 2024
d109fb8
Addressing https://github.com/pexpect/pexpect/issues/47
vadim0x60 Jun 22, 2024
bf71a33
version bump
vadim0x60 Jun 22, 2024
bc4308a
Workaround pexpect/pexpect#462
vadim0x60 Jun 24, 2024
3b72715
Anthropic models
vadim0x60 Jun 26, 2024
3255ea1
handle anthropic rate limits
vadim0x60 Jun 28, 2024
30de595
more retries with claude
vadim0x60 Jul 2, 2024
66402fb
smart max_retries for claude
vadim0x60 Jul 2, 2024
ec35e77
anthropic fix
vadim0x60 Jul 2, 2024
30792da
Replaced failed Anthropic backoff with manually set delays
vadim0x60 Jul 4, 2024
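
Note: the last several commits (handle anthropic rate limits, more retries with claude, smart max_retries for claude, and the final one above) all deal with Anthropic rate limiting, but the corresponding diff is not part of the 10-commit view below. As a rough sketch only, the "manually set delays" strategy named in the final commit could look like the following; every name and delay value here is an illustrative assumption, not the PR's actual code:

import time

# Retry a callable with fixed, manually chosen delays instead of
# relying on a client library's built-in backoff.
DELAYS = [1, 5, 15, 60]  # seconds; illustrative values

def call_with_manual_delays(request):
    for delay in DELAYS:
        try:
            return request()
        except Exception:      # e.g. a rate-limit error from the client
            time.sleep(delay)  # wait the preset delay, then retry
    return request()           # final attempt; let any error propagate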
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "seidr"
-version = "3.1.1"
+version = "3.2.3"
description = "Synthesize Execute Instruct Debug Rank"
authors = ["Vadim Liventsev <[email protected]>", "Anastasia Grishina <[email protected]>"]
license = "MIT"
@@ -16,7 +16,7 @@ python = "^3.9"
psb2 = ">=1.1.1"
openai = "<1.0.0"
more-itertools = ">=8.0.0,<9.0.0"
-programlib = ">=9.0.2,<10.0.0"
+programlib = ">=11.0.0"
wandb = "<1.0.0"
gitpython = ">=3.0.0,<4.0.0"
tenacity = ">=8.0.0,<9.0.0"
50 changes: 49 additions & 1 deletion seidr/eval.py
@@ -10,7 +10,7 @@ class Evaluation(ABC):
    Produces a binary pass/fail result, a float score, and a text report
    """

-    def __init__(self, SUT: Program, passing_score: float = 1.):
+    def __init__(self, SUT, passing_score: float = 1.):
Review comment (Collaborator):
Isn't SUT of type Program?
SUT is an abbreviation, and it's nice to have some explanation and typing hints.

        """
        SUT: System Under Test
        passing_score: float score required to pass the evaluation
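
A note on the question above: the Gymnasium evaluation added further down builds its system under test as Program(code, language=language).spawn(), i.e. it passes the spawned agent rather than the Program itself, which may be why the annotation was dropped. If typing were kept, one option would be a sketch like the following ("Agent" is a hypothetical placeholder for whatever Program.spawn() returns, not a confirmed programlib type):

from typing import Union
from programlib import Program

class Evaluation:
    # Sketch only: a type hint that admits both a Program and a spawned
    # agent ("Agent" is a placeholder alias, quoted so it need not exist).
    def __init__(self, SUT: Union[Program, "Agent"], passing_score: float = 1.0):
        """SUT: System Under Test; passing_score: score required to pass."""
        self.SUT = SUT
        self.passing_score = passing_score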
@@ -97,3 +97,51 @@ def pen_report(self) -> str:
        else:
            self.output = "\n".join(self.output) if type(self.output) == list else self.output
        return self.output

+class Gymnasium(Evaluation):
+    def __init__(self, env, code, language, passing_score):
+        agent = Program(code, language=language).spawn()
+        super().__init__(agent, passing_score)
+
+        self.env = env
+        self.tot_reward = 0
+        self.tot_txt = ''
+        self.done = False
+
+    def __del__(self):
+        self.SUT.close()
+
+    def play(self):
+        if self.done:
+            return
+
+        try:
+            observation, info = self.env.reset()
+            self.tot_txt += info.get('memos', '')
+            terminated = False
+            truncated = False
+
+            while not (terminated or truncated):
+                if 'ascii' in self.env.metadata.get('render.modes', []):
+                    ascii_render = self.env.render(mode='ascii')
+                    self.tot_txt += ascii_render
+
+                action, _ = self.SUT.predict(observation, deterministic=True)
+
+                observation, reward, terminated, truncated, info = self.env.step(action)
+                self.tot_reward += reward
+                self.tot_txt += info.get('memos', '')
+        except RuntimeError as e:
+            self.tot_reward = -1000
+            self.tot_txt = str(e)
+
+        self.done = True
+
+    def score(self):
+        self.play()
+        return self.tot_reward
+
+    def pen_report(self):
+        self.play()
+        self.tot_txt += f'\nFinal reward: {self.tot_reward}'
+        return self.tot_txt
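
For context, the new class might be exercised like this. This is a hypothetical usage sketch: it assumes a gymnasium-compatible environment, and the environment name, agent source, and language value are illustrative, not taken from the PR:

import gymnasium as gym
from seidr.eval import Gymnasium

env = gym.make("CartPole-v1")   # illustrative environment
agent_code = "..."              # agent program source in a supported language

evaluation = Gymnasium(env, code=agent_code, language="Python", passing_score=0.0)
print(evaluation.score())       # total episode reward, or -1000 after a RuntimeError
print(evaluation.pen_report())  # rollout transcript plus "Final reward: ..."

Note the except RuntimeError branch in play(): a crashing agent is scored -1000 rather than aborting the evaluation.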
2 changes: 1 addition & 1 deletion seidr/github.py
@@ -52,7 +52,7 @@ def ensure_repo(remote: str, path: pathlib.Path | str, branch: str = None) -> Repo
        if branch:
            repo.git.checkout(branch)
    except GitError as e:
-        logging.info(f'Git error in ensure repo {e}. \n{traceback.print_stack()}')
+        logging.info(f'Git error in ensure repo {e}.')
        shutil.rmtree(path, ignore_errors=True)
        repo = Repo.clone_from(remote, path)

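This one-line change is the "spooky stacktrace" removal from the first commit: traceback.print_stack() writes the stack to stderr and returns None, so interpolating it into an f-string logs a trailing "None" while a full stack trace leaks to the console as a side effect. A small demonstration:

import traceback

# Evaluating the f-string calls print_stack(), which dumps a stack
# trace to stderr and returns None; None is what the string captures.
message = f"Git error in ensure repo. \n{traceback.print_stack()}"
print(message)  # -> "Git error in ensure repo. " followed by "None"
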
4 changes: 2 additions & 2 deletions seidr/llm.py
@@ -22,9 +22,9 @@ def extract_codes(
    language: Language | str
) -> str:
    """Extract code out of a message and (if Python) format it with black"""

    try:
        code_blocks = list(extract_from_buffer(StringIO(message_content)))
        code_blocks = [code for code in code_blocks if not bool(code)]
    except RuntimeError as e:
        code_blocks = []

@@ -90,7 +90,7 @@ def query_llm(
    # Assistants are trained to respond with one message.
    # it is theoretically possible to get more than one message, but it is very unlikely.
    assert all(len(r) == 1 for r in result.generations), "The models are expected to respond with one message"
-    result = [r[0].message.content for r in result.generations if r[0].message.content]
+    result = [r[0].message.content for r in result.generations]

    if mode == "repair":
        logging.info(f"Generating repair candidates for bug summary: \n{kwargs['bug_summary']}\n")
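For readers unfamiliar with the shape the assert above relies on: in LangChain, result.generations holds one list per prompt, and each inner list contains candidate generations wrapping a message. The second change stops filtering out empty message contents, so they now flow through to extract_codes. A plain-Python illustration of the shapes (dictionaries stand in for LangChain's objects; this is an assumption-level sketch, not the library's API):

# One inner list per prompt; each inner list holds a single candidate.
generations = [
    [{"message": {"content": "print('hello')"}}],
    [{"message": {"content": ""}}],  # empty completions are now kept
]

assert all(len(r) == 1 for r in generations)
contents = [r[0]["message"]["content"] for r in generations]
print(contents)  # ["print('hello')", ""]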