[Task Submission] Natural Language Codesearch Classification (`nl_codesearch_clf`) #16

drndr · 2023-08-01T15:01:03Z

[Natural Language Codesearch Classification]

The task consists of 8 subtasks and measures cross-lingual and domain generalization, as well as robustness to covariate shift.
It includes a binary classification evaluation:
given a natural language query, determine whether a given code snippet matches the natural language description.

Authors

Andor Diera [email protected]
Abdelhalim Dahou [email protected]
Lukas Galke [email protected]
Fabian Karl [email protected]
Florian Sihler [email protected]
Ansgar Scherp [email protected]

Implementation

For the binary classification setup, the config files use task_type "multiple_choice".
The task.py script includes a custom get_dataset_raw method, where the negative samples are created.
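
As a rough illustration of the negative sampling mentioned above, here is a minimal sketch. It is not the code from task.py: the function name `add_negative_samples`, the field names `query`, `code` and `target`, and the one-negative-per-positive ratio are all assumptions made for this example. Each matching pair is kept with label 1 and the same query is paired with a randomly chosen different snippet with label 0.

```python
import random


def add_negative_samples(pairs, seed=42):
    """Build a balanced binary dataset from matching (query, code) pairs.

    Hypothetical helper: the real get_dataset_raw may sample differently.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two pairs to build a mismatched negative")
    rng = random.Random(seed)
    examples = []
    for i, (query, code) in enumerate(pairs):
        # Positive sample: the query with its own snippet.
        examples.append({"query": query, "code": code, "target": 1})
        # Negative sample: the same query with a snippet from another pair.
        j = rng.choice([k for k in range(len(pairs)) if k != i])
        examples.append({"query": query, "code": pairs[j][1], "target": 0})
    rng.shuffle(examples)
    return examples


# Toy data, invented for this example.
pairs = [
    ("add two numbers", "def add(a, b): return a + b"),
    ("reverse a string", "def rev(s): return s[::-1]"),
    ("read a file", "def read(p): return open(p).read()"),
]
examples = add_negative_samples(pairs)
```

Fixing the seed keeps the generated negatives reproducible across runs, which matters when the negatives are part of the evaluation set.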

Usage

For binary classification, use the default "multiple_choice" usage.

Checklist:

I and my co-authors agree that, if this PR is merged, the code will be available under the same license as the genbench_cbt repository.
Prior to submitting, I have run the GenBench CBT test suite using the genbench-cli test-task tool.
I have read the description of what should be in the doc.md of my task, and have added the required arguments.
I have submitted or will submit an accompanying paper to the GenBench workshop.

drndr · 2023-08-01T15:34:14Z

test_task failed due to a non-existent id.
Our submission includes two tasks, nl_codesearch_clf and nl_codesearch_mrr.

vernadankers · 2023-08-02T09:32:52Z

Hi, that sounds like something that can be fixed on our side! Could you open an issue for that, please? :-)

drndr · 2023-08-02T10:28:43Z

Thanks! I was also considering submitting two different branches, but if it can be fixed on your side, even better :)

kazemnejad · 2023-08-02T11:40:51Z

Hi, the problem was that the automated CI was trying to read the task_id from the title (this can be the parent task), but your initial title contained nl_codesearch, which doesn't exist as a task_id. I changed the title to contain nl_codesearch_clf and it passed all the test cases.

You are submitting two parent tasks nl_codesearch_clf and nl_codesearch_mrr. Currently, we only support one per task-submission. Please, open another PR for the second task.
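
For readers wondering why the title mattered, the check described above can be sketched roughly as follows. This is not the GenBench CI code: the regex, the `KNOWN_TASK_IDS` set and both function names are hypothetical, and only illustrate reading a task_id from the parenthesised part of a PR title and comparing it against the existing task ids.

```python
import re

# Hypothetical registry of task ids; the real CI looks these up in the repo.
KNOWN_TASK_IDS = {"nl_codesearch_clf", "nl_codesearch_mrr"}


def task_id_from_title(title):
    """Return the id inside the trailing parentheses of a PR title, or None."""
    match = re.search(r"\(\s*`?([A-Za-z0-9_]+)`?\s*\)\s*(?:#\d+)?\s*$", title)
    return match.group(1) if match else None


def title_has_known_task(title):
    """True if the title names a task id that exists in the registry."""
    return task_id_from_title(title) in KNOWN_TASK_IDS


old_title = "[Task Submission] Natural Language Codesearch (nl_codesearch)"
new_title = "[Task Submission] Natural Language Codesearch (nl_codesearch_clf )"
old_ok = title_has_known_task(old_title)
new_ok = title_has_known_task(new_title)
```

Under these assumptions the old title fails the check because nl_codesearch is not a registered id, while the renamed title passes even with the stray space before the closing parenthesis.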

drndr · 2023-08-02T12:35:18Z

Thanks! I removed the mrr part from this clf PR and created a separate one for mrr.

vernadankers · 2023-09-01T10:32:15Z

Hello!

We are getting quite close to the deadline (September 1, 11:59PM anywhere on earth), so if your PR needs any final changes, please make them now,
and don't forget to submit your accompanying paper to Openreview via https://openreview.net/group?id=GenBench.org/2023/Workshop by September 1.

Good luck finalising your PR and paper, feel free to tag us if you have questions.
Cheers, Verna
On behalf of the GenBench team

drndr · 2023-09-04T17:16:28Z

Hi,

Sorry, some style issues were introduced in the last commit at submission, and I forgot to rerun the make commands.
I fixed these issues in the latest commit.

kazemnejad · 2023-11-01T16:05:46Z

@drndr We're in the process of merging the tasks into the repo.

Could you please include a single usage_example.py file for each task that shows how the task is used? It seems your tasks support both finetuning and in-context learning. It'd be nice to have examples of both usages (preferably with a pretrained Hugging Face model).

Please also include requirements-usage-example.txt listing the Python dependencies needed to run the example.

kazemnejad · 2023-11-16T15:43:35Z

src/genbench/tasks/nl_codesearch_clf/usage_example.py

+
+    # TRAIN_FILE = "./codesearchnet_adv/train_adv_clf.jsonl"
+
+    TRAIN_FILE = NlCodesearchClfCodesearchnetAdv(


This looks great. Just a minor comment: could you replace the task loading with the official method, here and in the rest of this file?

from genbench import load_task
task = load_task("nl_codesearch_clf:codesearchnet_adv")

kazemnejad · 2023-12-31T19:05:40Z

Manually merged in
https://github.com/GenBench/genbench_cbt/pull/41

drndr added 17 commits July 17, 2023 13:59

Add NL Codesearch Classification

3f9a010

update adv webquery clf configs

ac4fc0c

Merge branch 'GenBench:main' into nl_codesearch

5cd31ba

update clf configs

5b13c7a

fix clf configs

fd82cb4

add eval card and doc.md

fe51b6c

Update doc.md

7c8e79e

Update doc.md

e00b4df

Update doc.md

0d72e37

Update doc.md

c222b66

add new clf prompt and mrr task

6a07d78

update configs

5047d62

update main mrr config

c4ec13c

fix codesearchnet cfg json2jsonl

83249a6

Merge branch 'GenBench:main' into nl_codesearch

122dce4

add mrr task scripts

f43086f

Update doc.md

bfb9962

drndr changed the title ~~[Task Submission] Natural Language Codesearch Classification (nl_codesearch_clf)~~ [Task Submission] Natural Language Codesearch (nl_codesearch) Aug 1, 2023

vernadankers added the task-submission label Aug 1, 2023

drndr changed the title ~~[Task Submission] Natural Language Codesearch (nl_codesearch)~~ [Task Submission] Natural Language Codesearch (nl_codesearch_clf) Aug 1, 2023

Merge branch 'main' into nl_codesearch

aedd50a

fix indentation and docmd

7958b66

drndr changed the title ~~[Task Submission] Natural Language Codesearch (nl_codesearch_clf)~~ [Task Submission] Natural Language Codesearch (nl_codesearch) Aug 2, 2023

kazemnejad added task-submission and removed task-submission labels Aug 2, 2023

kazemnejad changed the title ~~[Task Submission] Natural Language Codesearch (nl_codesearch)~~ [Task Submission] Natural Language Codesearch (nl_codesearch_clf ) Aug 2, 2023

kazemnejad removed the task-submission label Aug 2, 2023

kazemnejad added the task-submission label Aug 2, 2023

Merge branch 'GenBench:main' into nl_codesearch

049020a

drndr changed the title ~~[Task Submission] Natural Language Codesearch (nl_codesearch_clf )~~ [Task Submission] Natural Language Codesearch Classification (nl_codesearch_clf ) Aug 2, 2023

remove mrr from clf branch

129c7c8

kazemnejad added task-submission and removed task-submission labels Aug 23, 2023

update dataset links and cfgs

4827834

kazemnejad added task-submission and removed task-submission labels Sep 4, 2023

fix style issues

e1aefc5

vernadankers added task-submission and removed task-submission labels Sep 4, 2023

add usage example

de44168

kazemnejad reviewed Nov 16, 2023

kazemnejad added task-submission and removed task-submission labels Nov 16, 2023

update task loading

f7bd4d4

kazemnejad added task-submission ready-to-be-merged and removed task-submission labels Nov 29, 2023

kazemnejad closed this Dec 31, 2023
