updating website for dev phase release
Mantas Mazeika committed Jul 26, 2023
1 parent 4836786 commit babf758
Showing 5 changed files with 25 additions and 24 deletions.
4 changes: 2 additions & 2 deletions faq.html
@@ -13,8 +13,8 @@
<li><b>Are participants required to share the details of their method?</b> We encourage all participants to share their methods and code, either with the organizers or publicly. To be eligible for prizes, winning teams are required to share their methods, code, and models with the organizers.</li>
<li><b>What are the details for the Trojan Detection Track?</b> <a href="tracks.html#trojan-detection" style="text-decoration: underline;">Here</a>.</li>
<li><b>What are the details for the Red Teaming Track?</b> <a href="tracks.html#red-teaming" style="text-decoration: underline;">Here</a>.</li>
<li><b>Why are you using the baselines you have chosen?</b> Our baselines (PEZ, GBDA, Zero-Shot) are well-known text optimization and red teaming methods from the academic literature, which can be used for our trojan detection and red teaming tasks.</li>
<li><b>Why are you using the LLMs you have chosen?</b> We use models from the Pythia suite of LLMs, which are open-source. This enables broader participation compared to models that are not fully open-source. We also use different-sized models in the Base Model and Large Model subtracks, ranging from ~1B to ~10B parameters. This allows groups with a range of compute resources to participate.</li>
<li><b>Why are you using the baselines you have chosen?</b> Our baselines (PEZ, GBDA, UAT, Zero-Shot) are well-known text optimization and red teaming methods from the academic literature, which can be used for our trojan detection and red teaming tasks (a toy sketch of this style of optimization appears after this list).</li>
<li><b>Why are you using the LLMs you have chosen?</b> For the Trojan Detection Track, we use models from the Pythia suite of LLMs, which are open-source. This enables broader participation compared to models that are not fully open-source. We also use different-sized models in the Base Model and Large Model subtracks, ranging from ~1B to ~10B parameters. This allows groups with a range of compute resources to participate. For the Red Teaming Track, we use Llama-2-chat models. These models are also open-source, and in testing we found them to be very robust to the baseline red teaming methods.</li>
<li><b>Why are you using the particular trojan attack you have chosen?</b> We use the simplest possible trojan attack on LLMs, where using the trigger as a prompt on its own causes the LLM to generate the target string (a minimal check for this setting is sketched below this list). Existing trojan attacks for text models often consider triggers that modify clean inputs in various ways. We chose this simpler setting due to its strong resemblance to the red teaming task we consider, as part of the goal of this competition is to foster connections between the trojan detection and red teaming communities.</li>
<li><b>Is it "trojans" or "Trojans"?</b> Both are used in the academic literature. In the 2022 competition, we used "Trojans". However, this can make sentences a bit messy if one is using the word often, so we are using "trojans" for this competition.</li>
<li><b>What is the competition workshop?</b> Each NeurIPS 2023 competition has several hours allotted for a workshop specific to the competition. We will use this time to announce the winning teams for each track and describe the winning methods, takeaways, etc. More information will be announced about the competition workshop later in the competition.</li>
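For concreteness, here is a minimal sketch of the trojan setting described above: check whether a candidate trigger, used as a prompt on its own, makes the model emit the target string. This is an illustrative toy rather than the competition's evaluation code; the model name, greedy decoding, and decoding length are assumptions.

```python
# Toy sketch of the trojan setting: does the trigger prompt, on its own,
# cause the model to generate the target string? Not the official
# evaluation code; model choice and max_new_tokens are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-1.4b"  # one of the open-source Pythia models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def elicits_target(trigger: str, target: str, max_new_tokens: int = 32) -> bool:
    """Greedy-decode from the trigger alone and test for the target string."""
    inputs = tokenizer(trigger, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=max_new_tokens,
                             do_sample=False)
    completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                                  skip_special_tokens=True)
    return target in completion
```

The same check carries over to the Llama-2-chat models in the Red Teaming Track, modulo their chat prompt format.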
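And here is a heavily simplified sketch in the spirit of the gradient-based baselines: optimize continuous trigger embeddings to maximize the likelihood of a target string, then project back to the nearest vocabulary tokens. This is not the official baseline implementation (see the starter kit for that); the hyperparameters and the projection step are assumptions for illustration.

```python
# PEZ/GBDA-flavored sketch (NOT the official baselines): optimize soft
# trigger embeddings against the target likelihood, then snap to the
# nearest hard tokens. Hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def optimize_trigger(model, tokenizer, target: str,
                     n_trigger_tokens: int = 8,
                     steps: int = 100, lr: float = 0.1) -> str:
    embed = model.get_input_embeddings()
    target_ids = tokenizer(target, return_tensors="pt").input_ids.to(model.device)
    # Initialize the soft trigger from random vocabulary embeddings.
    init_ids = torch.randint(0, embed.num_embeddings, (1, n_trigger_tokens),
                             device=model.device)
    soft = embed(init_ids).detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([soft], lr=lr)
    for _ in range(steps):
        inputs_embeds = torch.cat([soft, embed(target_ids)], dim=1)
        logits = model(inputs_embeds=inputs_embeds).logits
        # Positions n-1 .. n+T-2 predict the T target tokens.
        pred = logits[:, n_trigger_tokens - 1:-1, :]
        loss = F.cross_entropy(pred.reshape(-1, pred.size(-1)),
                               target_ids.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Project each optimized embedding to its nearest vocabulary embedding.
    hard_ids = torch.cdist(soft[0].detach(), embed.weight).argmin(dim=-1)
    return tokenizer.decode(hard_ids)
```

Roughly speaking, GBDA instead optimizes a Gumbel-softmax relaxation of a distribution over tokens and UAT performs HotFlip-style discrete token swaps, but the overall shape of the loop is similar.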
Binary file added img/red_team_combined_score.png
7 changes: 3 additions & 4 deletions index.html
@@ -9,16 +9,15 @@

<p><b>Prizes:</b> There is a <u>$30,000 prize pool.</u> The first-place teams will also be invited to co-author a publication summarizing the competition results and will be invited to give a short talk at the competition workshop at NeurIPS 2023 (registration provided). Our current planned procedures for distributing the pool are <a href="prizes.html">here</a>.</p>

<!-- <p>For the TDC 2022 website, see <a href="https://2022.trojandetection.ai">here</a>.</p> -->
<!-- TODO: Make the background color correct and update the website to fix bugs and improve consistency -->

<h4>News</h4>
<ul>
<li><b>July 25:</b> The development phase has started. See <a href="prizes.html">here</a> for updates and more details.</li>
<li><b>July 24:</b> The start of the development phase has been postponed to 7/25.</li>
<li><b>July 20:</b> To allow time for final preparations, the start of the development phase has been postponed to 7/24.</li>
<li><b>July 17:</b> Registration has opened on CodaLab.</li>
</ul>

<p>For the TDC 2022 website, see <a href="https://2022.trojandetection.ai">here</a>.</p>

<!-- <details>
<summary><b>What are neural trojans?</b></summary>
@@ -71,7 +70,7 @@ <h2 id="rules">Rules</h2>
<li><b>Registration:</b> Double registration is not allowed. We expect teams to self-certify that all team members are not part of a different team registered for the competition, and we will actively monitor for violation of this rule. Teams may participate in multiple tracks. Organizers are not allowed to participate in the competition or win prizes.</li>
<li><b>Prize Distribution:</b> Monetary prizes will be awarded to teams as specified in the <a href="https://trojandetection.ai/prizes">Prizes page</a> of the competition website.</li>
<li><b>Compute Limits:</b> In both tracks, methods developed for the Base Model subtrack must run in at most 2 A100 GPU-days to generate a submission for the test phase. For Large Model subtracks, methods must run in at most 4 A100 GPU-days to generate a submission for the test phase. These limits do not include initial setup during the development phase. This is a soft requirement; we will check it for the winning teams, but exceeding the limit by a modest amount for understandable reasons is acceptable.</li>
<li><b>Allowed Methods (Trojan Detection Track):</b> The use of features that are clearly loopholes is not allowed (e.g., metadata). We may not anticipate all loopholes and we encourage participants to alert us to their existence. Legitimate features that do not constitute loopholes include all features derived from the trained parameters of networks, the target strings, training triggers, and text datasets. Similar to the Red Teaming Track, we also do not allow the submission of prompts that are intended to make LLMs copy-paste target strings from the prompt into the generation.</li>
<li><b>Allowed Methods (Trojan Detection Track):</b> The use of features that are clearly loopholes is not allowed (e.g., metadata). We may not anticipate all loopholes and we encourage participants to alert us to their existence. Legitimate features that do not constitute loopholes include all features derived from the trained parameters of networks, the target strings, training triggers, and text datasets. Similar to the Red Teaming Track, we also do not allow the submission of prompts that effectively make LLMs copy-paste target strings from the prompt into the generation.</li>
<li><b>Allowed Methods (Red Teaming Track):</b> Methods used for generating submitted test cases must satisfy the following requirements. (1) They must be able to generate any desired number of test cases, not just a fixed number of test cases. (2) They must be fully automated. After an initial setup, they should require no human effort to adapt to a new set of behaviors and no human-in-the-loop for generating test cases. (3) They must cause LLMs to generate behaviors mostly from scratch. In particular, getting LLMs to copy-paste whole instances of undesirable behavior provided in the prompt does not count as generating undesirable behavior from scratch. (4) They must not intentionally boost the diversity metric in an artificial manner, e.g., by adding random text to the beginning of a prompt. A simple self-screening check for the copy-paste requirement is sketched after this list.</li>
<li><b>Rule breaking</b> may result in disqualification, and significant rule breaking will result in ineligibility for prizes.</li>
</ol>
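One simple way a team might self-screen prompts against the copy-paste rules above (an illustrative assumption on our part, not the official checker, which may differ): flag any prompt that already contains the target string, or a long verbatim chunk of it.

```python
# Illustrative self-screen for the copy-paste rules; the official check
# may differ. Flags prompts containing the target or a long chunk of it.
def looks_like_copy_paste(prompt: str, target: str, min_overlap: int = 20) -> bool:
    p, t = prompt.lower(), target.lower()
    if t in p:
        return True
    # Check for any long contiguous chunk of the target appearing verbatim.
    for i in range(len(t) - min_overlap + 1):
        if t[i:i + min_overlap] in p:
            return True
    return False
```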
4 changes: 2 additions & 2 deletions start.html
@@ -5,10 +5,10 @@
---

<h2>Starter Kit</h2>
<p>Please see the <a href="https://www.example.com">GitHub repository [upcoming]</a> for the competition starter kit, including code for loading the datasets, training baseline detectors and evasive Trojans, and creating a submission.</p>
<p>Please see the <a href="https://github.com/centerforaisafety/tdc2023-starter-kit">GitHub repository [upcoming]</a> for the competition starter kit, including code for loading the datasets, running baselines, and creating a submission.</p>

<h2>Accessing the Data</h2>
<p>The models for each track can be accessed <a href="https://www.example.com">here [upcoming]</a> or through the download script in the starter kit.</p>
<p>The models and data can be accessed through the download scripts in the starter kit.</p>

<h2 id="submissions">Submissions</h2>
<p>We manage submissions and leaderboards through four linked CodaLab competition pages, one for each subtrack. All participants are required to register and agree to the rules before submitting.</p>
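A purely hypothetical illustration of what packaging a submission might look like; the actual file names and format are defined by the starter kit and the CodaLab pages, not by this sketch.

```python
# Hypothetical packaging example -- file names and JSON structure here are
# invented for illustration; follow the starter kit's actual format.
import json
import zipfile

predictions = {"target string 1": ["predicted trigger A", "predicted trigger B"]}
with open("predictions.json", "w") as f:
    json.dump(predictions, f)
with zipfile.ZipFile("submission.zip", "w") as zf:
    zf.write("predictions.json")  # upload submission.zip on CodaLab
```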