MLPerf™ Results Messaging Guidelines

The MLPerf™ name and logo are trademarks of MLCommons™ Association (“MLCommons”) in the United States and other countries. The MLPerf name and logo are used to identify the source of MLCommons Association benchmarks. You may not use the MLPerf name or logo, except in a factual and non-trademark manner solely to accurately represent your benchmark score, in accordance with the following usage terms. Nothing in these MLPerf Results Messaging Guidelines (the “Guidelines”) grants you a license to use the MLPerf trademark or logo, or any other right or interest in or to the MLPerf trademark or logo.

1. Use of MLPerf™ Benchmark for Non-MLCommons Reviewed Results

If you used an MLPerf™ benchmark to obtain a result for your product or service but did not submit your product or service for MLCommons review, then your result is unverified, and you must disclose this fact. Submission of MLPerf benchmark results for MLCommons review is restricted to Members of MLCommons (as “Members” is defined in the MLCommons Bylaws) and Test Partners of MLCommons (as “Test Partners” is defined in the Test Partner Agreement); if you are not a Member or Test Partner in good standing, then you may not submit MLPerf benchmark results for MLCommons review.

Since your results have not gone through MLCommons review and, therefore, have not been verified by MLCommons, you must indicate your results are not verified by using the term “unverified” next to each result score and by using the following language when publishing or otherwise discussing your results: “Result not verified by MLCommons Association.” You can include this statement in a footnote, as described in Section 3 below.

2. Use of MLPerf™ Benchmark for MLCommons Reviewed and Verified Results

If you used an MLPerf benchmark to obtain a result for your product or service, you submitted your result for MLCommons review, and your result was verified through such review, you may indicate that your results are verified when publishing or otherwise discussing them, by describing your results as “verified” or “official” or by otherwise following the examples below for verified results. You may also choose to use this language: “Result verified by MLCommons Association.” You can include this statement in a footnote, as described in Section 3 below.

3. MLPerf™ Benchmark results must clearly identify basic details in the main text, table, or figure

3.1. Required Information

Any disclosure of results must clearly identify the following information in the main text, table, or figure: submitting organization, benchmark name and version number, and system under test.

Example for Non-MLCommons Reviewed Result:

SmartAI Corp achieved an unverified score of 0.6 on the MLPerf™ Training v0.7 Image Classification benchmark using a SmartCluster [1].

Examples for MLCommons Reviewed Result:

SmartAI Corp achieved a score of 0.6 on the MLPerf™ Training v0.7 Image Classification benchmark using a SmartCluster [1].

or

SmartAI Corp achieved a verified score of 0.6 on the MLPerf™ Training v0.7 Image Classification benchmark using a SmartCluster [1].

or

SmartAI Corp achieved an official score of 0.6 on the MLPerf™ Training v0.7 Image Classification benchmark using a SmartCluster [1].

For Closed Division benchmarks, the model name may be used instead of the benchmark name, e.g., “SSD” instead of “Object Detection (light-weight)”.

3.2. Required Footnote

Any use of or reference to an MLPerf benchmark result must also include the following in a footnote: benchmark suite, version, and division; benchmark name and scenario, if applicable; date and source of retrieval; MLPerf result ID (major-version.minor-version.entry.benchmark); and the trademark ownership legend shown in Section 11.2 below. Further, if your result was not verified through MLCommons review, the footnote must state “Result not verified by MLCommons Association.” If your result was verified through MLCommons review, you may choose to state “Result verified by MLCommons Association.”

For all footnotes required by these Guidelines, the size and color of the footnotes must be such that they are clearly visible and legible; and where an image is accompanied by text, the footnotes must appear as text rather than as part of the image.

Example for Non-MLCommons Reviewed Result:

[1] Unverified MLPerf™ v0.5 Inference Closed ResNet-v1.5 offline. Result not verified by MLCommons Association. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.

Examples for MLCommons Reviewed Result:

[1] MLPerf™ v0.5 Inference Closed ResNet-v1.5 offline. Retrieved from https://mlcommons.org/en/training-normal-05/ 21 December 2018, entry 0.5-12. Result verified by MLCommons Association. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.

or

[1] Verified MLPerf™ score of v0.5 Inference Closed ResNet-v1.5 offline. Retrieved from https://mlcommons.org/en/training-normal-05/ 21 December 2018, entry 0.5-12. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.

or

[1] Official MLPerf™ score of v0.5 Inference Closed ResNet-v1.5 offline. Retrieved from https://mlcommons.org/en/training-normal-05/ 21 December 2018, entry 0.5-12. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.
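For illustration, the required footnote elements above can be assembled mechanically. The following minimal Python sketch is not an official MLCommons tool; the function and field names are hypothetical, and only the wording shown in the examples above is normative:

[source,python]
----
# Illustrative sketch only; the helper and field names are hypothetical.
TRADEMARK_LEGEND = (
    "The MLPerf\u2122 name and logo are trademarks of MLCommons Association "
    "in the United States and other countries. All rights reserved. "
    "Unauthorized use strictly prohibited. "
    "See www.mlcommons.org for more information."
)

def build_footnote(suite, version, division, benchmark, scenario=None,
                   verified=False, source_url=None, retrieval_date=None,
                   result_id=None):
    """Assemble a footnote containing the elements required by Section 3.2."""
    prefix = "" if verified else "Unverified "
    head = f"{prefix}MLPerf\u2122 {version} {suite} {division} {benchmark}"
    if scenario:
        head += f" {scenario}"
    parts = [head + "."]
    if verified:
        # Retrieval details and the result ID come from the official
        # results database entry for the verified submission.
        parts.append(f"Retrieved from {source_url} {retrieval_date}, "
                     f"entry {result_id}.")
        parts.append("Result verified by MLCommons Association.")  # optional wording
    else:
        parts.append("Result not verified by MLCommons Association.")  # required wording
    parts.append(TRADEMARK_LEGEND)
    return " ".join(parts)

# Reproduces the first verified example above:
print(build_footnote("Inference", "v0.5", "Closed", "ResNet-v1.5", "offline",
                     verified=True,
                     source_url="https://mlcommons.org/en/training-normal-05/",
                     retrieval_date="21 December 2018",
                     result_id="0.5-12"))
----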

4. MLPerf results may only be compared against similar MLPerf results

MLPerf results may only be compared against compatible MLPerf results. Results are compatible if they were produced using the same benchmark and scenario from compatible versions of that benchmark. Compatible versions are determined by MLCommons; see the MLPerf Compatibility Table, which lists compatible and non-compatible versions.

MLPerf results may not be compared against non-MLPerf results. For example, an MLPerf Training v0.5 Closed ResNet-v1.5 result may not be compared against an arbitrary result not produced in accordance with MLPerf rules.
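For illustration only, a compatibility check of this kind can be expressed as a simple lookup. The sketch below is not an official MLCommons tool, and the version groups shown are hypothetical; the MLPerf Compatibility Table published by MLCommons is the authoritative source:

[source,python]
----
# Hypothetical compatibility data for illustration only; the MLPerf
# Compatibility Table published by MLCommons is the authoritative source.
COMPATIBLE_VERSIONS = {
    # (suite, benchmark) -> groups of mutually compatible versions
    ("Training", "Image Classification"): [{"v0.6", "v0.7"}],
}

def comparable(suite, benchmark, version_a, version_b,
               scenario_a=None, scenario_b=None):
    """Results are comparable only for the same benchmark and scenario,
    across versions that MLCommons has declared compatible."""
    if scenario_a != scenario_b:
        return False
    groups = COMPATIBLE_VERSIONS.get((suite, benchmark), [])
    return any(version_a in g and version_b in g for g in groups)

print(comparable("Training", "Image Classification", "v0.6", "v0.7"))  # True (hypothetical)
print(comparable("Training", "Image Classification", "v0.5", "v0.7"))  # False
----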

5. When comparing MLPerf results, you must identify any submission differences

When comparing results, the main text, table, or figure must clearly identify any difference in version, division, category, verified or unverified status, scenario, or chip count (the count of the compute devices executing the largest number of ops, which could be processors or accelerators). When comparing Open and Closed Division results, any ways in which the Open result would not qualify as a Closed result must be identified.

Example for Non-MLCommons Reviewed Result:

SmartAI Corp achieved an unverified score of 0.6 on the MLPerf™ Image Classification benchmark using a SmartCluster with 8 chips in the RDI category of the Closed Division, which is faster than the unverified result of 7.2 achieved by LessSmartAI Corp with 16 chips in the Available on-premise category of the Closed Division.[1]

Required Footnote: “[1] Result not verified by MLCommons Association. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.”

Example for MLCommons Reviewed Result:

SmartAI Corp achieved a score of 0.6 on the MLPerf™ Image Classification benchmark using a SmartCluster with 8 chips in the RDI category of the Closed Division, which is faster than the result of 7.2 achieved by LessSmartAI Corp with 16 chips in the Available on-premise category of the Closed Division.[1]

Required Footnote: “[1] Result verified by MLCommons Association. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.”

Furthermore, a comparison of an unverified result with a verified result must include the following statement in a footnote: “Unverified results have not been through an MLPerf™ review and may use measurement methodologies and/or workload implementations that are inconsistent with the MLPerf™ specification for verified results.”

Example (applicable to Non-MLCommons Reviewed Result):

SmartAI Corp announced an unverified score of 0.3 on the MLPerf™ Training Image Classification benchmark using a SmartCluster running MLFramework v4.1.[1]

[1] MLPerf™ v0.5 Training ResNet-v1.5. Result not verified by MLCommons Association. Unverified results have not been through an MLPerf™ review and may use measurement methodologies and/or workload implementations that are inconsistent with the MLPerf™ specification for verified results. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.

6. When comparing MLPerf Results, use official MLPerf power metrics

System power measured using the MLPerf Power methodology is the only power metric officially sanctioned by MLCommons for portraying MLPerf results and/or making comparisons. When stating or comparing MLPerf power results:

  • Submitters (i.e., those who have submitted results for review and verification by MLCommons) are prohibited from making public comparisons using normalized or derived metrics (e.g., perf/W, inferences/W) that are based on power metrics other than the MLPerf measured system power, including but not limited to TDP, rated power, PSU power rating, etc.

  • For submissions without a corresponding power measurement, no other proxy power metric should be used.

  • For submissions with a corresponding power measurement, only the official measured system power using the MLPerf Power methodology corresponding to that submission may be used.

Examples for MLCommons Reviewed Result:

AI_OEM1 Corp had an MLPerf™ score of 1000fps and a measured power of 200W. Therefore, AI_OEM1 achieved 5fps/W.[1]

Required Footnote: “[1] Result verified by MLCommons Association. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.”

or

AI_OEM1 Corp had an MLPerf™ score of 1000fps using Accelerator1-250W, while AI_OEM2 Corp had an MLPerf™ score of 900fps using Accelerator1-150W.[1][2]

Required Footnote 1: “[1] Note that this comparison does not derive perf/W from the accelerator TDP, but merely indicates that the two submitters used the same accelerator in different configurations and obtained different results.”

Required Footnote 2: “[2] Result verified by MLCommons Association. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.”

or

AI_OEM1 Corp had an MLPerf™ score of 1000fps using Accelerator1-250W with a measured system power of 500W. AI_OEM2 Corp had an MLPerf™ score of 900fps using Accelerator1-150W and a measured system power of 400W. Therefore, AI_OEM1 Corp achieved 2fps/W while AI_OEM2 achieved 2.25fps/W using the same accelerator.[1]

Required Footnote: “[1] Result verified by MLCommons Association. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.”

(In this case, the submitters are allowed to specify the accelerator TDP configuration, since it is part of the SKU name, so the first two sentences are valid. When the perf/W scores are derived in the final sentence, the MLPerf measured system power is used. Therefore the final sentence is also valid, and the statement does not violate this rule.)
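For illustration, the rule applied in these examples can be summarized in a short sketch. The helper below is hypothetical, not an official MLCommons tool; it simply encodes that perf/W may be derived only from the MLPerf measured system power, and that no proxy such as TDP may be substituted when no power measurement was submitted:

[source,python]
----
# Illustrative sketch; the helper is hypothetical. It encodes the rule
# that perf/W may be derived only from the MLPerf measured system power.
def perf_per_watt(score_fps, measured_system_power_w=None):
    """Return fps/W, or None when the submission has no MLPerf power
    measurement (no proxy such as TDP may be substituted)."""
    if measured_system_power_w is None:
        return None  # no power submission -> no derived perf/W claim
    return score_fps / measured_system_power_w

# From the third example above: 1000fps at 500W vs. 900fps at 400W.
print(perf_per_watt(1000, 500))  # 2.0 fps/W
print(perf_per_watt(900, 400))   # 2.25 fps/W
print(perf_per_watt(1000))       # None: the 250W accelerator TDP may not be used
----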

7. Timing for Results Disclosures

Submitters (i.e., those who have submitted results for review and verification by MLCommons) are not allowed to publish any results for a given benchmark version before its official publication date.

Non-submitters (i.e., those who have not submitted results for review and verification by MLCommons) are not allowed to publish any results until two weeks after the official publication date for that benchmark version.
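For illustration, a minimal sketch of this timing rule (the helper and dates are hypothetical, not an official MLCommons tool):

[source,python]
----
# Illustrative sketch of the timing rule; the helper and dates are hypothetical.
from datetime import date, timedelta

def earliest_disclosure(publication_date: date, is_submitter: bool) -> date:
    """Submitters may publish on the official publication date;
    non-submitters must wait two weeks after it."""
    return publication_date if is_submitter else publication_date + timedelta(weeks=2)

pub = date(2021, 4, 21)  # hypothetical official publication date
print(earliest_disclosure(pub, is_submitter=True))   # 2021-04-21
print(earliest_disclosure(pub, is_submitter=False))  # 2021-05-05
----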

8. MLCommons allows but does not endorse combining results of benchmarks

Users may see fit to combine or aggregate results from multiple MLPerf benchmark tests. If publicly disclosed, these composite results must cite the MLPerf benchmark score as required above and clearly describe the method of combination. However, the composite result is not sanctioned by MLCommons and may not be represented as an official or verified MLPerf result or score. You must follow the rules for citing an unverified score, set forth above.

9. Comparisons based on secondary or derived metrics must be explicit

Each MLPerf benchmark has a primary metric, for instance, time-to-train for Training Image Classification, or queries/sec for the Server scenario of Inference Image Classification (Datacenter system type). Any comparison based on a different or derived metric, such as power rating, cost, model size/architecture, accuracy, etc., must make the basis for comparison clear in the text and in a footnote. Secondary and derived metrics must not be presented as official or verified MLPerf metrics. You must follow the rules above for citing an unverified score.

Example:

SmartAI Corp has created a new neural network model called MagicEightBall that is 100% accurate for Top-1 image classification on the MLPerf™ v0.5 Training Open Division Image Classification benchmark using a cluster of 10 SmartChips running MLFramework v4.1 [1]. MagicEightBall achieved an unverified score of 20 minutes.

[1] Accuracy is not the primary metric of MLPerf™ Training. Result not verified by MLCommons Association. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.

10. Statements Must be Clear and Correct; No Misrepresentation as to Meaning

All statements you make regarding your MLPerf benchmark results must be clear and correct. Claimed results must be compliant with that MLPerf benchmark’s rules, as specified in the given benchmark or where it is accessed.

If your results have not been verified through MLCommons review, you must indicate this using the language required above in all of your uses of the benchmark name and score. You may not imply your use is verified or official and, therefore, you must clearly disclose it is unverified.

Do not use the MLPerf name in any manner that is likely to suggest or imply MLCommons’ endorsement of a specific company and/or its products or services. You may not use the MLPerf name in any way that could cause confusion as to source or as to ownership of the mark, or in any way that could damage the goodwill in the mark.

11. Additional Requirements and Restrictions

11.1. Notice Symbol

You must include the ™ next to all uses of the MLPerf name. Do not use the ® symbol next to the MLPerf name; falsely indicating a mark is registered is a violation.

11.2. Attribution

You must attribute ownership of the MLPerf mark to MLCommons in all uses of the MLPerf name and you must list the MLCommons website for additional information about the benchmark, as follows: “The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.”

11.3. Do not Alter

Do not alter or separate the MLPerf name, vary the spelling, add hyphens, make one word two words or more, use a similar mark, use a phonetic equivalent, use abbreviations, translate the mark, or otherwise alter or modify the mark in any way.

11.4. Use with Full Benchmark Name

You must follow the MLPerf name with the proper complete benchmark name, e.g., MLPerf™ Training v0.5 Open Image Classification benchmark. Never use the MLPerf name as a verb or noun, or in the possessive or plural forms.

11.5. No Use in Company or Product Names

Do not use the MLPerf name (or any variation thereof or any confusingly similar name) in any company name, product name, service name, logo, model number, part number, or domain name.

11.6. Do Not Use as a Trademark

The MLPerf mark is owned by MLCommons, and only MLCommons and its authorized licensees (who have a written license agreement with MLCommons) may use the MLPerf trademark in a trademark manner. If you desire to use the MLPerf name in any manner other than to make a factual statement about your MLPerf benchmark results in compliance with these Guidelines, you must contact [email protected] about your request, which MLCommons will consider in its discretion.

12. Violation Determination, Remedies, and Penalties

Any MLCommons Member, Test Partner, or third party may report a violation of these Guidelines via email to the MLCommons Executive Director (“ED”) and the Working Group (“WG”) chairs of the appropriate benchmark. Upon confirming the violation in their discretion, the ED and WG chairs will inform the potential violator and request remedial action. If the ED, WG chairs, and potential violator are unable to reach a mutually satisfactory conclusion, the issue can be raised in the WG to seek resolution via WG vote.

A non-exhaustive list of possible remedial actions or penalties based on the degree of violation is noted below. Taking or not taking any or all actions on this list or otherwise does not constitute a waiver of any enforcement rights or other rights in the MLPerf benchmarks, software, and/or trademark.

  1. Requesting corrections to published materials in the form of marketing blog posts, journals, papers, and other media.

  2. If the violation was at a public event such as a conference, the WG may direct the violator to issue a public statement to correct claims in ways that conform to these Guidelines.

  3. The WG may issue a public statement citing the violation.

  4. The WG may prohibit the violator from submitting MLPerf benchmark results for MLCommons review in the future.

  5. Continued failure by a violator to conform to these Guidelines may lead to the results being permanently marked as non-compliant in the results database.

13. DISCLAIMER; LIMITATION OF LIABILITY

THE BENCHMARKS, SOFTWARE, AND MLPERF AND MLCOMMONS TRADEMARKS ARE PROVIDED “AS IS” AND “AS AVAILABLE” WITHOUT ANY REPRESENTATION, WARRANTY, OR GUARANTEE OF ANY KIND, WHETHER EXPRESS, IMPLIED, STATUTORY, OR OTHERWISE. YOU USE THE BENCHMARKS, SOFTWARE, AND TRADEMARKS AT YOUR SOLE RISK. NEITHER MLCOMMONS NOR ITS PARENT, SUBSIDIARIES, OR AFFILIATES (ALL REFERRED TO COLLECTIVELY HERE AS “MLCOMMONS”) ACCEPT ANY RESPONSIBILITY OR LIABILITY FOR ANY DIFFICULTIES YOU MAY ENCOUNTER WITH THE BENCHMARKS, SOFTWARE, AND/OR TRADEMARKS. MLCOMMONS DOES NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE BENCHMARKS OR SOFTWARE OR OTHER PRODUCTS WILL MEET YOUR REQUIREMENTS, OR THAT THE OPERATION OF THE BENCHMARKS OR SOFTWARE OR OTHER PRODUCTS WILL BE UNINTERRUPTED OR ERROR-FREE, OR THAT DEFECTS IN THE BENCHMARKS OR SOFTWARE OR OTHER PRODUCTS WILL BE CORRECTED. NO ORAL OR WRITTEN INFORMATION, BENCHMARKS, BENCHMARK RESULTS, OR ADVICE GIVEN BY MLCOMMONS (COLLECTIVELY, THE “MLCOMMONS CONTENT”) SHALL CREATE ANY WARRANTY, REPRESENTATION, OR GUARANTEE.

MLCOMMONS IS NOT RESPONSIBLE FOR ANY PRODUCTS YOU MAY CHOOSE TO PURCHASE OR CHOOSE NOT TO PURCHASE AS A RESULT OF THE MLCOMMONS CONTENT. MLCOMMONS DISCLAIMS ANY AND ALL RESPONSIBILITY, LIABILITY, LOSS, AND/OR DAMAGE RELATED TO OR ARISING FROM YOUR USE OF THE MLCOMMONS.ORG WEBSITE (HEREAFTER, THE “WEBSITE”); THE MLCOMMONS CONTENT; THE BENCHMARKS, SOFTWARE, OR INFORMATION OBTAINED FROM THE WEBSITE; OR THESE GUIDELINES. MLCommons is not liable for, among other things, any loss of data, hardware or software, loss of use, or any liability resulting from: access delays; access interruptions; viruses; hackers; crackers; data non-delivery or mis-delivery; negligent acts, grossly negligent acts, or omissions by MLCommons; errors in any information, goods, or documents obtained due to the MLCommons Content; or any other direct, indirect, consequential, incidental, special, punitive, or enhanced damages whatsoever resulting, arising out of or in connection with the use or performance of the Website or any information obtained thereon, the MLCommons Content, the benchmarks, or the software.

MLCommons makes no representations whatsoever about any other website that you may access through the Website. MLCommons has no control over the content or claims of websites outside the MLCommons control, and does not endorse or accept any responsibility for the content of such websites.

If you have any questions regarding these Guidelines, please contact MLCommons at [email protected].