Update guidelines.md
guidelines.md CHANGED (+5 -5)
@@ -1,10 +1,10 @@
-
+## Overview

 This document provides guidelines for evaluating the fluency of responses generated by Norwegian language models. Annotators will compare pairs of responses (Response A and Response B) and determine which response demonstrates better fluency, or if they are equally fluent.

 The evaluation focuses exclusively on language quality, naturalness, and grammaticality. Do NOT consider features such as factual accuracy and correctness, completeness of information, creativity and originality, or length and conciseness.

-
+## Definitions

 #### What is fluency?

@@ -22,7 +22,7 @@ When evaluating fluency, pay attention to:
 6. **Spelling errors**: Typos and misspellings, wrong capitalization, incorrect use of diacritics (e.g. "å" vs "a", "ø" vs "o")
 7. **Translationese**: A common problem of language models is that they base their output on English -- the majority language in the language corpus. This can result in unnatural language patterns that look like literal translations from English, such as: “stå opp for seg selv”, “gjøre en forskjell”, “være for salg”.

-
+## Annotation procedure

 #### Step-by-Step process

@@ -46,7 +46,7 @@ You must select one of three options:
 - **Be consistent**: Apply the same standards across all evaluations
 - **When in doubt about equality**: If you cannot decisively determine which is better after careful analysis, select "Equally fluent"

-
+## Examples

 Here are some examples of texts that should not be considered as fluent Norwegian:
 - "Vi kan også prøve å finne måter å gjøre oppgavene dine mer overskuelige og gi deg mer tid til å gjøre dem på." (word choice)
@@ -56,7 +56,7 @@ Here are some examples of texts that should not be considered as fluent Norwegian
 - "banal hjertroman" (compound)
 - "den første konge" (double definiteness)

-
+## Edge cases and special considerations

 - **Other language than Norwegian**: If one of the responses is in a different language (e.g. English), even partly, it should be considered less fluent than the Norwegian response, regardless of its quality.
