davda54 committed on
Commit
cb16510
·
verified ·
1 Parent(s): 561cd24

Update guidelines.md

Files changed (1)
  1. guidelines.md +5 -5
guidelines.md CHANGED
@@ -1,10 +1,10 @@
-# Overview
+## Overview
 
 This document provides guidelines for evaluating the fluency of responses generated by Norwegian language models. Annotators will compare pairs of responses (Response A and Response B) and determine which response demonstrates better fluency, or if they are equally fluent.
 
 The evaluation focuses exclusively on language quality, naturalness, and grammaticality. Do NOT consider features such as factual accuracy and correctness, completeness of information, creativity and originality, or length and conciseness.
 
-# Definitions
+## Definitions
 
 #### What is fluency?
 
@@ -22,7 +22,7 @@ When evaluating fluency, pay attention to:
 6. **Spelling errors**: Typos and misspellings, wrong capitalization, incorrect use of diacritics (e.g. "å" vs "a", "ø" vs "o")
 7. **Translationese**: A common problem of language models is that they base their output on English -- the majority language in the language corpus. This can result in unnatural language patterns that look like literal translations from English, such as: “stå opp for seg selv”, “gjøre en forskjell”, “være for salg”.
 
-# Annotation procedure
+## Annotation procedure
 
 #### Step-by-Step process
 
@@ -46,7 +46,7 @@ You must select one of three options:
 - **Be consistent**: Apply the same standards across all evaluations
 - **When in doubt about equality**: If you cannot decisively determine which is better after careful analysis, select "Equally fluent"
 
-# Examples
+## Examples
 
 Here are some examples of texts that should not be considered as fluent Norwegian:
 - "Vi kan også prøve å finne måter å gjøre oppgavene dine mer overskuelige og gi deg mer tid til å gjøre dem på." (word choice)
@@ -56,7 +56,7 @@ Here are some examples of texts that should not be considered as fluent Norwegia
 - "banal hjertroman" (compound)
 - "den første konge" (double definiteness)
 
-# Edge cases and special considerations
+## Edge cases and special considerations
 
 - **Other language than Norwegian**: If one of the responses is in a different language (e.g. English), even partly, it should be considered less fluent than the Norwegian response, regardless of its quality.
 