davda54 committed on
Commit c5e760f · verified · 1 Parent(s): a241619

Create guidelines.md

Files changed (1)
  1. guidelines.md +89 -0
guidelines.md ADDED
@@ -0,0 +1,89 @@
+ ## Overview
+
+ This document provides guidelines for evaluating the fluency of Norwegian responses generated by language models. Annotators will compare pairs of responses (Response A and Response B) and determine which response is more fluent, or whether the two are equally fluent. The evaluation focuses exclusively on language quality, naturalness, and grammaticality.
+
+ ## Key principle
+
+ **Fluency evaluation is strictly limited to linguistic quality.** Do NOT consider:
+ - Factual accuracy or correctness
+ - Completeness of information
+ - Creativity or originality
+ - Formatting or structure (unless it affects readability)
+ - Length or conciseness
+
+ ## Definitions
+
+ ### What is fluency?
+
+ Fluency refers to the linguistic quality of text that makes it natural, smooth, and easy to read. A fluent response is:
+
+ - **Grammatically correct**: Follows standard grammar rules with proper syntax
+ - **Natural-sounding**: Reads like something a native speaker would write
+ - **Coherent**: Maintains logical flow between sentences and paragraphs
+ - **Well-formed**: Uses appropriate vocabulary, punctuation, and sentence structure
+ - **Smooth**: Flows naturally without awkward phrasing or jarring transitions
+ - **Norwegian**: The models respond to Norwegian prompts, so their responses should always be in either Norwegian Bokmål or Norwegian Nynorsk
+
+ ### Fluency issues to look for
+
+ When evaluating fluency, pay attention to:
+
+ 1. **Grammar errors**: Agreement errors (e.g. adjective-noun or determiner-noun disagreement), incorrect verb tense, incorrect word order (e.g. violating the V2 requirement), wrong word forms
+ 2. **Awkward phrasing**: Unnatural word order, stilted expressions, robotic language
+ 3. **Punctuation problems**: Missing or incorrect punctuation that affects readability
+ 4. **Word choice issues**: Inappropriate vocabulary, incorrect word usage, repetitive language
+ 5. **Flow disruptions**: Abrupt transitions, disconnected ideas within sentences
+ 6. **Spelling errors**: Typos and misspellings that affect readability
+ 7. **Translationese**: A common problem with language models is that their output is shaped by English, the majority language in their training data. This can result in unnatural language patterns that read like literal translations from English, such as: “stå opp for seg selv” (from "stand up for oneself"), “gjøre en forskjell” (from "make a difference"), “være for salg” (from "be for sale").
+
+ ## Annotation procedure
+
+ ### Step-by-step process
+
+ 1. **Read both responses completely** without making immediate judgments
+ 2. **Focus solely on language quality** and ignore content accuracy and relevance
+ 3. **Identify fluency issues** in each response using the criteria above
+ 4. **Compare the severity and frequency** of fluency issues between responses
+ 5. **Make your decision** based on overall fluency
+
+ ### Decision options
+
+ You must select one of three options:
+
+ - **A is more fluent**: Response A has better overall language quality than Response B
+ - **B is more fluent**: Response B has better overall language quality than Response A
+ - **Equal fluency**: Both responses have similar language quality (any differences are minor and do not clearly favor either response)
+
+ ### Important guidelines
+
+ - **Minor differences matter**: Even small differences in fluency should influence your decision when they clearly favor one response
+ - **Be consistent**: Apply the same standards across all evaluations
+ - **When in doubt about equality**: If you cannot decisively determine which response is better after careful analysis, select "Equal fluency"
+
+ ## Examples
+
+ ### Example 1: Clear fluency difference
+
+ TODO
+
+ ### Example 2: Equal fluency
+
+ TODO
+
+ ### Example 3: Subtle fluency difference
+
+ TODO
+
+ ### Example 4: Content vs. fluency
+
+ TODO
+
+ ## Edge cases and special considerations
+
+ TODO
+
+ **Technical or specialized language**: Technical terminology and domain-specific language should be considered fluent if used correctly and consistently, even if it might seem less natural to a general audience.
+
+ **Formatting issues**: Ignore formatting differences (bold, italics, bullet points) unless they directly impact readability or sentence structure.
+
+ **Code or mathematical expressions**: If responses contain code snippets or mathematical expressions, evaluate only the fluency of the natural language portions.