Libraries

How to use davanstrien/query-to-dataset-viewer-descriptions with sentence-transformers:

from sentence_transformers import SentenceTransformer

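# Load the model from the Hugging Face Hub. trust_remote_code=True permits
# custom code shipped in the model repository to run, so review it before enabling.
model = SentenceTransformer("davanstrien/query-to-dataset-viewer-descriptions", trust_remote_code=True)

# Example inputs: a user query followed by dataset-viewer previews
# (dataset name, features, and sample rows serialized as text).
sentences = [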
    "USER_QUERY: mental health conversation dataset",
    "NEGATIVE: DATASET_NAME: \"llmModeluser/Therapy_sessions_dataset\"\nFEATURES: {'conversations': {'dtype': 'string', '_type': 'Value'}, 'id': {'dtype': 'string', '_type': 'Value'}}\nDATA SAMPLE:\n[\n  {\n    \"row_idx\": 0,\n    \"row\": {\n      \"conversations\": \"[{'from': 'human', 'value': \\\"I've been feeling so sad and overwhelmed lately. Work has become such a massive source of stress for me.\\\"}\\n {'from': 'gpt', 'value': \\\"Hey there, I'm here to listen and support you. It sounds like work has been really challenging lately. Can you tell me more about what's been going on?\\\"}\\n {'from': 'human', 'value': \\\"I recently got a promotion at work, which I thought would be exciting. But the added responsibilities and pressure have just taken a toll on my mental health. It's been a really moving experience for me.\\\"}\\n {'from': 'gpt', 'value': \\\"I can understand how it can be overwhelming when we're faced with higher expectations. It's okay to acknowledge your emotions and allow yourself to feel sad in this situation. It's an important part of the healing process. What specific challenges have you been facing at work?\\\"}\\n {'from': 'human', 'value': \\\"Well, the workload has increased significantly, and I find it hard to maintain a work-life balance. I've been staying late at the office, and it feels like I'm constantly under a pile of never-ending tasks. It's just so hard to keep up, and it's impacting my overall well-being.\\\"}\\n {'from': 'gpt', 'value': \\\"It sounds like you're dealing with a lot of pressure to perform and succeed. Remember, it's crucial to take care of yourself, both mentally and physically. A healthy work-life balance is essential. Have you tried any strategies to cope with the added workload?\\\"}\\n {'from': 'human', 'value': \\\"I've been trying to prioritize my tasks and delegate whenever possible. I've also started practicing meditation during my breaks to help manage stress. 
But sometimes, it feels like no matter what I do, I can't catch a break. It's been a constant struggle.\\\"}\\n {'from': 'gpt', 'value': \\\"It's great to hear that you're already implementing some helpful strategies. Remember, progress takes time, and it's okay to have setbacks. In addition to what you're already doing, I encourage you to also communicate with your supervisor or team about your workload and discuss possible solutions together.\\\"}\\n {'from': 'human', 'value': \\\"You're right. I haven't really opened up about my struggles to my coworkers or supervisor. I guess I've been afraid of appearing weak or incapable. How can I approach this discussion without feeling vulnerable?\\\"}\\n {'from': 'gpt', 'value': \\\"It's completely normal to feel that way, but remember, asking for support is a strength, not a weakness. Start by scheduling a conversation with your supervisor or a trusted colleague in a private and comfortable setting. Be honest about your challenges and express your willingness to find solutions together. Remember, you're not alone in this.\\\"}\\n {'from': 'human', 'value': \\\"Thank you for your understanding and guidance. I appreciate the reminder that I don't have to face this alone. I'll gather my courage and initiate that conversation soon. I need to prioritize my well-being and find a healthier balance.\\\"}\\n {'from': 'gpt', 'value': \\\"You're very welcome! I'm here to support you every step of the way. Taking care of yourself should always be a priority. Remember to be kind to yourself and celebrate your progress, no matter how small it may seem. You've got this!\\\"}]\",\n      \"id\": \"identity_0\"\n    },\n    \"truncated_cells\": []\n  },\n  {\n    \"row_idx\": 1,\n    \"row\": {\n      \"conversations\": \"[{'from': 'human', 'value': \\\"Hi, I'm feeling really scared about something and I need some advice. It's about friendship.\\\"}\\n {'from': 'gpt', 'value': \\\"Hello there! I'm here to listen and offer support. 
Tell me more about what's going on with your friendship. What specifically is making you feel scared?\\\"}\\n {'from': 'human', 'value': \\\"Well, lately, I've been feeling like my friends don't value me as much as I value them. It's like I'm always the one reaching out and making plans, but they never initiate anything. It's making me doubt if our friendship is genuine.\\\"}\\n {'from': 'gpt', 'value': \\\"I understand how disheartening that can be. It's important to feel valued and appreciated in any relationship. Have you talked to your friends about how you're feeling?\\\"}\\n {'from': 'human', 'value': \\\"No, I haven't. I'm afraid that they'll think I'm being needy or that they'll get defensive. I don't want to ruin our friendship by bringing it up.\\\"}\\n {'from': 'gpt', 'value': \\\"It's normal to feel hesitant about having difficult conversations, but addressing your concerns is a crucial step towards building stronger connections. Before jumping to conclusions, it's important to communicate openly and honestly. You deserve to express your feelings.\\\"}\\n {'from': 'human', 'value': \\\"You're right, I should talk to them. But how do I approach the conversation without sounding accusatory or angry?\\\"}\\n {'from': 'gpt', 'value': 'That\\\\'s a great question! It can be helpful to use \\\"I\\\" statements to express how you feel, rather than placing blame on others. For example, you can say, \\\"I\\\\'ve been feeling a bit overlooked lately, and I wanted to talk about our friendship to ensure we\\\\'re on the same page.\\\"'}\\n {'from': 'human', 'value': \\\"I see, so it's important to frame it as a dialogue rather than confrontational. That makes sense.\\\"}\\n {'from': 'gpt', 'value': 'Absolutely! Opening up a conversation can lead to better understanding and growth. Remember, your feelings are valid, and a healthy friendship should provide a safe space to express them. 
If your friends truly care about you, they will listen and try to understand your perspective.'}\\n {'from': 'human', 'value': \\\"It's reassuring to hear that. I'll try to gather the courage to talk to them. But what if they don't respond well? What if they dismiss my concerns?\\\"}\\n {'from': 'gpt', 'value': \\\"It's natural to worry about how others might react, but remember, you cannot control their responses. What you can control is how you handle the situation. If they dismiss your concerns, it might be an indication that they are not as invested in the friendship as you would like. In that case, it's essential to focus on surrounding yourself with people who appreciate and support you.\\\"}\\n {'from': 'human', 'value': \\\"That's true. I shouldn't hold onto toxic relationships. It's just hard to let go sometimes, you know?\\\"}\\n {'from': 'gpt', 'value': \\\"Letting go can be challenging, especially when we have invested time and energy into a friendship. It's essential to prioritize your well-being and surround yourself with those who uplift you. Remember, you deserve friendships that nourish your self-esteem and bring positivity into your life.\\\"}\\n {'from': 'human', 'value': 'Thank you for reminding me of my worth. I appreciate your advice and support.'}\\n {'from': 'gpt', 'value': \\\"You're very welcome! Building healthy friendships is a journey, and you have taken the first step by recognizing the need for change. Remember to be gentle with yourself as you navigate through this process. You're doing great!\\\"}\\n {'from': 'human', 'value': 'I will, thank you. I feel much better after talking to you.'}\\n {'from': 'gpt', 'value': \\\"I'm glad to hear that. Remember, I'm here for you whenever you need someone to talk to. Don't hesitate to reach out. You've got this!\\\"}]\",\n      \"id\": \"identity_1\"\n    },\n    \"truncated_cells\": []\n  }\n]",
    "HUB_DATASET_PREVIEW: DATASET_NAME: \"emozilla/dolma-v1_7-arxiv\"\nFEATURES: {'text': {'dtype': 'string', '_type': 'Value'}, 'id': {'dtype': 'string', '_type': 'Value'}, 'metadata': {'file_path': {'dtype': 'string', '_type': 'Value'}}}\nDATA SAMPLE:\n[\n  {\n    \"row_idx\": 0,\n    \"row\": {\n      \"text\": \"\\\\section{Introduction}\\nLet $G$ be a simple undirected graph with the \\\\textit{vertex set} $V(G)$ and the \\\\textit{edge set} $E(G)$. A vertex with degree one is called a \\\\textit{pendant vertex}. The distance between the vertices $u$ and $v$ in graph $G$ is denoted by $d_G(u,v)$. A cycle $C$ is called \\\\textit{chordless} if $C$ has no \\\\textit{cycle chord} (that is an edge not in the edge set of $C$ whose endpoints lie on the vertices of $C$).\\nThe \\\\textit{Induced subgraph} on vertex set $S$ is denoted by $\\\\langle S\\\\rangle$. A path that starts in $v$ and ends in $u$ is denoted by $\\\\stackrel\\\\frown{v u}$.\\nA \\\\textit{traceable} graph is a graph that possesses a Hamiltonian path.\\nIn a graph $G$, we say that a cycle $C$ is \\\\textit{formed by the path} $Q$ if $ | E(C) \\\\setminus E(Q) | = 1 $. So every vertex of $C$ belongs to $V(Q)$.\\n\\nIn 2011 the following conjecture was proposed:\\n\\\\begin{conjecture}(Hoffmann-Ostenhof \\\\cite{hoffman})\\nLet $G$ be a connected cubic graph. Then $G$ has a decomposition into a spanning tree, a matching and a family of cycles.\\n\\n\\\\end{conjecture}\\nConjecture \\\\theconjecture$\\\\,$ also appears in Problem 516 \\\\cite{cameron}. There are a few partial results known for Conjecture \\\\theconjecture. Kostochka \\\\cite{kostocha} noticed that the Petersen graph, the prisms over cycles, and many other graphs have a decomposition desired in Conjecture \\\\theconjecture. Ozeki and Ye \\\\cite{ozeki} proved that the conjecture holds for 3-connected cubic plane graphs. 
Furthermore, it was proved by Bachstein \\\\cite{bachstein} that Conjecture \\\\theconjecture$\\\\,$ is true for every 3-connected cubic graph embedded in torus or Klein-bottle. Akbari, Jensen and Siggers \\\\cite[Theorem 9]{akbari} showed that Conjecture \\\\theconjecture$\\\\,$ is true for Hamiltonian cubic graphs.\\n\\nIn this paper, we show that Conjecture \\\\theconjecture$\\\\,$ holds for traceable cubic graphs.\\n\\\\section{Results}\\nBefore proving the main result, we need the following lemma.\\n\\\\begin{lemma}\\n\\\\label{lemma:1}\\nLet $G$ be a cubic graph. Suppose that $V(G)$ can be partitioned into a tree $T$ and finitely many cycles such that there is no edge between any pair of cycles (not necessarily distinct cycles), and every pendant vertex of $T$ is adjacent to at least one vertex of a cycle. Then, Conjecture \\\\theconjecture$\\\\,$ holds for $G$.\\n\\\\end{lemma}\\n\\\\begin{proof}\\nBy assumption, every vertex of each cycle in the partition is adjacent to exactly one vertex of $T$. Call the set of all edges with one endpoint in a cycle and another endpoint in $T$ by $Q$.\\nClearly, the induced subgraph on $E(T) \\\\cup Q$ is a spanning tree of $G$. We call it $T'$. Note that every edge between a pendant vertex of $T$ and the union of cycles in the partition is also contained in $T'$. Thus, every pendant vertex of $T'$ is contained in a cycle of the partition. Now, consider the graph $H = G \\\\setminus E(T')$. For every $v \\\\in V(T)$, $d_H(v) \\\\leq 1$. So Conjecture \\\\theconjecture$\\\\,$ holds for $G$. \\\\vspace{1em}\\n\\\\end{proof}\\n\\n\\n\\\\noindent\\\\textbf{Remark 1.}\\n\\\\label{remark:1}\\nLet $C$ be a cycle formed by the path $Q$. 
Then clearly there exists a chordless cycle formed by $Q$.\\n\\nNow, we are in a position to prove the main result.\\n\\n\\\\begin{theorem}\\nConjecture \\\\theconjecture$\\\\,$ holds for traceable cubic graphs.\\n\\\\end{theorem}\\n\\\\begin{proof}\\nLet $G$ be a traceable cubic graph and $P : v_1, \\\\dots, v_n$ be a Hamiltonian path in $G$. By \\\\cite[Theorem 9]{akbari}, Conjecture A holds for $v_1 v_n  \\\\in E(G)$. Thus we can assume that $v_1 v_n  \\\\notin E(G)$. Let $v_1 v_j, v_1 v_{j'}, v_i v_n, v_{i'} v_n \\\\in E(G)\\\\setminus E(P)$  and $j' < j < n$, $1 < i < i'$. Two cases can occur:\\n\\\\begin{enumerate}[leftmargin=0pt,label=]\\n\\\\item\\n\\\\textbf{Case 1.}\\nAssume that $i < j$. Consider the following graph in Figure \\\\ref{fig:overlapping} in which the thick edges denote the path $P$. Call the three paths between $v_j$ and $v_i$, from the left to the right, by $P_1$, $P_2$ and $P_3$, respectively (note that $P_1$ contains the edge $e'$ and $P_3$ contains the edge $e$).\\n\\n\\\\begin{figure}[H]\\n  \\\\begin{center}\\n    \\\\includegraphics[width=40mm]{engImages/overlapping.pdf}\\n    \\\\caption{Paths $P_1$, $P_2$ and $P_3$}\\n    \\\\label{fig:overlapping}\\n  \\\\end{center}\\n\\\\end{figure}\\n\\n\\nIf $P_2$ has order $2$, then $G$ is Hamiltonian and so by \\\\cite[Theorem 9]{akbari} Conjecture \\\\theconjecture$\\\\,$ holds. Thus we can assume that $P_1$, $P_2$ and $P_3$ have order at least $3$. Now, consider the following subcases:\\\\\\\\\\n\\n\\\\begin{enumerate}[leftmargin=0pt,label=]\\n\\\\label{case:1}\\n\\\\item \\\\textbf{Subcase 1.} There is no edge between $V(P_r)$ and $V(P_s)$ for $1 \\\\leq r < s \\\\leq 3$. 
Since every vertex of $P_i$ has degree 3 for every $i$, by \\\\hyperref[remark:1]{Remark 1}$\\\\,$ there are two chordless cycles $C_1$ and $C_2$ formed by $P_1$ and $P_2$, respectively.\\nDefine a tree $T$ with the edge set\\n$$ E\\\\Big(\\\\langle V(G) \\\\setminus \\\\big(V(C_1) \\\\cup V(C_2)\\\\big) \\\\rangle\\\\Big) \\\\bigcap \\\\big(\\\\bigcup_{i=1}^3 E(P_i)\\\\big).$$\\nNow, apply \\\\hyperref[lemma:1]{Lemma 1} $\\\\,$for the partition $\\\\{T, C_1, C_2\\\\}$.\\\\\\\\\\n\\n\\\\item \\\\textbf{Subcase 2.}\\n\\\\label{case:edge}\\nThere exists at least one edge between some $P_r$ and $P_s$, $r<s$. With no loss of generality, assume that $r=1$ and $s=2$. Suppose that $ab \\\\in E(G)$, where $a \\\\in V(P_1)$, $b \\\\in V(P_2)$ and $d_{P_1}(v_j, a) + d_{P_2}(v_j, b)$ is minimum.\\n\\n\\\\begin{figure}[H]\\n  \\\\begin{center}\\n    \\\\includegraphics[width=40mm]{engImages/ab.pdf}\\n    \\\\caption{The edge $ab$ between $P_1$ and $P_2$}\\n    \\\\label{fig:ab}\\n  \\\\end{center}\\n\\\\end{figure}\\n\\nThree cases occur: \\\\\\\\\\n\\n(a) There is no chordless cycle formed by either of the paths $\\\\stackrel\\\\frown{v_j a}$ or $\\\\stackrel\\\\frown{v_j b}$. Let $C$ be the chordless cycle $\\\\stackrel\\\\frown{v_j a}\\\\stackrel\\\\frown{ b v_j}$. Define $T$ with the edge set\\n$$ E\\\\Big(\\\\langle V(G) \\\\setminus V(C)\\\\rangle\\\\Big) \\\\bigcap \\\\big(\\\\bigcup_{i=1}^3 E(P_i)\\\\big).$$\\nNow, apply \\\\hyperref[lemma:1]{Lemma 1} $\\\\,$for the partition $\\\\{T,C\\\\}$.\\t\\\\\\\\\\n\\n(b) There are two chordless cycles, say $C_1$ and $C_2$, respectively formed by the paths $\\\\stackrel\\\\frown{v_j a}$ and $\\\\stackrel\\\\frown{v_j b}$. 
Now, consider the partition $C_1$, $C_2$ and the tree induced on the following edges,\\n$$E\\\\Big(\\\\langle V(G) \\\\setminus \\\\big(V(C_1) \\\\cup V(C_2)\\\\big) \\\\rangle\\\\Big) \\\\; \\\\bigcap \\\\; E\\\\Big(\\\\bigcup_{i=1}^3 P_i\\\\Big),$$\\nand apply \\\\hyperref[lemma:1]{Lemma 1}.\\\\\\\\\\n\\n(c) With no loss of generality, there exists a chordless cycle formed by the path $\\\\stackrel\\\\frown{v_j a}$ and there is no chordless cycle formed by the path $\\\\stackrel\\\\frown{v_j b}$.\\nFirst, suppose that for every chordless cycle $C_t$ on $\\\\stackrel\\\\frown{v_j a}$, at least one of the vertices of $C_t$ is adjacent to a vertex in $V(G) \\\\setminus V(P_1)$.\\nWe call one of the edges with one end in $C_t$ and other endpoint in $V(G) \\\\setminus V(P_1)$ by $e_t$. Let $v_j=w_0, w_1, \\\\dots, w_l=a$ be all vertices of the path $\\\\stackrel\\\\frown{v_j a}$ in $P_1$. Choose the shortest path $w_0 w_{i_1} w_{i_2} \\\\dots w_l$ such that $0 < i_1 < i_2 < \\\\dots < l$.\\nDefine a tree $T$ whose edge set is the thin edges in Figure \\\\ref{fig:deltaCycle}.\\\\\\\\\\nCall the cycle $w_0 w_{i_1} \\\\dots w_l \\\\stackrel\\\\frown{b w_0}$ by $C'$. Now, by removing $C'$, $q$ vertex disjoint paths $Q_1, \\\\dots, Q_q$ which are contained in $\\\\stackrel\\\\frown{v_j a}$ remain. Note that there exists a path of order $2$ in $C'$ which by adding this path to $Q_i$ we find a cycle $C_{t_i}$, for some $i$. Hence there exists an edge $e_{t_i}$ connecting $Q_i$ to $V(G) \\\\setminus V(P_1)$. 
Now, we define a tree $T$ whose the edge set is,\\n$$\\\\quad\\\\quad\\\\quad \\\\bigg( E\\\\Big(\\\\langle V(G) \\\\setminus V(C') \\\\rangle \\\\Big)\\\\; \\\\bigcap \\\\; \\\\Big(\\\\bigcup_{i=1}^3 E(P_i)\\\\Big) \\\\bigg) \\\\bigcup \\\\Big(\\\\big\\\\{e_{t_i} \\\\mid 1 \\\\leq i \\\\leq q \\\\big\\\\} \\\\Big).$$\\nApply \\\\hyperref[lemma:1]{Lemma 1} $\\\\,$for the partition $\\\\{T,C'\\\\}$.\\\\\\\\\\n\\n\\\\begin{figure}[H]\\n  \\\\begin{center}\\n    \\\\includegraphics[width=40mm]{engImages/deltaCycle.pdf}\\n    \\\\caption{The cycle $C'$ and the tree $T$}\\n       \\\\label{fig:deltaCycle}\\n  \\\\end{center}\\n\\\\end{figure}\\n\\nNext, assume that there exists a cycle $C_1$ formed by $\\\\stackrel\\\\frown{v_j a}$ such that none of the vertices of $C_1$ is adjacent to $V(G) \\\\setminus V(P_1)$. Choose the smallest cycle with this property. Obviously, this cycle is chordless. Now, three cases can be considered:\\\\\\\\\\n\\n\\\\begin{enumerate}[leftmargin=5pt,label=(\\\\roman*)]\\n\\\\item There exists a cycle $C_2$ formed by $P_2$ or $P_3$. Define the partition $C_1$, $C_2$ and a tree with the following edge set,\\n$$E\\\\Big(\\\\langle V(G) \\\\setminus \\\\big(V(C_1) \\\\cup V(C_2)\\\\big)\\\\rangle \\\\Big) \\\\bigcap \\\\Big( \\\\bigcup_{i=1}^3 E(P_i) \\\\Big),$$\\nand apply \\\\hyperref[lemma:1]{Lemma 1}.\\\\\\\\\\n\\n\\\\item There is no chordless cycle formed by $P_2$ and by $P_3$, and there is at least one edge between $V(P_2)$ and $V(P_3)$. Let $ab \\\\in E(G)$, $a \\\\in V(P_2)$ and $b \\\\in V(P_3)$ and moreover $d_{P_2}(v_j, a) + d_{P_3}(v_j,b)$ is minimum. Notice that the cycle $\\\\stackrel\\\\frown{v_j a} \\\\stackrel\\\\frown{b v_j}$ is chordless. Let us call this cycle by $C_2$. 
Now, define the partition $C_2$ and a tree with the following edge set,\\n$$E\\\\Big(\\\\langle V(G) \\\\setminus V(C_2)\\\\rangle \\\\Big) \\\\bigcap \\\\Big( \\\\bigcup_{i=1}^3 E(P_i) \\\\Big),$$\\nand apply \\\\hyperref[lemma:1]{Lemma 1}.\\\\\\\\\\n\\n\\\\item There is no chordless cycle formed by $P_2$ and by $P_3$, and there is no edge between $V(P_2)$ and $V(P_3)$. Let $C_2$ be the cycle consisting of two paths $P_2$ and $P_3$. Define the partition $C_2$ and a tree with the following edge set,\\n$$E\\\\Big(\\\\langle V(G) \\\\setminus V(C_2)\\\\rangle \\\\Big) \\\\bigcap \\\\Big( \\\\bigcup_{i=1}^3 E(P_i) \\\\Big),$$\\nand apply \\\\hyperref[lemma:1]{Lemma 1}.\\n\\n\\\\end{enumerate}\\n\\n\\n\\\\end{enumerate}\\n\\n\\\\vspace{5mm}\\n\\\\item\\n\\\\textbf{Case 2.}\\n\\\\label{case:2}\\nAssume that $j < i$ for all Hamiltonian paths. Among all Hamiltonian paths consider the  path such that $i'-j'$ is maximum. Now, three cases can be considered:\\\\\\\\\\n\\n\\\\begin{enumerate}[leftmargin=0pt,label=]\\n\\\\item \\\\textbf{Subcase 1.} There is no $s < j'$ and $t > i'$ such that $v_s v_t \\\\in E(G)$. By \\\\hyperref[remark:1]{Remark 1} $\\\\,$ there are two chordless cycles $C_1$ and $C_2$, respectively formed by the paths $v_1 v_{j'}$ and $v_{i'} v_n$. By assumption there is no edge $xy$, where $x \\\\in V(C_1)$ and $y \\\\in V(C_2)$.\\nDefine a tree $T$ with the edge set:\\n$$ E\\\\Big(\\\\langle V(G) \\\\setminus \\\\big(V(C_1) \\\\cup V(C_2)\\\\big) \\\\rangle \\\\Big) \\\\bigcap \\\\Big( E(P) \\\\cup \\\\{v_{i'}v_n, v_{j'}v_1\\\\} \\\\Big).$$\\nNow, apply \\\\hyperref[lemma:1]{Lemma 1} $\\\\,$for the partition $\\\\{T, C_1, C_2\\\\}$.\\\\\\\\\\n\\n\\\\item \\\\textbf{Subcase 2.}\\n\\\\label{subcase:22} There are at least four indices $s, s' < j$ and $t, t' > i$ such that $v_s v_t, v_{s'} v_{t'} \\\\in E(G)$. 
Choose four indices $g, h < j$ and $e, f > i$ such that $v_h v_e, v_g v_f \\\\in E(G)$ and $|g-h| + |e-f|$ is minimum.\\n\\n\\\\begin{figure}[H]\\n  \\\\begin{center}\\n    \\\\includegraphics[width=90mm]{engImages/case2-subcase2.pdf}\\n    \\\\caption{Two edges $v_h v_e$ and $v_g v_f$}\\n    \\\\label{fig:non-overlapping}\\n  \\\\end{center}\\n\\\\end{figure}\\n\\nThree cases can be considered:\\\\\\\\\\n\\n\\\\begin{enumerate}[leftmargin=0pt,label=(\\\\alph*)]\\n\\\\item There is no chordless cycle formed by $\\\\stackrel\\\\frown{v_g v_h}$ and by $\\\\stackrel\\\\frown{v_e v_f}$.\\n\\nConsider the cycle $\\\\stackrel\\\\frown{v_g v_h} \\\\stackrel\\\\frown{v_e v_f}v_g$ and call it $C$. Now, define a tree $T$ with the edge set,\\n$$\\\\,\\\\,\\\\,E\\\\Big(\\\\langle V(G) \\\\setminus V(C)\\\\rangle \\\\Big) \\\\bigcap \\\\Big( E(P) \\\\cup \\\\{v_1v_{j}, v_{i}v_n\\\\} \\\\Big),$$\\napply \\\\hyperref[lemma:1]{Lemma 1} $\\\\,$for the partition $\\\\{T, C\\\\}$.\\\\\\\\\\n\\n\\\\item With no loss of generality, there exists a chordless cycle formed by $\\\\stackrel\\\\frown{v_e v_f}$ and there is no chordless cycle formed by the path $\\\\stackrel\\\\frown{v_g v_h}$. First suppose that there is a chordless cycle $C_1$ formed by $\\\\stackrel\\\\frown{v_e v_f}$ such that there is no edge between $V(C_1)$ and $\\\\{v_1, \\\\dots, v_j\\\\}$. By \\\\hyperref[remark:1]{Remark 1} $,$ there exists a chordless cycle $C_2$ formed by $\\\\stackrel\\\\frown{v_1 v_j}$. By assumption there is no edge between $V(C_1)$ and $V(C_2)$. 
Now, define a tree $T$ with the edge set,\\n\\n$$\\\\quad\\\\quad\\\\quad\\\\quad E\\\\Big(\\\\langle V(G) \\\\setminus \\\\big(V(C_1) \\\\cup V(C_2)\\\\big)\\\\rangle \\\\Big) \\\\bigcap \\\\Big( E(P) \\\\cup \\\\{v_1v_{j}, v_{i}v_n\\\\} \\\\Big),$$\\n\\nand apply \\\\hyperref[lemma:1]{Lemma 1} $\\\\,$for the partition $\\\\{T, C_1, C_2\\\\}$.\\n\\n$\\\\;$ Next assume that for every cycle $C_r$ formed by $\\\\stackrel\\\\frown{v_e v_f}$, there are two vertices $x_r \\\\in V(C_r)$ and $y_r \\\\in \\\\{v_1, \\\\dots, v_j\\\\}$ such that $x_r y_r \\\\in E(G)$. Let $v_e=w_0, w_1, \\\\dots, w_l=v_f$ be all vertices of the path $\\\\stackrel\\\\frown{v_e v_f}$ in $P$. Choose the shortest path $w_0 w_{i_1} w_{i_2} \\\\dots w_l$ such that $0 < i_1 < i_2 < \\\\dots < l$. Consider the cycle $w_0 w_{i_1} \\\\dots w_l \\\\stackrel\\\\frown{v_g v_h}$ and call it $C$. Now, by removing $C$, $q$ vertex disjoint paths $Q_1, \\\\dots, Q_q$ which are contained in $\\\\stackrel\\\\frown{v_e v_f}$ remain. Note that there exists a path of order $2$ in $C$ which by adding this path to $Q_i$ we find a cycle $C_{r_i}$, for some $i$. Hence there exists an edge $x_{r_i} y_{r_i}$ connecting $Q_i$ to $V(G) \\\\setminus V(\\\\stackrel\\\\frown{v_e v_f})$. 
We define a tree $T$ whose edge set is the edges,\\n$$\\\\quad\\\\quad\\\\quad\\\\quad\\\\quad\\\\quad E\\\\Big(\\\\langle V(G) \\\\setminus V(C)\\\\rangle \\\\Big) \\\\bigcap \\\\Big( E(P) \\\\cup \\\\{v_1v_{j}, v_{i}v_n\\\\} \\\\cup \\\\big\\\\{x_{r_i} y_{r_i} \\\\mid 1 \\\\leq i \\\\leq q\\\\big\\\\} \\\\Big),$$\\nthen apply \\\\hyperref[lemma:1]{Lemma 1} $\\\\,$ on the partition $\\\\{T, C\\\\}$.\\\\\\\\\\n\\\\begin{figure}[H]\\n  \\\\begin{center}\\n    \\\\includegraphics[width=90mm]{engImages/deltaNonOverlapping.pdf}\\n    \\\\caption{The tree $T$ and the shortest path $w_0 w_{i_1}\\\\dots w_l$}\\n    \\\\label{fig:delta-non-overlapping}\\n  \\\\end{center}\\n\\\\end{figure}\\n\\n\\\\item There are at least two chordless cycles, say $C_1$ and $C_2$ formed by the paths $\\\\stackrel\\\\frown{v_g v_h}$ and $\\\\stackrel\\\\frown{v_e v_f}$, respectively. Since $|g-h| + |e-f|$ is minimum, there is no edge $xy \\\\in E(G)$ with $x \\\\in V(C_1)$ and $y \\\\in V(C_2)$. Now, define a tree $T$ with the edge set,\\n$$\\\\quad\\\\quad\\\\quad\\\\quad E\\\\Big( \\\\langle V(G) \\\\setminus \\\\big(V(C_1) \\\\cup V(C_2)\\\\big) \\\\rangle \\\\Big) \\\\bigcap \\\\Big( E(P) \\\\cup \\\\{v_1 v_{j}, v_{i}v_n\\\\} \\\\Big),$$\\nand apply \\\\hyperref[lemma:1]{Lemma 1} $\\\\,$for the partition $\\\\{T, C_1, C_2\\\\}$.\\\\\\\\\\n\\\\end{enumerate}\\n\\n\\\\item \\\\textbf{Subcase 3.} There exist exactly two indices $s,t$,  $s < j' < i' < t$ such that $v_s v_t \\\\in E(G)$ and there are no two other indices $s', t'$ such that $s' < j < i < t'$ and $v_{s'} v_{t'} \\\\in E(G)$. We can assume that there is no cycle formed by  $\\\\stackrel\\\\frown{v_{s+1} v_j}$ or $\\\\stackrel\\\\frown{v_i v_{t-1}}$, to see this by symmetry consider a cycle $C$ formed by $\\\\stackrel\\\\frown{v_{s+1} v_j}$. By \\\\hyperref[remark:1]{Remark 1} $\\\\,$ there exist chordless cycles $C_1$ formed by $\\\\stackrel\\\\frown{v_{s+1} v_j}$ and $C_2$ formed by $\\\\stackrel\\\\frown{v_{i} v_n}$. 
By assumption $v_s v_t$ is the only edge such that $s < j$ and $t > i \\\\;$. Therefore,  there is no edge between $V(C_1)$ and  $V(C_2)$. Now, let $T$ be a tree defined by the edge set,\\n$$ E\\\\Big(\\\\langle V(G) \\\\setminus \\\\big(V(C_1) \\\\cup V(C_2)\\\\big)\\\\rangle \\\\Big) \\\\bigcap \\\\Big( E(P) \\\\cup \\\\{v_1v_{j}, v_{i}v_n\\\\} \\\\Big),$$\\nand apply \\\\hyperref[lemma:1]{Lemma 1} $\\\\,$for the partition \\\\{$T$, $C_1$, $C_2$\\\\}.\\\\\\\\\\n\\n$\\\\quad$Furthermore, we can also assume that either $s \\\\neq j'-1$  or $t \\\\neq i'+1$, otherwise we have the Hamiltonian cycle $\\\\stackrel\\\\frown{v_1 v_s} \\\\stackrel\\\\frown{v_t v_n} \\\\stackrel\\\\frown{v_{i'} v_{j'}} v_1$ and by \\\\cite[Theorem 9]{akbari} Conjecture \\\\theconjecture$\\\\,$ holds.\\n\\n$\\\\quad$By symmetry, suppose that $s \\\\neq j'-1$. Let $v_k$ be the vertex adjacent to $v_{j'-1}$, and $k \\\\notin \\\\{j'-2, j'\\\\}$. It can be shown that $k > j'-1$, since otherwise by considering the Hamiltonian path $P': \\\\; \\\\stackrel\\\\frown{ v_{k+1} v_{j'-1}}\\\\stackrel\\\\frown{v_k v_1} \\\\stackrel\\\\frown{v_{j'} v_n}$,  the new $i'-j'$ is greater than the old one and this contradicts our assumption about $P$ in the \\\\hyperref[case:2]{Case 2}.\\n\\n$\\\\quad$We know that $j' < k < i$. Moreover, the fact that  $\\\\stackrel\\\\frown{v_{s+1} v_j}$ does not form a cycle contradicts the case that $j' < k \\\\le j$. So $j < k < i$. Consider two cycles $C_1$ and $C_2$, respectively with the vertices $v_1 \\\\stackrel\\\\frown{v_{j'} v_{j}} v_1$ and $v_n \\\\stackrel\\\\frown{v_{i'} v_{i}} v_n$. The cycles $C_1$ and $C_2$ are chordless, otherwise there exist cycles formed by the paths $\\\\stackrel\\\\frown{v_{s+1} v_j}$ or $\\\\stackrel\\\\frown{v_i v_{t-1}}$. 
Now, define a tree $T$ with the edge set\\n$$ E\\\\Big(\\\\langle V(G) \\\\setminus \\\\big(V(C_1) \\\\cup V(C_2)\\\\big)\\\\rangle \\\\Big) \\\\bigcap \\\\Big( E(P) \\\\cup \\\\{v_s v_t, v_k v_{j'-1}\\\\} \\\\Big),$$\\nand apply \\\\hyperref[lemma:1]{Lemma 1} $\\\\,$for the partition \\\\{$T$, $C_1$, $C_2$\\\\}.\\n\\\\end{enumerate}\\n\\\\end{enumerate}\\n\\\\end{proof}\\n\\n\\\\noindent\\\\textbf{Remark 2.}\\n\\\\label{remark:2}\\nIndeed, in the proof of the previous theorem we showed a stronger result, that is, for every traceable cubic graph there is a decomposition with at most two cycles.\\n\\n\",\n      \"id\": \"b7c40b41b7eedaa408f87d154284a1aba126589c\",\n      \"metadata\": {\n        \"file_path\": \"/home/ubuntu/dolma-v1_7/arxiv-0000.json.gz\"\n      }\n    },\n    \"truncated_cells\": []\n  },\n  {\n    \"row_idx\": 1,\n    \"row\": {\n      \"text\": \"\\\\section{Principle of nano strain-amplifier}\\r\\n\\r\\n\\\\begin{figure*}[t!]\\r\\n\\t\\\\centering\\r\\n\\t\\\\includegraphics[width=5.4in]{Fig1}\\r\\n\\t\\t\\\\vspace{-0.5em}\\r\\n\\t\\\\caption{Schematic sketches of nanowire strain sensors. (a)(b) Conventional non-released and released NW structure; \\r\\n\\t\\t(c)(d) The proposed nano strain-amplifier and its simplified physical model.}\\r\\n\\t\\\\label{fig:fig1}\\r\\n\\t\\t\\\\vspace{-1em}\\r\\n\\\\end{figure*}\\r\\nFigure \\\\ref{fig:fig1}(a) and 1(b) show the concept of the conventional structures of piezoresistive sensors. The piezoresistive elements are either released from, or kept on, the substrate. 
The sensitivity ($S$) of the sensors is defined based on the ratio of the relative resistance change ($\\\\Delta R/R$) of the sensing element and the strain applied to the substrate ($\\\\varepsilon_{sub}$):\\r\\n\\\\begin{equation}\\r\\nS = (\\\\Delta R/R)/\\\\varepsilon_{sub}\\r\\n\\\\label{eq:sensitivity}\\r\\n\\\\end{equation}\\r\\nIn addition, the relative resistance change $\\\\Delta R/R$ can be calculated from the gauge factor ($GF$) of the material used to make the piezoresistive elements: $\\\\Delta R/R = GF \\\\varepsilon_{ind}$, where $\\\\varepsilon_{ind}$ is the strain induced into the piezoresistor. In most of the conventional strain gauges as shown in Fig. \\\\ref{fig:fig1} (a,b), the thickness of the sensing layer is typically below a few hundred nanometers, which is much smaller than that of the substrate. Therefore, the strain induced into the piezoresistive elements is approximately the same as that of the substrate ($\\\\varepsilon_{ind} \\\\approx \\\\varepsilon_{sub}$). Consequently, to improve the sensitivity of strain sensors (e.g. enlarging $\\\\Delta R/R$), electrical approaches which can enlarge the gauge factor ($GF$) are required. Nevertheless, as aforementioned, the existence of the large gauge factor in nanowires due to quantum confinement or surface state, is still considered as controversial. \\n\\r\\nIt is also evident from Eq. \\\\ref{eq:sensitivity} that the sensitivity of strain sensors can also be improved using a mechanical approach, which enlarges the strain induced into the piezoresistive element. Figure \\\\ref{fig:fig1}(c) shows our proposed nano strain-amplifier structure, in which the piezoresistive nanowires are locally fabricated at the centre of a released bridge. The key idea of this structure is that, under a certain strain applied to the substrate, a large strain will be concentrated at the locally fabricated SiC nanowires. 
The working principle of the nano strain-amplifier is similar to that of the well-known dogbone structure, which is widely used to characterize the tensile strength of materials \\\\cite{dogbone1,dogbone2}. That is, when a stress is applied to the dogbone-shape of a certain material, a crack, if generated, will occur at the middle part of the dogbone. The large strain concentrated at the narrow area located at the centre part with respect to the wider areas located at outer region, causes the crack. Qualitative and quantitative explanations of the nano strain-amplifier are presented as follows.  \\r\\n\\r\\nFor the sake of simplicity, the released micro frame and nanowire (single wire or array) of the nano strain-amplifier can be considered as solid springs, Fig. \\\\ref{fig:fig1}(d). The stiffness of these springs are proportional to their width ($w$) and inversely proportional to their length (l): $K \\\\propto w/l$. Consequently, the model of the released nanowire and micro frames can be simplified as a series of springs, where the springs with higher stiffness correspond to the micro frame, and the single spring with lower stiffness corresponds to the nanowire. It is well-known in classical physics that, for serially connected springs, a larger strain will be concentrated in the low--stiffness string, while a smaller strain will be induced in the high--stiffness string \\\\cite{Springbook}. The following analysis quantitatively explained the amplification of the strain.\\t\\r\\n\\r\\n\\\\begin{figure}[b!]\\r\\n\\t\\\\centering\\r\\n\\t\\\\includegraphics[width=3in]{Fig2}\\r\\n\\t\\\\vspace{-1em}\\r\\n\\t\\\\caption{Finite element analysis of the strain induced in to the nanowire array utilizing nano strain-amplifier.}\\r\\n\\t\\\\label{fig:fig2}\\r\\n\\\\end{figure}\\r\\nWhen a tensile mechanical strain ($\\\\varepsilon_{sub}$) is applied to the substrate, the released structure will also be elongated. 
Since the stiffness of the released frame is much smaller than that of the substrate, it is safe to assume that the released structure will follows the elongation of the substrate. The displacement of the released structure $\\\\Delta L$ is:\\r\\n\\\\begin{equation}\\r\\n\\\\Delta L = \\\\Delta L_m + \\\\Delta L_n = L_m \\\\varepsilon_m + L_n \\\\varepsilon_n\\r\\n\\\\label{eq:displacement}\\r\\n\\\\end{equation} \\r\\nwhere $L_m$, $L_n$ are the length; $\\\\Delta L_m$, $\\\\Delta L_n$ are the displacement; and $\\\\varepsilon_m$, $\\\\varepsilon_n$ are the strains induced into the micro spring and nano spring, respectively. The subscripts m and n stand for the micro frames and nanowires, respectively. Furthermore, due to the equilibrium of the stressing force ($F$) along the series of springs, the following relationship is established: $F= K_m\\\\Delta L_m = K_n \\\\Delta L_n$, where $K_m$, $K_n$ are the stiffness of the released micro frames and nanowires, respectively. Consequently the relationship between the displacement of the micro frame (higher stiffness) and nanowires (lower stiffness) is:\\r\\n\\\\begin{equation}\\r\\n\\\\frac{\\\\Delta L_m}{\\\\Delta L_n}=\\\\frac{K_n}{K_m}=\\\\frac{L_mw_n}{L_nw_m}\\r\\n\\\\label{eq:euili}\\r\\n\\\\end{equation}\\r\\nSubstituting Eqn. \\\\ref{eq:euili} into Eqn. \\\\ref{eq:displacement}, the strain induced into the locally fabricated nanowires is:\\r\\n\\\\begin{equation}\\r\\n\\\\varepsilon_n = \\\\frac{\\\\Delta L_n}{L_n} = \\\\frac{1}{1-\\\\frac{w_m-w_n}{w_m}\\\\frac{L_m}{L}}\\\\varepsilon_{sub}\\r\\n\\\\label{eq:strainamp}\\r\\n\\\\end{equation} \\r\\n\\r\\nEquation \\\\ref{eq:strainamp} indicates that increasing the ratio of $w_m/w_n$ and $L_m/L_n$ significantly amplifies the strain induced into the nanowire from the strain applied to the substrate. 
This model is also applicable to the case of nanowire arrays, in which $w_n$ is the total width of all nanowires in the array.\\n\\r\\nThe theoretical model is then verified using the finite element analysis (FEA). In the FEA simulation, we compare the strain induced into (i) non released nanowires, (ii) the conventionally released nanowires, and (iii) our nano strain-amplifier structure, using COMSOL Multiphysics \\\\texttrademark. In our nano strain amplifying structure, the width of the released frame was set to be 8 $\\\\mu$m, while the width of each nanowire in the array (3 wires) was set to be 370 nm. The nanowires array structure was selected as it can enhance the electrical conductance of the SiC nanowires resistor which makes the subsequent experimental demonstration easier. The ratio between the length of nanowires and micro bridge was set to be 1: 20. With this geometrical dimensions, strain induced into nanowires array $\\\\varepsilon_n$ was numerically calculated to be approximately 6 times larger than $\\\\varepsilon_{sub}$, Eqn. \\\\ref{eq:strainamp}. The simulation results show that for all structure, the elongation of non-released and released nanowires follow that of the substrate. In addition, strain was almost completely transferred into conventional released and non-released structures. Furthermore, the ratio of the strain induced in to the locally fabricated nanowires was estimated to be 5.9 times larger than that of the substrate, Fig. \\\\ref{fig:fig2}. These results are in solid agreement with the theoretical analysis presented above. For a nanowire array with an average width of 470 nm, the amplified gain of strain was found to be 4.5.   \\t\\r\\n\\r\\nBased on the theoretical analysis, we conducted the following experiments to demonstrate the high sensitivity of SiC nanowire strain sensors using the nano strain-amplifier. 
A thin 3C-SiC film with its thickness of 300 nm was epitaxially grown on a 150 mm diameter Si wafer using low pressure chemical vapour deposition \\\\cite{SiC_growth}. The film was \\\\emph{in situ} doped using Al dopants. The carrier concentration of the p-type 3C-SiC was found to be $5 \\\\times 10^{18}$ cm$^{-3}$, using a hot probe technique \\\\cite{philip}. The details of the characteristics of the grown film can be found elsewhere \\\\cite{Phan_JMC}. Subsequently, I-shape p-type SiC resistors with aluminum electrodes deposited on the surface were patterned using inductive coupled plasma (ICP) etching. As the piezoresistance of p-type 3C-SiC depends on crystallographic orientation, all SiC resistors of the present work were aligned along [110] direction to maximize the piezoresistive effect. Next, the micro scale SiC resistors were then released from the Si substrate using dry etching (XeF$_2$). Finally, SiC nanowire arrays were formed at the centre of the released bridge using focused ion beam (FIB). Two types of nanowire array were fabricated with three nanowires for each array. The average width of each nanowire in each type were 380 nm and 470 nm, respectively. Figure \\\\ref{fig:fig3} shows the SEM images of the fabricated samples, including the conventional released structure, non-released nanowires, and the nano strain-amplifier.  \\r\\n\\r\\n\\\\begin{figure}[t!]\\r\\n\\t\\\\centering\\r\\n\\t\\\\includegraphics[width=3in]{Fig3}\\r\\n\\t\\\\caption{SEM image of SiC strain sensors. 
(a) Released SiC micro bridge used for the subsequent fabrication of the nano strain-amplifier; (b) SEM of a micro SiC resistor where the SiC nanowires array were formed using FIB; (c) SEM of non-released SiC nanowires; (d) SEM of locally fabricated SiC nanowires released from the Si substrate (nano strain-amplifier).}\\r\\n\\t\\\\label{fig:fig3}\\r\\n\\t\\\\vspace{-1em}\\r\\n\\\\end{figure}\\r\\nThe current voltage (I-V) curves of all fabricated samples were characterized using a HP 4145 \\\\texttrademark ~parameter analyzer. The linear relationship between the applied voltage and measured current, indicated that Al made a good Ohmic contact with the highly doped SiC resistance, Fig. \\\\ref{fig:IV}. Additionally, the electrical conductivity of both nanowires and micro frame estimated from the I-V curve and the dimensions of the resistors shows almost the same value. This indicated that the FIB process did not cause a significant surface damage to the fabricated nanowires.    \\r\\n\\t\\r\\n\\\\begin{figure}[b!]\\r\\n\\t\\\\centering\\r\\n\\t\\\\includegraphics[width=3in]{Fig4}\\r\\n\\t\\t\\\\vspace{-1.5em}\\r\\n\\t\\\\caption{Current voltage curves of the fabricated SiC resistors.}\\r\\n\\t\\\\label{fig:IV}\\r\\n\\n\\\\end{figure}\\r\\n\\r\\nThe bending experiment was used to characterize the piezoresistive effect in micro size SiC resistors and locally fabricated SiC nanowire array. In this experiment one end of the Si cantilever (with a thickness of 625 $\\\\mu$m, and a width of 7 mm) was fixed while the other end was deflected by applying different forces. The distance from the fabricated nanowires to the free end of the Si cantilever was approximately 45 mm. The strain induced into the Si substrate is $\\\\varepsilon_\\\\text{sub} = Mt/2EI$, where $M$ is the applied bending moment; and $t$, $E$ and $I$ are the thickness, Young's modulus and the moment of inertia of the Si cantilever, respectively. 
The response of the SiC resistance to applied strain was then measured using a multimeter (Agilent \\\\texttrademark 34401 A).\\n\\r\\n\\\\begin{figure}[h!]\\r\\n\\t\\\\centering\\r\\n\\t\\\\includegraphics[width=3in]{Fig5.eps}\\r\\n\\t\\t\\\\vspace{-1.5em}\\r\\n\\t\\\\caption{Experimental results. (a) A comparision between the relative resistance change in the nano strain-amplifiers, non released nanowires and released micro frames; (b) The repeatability of the SiC nanowires strain sensors utilizing the proposed structure.}\\r\\n\\t\\\\label{fig:DRR}\\r\\n\\t\\t\\t\\\\vspace{-1em}\\r\\n\\\\end{figure}\\t\\r\\nThe relative resistance change ($\\\\Delta R/R$) of the micro and nano SiC resistors was plotted against the strain induced into the Si substrate $\\\\varepsilon_{sub}$, Fig. \\\\ref{fig:DRR}(a). For all fabricated samples, the relative resistance change shows a good linear relationship with the applied strain ($\\\\varepsilon_{sub}$). In addition, with the same applied strain to the Si substrate, the resistance change of the SiC nanowires using the nano strain-amplifier was much larger than that of the the SiC micro resistor and the conventional non-released SiC nanowires. In addition, reducing the width of the SiC nanowires also resulted in the increase of the sensitivity. The magnitude of the piezoresistive effect in the nano strain-amplifier as well as conventional structures were then quantitatively evaluated based on the effective gauge factor ($GF_{eff}$), which is defined as the ratio of the relative resistance change to the applied strain to the substrate: $GF_{eff} = (\\\\Delta R/R)/\\\\varepsilon_{sub}$. Accordingly, the effective gauge factor of the released micro SiC was found to be 28, while that of the non-released SiC nanowires was 35. From the data shown in Fig. \\\\ref{fig:DRR}, the effective gauge factor of the 380 nm and 470 nm SiC nanowires in the nano strain-amplifier were calculated as 150 and 124, respectively. 
Thus for nanowire arrays with average widths of 380 nm and 470 nm, the sensitivity of the nano strain-amplifier was 5.4 times and 4.6 times larger than the bulk SiC, respectively. These results were consistent with analytical and numerical models presented above. The relative resistance change of the nano strain-amplifier also showed excellent linearity with the applied strain, with a linear regression of above 99\\\\%. \\r\\n\\r\\nThe resistance change of the nano strain-amplifier can also be converted into voltage signals using a Wheatstone bridge, Fig. \\\\ref{fig:DRR}(b). The output voltage of the nano strain-amplifier increases with increasing tensile strains from 0 ppm to 180 ppm, and returned to the initial value when the strain was completely removed, confirming a good repeatability after several strain induced cycles. The linearity of the relative resistance change, and the repeatability indicate that the proposed structure is promising for strain sensing applications.\\r\\n    \\r\\nIn conclusion, this work presents a novel mechanical approach to obtain highly sensitive piezoresistance in nanowires based on a nano strain-amplifier. The key factor of the nano strain-amplifier lies on nanowires locally fabricated on a released micro structure. Experimental studies were conducted on SiC nanowires, confirming that by utilizing our nano strain-amplifier, the sensitivity of SiC nanowires was 5.4 times larger than that of conventional structures. This result indicated that the nano strain-amplifier is an excellent platform for ultra sensitive strain sensing applications.    \\r\\n\\r\\n\\r\\n\",\n      \"id\": \"1b77ae9f541b19668cc96624c7ec0f83945284e2\",\n      \"metadata\": {\n        \"file_path\": \"/home/ubuntu/dolma-v1_7/arxiv-0000.json.gz\"\n      }\n    },\n    \"truncated_cells\": []\n  }\n]",
    "HUB_DATASET_PREVIEW: DATASET_NAME: \"KelvinTichana2/mentalhealthcurated\"\nFEATURES: {'Human': {'dtype': 'string', '_type': 'Value'}, 'Assistant': {'dtype': 'string', '_type': 'Value'}}\nDATA SAMPLE:\n[\n  {\n    \"row_idx\": 0,\n    \"row\": {\n      \"Human\": \"hello, hey, hi, good day, greetings, what's up?, how is it going\",\n      \"Assistant\": \"Hello! How are you today!, Hey! What's up, Hey, How are you feeling today\"\n    },\n    \"truncated_cells\": []\n  },\n  {\n    \"row_idx\": 1,\n    \"row\": {\n      \"Human\": \"cya, see you later, goodbye, Have a good day, bye, I am leaving\",\n      \"Assistant\": \"Talk to you later!, Bye!, Goodbye!\"\n    },\n    \"truncated_cells\": []\n  }\n]"
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
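The input strings in the snippet above follow a visible pattern: queries carry a `USER_QUERY: ` prefix, and dataset previews carry a `HUB_DATASET_PREVIEW: ` prefix followed by the dataset name, features, and a data sample. The sketch below shows one way such strings might be assembled. The helper names `format_query` and `format_preview` are hypothetical, and the exact layout (Python repr for the features, JSON with a two-space indent for the rows) is inferred from the examples on this card rather than taken from the training code.

```python
import json


def format_query(text: str) -> str:
    """Prefix a free-text search query the way the examples on this card do."""
    return f"USER_QUERY: {text}"


def format_preview(dataset_name: str, features: dict, rows: list) -> str:
    """Render a dataset preview as a single string.

    Assumption: FEATURES is the Python repr of the features dict and
    DATA SAMPLE is the rows serialised as JSON with a two-space indent,
    which is what the sample strings on this card look like.
    """
    return (
        f'HUB_DATASET_PREVIEW: DATASET_NAME: "{dataset_name}"\n'
        f"FEATURES: {features}\n"
        f"DATA SAMPLE:\n{json.dumps(rows, indent=2)}"
    )


# Toy inputs modelled on the shortest preview shown above.
features = {
    "Human": {"dtype": "string", "_type": "Value"},
    "Assistant": {"dtype": "string", "_type": "Value"},
}
rows = [
    {
        "row_idx": 0,
        "row": {"Human": "hello, hey, hi", "Assistant": "Hello! How are you today!"},
        "truncated_cells": [],
    }
]

query = format_query("mental health conversation dataset")
preview = format_preview("KelvinTichana2/mentalhealthcurated", features, rows)
print(query)
print(preview)
```

The resulting strings could then be passed to `model.encode` in place of the hand-written literals.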


Alibaba-NLP/gte-base-en-v1.5 trained on query-to-dataset-viewer-descriptions

This is a sentence-transformers model finetuned from Alibaba-NLP/gte-base-en-v1.5 on the query-to-dataset-viewer-descriptions dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
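As a rough illustration of the semantic search use case, the sketch below ranks candidate datasets against a query by cosine similarity. The three-dimensional vectors and the `dataset_a`/`dataset_b`/`dataset_c` names are made-up stand-ins for the 768-dimensional embeddings the model would produce; only the ranking logic is the point.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumption): real embeddings would come from model.encode(...).
query_vec = [1.0, 0.0, 1.0]
candidates = {
    "dataset_a": [0.9, 0.1, 0.8],    # points in nearly the same direction
    "dataset_b": [0.0, 1.0, 0.0],    # orthogonal to the query
    "dataset_c": [-1.0, 0.0, -1.0],  # points in the opposite direction
}

ranking = rank(query_vec, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With real embeddings, `model.similarity` performs the same comparison in one call.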

Model Details

Model Description

Model Type: Sentence Transformer
Base model: Alibaba-NLP/gte-base-en-v1.5
Maximum Sequence Length: 8192 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- query-to-dataset-viewer-descriptions
Language: en
License: apache-2.0

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NewModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
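The pooling configuration above sets `pooling_mode_cls_token` to `True` and every other mode to `False`, so the sentence embedding is taken from the first token position instead of being averaged over all tokens. A minimal sketch of the difference, using a made-up 3-token, 2-dimensional output in place of the real 768-dimensional token vectors:

```python
def cls_pool(token_vectors):
    """Use the vector at the first ([CLS]) position as the sentence embedding."""
    return list(token_vectors[0])


def mean_pool(token_vectors):
    """Average the vectors over all token positions (shown for contrast only)."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]


# Toy transformer output (assumption): one row per token position.
tokens = [
    [1.0, 2.0],  # first ([CLS]) position
    [3.0, 4.0],
    [5.0, 6.0],
]

cls_embedding = cls_pool(tokens)
mean_embedding = mean_pool(tokens)
print(cls_embedding)
print(mean_embedding)
```

Only the CLS variant corresponds to this model's configuration; mean pooling is included to show what the disabled `pooling_mode_mean_tokens` option would do.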

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("davanstrien/query-to-dataset-viewer-descriptions", trust_remote_code=True)
# Run inference
sentences = [
    'USER_QUERY: kotlin code dataset',
    'HUB_DATASET_PREVIEW: DATASET_NAME: "mvasiliniuc/iva-kotlin-codeint"\nFEATURES: {\'repo_name\': {\'dtype\': \'string\', \'_type\': \'Value\'}, \'path\': {\'dtype\': \'string\', \'_type\': \'Value\'}, \'copies\': {\'dtype\': \'string\', \'_type\': \'Value\'}, \'size\': {\'dtype\': \'string\', \'_type\': \'Value\'}, \'content\': {\'dtype\': \'string\', \'_type\': \'Value\'}, \'license\': {\'dtype\': \'string\', \'_type\': \'Value\'}}\nDATA SAMPLE:\n[\n  {\n    "row_idx": 0,\n    "row": {\n      "repo_name": "Cognifide/gradle-aem-plugin",\n      "path": "src/main/kotlin/com/cognifide/gradle/aem/instance/tasks/InstanceReload.kt",\n      "copies": "1",\n      "size": "1052",\n      "content": "package com.cognifide.gradle.aem.instance.tasks\\n\\nimport com.cognifide.gradle.aem.common.instance.action.AwaitUpAction\\nimport com.cognifide.gradle.aem.common.instance.action.ReloadAction\\nimport com.cognifide.gradle.aem.common.instance.names\\nimport com.cognifide.gradle.aem.common.tasks.Instance\\nimport org.gradle.api.tasks.TaskAction\\n\\nopen class InstanceReload : Instance() {\\n\\n    private var reloadOptions: ReloadAction.() -> Unit = {}\\n\\n    fun reload(options: ReloadAction.() -> Unit) {\\n        this.reloadOptions = options\\n    }\\n\\n    private var awaitUpOptions: AwaitUpAction.() -> Unit = {}\\n\\n    fun awaitUp(options: AwaitUpAction.() -> Unit) {\\n        this.awaitUpOptions = options\\n    }\\n\\n    @TaskAction\\n    fun reload() {\\n        instanceManager.awaitReloaded(anyInstances, reloadOptions, awaitUpOptions)\\n        common.notifier.lifecycle(\\"Instance(s) reloaded\\", \\"Which: ${anyInstances.names}\\")\\n    }\\n\\n    init {\\n        description = \\"Reloads all AEM instance(s).\\"\\n    }\\n\\n    companion object {\\n        const val NAME = \\"instanceReload\\"\\n    }\\n}\\n",\n      "license": "apache-2.0"\n    },\n    "truncated_cells": []\n  },\n  {\n    "row_idx": 1,\n    "row": {\n      "repo_name": "80998062/Fank",\n      
"path": "presentation/src/main/java/com/sinyuk/fanfou/ui/status/StatusView.kt",\n      "copies": "1",\n      "size": "8490",\n      "content": "/*\\n *\\n *  * Apache License\\n *  *\\n *  * Copyright [2017] Sinyuk\\n *  *\\n *  * Licensed under the Apache License, Version 2.0 (the \\"License\\");\\n *  * you may not use this file except in compliance with the License.\\n *  * You may obtain a copy of the License at\\n *  *\\n *  *     http://www.apache.org/licenses/LICENSE-2.0\\n *  *\\n *  * Unless required by applicable law or agreed to in writing, software\\n *  * distributed under the License is distributed on an \\"AS IS\\" BASIS,\\n *  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\\n *  * See the License for the specific language governing permissions and\\n *  * limitations under the License.\\n *\\n */\\n\\npackage com.sinyuk.fanfou.ui.status\\n\\nimport android.os.Build\\nimport android.os.Bundle\\nimport android.support.v4.app.Fragment\\nimport android.support.v4.app.FragmentPagerAdapter\\nimport android.text.Editable\\nimport android.text.TextWatcher\\nimport android.view.View\\nimport android.view.ViewTreeObserver\\nimport cn.dreamtobe.kpswitch.util.KeyboardUtil\\nimport com.linkedin.android.spyglass.suggestions.SuggestionsResult\\nimport com.linkedin.android.spyglass.suggestions.interfaces.Suggestible\\nimport com.linkedin.android.spyglass.suggestions.interfaces.SuggestionsResultListener\\nimport com.linkedin.android.spyglass.suggestions.interfaces.SuggestionsVisibilityManager\\nimport com.linkedin.android.spyglass.tokenization.QueryToken\\nimport com.linkedin.android.spyglass.tokenization.impl.WordTokenizer\\nimport com.linkedin.android.spyglass.tokenization.impl.WordTokenizerConfig\\nimport com.linkedin.android.spyglass.tokenization.interfaces.QueryTokenReceiver\\nimport com.sinyuk.fanfou.R\\nimport com.sinyuk.fanfou.base.AbstractActivity\\nimport com.sinyuk.fanfou.base.AbstractFragment\\nimport 
com.sinyuk.fanfou.di.Injectable\\nimport com.sinyuk.fanfou.domain.DO.Player\\nimport com.sinyuk.fanfou.domain.DO.Status\\nimport com.sinyuk.fanfou.domain.STATUS_LIMIT\\nimport com.sinyuk.fanfou.domain.StatusCreation\\nimport com.sinyuk.fanfou.domain.TIMELINE_CONTEXT\\nimport com.sinyuk.fanfou.ui.editor.EditorView\\nimport com.sinyuk.fanfou.ui.editor.MentionListView\\nimport com.sinyuk.fanfou.ui.timeline.TimelineView\\nimport com.sinyuk.fanfou.util.obtainViewModelFromActivity\\nimport com.sinyuk.fanfou.viewmodel.FanfouViewModelFactory\\nimport com.sinyuk.fanfou.viewmodel.PlayerViewModel\\nimport kotlinx.android.synthetic.main.status_view.*\\nimport kotlinx.android.synthetic.main.status_view_footer.*\\nimport kotlinx.android.synthetic.main.status_view_reply_actionbar.*\\nimport javax.inject.Inject\\n\\n\\n/**\\n * Created by sinyuk on 2018/1/12.\\n *\\n */\\nclass StatusView : AbstractFragment(), Injectable, QueryTokenReceiver, SuggestionsResultListener, SuggestionsVisibilityManager {\\n\\n    companion object {\\n        fun newInstance(status: Status, photoExtra: Bundle? = null) = StatusView().apply {\\n            arguments = Bundle().apply {\\n                putParcelable(\\"status\\", status)\\n                putBundle(\\"photoExtra\\", photoExtra)\\n            }\\n        }\\n    }\\n\\n    override fun layoutId() = R.layout.status_view\\n\\n    @Inject\\n    lateinit var factory: FanfouViewModelFactory\\n\\n    private val playerViewModel by lazy { obtainViewModelFromActivity(factory, PlayerViewModel::class.java) }\\n\\n    override fun onEnterAnimationEnd(savedInstanceState: Bundle?) 
{\\n        super.onEnterAnimationEnd(savedInstanceState)\\n        navBack.setOnClickListener { onBackPressedSupport() }\\n        setupEditor()\\n        setupKeyboard()\\n        onTextChanged(0)\\n        setupViewPager()\\n\\n        val status = arguments!!.getParcelable<Status>(\\"status\\")\\n        fullscreenButton.setOnClickListener {\\n            (activity as AbstractActivity).start(EditorView.newInstance(status.id,\\n                    replyEt.mentionsText,\\n                    StatusCreation.REPOST_STATUS))\\n            replyEt.text = null\\n        }\\n    }\\n\\n    private fun setupViewPager() {\\n        val status = arguments!!.getParcelable<Status>(\\"status\\")\\n        val bundle = arguments!!.getBundle(\\"photoExtra\\")\\n        val fragments: List<Fragment> = if (findChildFragment(TimelineView::class.java) == null) {\\n            val mentionView = MentionListView()\\n            mentionView.onItemClickListener = onSuggestionSelectListener\\n            mutableListOf(TimelineView.contextTimeline(TIMELINE_CONTEXT, status, bundle), mentionView)\\n        } else {\\n            mutableListOf(findChildFragment(TimelineView::class.java), MentionListView())\\n        }\\n\\n        viewPager.setPagingEnabled(false)\\n        viewPager.offscreenPageLimit = 1\\n        viewPager.adapter = object : FragmentPagerAdapter(childFragmentManager) {\\n            override fun getItem(position: Int) = fragments[position]\\n\\n            override fun getCount() = fragments.size\\n        }\\n    }\\n\\n    private var keyboardListener: ViewTreeObserver.OnGlobalLayoutListener? 
= null\\n\\n    private fun setupKeyboard() {\\n        keyboardListener = KeyboardUtil.attach(activity, panelRoot, {\\n            // TODO: how comes the Exception: panelRootContainer must not be null\\n            panelRootContainer?.visibility =\\n                    if (it) {\\n                        if (replyEt.requestFocus()) replyEt.setSelection(replyEt.text.length)\\n                        View.VISIBLE\\n                    } else {\\n                        replyEt.clearFocus()\\n                        View.GONE\\n                    }\\n        })\\n    }\\n\\n    private val config = WordTokenizerConfig.Builder()\\n            .setExplicitChars(\\"@\\")\\n            .setThreshold(3)\\n            .setMaxNumKeywords(5)\\n            .setWordBreakChars(\\" \\").build()\\n\\n    private fun setupEditor() {\\n        replyEt.tokenizer = WordTokenizer(config)\\n        replyEt.setAvoidPrefixOnTap(true)\\n        replyEt.setQueryTokenReceiver(this)\\n        replyEt.setSuggestionsVisibilityManager(this)\\n        replyEt.setAvoidPrefixOnTap(true)\\n\\n        replyCommitButton.setOnClickListener { }\\n\\n        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O)\\n            textCountProgress.min = 0\\n        textCountProgress.max = STATUS_LIMIT\\n        replyEt.addTextChangedListener(object : TextWatcher {\\n            override fun afterTextChanged(s: Editable?) 
{\\n                onTextChanged(s?.length ?: 0)\\n            }\\n\\n            override fun beforeTextChanged(s: CharSequence?, start: Int, count: Int, after: Int) {\\n\\n            }\\n\\n            override fun onTextChanged(s: CharSequence?, start: Int, before: Int, count: Int) {\\n\\n            }\\n        })\\n    }\\n\\n\\n    /**\\n     * @param count \\u5b57\\u6570\\n     */\\n    private fun onTextChanged(count: Int) {\\n        textCountProgress.progress = count\\n        replyCommitButton.isEnabled = count in 1..STATUS_LIMIT\\n    }\\n\\n\\n    private val onSuggestionSelectListener = object : MentionListView.OnItemClickListener {\\n        override fun onItemClick(position: Int, item: Suggestible) {\\n            (item as Player).let {\\n                replyEt.insertMention(it)\\n                displaySuggestions(false)\\n                playerViewModel.updateMentionedAt(it) //\\n                onTextChanged(replyEt.text.length)\\n                replyEt.requestFocus()\\n                replyEt.setSelection(replyEt.text.length)\\n            }\\n        }\\n    }\\n\\n    @Suppress(\\"PrivatePropertyName\\")\\n    private val BUCKET = \\"player-mentioned\\"\\n\\n    override fun onQueryReceived(queryToken: QueryToken): MutableList<String> {\\n        val data = playerViewModel.filter(queryToken.keywords)\\n        onReceiveSuggestionsResult(SuggestionsResult(queryToken, data), BUCKET)\\n        return arrayOf(BUCKET).toMutableList()\\n    }\\n\\n    override fun onReceiveSuggestionsResult(result: SuggestionsResult, bucket: String) {\\n        val data = result.suggestions\\n        if (data?.isEmpty() != false) return\\n        displaySuggestions(true)\\n        findChildFragment(MentionListView::class.java).setData(data)\\n    }\\n\\n    override fun displaySuggestions(display: Boolean) {\\n        viewPager.setCurrentItem(if (display) 1 else 0, true)\\n    }\\n\\n    override fun isDisplayingSuggestions() = viewPager.currentItem == 1\\n\\n   
 override fun onBackPressedSupport(): Boolean {\\n        when {\\n            panelRootContainer.visibility == View.VISIBLE -> KeyboardUtil.hideKeyboard(panelRootContainer)\\n            isDisplayingSuggestions -> displaySuggestions(false)\\n            else -> pop()\\n        }\\n        return true\\n\\n    }\\n\\n    override fun onDestroy() {\\n        keyboardListener?.let { KeyboardUtil.detach(activity, it) }\\n        activity?.currentFocus?.let { KeyboardUtil.hideKeyboard(it) }\\n        super.onDestroy()\\n    }\\n\\n}",\n      "license": "mit"\n    },\n    "truncated_cells": []\n  }\n]',
    'NEGATIVE: DATASET_NAME: "vikp/starcoder_cleaned"\nFEATURES: {\'code\': {\'dtype\': \'string\', \'_type\': \'Value\'}, \'repo_path\': {\'dtype\': \'string\', \'_type\': \'Value\'}}\nDATA SAMPLE:\n[\n  {\n    "row_idx": 0,\n    "row": {\n      "code": "# ---\\n# jupyter:\\n#   jupytext:\\n#     text_representation:\\n#       extension: .py\\n#       format_name: light\\n#       format_version: \'1.5\'\\n#       jupytext_version: 1.14.4\\n#   kernelspec:\\n#     display_name: Python 3\\n#     language: python\\n#     name: python3\\n# ---\\n\\n# # 09 Strain Gage\\n#\\n# This is one of the most commonly used sensor.  It is used in many transducers.  Its fundamental operating principle is fairly easy to understand and it will be the purpose of this lecture. \\n#\\n# A strain gage is essentially a thin wire that is wrapped on film of plastic.  \\n# <img src=\\"img/StrainGage.png\\" width=\\"200\\">\\n# The strain gage is then mounted (glued) on the part for which the strain must be measured.  \\n# <img src=\\"img/Strain_gauge_2.jpg\\" width=\\"200\\">\\n#\\n# ## Stress, Strain\\n# When a beam is under axial load, the axial stress, $\\\\sigma_a$, is defined as:\\n# \\\\begin{align*}\\n# \\\\sigma_a = \\\\frac{F}{A}\\n# \\\\end{align*}\\n# with $F$ the axial load, and $A$ the cross sectional area of the beam under axial load.\\n#\\n# <img src=\\"img/BeamUnderStrain.png\\" width=\\"200\\">\\n#\\n# Under the load, the beam of length $L$ will extend by $dL$, giving rise to the definition of strain, $\\\\epsilon_a$:\\n# \\\\begin{align*}\\n# \\\\epsilon_a = \\\\frac{dL}{L}\\n# \\\\end{align*}\\n# The beam will also contract laterally: the cross sectional area is reduced by $dA$.  This results in a transverval strain $\\\\epsilon_t$.  
The transversal and axial strains are related by the Poisson\'s ratio:\\n# \\\\begin{align*}\\n# \\\\nu = - \\\\frac{\\\\epsilon_t }{\\\\epsilon_a}\\n# \\\\end{align*}\\n# For a metal the Poission\'s ratio is typically $\\\\nu = 0.3$, for an incompressible material, such as rubber (or water), $\\\\nu = 0.5$.\\n#\\n# Within the elastic limit, the axial stress and axial strain are related through Hooke\'s law by the Young\'s modulus, $E$:\\n# \\\\begin{align*}\\n# \\\\sigma_a = E \\\\epsilon_a\\n# \\\\end{align*}\\n#\\n# <img src=\\"img/ElasticRegime.png\\" width=\\"200\\">\\n\\n# ## Resistance of a wire\\n#\\n# The electrical resistance of a wire $R$ is related to its physical properties (the electrical resistiviy, $\\\\rho$ in $\\\\Omega$/m) and its geometry: length $L$ and cross sectional area $A$.\\n#\\n# \\\\begin{align*}\\n# R = \\\\frac{\\\\rho L}{A}\\n# \\\\end{align*}\\n#\\n# Mathematically, the change in wire dimension will result inchange in its electrical resistance.  This can be derived from first principle:\\n# \\\\begin{align}\\n# \\\\frac{dR}{R} = \\\\frac{d\\\\rho}{\\\\rho} + \\\\frac{dL}{L} - \\\\frac{dA}{A}\\n# \\\\end{align}\\n# If the wire has a square cross section, then:\\n# \\\\begin{align*}\\n# A & = L\'^2 \\\\\\\\\\n# \\\\frac{dA}{A} & = \\\\frac{d(L\'^2)}{L\'^2} = \\\\frac{2L\'dL\'}{L\'^2} = 2 \\\\frac{dL\'}{L\'}\\n# \\\\end{align*}\\n# We have related the change in cross sectional area to the transversal strain.\\n# \\\\begin{align*}\\n# \\\\epsilon_t = \\\\frac{dL\'}{L\'}\\n# \\\\end{align*}\\n# Using the Poisson\'s ratio, we can relate then relate the change in cross-sectional area ($dA/A$) to axial strain $\\\\epsilon_a = dL/L$.\\n# \\\\begin{align*}\\n# \\\\epsilon_t &= - \\\\nu \\\\epsilon_a \\\\\\\\\\n# \\\\frac{dL\'}{L\'} &= - \\\\nu \\\\frac{dL}{L} \\\\; \\\\text{or}\\\\\\\\\\n# \\\\frac{dA}{A} & = 2\\\\frac{dL\'}{L\'} = -2 \\\\nu \\\\frac{dL}{L}\\n# \\\\end{align*}\\n# Finally we can substitute express $dA/A$ in eq. 
for $dR/R$ and relate change in resistance to change of wire geometry, remembering that for a metal $\\\\nu =0.3$:\\n# \\\\begin{align}\\n# \\\\frac{dR}{R} & = \\\\frac{d\\\\rho}{\\\\rho} + \\\\frac{dL}{L} - \\\\frac{dA}{A} \\\\\\\\\\n# & = \\\\frac{d\\\\rho}{\\\\rho} + \\\\frac{dL}{L} - (-2\\\\nu \\\\frac{dL}{L}) \\\\\\\\\\n# & = \\\\frac{d\\\\rho}{\\\\rho} + 1.6 \\\\frac{dL}{L} = \\\\frac{d\\\\rho}{\\\\rho} + 1.6 \\\\epsilon_a\\n# \\\\end{align}\\n# It also happens that for most metals, the resistivity increases with axial strain.  In general, one can then related the change in resistance to axial strain by defining the strain gage factor:\\n# \\\\begin{align}\\n# S = 1.6 + \\\\frac{d\\\\rho}{\\\\rho}\\\\cdot \\\\frac{1}{\\\\epsilon_a}\\n# \\\\end{align}\\n# and finally, we have:\\n# \\\\begin{align*}\\n# \\\\frac{dR}{R} =  S \\\\epsilon_a\\n# \\\\end{align*}\\n# $S$ is materials dependent and is typically equal to 2.0 for most commercially availabe strain gages. It is dimensionless.\\n#\\n# Strain gages are made of thin wire that is wraped in several loops, effectively increasing the length of the wire and therefore the sensitivity of the sensor.\\n#\\n# _Question:\\n#\\n# Explain why a longer wire is necessary to increase the sensitivity of the sensor_.\\n#\\n# Most commercially available strain gages have a nominal resistance (resistance under no load, $R_{ini}$) of 120 or 350 $\\\\Omega$.\\n#\\n# Within the elastic regime, strain is typically within the range $10^{-6} - 10^{-3}$, in fact strain is expressed in unit of microstrain, with a 1 microstrain = $10^{-6}$.  Therefore, changes in resistances will be of the same order.  If one were to measure resistances, we will need a dynamic range of 120 dB, whih is typically very expensive.  
Instead, one uses the Wheatstone bridge to transform the change in resistance to a voltage, which is easier to measure and does not require such a large dynamic range.\\n\\n# ## Wheatstone bridge:\\n# <img src=\\"img/WheatstoneBridge.png\\" width=\\"200\\">\\n#\\n# The output voltage is related to the difference in resistances in the bridge:\\n# \\\\begin{align*}\\n# \\\\frac{V_o}{V_s} = \\\\frac{R_1R_3-R_2R_4}{(R_1+R_4)(R_2+R_3)}\\n# \\\\end{align*}\\n#\\n# If the bridge is balanced, then $V_o = 0$, it implies: $R_1/R_2 = R_4/R_3$.\\n#\\n# In practice, finding a set of resistors that balances the bridge is challenging, and a potentiometer is used as one of the resistances to do minor adjustement to balance the bridge.  If one did not do the adjustement (ie if we did not zero the bridge) then all the measurement will have an offset or bias that could be removed in a post-processing phase, as long as the bias stayed constant.\\n#\\n# If each resistance $R_i$ is made to vary slightly around its initial value, ie $R_i = R_{i,ini} + dR_i$.  For simplicity, we will assume that the initial value of the four resistances are equal, ie $R_{1,ini} = R_{2,ini} = R_{3,ini} = R_{4,ini} = R_{ini}$.  This implies that the bridge was initially balanced, then the output voltage would be:\\n#\\n# \\\\begin{align*}\\n# \\\\frac{V_o}{V_s} = \\\\frac{1}{4} \\\\left( \\\\frac{dR_1}{R_{ini}} - \\\\frac{dR_2}{R_{ini}} + \\\\frac{dR_3}{R_{ini}} - \\\\frac{dR_4}{R_{ini}} \\\\right)\\n# \\\\end{align*}\\n#\\n# Note here that the changes in $R_1$ and $R_3$ have a positive effect on $V_o$, while the changes in $R_2$ and $R_4$ have a negative effect on $V_o$.  In practice, this means that is a beam is a in tension, then a strain gage mounted on the branch 1 or 3 of the Wheatstone bridge will produce a positive voltage, while a strain gage mounted on branch 2 or 4 will produce a negative voltage.  
One takes advantage of this to increase sensitivity to measure strain.\\n#\\n# ### Quarter bridge\\n# One uses only one quarter of the bridge, ie strain gages are only mounted on one branch of the bridge.\\n#\\n# \\\\begin{align*}\\n# \\\\frac{V_o}{V_s} = \\\\pm \\\\frac{1}{4} \\\\epsilon_a S\\n# \\\\end{align*}\\n# Sensitivity, $G$:\\n# \\\\begin{align*}\\n# G = \\\\frac{V_o}{\\\\epsilon_a} = \\\\pm \\\\frac{1}{4}S V_s\\n# \\\\end{align*}\\n#\\n#\\n# ### Half bridge\\n# One uses half of the bridge, ie strain gages are mounted on two branches of the bridge.\\n#\\n# \\\\begin{align*}\\n# \\\\frac{V_o}{V_s} = \\\\pm \\\\frac{1}{2} \\\\epsilon_a S\\n# \\\\end{align*}\\n#\\n# ### Full bridge\\n#\\n# One uses of the branches of the bridge, ie strain gages are mounted on each branch.\\n#\\n# \\\\begin{align*}\\n# \\\\frac{V_o}{V_s} = \\\\pm \\\\epsilon_a S\\n# \\\\end{align*}\\n#\\n# Therefore, as we increase the order of bridge, the sensitivity of the instrument increases.  However, one should be carefull how we mount the strain gages as to not cancel out their measurement.\\n\\n# _Exercise_\\n#\\n# 1- Wheatstone bridge\\n#\\n# <img src=\\"img/WheatstoneBridge.png\\" width=\\"200\\">\\n#\\n# > How important is it to know \\\\& match the resistances of the resistors you employ to create your bridge?\\n# > How would you do that practically?\\n# > Assume $R_1=120\\\\,\\\\Omega$, $R_2=120\\\\,\\\\Omega$, $R_3=120\\\\,\\\\Omega$, $R_4=110\\\\,\\\\Omega$, $V_s=5.00\\\\,\\\\text{V}$.  
What is $V_\\\\circ$?\\n\\nVs = 5.00\\nVo = (120**2-120*110)/(230*240) * Vs\\nprint(\'Vo = \',Vo, \' V\')\\n\\n# typical range in strain a strain gauge can measure\\n# 1 -1000 micro-Strain\\nAxialStrain = 1000*10**(-6)  # axial strain\\nStrainGageFactor = 2\\nR_ini = 120 # Ohm\\nR_1 = R_ini+R_ini*StrainGageFactor*AxialStrain\\nprint(R_1)\\nVo = (120**2-120*(R_1))/((120+R_1)*240) * Vs\\nprint(\'Vo = \', Vo, \' V\')\\n\\n# > How important is it to know \\\\& match the resistances of the resistors you employ to create your bridge?\\n# > How would you do that practically?\\n# > Assume $R_1= R_2 =R_3=120\\\\,\\\\Omega$, $R_4=120.01\\\\,\\\\Omega$, $V_s=5.00\\\\,\\\\text{V}$.  What is $V_\\\\circ$?\\n\\nVs = 5.00\\nVo = (120**2-120*120.01)/(240.01*240) * Vs\\nprint(Vo)\\n\\n# 2- Strain gage 1:\\n#\\n# One measures the strain on a bridge steel beam.  The modulus of elasticity is $E=190$ GPa.  Only one strain gage is mounted on the bottom of the beam; the strain gage factor is $S=2.02$.\\n#\\n# > a) What kind of electronic circuit will you use?  Draw a sketch of it.\\n#\\n# > b) Assume all your resistors including the unloaded strain gage are balanced and measure $120\\\\,\\\\Omega$, and that the strain gage is at location $R_2$.  The supply voltage is $5.00\\\\,\\\\text{VDC}$.  Will $V_\\\\circ$ be positive or negative when a downward load is added?\\n\\n# In practice, we cannot have all resistances = 120 $\\\\Omega$.  at zero load, the bridge will be unbalanced (show $V_o \\\\neq 0$). How could we balance our bridge?\\n#\\n# Use a potentiometer to balance bridge, for the load cell, we \'\'zero\'\' the instrument.\\n#\\n# Other option to zero-out our instrument? Take data at zero-load, record the voltage, $V_{o,noload}$.  
Substract $V_{o,noload}$ to my data.\\n\\n# > c) For a loading in which $V_\\\\circ = -1.25\\\\,\\\\text{mV}$, calculate the strain $\\\\epsilon_a$ in units of microstrain.\\n\\n# \\\\begin{align*}\\n# \\\\frac{V_o}{V_s} & = - \\\\frac{1}{4} \\\\epsilon_a S\\\\\\\\\\n# \\\\epsilon_a & = -\\\\frac{4}{S} \\\\frac{V_o}{V_s}\\n# \\\\end{align*}\\n\\nS = 2.02\\nVo = -0.00125\\nVs = 5\\neps_a = -1*(4/S)*(Vo/Vs)\\nprint(eps_a)\\n\\n# > d) Calculate the axial stress (in MPa) in the beam under this load.\\n\\n\\n\\n# > e) You now want more sensitivity in your measurement, you install a second strain gage on to\\n\\n# p of the beam.  Which resistor should you use for this second active strain gage?\\n#\\n# > f) With this new setup and the same applied load than previously, what should be the output voltage?\\n\\n# 3- Strain Gage with Long Lead Wires \\n#\\n# <img src=\\"img/StrainGageLongWires.png\\" width=\\"360\\">\\n#\\n# A quarter bridge strain gage Wheatstone bridge circuit is constructed with $120\\\\,\\\\Omega$ resistors and a $120\\\\,\\\\Omega$ strain gage.  For this practical application, the strain gage is located very far away form the DAQ station and the lead wires to the strain gage are $10\\\\,\\\\text{m}$ long and the lead wire have a resistance of $0.080\\\\,\\\\Omega/\\\\text{m}$.  
The lead wire resistance can lead to problems since $R_{lead}$ changes with temperature.\\n#\\n# > Design a modified circuit that will cancel out the effect of the lead wires.\\n\\n# ## Homework\\n#\\n",\n      "repo_path": "Lectures/09_StrainGage.ipynb"\n    },\n    "truncated_cells": []\n  },\n  {\n    "row_idx": 1,\n    "row": {\n      "code": "# ---\\n# jupyter:\\n#   jupytext:\\n#     split_at_heading: true\\n#     text_representation:\\n#       extension: .py\\n#       format_name: light\\n#       format_version: \'1.5\'\\n#       jupytext_version: 1.14.4\\n#   kernelspec:\\n#     display_name: Python 3\\n#     language: python\\n#     name: python3\\n# ---\\n\\n#export\\nfrom fastai.basics import *\\nfrom fastai.tabular.core import *\\nfrom fastai.tabular.model import *\\n\\nfrom fastai.tabular.data import *\\n\\n#hide\\nfrom nbdev.showdoc import *\\n\\n\\n# +\\n#default_exp tabular.learner\\n# -\\n\\n# # Tabular learner\\n#\\n# > The function to immediately get a `Learner` ready to train for tabular data\\n\\n# The main function you probably want to use in this module is `tabular_learner`. It will automatically create a `TabulaModel` suitable for your data and infer the irght loss function. 
See the [tabular tutorial](http://docs.fast.ai/tutorial.tabular) for an example of use in context.\\n\\n# ## Main functions\\n\\n#export\\n@log_args(but_as=Learner.__init__)\\nclass TabularLearner(Learner):\\n    \\"`Learner` for tabular data\\"\\n    def predict(self, row):\\n        tst_to = self.dls.valid_ds.new(pd.DataFrame(row).T)\\n        tst_to.process()\\n        tst_to.conts = tst_to.conts.astype(np.float32)\\n        dl = self.dls.valid.new(tst_to)\\n        inp,preds,_,dec_preds = self.get_preds(dl=dl, with_input=True, with_decoded=True)\\n        i = getattr(self.dls, \'n_inp\', -1)\\n        b = (*tuplify(inp),*tuplify(dec_preds))\\n        full_dec = self.dls.decode((*tuplify(inp),*tuplify(dec_preds)))\\n        return full_dec,dec_preds[0],preds[0]\\n\\n\\nshow_doc(TabularLearner, title_level=3)\\n\\n\\n# It works exactly as a normal `Learner`, the only difference is that it implements a `predict` method specific to work on a row of data.\\n\\n#export\\n@log_args(to_return=True, but_as=Learner.__init__)\\n@delegates(Learner.__init__)\\ndef tabular_learner(dls, layers=None, emb_szs=None, config=None, n_out=None, y_range=None, **kwargs):\\n    \\"Get a `Learner` using `dls`, with `metrics`, including a `TabularModel` created using the remaining params.\\"\\n    if config is None: config = tabular_config()\\n    if layers is None: layers = [200,100]\\n    to = dls.train_ds\\n    emb_szs = get_emb_sz(dls.train_ds, {} if emb_szs is None else emb_szs)\\n    if n_out is None: n_out = get_c(dls)\\n    assert n_out, \\"`n_out` is not defined, and could not be infered from data, set `dls.c` or pass `n_out`\\"\\n    if y_range is None and \'y_range\' in config: y_range = config.pop(\'y_range\')\\n    model = TabularModel(emb_szs, len(dls.cont_names), n_out, layers, y_range=y_range, **config)\\n    return TabularLearner(dls, model, **kwargs)\\n\\n\\n# If your data was built with fastai, you probably won\'t need to pass anything to `emb_szs` unless you want to 
change the default of the library (produced by `get_emb_sz`), same for `n_out` which should be automatically inferred. `layers` will default to `[200,100]` and is passed to `TabularModel` along with the `config`.\\n#\\n# Use `tabular_config` to create a `config` and cusotmize the model used. There is just easy access to `y_range` because this argument is often used.\\n#\\n# All the other arguments are passed to `Learner`.\\n\\npath = untar_data(URLs.ADULT_SAMPLE)\\ndf = pd.read_csv(path/\'adult.csv\')\\ncat_names = [\'workclass\', \'education\', \'marital-status\', \'occupation\', \'relationship\', \'race\']\\ncont_names = [\'age\', \'fnlwgt\', \'education-num\']\\nprocs = [Categorify, FillMissing, Normalize]\\ndls = TabularDataLoaders.from_df(df, path, procs=procs, cat_names=cat_names, cont_names=cont_names, \\n                                 y_names=\\"salary\\", valid_idx=list(range(800,1000)), bs=64)\\nlearn = tabular_learner(dls)\\n\\n#hide\\ntst = learn.predict(df.iloc[0])\\n\\n# +\\n#hide\\n#test y_range is passed\\nlearn = tabular_learner(dls, y_range=(0,32))\\nassert isinstance(learn.model.layers[-1], SigmoidRange)\\ntest_eq(learn.model.layers[-1].low, 0)\\ntest_eq(learn.model.layers[-1].high, 32)\\n\\nlearn = tabular_learner(dls, config = tabular_config(y_range=(0,32)))\\nassert isinstance(learn.model.layers[-1], SigmoidRange)\\ntest_eq(learn.model.layers[-1].low, 0)\\ntest_eq(learn.model.layers[-1].high, 32)\\n\\n\\n# -\\n\\n#export\\n@typedispatch\\ndef show_results(x:Tabular, y:Tabular, samples, outs, ctxs=None, max_n=10, **kwargs):\\n    df = x.all_cols[:max_n]\\n    for n in x.y_names: df[n+\'_pred\'] = y[n][:max_n].values\\n    display_df(df)\\n\\n\\n# ## Export -\\n\\n#hide\\nfrom nbdev.export import notebook2script\\nnotebook2script()\\n\\n\\n",\n      "repo_path": "nbs/43_tabular.learner.ipynb"\n    },\n    "truncated_cells": []\n  }\n]',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
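
For retrieval, the quantity of interest is usually how one query scores against several dataset descriptions rather than the full pairwise matrix. The sketch below shows that ranking step with cosine similarity on small hand-written vectors. The function names and the toy vectors are assumptions made for illustration; they stand in for rows of `model.encode(...)` output and are not part of the model or of sentence-transformers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_by_similarity(query_vec, candidate_vecs):
    """Return (candidate_index, score) pairs, best match first."""
    scores = [cosine(query_vec, c) for c in candidate_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


# Toy 2-d vectors standing in for real embeddings (hypothetical values).
toy_query = [1.0, 0.0]
toy_candidates = [[0.0, 1.0], [2.0, 0.1], [1.0, 1.0]]

ranking = rank_by_similarity(toy_query, toy_candidates)
print([i for i, _ in ranking])
# [1, 2, 0]
```

With the real model, the same ordering could be read from one row of the `model.similarity(...)` output above, taking the query embedding as the row and the dataset-description embeddings as the columns.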

Evaluation

Metrics

Triplet

Evaluated with TripletEvaluator

Metric	Value
cosine_accuracy	1.0
dot_accuracy	0.0
manhattan_accuracy	1.0
euclidean_accuracy	1.0
max_accuracy	1.0
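
As a rough guide to reading these numbers, triplet accuracy is the fraction of (query, positive, negative) triplets in which the query is scored closer to the positive than to the negative. The sketch below is a simplified illustration of that idea, not the `TripletEvaluator` implementation; the helper names and toy vectors are assumptions. It also shows that cosine and dot-product scoring can disagree on the same triplets when vector norms differ, which is one possible way such a gap can arise and is not a confirmed explanation of the values reported here.

```python
import math


def dot(u, v):
    """Unnormalized dot-product score."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Dot product divided by the product of the vector norms."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def triplet_accuracy(anchors, positives, negatives, score):
    """Fraction of triplets where score(anchor, positive) > score(anchor, negative)."""
    hits = sum(
        1
        for a, p, n in zip(anchors, positives, negatives)
        if score(a, p) > score(a, n)
    )
    return hits / len(anchors)


# Hypothetical toy triplets: each negative has a much larger norm than its positive.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.1], [0.1, 1.0]]
negatives = [[3.0, 3.0], [4.0, 4.0]]

cosine_acc = triplet_accuracy(anchors, positives, negatives, cosine)
dot_acc = triplet_accuracy(anchors, positives, negatives, dot)
print(cosine_acc, dot_acc)
# 1.0 0.0
```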

Training Details

Training Dataset

query-to-dataset-viewer-descriptions

Dataset: query-to-dataset-viewer-descriptions
Size: 1,141 training samples
Columns: query, positive, and negative

Approximate statistics based on the first 1000 samples:

	query	positive	negative
type	string	string	string
details	min: 9 tokens, mean: 11.72 tokens, max: 19 tokens	min: 40 tokens, mean: 2018.88 tokens, max: 8192 tokens	min: 41 tokens, mean: 2125.25 tokens, max: 8192 tokens
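
The three columns share a visible text layout: queries begin with `USER_QUERY:`, and dataset descriptions consist of a prefix (`HUB_DATASET_PREVIEW:` or `NEGATIVE:`), the dataset name, its features, and a JSON sample of rows. The sketch below approximates that layout so that new inputs can be shaped like the training data. It is reconstructed from the samples in this card rather than taken from the original preprocessing code; the function names, parameters, and example dataset are hypothetical, and exact whitespace may differ from what was used in training.

```python
import json


def format_query(text):
    """Prefix a free-text search query the way the `query` column does."""
    return f"USER_QUERY: {text}"


def format_preview(name, features, rows, prefix="HUB_DATASET_PREVIEW"):
    """Build a dataset description string shaped like the samples in this card.

    `features` is a plain dict; it is rendered with Python's dict repr, which
    matches the single-quoted FEATURES text seen in the samples.
    """
    wrapped = [
        {"row_idx": i, "row": row, "truncated_cells": []}
        for i, row in enumerate(rows)
    ]
    return (
        f'{prefix}: DATASET_NAME: "{name}"\n'
        f"FEATURES: {features}\n"
        f"DATA SAMPLE:\n{json.dumps(wrapped, indent=2)}"
    )


# Hypothetical dataset used only to show the resulting shape.
example = format_preview(
    "user/toy-dataset",
    {"text": {"dtype": "string", "_type": "Value"}},
    [{"text": "hello"}],
)
print(format_query("toy text dataset"))
print(example)
```

Strings built this way can be passed to `model.encode(...)` alongside a formatted query; whether the prefix should be kept at inference time is not stated in this card, so it is left as a parameter.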

Samples:

query	positive	negative
`USER_QUERY: LLM paper dataset`	HUB_DATASET_PREVIEW: DATASET_NAME: "MarkrAI/AutoRAG-evaluation-2024-LLM-paper-v1" FEATURES: {'doc_id': {'dtype': 'string', '_type': 'Value'}, 'contents': {'dtype': 'string', '_type': 'Value'}, 'metadata': {'creation_datetime': {'dtype': 'string', '_type': 'Value'}, 'file_name': {'dtype': 'string', '_type': 'Value'}, 'file_path': {'dtype': 'string', '_type': 'Value'}, 'file_size': {'dtype': 'int64', '_type': 'Value'}, 'file_type': {'dtype': 'null', '_type': 'Value'}, 'last_accessed_datetime': {'dtype': 'string', '_type': 'Value'}, 'last_modified_datetime': {'dtype': 'string', '_type': 'Value'}}} DATA SAMPLE: [ { "row_idx": 0, "row": { "doc_id": "6f86094c-47fe-43de-a77a-e8c34c69c997", "contents": "# Rag-Driver: Generalisable Driving Explanations With Retrieval-Augmented In-Context Learning In Multi-Modal Large Language Model\n\nJianhao Yuan1, Shuyang Sun1, Daniel Omeiza1, Bo Zhao2, Paul Newman1, Lars Kunze1, Matthew Gadd1\n1 University of Oxford 2 Beijing Academy of Artificial Intelligence\n{jianhaoyuan,kevinsun,daniel,pnewman,lars,mattgadd}@robots.ox.ac.uk \nAbstract\u2014Robots powered by 'blackbox' models need to provide\nhuman-understandable explanations which we can trust. Hence,\nexplainability plays a critical role in trustworthy autonomous\ndecision-making to foster transparency and acceptance among\nend users, especially in complex autonomous driving. Recent\nadvancements in Multi-Modal Large Language models (MLLMs)\nhave shown promising potential in enhancing the explainability\nas a driving agent by producing control predictions along with\nnatural language explanations. However, severe data scarcity\ndue to expensive annotation costs and significant domain gaps\nbetween different datasets makes the development of a robust and\ngeneralisable system an extremely challenging task. 
Moreover, the\nprohibitively expensive training requirements of MLLM and the\nunsolved problem of catastrophic forgetting further limit their\ngeneralisability post-deployment. To address these challenges, we\npresent RAG-Driver, a novel retrieval-augmented multi-modal\nlarge language model that leverages in-context learning for high-\nperformance, explainable, and generalisable autonomous driving.\nBy grounding in retrieved expert demonstration, we empirically\nvalidate that RAG-Driver achieves state-of-the-art performance in\nproducing driving action explanations, justifications, and control\nsignal prediction. More importantly, it exhibits exceptional zero-\nshot generalisation capabilities to unseen environments without \nfurther training endeavours1.\nIndex Terms\u2014Autonomous driving, multi-modal language\nmodel, end-to-end driving, domain generalisation", "metadata": { "creation_datetime": "2024-03-04", "file_name": "2402.10828v1.md", "file_path": "paper_data/2402.10828v1.md", "file_size": 64885, "file_type": null, "last_accessed_datetime": "2024-03-04", "last_modified_datetime": "2024-02-22" } }, "truncated_cells": [] }, { "row_idx": 1, "row": { "doc_id": "cf485ad0-8ec4-4a63-a0c6-5d7eb499c0c8", "contents": "# Rag-Driver: Generalisable Driving Explanations With Retrieval-Augmented In-Context Learning In Multi-Modal Large Language Model\n## I. Introduction\n\nDriven by the emerging development of deep learning, autonomous driving has observed a paradigm shift from rulesbased decision systems [66, 21] to data-driven learning-based approaches [28, 6, 36]. However, this comes at the cost of transparency in decision-making, especially for end-to-end autonomous driving systems which are considered black-box in nature [13]. 
Thus, in addition to precision in action control, explanation provision is key in ensuring trustworthy decisionmaking to reconcile the system's decisions with end-user expectations to foster confidence and acceptance [79, 8, 57] in dynamic driving environments. \nTraditional approaches have mainly relied on attention visualisation [5, 7, 55] as a proxy to rationalise the decisions of the black-box systems or auxiliary intermediate tasks such as semantic segmentation [25, 32], object detection [16, 31], and affordance prediction [68, 45] provide meaningful intermediate representation for decision-making. However, these methods do not engage end-users in the dialogue as they are onedirectional and not readily comprehensible by the general users for the purpose of fostering trust and confidence. An alternative promising approach is the integration of natural language explanations [38, 33, 54], in particular through Multi-Modal Large Language Models (MLLMs) [1, 70]. These models, pretrained on extensive web-scale datasets, demonstrate remarkable reasoning capacity, enabling the transformation of complex vehicular decision-making processes into more understandable narrative formats, thereby offering a new layer of explainability to conventional systems. \nWhile several early attempts have demonstrated the potential of MLLMs as general explainable driving agents [78, 76, 51], these methods fall short of human-level understanding. One of the limitations is their failure to generalise to unseen environments. A primary obstacle is the lack of high-quality annotated data [56], coupled with the significant domain shift across various datasets [23], which hinders the models' generalisation capacity to novel environments outside of the training data distribution. 
Another critical challenge is the prohibitively expensive training requirement and the unsolved problem of catastrophic forgetting [39], which make re-training or finetuning impractical solutions due to the immense computational demands and severe performance degradation. Consequently, this further limits the models' generalisability after deployment, as they struggle to effectively utilise new data in constantly evolving environments and driving scenarios. \nTo address these challenges, we introduce RAG-Driver, a novel retrieval-augment", "metadata": { "creation_datetime": "2024-03-04", "file_name": "2402.10828v1.md", "file_path": "paper_data/2402.10828v1.md", "file_size": 64885, "file_type": null, "last_accessed_datetime": "2024-03-04", "last_modified_datetime": "2024-02-22" } }, "truncated_cells": [] } ]	NEGATIVE: DATASET_NAME: "emozilla/dolma-v1_7-arxiv" FEATURES: {'text': {'dtype': 'string', '_type': 'Value'}, 'id': {'dtype': 'string', '_type': 'Value'}, 'metadata': {'file_path': {'dtype': 'string', '_type': 'Value'}}} DATA SAMPLE: [ { "row_idx": 0, "row": { "text": "\section{Introduction}\nLet $G$ be a simple undirected graph with the \textit{vertex set} $V(G)$ and the \textit{edge set} $E(G)$. A vertex with degree one is called a \textit{pendant vertex}. The distance between the vertices $u$ and $v$ in graph $G$ is denoted by $d_G(u,v)$. A cycle $C$ is called \textit{chordless} if $C$ has no \textit{cycle chord} (that is an edge not in the edge set of $C$ whose endpoints lie on the vertices of $C$).\nThe \textit{Induced subgraph} on vertex set $S$ is denoted by $\langle S\rangle$. A path that starts in $v$ and ends in $u$ is denoted by $\stackrel\frown{v u}$.\nA \textit{traceable} graph is a graph that possesses a Hamiltonian path.\nIn a graph $G$, we say that a cycle $C$ is \textit{formed by the path} $Q$ if $
`USER_QUERY: code vulnerability dataset`	HUB_DATASET_PREVIEW: DATASET_NAME: "benjis/bigvul" FEATURES: {'CVE ID': {'dtype': 'string', '_type': 'Value'}, 'CVE Page': {'dtype': 'string', '_type': 'Value'}, 'CWE ID': {'dtype': 'string', 'type': 'Value'}, 'codeLink': {'dtype': 'string', 'type': 'Value'}, 'commit_id': {'dtype': 'string', 'type': 'Value'}, 'commit_message': {'dtype': 'string', 'type': 'Value'}, 'func_after': {'dtype': 'string', 'type': 'Value'}, 'func_before': {'dtype': 'string', 'type': 'Value'}, 'lang': {'dtype': 'string', 'type': 'Value'}, 'project': {'dtype': 'string', 'type': 'Value'}, 'vul': {'dtype': 'int8', 'type': 'Value'}} DATA SAMPLE: [ { "row_idx": 0, "row": { "CVE ID": "CVE-2017-7586", "CVE Page": "https://www.cvedetails.com/cve/CVE-2017-7586/", "CWE ID": "CWE-119", "codeLink": "https://github.com/erikd/libsndfile/commit/708e996c87c5fae77b104ccfeb8f6db784c32074", "commit_id": "708e996c87c5fae77b104ccfeb8f6db784c32074", "commit_message": "src/ : Move to a variable length header buffer\n\nPreviously, the psf->header buffer was a fixed length specified by\nSF_HEADER_LEN which was set to 12292. This was problematic for\ntwo reasons; this value was un-necessarily large for the majority\nof files and too small for some others.\n\nNow the size of the header buffer starts at 256 bytes and grows as\nnecessary up to a maximum of 100k.", "func_after": "psf_get_date_str (char str, int maxlen)\n{\ttime_t\t\tcurrent ;\n\tstruct tm\ttimedata, tmptr ;\n\n\ttime (¤t) ;\n\n#if defined (HAVE_GMTIME_R)\n\t/ If the re-entrant version is available, use it. /\n\ttmptr = gmtime_r (¤t, &timedata) ;\n#elif defined (HAVE_GMTIME)\n\t/ Otherwise use the standard one and copy the data to local storage. 
/\n\ttmptr = gmtime (¤t) ;\n\tmemcpy (&timedata, tmptr, sizeof (timedata)) ;\n#else\n\ttmptr = NULL ;\n#endif\n\n\tif (tmptr)\n\t\tsnprintf (str, maxlen, "%4d-%02d-%02d %02d:%02d:%02d UTC",\n\t\t\t1900 + timedata.tm_year, timedata.tm_mon, timedata.tm_mday,\n\t\t\ttimedata.tm_hour, timedata.tm_min, timedata.tm_sec) ;\n\telse\n\t\tsnprintf (str, maxlen, "Unknown date") ;\n\n\treturn ;\n} / psf_get_date_str /\n", "func_before": "psf_get_date_str (char str, int maxlen)\n{\ttime_t\t\tcurrent ;\n\tstruct tm\ttimedata, tmptr ;\n\n\ttime (¤t) ;\n\n#if defined (HAVE_GMTIME_R)\n\t/ If the re-entrant version is available, use it. /\n\ttmptr = gmtime_r (¤t, &timedata) ;\n#elif defined (HAVE_GMTIME)\n\t/ Otherwise use the standard one and copy the data to local storage. /\n\ttmptr = gmtime (¤t) ;\n\tmemcpy (&timedata, tmptr, sizeof (timedata)) ;\n#else\n\ttmptr = NULL ;\n#endif\n\n\tif (tmptr)\n\t\tsnprintf (str, maxlen, "%4d-%02d-%02d %02d:%02d:%02d UTC",\n\t\t\t1900 + timedata.tm_year, timedata.tm_mon, timedata.tm_mday,\n\t\t\ttimedata.tm_hour, timedata.tm_min, timedata.tm_sec) ;\n\telse\n\t\tsnprintf (str, maxlen, "Unknown date") ;\n\n\treturn ;\n} / psf_get_date_str /\n", "lang": "C", "project": "libsndfile", "vul": 0 }, "truncated_cells": [] }, { "row_idx": 1, "row": { "CVE ID": "CVE-2018-18352", "CVE Page": "https://www.cvedetails.com/cve/CVE-2018-18352/", "CWE ID": "CWE-732", "codeLink": "https://github.com/chromium/chromium/commit/a9cbaa7a40e2b2723cfc2f266c42f4980038a949", "commit_id": "a9cbaa7a40e2b2723cfc2f266c42f4980038a949", "commit_message": "Simplify "WouldTaintOrigin" concept in media/blink\n\nCurrently WebMediaPlayer has three predicates:\n - DidGetOpaqueResponseFromServiceWorker\n - HasSingleSecurityOrigin\n - DidPassCORSAccessCheck\n. These are used to determine whether the response body is available\nfor scripts. 
They are known to be confusing, and actually\nMediaElementAudioSourceHandler::WouldTaintOrigin misuses them.\n\nThis CL merges the three predicates to one, WouldTaintOrigin, to remove\nthe confusion. Now the "response type" concept is available and we\ndon't need a custom CORS check, so this CL removes\nBaseAudioContext::WouldTaintOrigin. This CL also renames\nURLData::has_opaque_data and its (direct and indirect) data accessors\nto match the spec.\n\nBug: 849942, 875153\nChange-Id: I6acf50169d7445c4ff614e80ac606f79ee577d2a\nReviewed-on: https://chromium-review.googlesource.com/c/1238098\nReviewed-by: Fredrik Hubinette [email protected]\nReviewed-by: Kinuko Yasuda [email protected]\nReviewed-by: Raymond Toy [email protected]\nCommit-Queue: Yutaka Hirano [email protected]\nCr-Commit-Position: refs/heads/master@{#598258}", "func_after": "void MultibufferDataSource::CreateResourceLoader(int64_t first_byte_position,\n int64_t last_byte_position) {\n DCHECK(render_task_runner->BelongsToCurrentThread());\n\n SetReader(new MultiBufferReader(\n url_data()->multibuffer(), first_byte_position, last_byte_position,\n base::Bind(&MultibufferDataSource::ProgressCallback, weak_ptr)));\n reader->SetIsClientAudioElement(is_client_audio_element);\n UpdateBufferSizes();\n}\n", "func_before": "void MultibufferDataSource::CreateResourceLoader(int64_t first_byte_position,\n int64_t last_byte_position) {\n DCHECK(render_task_runner->BelongsToCurrentThread());\n\n SetReader(new MultiBufferReader(\n url_data()->multibuffer(), first_byte_position, last_byte_position,\n base::Bind(&MultibufferDataSource::ProgressCallback, weak_ptr)));\n reader->SetIsClientAudioElement(is_client_audio_element);\n UpdateBufferSizes();\n}\n", "lang": "C", "project": "Chrome", "vul": 0 }, "truncated_cells": [] } ]	NEGATIVE: DATASET_NAME: "sfakhoury/NL2Fix" FEATURES: {'defects4j_project': {'dtype': 'string', '_type': 'Value'}, 'defects4j_bug_id': {'dtype': 'string', '_type': 'Value'}, 'file_path': {'dtype': 
'string', '_type': 'Value'}, 'bug_start_line': {'dtype': 'string', '_type': 'Value'}, 'bug_end_line': {'dtype': 'string', '_type': 'Value'}, 'issue_title': {'dtype': 'string', '_type': 'Value'}, 'issue_description': {'dtype': 'string', '_type': 'Value'}, 'original_src': {'dtype': 'string', '_type': 'Value'}, 'original_src_wo_comments': {'dtype': 'string', '_type': 'Value'}, 'fixed_src': {'dtype': 'string', '_type': 'Value'}, 'fixed_src_wo_comments': {'dtype': 'string', '_type': 'Value'}} DATA SAMPLE: [ { "row_idx": 0, "row": { "defects4j_project": "Math", "defects4j_bug_id": "19", "file_path": "src/main/java/org/apache/commons/math3/optimization/direct/CMAESOptimizer.java", "bug_start_line": "504", "bug_end_line": "561", "issue_title": "Wide bounds to CMAESOptimizer result in NaN parameters passed to fitness function", "issue_description": "If you give large values as lower/upper bounds (for example -Double.MAX_VALUE as a lower bound), the optimizer can call the fitness function with parameters set to NaN. My guess is this is due to FitnessFunction.encode/decode generating NaN when normalizing/denormalizing parameters. For example, if the difference between the lower and upper bound is greater than Double.MAX_VALUE, encode could divide infinity by infinity.", "original_src": "private void checkParameters() {\n final double[] init = getStartPoint();\n final double[] lB = getLowerBound();\n final double[] uB = getUpperBound();\n\n // Checks whether there is at least one finite bound value.\n boolean hasFiniteBounds = false;\n for (int i = 0; i < lB.length; i++) {\n if (!Double.isInfinite(lB[i])
`USER_QUERY: english korean translation dataset`	HUB_DATASET_PREVIEW: DATASET_NAME: "yoonjae22/Aihub_translate" FEATURES: {'instruction': {'dtype': 'string', '_type': 'Value'}, 'output': {'dtype': 'string', '_type': 'Value'}, 'text': {'dtype': 'string', '_type': 'Value'}, 'input': {'dtype': 'string', '_type': 'Value'}} DATA SAMPLE: [ { "row_idx": 0, "row": { "input": "Bible Coloring' is a coloring application that allows you to experience beautiful stories in the Bible.", "output": "'Bible Coloring'\uc740 \uc131\uacbd\uc758 \uc544\ub984\ub2e4\uc6b4 \uc774\uc57c\uae30\ub97c \uccb4\ud5d8 \ud560 \uc218 \uc788\ub294 \uceec\ub7ec\ub9c1 \uc571\uc785\ub2c8\ub2e4.", "instruction": "Please translate the English sentence into Korean.", "text": "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nBible Coloring' is a coloring application that allows you to experience beautiful stories in the Bible.\n\n###Response:\n'Bible Coloring'\uc740 \uc131\uacbd\uc758 \uc544\ub984\ub2e4\uc6b4 \uc774\uc57c\uae30\ub97c \uccb4\ud5d8 \ud560 \uc218 \uc788\ub294 \uceec\ub7ec\ub9c1 \uc571\uc785\ub2c8\ub2e4." }, "truncated_cells": [] }, { "row_idx": 1, "row": { "input": "Do you work at a City bank?", "output": "\uc528\ud2f0\uc740\ud589\uc5d0\uc11c \uc77c\ud558\uc138\uc694?", "instruction": "Please translate the English sentence into Korean.", "text": "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nDo you work at a City bank?\n\n###Response:\n\uc528\ud2f0\uc740\ud589\uc5d0\uc11c \uc77c\ud558\uc138\uc694?" 
}, "truncated_cells": [] } ]	NEGATIVE: DATASET_NAME: "werty1248/EnKo-Translation-LongTextOnly-dedup" FEATURES: {'english': {'dtype': 'string', '_type': 'Value'}, 'korean': {'dtype': 'string', '_type': 'Value'}, 'from': {'dtype': 'string', '_type': 'Value'}, 'category': {'dtype': 'string', '_type': 'Value'}} DATA SAMPLE: [ { "row_idx": 0, "row": { "english": "ROOFTOP GREENING STRUCTURETo provide a structure firmly and easily installing a house cultivation arch-like aggregate in a rooftop greening structure. This rooftop greening structure includes pressingly fixing each of support stands 24 in each of support stand line groups 24A to a rooftop slab surface through a greening support layer 6, using a fastener which pierces into the greening support layer 6, and steps over between each of the support stands 24 and the rooftop slab surface 2, and installing a holding member 36 for holding a house cultivation arch-like aggregate 50 each on the upper end surface of each of the support stands 24 in each of the support stand line groups 24A. As a result of this, the support stand 24 which has stiffness higher than the greening support layer 6, and is firmly fixed to the rooftop slab surface 2 through the greening support layer 6 is used for holding the end part of the arch-like aggregate 50. The holding member 36 for holding the end part of the arch-like aggregate 50 is installed on the upper end surface of the support stand 24 so as to suppress the holding member 36 from burying in soil and increase the visibility.In a rooftop greening structure in which a greening support layer is formed by laying a plurality of greening support panels on the rooftop floor and soil is arranged on the greening support layer, a pair of support stands are placed on the greening support layer. 
The rows are arranged so as to be separated from each other, and each of the support rows is arranged upright so as to form a row with a plurality of supports having higher rigidity than the greening support layer. Each support pedestal in each support pedestal row group is configured through the greening support layer by using a fastener that penetrates the greening support layer and straddles between each support pedestal and the rooftop floor surface. It is characterized in that it is pressed and fixed to the rooftop floor surface, and the upper end surface of each support stand in each support stand row group is provided with a holding portion for holding an arch-shaped aggregate for house cultivation. Rooftop greening structure.", "korean": "\uc625\uc0c1 \ub179\ud654 \uad6c\uc870\uc625\uc0c1 \ub179\ud654 \uad6c\uc870\uc5d0 \uc788\uc5b4\uc11c \ud558\uc6b0\uc2a4 \uc7ac\ubc30\uc6a9 \uc544\uce58\ud615 \uace8\uc7ac\ub97c \uacac\uace0\ud558\uace0 \uc6a9\uc774\ud558\uac8c \uace0\uc815\ud558\ub294 \uad6c\uc870\ub97c \uc81c\uacf5\ud55c\ub2e4. \uac01 \uc9c0\uc9c0\ub300\ub82c\uad70 24 A\uc758 \uac01 \uc9c0\uc9c0\ub300 24\ub97c \ub179\ud654 \uc9c0\uc6d0\uce35 6\uc744 \uad00\ud1b5\ud574 \uac01 \uc9c0\uc9c0\ub300 24\uc640 \uc625\uc0c1 \uc2ac\ub798\ube0c\uba74 2 \uc0ac\uc774\ub97c \ub118\ub294 \uace0\uc815\uad6c\ub97c \uc774\uc6a9\ud568\uc73c\ub85c\uc368, \ub179\ud654 \uc9c0\uc6d0\uce35 6\uc744 \ud1b5\ud574 \uc0c1\uae30 \uc625\uc0c1 \uc2ac\ub798\ube0c\uba74\uc5d0 \uac00\uc555 \uace0\uc815\ud558\uace0 \uadf8 \uac01 \uc9c0\uc9c0\ub300\ub82c\uad70 24 A\uc758 \uac01 \uc9c0\uc9c0\ub300 24\uc758 \uc0c1\ub2e8\uba74\uc5d0 \ud558\uc6b0\uc2a4 \uc7ac\ubc30\uc6a9 \uc544\uce58\ud615 \uace8\uc7ac 50\uc744 \uc9c0\uc9c0\ud558\uae30 \uc704\ud55c \uc9c0\uc9c0 \ubd80\uc7ac 36\uc744 \uac01\uac01 \ub9c8\ub828\ud55c\ub2e4. 
\uc774\uac83\uc5d0 \uc758\ud574 \uc544\uce58\ud615 \uace8\uc7ac 50\uc758 \ub2e8\ubd80\ub97c \uc9c0\uc9c0\ud558\ub294 \uac83\uc73c\ub85c\uc11c \ub179\ud654 \uc9c0\uc6d0\uce35 6\ubcf4\ub2e4 \uac15\uc131\uc774 \ub192\uace0 \uc625\uc0c1 \uc2ac\ub798\ube0c\uba74 2\uc5d0 \ub179\ud654 \uc9c0\uc6d0\uce35 6\uc744 \ud1b5\ud574 \uc81c\ub300\ub85c \uace0\uc815\ub41c \uc9c0\uc9c0\ub300 24\uac00 \uc774\uc6a9\ub418\ub3c4\ub85d \ud55c\ub2e4. \ub610\ud55c \uc544\uce58\ud615 \uace8\uc7ac 50\uc758 \ub2e8\ubd80\ub97c \uc9c0\uc9c0\ud558\ub294 \uc9c0\uc9c0 \ubd80\uc7ac 36\uc744 \uc9c0\uc9c0\ub300 24\uc758 \uc0c1\ub2e8\uba74\uc5d0 \ub9c8\ub828\ud568\uc73c\ub85c\uc368, \ud1a0\uc591\uc5d0 \ud30c\ubb3b\ud788\ub294 \uac83\uc744 \uc5b5\uc81c\ud558\uace0 \uadf8 \uc9c0\uc9c0 \ubd80\uc7ac 36\uc758 \uc2dc\uc778\uc131\uc744 \ud5a5\uc0c1\uc2dc\ud0a8\ub2e4.\uc625\uc0c1 \ubc14\ub2e5\uba74\uc0c1\uc5d0 \ubcf5\uc218\uc758 \ub179\ud654 \uc9c0\uc6d0 \ud328\ub110\uc744 \ubd80\uc124\ud568\uc73c\ub85c\uc368 \ub179\ud654 \uc9c0\uc6d0\uce35\uc774 \ud615\uc131\ub418\uace0 \uc0c1\uae30 \ub179\ud654 \uc9c0\uc6d0\uce35\uc0c1\uc5d0 \ud1a0\uc591\uc774 \ubc30\uc124\ub418\ub294 \uc625\uc0c1 \ub179\ud654 \uad6c\uc870\uc5d0 \uc788\uc5b4\uc11c \uc0c1\uae30 \ub179\ud654 \uc9c0\uc6d0\uce35\uc0c1\uc5d0 \ud55c \uc30d\uc758 \uc9c0\uc9c0\ub300\ub82c\uad70\uc774 \uc11c\ub85c \uc774\uaca9\ub41c \uc0c1\ud0dc\ub97c \uac00\uc9c0\uace0 \ubc30\uce58\ub418\uace0 \uc0c1\uae30 \uac01 \uc9c0\uc9c0\ub300\ub82c\uad70\uc774 \uc0c1\uae30 \ub179\ud654 \uc9c0\uc6d0\uce35\ubcf4\ub2e4 \uac15\uc131\uc774 \ud5a5\uc0c1\ub41c \ubcf5\uc218\uc758 \uc9c0\uc9c0\ub300\ub97c \uac04\uaca9\uc744 \ub450\uba74\uc11c \uc5f4\uc744 \uc774\ub8e8\ub3c4\ub85d \uc785\uc124 \ubc30\uce58\ud568\uc73c\ub85c\uc368 \uad6c\uc131\ub418\uace0 \uc0c1\uae30 \uac01 \uc9c0\uc9c0\ub300\ub82c\uad70\uc758 \uac01 \uc9c0\uc9c0\ub300\uac00 \uc0c1\uae30 \ub179\ud654 \uc9c0\uc6d0\uce35\uc744 \uad00\ud1b5\ud574 \uc0c1\uae30 \uac01 \uc9c0\uc9c0\ub300\uc640 \uc0c1\uae30 \uc625\uc0c1 
\ubc14\ub2e5\uba74 \uc0ac\uc774\ub97c \ub118\ub294 \uace0\uc815\uad6c\ub97c \uc774\uc6a9\ud568\uc73c\ub85c\uc368, \uc0c1\uae30 \ub179\ud654 \uc9c0\uc6d0\uce35\uc744 \ud1b5\ud574 \uc0c1\uae30 \uc625\uc0c1 \ubc14\ub2e5\uba74\uc5d0 \uac00\uc555 \uace0\uc815\ub418\uc5b4 \uc0c1\uae30 \uac01 \uc9c0\uc9c0\ub300\ub82c\uad70\uc758 \uac01 \uc9c0\uc9c0\ub300\uc758 \uc0c1\ub2e8\uba74\uc5d0\ub294 \ud558\uc6b0\uc2a4 \uc7ac\ubc30\uc6a9 \uc544\uce58\ud615 \uace8\uc7ac\ub97c \uc9c0\uc9c0\ud558\uae30 \uc704\ud55c \uc9c0\uc9c0\ubd80\uac00 \uac01\uac01 \uad6c\ube44\ub418\uc5b4 \uc788\ub294, \uac83\uc744 \ud2b9\uc9d5\uc73c\ub85c \ud558\ub294 \uc625\uc0c1 \ub179\ud654 \uad6c\uc870.", "from": "nayohan/aihub-en-ko-translation-12m", "category": "full" }, "truncated_cells": [] }, { "row_idx": 1, "row": { "english": "Native chicken breeding methodThe invention discloses a native chicken breeding method, which includes steps that the shield degree of a breeding grove is 60-65%; a native chicken activity field with area of 5-8 mu is encircled bya 1.8-2.2m high nylon mesh; a ventilating and warming device is arranged in a henhouse; feed and water are delivered at 8: 00-15: 00 in every morning, and native chicken are put in grove activity field at 15:00-17: 00 in the afternoon; music is displayed at 17: 00-18: 30, and feed is delivered at outside of the henhouse to domesticize the native chickens , and then chickens are returned to the henhouse; the henhouse is cleaned at intervals of 12-15 days; the henhouse is sterilized by an automatic sterilizing system during the stocking period in the afternoon at intervals of 3-5 days. The native chicken breeding method can well consider about the stocking time, thus the stocking instinct of the native chickens is well guaranteed, the food intake of the native chickens is increased throughthe reasonable captive time; the meat growth is accelerated, the breeding cycle is shortened, and the meat quality of the native chickens is ensured.A kind of 1. 
cultural method of chicken, it is characterised in that\uff1ait the described method comprises the following steps\uff1a\uff081\uff09selection cultivation ground\uff1aselection away from livestock and poultry transaction place, slaughtering field, chemical plant, garbage disposal plant, avoid air, dust, water source, germ and the cultivation of the woods of noise pollution, the moon degree of covering of the woods is 6065%, with 1.82.2 meters of high nylon net circle area is 58 mu of chicken playground, and vegetable seeds is broadcasted sowing in forest land\uff1b\uff082\uff09build chicken house\uff1athe wind sheltering in woods ground on the sunny side, hen house is built in the chicken playground centre position that physical features is high and dry, draining blowdown condition is good, and ventilation heating is set in hen house equipment, hen house is interior to set automatic sterilizing system\uff1b\uff083\uff09select kind\uff1aselect it is resistance to it is extensive, action flexibly, the pure native that power of looking for food is strong, premunition is strong\uff1b\uff084\uff09dietary management\uff1aevery mu of forest land puts 260280 in a suitable place to breed, every morning 8:0015:feed and water are launched in stable breeding when 00, afternoon 15:0017:it is put into forest land playground when 00 to put in a suitable place to breed, 17:0018:dispensing feed outside music colony house is played when 30 to enter row domestication makes chicken return to colony house, and day temperature is maintained at 2023 degrees celsius in circle, and nocturnal temperature is maintained at 2023 degrees celsius\uff1b \uff085\uff09disinfectant management\uff1ato being cleaned in hen house, colony house is started certainly during chicken is put in a suitable place to breed afternoon within every 35 days within every 1215 days dynamic disinfection system is sterilized, and lime powder for every 23 months to the main passageway in woods forest land.", "korean": 
"\ud1a0\uc885\ub2ed \uc0ac\uc721\ubc29\ubc95\uc774 \ubc1c\uba85\ud488\uc740 \uc0ac\uc721\uc7a5\uc758 \ubc29\ud328\ub3c4\uac00 6065%\uc778 \ud1a0\uc885\ub2ed \uc0ac\uc721\ubc95\uc744 \uacf5\uac1c\ud558\uace0 \uc788\uc73c\uba70, \uba74\uc801\uc774 58m\uc778 \ud1a0\uc885\ub2ed \ud65c\ub3d9\uc7a5\uc744 1.82.2m \ub192\uc774\uc758 \ub098\uc77c\ub860 \uba54\uc2dc\ub85c \ub458\ub7ec\uc2f8\uace0 \uc788\uc73c\uba70, \ub2ed\uc7a5\uc5d0 \ud658\uae30 \ubc0f \ub09c\ubc29 \uc7a5\uce58\uac00 \ubc30\uce58\ub418\uc5b4 \uc788\uc73c\uba70, \ub9e4\uc77c \uc544\uce68 8\uc2dc15\ubd84\uc5d0 \uc0ac\ub8cc\uc640 \ubb3c\uc774 \uc804\ub2ec\ub418\uace0 \uc788\ub2e4. \uadf8\ub9ac\uace0 \ud1a0\uc885\ub2ed\uc740 \uc624\ud6c4 15:00-17:00\uc5d0 \uc232 \ud65c\ub3d9\uc7a5\uc5d0 \ud22c\uc785\ub418\uace0, 17: 00-18:30\uc5d0\ub294 \uc74c\uc545\uc774 \uc5f0\uc8fc\ub418\uba70, \ubaa8\uc774\ub294 \ub2ed\uc7a5 \ubc16\uc5d0\uc11c \ubc30\ub2ec\uc744 \ubc1b\uc544 \ud1a0\uc885\ub2ed\uc744 \uae38\ub4e4\uc774\uace0, \ub2ed\uc7a5\uc740 12-15\uc77c \uac04\uaca9\uc73c\ub85c \ub2ed\uc7a5\uc73c\ub85c \ub3cc\ub824\ubcf4\ub0b8\ub2e4; \ub2ed\uc7a5\uc740 \uc790\ub3d9\uc18c\ub3c5\ub41c\ub2e4.c \uc624\ud6c4\uc758 \ubcf4\uad00 \uae30\uac04 \ub3d9\uc548 35\uc77c \uac04\uaca9\uc73c\ub85c \uba78\uade0 \uc2dc\uc2a4\ud15c. 
\ud1a0\uc885\ub2ed \uc0ac\uc721\ubc95\uc740 \uc0ac\uc721 \uc2dc\uac04\uc744 \uc798 \uace0\ub824\ud560 \uc218 \uc788\uae30 \ub54c\ubb38\uc5d0 \ud1a0\uc885\ub2ed\uc758 \uc0ac\uc721 \ubcf8\ub2a5\uc774 \uc798 \ubcf4\uc7a5\ub418\uace0, \ud1a0\uc885\ub2ed\uc758 \uba39\uc774 \uc12d\ucde8\uac00 \uc801\uc808\ud55c \ud3ec\ud68d \uc2dc\uac04\uc744 \ud1b5\ud574 \uc99d\uac00\ud55c\ub2e4; \uc721\uc2dd \uc131\uc7a5\uc774 \uac00\uc18d\ud654\ub418\uace0, \ubc88\uc2dd \uc8fc\uae30\uac00 \uc9e7\uc544\uc9c0\uba70, \ud1a0\uc885\ub2ed\uc758 \uc721\uc9c8\ub3c4 e\uc774\ub2e4.\ub204\uc5d0\uc288\uc5b4\ub2ed\uc758 \uc77c\uc885\uc73c\ub85c, \ubb18\uc0ac\ub41c \ubc29\ubc95\uc740 \ub2e4\uc74c\uacfc \uac19\uc740 \ub2e8\uacc4\ub85c \uad6c\uc131\ub41c\ub2e4: \uff091select\uc120\uc815\uc7ac\ubc30\uc7a5: \uac00\ucd95\uacfc \uac00\uae08\ub958 \uac70\ub798\uc7a5\uc18c\ub85c\ubd80\ud130\uc758 \uc120\ud0dd, \ub3c4\ucd95\uc7a5, \ud654\ud559\uacf5\uc7a5, \uc4f0\ub808\uae30 \ucc98\ub9ac\uc7a5, \uacf5\uae30, \uba3c\uc9c0, \uc218\uc6d0, \uc138\uade0, \uadf8\ub9ac\uace0 \uc232\uc758 \ubb34\uade0 \uc7ac\ubc30\uc774\uc138\uc624\uc5fc, \uc232\uc758 \ub2ec\uc758 \ub36e\uc784\ub3c4\ub294 6065%\uc774\uace0, \ub192\uc740 \ub098\uc77c\ub860 \uadf8\ubb3c\ub9dd \uba74\uc801 1.82.2m\ub294 \ub2ed \ub180\uc774\ud130\uc758 58mu\uc774\uba70, \uc232 \uc18d\uc5d0 \ucc44\uc18c \uc528\uc557\uc744 \ubfcc\ub9ac\ub294 \uac83\uc744 \ubc29\uc1a1\ud55c\ub2e4. 
\uc2e0\uccb4\uc801 \ud2b9\uc9d5\uc774 \ub192\uace0 \uac74\uc870\ud558\uba70 \ubc30\uc218 \ube14\ub85c\uc6b0\ub2e4\uc6b4 \uc870\uac74\uc774 \uc88b\ub2e4, \uadf8\ub9ac\uace0 \ud658\uae30 \ub09c\ubc29\uc740 \ub2ed\uc9d1 \uc7a5\ube44\uc5d0 \uc124\uc815\ub41c\ub2e4, \ub2ed\uc9d1\uc740 \uc790\ub3d9 \uc0b4\uade0 \uc2dc\uc2a4\ud15c\uc744 \uc124\uc815\ud558\uae30 \uc704\ud55c \ub0b4\ubd80\uc774\ub2e4;33selectselect cind;select codelt it's \uad11\ubc94\uc704\ud558\uace0, \uc720\uc5f0\ud558\uac8c \uc791\uc6a9\ud558\uba70, \uc74c\uc2dd\uc744 \ucc3e\ub294 \ud798\uc774 \uac15\ud55c \uc21c\uc218\ud55c \ud1a0\uc885, \uc608\uac10\uc774 \uac15\ud558\ub2e4;select4aary \uad00\ub9ac:\uc784\uc57c\uc758 \ubaa8\ub4e0 \ubba4\ub294 260280\ubc88\uc2dd\uc744 \ud558\uae30\uc5d0 \uc801\ud569\ud55c \uc7a5\uc18c\uc5d0 \ubc30\uce58\ud558\uace0, \ub9e4\uc77c \uc544\uce68 8:0015:\uc0ac\ub8cc\uc640 \ubb3c\uc740 00\ubc88\uc2dd\uc744 \ud560 \ub54c \uc548\uc815\uc801\uc778 \ubc88\uc2dd\uc9c0\ub85c \ud22c\uc785\ud558\uace0, 17:0018:\uc74c\uc545\uc9d1 \uc678\ubd80\uc758 \uc0ac\ub8cc\ub4e4\uc774 30\ubc88 \uc904\uc5d0 \ub4e4\uc5b4\uc11c\uba74 \uc7ac\uc0dd\ub429\ub2c8\ub2e4.\ub2ed\uc758 \uad70\uc9d1 \ubcf5\uadc0\ub294 \uc544\uc774\ub514\ucf00\uc774\uc158\uc73c\ub85c, \ub0ae \uae30\uc628\uc740 \uc6d0\uc8fc 2023\ub3c4, \uc57c\ud589\uc131 \uc628\ub3c4\ub294 2023\ub3c4\ub97c \uc720\uc9c0\ud558\uba70, \u30105\u3011\uc911\uc694\ud55c \uad00\ub9ac:\ub2ed\uc9d1 \uccad\uc18c\ub294 \ubc18\ub4dc\uc2dc \uc2dc\uc791\ud558\uba70, \ub2ed\uc740 3\ub144\ub9c8\ub2e4 \uc624\ud6c4\ub9c8\ub2e4 \ubc88\uc2dd\ud558\uae30\uc5d0 \uc801\ud569\ud55c \uc7a5\uc18c\uc5d0 \ub454\ub2e4.1215\uc77c \uc774\ub0b4\uc5d0\ub294 5\uc77c \uc774\ub0b4 \ub3d9\uc801\uc18c\ub3c5\uc2dc\uc2a4\ud15c\uc774 \uba78\uade0 \ucc98\ub9ac\ub418\uba70, \uc232\uc18d\uc758 \uc8fc\ud1b5\ub85c\ub85c 2~3\uac1c\uc6d4\ub9c8\ub2e4 \ub77c\uc784\ud30c\uc6b0\ub354\uac00 \ud22c\uc785\ub41c\ub2e4.", "from": "nayohan/aihub-en-ko-translation-12m", "category": "full" }, 
"truncated_cells": [] } ]

Loss: CachedMultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
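
As a rough illustration of what these two parameters control, the sketch below computes a multiple-negatives ranking loss in plain Python: each query is scored against every candidate with cosine similarity (`similarity_fct`), the scores are multiplied by `scale`, and cross-entropy is taken with the matching positive as the target. The toy 2-d vectors and the helper names `cos_sim` and `mnr_loss` are assumptions made for this example. The cached variant used for training is understood to compute the same objective while processing the batch in smaller chunks to save memory; that mechanism is not shown here.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(queries, candidates, scale=20.0):
    """Mean cross-entropy where candidates[i] is the correct match for queries[i].

    Every other candidate (other positives and any appended hard negatives)
    acts as a negative for query i.
    """
    total = 0.0
    for i, query in enumerate(queries):
        logits = [scale * cos_sim(query, cand) for cand in candidates]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_z - logits[i]
    return total / len(queries)


# Hypothetical toy embeddings standing in for query / positive / negative texts.
queries = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
hard_negatives = [[0.5, 0.5], [0.6, 0.4]]

loss = mnr_loss(queries, positives + hard_negatives)
print(f"loss = {loss:.4f}")
```

A larger `scale` sharpens the softmax, so a query that is only slightly closer to a negative than to its positive is penalised more heavily.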

Evaluation Dataset

query-to-dataset-viewer-descriptions

Dataset: query-to-dataset-viewer-descriptions at eb9d1be
Size: 1,433 evaluation samples
Columns: query, positive, and negative

Approximate statistics based on the first 1000 samples:

	query	positive	negative
type	string	string	string
details	min: 10 tokens, mean: 11.75 tokens, max: 17 tokens	min: 46 tokens, mean: 2188.82 tokens, max: 8192 tokens	min: 33 tokens, mean: 1769.64 tokens, max: 8192 tokens

Samples:

query	positive	negative
`USER_QUERY: multimodal conversation dataset`	HUB_DATASET_PREVIEW: DATASET_NAME: "BUAADreamer/llava-en-zh-2k" FEATURES: {'messages': [{'role': {'dtype': 'string', 'type': 'Value'}, 'content': {'dtype': 'string', 'type': 'Value'}}], 'images': {'feature': {'type': 'Image'}, 'type': 'Sequence'}} DATA SAMPLE: [ { "row_idx": 0, "row": { "messages": [ { "role": "user", "content": "How many baseball players are visible in the image?" }, { "role": "assistant", "content": "There are three baseball players visible in the image." }, { "role": "user", "content": "What are the players holding in their hands?" }, { "role": "assistant", "content": "The players are holding baseball bats in their hands." }, { "role": "user", "content": "Are the players in a dugout?" }, { "role": "assistant", "content": "Yes, the three baseball players are standing in the dugout." }, { "role": "user", "content": "Is the image in color or black and white?" }, { "role": "assistant", "content": "The image is an old black and white photo of the three baseball players." }, { "role": "user", "content": "Do the players belong to a specific baseball team?" }, { "role": "assistant", "content": "Yes, the players belong to the Boston Red Sox baseball team." } ], "images": [ { "src": "https://datasets-server.huggingface.co/assets/BUAADreamer/llava-en-zh-2k/--/fba994c834822bddd3cd79e929c33135f4289d2b/--/en/train/0/images/image-1d100e9.jpg?Expires=1726591851&Signature=QHGD147HyWamORfSjz0QoG51Ru86g3STPBNDAEOLK7NTq8Y~b4vVt3u~XF9njlRWwNnVF7AQ8-l9f2pCWxggPnZw1wZEfBAC5Q1oOW2CwT-gMYME~I-9qeJrbQtkszer9U0-H5rkECK0DVgWKKIagyjjJBmSSLz2QAb3BlV5aEMGEbJj-4G1X2xzoVaMIfg~ogGSb6~2PAKoLoKMPUZIxrm~2LoPEGVJeXr7JIa8-ISVukshW8ZEDQII~eXfOBeEsTd1Mk5BpJz26dH3W7X2rdhq54KeQ8H9OBt~HlU0XbXigFABeMymb-GBH-fDNHUDmb9Zt3io5hBMOd7I64DLyQ&Key-Pair-Id=K3EI6M078Z3AC3", "height": 308, "width": 384 } ] }, "truncated_cells": [] }, { "row_idx": 1, "row": { "messages": [ { "role": "user", "content": "What is the man wearing in the image?" 
}, { "role": "assistant", "content": "The man in the image is wearing chaps, a cowboy hat, and is dressed like a cowboy." }, { "role": "user", "content": "What is the cowboy doing in the image?" }, { "role": "assistant", "content": "The cowboy is riding on a horse in a grassy field and pointing into the distance, likely directing attention towards a specific location or object." }, { "role": "user", "content": "What kind of landscape surrounds the cowboy and his horse?" }, { "role": "assistant", "content": "The cowboy and his horse are surrounded by a grassy field, which creates a natural and open setting for the scene." }, { "role": "user", "content": "What is the significance of the cowboy attire and its relation to horse riding?" }, { "role": "assistant", "content": "Cowboy attire is deeply rooted in the history, culture, and traditions of the American West. It is both functional and symbolic, representing the hardworking and rugged lifestyle of cowboys who spent much of their time working with livestock and riding horses.\n\nThe key elements of cowboy attire, like the cowboy hat and chaps, serve specific purposes:\n1. Cowboy Hat: A wide-brimmed hat made of stiffened felt or straw is designed to protect the cowboy from the sun, wind, and rain. The hat's distinctive high crown allows for air circulation and provides a touch of style.\n2. Chaps: These are sturdy leather leg coverings worn over pants to protect the cowboy's legs from rough terrain, thorny bushes, and other elements while riding or working with livestock. Chaps provide an additional layer of protection and durability, particularly while riding through rugged landscapes and herding cattle.\n\nIn summary, the cowboy attire seen in the image is not just a fashion statement but is deeply connected to the history and traditions of the American West. 
The functional pieces of clothing, like the cowboy hat and chaps, are designed to protect and support the cowboy during horse riding and working with livestock." } ], "images": [ { "src": "https://datasets-server.huggingface.co/assets/BUAADreamer/llava-en-zh-2k/--/fba994c834822bddd3cd79e929c33135f4289d2b/--/en/train/1/images/image-1d100e9.jpg?Expires=1726591851&Signature=WyNDGZXVbzPOU9iOQSDPFt1MizgmdT-KqdVAG8nIVSK0Gg8OO-qmhKxgIVjyWMHnWyNbW5svuMoukPMyv9hiHMsNh0YmzdjMR9Gwb6mRvsisEAdaLl71Q053MYxEqkZWCB6PbXG5yEazHL4RHvDphsUEhZS-0Yk8Kzx0HHc12HNaJfiO4fO4IPkY3eLw5xLgNoKIcvvO9TDo0JEbc1ej6YkxGUdqXyVrG2Y4zYnhrCM0drgKVzq24cQ9YZ78HW5f-EsXsftbj0ZzEg4SKcuVgrqaKG8SJ~i0aV-OtkXiTCWxW16D4hfsmpXZShZAHesa1EOGprkYdtQG4Kfte12maQ&Key-Pair-Id=K3EI6M078Z3AC3", "height": 288, "width": 384 } ] }, "truncated_cells": [] } ]	NEGATIVE: DATASET_NAME: "passing2961/photochat_plus" FEATURES: {'photo_description': {'dtype': 'string', '_type': 'Value'}, 'trigger_sentences': {'feature': {'dtype': 'string', '_type': 'Value'}, '_type': 'Sequence'}, 'dialogue_id': {'dtype': 'int64', '_type': 'Value'}, 'photo_url': {'dtype': 'string', '_type': 'Value'}, 'dialogue': [{'message': {'dtype': 'string', '_type': 'Value'}, 'share_photo': {'dtype': 'bool', '_type': 'Value'}, 'user_id': {'dtype': 'int64', '_type': 'Value'}}], 'image_descriptions': {'feature': {'dtype': 'string', '_type': 'Value'}, '_type': 'Sequence'}, 'intents': {'feature': {'dtype': 'string', '_type': 'Value'}, '_type': 'Sequence'}, 'salient_information': {'feature': {'dtype': 'string', '_type': 'Value'}, '_type': 'Sequence'}, 'photo_id': {'dtype': 'string', '_type': 'Value'}} DATA SAMPLE: [ { "row_idx": 0, "row": { "photo_description": "The photo has your brother Kannon. Objects in the photo: Man", "trigger_sentences": [ "How is Kannon doing?" 
], "dialogue_id": 500, "photo_url": "https://farm6.staticflickr.com/151/369716968_bde7e83418_o.jpg", "dialogue": [ { "message": "Hello, how have you been, dear friend?", "share_photo": false, "user_id": 1 }, { "message": "Great!", "share_photo": false, "user_id": 0 }, { "message": "Thanks for asking", "share_photo": false, "user_id": 0 }, { "message": "And how have you been?", "share_photo": false, "user_id": 0 }, { "message": "It seems like we haven't talked in forever", "share_photo": false, "user_id": 0 }, { "message": "I have been doing well, keeping busy, spent a lot of time outdoors. What have you been up to?", "share_photo": false, "user_id": 1 }, { "message": "Last night my brother Kannon did a poetry reading", "share_photo": false, "user_id": 0 }, { "message": "Really? How did it go? You know how much I love poetry.", "share_photo": false, "user_id": 1 }, { "message": "It went really well", "share_photo": false, "user_id": 0 }, { "message": "Do you remember my brother Kannon?", "share_photo": false, "user_id": 0 }, { "message": "Absolutely! How could I forget, he left quite an impression", "share_photo": false, "user_id": 1 }, { "message": "How is Kannon doing?", "share_photo": false, "user_id": 1 }, { "message": "", "share_photo": true, "user_id": 0 }, { "message": "Great", "share_photo": false, "user_id": 0 }, { "message": "Here is a photo from last night", "share_photo": false, "user_id": 0 }, { "message": "Wow, he seems so confident in that pic! Wish that I could have been there.", "share_photo": false, "user_id": 1 } ], "image_descriptions": [ "A photo of Kannon", "A picture of Kannon.", "a photo of recent situation" ], "intents": [ "Information Dissemination", "Social Bonding" ], "salient_information": [ "poetry", "How is Kannon doing?", "Kannon doing" ], "photo_id": "train/19e8f436d4b2fc25" }, "truncated_cells": [] }, { "row_idx": 1, "row": { "photo_description": "The photo has your uncle Kieran. 
Objects in the photo: Clothing, Man", "trigger_sentences": [ "guess what new animal he got?", "He's always had goats and chickens, but guess what new animal he got?" ], "dialogue_id": 501, "photo_url": "https://farm8.staticflickr.com/53/189664134_f70fc8947a_o.jpg", "dialogue": [ { "message": "Hey! You remember my uncle who owns the hobby farm, right?", "share_photo": false, "user_id": 0 }, { "message": "Yeah i do", "share_photo": false, "user_id": 1 }, { "message": "Uncle Keiran?", "share_photo": false, "user_id": 0 }, { "message": "How about him?", "share_photo": false, "user_id": 1 }, { "message": "He's always had goats and chickens, but guess what new animal he got?", "share_photo": false, "user_id": 0 }, { "message": "Dog?", "share_photo": false, "user_id": 1 }, { "message": "Nope, a wild hog!", "share_photo": false, "user_id": 0 }, { "message": "And not the motorcycle kind ;)", "share_photo": false, "user_id": 0 }, { "message": "", "share_photo": true, "user_id": 0 }, { "message": "Wow", "share_photo": false, "user_id": 1 } ], "image_descriptions": [ "A photo of the hog's appearance.", "a photo of wild hog", "An image of the new wild hog" ], "intents": [ "Social Bonding", "Visual Clarification" ], "salient_information": [ "hog", "not the motorcycle kind", "wild hog", "a wild hog" ], "photo_id": "train/07d688f5e2142b87" }, "truncated_cells": [] } ]
`USER_QUERY: question answering dataset for technical documentation`	HUB_DATASET_PREVIEW: DATASET_NAME: "hythyt/preguntas-respuestas-RAG" FEATURES: {'DATOS': {'dtype': 'string', '_type': 'Value'}, 'PREGUNTA': {'dtype': 'string', '_type': 'Value'}, 'RESPUESTA': {'dtype': 'string', '_type': 'Value'}} DATA SAMPLE: [ { "row_idx": 0, "row": { "DATOS": "ARISbyhyt Manual de Usuario\nSecci\u00f3n 8\nARISbyhyt es un programa dise\u00f1ado para gestionar proyectos de forma eficiente. Permite crear tareas, asignar recursos y realizar un seguimiento detallado del progreso. Para agregar una nueva tarea, accede al men\u00fa "Proyectos" y selecciona "Nueva Tarea". Completa la informaci\u00f3n requerida, como nombre, fecha de inicio y asignaci\u00f3n de recursos.\n\nSecci\u00f3n 4\nEn ARISbyhyt, puedes personalizar tus informes para obtener datos espec\u00edficos sobre el progreso del proyecto. Accede al men\u00fa "Informes", elige la opci\u00f3n "Personalizar Informe" y selecciona los campos que deseas incluir, como el estado de las tareas o el tiempo dedicado a cada fase.", "PREGUNTA": "\u00bfC\u00f3mo puedo crear una nueva tarea en ARISbyhyt?", "RESPUESTA": "Para agregar una nueva tarea en ARISbyhyt, sigue estos pasos:\n1. Ve al men\u00fa "Proyectos".\n2. Selecciona "Nueva Tarea".\n3. Completa la informaci\u00f3n solicitada, como nombre, fecha de inicio y asignaci\u00f3n de recursos." }, "truncated_cells": [] }, { "row_idx": 1, "row": { "DATOS": "ARISbyhyt Manual de Usuario \nSecci\u00f3n 2\nEn ARISbyhyt, puedes personalizar tus informes para obtener datos espec\u00edficos sobre el progreso del proyecto. 
Accede al men\u00fa "Informes", elige la opci\u00f3n "Personalizar Informe" y selecciona los campos que deseas incluir, como el estado de las tareas o el tiempo dedicado a cada fase.", "PREGUNTA": "\u00bfC\u00f3mo puedo personalizar un informe en ARISbyhyt para obtener datos espec\u00edficos sobre el progreso del proyecto?", "RESPUESTA": "Para personalizar un informe en ARISbyhyt, sigue estos pasos:\n1. Dir\u00edgete al men\u00fa "Informes".\n2. Selecciona "Personalizar Informe".\n3. Elige los campos que deseas incluir, como el estado de las tareas o el tiempo dedicado a cada fase." }, "truncated_cells": [] } ]	NEGATIVE: DATASET_NAME: "cmalaviya/expertqa" FEATURES: {'example_id': {'dtype': 'int64', '_type': 'Value'}, 'context': {'dtype': 'string', '_type': 'Value'}, 'question': {'dtype': 'string', '_type': 'Value'}, 'answer': {'dtype': 'string', '_type': 'Value'}} DATA SAMPLE: [ { "row_idx": 0, "row": { "example_id": 0, "context": "", "question": "Some customers of mine are not paying their debts on time. Do I have to keep all my customers?", "answer": "You don't necessarily have to keep all your customers, especially if they consistently fail to pay their debts on time. There are different types of non-paying customers, such as cash-strapped, purposefully late, and non-payer by nature . It is essential to maintain a positive attitude and treat your customers with respect while trying to collect their debts . However, if you consistently face issues with particular customers not paying their debts, you may opt to discontinue providing services or products to them and focus on other reliable customers. You may need to consult a professional debt collector or a business attorney in such cases to decide the appropriate next steps in debt collections . 
To prevent nonpayment issues in the future, you can implement various strategies, such as researching new prospects, being clear with your payment policies, and setting up contracts detailing payment expectations and late fees ." }, "truncated_cells": [] }, { "row_idx": 1, "row": { "example_id": 1, "context": "", "question": "When accounts are faced with ethical dilemmas that often bring their integrity into question, the question is whether they are equipped enough tp deal with those?", "answer": " The context provided does not give specific information on whether accountants are adequately equipped to handle ethical dilemmas that could question their integrity. The text does suggest, however, that when faced with an ethical dilemma, one must question the situation honestly and transparently. And, if doubts persist, they have the obligation to raise these questions with those in authority . This suggests the need for a strong understanding of ethics to navigate such situations. The text also implies a correlation between integrity and ethics stating, "Integrity can be measured by ethics" . In a broader perspective, the text suggests that professionals, like nurses for example, often face dilemmas uncommon to the general populace . Due to the rapid advancement in medical technology, the study of ethics has become increasingly relevant, indicating that equipping professionals with adequate knowledge in ethics is necessary to navigate the demands and challenges of their roles effectively . Furthermore, it shows that managers grapple with ethical decisions involving questions of morality and integrity especially in situations where prior decisions by other managers create ethical dilemmas . 
While this analysis provides general insights on the significance of ethical decision-making and the need for professionals to navigate ethical dilemmas effectively, it does not provide a specific commentary on the readiness or the adequacy of training or framework available to accountants to deal with such scenarios. Hence, it is not possible to definitively answer the question based on the context provided. In South Africa SAICA has equipped accountants with code of professional conduct that they should follow when faced with ethical dilemmas. the code gives them guidance on how to deal with those. SAICA code of professional conduct https://www.misti.com/internal-audit-insights/ethics-and-the-internal-auditor Ethics and the Internal Auditor freely express these thoughts and ideas, the culture may be sending the wrong message. When you are personally faced with an ethical dilemma, you must ask yourself whether you are looking at the situation as honestly and transparently as you can. If questions still arise, it is your obligation to raise those questions to individuals in positions of responsibility. Integrity can be measured by ethics If someone had you name the top three people in history that you felt displayed unquestionable integrity, would those same individuals measure high on the ethics scale? Most likely they would. Integrity is adherence to https://misti.com/internal-audit-insights/ethics-and-the-internal-auditor Ethics and the Internal Auditor thoughts and ideas, the culture may be sending the wrong message. When you are personally faced with an ethical dilemma, you must ask yourself whether you are looking at the situation as honestly and transparently as you can. If questions still arise, it is your obligation to raise those questions to individuals in positions of responsibility. 
Integrity can be measured by ethics If someone had you name the top three people in history that you felt displayed unquestionable integrity, would those same individuals measure high on the ethics scale? Most likely they would. Integrity is adherence to a moral code, https://www.misti.co.uk/internal-audit-insights/ethics-and-the-internal-auditor Ethics and the Internal Auditor wrong message. When you are personally faced with an ethical dilemma, you must ask yourself whether you are looking at the situation as honestly and transparently as you can. If questions still arise, it is your obligation to raise those questions to individuals in positions of responsibility. Integrity can be measured by ethics If someone had you name the top 3 people in history that you felt displayed unquestionable integrity, would those same individuals measure high on the ethics scale? Most likely they would. Integrity is adherence to a moral code, reflected in honesty and harmony in what one thinks, SAICA equip accountants with all the relevant information in order to be able to identify ethical dilemmas https://www.misti.com/internal-audit-insights/ethics-and-the-internal-auditor Ethics and the Internal Auditor freely express these thoughts and ideas, the culture may be sending the wrong message. When you are personally faced with an ethical dilemma, you must ask yourself whether you are looking at the situation as honestly and transparently as you can. If questions still arise, it is your obligation to raise those questions to individuals in positions of responsibility. Integrity can be measured by ethics If someone had you name the top three people in history that you felt displayed unquestionable integrity, would those same individuals measure high on the ethics scale? Most likely they would. 
Integrity is adherence to https://misti.com/internal-audit-insights/ethics-and-the-internal-auditor Ethics and the Internal Auditor thoughts and ideas, the culture may be sending the wrong message. When you are personally faced with an ethical dilemma, you must ask yourself whether you are looking at the situation as honestly and transparently as you can. If questions still arise, it is your obligation to raise those questions to individuals in positions of responsibility. Integrity can be measured by ethics If someone had you name the top three people in history that you felt displayed unquestionable integrity, would those same individuals measure high on the ethics scale? Most likely they would. Integrity is adherence to a moral code, https://www.misti.co.uk/internal-audit-insights/ethics-and-the-internal-auditor Ethics and the Internal Auditor wrong message. When you are personally faced with an ethical dilemma, you must ask yourself whether you are looking at the situation as honestly and transparently as you can. If questions still arise, it is your obligation to raise those questions to individuals in positions of responsibility. Integrity can be measured by ethics If someone had you name the top 3 people in history that you felt displayed unquestionable integrity, would those same individuals measure high on the ethics scale? Most likely they would. Integrity is adherence to a moral code, reflected in honesty and harmony in what one thinks, https://www.bartleby.com/essay/The-Ethical-Dilemma-Of-A-Family-Nurse-F3H66JS4CPLLX The Ethical Dilemma Of A Family Nurse Practitioner
`USER_QUERY: conversational ai training dataset`	HUB_DATASET_PREVIEW: DATASET_NAME: "dataautogpt3/flan1m-alpaca-uncensored" FEATURES: {'instruction': {'dtype': 'string', '_type': 'Value'}, 'input': {'dtype': 'string', '_type': 'Value'}, 'output': {'dtype': 'string', '_type': 'Value'}} DATA SAMPLE: [ { "row_idx": 0, "row": { "instruction": "You are a helpful assistant, who always provide explanation. Think like you are answering to a five year old.", "input": "Continue the following story.\n\nEmily held and rocked Kayla as they sobbed because \nSelect from the following.\n +Emily really needed a good friend.\n +Kayla really needed a good friend.", "output": "Once upon a time, Emily held and rocked Kayla as they both sobbed because Kayla really needed a good friend. You see, little Kayla was feeling very lonely and sad because she didn't have anyone to play with or talk to.\n\nEmily wanted to help Kayla and be her friend, so she did something truly special. Emily decided to throw a magical friendship party for Kayla. Emily knew that this would cheer her up and help her make some new friends!\n\nEmily worked really hard, preparing for the big day. She bought balloons, baked cookies, and even created a treasure hunt for everyone to enjoy. She invited lots of other kids from their neighborhood, so Kayla could meet them all and find a friend to play with.\n\nOn the day of the party, the sun was shining, and the sky was filled with fluffy clouds. All the kids in the neighborhood were excited to come to the party and play with Kayla.\n\nEmily and Kayla welcomed their guests with huge smiles and warm hugs. The kids played games, laughed, and explored the treasure hunt that Emily had set up all around the backyard.\n\nSoon, Kayla was surrounded by new friends, chatting and laughing together. She began to feel so happy and loved. 
As the sun began to set and the party ended, Kayla felt like she finally had found the friends she needed.\n\nThat night, as Emily tucked Kayla into bed, Kayla thanked her for being such an amazing friend and throwing the best party ever. Emily smiled and said, "That's what friends are for, to help each other when we're feeling sad and lonely."\n\nAnd from that day on, Emily and Kayla were the best of friends and shared many more magical adventures together. Now, Kayla knew that she always had a good friend by her side." }, "truncated_cells": [] }, { "row_idx": 1, "row": { "instruction": "You are an AI assistant that helps people find information. User will you give you a question. Your task is to answer as faithfully as you can. While answering think step-bystep and justify your answer.", "input": "For this chain-of-thought reasoning and answer, what was the question?\nA boy playing ball is not necessarily not very good at playing ball. There is no way to know what level of skill the boy has.\n A: it is not possible to tell", "output": "The question likely was:\n\n"Does a boy playing ball automatically mean he is not very good at playing ball?"" }, "truncated_cells": [] } ]	NEGATIVE: DATASET_NAME: "sgp-bench/open-instruct-gpt4o_40k" FEATURES: {'instruction': {'dtype': 'string', '_type': 'Value'}, 'input': {'dtype': 'string', '_type': 'Value'}, 'output': {'dtype': 'string', '_type': 'Value'}} DATA SAMPLE: [ { "row_idx": 0, "row": { "instruction": "Can you tell me how cookies work for tracking user data?", "input": "", "output": "Sure! Cookies are small pieces of data that websites can store on your computer, allowing them to track your activities and improve the user experience. For example, cookies can remember your preferences, keep track of your shopping history, or save information you enter into forms. 
They\u2019re typically stored in a file on your hard drive, and you can configure your browser to block or delete cookies, or\n" }, "truncated_cells": [] }, { "row_idx": 1, "row": { "instruction": "Given the sentence "White fluffy dog hangs out of the window of a turquoise building." is it true that "The dog is indoors."? Let's be accurate as possible and think first.", "input": "", "output": "A dog being in a turquoise building is the same as a dog being indoors. Final answer: yes." }, "truncated_cells": [] } ]

Loss: CachedMultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
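To make the two parameters concrete: each query embedding is compared with every candidate embedding in the batch using cosine similarity (`similarity_fct`), the similarities are multiplied by `scale`, and a cross-entropy is taken with the query's own positive as the target. The following is a minimal pure-Python sketch of that computation on toy 2-D vectors. The function names `cos_sim` and `mnr_loss` and the toy vectors are illustrative assumptions, not the library's API, and the gradient-caching part of the cached variant (which concerns memory use) is not modelled here.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(queries, candidates, scale=20.0):
    """Mean cross-entropy over scaled similarities.

    candidates[i] is the positive for queries[i]; every other candidate
    (other positives and any explicit negatives) acts as a negative.
    """
    total = 0.0
    for i, query in enumerate(queries):
        logits = [scale * cos_sim(query, cand) for cand in candidates]
        top = max(logits)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)

# Toy batch: two queries, their two positives, plus one explicit negative.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.1], [0.1, 1.0], [-1.0, -1.0]]
swapped = [[0.1, 1.0], [1.0, 0.1], [-1.0, -1.0]]  # positives in the wrong order

loss_aligned = mnr_loss(queries, aligned)
loss_swapped = mnr_loss(queries, swapped)
print(loss_aligned, loss_swapped)
```

With a scale of 20.0 the softmax is sharp, so the loss is close to zero when each query is nearest to its own positive and large when it is not.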

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 4
per_device_eval_batch_size: 4
num_train_epochs: 200
warmup_ratio: 0.1
bf16: True
load_best_model_at_end: True
batch_sampler: no_duplicates
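The `batch_sampler: no_duplicates` setting is relevant for an in-batch-negatives loss: if the same text appeared twice in one batch, it would be treated as a negative for a query it actually matches. The sketch below illustrates the idea with a greedy grouping of (query, positive, negative) rows. It is an assumption-laden illustration, not the library's sampler: the function name `no_duplicate_batches` and the example rows are hypothetical, and the real sampler may shuffle or handle leftovers differently.

```python
def no_duplicate_batches(rows, batch_size):
    """Greedily group rows into batches in which no text occurs twice.

    `rows` is a list of tuples of strings, e.g. (query, positive, negative).
    A row that would repeat a text already in the batch, or that arrives
    when the batch is full, is deferred to a later batch.
    """
    remaining = list(rows)
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for row in remaining:
            texts = set(row)
            if len(batch) < batch_size and seen.isdisjoint(texts):
                batch.append(row)
                seen |= texts
            else:
                deferred.append(row)
        batches.append(batch)  # the first remaining row always fits, so this terminates
        remaining = deferred
    return batches

# Hypothetical rows: the first two share the same positive text "a".
rows = [
    ("q1", "a", "x"),
    ("q2", "a", "y"),
    ("q3", "b", "z"),
    ("q4", "c", "w"),
    ("q5", "d", "v"),
    ("q6", "e", "u"),
]
batches = no_duplicate_batches(rows, batch_size=4)
for batch in batches:
    print([row[0] for row in batch])
```

Here the batch size of 4 matches `per_device_train_batch_size`; the row sharing the positive "a" is pushed into the second batch.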

All Hyperparameters


overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 4
per_device_eval_batch_size: 4
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 200
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
eval_use_gather_object: False
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional
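Taken together, `learning_rate: 5e-05`, `lr_scheduler_type: linear` and `warmup_ratio: 0.1` describe a schedule that rises linearly from zero to the peak rate over the first tenth of the optimizer steps and then decays linearly back to zero. The sketch below computes that curve in plain Python. The function name `linear_schedule_lr` is illustrative, and the figure of 286 steps per epoch is an estimate inferred from the training log (step 100 is reported at epoch 0.3497), not a value stated in this card; the library's own rounding of the warmup length may differ slightly.

```python
def linear_schedule_lr(step, total_steps, warmup_steps, peak_lr=5e-05):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)

# Assumption: about 286 optimizer steps per epoch (100 / 0.3497, rounded).
STEPS_PER_EPOCH = 286
NUM_TRAIN_EPOCHS = 200            # from the hyperparameters above
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_TRAIN_EPOCHS
WARMUP_STEPS = TOTAL_STEPS // 10  # warmup_ratio 0.1, integer arithmetic

for step in (0, 100, 2500, WARMUP_STEPS, TOTAL_STEPS):
    print(step, linear_schedule_lr(step, TOTAL_STEPS, WARMUP_STEPS))
```

Under these assumptions the warmup would last several thousand steps, so the early steps shown in the training log would all run below the peak learning rate.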

Training Logs

Epoch	Step	Training Loss	Validation Loss	max_accuracy
0	0	-	-	0.5
0.3497	100	1.0509	0.7070	-
0.6993	200	0.6183	0.3396	-
1.0490	300	0.3746	0.2282	-
1.3986	400	0.2481	0.1616	-
1.7483	500	0.2198	0.1302	-
2.0979	600	0.166	0.1164	-
2.4476	700	0.1045	0.1174	-
2.7972	800	0.0797	0.1095	-
3.1469	900	0.0422	0.1176	-
3.4965	1000	0.0595	0.1115	-
3.8462	1100	0.0416	0.1008	-
4.1958	1200	0.0174	0.1233	-
4.5455	1300	0.0273	0.1032	-
4.8951	1400	0.0389	0.0990	-
5.2448	1500	0.0126	0.0963	-
5.5944	1600	0.0074	0.1193	-
5.9441	1700	0.0165	0.1379	-
6.2937	1800	0.0046	0.1127	-
6.6434	1900	0.0158	0.1289	-
6.9930	2000	0.0157	0.1009	-
7.3427	2100	0.0032	0.1075	-
7.6923	2200	0.0072	0.1289	-
8.0420	2300	0.0192	0.1176	-
8.3916	2400	0.001	0.1214	-
8.7413	2500	0.024	0.1320	1.0

The bold row denotes the saved checkpoint.
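Since `load_best_model_at_end` is `True`, it can be useful to check which logged step had the lowest loss in the fourth column of the table. The sketch below copies the logged rows (without the step 0 row, which has no loss values, and without the `max_accuracy` column) and picks the minimum. It assumes that the fourth column is the evaluation loss and that this loss is the metric used to choose the best model; the card does not state either explicitly, and whether a checkpoint was actually saved at that step is also not stated.

```python
# Columns: epoch, step, training loss, loss on the evaluation split (assumed).
LOG = """\
0.3497 100 1.0509 0.7070
0.6993 200 0.6183 0.3396
1.0490 300 0.3746 0.2282
1.3986 400 0.2481 0.1616
1.7483 500 0.2198 0.1302
2.0979 600 0.166 0.1164
2.4476 700 0.1045 0.1174
2.7972 800 0.0797 0.1095
3.1469 900 0.0422 0.1176
3.4965 1000 0.0595 0.1115
3.8462 1100 0.0416 0.1008
4.1958 1200 0.0174 0.1233
4.5455 1300 0.0273 0.1032
4.8951 1400 0.0389 0.0990
5.2448 1500 0.0126 0.0963
5.5944 1600 0.0074 0.1193
5.9441 1700 0.0165 0.1379
6.2937 1800 0.0046 0.1127
6.6434 1900 0.0158 0.1289
6.9930 2000 0.0157 0.1009
7.3427 2100 0.0032 0.1075
7.6923 2200 0.0072 0.1289
8.0420 2300 0.0192 0.1176
8.3916 2400 0.001 0.1214
8.7413 2500 0.024 0.1320
"""

def parse_log(text):
    """Turn whitespace-separated log rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        epoch, step, train_loss, eval_loss = line.split()
        rows.append({
            "epoch": float(epoch),
            "step": int(step),
            "train_loss": float(train_loss),
            "eval_loss": float(eval_loss),
        })
    return rows

rows = parse_log(LOG)
best = min(rows, key=lambda row: row["eval_loss"])
print(best["step"], best["epoch"], best["eval_loss"])
```

For these rows the minimum is 0.0963 at step 1500. After that point the training loss keeps shrinking while the fourth-column loss moves between roughly 0.10 and 0.14, a pattern that usually indicates overfitting.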

Framework Versions

Python: 3.10.12
Sentence Transformers: 3.1.0
Transformers: 4.44.2
PyTorch: 2.4.0+cu121
Accelerate: 0.34.2
Datasets: 3.0.0
Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}