Patent SDG Classifier
This is the classification model trained on the "silver" dataset built as described in the paper "From scratch to silver: Creating trustworthy training data for patent-SDG classification using Large Language Models".
The model is a multi-label classifier built on mpi-inno-comp/pat_specter, designed to classify patent texts against the 17 United Nations Sustainable Development Goals (SDGs).
The model was trained on automatically generated soft label vectors rather than traditional binary human-annotated labels. "Soft" here means that each vector holds normalized, weakly supervised annotation frequencies, as described in the paper; these act as normalized importance scores for each SDG per patent.
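To make the idea of a soft target concrete, the sketch below turns per-SDG annotation counts into frequencies in [0, 1]. It assumes, hypothetically, that each patent received several independent weak-supervision annotation passes and that "normalized" means dividing each SDG's count by the number of passes; the paper's exact procedure may differ. `soft_label_vector`, `n_passes`, and the input format are illustrative names, not part of the released model.

```python
from collections import Counter

SDG_LABELS = [f"sdg_{i}" for i in range(1, 18)]  # sdg_1 ... sdg_17


def soft_label_vector(annotations, n_passes):
    """Build a 17-dim soft target from weak-supervision annotations.

    annotations: list of SDG label lists, one per annotation pass
                 (hypothetical input format).
    n_passes:    number of passes used for normalization
                 (assumed normalization; the paper may differ).
    """
    if n_passes <= 0:
        raise ValueError("n_passes must be positive")
    # Count each SDG at most once per annotation pass
    counts = Counter(label for labels in annotations for label in set(labels))
    # Frequency = share of passes that assigned the SDG
    return [counts[label] / n_passes for label in SDG_LABELS]


passes = [["sdg_14", "sdg_9"], ["sdg_14"], ["sdg_14", "sdg_6"], ["sdg_9"]]
vec = soft_label_vector(passes, n_passes=len(passes))
print(dict(zip(SDG_LABELS, vec))["sdg_14"])  # 0.75
```

Unlike a one-hot or multi-hot label, such a vector keeps information about how consistently each SDG was assigned, which a BCE-style loss can use directly as a target.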
A custom binary cross-entropy (BCE) loss function was used, incorporating epsilon smoothing (0.02) to prevent overconfident predictions and entropy regularization (lambda = 0.0005) to encourage a broader probability distribution. Square-root inverse class weights were applied to address class imbalance.
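The loss described above can be sketched as follows. The card gives only the ingredients and their constants, so the exact formulas here are assumptions: smoothing is taken as `y * (1 - eps) + eps / 2`, the entropy term is the mean binary entropy of the predicted probabilities (subtracted, so broader outputs lower the loss), and class frequencies for the `1 / sqrt(freq)` weights are taken as the mean soft target per class, rescaled so the weights average to 1. The function names are hypothetical, and plain Python is used only to keep the arithmetic explicit; real training would use tensor operations (e.g. in PyTorch).

```python
import math

EPS = 0.02       # epsilon smoothing, value from the model card
LAMBDA = 0.0005  # entropy regularization weight, value from the model card


def sqrt_inverse_class_weights(targets):
    """Per-class weights w_c = 1 / sqrt(freq_c), rescaled to mean 1.

    freq_c is assumed to be the mean soft target of class c.
    """
    n, n_classes = len(targets), len(targets[0])
    freqs = [sum(row[c] for row in targets) / n for c in range(n_classes)]
    raw = [1.0 / math.sqrt(max(f, 1e-8)) for f in freqs]  # guard empty classes
    mean_raw = sum(raw) / n_classes
    return [w / mean_raw for w in raw]


def _sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def smoothed_weighted_bce(logits, targets, class_weights, eps=EPS, lam=LAMBDA):
    """Mean class-weighted BCE on smoothed soft targets, minus lam * mean entropy."""
    total_bce = total_entropy = 0.0
    count = 0
    for row_logits, row_targets in zip(logits, targets):
        for z, y, w in zip(row_logits, row_targets, class_weights):
            # Epsilon smoothing pulls targets away from hard 0/1 values
            y_s = y * (1.0 - eps) + 0.5 * eps
            p = min(max(_sigmoid(z), 1e-7), 1.0 - 1e-7)  # avoid log(0)
            total_bce += -w * (y_s * math.log(p) + (1.0 - y_s) * math.log(1.0 - p))
            # Binary entropy of the prediction; subtracting it rewards
            # less overconfident (broader) outputs
            total_entropy += -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))
            count += 1
    return total_bce / count - lam * total_entropy / count


targets = [[1.0, 0.0], [0.5, 0.0], [1.0, 0.5]]
weights = sqrt_inverse_class_weights(targets)
logits = [[2.0, -2.0], [0.0, -2.0], [2.0, 0.0]]
print(weights)
print(smoothed_weighted_bce(logits, targets, weights))
```

Because the weights scale with `1 / sqrt(freq)` rather than `1 / freq`, rare SDGs are up-weighted less aggressively, which tends to keep the weights from exploding for very rare classes.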
Training used a learning rate of 2e-5, a per-device batch size of 128, 20 epochs, and a weight decay of 0.01. Early stopping was applied with a patience of 3, monitoring the macro-averaged F1 score (f1_macro).
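The sketch below collects these hyperparameters and shows one way macro-F1 early stopping with a patience of 3 can work. It assumes evaluation once per epoch and binarization of both soft targets and predictions at 0.5 for computing F1; neither detail is stated in the card. `TRAINING_CONFIG`, `f1_macro`, and `EarlyStopping` are illustrative names. If training used the Hugging Face `Trainer` (not stated here), these settings would correspond to `TrainingArguments` fields such as `learning_rate`, `per_device_train_batch_size`, `num_train_epochs`, `weight_decay`, and `metric_for_best_model`, plus `EarlyStoppingCallback(early_stopping_patience=3)`.

```python
# Hyperparameters as listed in the model card
TRAINING_CONFIG = {
    "learning_rate": 2e-5,
    "per_device_batch_size": 128,
    "num_epochs": 20,
    "weight_decay": 0.01,
    "early_stopping_patience": 3,
    "metric_for_best_model": "f1_macro",
}


def f1_macro(y_true, y_prob, threshold=0.5):
    """Unweighted mean of per-class F1 after binarizing at `threshold`.

    Binarizing soft targets and predictions at 0.5 is an assumption.
    A class with no positives and no predictions contributes F1 = 0.
    """
    n_classes = len(y_true[0])
    f1s = []
    for c in range(n_classes):
        tp = fp = fn = 0
        for t_row, p_row in zip(y_true, y_prob):
            t = t_row[c] >= threshold
            p = p_row[c] >= threshold
            tp += int(t and p)
            fp += int((not t) and p)
            fn += int(t and (not p))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / n_classes


class EarlyStopping:
    """Stop when the monitored metric has not improved for `patience` evaluations."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("-inf")
        self.bad_evals = 0

    def step(self, metric):
        """Record one evaluation; return True if training should stop."""
        if metric > self.best:
            self.best = metric
            self.bad_evals = 0
        else:
            self.bad_evals += 1  # ties count as "no improvement"
        return self.bad_evals >= self.patience


stopper = EarlyStopping(patience=TRAINING_CONFIG["early_stopping_patience"])
history = [0.40, 0.45, 0.44, 0.45, 0.43]  # illustrative f1_macro values
for epoch, score in enumerate(history, start=1):
    if stopper.step(score):
        print(f"stopping after epoch {epoch}, best f1_macro={stopper.best}")
        break
# prints: stopping after epoch 5, best f1_macro=0.45
```

Monitoring macro rather than micro F1 weights every SDG equally, so the stopping point is not dominated by the most frequent goals.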
Usage
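from transformers import pipeline
# Load the fine-tuned multi-label SDG classifier from the Hugging Face Hub
pipe = pipeline("text-classification",
                model="graziasveva93/patent_sdg_classifier")
patent_text = "..."  # placeholder: replace with the patent text to classify
# top_k=None returns scores for all 17 SDG labels instead of only the top one
inf = pipe(patent_text, top_k=None)
print(inf)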
[{'label': 'sdg_14', 'score': 0.47907012701034546},
{'label': 'sdg_9', 'score': 0.31178590655326843},
{'label': 'sdg_3', 'score': 0.108424112200737},
{'label': 'sdg_7', 'score': 0.08894453942775726},
{'label': 'sdg_6', 'score': 0.05447851121425629},
{'label': 'sdg_12', 'score': 0.04002637416124344},
{'label': 'sdg_15', 'score': 0.021443745121359825},
{'label': 'sdg_17', 'score': 0.01988437958061695},
{'label': 'sdg_2', 'score': 0.019452063366770744},
{'label': 'sdg_10', 'score': 0.01940927840769291},
{'label': 'sdg_13', 'score': 0.018470678478479385},
{'label': 'sdg_11', 'score': 0.01597837172448635},
{'label': 'sdg_5', 'score': 0.01508121658116579},
{'label': 'sdg_16', 'score': 0.014975948259234428},
{'label': 'sdg_4', 'score': 0.01400495134294033},
{'label': 'sdg_8', 'score': 0.011000115424394608},
{'label': 'sdg_1', 'score': 0.009290466085076332}]
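The scores above are independent per-label values (they do not sum to 1), so turning them into a set of SDGs requires a decision threshold. The card does not specify one; the 0.1 used below is a hypothetical choice, and `to_vector` and `select_sdgs` are illustrative helpers that work on the list of `{'label', 'score'}` dicts returned by the pipeline.

```python
SDG_LABELS = [f"sdg_{i}" for i in range(1, 18)]  # sdg_1 ... sdg_17


def to_vector(predictions):
    """Reorder pipeline output into a 17-dim score list (sdg_1 ... sdg_17)."""
    scores = {item["label"]: item["score"] for item in predictions}
    return [scores[label] for label in SDG_LABELS]


def select_sdgs(predictions, threshold=0.1):
    """Return (label, score) pairs with score >= threshold, highest first.

    The default threshold is hypothetical, not a recommended value.
    """
    kept = [(p["label"], p["score"]) for p in predictions if p["score"] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Rounded scores from the example output above (first four labels only)
example = [
    {"label": "sdg_14", "score": 0.479},
    {"label": "sdg_9", "score": 0.312},
    {"label": "sdg_3", "score": 0.108},
    {"label": "sdg_7", "score": 0.089},
]
print(select_sdgs(example))  # [('sdg_14', 0.479), ('sdg_9', 0.312), ('sdg_3', 0.108)]
```

A fixed-order vector from `to_vector` is convenient when storing predictions for many patents in a table or comparing them against soft reference labels.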
Base model: mpi-inno-comp/pat_specter