Counterfactual Augmentation for Robust Authorship Representation Learning
ERLAS is the official repository for the paper "Counterfactual Augmentation for Robust Authorship Representation Learning". The framework generates style-counterfactual examples by retrieving, for each text, the most content-similar texts written by different authors on the same topics/domains.
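The retrieval step can be sketched roughly as follows (a minimal illustration, not the paper's exact pipeline: the content embeddings, author labels, and the retrieve_counterfactual helper below are placeholders):

import numpy as np

def retrieve_counterfactual(content_embs, authors, anchor_idx):
    # content_embs: (num_texts, dim) content embeddings from any sentence encoder.
    # authors: list of author ids, one per text.
    anchor = content_embs[anchor_idx]
    # Cosine similarity between the anchor text and every candidate text.
    sims = content_embs @ anchor / (
        np.linalg.norm(content_embs, axis=1) * np.linalg.norm(anchor) + 1e-8
    )
    # Exclude texts by the anchor's author, so the retrieved example shares
    # content/topic but differs in authorial style.
    sims[np.array(authors) == authors[anchor_idx]] = -np.inf
    return int(np.argmax(sims))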
Installation:
git clone https://github.com/hieum98/Counterfactual-Augmentation-for-Robust-Authorship-Representation-Learning.git
cd Counterfactual-Augmentation-for-Robust-Authorship-Representation-Learning
pip install -r requirements.txt
pip install -e .
Usage:
from ERLAS.model.erlas import ERLAS
from transformers import AutoTokenizer
model = ERLAS.from_pretrained('Hieuman/erlas')
tokenizer = AutoTokenizer.from_pretrained('Hieuman/erlas')
# Build a toy batch: batch_size episodes, each containing episode_length texts.
batch_size = 3
episode_length = 16
text = [
    ["Foo"] * episode_length,
    ["Bar"] * episode_length,
    ["Zoo"] * episode_length,
]
# Flatten the nested list so the tokenizer receives a single list of strings.
text = [j for i in text for j in i]
tokenized_text = tokenizer(
    text,
    max_length=32,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)
# Reshape inputs to (batch_size, 1, episode_length, max_token_length).
tokenized_text["input_ids"] = tokenized_text["input_ids"].reshape(batch_size, 1, episode_length, -1)
tokenized_text["attention_mask"] = tokenized_text["attention_mask"].reshape(batch_size, 1, episode_length, -1)
author_reps, _ = model(tokenized_text['input_ids'], tokenized_text['attention_mask'])
author_reps = author_reps.squeeze(1) # [bs, hidden_size]
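The author embeddings can then be compared directly; for example (a small follow-up sketch, not part of the original snippet), cosine similarity between episodes gives an authorship-similarity score:

import torch.nn.functional as F
# Pairwise cosine similarity between the three author embeddings: [bs, bs].
sim_matrix = F.cosine_similarity(author_reps.unsqueeze(1), author_reps.unsqueeze(0), dim=-1)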
Citation
@inproceedings{10.1145/3626772.3657956,
author = {Man, Hieu and Huu Nguyen, Thien},
title = {Counterfactual Augmentation for Robust Authorship Representation Learning},
year = {2024},
isbn = {9798400704314},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3626772.3657956},
doi = {10.1145/3626772.3657956},
pages = {2347–2351},
numpages = {5},
keywords = {authorship attribution, counterfactual learning, domain generalization},
location = {Washington DC, USA},
series = {SIGIR '24}
}
This model has been pushed to the Hub using the PytorchModelHubMixin integration.