facebook/audioseal

reach-vb committed on Mar 11, 2024

Commit

d16f1d8

1 Parent(s): 20f3f5a

Create README.md (#1)

- Create README.md (f82cc8665fef399bbe5de542d6ff78ae4cf5ae7c)

Files changed (1)

README.md +89 -0

README.md ADDED

	@@ -0,0 +1,89 @@

+---
+tags:
+- audioseal
+inference: false
+---
+# AudioSeal
+We introduce AudioSeal, a method for localized speech watermarking with state-of-the-art robustness and detector speed. It jointly trains a generator that embeds a watermark in the audio and a detector that finds the watermarked fragments in longer audio, even in the presence of editing.
+AudioSeal achieves state-of-the-art detection performance on both natural and synthetic speech at the sample level (1/16k second resolution). It alters signal quality only slightly and is robust to many types of audio editing.
+AudioSeal is designed with a fast, single-pass detector that significantly surpasses existing models in speed, achieving detection up to two orders of magnitude faster, which makes it well suited to large-scale and real-time applications.
+# :mate: Installation
+AudioSeal requires Python >= 3.8, PyTorch >= 1.13.0, [omegaconf](https://omegaconf.readthedocs.io/), [julius](https://pypi.org/project/julius/), and numpy. To install from PyPI:
+```
+pip install audioseal
+```
+To install from source, clone this repo and install in editable mode:
+```
+git clone https://github.com/facebookresearch/audioseal
+cd audioseal
+pip install -e .
+```
+# :gear: Models
+We provide the checkpoints for the following models:
+- AudioSeal Generator.
+  It takes an audio signal (as a waveform) as input and outputs a watermark of the same size as the input, which can be added to the input to watermark it.
+  Optionally, it can also take a secret 16-bit message as input, which will be encoded in the watermark.
+- AudioSeal Detector.
+  It takes an audio signal (as a waveform) as input and outputs, for each sample of the audio (every 1/16k s), the probability that the input contains a watermark.
+  Optionally, it may also output the secret message encoded in the watermark.
+Note that the message is optional and has no influence on the detection output. It may be used, for instance, to identify a model version (up to $2^{16}=65536$ possible choices).
+**Note**: We are working to release the training code for anyone who wants to build their own watermarker. Stay tuned!
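
As an illustration of the 16-bit message described above, the following sketch packs an integer identifier (for example a model version) into a list of 16 bits and recovers it again. The helper names `id_to_bits` and `bits_to_id` and the most-significant-bit-first ordering are assumptions made for this example; they are not part of the AudioSeal API, and the README does not prescribe a bit order.

```python
from typing import Iterable, List

NBITS = 16  # message length stated in the README


def id_to_bits(identifier: int, nbits: int = NBITS) -> List[int]:
    """Return `identifier` as a list of bits, most significant bit first.

    Hypothetical helper: the bit ordering is a convention chosen here.
    """
    if not 0 <= identifier < 2 ** nbits:
        raise ValueError(f"identifier must be in [0, {2 ** nbits - 1}]")
    return [(identifier >> shift) & 1 for shift in range(nbits - 1, -1, -1)]


def bits_to_id(bits: Iterable[int]) -> int:
    """Inverse of id_to_bits: fold a bit sequence back into an integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | int(bit)
    return value


bits = id_to_bits(5)
print(bits)
print(bits_to_id(bits))
```

A list produced this way could presumably be wrapped as `torch.tensor([bits])` to obtain a `(1, 16)` tensor for the optional `message` argument; treat that as an assumption and check it against the usage example.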
+# :abacus: Usage
+Audioseal provides a simple API to watermark and detect the watermarks from an audio sample. Example usage:
+```python
+from audioseal import AudioSeal
+# model name corresponds to the YAML card file name found in audioseal/cards
+model = AudioSeal.load_generator("audioseal_wm_16bits")
+# Alternatively, load directly from a checkpoint
+# model = Watermarker.from_pretrained(checkpoint_path, device = wav.device)
+# wav is a torch tensor of shape (batch, channels, samples); sr is its sample rate.
+# It is important to resample the audio to the sample rate the model
+# expects. In our case, we support 16 kHz audio
+wav, sr = ..., 16000
+watermark = model.get_watermark(wav, sr)
+# Optional: you can add a 16-bit message to embed in the watermark (needs `import torch`)
+# msg = torch.randint(0, 2, (wav.shape[0], model.msg_processor.nbits), device=wav.device)
+# watermark = model.get_watermark(wav, message = msg)
+watermarked_audio = wav + watermark
+detector = AudioSeal.load_detector("audioseal_detector_16bits")
+# High-level detection
+result, message = detector.detect_watermark(watermarked_audio, sr)
+print(result)  # result is a float indicating the probability that the audio is watermarked
+print(message)  # message is a binary vector of 16 bits
+# Low-level detection
+result, message = detector(watermarked_audio, sr)
+# result is a tensor of size batch x 2 x frames, giving the probability (positive and negative) of watermarking for each frame
+# Watermarked audio should have result[:, 1, :] > 0.5
+print(result[:, 1, :])
+# message is a tensor of size batch x 16, giving the probability of each bit being 1.
+# message will be a random tensor if the detector detects no watermark in the audio
+print(message)
+```
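
Because the low-level detector output is described as one probability per frame, the watermarked regions can be localized by thresholding it. The sketch below is one possible post-processing step, not part of the AudioSeal API: the function name `watermarked_segments` is hypothetical, and it works on a plain list of probabilities so that it runs without PyTorch. It uses the 0.5 threshold and the 16 kHz rate mentioned in the README.

```python
from typing import List, Sequence, Tuple


def watermarked_segments(
    probs: Sequence[float],
    sample_rate: int = 16000,
    threshold: float = 0.5,
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) for each contiguous run with prob > threshold.

    Hypothetical helper: `probs` holds one watermark probability per sample.
    The end time is exclusive (the first sample after the run).
    """
    segments: List[Tuple[float, float]] = []
    start = None  # index where the current run began, or None if outside a run
    for i, p in enumerate(probs):
        if p > threshold and start is None:
            start = i
        elif p <= threshold and start is not None:
            segments.append((start / sample_rate, i / sample_rate))
            start = None
    if start is not None:  # the run extends to the end of the audio
        segments.append((start / sample_rate, len(probs) / sample_rate))
    return segments


# Synthetic probabilities: 1 s clean, 0.5 s watermarked, 0.5 s clean.
probs = [0.1] * 16000 + [0.9] * 8000 + [0.2] * 8000
print(watermarked_segments(probs))
```

Given the shape stated in the usage example, `result[0, 1, :].tolist()` would presumably yield a suitable `probs` list for the first item in the batch.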