Audio deepfake detection: designing data sets correctly

September 30, 2025


Manipulated audio recordings (‘audio deepfakes’) are no longer the stuff of science fiction, but a real threat to individuals, companies and democratic processes. Attackers use them to imitate voices or falsify statements, and the techniques are becoming increasingly sophisticated. Robust AI-based detection methods offer a necessary countermeasure. However, the success of such systems depends less on clever models than on the quality of the training material. In other words: if the dataset is weak, detection will remain weak.

This article highlights what to look out for when building datasets for audio deepfake detection, what challenges exist, and how research – using the MLAAD dataset as an example – is addressing them. It also draws on and contextualises current research results from Fraunhofer AISEC.

Why good datasets are crucial

AI models learn from examples; they generalise to new cases only if the training data is sufficiently diverse and representative. For audio deepfake detection, several risks follow from this:

  • Overfitting to specific synthesis systems or sound distortions that occur in the training data but not in real-world recordings.
  • Language or model bias: if a dataset contains only a few TTS systems or languages, the detector may fail when encountering unfamiliar voices or language variants.
  • Acoustic environmental influences, noise conditions and playback chains can distort the signal and make reliable recognition difficult.

The challenge is greater in the audio domain because sound details, noise and subtleties play a major role. Fraunhofer AISEC points out in its publications that many existing models suffer from significant performance degradation in more realistic, difficult conditions¹.

The MLAAD approach: versatility as a strategy for success

A prominent example of a comprehensively designed dataset is MLAAD (Multi-Language Audio Anti-Spoofing Dataset), co-developed at Fraunhofer AISEC².

Key features of MLAAD:

  • Wide variety of models: The dataset comprises over 100 different text-to-speech (TTS) models, spanning 42 different architectures³.
  • Multilingualism: MLAAD covers synthetic voices in 40 languages, thus complementing classic datasets that often only include English or Chinese³.
  • Combination with real recordings: The M-AILABS dataset, among others, serves as the basis for real audio data, supplemented by synthetic variants in many languages³.
  • Source tracing: MLAAD also allows the origin of a deepfake (i.e., which TTS model generated it) to be investigated and promotes corresponding research efforts³.

Experimental results show that models trained on MLAAD generalise better in cross-dataset evaluations than models trained on many conventional datasets³.
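
Cross-dataset performance in this field is typically summarised by the equal error rate (EER), the operating point at which the false-acceptance and false-rejection rates coincide. The following is only a rough illustration of that metric, not the evaluation code used for MLAAD; the function name `equal_error_rate` and the score/label convention are assumptions for this sketch.

```python
def equal_error_rate(scores, labels):
    """Compute the equal error rate (EER) by sweeping a threshold
    over the scores. Convention assumed here: higher score = more
    likely fake; labels: 1 = fake, 0 = real (both classes present)."""
    pairs = sorted(zip(scores, labels))
    n_fake = sum(labels)
    n_real = len(labels) - n_fake
    fn = tn = 0            # fakes / reals falling below the moving threshold
    eer, best_gap = 1.0, 2.0
    for _, label in pairs:
        if label == 1:
            fn += 1        # a fake below the threshold is a miss
        else:
            tn += 1        # a real below the threshold is correctly rejected
        fnr = fn / n_fake                   # false-rejection rate (of fakes)
        fpr = (n_real - tn) / n_real        # false-acceptance rate (of reals)
        gap = abs(fnr - fpr)
        if gap < best_gap:                  # EER is where the two rates meet
            best_gap = gap
            eer = (fnr + fpr) / 2
    return eer

# Perfectly separated scores yield an EER of 0.0.
eer = equal_error_rate([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
```

In a cross-dataset evaluation, the model is trained on one dataset (e.g. MLAAD) and the EER is computed on scores from an entirely different one; a low EER there indicates genuine generalisation rather than memorised artefacts.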

Technical and methodological guiding principles for dataset construction

Based on existing research and practical experience, several principles can be formulated that help in building robust datasets:

  1. Diversity of fake generators: include as many synthesis systems and architectures as possible, so the detector does not latch onto the artefacts of a single model.
  2. Linguistic diversity: cover multiple languages and language variants, not just English or Chinese.
  3. Real audio as a counterpoint: pair the synthetic material with genuine recordings of comparable quality and content.
  4. Realistic recording conditions: include background noise, room acoustics, codecs and playback chains.
  5. Balanced classes: avoid strong skew between real and fake samples, which distorts both training and evaluation.
  6. Careful annotation and metadata: record the generator, language, speaker and recording conditions for every sample.
  7. Training/validation/testing partitioning based on domain aspects: split by generator or speaker rather than randomly, so evaluation measures generalisation instead of memorisation.
  8. Continuous expansion (living dataset): add new synthesis methods as they emerge, since the attack landscape changes constantly.
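
Principle 7 is particularly easy to get wrong with a naive random split. The sketch below illustrates a generator-disjoint partition; the metadata fields (`path`, `generator`, `label`) and the helper `split_by_generator` are illustrative assumptions, not the actual MLAAD schema.

```python
from collections import defaultdict

def split_by_generator(samples, train_frac=0.6, val_frac=0.2):
    """Assign *whole generators* to train/val/test, so no TTS system
    seen in training ever appears in validation or test. Evaluation
    then measures generalisation to unseen generators."""
    by_gen = defaultdict(list)
    for s in samples:
        by_gen[s["generator"]].append(s)

    generators = sorted(by_gen)          # deterministic ordering
    n_train = int(len(generators) * train_frac)
    n_val = int(len(generators) * val_frac)

    splits = {"train": [], "val": [], "test": []}
    for i, gen in enumerate(generators):
        if i < n_train:
            part = "train"
        elif i < n_train + n_val:
            part = "val"
        else:
            part = "test"
        splits[part].extend(by_gen[gen])
    return splits

# Toy metadata records (fields are hypothetical, not MLAAD's format).
samples = [
    {"path": "a.wav", "generator": "tts_a", "label": "fake"},
    {"path": "b.wav", "generator": "tts_b", "label": "fake"},
    {"path": "c.wav", "generator": "tts_c", "label": "fake"},
    {"path": "d.wav", "generator": "tts_d", "label": "fake"},
    {"path": "e.wav", "generator": "tts_e", "label": "fake"},
]
splits = split_by_generator(samples)
train_gens = {s["generator"] for s in splits["train"]}
test_gens = {s["generator"] for s in splits["test"]}
```

The same idea applies to speakers or recording conditions: whichever domain aspect the evaluation is supposed to probe must be disjoint between the partitions.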

Current research trends and challenges

Replay attacks are a particularly difficult scenario: a deepfake is played back through speakers and recorded again, which changes the sound and makes detection more difficult⁴.
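
For data augmentation, a replay chain is sometimes approximated by convolving the waveform with a room impulse response (RIR) and adding noise. The sketch below uses a toy exponentially decaying RIR and a hypothetical `simulate_replay` helper; real augmentation pipelines would use measured RIRs and loudspeaker/microphone models.

```python
import math
import random

def simulate_replay(signal, rir, noise_std=0.01, seed=0):
    """Naive direct convolution of `signal` with `rir`, plus Gaussian
    noise, crudely imitating the loudspeaker-room-microphone chain
    of a replay attack."""
    rng = random.Random(seed)
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return [y + rng.gauss(0.0, noise_std) for y in out]

# Toy RIR: a direct path followed by exponentially decaying reflections.
rir = [math.exp(-0.5 * k) for k in range(8)]
# A 440 Hz tone at 16 kHz stands in for a speech waveform here.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1000)]
replayed = simulate_replay(clean, rir)
```

Training on such replay-augmented fakes alongside clean ones is one way to make a detector less sensitive to the spectral changes a playback chain introduces.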

In the field of source tracing, additional research is needed to reliably identify AI generation sources⁵. Fraunhofer AISEC is actively conducting research in this area, particularly in the Cognitive Security Technologies department⁶.

Fraunhofer also provides tools such as Deepfake Total, a platform that can be used to check audio recordings for deepfake characteristics⁷.

Final considerations

Building suitable datasets is a core task in the practice of AI-based deepfake detection – especially in the audio domain. Diversity, careful annotation and realistic conditions are not optional, but basic requirements. The MLAAD approach exemplifies what a modern dataset can look like: multilingual, highly diverse models and a combination of real and synthetic audio.

Despite these advances, challenges remain — especially in attack scenarios such as replay attacks or previously unknown synthesis methods. Research is evolving, and institutes such as Fraunhofer AISEC are making an important contribution with their expertise and tools.

Sources / Footnotes

  1. Fraunhofer AISEC. Does Audio Deepfake Detection Generalize? [Online]. Available: https://www.aisec.fraunhofer.de/content/dam/aisec/Dokumente/Publikationen/Studien_TechReports/englisch/Does_Audio_Deepfake_Detection_Generalize.pdf
  2. Fraunhofer AISEC. Deepfakes – Challenges and Solutions. [Online]. Available: https://www.aisec.fraunhofer.de/en/spotlights/Deepfakes.html
  3. Deepfake Total. MLAAD – Multi-Language Audio Anti-Spoofing Dataset. [Online]. Available: https://deepfake-total.com/mlaad
  4. Doan, N. et al. Challenges in Replay-Attack Detection for Audio Deepfakes. arXiv preprint arXiv:2505.14862, 2025. [Online]. Available: https://arxiv.org/abs/2505.14862
  5. Doan, N. et al. Source Tracing of Audio Deepfakes. In: Proc. Interspeech 2025. [Online]. Available: https://www.isca-archive.org/interspeech_2025/doan25_interspeech.pdf
  6. Fraunhofer AISEC. Cognitive Security Technologies. [Online]. Available: https://www.aisec.fraunhofer.de/en/fields-of-expertise/CST.html
  7. Fraunhofer AISEC. Deepfake Total. [Online]. Available: https://www.aisec.fraunhofer.de/en/spotlights/Deepfakes.html
