Custom RF Dataset Development

Developing reliable RF machine learning models depends on access to high-quality, well-characterized training data.
Qoherent designs and produces custom datasets that align precisely with project objectives, using both synthetic generation and real-world signal capture.

We have delivered dozens of purpose-built datasets for prototyping and research to space agencies, enterprises, and government research labs. They range from small collections (tens of megabytes) of targeted signal examples to multi-terabyte datasets covering a wide range of signal scenarios.

Our Approach

We manage the dataset development lifecycle from initial specification to final delivery. Whether the data is synthetically generated, produced on a testbed, or captured from ambient signals, each stage is guided by RF domain expertise, machine learning experience, and familiarity with operational deployment needs. The typical workflow includes requirements gathering, dataset design, capture or synthesis, augmentation, quality control, and packaging.

Equipment, Testbeds, and Field Kits

We maintain complete, ready-to-use capture and test platforms, so no new equipment purchases are required.

  • SDR platforms: Ettus USRP family, bladeRF, HackRF, PlutoSDR, and more
  • Frequency coverage: 50 MHz to 6 GHz directly, extended to millimetre-wave using calibrated downconverters
  • Timing and calibration: GPSDO, multi-radio clock distribution, calibration procedures, and reference sources
  • Front-end chain: LNAs, band-pass filters, splitters, combiners, programmable attenuators, and power monitoring
  • Antennas and fixtures: Wideband and band-specific antennas, fixtures for controlled and repeatable setups
  • Portable capture kits: Rugged, battery-operable systems for field collection with edge compute and storage
  • Lab testbeds: Multi-radio channel emulation, over-the-air chambers, and controlled interference injection

Synthetic Dataset Generation

Our synthetic data capabilities cover a comprehensive range of RF environments:

  • Cellular Networks: 5G NR, LTE, and legacy cellular standards across all deployment bands.
  • Satellite Communications: LEO, MEO, and GEO scenarios including interference modeling.
  • Industrial IoT: ISM band protocols, LoRa, and proprietary formats.
  • Radar Systems: FMCW, pulse-Doppler, and specialized waveforms.
  • Custom Protocols: Your proprietary signals and modulation schemes.
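
As a rough illustration of what synthetic generation involves at its simplest, the sketch below produces one labelled QPSK burst at a target SNR. It is a minimal example under stated assumptions, not a description of Qoherent's tooling: the function name `synth_qpsk`, the label fields, and the one-sample-per-symbol model (no pulse shaping, channel effects, or hardware impairments) are all hypothetical choices made for brevity.

```python
import cmath
import math
import random


def synth_qpsk(num_symbols, snr_db, seed=0):
    """Return (iq_samples, label) for one QPSK burst in complex AWGN.

    Hypothetical helper for illustration: one sample per symbol,
    unit signal power, no pulse shaping or channel model.
    """
    rng = random.Random(seed)
    # Unit-power QPSK constellation at 45, 135, 225, and 315 degrees.
    constellation = [
        cmath.exp(1j * (math.pi / 4 + k * math.pi / 2)) for k in range(4)
    ]
    # With unit signal power, total noise power is 10^(-SNR/10),
    # split evenly between the I and Q components.
    noise_power = 10 ** (-snr_db / 10)
    sigma = math.sqrt(noise_power / 2)

    samples = []
    for _ in range(num_symbols):
        symbol = rng.choice(constellation)
        noise = complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        samples.append(symbol + noise)

    # The label records the generation parameters alongside the samples
    # (field names are assumptions, not a fixed schema).
    label = {
        "modulation": "qpsk",
        "snr_db": snr_db,
        "num_symbols": num_symbols,
        "seed": seed,
    }
    return samples, label


if __name__ == "__main__":
    iq, label = synth_qpsk(num_symbols=1024, snr_db=10.0, seed=42)
    print(len(iq), label)
```

Fixing the seed makes each example reproducible, and storing the generation parameters in the label keeps every sample traceable to how it was produced.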

Real-World Data Collection

Synthetic data is a powerful tool but isn’t sufficient for every use case. We provide professional data collection and curation services:

  • Testbed design and development: Creation of emulators and testbeds using commercially available SDRs for controlled-environment evaluation. Any synthetic scenario can be emulated for over-the-air testing.
  • Over-the-air capture: Recording ambient signals in real-world environments, with equipment calibrated for frequency range, noise floor, and timing accuracy.
  • Labelling: Automated and human-led RF dataset labelling, with label definitions aligned to project requirements and traceable back to source data.
  • Multi-location and multi-environment campaigns: Collection across varied geographic locations and environmental conditions.
  • Background characterization: Measurement and documentation of interference, noise floor, and spectrum occupancy in each environment.

Delivery Standards

  • Industry-Standard Formats: Full support for SigMF, HDF5, and custom formats
  • Comprehensive Metadata: Complete signal parameters, collection conditions, and labeling methodology
  • Quality Assurance: Validated datasets with statistical analysis and example notebooks
  • Documentation: Detailed guides for dataset usage and model training
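
For readers unfamiliar with SigMF, the sketch below shows roughly what a single labelled recording looks like on disk: a binary `.sigmf-data` file of IQ samples paired with a JSON `.sigmf-meta` file. It uses only the Python standard library and core SigMF fields. The function name `write_sigmf_recording`, the file name, and the example values are hypothetical, and an actual delivery would typically carry richer metadata than shown here.

```python
import json
import struct
from pathlib import Path


def write_sigmf_recording(base_path, iq, sample_rate_hz, center_freq_hz, label):
    """Write <base_path>.sigmf-data and <base_path>.sigmf-meta.

    Hypothetical helper for illustration; `iq` is a sequence of
    Python complex numbers.
    """
    data_path = Path(f"{base_path}.sigmf-data")
    meta_path = Path(f"{base_path}.sigmf-meta")

    # cf32_le: interleaved little-endian float32 I and Q values.
    flat = [part for s in iq for part in (s.real, s.imag)]
    data_path.write_bytes(struct.pack(f"<{len(flat)}f", *flat))

    meta = {
        "global": {
            "core:datatype": "cf32_le",
            "core:sample_rate": sample_rate_hz,
            "core:version": "1.0.0",
            "core:description": "Hypothetical example capture",
        },
        # One capture segment starting at sample 0.
        "captures": [
            {"core:sample_start": 0, "core:frequency": center_freq_hz}
        ],
        # One annotation labelling the whole recording.
        "annotations": [
            {
                "core:sample_start": 0,
                "core:sample_count": len(iq),
                "core:label": label,
            }
        ],
    }
    meta_path.write_text(json.dumps(meta, indent=2))
    return data_path, meta_path


if __name__ == "__main__":
    samples = [complex(1, 0), complex(0, 1), complex(-1, 0), complex(0, -1)]
    print(write_sigmf_recording("capture_001", samples, 1e6, 915e6, "example"))
```

Keeping labels in the metadata file, rather than in file names or a separate spreadsheet, means each annotation points to an exact sample range in the data it describes.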

Representative Projects

Our dataset development work supports operational, research, and prototyping efforts across the space, defence, telecommunications, and industrial IoT sectors, and has produced dozens of terabytes of curated data.

  • Satellite interference recognition datasets – multi-class datasets for recognizing the interference scenarios and degradation modes found in LEO, MEO, and GEO links.
  • Wideband spectrum occupancy datasets – geographically diverse captures supporting cognitive radio research and spectrum-sharing technology development.
  • RF device fingerprinting datasets – labelled IQ data for specific emitter identification and device authentication.
  • Beamforming datasets – controlled-environment recordings for ML beamforming algorithm development.
  • Channel environment library datasets – synthetic and captured data for training channel environment recognition models.

Contact us to discuss your dataset requirements.