Dirty-MNIST

Introduced by Mukhoti et al. in Deep Deterministic Uncertainty: A Simple Baseline

DirtyMNIST is a concatenation of MNIST + AmbiguousMNIST, with 60k samples each in the training set. AmbiguousMNIST contains additional ambiguous digits with varying degrees of ambiguity. The AmbiguousMNIST test set likewise contains 60k ambiguous samples.
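
A minimal loading sketch; it assumes the `ddu_dirty_mnist` pip package released alongside the paper and torchvision-style constructors for `DirtyMNIST`/`AmbiguousMNIST` (both are assumptions, not guaranteed by this page):

```python
# Sketch: load Dirty-MNIST and the ambiguous test set.
# Assumes the `ddu_dirty_mnist` package with torchvision-style constructors.
import ddu_dirty_mnist
from torch.utils.data import DataLoader

# 120k training samples: 60k MNIST followed by 60k AmbiguousMNIST.
dirty_train = ddu_dirty_mnist.DirtyMNIST(".", train=True, download=True)
# The 60k-sample ambiguous test set mentioned above.
ambiguous_test = ddu_dirty_mnist.AmbiguousMNIST(".", train=False, download=True)

train_loader = DataLoader(dirty_train, batch_size=128, shuffle=True)
```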

Additional Guidance

  1. DirtyMNIST is a concatenation of MNIST + AmbiguousMNIST, with 60k samples each in the training set.
  2. The current AmbiguousMNIST contains 6k unique samples with 10 labels each. This multi-label dataset gets flattened to 60k samples. The assumption is that ambiguous samples have multiple "valid" labels as they are ambiguous. MNIST samples are intentionally undersampled (in comparison), which benefits AL acquisition functions that can select unambiguous samples.
  3. Pick your initial training samples (for warm-starting Active Learning) from the MNIST half of DirtyMNIST to avoid starting training with potentially very ambiguous samples, which might add a lot of variance to your experiments (see the first sketch after this list).
  4. Make sure to pick your validation set from the MNIST half as well, for the same reason as above.
  5. Make sure that your batch acquisition size is >= 10 (probably), given that there are 10 multi-labels per sample in Ambiguous-MNIST.
  6. By default, Gaussian noise with stddev 0.05 is added to each sample to prevent acquisition functions (in Active Learning) from cheating by discarding "duplicates" (illustrated in the second sketch below).
  7. If you want to split Ambiguous-MNIST into subsets (or Dirty-MNIST within the second, ambiguous half), make sure to split at multiples of 10 to avoid splitting within a flattened multi-label sample (see the last sketch below).
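
A sketch for items 3 and 4: draw the warm-start seed set and the validation set from the MNIST half only. It assumes the concatenation order stated above (MNIST first, so indices 0-59999 are unambiguous); the seed and validation sizes are arbitrary placeholders:

```python
# Sketch: warm-start seed set and validation set from the MNIST half (items 3-4).
# Assumes MNIST occupies indices 0..59999 of the 120k training set.
import torch
from torch.utils.data import Subset
import ddu_dirty_mnist

dirty_train = ddu_dirty_mnist.DirtyMNIST(".", train=True, download=True)

NUM_MNIST = 60_000
perm = torch.randperm(NUM_MNIST)

seed_set = Subset(dirty_train, perm[:20].tolist())     # e.g. 20 warm-start samples
val_set = Subset(dirty_train, perm[20:5020].tolist())  # e.g. 5k validation samples
```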
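The stddev-0.05 noise in item 6 is applied by the dataset itself; the snippet below is only a standalone illustration of the same idea:

```python
# Sketch: the dedup-defeating noise from item 6, as a standalone transform.
import torch

def add_dedup_noise(x: torch.Tensor, stddev: float = 0.05) -> torch.Tensor:
    # i.i.d. Gaussian noise makes otherwise-identical flattened samples distinct.
    return x + stddev * torch.randn_like(x)
```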
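A sketch for item 7: split Ambiguous-MNIST at block boundaries. It assumes the 10 labels of each unique image occupy 10 consecutive indices after flattening, which item 7 implies; the 5000/1000 block split is an arbitrary example:

```python
# Sketch: split Ambiguous-MNIST into subsets without cutting a flattened
# multi-label sample (item 7). Each unique image expands to a block of 10
# consecutive entries, so shuffle and split whole blocks.
import torch
from torch.utils.data import Subset
import ddu_dirty_mnist

ambiguous_train = ddu_dirty_mnist.AmbiguousMNIST(".", train=True, download=True)

LABELS_PER_SAMPLE = 10
num_blocks = len(ambiguous_train) // LABELS_PER_SAMPLE  # 6k unique images

def block_indices(blocks: torch.Tensor) -> list[int]:
    # Expand each block id b into the sample indices [10*b, 10*b + 9].
    return [b * LABELS_PER_SAMPLE + i
            for b in blocks.tolist()
            for i in range(LABELS_PER_SAMPLE)]

block_order = torch.randperm(num_blocks)
pool = Subset(ambiguous_train, block_indices(block_order[:5000]))
holdout = Subset(ambiguous_train, block_indices(block_order[5000:]))
```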
