Machine Learning Bazaar

Explore solutions for thousands of machine learning tasks generated using automated data science systems.

About ML Bazaar

What is the Machine Learning Bazaar?

The ML Bazaar is a new framework developed by MIT's Data To AI Lab that makes it easier to develop ML and AutoML systems. It encompasses a number of open-source ML libraries that provide ML primitives, ML pipelines, pipeline templates, datasets, task types, tasks, tuners, acquisition functions, selectors, and other AutoML components: a bazaar of sorts that incorporates decades of work in machine learning.
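
To make the vocabulary above more concrete, here is a minimal, hypothetical Python sketch of how a pipeline template, a tuner, and a scoring function might fit together. The names `Primitive`, `PipelineTemplate`, `random_search`, and `toy_score` are illustrative assumptions, not the API of any ML Bazaar library. Plain random search stands in for the tuners and acquisition functions mentioned above, and a toy formula stands in for real model evaluation.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical, illustrative types only; not an ML Bazaar library API.
Pipeline = Dict[str, Dict[str, int]]  # primitive name -> fixed hyperparameters


@dataclass
class Primitive:
    name: str
    # Hyperparameter name -> inclusive (low, high) integer range left open by the template.
    tunable: Dict[str, Tuple[int, int]] = field(default_factory=dict)


@dataclass
class PipelineTemplate:
    primitives: List[Primitive]

    def sample(self, rng: random.Random) -> Pipeline:
        """Fix every open hyperparameter to a concrete value, yielding one pipeline."""
        return {
            p.name: {hp: rng.randint(lo, hi) for hp, (lo, hi) in p.tunable.items()}
            for p in self.primitives
        }


def random_search(template: PipelineTemplate,
                  score: Callable[[Pipeline], float],
                  n_iter: int, seed: int = 0) -> Tuple[Pipeline, float]:
    """Minimal tuner: sample n_iter pipelines from the template, keep the best one."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_iter):
        candidate = template.sample(rng)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score


template = PipelineTemplate([
    Primitive("imputer"),  # no tunable hyperparameters in this toy example
    Primitive("random_forest", {"n_estimators": (10, 500), "max_depth": (2, 20)}),
])


def toy_score(pipeline: Pipeline) -> float:
    # Stand-in for cross-validated performance; higher is better.
    rf = pipeline["random_forest"]
    return -abs(rf["max_depth"] - 8) + rf["n_estimators"] / 1000


best, best_score = random_search(template, toy_score, n_iter=50)
```

The design choice worth noting is the separation of concerns: the template only describes the search space, the tuner only proposes and compares candidates, and the scoring function is the single place where data and evaluation enter.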

What is this website showing?

We are presenting the best-so-far end-to-end machine learning solutions generated by our automated system for 456 dataset-task pairs, as well as the underlying dataset and task definitions.
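
Keeping a "best-so-far" solution per dataset-task pair amounts to a running maximum over scored pipelines, grouped by task. The sketch below assumes a hypothetical record layout (`ScoredPipeline` with `task_id`, `pipeline_id`, `score`) rather than the schema any ML Bazaar library actually uses, and it assumes higher scores are better; for error metrics such as RMSE the comparison would need to be flipped.

```python
from typing import Dict, Iterable, NamedTuple


class ScoredPipeline(NamedTuple):
    # Hypothetical record layout, for illustration only.
    task_id: str
    pipeline_id: str
    score: float  # assumed: higher is better


def best_so_far(records: Iterable[ScoredPipeline]) -> Dict[str, ScoredPipeline]:
    """Return, for each task, the highest-scoring pipeline seen so far."""
    best: Dict[str, ScoredPipeline] = {}
    for rec in records:
        current = best.get(rec.task_id)
        if current is None or rec.score > current.score:
            best[rec.task_id] = rec
    return best


records = [
    ScoredPipeline("task_a", "p1", 0.71),
    ScoredPipeline("task_a", "p2", 0.84),
    ScoredPipeline("task_b", "p3", 0.52),
]
leaders = best_so_far(records)
```

Because the update is incremental, new scored pipelines can be streamed in as they are produced without rescanning earlier results.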

How many task types have you covered?

We cover 15 different machine learning task types, ranging from simple single-table classification to image regression to vertex nomination.

What open source libraries are you releasing?

ML Bazaar encompasses the following open-source libraries:

How can I programmatically explore datasets, pipelines, and pipeline templates?

You can programmatically access the datasets (stored and organized in the standard D3M format) and about 3.1 million pipelines via piex and mit-d3m, our libraries for pipeline analysis and dataset loading, respectively. We are adding millions more pipelines.
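
For readers who want a feel for the D3M layout before installing anything, here is a minimal standard-library sketch that reads the tabular resources of a D3M-style dataset directory. It does not use mit-d3m or piex. It assumes the dataset root contains a `datasetDoc.json` whose `dataResources` entries carry `resID`, `resType`, and a `resPath` relative to the root (for example `tables/learningData.csv`); the helper name `load_d3m_tables` and the toy dataset are hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def load_d3m_tables(dataset_root: str) -> Dict[str, List[dict]]:
    """Read every table resource listed in a D3M-style datasetDoc.json.

    Assumption: the doc has a "dataResources" list whose entries carry
    "resID", "resType", and a "resPath" relative to dataset_root.
    Values are returned as strings, exactly as csv.DictReader yields them.
    """
    root = Path(dataset_root)
    doc = json.loads((root / "datasetDoc.json").read_text())
    tables: Dict[str, List[dict]] = {}
    for res in doc["dataResources"]:
        if res.get("resType") != "table":
            continue  # skip non-tabular resources such as images or graphs
        with open(root / res["resPath"], newline="") as f:
            tables[res["resID"]] = list(csv.DictReader(f))
    return tables


# Build a tiny, hypothetical D3M-style dataset in a temporary directory and load it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "tables").mkdir()
    (root / "tables" / "learningData.csv").write_text("d3mIndex,x,y\n0,1.5,a\n1,2.0,b\n")
    (root / "datasetDoc.json").write_text(json.dumps({
        "about": {"datasetID": "toy_dataset"},
        "dataResources": [
            {"resID": "learningData", "resType": "table",
             "resPath": "tables/learningData.csv"},
        ],
    }))
    tables = load_d3m_tables(tmp)
```

In practice the dedicated loaders are the better route, since they also handle column types, problem definitions, and non-tabular resources that this sketch deliberately ignores.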

How can I request access to the datasets?

The ML Bazaar Task Suite, including 456 datasets and task definitions, is available for academic and non-commercial use. Please email mlbazaar@mit.edu to request access and describe your intended use. Our collection of scored pipelines can be accessed without restrictions.

How can I contribute?

Contributions of all types are welcome to all of our open-source libraries. In particular, you can contribute individual primitives, pipelines, and hyperparameter configuration spaces. Once you follow the contribution steps and your code is validated, we will run the experiments and return the results. If your pipeline outperforms the current best, we will update this website and credit you or your team.

Is this just an academic exercise?

We are motivated to provide a place where any practitioner can go and find validated competitive pipeline templates for their machine learning task.

How can I learn more about ML Bazaar?

Please read our full paper about ML Bazaar, published at SIGMOD 2020, or check out our software libraries, documentation, and tutorials!

How can I cite ML Bazaar?

Please use the following citation:


@inproceedings{smith2020mlbazaar,
   author = {Smith, Micah J. and Sala, Carles and Kanter, James Max and Veeramachaneni, Kalyan},
   title = {The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development},
   booktitle = {Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data},
   series = {SIGMOD '20},
   year = {2020},
   doi = {10.1145/3318464.3386146},
   url = {https://doi.org/10.1145/3318464.3386146},
   location = {Portland, OR, USA},
   numpages = {16},
   publisher = {ACM},
   address = {New York, NY, USA},
}

Datasets and Tasks