About ML Bazaar

What is the Machine Learning Bazaar?

The ML Bazaar is a new framework developed by MIT's Data To AI Lab that makes it easier to develop ML and AutoML systems. It encompasses a number of open-source ML libraries that provide ML primitives, ML pipelines, pipeline templates, datasets, task types, tasks, tuners, acquisition functions, selectors, and other AutoML components: a bazaar of sorts that incorporates decades of work in machine learning.

What is this website showing?

We are presenting the best-so-far end-to-end machine learning solutions generated by our automated system for 456 dataset-task pairs, as well as the underlying dataset and task definitions.

How many task types have you covered?

We cover 15 different machine learning task types, ranging from simple single-table classification to image regression to vertex nomination.

What open source libraries are you releasing?

ML Bazaar encompasses the following open-source libraries:

MLPrimitives, a specification for and collection of primitives for machine learning and data science
MLBlocks, a pipeline execution library
BTB, a simple, extensible library for AutoML
AutoBazaar, a reliable, end-to-end, general-purpose AutoML system built using our own framework
piex, a library for analysis and meta-analysis of scored pipelines

How can I programmatically explore datasets, pipelines, and pipeline templates?

You can programmatically access the datasets (stored and organized in the standard D3M format) and about 3.1 million pipelines via piex and mit-d3m, our libraries for pipeline analysis and dataset loading. We are adding millions more.
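As a rough illustration of what exploring a collection of scored pipelines can involve, the sketch below picks the best-so-far pipeline for each dataset-task pair from a list of records. It deliberately does not use the piex or mit-d3m APIs: the record fields (dataset, pipeline_id, score), the function name best_so_far, and the sample values are all hypothetical, and it assumes that a higher score is better.

```python
from typing import Dict, Iterable, NamedTuple


class ScoredPipeline(NamedTuple):
    """One scored pipeline record (field names are hypothetical)."""
    dataset: str      # identifier of the dataset-task pair
    pipeline_id: str  # identifier of the evaluated pipeline
    score: float      # assumed normalized so that higher is better


def best_so_far(records: Iterable[ScoredPipeline]) -> Dict[str, ScoredPipeline]:
    """Return the highest-scoring record seen for each dataset."""
    best: Dict[str, ScoredPipeline] = {}
    for record in records:
        current = best.get(record.dataset)
        # Keep the first record for a dataset, then replace it only on a strict improvement.
        if current is None or record.score > current.score:
            best[record.dataset] = record
    return best


# Made-up records for illustration only; these are not real ML Bazaar results.
records = [
    ScoredPipeline("dataset_a", "p1", 0.71),
    ScoredPipeline("dataset_a", "p2", 0.83),
    ScoredPipeline("dataset_b", "p3", 0.40),
    ScoredPipeline("dataset_a", "p4", 0.79),
]

for dataset, record in sorted(best_so_far(records).items()):
    print(dataset, record.pipeline_id, record.score)
```

In practice the records would come from the pipeline collection itself rather than a hand-written list, and the comparison would need to respect each task's metric, since some metrics are better when lower.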

How can I request access to the datasets?

The ML Bazaar Task Suite, including 456 datasets and task definitions, is available for academic and non-commercial usage. Please email mlbazaar@mit.edu to request access and explain your intended usage. Our collection of scored pipelines can be accessed without restrictions.

How can I contribute?

Contributions of all types are welcome to all of our open-source libraries. In particular, you can contribute individual primitives, pipelines, and hyperparameter configuration spaces. Once you have followed the contribution steps and your code has been validated, we will run the experiments and return the results. If your pipeline outperforms the current best, we will update this website and credit you or your team.

How can I cite ML Bazaar?

@inproceedings{smith2020mlbazaar,
   author = {Smith, Micah J. and Sala, Carles and Kanter, James Max and Veeramachaneni, Kalyan},
   title = {The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development},
   booktitle = {Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data},
   series = {SIGMOD '20},
   year = {2020},
   doi = {10.1145/3318464.3386146},
   url = {https://doi.org/10.1145/3318464.3386146},
   location = {Portland, OR, USA},
   numpages = {16},
   publisher = {ACM},
   address = {New York, NY, USA},
}
