Explore solutions for thousands of machine learning tasks generated using automated data science systems.
The ML Bazaar is a new framework developed by MIT's Data To AI Lab that makes it easier to develop ML and AutoML systems. It encompasses a number of open-source ML libraries that provide ML primitives, ML pipelines, pipeline templates, datasets, task types, tasks, tuners, acquisition functions, selectors, and other AutoML components. It is a bazaar of sorts that incorporates decades of work in machine learning.
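To make this vocabulary concrete, here is a toy, stdlib-only sketch of how primitives, a pipeline, a hyperparameter configuration space, and a tuner relate. It is not the ML Bazaar API: `scale`, `threshold`, `Pipeline`, and `random_search` are hypothetical names, and plain random search stands in for the model-based tuners and acquisition functions a real AutoML system would use.

```python
import random
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Hypothetical types: a primitive is any callable that takes data plus keyword
# hyperparameters; a space maps primitive name -> hyperparameter -> (low, high).
Primitive = Callable[..., Any]
Space = Dict[str, Dict[str, Tuple[float, float]]]


def scale(xs: Sequence[float], factor: float = 1.0) -> List[float]:
    """Toy preprocessing primitive: multiply every value by `factor`."""
    return [x * factor for x in xs]


def threshold(xs: Sequence[float], cutoff: float = 0.5) -> List[int]:
    """Toy estimator primitive: predict 1 when a value exceeds `cutoff`."""
    return [1 if x > cutoff else 0 for x in xs]


class Pipeline:
    """Chains primitives; hyperparameters are looked up by primitive name."""

    def __init__(self, steps: List[Primitive]) -> None:
        self.steps = steps

    def run(self, data: Any, hyperparams: Dict[str, Dict[str, Any]]) -> Any:
        for step in self.steps:
            # Primitives without an entry fall back to their default values.
            data = step(data, **hyperparams.get(step.__name__, {}))
        return data


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def random_search(pipeline: Pipeline, space: Space, X: Any, y: Sequence[int],
                  n_iter: int = 50, seed: int = 0) -> Tuple[float, Dict[str, Dict[str, float]]]:
    """Minimal tuner: sample configurations uniformly and keep the best so far."""
    rng = random.Random(seed)
    best_score, best_config = float("-inf"), {}
    for _ in range(n_iter):
        config = {
            prim: {name: rng.uniform(lo, hi) for name, (lo, hi) in params.items()}
            for prim, params in space.items()
        }
        score = accuracy(y, pipeline.run(X, config))
        if score > best_score:
            best_score, best_config = score, config
    return best_score, best_config
```

For example, `random_search(Pipeline([scale, threshold]), {"scale": {"factor": (0.5, 2.0)}, "threshold": {"cutoff": (0.0, 1.0)}}, X, y)` returns the best accuracy found and the configuration that produced it. Swapping in a smarter tuner only changes how `config` is proposed; the pipeline and space stay the same.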
We present the best-so-far end-to-end machine learning solutions generated by our automated system for 456 dataset-task pairs, along with the underlying dataset and task definitions.
We cover 15 machine learning task types, ranging from simple single-table classification to image regression and vertex nomination.
ML Bazaar encompasses the following open-source libraries:
You can programmatically access the datasets (stored and organized in the standard D3M format) and about 3.1 million pipelines via piex and mit-d3m, our libraries for pipeline analysis and dataset loading, respectively. We are adding millions more.
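The mit-d3m library is the intended way to load these datasets. For readers who want to inspect a dataset directory by hand, the sketch below reads one using only the Python standard library. It assumes a simplified D3M layout (`<name>_dataset/datasetDoc.json`, `<name>_dataset/tables/learningData.csv`, and `<name>_problem/problemDoc.json` under a root folder named `<name>`); `load_d3m_dataset` is a hypothetical helper, not part of mit-d3m, and real datasets may include additional resources or train/test splits that it ignores.

```python
import csv
import json
from pathlib import Path
from typing import Any, Dict, List, Tuple


def load_d3m_dataset(root: Path) -> Tuple[Dict[str, Any], List[Dict[str, str]], Dict[str, Any]]:
    """Hypothetical helper: read a D3M-style dataset directory.

    Assumed layout (a simplification of the D3M format):
        <root>/<name>_dataset/datasetDoc.json
        <root>/<name>_dataset/tables/learningData.csv
        <root>/<name>_problem/problemDoc.json
    where <name> is the name of the root directory.
    """
    name = root.name
    dataset_dir = root / f"{name}_dataset"
    problem_dir = root / f"{name}_problem"

    # Dataset metadata: resources, column types, roles, and so on.
    with open(dataset_dir / "datasetDoc.json", encoding="utf-8") as f:
        dataset_doc = json.load(f)
    # Task definition: task type, target column, performance metrics.
    with open(problem_dir / "problemDoc.json", encoding="utf-8") as f:
        problem_doc = json.load(f)
    # Main table, returned as one dict per row with string values.
    with open(dataset_dir / "tables" / "learningData.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return dataset_doc, rows, problem_doc
```

Values come back as strings; converting them to typed columns would require reading the column descriptions in `datasetDoc.json`, which this sketch leaves to the caller.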
The ML Bazaar Task Suite, including 456 datasets and task definitions, is available for academic and non-commercial use. To request access, please email mlbazaar@mit.edu and describe your intended use. Our collection of scored pipelines is available without restrictions.
Contributions of all types to our open-source libraries are welcome. In particular, you can contribute individual primitives, pipelines, and hyperparameter configuration spaces. Once you follow the contribution steps and your code is validated, we will run the experiments and return the results. If your pipeline outperforms the current best-so-far solution, we will update this website and credit you or your team.
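Whether a pipeline "does better" depends on the task's metric: higher is better for accuracy, lower is better for an error such as mean squared error. The sketch below shows that comparison against a best-so-far table. It is a minimal illustration under stated assumptions, not the process ML Bazaar actually runs: `Entry`, `HIGHER_IS_BETTER`, and `submit` are hypothetical names, and the metric names simply follow D3M-style naming.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Entry:
    """Hypothetical record for one scored pipeline."""
    pipeline_id: str
    score: float
    contributor: str


# Assumed metric directions (True means a larger score is better).
HIGHER_IS_BETTER: Dict[str, bool] = {
    "accuracy": True,
    "f1Macro": True,
    "meanSquaredError": False,
}


def submit(board: Dict[str, Entry], task: str, metric: str, entry: Entry) -> bool:
    """Replace the best-so-far entry for `task` only if `entry` strictly beats it."""
    current = board.get(task)
    if current is None:
        improved = True
    elif HIGHER_IS_BETTER[metric]:
        improved = entry.score > current.score
    else:
        improved = entry.score < current.score
    if improved:
        board[task] = entry
    return improved
```

Using a strict comparison means a tie keeps the incumbent pipeline, so credit changes hands only on a genuine improvement.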
Our goal is to provide a place where any practitioner can find validated, competitive pipeline templates for their machine learning task.
Please read our full paper on ML Bazaar, published at SIGMOD 2020, or check out our software libraries, documentation, and tutorials!
To cite ML Bazaar, please use the following BibTeX entry:
@inproceedings{smith2020mlbazaar,
author = {Smith, Micah J. and Sala, Carles and Kanter, James Max and Veeramachaneni, Kalyan},
title = {The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development},
booktitle = {Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data},
series = {SIGMOD '20},
year = {2020},
doi = {10.1145/3318464.3386146},
url = {https://doi.org/10.1145/3318464.3386146},
location = {Portland, OR, USA},
numpages = {16},
publisher = {ACM},
address = {New York, NY, USA},
}