Machine Learning for Systems

21.06.2019 14:30-15:30

Speaker: Azalia Mirhoseini, Ph.D. | Location: Hochschulstraße 10 (S2|02), Piloty Building, Room C205, Darmstadt

Organizer: System Security Lab


Abstract
In this talk, I will present some of our recent work at the intersection of machine learning and systems.

First, I will discuss our work on the sparsely gated mixture of experts, a new conditional neural network architecture that allows us to train models with 130B+ parameters (10x larger than any previous model) on datasets with 100B+ examples. This architecture uses an intelligent gating mechanism that routes each input example to a small subset of the modules (“experts”) within the larger model. Even with a moderate number of parameters, this model runs 2-3x faster than top-performing baselines and sets a new state of the art in machine translation and language modeling.

Next, I will discuss our work on deep reinforcement learning models that learn to perform resource allocation, a combinatorial optimization problem that appears repeatedly in computer systems design and operation. Our method is end-to-end and abstracts away the complexity of the underlying optimization space; the RL agent learns the implicit trade-offs between computation and communication across the underlying resources and optimizes the allocation using only the true reward function (e.g., the runtime of the generated allocation). The complexity of our search space is on the order of 9^80000, compared to roughly 10^360 states for Go (mastered by AlphaGo).

Finally, I will discuss our work on deep models that learn to find solutions for the classic problem of balanced graph partitioning with minimum edge cut. We define an unsupervised loss function and use neural graph representations to adaptively learn partitions based on graph topology. Our method enables generalization: we can train models that produce performant partitions at inference time on new, unseen graphs. This generalization significantly speeds up the partitioning process over all existing baselines, which solve the problem from scratch for each new graph.
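To make the gating idea concrete, here is a minimal numpy sketch of top-k gating: each input is routed to only its k highest-scoring experts, and only those experts' outputs are computed and combined. All names, shapes, and the linear toy experts are illustrative assumptions, not the implementation from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_k_gating(x, w_gate, k=2):
    """Route each input to its top-k experts with softmax mixture weights.

    x: (batch, d_model), w_gate: (d_model, n_experts).
    Returns (gates, indices): the k mixture weights (summing to 1) and
    the indices of the chosen experts, per example.
    """
    logits = x @ w_gate                                   # (batch, n_experts)
    top_idx = np.argsort(logits, axis=1)[:, -k:]          # top-k expert ids
    top_logits = np.take_along_axis(logits, top_idx, axis=1)
    # Softmax over only the selected logits, so the k gate values sum to 1.
    e = np.exp(top_logits - top_logits.max(axis=1, keepdims=True))
    gates = e / e.sum(axis=1, keepdims=True)
    return gates, top_idx

def moe_forward(x, w_gate, experts, k=2):
    """Evaluate only the selected experts and combine them with the gates."""
    gates, idx = top_k_gating(x, w_gate, k)
    out = np.zeros_like(x)
    for b in range(x.shape[0]):
        for j in range(k):
            out[b] += gates[b, j] * experts[idx[b, j]](x[b])
    return out

# Toy setup: 4 "experts", each a small linear map (stand-ins for sub-networks).
d, n_experts = 8, 4
w_gate = rng.normal(size=(d, n_experts))
experts = [lambda v, W=rng.normal(size=(d, d)): W @ v for _ in range(n_experts)]
x = rng.normal(size=(3, d))
y = moe_forward(x, w_gate, experts)
```

The sparsity is the point: with k fixed, total parameters can grow with the number of experts while per-example compute stays roughly constant, which is what makes the 130B+ parameter scale tractable.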
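The "only the true reward function" idea for resource allocation can be sketched with plain REINFORCE: a policy over per-operation device assignments is updated using nothing but a measured runtime. The cost model, hyperparameters, and function names below are hypothetical stand-ins, not the method from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def reinforce_placement(runtime_fn, n_ops, n_devices, steps=200, lr=0.1):
    """REINFORCE sketch: learn per-op device logits from runtime alone.

    runtime_fn(placement) -> measured runtime (the true reward signal).
    The optimizer never inspects the cost structure; it only samples
    placements and observes their runtimes.
    """
    logits = np.zeros((n_ops, n_devices))
    baseline = 0.0                                    # running reward baseline
    for _ in range(steps):
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        placement = np.array(
            [rng.choice(n_devices, p=probs[i]) for i in range(n_ops)]
        )
        reward = -runtime_fn(placement)               # lower runtime is better
        baseline = 0.9 * baseline + 0.1 * reward
        adv = reward - baseline
        # Policy gradient for a softmax policy: grad log pi = onehot - probs.
        grad = -probs
        grad[np.arange(n_ops), placement] += 1.0
        logits += lr * adv * grad
    return logits.argmax(axis=1)

# Hypothetical toy cost model: cross-device neighbors cost communication,
# but piling every op on one device costs load imbalance.
def toy_runtime(placement):
    comm = np.sum(placement[:-1] != placement[1:])
    load = np.bincount(placement, minlength=2).max()
    return comm + load

best = reinforce_placement(toy_runtime, n_ops=6, n_devices=2)
```

The agent implicitly balances the communication and load terms without ever seeing them separately, which mirrors the abstract's point about learning the computation/communication trade-off from the reward alone.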
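For the partitioning work, the unsupervised objective can be illustrated with a differentiable surrogate: given soft part assignments, penalize the expected number of cut edges plus a part-size balance term. This particular loss is a generic sketch of the idea, not the exact loss from the talk.

```python
import numpy as np

def partition_loss(S, A, balance_weight=1.0):
    """Differentiable surrogate for balanced min-cut partitioning.

    S: (n, p) soft assignment matrix (each row sums to 1).
    A: (n, n) symmetric adjacency matrix.
    The cut term counts (in expectation) edges whose endpoints land in
    different parts; the balance term penalizes uneven part sizes.
    """
    n, p = S.shape
    same = S @ S.T                          # P(both endpoints in same part)
    expected_cut = 0.5 * np.sum(A * (1.0 - same))
    sizes = S.sum(axis=0)                   # expected size of each part
    balance = np.sum((sizes - n / p) ** 2)
    return expected_cut + balance_weight * balance
```

In the trained setting, S would be produced by a neural network from graph representations, so minimizing this loss over many graphs yields a model that emits partitions for unseen graphs in a single forward pass, rather than re-solving each instance from scratch.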
