Microsoft gives Distributed Machine Learning Toolkit to the open source community

While Microsoft may be seen as the enemy of open source, it does contribute to the cause. In fact, I am comfortable saying that the company embraces open source, although closed-source software will always be its "bread and butter".

Today, the Windows-maker announced that it is making yet another of its projects open source. The Distributed Machine Learning Toolkit (DMTK) looks quite interesting and could prove valuable.

"The toolkit, available now on GitHub, is designed for distributed machine learning -- using multiple computers in parallel to solve a complex problem. It contains a parameter server-based programming framework, which makes machine learning tasks on big data highly scalable, efficient and flexible. It also contains two distributed machine learning algorithms, which can be used to train the fastest and largest topic model and the largest word-embedding model in the world," says George Thomas Jr. of Microsoft.

Thomas further explains, "The toolkit is unique because its features transcend system innovations by also offering machine learning advances, the researchers said. With the toolkit, the researchers said developers can tackle big-data, big-model machine learning problems much faster and with smaller clusters of computers than previously required."

Microsoft describes the following components of the project:

  • DMTK framework: A parameter server, which supports storing a hybrid data-structure model, and a client SDK, which supports scheduling large-scale model training on the client side and maintaining a local model cache that syncs with the server-side model.
  • LightLDA: A new, highly efficient algorithm for topic model training that can process large-scale data and models even on a modest computer cluster.
  • Distributed Word Embedding: Word embedding is a popular tool in natural language processing, and the toolkit offers distributed implementations of two word-embedding algorithms: the standard Word2vec algorithm and a multi-sense algorithm that learns multiple embedding vectors for polysemous words.
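
To make the parameter server/client cache idea more concrete, here is a minimal toy sketch in Python. It is not DMTK's API: the class and method names (ParameterServer, Client, pull, push, add, sync) are hypothetical, everything runs in one process, and threads stand in for separate machines. The server holds the global model; each client reads through a local cache, buffers its updates, and periodically pushes them to the server and refreshes its cache.

```python
import threading
from collections import defaultdict


class ParameterServer:
    """Hypothetical toy server holding the global model as key -> float."""

    def __init__(self):
        self._model = defaultdict(float)
        self._lock = threading.Lock()

    def pull(self, keys):
        # Return a snapshot of the requested parameters (missing keys start at 0.0).
        with self._lock:
            return {k: self._model[k] for k in keys}

    def push(self, deltas):
        # Apply additive updates sent by a client.
        with self._lock:
            for k, d in deltas.items():
                self._model[k] += d


class Client:
    """Hypothetical toy client with a local model cache synced to the server."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.pending = defaultdict(float)

    def get(self, key):
        # Serve reads from the local cache, fetching from the server on a miss.
        if key not in self.cache:
            self.cache.update(self.server.pull([key]))
        return self.cache[key]

    def add(self, key, delta):
        # Update the local cache immediately and buffer the delta for the next sync.
        self.get(key)
        self.cache[key] += delta
        self.pending[key] += delta

    def sync(self):
        # Push buffered deltas, then refresh every cached key from the server.
        self.server.push(dict(self.pending))
        self.pending.clear()
        self.cache = self.server.pull(list(self.cache))


def train_shard(server, shard):
    # Stand-in for "training": each worker counts tokens in its own data shard.
    client = Client(server)
    for token in shard:
        client.add(token, 1.0)
    client.sync()


server = ParameterServer()
shards = [["topic", "model", "topic"], ["word", "model"]]
workers = [threading.Thread(target=train_shard, args=(server, s)) for s in shards]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(server.pull(["topic", "model", "word"]))
# {'topic': 2.0, 'model': 2.0, 'word': 1.0}
```

Buffering deltas on the client and sending them in batches is the kind of trade-off a parameter server design makes: workers mostly read from their local cache and talk to the server less often, at the cost of working with slightly stale values between syncs.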


If you want to give it a look, the toolkit is available on GitHub. Microsoft also lists all of its open source projects, which shows just how involved in OSS the company is. Heck, if you want to reach out to Microsoft about open source issues or offerings, you can email it at [email protected].


© 1998-2024 BetaNews, Inc. All Rights Reserved.