Optimization for efficient data communication in distributed machine training system
Gan, Hsien Yan
Date of Issue: 2017-12-12
School of Computer Science and Engineering
The rise of deep learning has caused the complexity and scale of machine learning to grow exponentially. However, this growth is limited by hardware processing speed. To address the issue, several available machine learning frameworks support distributed training across multiple nodes. Compared to interprocess communication, data exchange between nodes is relatively slow and has high latency and high overhead. When the network link is shared among multiple nodes, bandwidth becomes limited, which is even more undesirable. This project minimizes the data flow between nodes by adding a data filter and Snappy compression. The filter removes unnecessary data transfers, while Snappy compresses the remaining data to reduce bandwidth consumption. This implementation reduces the data flow to 8 percent of the original and decreases training time to 76 percent of the original. Because of the low bandwidth requirement, a distributed system spanning different geographical areas and hardware such as mobile laptops becomes possible.
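The abstract does not say how the data filter decides what is "unnecessary". The following is a minimal sketch of a filter-then-compress pipeline under stated assumptions. It is not the project's implementation. The hypothetical filter sends only parameters whose value changed by more than a threshold since the last transmission. The surviving (index, value) pairs are packed and compressed. Because Snappy is not part of the Python standard library, `zlib` stands in for it here. The function names `filter_update`, `encode`, and `decode` and the `threshold` parameter are all illustrative.

```python
import struct
import zlib

def filter_update(current, last_sent, threshold=1e-3):
    """Hypothetical filter: keep only (index, value) pairs whose change since
    the last transmitted value exceeds `threshold`. Smaller changes are
    dropped, so the scheme is lossy for sub-threshold updates."""
    return [(i, v) for i, (v, old) in enumerate(zip(current, last_sent))
            if abs(v - old) > threshold]

def encode(pairs):
    """Pack pairs as little-endian uint32 index + float32 value, then compress.
    zlib is a stand-in for Snappy, which is not in the standard library."""
    raw = b"".join(struct.pack("<If", i, v) for i, v in pairs)
    return zlib.compress(raw)

def decode(payload, last_sent):
    """Receiver side: decompress, unpack, and apply the sparse updates."""
    updated = list(last_sent)
    for i, v in struct.iter_unpack("<If", zlib.decompress(payload)):
        updated[i] = v
    return updated

# Usage: a 1000-parameter vector in which only two entries changed.
last_sent = [0.0] * 1000
current = list(last_sent)
current[3] = 0.5
current[700] = -0.25

pairs = filter_update(current, last_sent)
payload = encode(pairs)
received = decode(payload, last_sent)

dense_size = len(struct.pack("<1000f", *current))  # unfiltered float32 payload
print(len(pairs), len(payload), dense_size)
```

In this sketch the sender should update its own `last_sent` only for the indices it actually transmitted. Otherwise, sub-threshold drift would never be sent. Snappy trades compression ratio for speed, which is why it suits a training loop where compression time adds to every exchange. `zlib` is used above only so that the example runs without third-party packages.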
DRNTU::Engineering::Computer science and engineering
Final Year Project (FYP)
Nanyang Technological University