Selected Publications
Deep Learning Systems
Zhang, Chengming, Tong Geng, Anqi Guo, Jiannan Tian, Martin Herbordt, Ang Li, and Dingwen Tao. “H-GCN: A graph convolutional network accelerator on Versal ACAP architecture.” In 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL), pp. 200-208. IEEE, 2022.
Zhang, Chengming, Geng Yuan, Wei Niu, Jiannan Tian, Sian Jin, Donglin Zhuang, Zhe Jiang et al. “ClickTrain: Efficient and accurate end-to-end deep learning training via fine-grained architecture-preserving pruning.” In Proceedings of the ACM International Conference on Supercomputing, pp. 266-278. 2021.
Zhang, Chengming, Shaden Smith, Baixi Sun, Jiannan Tian, Jonathan Soifer, Xiaodong Yu, Shuaiwen Leon Song, Yuxiong He, and Dingwen Tao. “HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs.” In Proceedings of the 37th International Conference on Supercomputing, pp. 324-335. 2023.
Jin, Sian, Chengming Zhang, Xintong Jiang, Yunhe Feng, Hui Guan, Guanpeng Li, Shuaiwen Leon Song, and Dingwen Tao. “COMET: A novel memory-efficient deep learning training framework by using error-bounded lossy compression.” arXiv preprint arXiv:2111.09562 (2021).
Xiang, Lizhi, Miao Yin, Chengming Zhang, Aravind Sukumaran-Rajam, P. Sadayappan, Bo Yuan, and Dingwen Tao. “TDC: Towards extremely efficient CNNs on GPUs via hardware-aware Tucker decomposition.” In Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pp. 260-273. 2023.
Xiao, Jinqi, Chengming Zhang, Yu Gong, Miao Yin, Yang Sui, Lizhi Xiang, Dingwen Tao, and Bo Yuan. “HALOC: Hardware-aware automatic low-rank compression for compact neural networks.” In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 9, pp. 10464-10472. 2023.
Dong, Peiyan, Siyue Wang, Wei Niu, Chengming Zhang, Sheng Lin, Zhengang Li, Yifan Gong, Bin Ren, Xue Lin, and Dingwen Tao. “RTMobile: Beyond real-time mobile acceleration of RNNs for speech recognition.” In 2020 57th ACM/IEEE Design Automation Conference (DAC), pp. 1-6. IEEE, 2020.
Data Reduction
Zhang, Chengming, Sian Jin, Tong Geng, Jiannan Tian, Ang Li, and Dingwen Tao. “CEAZ: Accelerating parallel I/O via hardware-algorithm co-designed adaptive lossy compression.” In Proceedings of the 36th ACM International Conference on Supercomputing, pp. 1-13. 2022.
Tian, Jiannan, Sheng Di, Chengming Zhang, Xin Liang, Sian Jin, Dazhao Cheng, Dingwen Tao, and Franck Cappello. “waveSZ: A hardware-algorithm co-design of efficient lossy compression for scientific data.” In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 74-88. 2020.