1 code implementation • 16 Apr 2024 • TaeHo Kim, Yanming Wang, Vatshank Chaturvedi, Lokesh Gupta, Seyeon Kim, Yongin Kwon, Sangtae Ha
Fine-tuning pre-trained large language models (LLMs) with limited hardware presents challenges due to GPU memory constraints.
no code implementations • 6 Apr 2023 • Rafael Sousa, Marcio Pereira, Yongin Kwon, TaeHo Kim, Namsoon Jung, Chang Soo Kim, Michael Frank, Guido Araujo
Although code generation for Convolution Neural Network (CNN) models has been extensively studied, performing efficient data slicing and parallelization for highly-constrained Multicore Neural Processor Units (NPUs) is still a challenging problem.
1 code implementation • 22 Mar 2023 • Jemin Lee, Yongin Kwon, Sihyeong Park, Misun Yu, Jeman Park, Hwanjun Song
For mobile devices, achieving optimal acceleration for ViTs necessitates the strategic integration of quantization techniques and efficient hybrid transformer structures.
1 code implementation • 4 Jul 2022 • Yongin Kwon, Jemin Lee, TaeHo Kim, Sangtae Ha
We propose CPrune, a compiler-informed model pruning framework for efficient target-aware DNN execution that supports an application with a required target accuracy.
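CPrune's full pipeline couples compiler tuning with pruning decisions; as a loose illustration of the structured-pruning building block only (not the paper's algorithm), here is magnitude-based channel pruning in NumPy. The function name and L1-norm ranking criterion are our assumptions for the sketch.

```python
import numpy as np

def prune_channels(weight, keep_ratio=0.5):
    """Structured-pruning sketch: rank the output channels of a conv
    weight tensor (out_ch, in_ch, kh, kw) by L1 norm and keep only the
    strongest fraction.

    Illustrative building block only -- CPrune additionally uses
    compiler feedback (e.g. measured kernel performance on the target)
    to decide which parts of the network to prune; that loop is not
    shown here.
    """
    # L1 norm of each output channel, flattened over the other axes.
    l1 = np.abs(weight).reshape(weight.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(weight.shape[0] * keep_ratio)))
    # Indices of the strongest channels, kept in their original order.
    keep = np.sort(np.argsort(l1)[::-1][:n_keep])
    return weight[keep], keep

# Usage: prune half of the 8 output channels of a random 3x3 conv layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))
w_pruned, kept = prune_channels(w, keep_ratio=0.5)
```

In a real target-aware setting, `keep_ratio` would be chosen per layer so that the compiled model meets the latency budget while staying above the required accuracy.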
no code implementations • 10 Feb 2022 • Jemin Lee, Misun Yu, Yongin Kwon, TaeHo Kim
To adopt convolutional neural networks (CNN) for a range of resource-constrained targets, it is necessary to compress the CNN models by performing quantization, whereby high-precision representations are converted to lower-bit ones.
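The conversion to a lower-bit representation can be sketched with per-tensor affine (uniform) quantization; this is a generic illustration of the idea, not the specific scheme studied in the paper, and the function names and choice of 8-bit unsigned integers are our assumptions.

```python
import numpy as np

def quantize_uniform(x, num_bits=8):
    """Map a float tensor onto a lower-bit integer grid.

    Affine (asymmetric) quantization with one scale and zero-point for
    the whole tensor: q = round(x / scale) + zero_point, clipped to the
    integer range.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    # Guard against a constant tensor, where the range would be zero.
    scale = (x_max - x_min) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an approximate float tensor from the integer grid."""
    return (q.astype(np.float32) - zero_point) * scale

# Usage: round-trip a small float32 tensor through 8-bit integers.
x = np.linspace(-1.0, 1.0, 5, dtype=np.float32)
q, s, zp = quantize_uniform(x, num_bits=8)
x_hat = dequantize(q, s, zp)
```

The round-trip error is bounded by the scale (the grid step), which is the precision traded away for the smaller bit-width.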