Search Results for author: Wonkyo Choe

Found 1 paper, 0 papers with code

STI: Turbocharge NLP Inference at the Edge via Elastic Pipelining

no code implementations • 11 Jul 2022 • Liwei Guo, Wonkyo Choe, Felix Xiaozhu Lin

Yet, the unprecedented size of an NLP model stresses both latency and memory, creating a tension between the two key resources of a mobile device.

Task: Management
