Offloading AI inference to your users’ devices

This content originally appeared on DEV Community and was authored by Miguel Ángel Cabrera Miñagorri

Integrating LLMs into existing web applications is becoming the norm, and more and more AI-native companies are emerging. These companies build autonomous agents that put the LLM at the center and give it tools that let it perform actions on different systems.

In this post I will present a new project called Offload, which lets you move all of that processing to your users' devices, increasing data privacy and reducing inference costs.

The two problems

There are two big concerns when integrating AI into an application: cost and user data privacy.

1. Cost. The typical way to connect to an LLM is through a third-party API, such as OpenAI's or Anthropic's, and there are many alternatives on the market. These APIs are very practical: with a single HTTP request you can integrate an LLM into your application. However, they are expensive at scale. Providers are putting real effort into lowering prices, but if you make many API calls per user per day, the bill becomes huge (see the rough calculation after this list).

2. User data privacy. Third-party inference APIs are not the best option if you work with sensitive user data. These APIs often use the data you send to continue training their models, which can expose your confidential data. The data can also become visible at some point once it reaches the provider (for example, in a logging system). This is a problem not only for companies but also for consumers who may not want to send their data to those API providers.
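To put the cost point in perspective, here is a rough back-of-envelope calculation; the prices and volumes are illustrative assumptions, not figures from any provider's price list. At $0.01 per 1,000 tokens, a user who triggers 50 calls per day at about 2,000 tokens each costs 50 × 2,000 / 1,000 × $0.01 = $1 per day, or roughly $30 per month, for that one user alone. The bill scales linearly with your user base, so 10,000 active users would put you near $300,000 per month.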

Addressing them

Offload addresses both problems at once. The application "invokes" the LLM through an SDK that, behind the scenes, runs the model directly on each user's device instead of calling a third-party API. This cuts the inference bill, since there is no API usage to pay for, and it keeps user data on each user's device, since nothing has to be sent to an external API.
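To make the idea concrete, here is a minimal sketch of in-browser inference using the open-source Transformers.js library. This is not Offload's actual SDK (the post does not show its API); the model name, prompt, and generation options are illustrative assumptions.

// Minimal sketch of on-device inference with Transformers.js
// (used here for illustration; Offload's real SDK may look different).
import { pipeline } from "@xenova/transformers";

// The model weights are downloaded once and cached by the browser;
// every call after that runs locally, so the prompt never leaves
// the user's device.
const generator = await pipeline("text-generation", "Xenova/distilgpt2");

// Run inference on-device; the prompt and options are placeholders.
const output = await generator("Summarize this note: ...", {
  max_new_tokens: 64,
});
console.log(output[0].generated_text);

An SDK like Offload presumably wraps a local runtime of this kind behind a single call, so the application code reads like an ordinary LLM invocation while execution stays on the device.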

If this interests you and you want to stay in the loop, check out the Offload website.

