This content originally appeared on DEV Community and was authored by Thomas Künneth
Welcome to the second part of On building a digital assistant for the rest of us. Last time, I explained a couple of terms and showed you the first increment of viewfAInder, the project we will be building in this series. That version uses camerax to obtain continuous preview images. Tapping on the screen sends the image to Gemini and asks the LLM to provide a detailed description. In this part of the series, we refine the user experience: the user will be able to highlight an area of the image. If they do so, the app will ask Gemini to focus on the selection.
Before we dive in, allow me to highlight the power of Gemini by pointing out an omission in my source code. The following snippet shows how camerax is used to obtain a preview and an image analyzer (which provides the images we are sending to the LLM).
val previewView = PreviewView(ctx)
val executor = ContextCompat.getMainExecutor(ctx)
cameraProviderFuture.addListener({
    val cameraProvider = cameraProviderFuture.get()
    val preview = Preview.Builder().build().also {
        it.setSurfaceProvider(previewView.surfaceProvider)
    }
    val imageAnalyzer = ImageAnalysis.Builder()
        .setBackpressureStrategy(
            ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST
        ).build().also {
            it.setAnalyzer(executor) { imageProxy ->
                setBitmap(imageProxy.toBitmap())
                imageProxy.close()
            }
        }
    try {
        cameraProvider.unbindAll()
        cameraProvider.bindToLifecycle(
            lifecycleOwner, CameraSelector.DEFAULT_BACK_CAMERA,
            preview, imageAnalyzer
        )
    } catch (e: Exception) {
        // Handle exceptions, e.g., log the error
    }
}, executor)
It works pretty well. However, while debugging the ViewModel, I found out that I had missed something: while the preview works flawlessly, the analysis image isn't rotated properly. Still, Gemini can describe its contents. Pretty cool. Anyway, let's fix it.
Aligning the device orientation with the camera sensor orientation isn't one of the most self-explanatory tasks on Android. The appearance of foldables makes it even more challenging. Google provides great guidance on how to deal with it properly. To make sure our fix works, we can add a preview of the analysis image. The ViewModel needs just one line of code.
val bitmap = _bitmap.asStateFlow()
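For context, here is a minimal sketch of how the surrounding ViewModel code might look. Only the asStateFlow() line above appears in the article; the backing MutableStateFlow and the setBitmap() implementation are assumptions based on how the value is consumed elsewhere.

// Sketch (assumption): a private MutableStateFlow backs the public read-only flow.
private val _bitmap = MutableStateFlow<Bitmap?>(null)
val bitmap = _bitmap.asStateFlow()

// Sketch (assumption): called from the CameraX analyzer with the latest frame.
fun setBitmap(bitmap: Bitmap?) {
    _bitmap.update { bitmap }
}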
We can make use of this property in MainScreen(). Instead of
else -> {}
we would have
is UiState.Initial -> {
    val bitmap by viewModel.bitmap.collectAsState()
    bitmap?.let {
        Image(
            bitmap = it.asImageBitmap(),
            contentDescription = null,
            contentScale = ContentScale.Inside,
            modifier = Modifier
                .align(Alignment.TopStart)
                .safeContentPadding()
                .size(200.dp)
        )
    }
}
Now let's turn to the rotation fix. While browsing through my source code, you may have wondered why quite a bit of camerax setup is done inside the CameraPreview() composable. Doing a tutorial is always a trade-off. Sometimes you need to cut corners to keep things comprehensible. While configuring camerax inside the factory function of AndroidView works, onCreate() is a more natural place. Here's the refactored code:
override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    val executor = ContextCompat.getMainExecutor(this)
    val previewView = PreviewView(this)
    val future = ProcessCameraProvider.getInstance(this)
    enableEdgeToEdge()
    setContent {
        MaterialTheme {
            Surface(
                modifier = Modifier.fillMaxSize(),
                color = MaterialTheme.colorScheme.background,
            ) {
                val hasCameraPermission by
                    cameraPermissionFlow.collectAsState()
                val mainViewModel: MainViewModel = viewModel()
                val uiState by mainViewModel.uiState.collectAsState()
                LaunchedEffect(future) {
                    setupCamera(
                        future = future,
                        lifecycleOwner = this@MainActivity,
                        previewView = previewView,
                        executor = executor,
                        rotation = display.rotation
                    ) { mainViewModel.setBitmap(it) }
                }
                MainScreen(
                    uiState = uiState,
                    hasCameraPermission = hasCameraPermission,
                    previewView = previewView,
                    askGemini = { mainViewModel.askGemini() },
                    reset = { mainViewModel.reset() }
                )
            }
        }
    }
}
All variables related to camerax are now defined inside onCreate() and passed to setupCamera().
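With the setup gone from the composable, CameraPreview() can shrink considerably. The article doesn't show the trimmed-down version, but it might look roughly like this, assuming it now merely embeds the PreviewView created in onCreate():

// Sketch (assumption): CameraPreview() now only hosts the PreviewView.
@Composable
fun CameraPreview(previewView: PreviewView) {
    AndroidView(
        factory = { previewView },
        modifier = Modifier.fillMaxSize()
    )
}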
private fun setupCamera(
    future: ListenableFuture<ProcessCameraProvider>,
    lifecycleOwner: LifecycleOwner,
    previewView: PreviewView,
    executor: Executor,
    rotation: Int,
    setBitmap: (Bitmap?) -> Unit
) {
    future.addListener({
        val cameraProvider = future.get()
        val preview = Preview.Builder().build().also {
            it.setSurfaceProvider(previewView.surfaceProvider)
        }
        val imageAnalyzer = ImageAnalysis.Builder()
            .setBackpressureStrategy(
                ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST
            )
            .build()
            .also {
                it.targetRotation = rotation
                it.setAnalyzer(executor) { imageProxy ->
                    val matrix = Matrix().also { matrix ->
                        matrix.postRotate(
                            imageProxy.imageInfo.rotationDegrees.toFloat()
                        )
                    }
                    val bitmap = imageProxy.toBitmap()
                    val rotatedBitmap = Bitmap.createBitmap(
                        bitmap, 0, 0, bitmap.width, bitmap.height, matrix, true
                    )
                    setBitmap(rotatedBitmap)
                    imageProxy.close()
                }
            }
        try {
            cameraProvider.unbindAll()
            cameraProvider.bindToLifecycle(
                lifecycleOwner,
                CameraSelector.DEFAULT_BACK_CAMERA,
                preview, imageAnalyzer
            )
        } catch (e: Exception) {
            // Handle exceptions, e.g., log the error
        }
    }, executor)
}
Can you spot the rotation fix? Passing the device rotation (rotation) to setupCamera() and assigning it to targetRotation makes sure that imageProxy.imageInfo.rotationDegrees provides the correct value for matrix.postRotate(). The final step is to create a new bitmap based on the old one, applying a matrix that performs the rotation.
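If you prefer to keep the analyzer lambda short, the rotation logic could also be pulled into a small extension function. This is just a possible refactoring of the code above, not something the article does:

// Sketch: the same rotation logic as above, extracted into an ImageProxy extension.
private fun ImageProxy.toRotatedBitmap(): Bitmap {
    val source = toBitmap()
    val degrees = imageInfo.rotationDegrees.toFloat()
    if (degrees == 0F) return source
    val matrix = Matrix().apply { postRotate(degrees) }
    return Bitmap.createBitmap(source, 0, 0, source.width, source.height, matrix, true)
}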
Drawing shapes
To implement a Circle to Search-like feature, the user must be able to draw on the screen. When the shape (circle, box) is complete, the app must apply it to the analysis bitmap and then send it to Gemini. Here's how a simple drawing area could be implemented:
@Composable
fun DrawingArea(drawComplete: (IntSize, List<Offset>) -> Unit) {
    val points = remember { mutableStateListOf<Offset>() }
    Canvas(modifier = Modifier
        .fillMaxSize()
        .pointerInput(Unit) {
            awaitPointerEventScope {
                while (true) {
                    val event = awaitPointerEvent()
                    val touch = event.changes.first()
                    points.add(touch.position)
                    if (!touch.pressed) {
                        if (points.size > 2) {
                            drawComplete(size, points.toList())
                        }
                        points.clear()
                    }
                }
            }
        }) {
        if (points.size > 2) {
            drawPath(
                path = Path().apply {
                    moveTo(points[0].x, points[0].y)
                    for (i in 1..points.lastIndex) {
                        lineTo(points[i].x, points[i].y)
                    }
                    close()
                },
                color = DRAWING_COLOR,
                style = Stroke(width = STROKE_WIDTH)
            )
        } else {
            points.forEach { point ->
                drawCircle(
                    color = DRAWING_COLOR,
                    center = point,
                    radius = STROKE_WIDTH / 2F
                )
            }
        }
    }
}
Using Canvas(), we draw a closed path (drawPath()) once we have received at least three points (represented by Offset instances). Until then, we plot individual circles (drawCircle()). When the user stops drawing, we pass the list to a callback (drawComplete). In addition, we need to pass the measured size of the pointer input region, because camerax analysis bitmaps may have a different size than the preview. Here's how the drawing area is composed and what happens with the list of Offsets:
DrawingArea { size, offsets ->
    viewModel.getCopyOfBitmap()?.let {
        val xRatio = it.width.toFloat() / size.width.toFloat()
        val yRatio = it.height.toFloat() / size.height.toFloat()
        val scaledOffsets = offsets.map { point ->
            PointF(point.x * xRatio, point.y * yRatio)
        }
        val canvas = Canvas(it)
        val path = android.graphics.Path()
        if (scaledOffsets.isNotEmpty()) {
            path.moveTo(scaledOffsets[0].x, scaledOffsets[0].y)
            for (i in 1 until scaledOffsets.size) {
                path.lineTo(scaledOffsets[i].x, scaledOffsets[i].y)
            }
            path.close()
        }
        canvas.drawPath(path, Paint().apply {
            style = Paint.Style.STROKE
            strokeWidth = STROKE_WIDTH
            color = DRAWING_COLOR.toArgb()
        })
        viewModel.askGemini(it)
    }
}
Once the list of Offsets has been scaled to fit the bitmap size, a Path instance is created, populated (moveTo(), lineTo()), and closed. It is then drawn onto a Canvas that is backed by the bitmap. askGemini() sends the bitmap, including the drawing, to Gemini.
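getCopyOfBitmap() isn't shown in the article. Since the path is drawn directly onto the returned bitmap, the ViewModel presumably hands out a mutable copy of the latest analysis frame; here is a sketch under that assumption:

// Sketch (assumption): return a mutable copy so the caller can draw on it
// without touching the bitmap that backs the small analysis preview.
fun getCopyOfBitmap(): Bitmap? =
    _bitmap.value?.copy(Bitmap.Config.ARGB_8888, true)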
Talking to Gemini
The askGemini() function immediately calls this one:
private fun sendPrompt(bitmap: Bitmap) {
    _uiState.update { UiState.Loading }
    viewModelScope.launch(Dispatchers.IO) {
        try {
            val response = generativeModel.generateContent(content {
                image(bitmap)
                text(prompt)
            })
            response.text?.let { outputContent ->
                _uiState.value = UiState.Success(outputContent)
            }
        } catch (e: Exception) {
            _uiState.value = UiState.Error(e.localizedMessage ?: "")
        }
    }
}
We pass just two things: the bitmap and the prompt. The latter looks like this:
private const val prompt = """
Please describe what is contained inside the area of the image
that is surrounded by a red line. If possible, add web links with
additional information
"""
This concludes the second part of this series about building a digital assistant for the rest of us. The GitHub repo contains two tags, part_one and part_two. Development takes place on main. The next part will further refine the user interface and install the app as an assistant on Android.