This content originally appeared on DEV Community and was authored by Dhanush Reddy
What is ComfyUI?
ComfyUI is a powerful and flexible user interface for Stable Diffusion, allowing users to create complex image generation workflows through a node-based system. While ComfyUI comes with a variety of built-in nodes, its true strength lies in its extensibility. Custom nodes enable users to add new functionality, integrate external services, and tailor it to their specific needs.
In this blog post, we will walk through the process of creating a custom node for image captioning using ComfyUI. This node will take an image as input and return a generated caption using an external API.
We will use the Google Gemini API to generate the caption for an image.
Here is the complete code that captions an image using the Gemini API.
You can copy it into any file under the custom_nodes folder in ComfyUI; I named mine gemini-caption.py.
Complete code for Generating Image Captions
import numpy as np
from PIL import Image
import requests
import io
import base64


class ImageCaptioningNode:
    @classmethod
    def INPUT_TYPES(s):
        return {
            "required": {"image": ("IMAGE",), "api_key": ("STRING", {"default": ""})}
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "caption_image"
    CATEGORY = "image"
    OUTPUT_NODE = True

    def caption_image(self, image, api_key):
        # Convert the image tensor to a PIL Image
        image = Image.fromarray(
            np.clip(255.0 * image.cpu().numpy().squeeze(), 0, 255).astype(np.uint8)
        )

        # Convert the image to base64
        buffered = io.BytesIO()
        image.save(buffered, format="PNG")
        img_str = base64.b64encode(buffered.getvalue()).decode()

        api_url = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key={api_key}"

        payload = {
            "contents": [
                {
                    "parts": [
                        {
                            "text": "Generate a caption for this image in as detail as possible. Don't send anything else apart from the caption."
                        },
                        {"inline_data": {"mime_type": "image/png", "data": img_str}},
                    ]
                }
            ]
        }

        # Send the request to the Gemini API
        try:
            response = requests.post(api_url, json=payload)
            response.raise_for_status()
            caption = response.json()["candidates"][0]["content"]["parts"][0]["text"]
        except requests.exceptions.RequestException as e:
            caption = f"Error: Unable to generate caption. {str(e)}"

        print(caption)
        return (caption,)


NODE_CLASS_MAPPINGS = {"ImageCaptioningNode": ImageCaptioningNode}
Here is how the node looks on the UI:
Let's go over it piece by piece to understand how you can create a similar node for your own use case. First, write whatever your node should do as a function, so that ComfyUI can call it the same way it calls my caption_image function.
Import the necessary libraries
import numpy as np
from PIL import Image
import requests
import io
import base64
These lines import the libraries my image captioning node needs:
- numpy for numerical operations
- PIL (Python Imaging Library) for image processing
- requests for making HTTP requests to the Gemini API
- io for handling byte streams
- base64 for encoding the image
Defining the class name for your ComfyUI node
class ImageCaptioningNode:
    @classmethod
    def INPUT_TYPES(s):
        return {
            "required": {"image": ("IMAGE",), "api_key": ("STRING", {"default": ""})}
        }
In my case, I have named it ImageCaptioningNode, as it does what it says.
The class method defines the input types for our node:
- An "image" input of type "IMAGE"
- An "api_key" input of type "STRING" with a default empty value, needed for sending API requests to Gemini API.
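As a hedged illustration of the same structure, the sketch below adds an optional text input alongside the required ones. The class name CaptionInputsSketch, the "prompt" input, and its default text are hypothetical and not part of the original node. The "optional" group and the "multiline" option are ComfyUI conventions as I understand them, so check them against your ComfyUI version.

```python
class CaptionInputsSketch:
    """Hypothetical node skeleton; only the input declaration is shown."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            # Inputs that must be connected or filled in before the node runs.
            "required": {
                "image": ("IMAGE",),
                "api_key": ("STRING", {"default": ""}),
            },
            # Inputs the user may leave unset (assumed ComfyUI convention).
            "optional": {
                "prompt": (
                    "STRING",
                    {"default": "Describe this image.", "multiline": True},
                ),
            },
        }


inputs = CaptionInputsSketch.INPUT_TYPES()
print(sorted(inputs["required"]))
print(inputs["optional"]["prompt"][0])
```

Each entry maps an input name to a tuple of the type name and, optionally, a dictionary of widget settings.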
RETURN_TYPES = ("STRING",)
FUNCTION = "caption_image"
CATEGORY = "image"
OUTPUT_NODE = True
These class variables define:
- The return type (a string)
- The main function to be called ("caption_image")
- The category in which the node will appear in ComfyUI
- That this node can be an output node
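To make the link between these variables concrete, here is a minimal sketch of how a host program could use FUNCTION to find and call the method, and how the returned tuple lines up with RETURN_TYPES. The EchoNode class and the run_node helper are hypothetical stand-ins. This is not ComfyUI's actual execution code, only an illustration of the convention.

```python
class EchoNode:
    """Hypothetical node used only to illustrate the class variables."""

    RETURN_TYPES = ("STRING", "INT")
    FUNCTION = "echo"
    CATEGORY = "utils"

    def echo(self, text):
        # One value per entry in RETURN_TYPES, in the same order.
        return (text, len(text))


def run_node(node_class, **inputs):
    """Look up the method named by FUNCTION and call it (illustrative only)."""
    node = node_class()
    method = getattr(node, node_class.FUNCTION)
    outputs = method(**inputs)
    if len(outputs) != len(node_class.RETURN_TYPES):
        raise ValueError("output count does not match RETURN_TYPES")
    return outputs


result = run_node(EchoNode, text="hello")
print(result)  # ('hello', 5)
```

This is why caption_image returns (caption,) with a trailing comma: the node declares one output, so it returns a one-element tuple.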
def caption_image(self, image, api_key):
    # Convert the image tensor to a PIL Image
    image = Image.fromarray(
        np.clip(255.0 * image.cpu().numpy().squeeze(), 0, 255).astype(np.uint8)
    )

    # Convert the image to base64
    buffered = io.BytesIO()
    image.save(buffered, format="PNG")
    img_str = base64.b64encode(buffered.getvalue()).decode()

    api_url = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key={api_key}"

    # Prepare the request payload
    payload = {
        "contents": [
            {
                "parts": [
                    {
                        "text": "Generate a caption for this image in as detail as possible. Don't send anything else apart from the caption."
                    },
                    {"inline_data": {"mime_type": "image/png", "data": img_str}},
                ]
            }
        ]
    }

    try:
        response = requests.post(api_url, json=payload)
        response.raise_for_status()
        caption = response.json()["candidates"][0]["content"]["parts"][0]["text"]
    except requests.exceptions.RequestException as e:
        caption = f"Error: Unable to generate caption. {str(e)}"

    print(caption)
    return (caption,)
This is the function that does the actual work. It takes an image as input and sends it to the Gemini API using the API key. The code is straightforward: the image tensor is converted to a PNG and base64-encoded so it can be sent inside the JSON request, and the prompt instructs Gemini to caption the image in detail. The API response is parsed, and the caption is printed to the console and returned as a tuple (required by ComfyUI).
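One case the try/except above does not cover is a response that arrives successfully but lacks the expected fields, since a missing key raises KeyError rather than a requests exception. The sketch below shows one way to build the payload and read the caption defensively. The helper names build_payload and extract_caption are hypothetical, and the response shape is assumed to match what the code above indexes into.

```python
import base64


def build_payload(png_bytes, prompt):
    """Build the request body in the same shape the node above sends."""
    img_str = base64.b64encode(png_bytes).decode()
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": "image/png", "data": img_str}},
                ]
            }
        ]
    }


def extract_caption(response_json):
    """Return the caption text, or an error string if a field is missing."""
    try:
        return response_json["candidates"][0]["content"]["parts"][0]["text"]
    except (KeyError, IndexError, TypeError):
        return "Error: Unable to generate caption. Unexpected response format."


# Placeholder bytes stand in for real PNG data in this sketch.
payload = build_payload(b"fake-png-bytes", "Caption this image.")
ok = extract_caption({"candidates": [{"content": {"parts": [{"text": "A cat."}]}}]})
bad = extract_caption({"candidates": []})
print(ok)
print(bad)
```

Returning an error string instead of raising keeps the behaviour consistent with the node above, which also reports failures as the caption text.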
NODE_CLASS_MAPPINGS = {"ImageCaptioningNode": ImageCaptioningNode}
This dictionary maps the class name to the class itself, which is used by ComfyUI to register the custom node.
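ComfyUI also reads an optional NODE_DISPLAY_NAME_MAPPINGS dictionary from the same file to show a friendlier title in the UI. The sketch below uses a placeholder class so it runs on its own. The display name "Gemini Image Caption" is my own choice for illustration and is not something the original node defines.

```python
class ImageCaptioningNode:
    """Placeholder standing in for the real node class defined earlier."""


# Internal identifier -> node class (used to register the node).
NODE_CLASS_MAPPINGS = {"ImageCaptioningNode": ImageCaptioningNode}

# Internal identifier -> title shown in the UI (hypothetical title).
NODE_DISPLAY_NAME_MAPPINGS = {"ImageCaptioningNode": "Gemini Image Caption"}

# Every display name should refer to a registered identifier.
missing = set(NODE_DISPLAY_NAME_MAPPINGS) - set(NODE_CLASS_MAPPINGS)
print(missing)
```

Keeping the keys of both dictionaries identical avoids a display name that points at a node which was never registered.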
Conclusion:
Creating custom nodes for ComfyUI opens up a world of possibilities for extending and enhancing your image generation workflows. In this article, we've walked through the process of building a custom image captioning node, demonstrating how to:
- Define input and output types
- Integrate with external APIs (in this case, the Gemini API for image captioning)
By following these steps, you can create your own custom nodes to add virtually any functionality you need to ComfyUI. Whether you're integrating new LLM models, adding specialized image processing techniques, or creating shortcuts for common tasks, custom nodes allow you to tailor ComfyUI to your specific requirements.
Remember that while we've focused on image captioning in this example, the same principles can be applied to create nodes for a wide variety of tasks. The key is to understand the structure of a ComfyUI node and how to interface with the expected inputs and outputs.
If you still have any questions about this post or want to discuss something with me, feel free to connect on LinkedIn or Twitter.
If you run an organization and want me to write for you, please connect with me on my Socials 🙃
Dhanush Reddy | Sciencx (2024-07-28T11:16:05+00:00) How to create custom nodes in ComfyUI. Retrieved from https://www.scien.cx/2024/07/28/how-to-create-custom-nodes-in-comfyui/