multimodal-ai

Here are 27 public repositories matching this topic...

LocalineAI

LocalineAI brings powerful AI capabilities directly to your Windows terminal while keeping your data completely private and secure. No cloud dependencies, no data sharing - just pure AI power at your fingertips.

  • Updated May 24, 2025

AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing — perfect for creators & smart automation.

  • Updated May 31, 2025
  • Python

Generative AI (Gen AI) is a branch of artificial intelligence that creates new content such as text, images, audio, or code using models like GPT or Gemini. It powers applications like AI chatbots, image generation tools, and creative assistants across various industries.

  • Updated Apr 23, 2025
  • Jupyter Notebook

This repository contains code for fine-tuning Google's PaliGemma vision-language model on the Flickr8k dataset for image captioning tasks.

  • Updated May 25, 2025
  • Jupyter Notebook

Teaches you to integrate text, images, and videos into applications using Gemini's state-of-the-art multimodal models. Learn advanced prompting techniques, cross-modal reasoning, and how to extend Gemini's capabilities with real-time data and API integration.

  • Updated Sep 2, 2024
