Friday, February 28, 2025

Create AI agents in Google Apps Script with Vertex AI and Gemini




Imagine writing in plain English what you want to do in Google Workspace (e.g. workflows) and it happens just like magic: you enter a text prompt, Gemini generates the code for you, and it runs immediately. A dream? No, reality, thanks to my conceptual and practical idea of how to implement AI agents in Google Apps Script, leveraging the V8 runtime.



Google Apps Script


Google Apps Script lets you connect and automate Google Workspace services (like Gmail, Docs, and Drive) by writing JavaScript code in your browser, without needing a separate server. Until now, you had to know programming to create that code.

Gemini


Large Language Models (LLMs) like Gemini are revolutionizing how we interact with technology. Gemini can convert natural language instructions ("prompts") into executable code. Imagine simply describing what you want to automate in plain English, and Gemini generates the code for you.

Vertex AI

Google Cloud's Vertex AI platform offers powerful tools for working with AI and machine learning models. The Vertex AI Reasoning Engine is a particularly interesting part of it: its code interpreter runs generated code much as it would run in a local environment. However, development requires a Python environment, and integrating it into Google Workspace via APIs adds complexity. While testing it, I wondered: could it be simpler?

Introducing AI Agents for Google Apps Script

In this project, I explored a concept for AI agents. You describe your need in natural language; this is passed to Gemini, specifically Gemini 2.0 Flash Thinking, to generate the necessary code. The generated code is then fed back into the Google Apps Script environment, where it can execute as part of the script.

You should double-check the generated code before executing it. To mitigate this risk, a "dry run" function is included. It sends the generated code to a smaller model, Gemini 2.0 Flash, which compares what the code does against the original task and presents the result in plain text for verification.




Explanation

1. Open Google Apps Script: The easiest way to start is to simply visit https://script.new in your browser. This will instantly create a new Apps Script project.



2. You'll need to configure the Apps Script project's manifest file. The manifest defines the settings for your Apps Script project, including the permissions ("scopes") your application needs to access Google services. Carefully select and authorize only the scopes your agent actually needs. If you get a permission error when running, it usually means a required scope is missing and you need to add it.



3. I have prepared a function that calls the Gemini API within Vertex AI.
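A minimal sketch of what such a call can look like from Apps Script; the function and helper names (callVertexGemini, buildGeminiRequest) are my own, not the post's actual code, and the call requires the cloud-platform OAuth scope in the manifest:

```javascript
// Build the generateContent request body (pure function, no Apps Script APIs).
function buildGeminiRequest(prompt) {
  return {
    contents: [{ role: 'user', parts: [{ text: prompt }] }],
    generationConfig: { temperature: 0.2, maxOutputTokens: 8192 }
  };
}

// Call Gemini in Vertex AI and return the first candidate's text.
// Runs only inside Apps Script (UrlFetchApp, ScriptApp).
function callVertexGemini(projectId, location, model, prompt) {
  const url = `https://${location}-aiplatform.googleapis.com/v1/projects/${projectId}` +
      `/locations/${location}/publishers/google/models/${model}:generateContent`;
  const response = UrlFetchApp.fetch(url, {
    method: 'post',
    contentType: 'application/json',
    headers: { Authorization: 'Bearer ' + ScriptApp.getOAuthToken() },
    payload: JSON.stringify(buildGeminiRequest(prompt))
  });
  return JSON.parse(response.getContentText()).candidates[0].content.parts[0].text;
}
```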




Copy the code into a new .gs file within the Google Apps Script project.

This code runs the agent. First, it must be configured with your GCP project and region. Then you describe in natural language what needs to be done. When you call the .act() method, Gemini 2.0 Flash Thinking in Vertex AI is called to generate the code.

To verify that the agent is doing the right thing, you can test the execution via a dry run, where the code is sent to an internal Tester agent that, via Gemini 2.0 Flash, reviews the code and summarizes it in a log.

If everything is in order, you can then run .run(). Remember that you have given the script rights to everything, so if you write something wrong, for example asking it to delete data, the data really will be deleted. I am not responsible for the results of the script; always review the generated code before running it.

The generated code is stored in the cache, so after running a dry run followed by a run, the same version is executed within the cache limit (currently set to 5 minutes).

  
```javascript
// Define the label name
const labelName = 'DEMO';

// Check if the label exists, create if not
let demoLabel = GmailApp.getUserLabels().find(label => label.getName() === labelName); // Find the label by name
if (!demoLabel) { // If the label doesn't exist
  demoLabel = GmailApp.createLabel(labelName); // Create the label
  Logger.log(`Label "${labelName}" created.`); // Log that the label was created
} else { // If the label already exists
  Logger.log(`Label "${labelName}" already exists.`); // Log that the label already exists
}

// Load last 10 emails with subject 'Security alert'
const threads = GmailApp.search('subject:"Security alert"', 0, 10); // Search for threads with the specified subject, starting from the first thread (0) and retrieving a maximum of 10 threads

// Set the 'DEMO' label to found emails
if (threads.length > 0) { // If any threads were found
  threads.forEach(thread => { // Iterate over each thread
    thread.addLabel(demoLabel); // Add the 'DEMO' label to the current thread
  });
  Logger.log(`Label "${labelName}" applied to ${threads.length} emails.`); // Log the number of emails the label was applied to
} else { // If no threads were found
  Logger.log('No emails with subject "Security alert" found in the last 10 threads.'); // Log that no emails were found
}
```

**Summary:**

The code functions exactly as described in the task:

1.  **Creates a new label 'DEMO' if it doesn't exist:** The code first checks if a label named 'DEMO' already exists. If not, it creates the label.
2.  **Loads the last 10 emails with the subject 'Security alert':** The code then searches for the last 10 emails that have the subject 'Security alert'.
3.  **Sets the 'DEMO' label to the found emails:** Finally, the code iterates through the found emails (represented as threads) and applies the 'DEMO' label to each of them.

The code also includes logging statements to provide information about the actions being performed, which is good practice.
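The caching between dry run and run could be sketched like this; the key scheme and function names here are my own assumptions, not the post's actual code:

```javascript
// Derive a stable cache key from the task description (pure function).
// CacheService keys are limited to 250 characters, hence the slice.
function taskCacheKey(task) {
  return 'agent_' + task.trim().toLowerCase().replace(/\s+/g, '_').slice(0, 240);
}

// Return cached code for a task, or generate and cache it for 5 minutes,
// so a dry run followed by a run executes the same generated version.
function getOrGenerateCode(task, generateFn) {
  const cache = CacheService.getScriptCache(); // Apps Script only
  const key = taskCacheKey(task);
  let code = cache.get(key);                   // null if not cached
  if (!code) {
    code = generateFn(task);                   // e.g. a call to Gemini
    cache.put(key, code, 300);                 // 300 s = 5 minutes
  }
  return code;
}
```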
  

Acknowledgments

This project was developed during the Vertex AI sprints organized by Google's Developer Expert program. Thanks, Google, for providing the GCP credits that made this happen. #VertexAISprint

Monday, September 30, 2024

Can AI Watch YouTube for You? Automating Insights with Gemini & Google Apps Script

Have you ever wished you could automatically process YouTube research (24/7, nonstop), summarizing key insights and seamlessly integrating them into your existing knowledge base? 🤪 My favorite author, Yuval Noah Harari, recently wrote a new book, Nexus, and I was curious what he mentioned during his interviews about it. (Spoiler: he says that silicon chips can create spies that never sleep and financiers that never forget.)

 

This tutorial demonstrates how to build a research and archiving agent using my favourite Google Apps Script, the YouTube Data API, and Google's powerful Gemini multimodal model (Gemini 1.5 Pro 002).

This automated system streamlines the process of discovering, summarizing, and archiving information from YouTube videos, allowing you to focus on analysis and synthesis rather than manual transcription and summarization. 

The system is composed of two primary agents: the Researcher and the Librarian. The Researcher searches for relevant videos and saves their metadata. The Librarian processes these videos, generates summaries with Gemini, and appends them to a master Google Docs document.


1. Create a new Google Apps Script project: 
Navigate to https://script.new/ to create a new Google Apps Script project. 

2. Manifest Configuration: In your Apps Script project, go to Project settings and enable Show "appsscript.json" manifest file in editor. Then, replace the contents of the manifest file with the following scopes: 

This manifest grants the script the necessary permissions to interact with various Google services, including:
- YouTube,
- Google Cloud (for Gemini in Vertex AI),
- Google Docs, and
- Google Sheets.
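A manifest along these lines would grant those permissions; the exact scope list and settings here are my assumption, and the original post's manifest may differ:

```json
{
  "timeZone": "Etc/UTC",
  "runtimeVersion": "V8",
  "exceptionLogging": "STACKDRIVER",
  "dependencies": {
    "enabledAdvancedServices": [
      { "userSymbol": "YouTube", "serviceId": "youtube", "version": "v3" }
    ]
  },
  "oauthScopes": [
    "https://www.googleapis.com/auth/youtube.readonly",
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/documents",
    "https://www.googleapis.com/auth/spreadsheets",
    "https://www.googleapis.com/auth/script.external_request"
  ]
}
```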

Google Cloud Project: If you don't already have one, create a 🆕 Google Cloud project at https://console.cloud.google.com/projectcreate.


3. Enable the Vertex AI API and the YouTube Data API.


 4. Configuration Constants: Add the following constants to your script, replacing the placeholders with your actual values: 
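The constants might look roughly like this; all names and values below are illustrative placeholders, not the post's actual code:

```javascript
// Placeholder configuration (replace every value with your own).
const PROJECT_ID = 'your-gcp-project-id';   // GCP project with Vertex AI enabled
const LOCATION = 'us-central1';             // Vertex AI region
const MODEL_ID = 'gemini-1.5-pro-002';      // model named in this post
const SHEET_ID = 'your-spreadsheet-id';     // the "video database" spreadsheet
const DOC_ID = 'your-google-doc-id';        // master summary document
const SEARCH_QUERY = 'Yuval Noah Harari Nexus interview'; // research topic
```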


5. 👨‍🔬 The Researcher Agent (Agent Researcher)
This agent searches YouTube for videos based on a query and stores relevant information in a Google Sheet. This acts as our "video database".
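A sketch of how the Researcher could work using the YouTube advanced service; the function names, column layout, and the configuration constant names from step 4 are my assumptions:

```javascript
// Map a YouTube search result item to a spreadsheet row (pure function).
// The last column is a status flag for the Librarian to pick up.
function rowFromSearchItem(item) {
  return [item.id.videoId, item.snippet.title, item.snippet.publishedAt, 'NEW'];
}

// Search YouTube and append each hit to the "video database" sheet.
// Runs only inside Apps Script with the YouTube advanced service enabled.
function agentResearcher() {
  const results = YouTube.Search.list('id,snippet', {
    q: SEARCH_QUERY,   // from the configuration constants in step 4
    type: 'video',
    order: 'date',
    maxResults: 25
  });
  const sheet = SpreadsheetApp.openById(SHEET_ID).getSheetByName('videos');
  results.items.forEach(item => sheet.appendRow(rowFromSearchItem(item)));
}
```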
 


6. 🤓 The Librarian Agent (Agent Librarian)
The Librarian iterates through the video database, summarizes new videos using Gemini, and appends these summaries to a Google Doc. 
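A sketch of the Librarian's loop; this is my reconstruction, and summarizeVideo is a hypothetical wrapper around the Gemini call from step 7:

```javascript
// Build the canonical watch URL for a video ID (pure function).
function videoUrl(videoId) {
  return 'https://www.youtube.com/watch?v=' + videoId;
}

// Walk the video database, summarize NEW rows, append to the master doc.
// Runs only inside Apps Script.
function agentLibrarian() {
  const sheet = SpreadsheetApp.openById(SHEET_ID).getSheetByName('videos');
  const doc = DocumentApp.openById(DOC_ID).getBody();
  sheet.getDataRange().getValues().forEach((row, i) => {
    const [videoId, title, , status] = row;     // columns from the Researcher
    if (status !== 'NEW') return;               // only process unseen videos
    const summary = summarizeVideo(videoUrl(videoId)); // hypothetical Gemini wrapper
    doc.appendParagraph(title).setHeading(DocumentApp.ParagraphHeading.HEADING2);
    doc.appendParagraph(summary);
    sheet.getRange(i + 1, 4).setValue('DONE');  // mark the row as processed
  });
}
```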




7. Calling Gemini API in Vertex AI
 This function handles the interaction with the Gemini API in Vertex AI. It sends requests and parses responses. 
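A minimal sketch of such a function, assuming the configuration constants from step 4 (PROJECT_ID, LOCATION, MODEL_ID are my own names, not the post's code):

```javascript
// Extract the text of the first candidate from a generateContent response
// (pure helper, easy to unit-test).
function extractText(responseJson) {
  const candidates = responseJson.candidates || [];
  if (!candidates.length) return '';
  return candidates[0].content.parts.map(p => p.text || '').join('');
}

// Send a prompt to Gemini in Vertex AI and return the answer text.
// Runs only inside Apps Script (UrlFetchApp, ScriptApp).
function askGemini(prompt) {
  const url = `https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}` +
      `/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent`;
  const res = UrlFetchApp.fetch(url, {
    method: 'post',
    contentType: 'application/json',
    headers: { Authorization: 'Bearer ' + ScriptApp.getOAuthToken() },
    payload: JSON.stringify({ contents: [{ role: 'user', parts: [{ text: prompt }] }] })
  });
  return extractText(JSON.parse(res.getContentText()));
}
```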


8. Running the System After setting up the project and configuring the necessary parameters, you can run the agents. First, execute agentResearcher() to populate the video database. Then run agentLibrarian() to process and summarize videos. 

This setup leverages the power of LLMs like Gemini to automate time-consuming research tasks, allowing you to efficiently curate and integrate knowledge from YouTube videos directly into your workflow. Remember to manage your Google Cloud credits appropriately, as using the Gemini API will incur costs.

Sunday, March 31, 2024

Smart replacing images in Google Slides with Gemini Pro API and Vertex AI

Surely, you have also experienced having a presentation in which you needed to replace old content with new. Replacing text is very simple: you just use the Replace function, which you can do directly in the Google Slides user interface.

The problem arises when you need to replace one image with another, for example, if your corporate logo is updated to a new graphic design or if one of your favorite cloud services updates its icons (Gmail, blink blink ;-) ). It's still somewhat bearable with one presentation, but what do you do when, like me, you have thousands of Google Slides files on your Google Drive?


Fortunately, there are large language models and, specifically, multimodal models that allow input prompts to include images in addition to text. Specifically, with Gemini Pro, you can have up to 16 such images as input. And then the old saying applies that one picture is worth a thousand words :)

I used Gemini Pro for exactly this use case in the Vertex AI service with integration into Google Apps Script, which could connect to my presentation, go through all the slides, and if there was an image containing the old logo, replace it with the new logo. I will show you how you can replicate such a procedure yourself, and all you need for it is just a Google Cloud account.


1. Create a new Google Apps Script project https://script.new/


2. Go to Project settings and click the checkbox Show "appsscript.json" manifest file in editor


3. Copy the appsscript.json manifest below
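For this use case the manifest might look roughly like this; the scope list is my assumption, and the post's original manifest may differ:

```json
{
  "timeZone": "Etc/UTC",
  "runtimeVersion": "V8",
  "exceptionLogging": "STACKDRIVER",
  "oauthScopes": [
    "https://www.googleapis.com/auth/presentations",
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/script.external_request"
  ]
}
```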


4. Prepare a Google Cloud project; if you don't have one, create one here: https://console.cloud.google.com/projectcreate


Then enable the Vertex AI API https://console.cloud.google.com/marketplace/product/google/aiplatform.googleapis.com



The Gemini Pro Vision API takes input that consists of parts (an array), where each item can be either text or binary data (embedded inline or referenced via a URL link).

https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/gemini


We compose our prompt as you might be used to, with the difference that we also load two images and tell the model which is the old and which is the new logo. We use the few-shot learning technique to provide examples.
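A sketch of such a multimodal request body as a pure function; the function name and prompt wording are mine. The base64 arguments are base64-encoded PNGs, which in Apps Script you would get with Utilities.base64Encode(image.getBlob().getBytes()):

```javascript
// Build a generateContent body with two reference logos (few-shot examples)
// and one candidate image for the model to classify.
function buildLogoPrompt(base64Old, base64New, base64Candidate) {
  return {
    contents: [{
      role: 'user',
      parts: [
        { text: 'The first image is the OLD logo, the second is the NEW logo. ' +
                'Answer OLD, NEW, or OTHER for the third image.' },
        { inlineData: { mimeType: 'image/png', data: base64Old } },
        { inlineData: { mimeType: 'image/png', data: base64New } },
        { inlineData: { mimeType: 'image/png', data: base64Candidate } }
      ]
    }]
  };
}
```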




Finally, all that's left is to create a function that loads all the slides in a presentation, loads all the images on each slide, and sends each image to the Gemini Pro API to determine whether it is the old or the new image. If it is the old image, it is replaced directly in the presentation with the new one.
And that's all. Now you just need to run the getSlides() function, which will replace all the old Gmail logos with the new ones. Of course, the script can be modified to go through all your files, or better yet, through all the files in the company via domain-wide delegation.
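The slide walker could be sketched like this; this is my reconstruction, and classifyImage is a hypothetical wrapper that sends an image to Gemini and returns OLD, NEW, or OTHER:

```javascript
// Tidy up the model's answer so minor formatting differences don't matter
// (pure helper, easy to unit-test).
function normalizeVerdict(text) {
  return String(text).trim().toUpperCase();
}

// Walk every image on every slide and replace the ones Gemini flags as OLD.
// Runs only inside Apps Script.
function getSlides() {
  const NEW_LOGO_URL = 'https://example.com/new-logo.png';          // placeholder URL
  const presentation = SlidesApp.openById('your-presentation-id');  // placeholder ID
  presentation.getSlides().forEach(slide => {
    slide.getImages().forEach(image => {
      const verdict = classifyImage(image.getBlob()); // hypothetical Gemini call
      if (normalizeVerdict(verdict) === 'OLD') {
        image.replace(NEW_LOGO_URL);                  // swap the image in place
      }
    });
  });
}
```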

Google Cloud credits are provided for this project
#GeminiSprint

Tuesday, August 23, 2022

List all GCP regions with Google (unofficial) API endpoint

I have several scenarios where I needed to list all GCP regions. (e.g. Cloud Billing API https://cloud.google.com/billing/docs/reference/rest).




I was surprised that there is no API for that. 

When you search for "list of GCP regions", the top results point to documentation such as https://cloud.google.com/compute/docs/regions-zones or https://cloud.google.com/about/locations, which is not suitable for programmatic access.

I recently found an endpoint (not an API!) with the list of IP ranges for each GCP region:

https://www.gstatic.com/ipranges/cloud.json


That was the last piece of the puzzle to create my desired function.

Here is a snippet for Google Apps Script:
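A sketch of what such a snippet can look like; the function names are mine. Each entry in the cloud.json prefixes array carries a "scope" field such as "us-central1" (or "global", which is not a region):

```javascript
// Collect the unique, sorted region list from the cloud.json structure
// (pure function, easy to test).
function extractRegions(cloudJson) {
  const regions = new Set();
  (cloudJson.prefixes || []).forEach(p => {
    if (p.scope && p.scope !== 'global') regions.add(p.scope);
  });
  return Array.from(regions).sort();
}

// Apps Script entry point: fetch the endpoint and log every region.
function listGcpRegions() {
  const res = UrlFetchApp.fetch('https://www.gstatic.com/ipranges/cloud.json');
  const regions = extractRegions(JSON.parse(res.getContentText()));
  Logger.log(regions.join('\n'));
  return regions;
}
```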