Use OpenAI's Assistants to build an app in 15 mins

Tuesday, November 14, 2023

Our team attended OpenAI's DevDay and, like many others, was amazed by all the announcements; we couldn't wait to start building. Today we're sharing what we learned from using the Assistants API, and how you can build your own PDF extraction app in 15 minutes. We'll build a grant eligibility parser (more on that later), but the same principles can be applied to any PDF extraction task.

👉 To hear more about how AI can accelerate the climate transition, check out our CEO's talk on OpenAI's DevDay main stage.

What are we building?

Today we're building a grant eligibility parser using the Assistants API. If you've never applied for a federal grant, this problem likely feels trivial. Let me explain.

The Problem

Climate companies need grants to survive, and we need them to survive to solve the climate crisis. Today, applying for a grant is painful, as it means writing hundreds of pages of content. The pain starts before you even begin applying, however: finding which grants you are eligible for can take hundreds of hours of navigating through PDFs. The funding opportunity announcements (FOAs) are over a hundred pages long, with details about who is eligible sprinkled across the pages.

tldr: It can take hours to establish if you’re a good fit for a federal grant program.

The Solution

Using OpenAI’s new Assistants API, I built a tool that extracts this information in just a few seconds, which can quite literally save someone hours of research.

For context, working on similar problems prior to the Assistants API often required the following process:

  1. Chunk the PDF into a series of vectors and store these in a vector DB like Pinecone or Weaviate.

  2. Find the right queries for this vector DB to find relevant snippets for eligibility requirements, and use these queries to extract the snippets.

  3. Pass these snippets to an LLM call manually, crossing your fingers that the snippets are correct, and that the model will siphon the correct information out of them.

This pattern is relatively common among modern AI apps, and under the hood, it’s likely that OpenAI’s assistants are performing this retrieval-augmented-generation (RAG) in a similar manner. The benefit, of course, is that using these assistants, we don’t have to worry about the implementation of this RAG pipeline at all, and we can focus purely on building the right prompts to extract the relevant information from our documents.
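For comparison, the three manual steps above can be sketched roughly as follows. This is a toy illustration, not a description of what OpenAI does internally: the embed function is a bag-of-words stand-in for a real embedding model, a plain array stands in for the vector DB, and every function name here is hypothetical.

```typescript
// Toy stand-in for an embedding model: a term-frequency map per text.
// A real pipeline would call an embedding API and store vectors in a vector DB.
type Vector = Map<string, number>

const embed = (text: string): Vector => {
  const vector: Vector = new Map()
  for (const word of text.toLowerCase().match(/[a-z]+/g) ?? []) {
    vector.set(word, (vector.get(word) ?? 0) + 1)
  }
  return vector
}

const norm = (v: Vector): number => {
  let sumOfSquares = 0
  v.forEach((count) => {
    sumOfSquares += count * count
  })
  return Math.sqrt(sumOfSquares)
}

const cosineSimilarity = (a: Vector, b: Vector): number => {
  let dot = 0
  a.forEach((count, word) => {
    dot += count * (b.get(word) ?? 0)
  })
  const denominator = norm(a) * norm(b)
  return denominator === 0 ? 0 : dot / denominator
}

// Step 1: split the document text into fixed-size word chunks.
const chunkText = (text: string, wordsPerChunk: number): string[] => {
  const words = text.split(/\s+/).filter(Boolean)
  const chunks: string[] = []
  for (let i = 0; i < words.length; i += wordsPerChunk) {
    chunks.push(words.slice(i, i + wordsPerChunk).join(' '))
  }
  return chunks
}

// Step 2: rank chunks against a query and keep the k most similar.
const retrieve = (chunks: string[], query: string, k: number): string[] => {
  const queryVector = embed(query)
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(embed(chunk), queryVector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ chunk }) => chunk)
}

// Step 3: paste the retrieved snippets into the prompt for the LLM call.
const buildPrompt = (snippets: string[], question: string): string =>
  `Answer using only these excerpts:\n${snippets.join('\n---\n')}\n\nQuestion: ${question}`
```

Every one of these pieces (chunk size, similarity metric, number of snippets, prompt wording) is something you would otherwise have to tune yourself, which is the work the retrieval tool takes off our plate.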

For the sake of simplicity, we’ll be building an app with only one workflow, where a user loads the page, enters a direct download link to the grant PDF (more on this later), and we parse out the criteria. The stack we’ll use for this will also be quite simple, with the whole thing living in a NextJS monolith (sorry Python-enjoyers) using TypeScript and Tailwind throughout.


Before we start: Terminology and the lifecycle of an OpenAI Assistant

If you’ve used LLM-style agents before, that’s a helpful starting point for thinking about assistants; you can think of them as a tailored instance of a model with a more durable lifespan than a standalone completion/chat call.

  • Assistants live for as long as you need them to, and you can continuously interact with them by creating threads

  • Threads represent a single “conversation” between you (or your code) and the assistant. You create a thread with a handful of messages, run the thread, get output back, and continue on as needed.

  • When you run a thread, you’ll get the ID of that run back as an output. At any point in time this run will have a specific status and relevant data depending on that status. For example, it could be in_progress while the LLM is running, requires_action if the run is paused awaiting more input, or cancelling if you’ve called the run off. The full lifecycle of an assistant run is documented at the source below:

source: https://platform.openai.com/docs/assistants/how-it-works/runs-and-run-steps
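One way to make these statuses explicit in code is a small union type with a couple of helpers. The status names follow the docs linked above as of writing; the type and helper names are my own and purely illustrative.

```typescript
// Run statuses as listed in the Assistants docs at the time of writing.
type RunStatus =
  | 'queued'
  | 'in_progress'
  | 'requires_action'
  | 'cancelling'
  | 'cancelled'
  | 'failed'
  | 'completed'
  | 'expired'

// Statuses where the run is still moving and polling should continue.
const PENDING_STATUSES: RunStatus[] = ['queued', 'in_progress', 'cancelling']

const isStillRunning = (status: RunStatus): boolean =>
  PENDING_STATUSES.indexOf(status) !== -1

// Statuses where the run ended without producing usable output.
const isFailure = (status: RunStatus): boolean =>
  status === 'cancelled' || status === 'failed' || status === 'expired'
```

Note that requires_action is neither pending nor a failure: the run is paused and waiting on us, which matters later when we use a function call.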

Getting set up

Clone/deploy a fresh copy of Next.js Boilerplate and install the OpenAI Node.js package with npm install openai.

Once you have these installed, simply run npm run build to build the project, then npm run dev to get everything started. Navigate to http://localhost:3000 in your browser and your bootstrapped app should be up and running!

Building the shell of our app

We’ll be brief here as we’re mostly interested in the OpenAI side of this project, so feel free to clear out Vercel’s boilerplate and replace page.tsx with the following basic UI:

"use client"
import { useCallback, useState } from "react"

export default function Home() {
  const [pdfLink, setPdfLink] = useState<string>("")
  const [result, setResult] = useState<string>("")
  const [isLoading, setIsLoading] = useState<boolean>(false)

  const getEligibility = useCallback(async () => {
    if (isLoading) return
    setResult("Loading...")
    if (!pdfLink) {
      setResult("Please enter a link to a PDF")
      return
    }
    setIsLoading(true)
    // Our LLM call will go here.
    setIsLoading(false)
    setResult("")
  }, [isLoading, pdfLink])

  const copyToClipboard = useCallback(() => {
    navigator.clipboard.writeText(result)
  }, [result])

  return (
    <div className="flex min-h-screen w-full justify-center">
      <div className="flex flex-col items-start w-2/3 space-y-8 mt-8">
        <span className="text-3xl">Grant eligibility parser</span>

        <input 
          className="border border-gray-400 p-2 rounded"
          type="text"
          placeholder="Link to PDF"
          onChange={(e) => setPdfLink(e.target.value)}
          value={pdfLink}
        />
        <button
          className="border border-gray-400 p-2 rounded bg-blue-500 text-white flex items-center h-12 disabled:opacity-50 disabled:cursor-not-allowed"
          disabled={isLoading}
          onClick={getEligibility}
        >
          Get eligibility!
          {isLoading && <LoadingAnimation />}
        </button>
        <div className="flex flex-col w-full space-y-2">
          <button className="border border-gray-400 p-1 rounded bg-white text-gray-600 w-40"
            onClick={copyToClipboard}
          >
            Copy to clipboard
          </button>
          <textarea
            className="border border-gray-400 p-2 rounded"
            placeholder="Eligibility"
            value={result}
            onChange={(e) => setResult(e.target.value)}
            rows={20}
          />
        </div>
      </div>
    </div>
  )
}

const LoadingAnimation = () => {
  return (
    <div role="status">
        <svg aria-hidden="true" className=" ml-2 w-6 h-6 text-gray-200 animate-spin dark:text-gray-600 fill-white" viewBox="0 0 100 101" fill="none" xmlns="http://www.w3.org/2000/svg">
            <path d="M100 50.5908C100 78.2051 77.6142 100.591 50 100.591C22.3858 100.591 0 78.2051 0 50.5908C0 22.9766 22.3858 0.59082 50 0.59082C77.6142 0.59082 100 22.9766 100 50.5908ZM9.08144 50.5908C9.08144 73.1895 27.4013 91.5094 50 91.5094C72.5987 91.5094 90.9186 73.1895 90.9186 50.5908C90.9186 27.9921 72.5987 9.67226 50 9.67226C27.4013 9.67226 9.08144 27.9921 9.08144 50.5908Z" fill="currentColor"/>
            <path d="M93.9676 39.0409C96.393 38.4038 97.8624 35.9116 97.0079 33.5539C95.2932 28.8227 92.871 24.3692 89.8167 20.348C85.8452 15.1192 80.8826 10.7238 75.2124 7.41289C69.5422 4.10194 63.2754 1.94025 56.7698 1.05124C51.7666 0.367541 46.6976 0.446843 41.7345 1.27873C39.2613 1.69328 37.813 4.19778 38.4501 6.62326C39.0873 9.04874 41.5694 10.4717 44.0505 10.1071C47.8511 9.54855 51.7191 9.52689 55.5402 10.0491C60.8642 10.7766 65.9928 12.5457 70.6331 15.2552C75.2735 17.9648 79.3347 21.5619 82.5849 25.841C84.9175 28.9121 86.7997 32.2913 88.1811 35.8758C89.083 38.2158 91.5421 39.6781 93.9676 39.0409Z" fill="currentFill"/>
        </svg>
        <span className="sr-only">Loading...</span>
    </div>
  )
}


This should leave us with a basic page like so:

Beautiful, right? (I am not good at the visual part; usually our founding designer Deo tells me how to make things pretty, but for some reason he didn’t want to spend his Sunday evening parsing eligibility with me 🙁).

For our backend, we’ll build just one API route. Create a file route.ts under the path app/api/parseEligibility; this will automatically set up our endpoint using the Next.js app router. Here we can set up a basic POST handler that takes in a pdfLink from the request body and handles the parsing from there!

One thing to mention is that OpenAI’s file API currently requires the file itself to be passed to it in order to be uploaded. This means we’ll be downloading the files locally on our server (temporarily) so we can get them into OpenAI. There’s likely an elegant way to stream the file content directly from your frontend into OpenAI, but that’s a discussion for another day.

With that in mind, our initial API endpoint looks something like this:

export const maxDuration = 300 // Increase since assistants take a while.

import { NextResponse, type NextRequest } from 'next/server'
import https from 'https'
import path from 'path'
import fs from 'fs'
import { OpenAI } from 'openai'

export const POST = async (request: NextRequest): Promise<NextResponse> => {
  const reqBody = await request.json()

  const pdfLink = reqBody.pdfLink

  const openAI = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY
  })

  const filePath = await downloadFile(pdfLink, 'grant.pdf')

  // Upload file to OpenAI here

  deleteFile(filePath)

  return new NextResponse()
}

// Downloads a file directly to the server's /tmp directory.
const downloadFile = async (
  grantUrl: string,
  fileName: string, // Must include extension.
): Promise<string> => {
  // Wrap in promise so we can use async await
  return new Promise((resolve, reject) => {
      https.get(grantUrl, (res) => {
          // Must be in /tmp for Vercel write access.
          const pathName = path.join(`/tmp/${fileName}`)
          const filePath = fs.createWriteStream(pathName)
          res.pipe(filePath)
          filePath.on('finish', () => {
              filePath.close()
              resolve(pathName)
          })
          filePath.on('error', (err) => {
              fs.unlink(pathName, () => {
                  reject(err)
              })
          })
      }).on('error', reject) // Reject on network errors instead of hanging.
  })
}

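const deleteFile = (filePath: string): boolean => {
  // Use the synchronous API so the return value reflects the real outcome;
  // a return inside an async callback would be ignored by the caller.
  try {
      fs.unlinkSync(filePath)
      return true
  } catch {
      return false
  }
}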
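
Since pdfLink comes straight from user input, it may be worth checking it before handing it to downloadFile. The helper below is a minimal sketch and not part of the original app; the name parsePdfLink is hypothetical. It only accepts https links because downloadFile uses Node's https module, which rejects other protocols.

```typescript
// Hypothetical helper: returns the parsed URL if the link is usable, else null.
const parsePdfLink = (pdfLink: unknown): URL | null => {
  if (typeof pdfLink !== 'string') return null
  let url: URL
  try {
    url = new URL(pdfLink.trim())
  } catch {
    return null // Not a parseable absolute URL.
  }
  // downloadFile uses the https module, so plain http links would fail there.
  if (url.protocol !== 'https:') return null
  return url
}
```

In the POST handler this could run right after reading the request body, returning a 400 response when the result is null.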

Now we can upload our file to OpenAI with a simple call:

const file = await openAI.files.create({
  file: fs.createReadStream(filePath),
  purpose: 'assistants',
})

With this file ready, we can finally set up our assistant!

Building the assistant

Assistants are remarkably easy to spin up; the only required parameter is the model. In our case, we’ll also pass in a name and description (mostly for readability), initial instructions (which can be thought of like the system prompt in a normal chat/completions call), a set of tools for our assistant to use, and the ID of the file we’re looking into. We could also attach metadata to our assistant, but we’ll leave that out for now. Our basic assistant looks like this:

const assistant = await openAI.beta.assistants.create({
    name: 'Eligibility creator',
    description:
        'Parses the eligibility from a grant PDF into a ' +
        'structured format.',
    instructions:
        'You are an expert grant consultant hired to ' +
        'parse the eligibility criteria of a grant into a structured ' +
        'format. You will be given a grant PDF and asked to find the ' +
        'attributes that an organization must have and not have to ' +
        'qualify for the grant. You will also be asked to identify the ' +
        'types of organizations that are eligible to be the prime ' +
        'applicant and the types of organizations that are eligible to ' +
        'be a sub applicant.',
    model: 'gpt-4-1106-preview',
    tools: [
        { type: 'retrieval' },
        {
            type: 'function',
            function: checkEligibilityFunction,
        },
    ],
    file_ids: [file.id],
})

  • Note that the retrieval tool is what allows our Eligibility Creator assistant to search through the grant file we’ve passed in, and the checkEligibilityFunction we pass in allows us to define a structured output for the calls we’ll make to our assistant.

Important: Why are we using a function call?

OpenAI function calling allows you to describe functions, and the model you call will “intelligently choose to output a JSON object containing arguments to call one or many functions” (Function calling). Often this allows OpenAI calls to return the necessary arguments for a function call that your code will run after the call is complete, but in our case, we’re using the function call purely for the consistency it can “guarantee” us in our results. This has important implications that we’ll talk about later on.
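The definition of checkEligibilityFunction isn't shown in this post, so here is a sketch of what such a definition could look like. The function name checkEligibility and the top-level eligibility key match what the prompt and frontend code in this post refer to; the four inner field names and all descriptions are assumptions made for illustration.

```typescript
// Hypothetical definition: the real schema is not shown in this post.
// Inner field names are assumptions chosen to match the prompt's vocabulary.
const stringList = (description: string) => ({
  type: 'array',
  description,
  items: { type: 'string' },
})

const checkEligibilityFunction = {
  name: 'checkEligibility',
  description: 'Records the eligibility criteria extracted from a grant PDF.',
  // Parameters are described with JSON Schema.
  parameters: {
    type: 'object',
    properties: {
      eligibility: {
        type: 'object',
        properties: {
          primeApplicants: stringList('Organization types eligible to be the prime applicant.'),
          subApplicants: stringList('Organization types eligible to be a sub applicant.'),
          qualifiers: stringList('Requirements an applicant must meet to qualify.'),
          disqualifiers: stringList('Conditions that make an applicant ineligible.'),
        },
        required: ['primeApplicants', 'subApplicants', 'qualifiers', 'disqualifiers'],
      },
    },
    required: ['eligibility'],
  },
}
```

Because the model fills in the arguments of this function, the schema effectively becomes the output format we get back.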

With that, our assistant is built, so all that’s left to do is create a thread with our actual prompt and get the results!

Creating and running a thread

Creating threads is, again, very simple. There are no required arguments; just pass in the user messages you want (along with the relevant file IDs) and let the AI do its magic. In our case this looks like:

const thread = await openAI.beta.threads.create({
    messages: [
        {
            role: 'user',
            content:
                'Return the eligibility criteria for this grant. ' +
                'Here are some helpful tips: ' +
                '- Prime applicants are the main applicants for a grant. ' +
                'They are able to qualify by themselves. ' +
                '- Sub applicants are the secondary applicants for a grant. ' +
                'They usually do not qualify by themselves, but can ' +
                'qualify if they are paired with a prime applicant. ' +
                '- Each qualifier is an actual requirement that the ' +
                'applicant must meet to qualify for the grant. ' +
                '- Disqualifiers are things that prohibit the applicant ' +
                'from being eligible for the grant. ' +
                'Parse it out of the overall document and MAKE SURE you ' +
                'return it according to the format in the ' +
                'checkEligibility function.',
            file_ids: [file.id],
        },
    ],
})

  • Note that you can’t pass AI or system messages to a thread. This is because threads are designed to be an actual back-and-forth between users and the AI, so the AI messages that would normally be programmed into a completions call instead come from the model itself.

With the thread created, we can run it with:

let run = await openAI.beta.threads.runs.create(thread.id, {
    assistant_id: assistant.id,
  })

Note that the run returned here is not finished. Since thread runs are effectively long-running operations, we have to keep polling this run until the result shows up. OpenAI plans to change this soon, but as of writing, this is the only way. Let’s create a simple polling loop:

let runResults
while (!runResults) {
    // Poll results every second.
    await new Promise((r) => setTimeout(r, 1000))
    run = await openAI.beta.threads.runs.retrieve(thread.id, run.id)
    // Runs start out as 'queued', so keep polling through that state too.
    if (run.status !== 'queued' && run.status !== 'in_progress') {
        runResults = run
    }
}

When this loop is finished, we know our runResults variable is populated, so the last part of our job is to get the function output from our results and return it to our frontend app.
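One caveat: as written, the loop never gives up if a run stalls. A possible variation, sketched here as a generic helper, adds a deadline. The name pollUntil and its options are hypothetical and not part of the OpenAI SDK; the default timeout is an arbitrary choice.

```typescript
// Hypothetical helper: calls `fetchValue` repeatedly until `isDone` accepts
// the result, and throws if that does not happen before the deadline.
async function pollUntil<T>(
  fetchValue: () => Promise<T>,
  isDone: (value: T) => boolean,
  { intervalMs = 1000, timeoutMs = 120000 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs
  while (true) {
    const value = await fetchValue()
    if (isDone(value)) return value
    // Stop if the next sleep would take us past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Polling timed out after ${timeoutMs}ms`)
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs))
  }
}

// Possible usage with the variables from this post:
// const runResults = await pollUntil(
//   () => openAI.beta.threads.runs.retrieve(thread.id, run.id),
//   (r) => r.status !== 'queued' && r.status !== 'in_progress',
// )
```

Keeping the timeout below the route's maxDuration would let the endpoint return a clean error instead of being cut off mid-request.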

Note: If your files and assistants don’t need to persist, it’s a good idea to delete them after use with:

await openAI.files.del(file.id)
await openAI.beta.assistants.del(assistant.id)


Getting our results

As mentioned earlier, we instructed the model to output a function call. This means that instead of being in a completed state, our run should actually be in a requires_action state, since it expects us to call the function and come back with its output. Since we’re not really going to call a function here, we’ll just parse out the function call and return it:

const requiredAction = runResults.required_action
const toolCalls = requiredAction.submit_tool_outputs.tool_calls
const criteriaCall = toolCalls[0]
return NextResponse.json({criteria: criteriaCall}, { status: 200 })
  • In our production system we have lots of checks to validate the output before we return it. Check out the complete example here.
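
As a rough idea of what such checks can look like, here is a minimal sketch. It is not the production code mentioned above: the helper name parseEligibility is hypothetical, and the expected function name checkEligibility is taken from the prompt earlier in this post. The arguments field is a JSON string written by the model, so it is worth treating as untrusted input.

```typescript
// Minimal shape of the tool call we read from required_action.
interface ToolCall {
  function: { name: string; arguments: string }
}

// Hypothetical validator: returns the parsed eligibility object, or null if
// the output is missing, malformed, or for a different function.
const parseEligibility = (
  toolCall: ToolCall | undefined,
  expectedName = 'checkEligibility',
): Record<string, unknown> | null => {
  if (!toolCall || toolCall.function.name !== expectedName) return null
  let parsed: unknown
  try {
    parsed = JSON.parse(toolCall.function.arguments)
  } catch {
    return null // The arguments string may not be valid JSON.
  }
  if (typeof parsed !== 'object' || parsed === null) return null
  const eligibility = (parsed as Record<string, unknown>).eligibility
  if (typeof eligibility !== 'object' || eligibility === null) return null
  return eligibility as Record<string, unknown>
}
```

A null result is a good signal to return an error status (or retry the run) rather than pass a broken payload to the frontend.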

And that’s it for our assistant API endpoint! Easy, right? Now all that’s left is to call this endpoint from our frontend and output the result to users. Our getEligibility function from earlier now looks like this:

const getEligibility = useCallback(async () => {
    if (isLoading) return
    setResult("Loading...")
    setIsLoading(true)
    const res = await fetch("/api/parseEligibility", {
      method: "POST",
      body: JSON.stringify({ pdfLink }),
    })
    setIsLoading(false)

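    if (!res.ok) {
      // Surface the failure and stop before trying to parse the body.
      setResult("Something went wrong while parsing this grant.")
      return
    }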

    const resJSON = await res.json()
    const criteria = resJSON.criteria.function.arguments
    const JSONCriteria = JSON.parse(criteria)
    const eligibility = JSONCriteria.eligibility
    const prettyCriteria = JSON.stringify(eligibility, null, 2)
    setResult(prettyCriteria)
  }, [isLoading, pdfLink])
  • Our criteria arguments come back as a string of JSON, so we parse it, take the eligibility object, and then prettify it as a string again before showing it to the user.


Wrapping up

With our system fully working, we’ve now built a fully fledged assistant to find our eligibility criteria. For the full source code, including error handling and rate limiting, see the original repository. We’ve also decided to host the app we built for free at https://grant-eligibility-peach.vercel.app/. You’re welcome to use it!


Things we’re still figuring out

  • The advantage of OpenAI’s assistants is that so much of the complex RAG implementation is hidden from us as users. This is also a disadvantage, since, as one piece of a large and complex system, it’s important to know how your dependencies are working in order to properly extend and debug them.

  • How this pattern can help us build the rest of our upcoming grant discovery tool. We’re building software to help dynamically find grants that are a great fit for our users. Eligibility is one part of this, but there’s so much more to consider: how to properly represent a company’s data such that it can be matched with grants, how to build an accurate and up-to-date database of all grants, and so on.

Anyhow, putting this together was a fun way to explore assistants while building a useful tool, but it does a relatively poor job of addressing the underlying challenge of finding the right grants and winning them. That’s why we’re building Streamline, a platform to accelerate the discovery, drafting, and management of grants, RFPs, permits and more.

Sound interesting? Reach out to me at eric@streamlineclimate.com to learn more.