To build a Python AI agent that fills out forms with data drawn from web pages, PDFs, and text sources, you can combine:
- Ollama for local LLM inference (e.g., LLaMA3.2).
- LangChain or custom agents for orchestrating form-filling tasks.
- PDF/text parsing via libraries like PyMuPDF, pdfplumber, or docx.
- Web form filling using Selenium.
- Multimodal context support (text + form structure) to extract and align relevant information.
Objective
Create a Python agent that:
- Reads content from PDFs / web / plain text.
- Understands and extracts the relevant data.
- Matches it to form fields using prompt-based reasoning.
- Uses Ollama with a LLaMA3.2-based model (e.g., lemma) to infer the correct values.
- Automatically fills a web form using Selenium.
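The agent code in this post only reads PDFs, so here is a rough sketch of how the web source from the list above might be turned into plain text for the model. It uses only the standard library; the names `VisibleTextParser` and `html_to_text` are made up for this illustration and are not part of the original setup.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    page = "<html><body><h1>Profile</h1><p>Name: Ramesh Kumar</p></body></html>"
    print(html_to_text(page))
```

For a live page, the HTML string could come from `requests.get(url).text` or from Selenium's `driver.page_source`; the result can then be passed to the agent in place of the PDF text.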
Prerequisites
pip install selenium requests flask langchain pdfplumber ollama
You’ll also need:
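- A local Ollama instance with the LLaMA3/lemma model running:
  ollama run lemma
- ChromeDriver installed (for Selenium). Selenium 4.6 and later can usually fetch a matching driver automatically through Selenium Manager.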
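Before launching the browser, it can help to confirm that Ollama is running and that the model has been pulled. The sketch below assumes the default local address and Ollama's `/api/tags` endpoint, which lists installed models; the helper names `model_is_available` and `fetch_tags` are hypothetical.

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434"  # assumed default local address


def model_is_available(tags_payload, model_name):
    """Return True if model_name appears in an /api/tags response payload.

    Names in the payload usually carry a tag such as "lemma:latest", so a
    bare name matches any tag of that model.
    """
    for entry in tags_payload.get("models", []):
        name = entry.get("name", "")
        if name == model_name or name.split(":")[0] == model_name:
            return True
    return False


def fetch_tags(base_url=OLLAMA_BASE_URL, timeout=5):
    """Fetch the list of locally installed models from a running server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

Calling `model_is_available(fetch_tags(), "lemma")` at startup gives an early, readable failure; if the server is not running, `fetch_tags` raises an `OSError` subclass that can be caught and reported.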
Sample Setup
Here’s a minimal Python setup to create such an agent:
agent.py
import json
import time

import pdfplumber
import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "lemma"  # or llama3
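
def extract_text_from_pdf(pdf_path):
    """Return the text of every page in the PDF, separated by newlines."""
    text = ""
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_text() may return None for a page with no text layer
            text += (page.extract_text() or "") + "\n"
    return text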

def extract_form_fields(driver):
    """Map each text-like input's name (or id) to its WebElement."""
    fields = {}
    elements = driver.find_elements(
        By.XPATH,
        "//input[@type='text' or @type='email' or @type='number' or @type='date']",
    )
    for element in elements:
        try:
            label = element.get_attribute("name") or element.get_attribute("id")
            if label:
                fields[label] = element
        except Exception:
            # Skip elements that went stale or cannot be inspected
            pass
    return fields

def ask_ollama_to_fill(form_labels, context_text):
    """Ask the local model to map context_text onto the given field names."""
    prompt = f"""
    You are a helpful assistant. Given the following text:\n\n{context_text}\n\n
    Fill out this form with these fields: {list(form_labels)}.
    Respond in JSON format with field names and their values.
    """
    response = requests.post(OLLAMA_URL, json={
        "model": MODEL_NAME,
        "prompt": prompt,
        "stream": False,
    }, timeout=120)
    response.raise_for_status()  # fail early on HTTP errors
    data = response.json()
    try:
        filled_data = json.loads(data["response"].strip())
        return filled_data
    except json.JSONDecodeError:
        print("Could not parse JSON. Response:", data["response"])
        return {}
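
def fill_web_form(url, context_text):
    """Open the form at url, ask the model for values, and type them in."""
    options = Options()
    # Headless mode shows no window; remove this line to watch the form fill
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        time.sleep(2)  # crude wait for the page to load
        form_fields = extract_form_fields(driver)
        field_labels = list(form_fields.keys())
        filled_data = ask_ollama_to_fill(field_labels, context_text)
        for label, value in filled_data.items():
            if label in form_fields:
                form_fields[label].send_keys(str(value))
        print("Form filled. Pausing for review...")
        time.sleep(10)
    finally:
        # Always close the browser, even if a step above fails
        driver.quit()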

if __name__ == "__main__":
    # Example source: PDF or text
    source_text = extract_text_from_pdf("sample_resume.pdf")
    # Or use plain text: source_text = open("info.txt").read()

    # Example web form URL (local or online)
    form_url = "http://example.com/form"
    fill_web_form(form_url, source_text)
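One fragile spot in the script is the JSON parsing step: local models often wrap their JSON in a sentence or a Markdown fence, and a plain `json.loads` then fails. A more forgiving parser might look like the sketch below. The helper name `extract_json_object` is hypothetical, and this is only one way to handle it; Ollama's generate endpoint also accepts a `"format": "json"` field in the request body, which may reduce the need for it.

```python
import json


def extract_json_object(raw):
    """Best-effort parse of the first JSON object found in a model reply.

    Returns a dict, or {} when no valid object can be decoded.
    """
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores any trailing text after it
            value, _end = decoder.raw_decode(raw, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        start = raw.find("{", start + 1)
    return {}


if __name__ == "__main__":
    reply = 'Sure! Here is the form: {"name": "Ramesh Kumar"} Let me know.'
    print(extract_json_object(reply))
```

In `ask_ollama_to_fill`, the `json.loads(...)` call could be swapped for `extract_json_object(data["response"])`, keeping the empty-dict fallback the script already uses.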
Example Prompt Handling
Input PDF Text:
Name: Ramesh Kumar
Email: ramesh@example.com
Phone: 9876543210
DOB: 1995-01-15
Form Labels (HTML fields):
["name", "email", "phone", "dob"]
Ollama Output:
{
  "name": "Ramesh Kumar",
  "email": "ramesh@example.com",
  "phone": "9876543210",
  "dob": "1995-01-15"
}
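The example above is the happy path. In practice the model may change the casing of a key, invent extra keys, or leave a field blank, so it can be worth checking the reply against the form labels before typing anything. Here is a small sketch of that check; `align_to_fields` is a hypothetical helper, and case-insensitive matching is an assumption about what counts as "the same field".

```python
def align_to_fields(form_labels, model_output):
    """Split a model reply into values for known fields and a list of gaps.

    Keys are compared case-insensitively; empty or null values count as
    missing, and keys that match no form label are dropped.
    """
    by_lower = {str(key).lower(): value for key, value in model_output.items()}
    matched, missing = {}, []
    for label in form_labels:
        value = by_lower.get(label.lower())
        if value is None or str(value).strip() == "":
            missing.append(label)
        else:
            matched[label] = str(value)
    return matched, missing


if __name__ == "__main__":
    labels = ["name", "email", "phone", "dob"]
    reply = {"Name": "Ramesh Kumar", "email": "ramesh@example.com", "age": 30}
    matched, missing = align_to_fields(labels, reply)
    print("fill:", matched)
    print("left blank:", missing)
```

The `missing` list is useful for logging or for asking the model a follow-up question about just those fields.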
✅ Enhancements
- Add support for textarea, select, and checkbox types.
- Use LangChain Agents with tools like SeleniumTool, PDFLoader, etc.
- Log/store filled data as JSON for audit.
- Integrate a file upload UI using Flask for real-world apps.
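As a starting point for the first enhancement, here is a sketch of how textarea, select, and checkbox controls might be handled. The function names and the `TRUTHY` set are assumptions made for illustration; the Selenium calls used (`Select`, `select_by_visible_text`, `is_selected`, `click`, `clear`, `send_keys`) are standard WebElement and support APIs.

```python
# Strings treated as "checked" when the model returns text; an assumption
TRUTHY = {"true", "yes", "1", "on", "checked"}


def plan_fill_action(tag_name, input_type, value):
    """Decide how a value should be applied to one form control.

    Returns an (action, argument) tuple where action is "select", "check",
    or "type".
    """
    tag = (tag_name or "").lower()
    kind = (input_type or "").lower()
    if tag == "select":
        return ("select", str(value))
    if tag == "input" and kind == "checkbox":
        if isinstance(value, bool):
            return ("check", value)
        return ("check", str(value).strip().lower() in TRUTHY)
    # <textarea> and text-like <input> elements both accept typed text
    return ("type", str(value))


def apply_fill_action(element, value):
    """Apply value to a Selenium WebElement according to plan_fill_action."""
    # Imported here so the planning logic can be used without Selenium
    from selenium.webdriver.support.ui import Select

    action, argument = plan_fill_action(
        element.tag_name, element.get_attribute("type"), value
    )
    if action == "select":
        Select(element).select_by_visible_text(argument)
    elif action == "check":
        # Click only when the current state differs from the wanted state
        if element.is_selected() != argument:
            element.click()
    else:
        element.clear()
        element.send_keys(argument)
```

To use this, `extract_form_fields` would also need to collect `textarea`, `select`, and checkbox elements (its XPath currently matches only four input types), and the `send_keys` call in `fill_web_form` would be replaced with `apply_fill_action(form_fields[label], value)`.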
Would you like me to provide:
- A ready-to-run zip file for this?
- A Flask frontend where users can upload PDF/text and see the form filled?
Let me know how you'd like to expand this.