Building a Chatbot Part 5: Challenges & Changes
Nina Cialone shares her journey as she builds a chatbot for Lehigh University's student-run publication, The Brown and White - with no coding background! Follow along as she documents the process.
Welcome back to my journey of building a chatbot for The Brown and White at Lehigh University. Long time no see! I know it’s been a while since Part 4, so let me fill you in on what we’ve been up to.
In September, I completed a prototype of the chatbot, which has access to an archive of all 6,733 of The Brown and White’s print publications from 1984 through the end of 2022, as well as a full sitemap of the current website.
Next, from October through December of 2023, the chatbot underwent its beta testing phase. Lehigh journalism faculty and editors of The Brown and White queried the chatbot and provided feedback to help improve its functionality. As a reminder, here is the original timeline for the project, which we set way back in Part 1. (If you haven’t checked out Parts 1-4 yet, please do. It’s been quite the journey!)
Steps to Building a Chatbot
Step 1: Review literature on how to build a chatbot without programming experience, and begin a course on building a chatbot without any coding.
Step 2: Discuss with The Brown and White staff/editors what simple queries the chatbot should handle and how to make sure it doesn’t go out of bounds.
Step 3: Build the prototype!
Step 4: Perform a beta test with selected journalism students, The Brown and White editors, and faculty.
Step 5: Revise the model and open the chatbot to a select group before the start of the fall semester.
Step 6: Gather and implement new suggestions that users might have for additional areas the chatbot can service.
Step 7: Create a simple list of prompts that might be useful for users as the chatbot becomes available to the Lehigh community.
Clearly, we underestimated the time required for Steps 3 and 4: we initially planned for Step 5 to happen around September. But that’s okay! This was a learning experience, and we truly had no idea how the project would turn out when we started. So, we’re currently on Step 5, just one semester late.
While I had thought that the hardest part would be over after Step 3, I should’ve known that no technology project would be complete without a few unexpected obstacles.
The Lehigh Digital Archives, where the 6,733 print publications are stored, uses optical character recognition (OCR) to convert images of each archived paper into text. Our chatbot then accesses that text via a sitemap, which directs it to the link of each individual publication. If you think this is a rather roundabout way of accessing the information, you’re right, but unfortunately the other option was to obtain, download and upload 6,733 PDFs individually.
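For readers curious what "accessing text via a sitemap" looks like under the hood, here is a minimal Python sketch. It is not how CustomGPT actually works internally (we don't know those details); it only shows the general idea of reading a standard sitemap file and pulling out the link to each page. The example URLs and the `extract_urls` function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemap format (sitemaps.org).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> link listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical two-entry sitemap; a real one for the archive would
# list one link per archived publication.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/archive/issue-0001</loc></url>
  <url><loc>https://example.org/archive/issue-0002</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in extract_urls(EXAMPLE_SITEMAP):
        print(url)
```

Each link found this way would then be fetched and its OCR text indexed, which is why a very large archive adds extra steps between a question and its answer.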
NOTE: When OpenAI announced the release of GPTs, we immediately looked into whether one of the big Generative AI (GenAI) powerhouses had finally released a feature that enabled users to create a custom chatbot by uploading their own information. The answer is yes… kind of. With GPTs, users can upload up to 20 documents, each up to 512 MB, with a combined total of no more than 100 GB. Our archives definitely comprise more than 20 documents, and combining pages into fewer files would circle us back to the issue of downloading and consolidating 6,733 PDFs. So, alas, OpenAI cannot yet compete with CustomGPT for our purposes.
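To make the arithmetic in this note concrete, here is a small Python sketch using only the numbers quoted above (20 files, 512 MB per file, 6,733 publications). The function names are ours, chosen for illustration, and we have not measured the actual size of the archived PDFs, so the last function only gives an upper bound.

```python
import math

MAX_FILES = 20          # per-GPT file limit quoted in the note
MAX_FILE_SIZE_MB = 512  # per-file size limit quoted in the note
ARCHIVE_ISSUES = 6733   # archived print publications


def fits_file_count(n_files: int) -> bool:
    """True if the number of files is within the upload limit."""
    return n_files <= MAX_FILES


def issues_per_combined_file(n_issues: int, max_files: int = MAX_FILES) -> int:
    """Fewest issues that must be merged into each file to fit the limit."""
    return math.ceil(n_issues / max_files)


def max_average_issue_size_mb(n_issues: int) -> float:
    """Largest average issue size (MB) that keeps each merged file under the cap."""
    return MAX_FILE_SIZE_MB / issues_per_combined_file(n_issues)


if __name__ == "__main__":
    print(fits_file_count(ARCHIVE_ISSUES))           # False
    print(issues_per_combined_file(ARCHIVE_ISSUES))  # 337
    print(round(max_average_issue_size_mb(ARCHIVE_ISSUES), 2))
```

In other words, each uploaded file would need to bundle at least 337 issues, which is exactly the download-and-consolidate chore we were trying to avoid.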
However, since we went with the sitemap approach via CustomGPT, the chatbot would sometimes miss a piece of information because of the large quantity of data and the steps required to access it. For instance, you might ask it a question whose answer you know exists in the archives, and it would tell you that it did not know. While the chatbot excels at answering big-picture questions, some more nuanced ones were tripping it up, and we were determined to fix this.
After many conversations with the staff at CustomGPT (who have been enormously helpful!), we’re now trying a variety of re-formatting and re-indexing tactics to improve the bot’s functionality from around 70 percent to 85 percent.
So, what are the next few months going to look like? Once we get everything running as smoothly as possible, I’ll work on implementing the chatbot into The Brown and White’s Slack workspace, and I’ll be sure to update you all on that process. I’ll introduce the chatbot to The Brown and White staff. Then, as it gets integrated into the newsroom, I’ll collect feedback for future updates. Something that is so cool about this type of project is that it can be ever-evolving!
Finally, I’ve already begun working on a “guidebook” for the chatbot, drawing on my own experiences for now and expanding it as staff feedback comes in. As with ChatGPT itself, sometimes you have to ask questions a certain way to get the answer that you’re looking for. This guide will include sample prompts, a list of great use cases for the bot, and tips for working around some types of questions that it may have difficulty answering. By the end of the semester, we hope to open the chatbot to the Lehigh community, potentially as a fixture on The Brown and White’s website! As we keep working toward this goal, stay tuned for future updates here at Don’t Count Us Out Yet!
Best,
Nina for the Don’t Count Us Out Yet Team