June 19, 2018

Founder Stories: Why We Built Api2Pdf

The year was 2006. I was 21 years old and fresh out of college with a newly minted degree in computer science. While I had acquired a stable job with the government, at night I would scour the Gigs section on Craigslist to fuel my side hustle. I would reach out to anyone looking for help building software or databases. The plan was to acquire enough customers of my own to begin a full-time software consultancy.

The first major project I landed was PriorArt.com, a patent search and discovery firm. PriorArt was drowning in reams of paper: thousands of faxes, printed emails, and legal documents. I had to provide a mechanism to store all of this information in a centralized, web-accessible system. I was naive and had no idea what I was getting myself into with this project or the challenges I would face.

The greatest obstacle to building this “paperless office” centered around PDFs. Every physical piece of paper PriorArt dealt with needed to be converted to a PDF in this online system. The PDFs were heavily used for “Docket Agreements” — mini-contracts or statements of work.

To make things more complicated, one requirement I had to meet was that the system needed to run in a shared hosting environment. The mid-2000s were a transformational time for web development. It was still very much the wild west. Amazon Web Services had just hit the market, but it would be several years before I would pay much attention to it. All I knew about shared hosting back then was that you would pay $10 per month and be allocated some space on a server to run some code. Security and server isolation were distant thoughts.

My first attempt was to use a component called ABCPDF, miraculously still for sale. As with most .NET components back in 2006, ABCPDF required some wrangling to get the DLLs running in a shared hosting environment. EasyHosting, the hosting provider at the time, rubber-stamped my custom DLLs and we were off to the races. PriorArt was generating PDFs.

Barely two months went by before providers started to wise up to the security holes posed by allowing custom DLLs in shared environments. Microsoft issued guidance on a concept called Medium Trust, which ensured that neighboring customers on the same server could not interfere with each other. ABCPDF required Full Trust, so that was the end of that.

My backup plan was to run PDF generation on a shared host using PHP. I turned this into an API only accessible to the .NET platform. After a lot of trial and error, I came across https://tcpdf.org/, a library that persists to this very day. At the time, TCPDF provided HTML to PDF conversion but did not allow the use of external stylesheets. It was painstaking work, but I was able to convert all of my templates into a format suitable for TCPDF.

PriorArt was live and I was hopeful I would never have to go through PDF hell again.

That wish did not come true, but I did succeed in leaving the government and starting my software consulting business, vOfficeware, and then later nonprofitCMS. In 2009 I was approached by Cambridge Information Group (CIG), a private equity firm that owns several education companies including B2RMusic.com and Sothebysinstitute.com.

My PDF journey continued. I had to generate PDFs of statements and invoices for students and their parents for the online portal of this music school. CIG had its own on-premises servers, though. I was back to using ABCPDF with Full Trust and I forgot I had ever faced any PDF problems at all.

Fast forward to 2012. By now my company had launched its own product — an application and review platform that helps organizations run their awards, grants, and scholarships. We rebranded as OpenWater and left consulting work behind.

Any application process requires a form where an applicant submits their information, and applicants can often upload files, such as transcripts or letters of reference, to produce a full application package.

Once again I was thrust into PDF purgatory. When OpenWater first launched, we were running on Mosso, a cloud-based platform that allowed for Full Trust packages. We decided to go with the open source iTextSharp library to handle our PDF needs. Almost immediately after we launched, we found out iTextSharp was changing its licensing model and would no longer be free for commercial use. We stayed on the old version as long as we could until a suitable replacement came out.

WKHTMLTOPDF fell into our laps like a gift from the heavens. It was built on WebKit, the rendering engine behind Safari and, at the time, Google Chrome, and it created reliable PDFs time and time again. By this point, too, renting a virtual machine from a cloud provider had become affordable. We were able to swap out iTextSharp for an API call to a secret VM that would sit idle and generate PDFs as we needed it to.

As we expanded OpenWater, we continued to find new ways of linking up to this secret API. Having a VM dedicated to producing PDFs saved us many hours when we had to add PDF functionality on any new project. However, we were headed down a path we would later come to regret.

As the success of OpenWater continued, our PDF needs increased as well. We went from generating dozens of PDFs per day, to hundreds, to thousands. In 2016, we rolled out a feature to let our users download their data to PDF in bulk. These were not your average PDFs. In many cases, the PDFs could be thousands of pages long. We saw our PDF usage explode. Tens of thousands of PDFs were pulled daily! Our single VM dedicated to producing PDFs was getting crushed; it would shut down and require restarts. With so many projects depending on this single point of failure, everything was coming to a grinding halt. Customers were upset and could not understand why our system struggled to generate what looked to them like a simple PDF.

We were able to temporarily mitigate the issues with some combination of scaling the server up and out. We found we were stable again at a burn rate of around $400 per month to keep what was now a cluster of VMs running.

This reprieve was short-lived. We rolled out the ability to include images and merge multiple documents into a single PDF. With the increase in both PDF size and the raw processing power required, our PDF servers could not keep up with the load. We could not even get them to respond to a reboot request.

We scaled up to the max, and now our PDF generation costs surged to nearly $4,000 per month, but the inundation kept coming.

A few customers ended up choking our entire platform’s PDF capacity. We had no choice but to cut off these customers from the PDF engine. To help them out, we had to work through the night to generate their PDFs using our local computing resources so they could make their deadlines.

Our customers have come to expect reliable PDF generation and do not understand what goes on behind the scenes to produce massive PDF files at scale. Admittedly, when we told our friends about Api2Pdf.com, their first reaction was “I can just print to PDF, how hard can it be?” We knew that $4,000 per month was not sustainable for always-on servers that were only used sporadically for PDF generation.

Over coffee, my colleague and Api2Pdf co-founder, Zack Schwartz, proposed the idea of using Amazon Web Services' Lambda service for our PDF generation needs. On paper it seemed like the perfect solution. AWS Lambda would give us computational power for each PDF request, independent of the next request. If we had 1,000 people needing PDFs all at once, AWS Lambda would give us the resources we needed on the spot.
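
As an illustration only (this is not the actual Api2Pdf code), a per-request PDF function on AWS Lambda might look something like the sketch below. It assumes a wkhtmltopdf binary is bundled with the function, that the request carries its markup in a hypothetical `html` field, and that the response follows the API Gateway proxy format. The `render` parameter is also an assumption, added so the handler can be tried without wkhtmltopdf installed.

```python
import base64
import subprocess


def render_with_wkhtmltopdf(html: str) -> bytes:
    """Pipe HTML into wkhtmltopdf on stdin and read the PDF from stdout."""
    # Passing "-" as both input and output makes wkhtmltopdf use stdin/stdout.
    result = subprocess.run(
        ["wkhtmltopdf", "--quiet", "-", "-"],
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


def handler(event, context, render=render_with_wkhtmltopdf):
    """Lambda-style entry point: one invocation renders one PDF.

    The "html" field and the `render` parameter are hypothetical names
    chosen for this sketch, not part of any real API.
    """
    html = event.get("html")
    if not html:
        return {"statusCode": 400, "body": "missing 'html' field"}

    pdf_bytes = render(html)

    # Binary payloads are base64-encoded in the API Gateway proxy format.
    return {
        "statusCode": 200,
        "isBase64Encoded": True,
        "headers": {"Content-Type": "application/pdf"},
        "body": base64.b64encode(pdf_bytes).decode("ascii"),
    }
```

Because each invocation only handles its own request and keeps no shared state, a burst of 1,000 requests can be served by many copies of the function running side by side, rather than by one server working through a queue.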

Over the 2017 winter holidays, we set out to rebuild our PDF virtual machine cluster as a set of AWS Lambda functions and roll them out before our January busy season kicked in. Because we were building a drop-in replacement for our WKHTMLTOPDF setup, we could always revert to the $4,000 per month solution if things went wrong.

From January through May 2018, we continued battle testing our Lambda functions. The result worked out far better than we could have expected. Our PDF generation stayed up throughout the entire busy season and our costs dropped from $4,000 per month to between $31 and $61, a reduction of more than 98 percent. We generated several hundred gigabytes worth of PDFs.

By March, we realized we were on to something. If we put in a bit more effort, we could offer a tool to all developers at a nominal cost. PDF generation really sucks. There are a lot of great resources out there, and a lot of free ones, for developers who have the DIY mindset. We wanted to create a tool for those who just don’t have the time or are overworked and have too many priorities. In the end, customers just want things to work.

Api2Pdf.com launches today and is a wrapper around popular tools most developers already use:

WKHTMLTOPDF
Headless Chrome
LibreOffice
Merge / Concatenate PDFs

Our goal is to simply make it easier to use the wheels that already exist at the lowest cost possible. For about $1.50/month, we think that 90% of developers can have their PDF needs fully met. We also believe our service can scale for the remaining 10% that have heavy duty PDF generation needs.

Cheers!