I've been using around 14.9GB of the 15GB Google storage quota for over a year now:
Every time I got really close to the 15GB mark, I'd search Gmail or Google Drive for my largest items, find some pretty useless ones, and then delete them. This worked until about a week ago, when I ran out of obvious candidates for deletion. So, I decided to implement an idea I've had for a while now: a tool that rips attachments out of emails, uploads them as Google Docs/Sheets/Slides (which don't count against the storage quota), and inserts links to the converted documents back in the original email.
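The post doesn't include gdoctor's code, so here is only a rough illustration, not the tool's actual implementation: a minimal Python sketch of the "rip attachments out, insert links back in" step, using the standard library's `email` package. It assumes you already have the raw message bytes (for example from the Gmail API's `raw` message format). `upload` is a hypothetical callback standing in for the Drive upload-and-convert call; it is not a real Google API. Labels, inline images in `multipart/related` messages, and keeping both the plain and HTML bodies in sync are all left out.

```python
import html
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from typing import Callable

# Hypothetical uploader: takes (filename, mime_type, data) and returns a URL.
# A real tool would upload to Drive with conversion to a Google Doc here.
Uploader = Callable[[str, str, bytes], str]


def strip_attachments(raw: bytes, upload: Uploader) -> EmailMessage:
    """Parse a raw message, pass each attachment to `upload`, drop the
    attachments, and append the returned links to one body part."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)

    # iter_attachments() yields the non-body parts of a multipart message
    # (and nothing at all for single-part or multipart/alternative messages).
    attachments = list(msg.iter_attachments())
    if not attachments:
        return msg

    links = []
    for part in attachments:
        name = part.get_filename() or "attachment"
        data = part.get_payload(decode=True) or b""  # undoes base64 / quoted-printable
        links.append((name, upload(name, part.get_content_type(), data)))

    # Rebuild the top-level payload without the attachment parts (matched by identity).
    dropped = {id(p) for p in attachments}
    msg.set_payload([p for p in msg.get_payload() if id(p) not in dropped])

    # Only one body part gets the links: plain text if present, else HTML.
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is not None:
        text = body.get_content()
        if body.get_content_subtype() == "html":
            # Crude: appended after any closing </html>; most clients still render it.
            items = "".join(
                f'<li><a href="{html.escape(url)}">{html.escape(name)}</a></li>'
                for name, url in links
            )
            body.set_content(text + f"<ul>{items}</ul>", subtype="html")
        else:
            lines = "\n".join(f"{name}: {url}" for name, url in links)
            body.set_content(text + "\nAttachments moved to Google Drive:\n" + lines + "\n")
    return msg
```

Passing a fake `upload` that just returns a fixed URL is enough to try this on a saved message; the result can be turned back into bytes with `as_bytes()` before inserting it into the mailbox.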
After a few days of API-combining and reading about email encoding, the tool is done. I called it gdoctor, because it doctors your emails to use (g)docs. You can see it deployed at gdoctor.psun.me or view the source here. Here are two screenshots showing what it does to emails:
Google Docs is bad at importing
The tool ended up less useful than I'd imagined, because Google Docs often completely butchers attachments on import. gdoctor certainly has its uses: I personally applied it to all 2000+ attachment-containing emails I've received from one of MIT's job posting mailing lists, and saved about a gigabyte of storage in the process. And you should definitely try gdoctor out if you're curious! It's not dangerous: it only inserts new emails and applies a label to the original emails to make them easy to delete, and the user does the deleting if he/she wishes. No existing data is modified or removed by gdoctor.
But the poor attachment-to-Google Doc conversion quality means it's only useful for emails whose attachments you don't really care about. The job posting emails I applied gdoctor to are a great use case because they're largely filled with fliers for past events that I both didn't attend and didn't want to attend. The only reason I've even kept these emails around is that I think it'd be neat if a startup that posted to the mailing list later became a $100B company: I'd have a cool piece of history in my inbox.
Anyway, on to import quality. Google Docs seems to extract only the text from PDFs; all images are dropped. The text is often laid out wrong as well, and in general the imported PDF looks almost nothing like the original.
Google Docs also supports importing images, which basically means it creates a blank document and places the image at the top of the page. This doesn't sound very hard, but it still sometimes fails and produces a blank document. What's even more hilarious is that Google Docs runs OCR on the image and puts whatever text it finds below the image. I haven't found an option in the API to disable this “feature,” but it's pretty entertaining.
An image from an email sent around Halloween got turned into this:
Part of a Goldman Sachs event announcement became:
So, in conclusion, Google Docs doesn't import files well enough for gdoctor to be used on any emails that are even vaguely important. There are still some things I can improve in gdoctor, though:
- Use Google Photos instead of Google Drive for images? That should hopefully be more reliable
- Better error logging
- Probably just pay Google $20/year for more space