I’ve long wanted to archive a whole blog to a book – not this blog, but a different private one. I used the platform to blog about family stuff and post pictures, and having a permanent record of it, both in electronic book form and physical printed form, will help ensure its longevity. Long after the current WordPress version is a laughable footnote in web history, PDFs will be able to be opened and presumably people will still know what to do with a physical book. So I’ve been wanting to do this for a while but haven’t known where to start. Or even when to do it. Due to the recent Yahoo nuttiness, I had made the plan to switch hosting providers and as long as I was switching providers, now seemed like the time. But how to get started…
If you Google “WordPress Blog Print Book”, you’ll find tons of results that are somewhat relevant. Lots of them were only about how to make an eBook out of the Blog – nice, but only halfway to my intended destination. Many were about how to make a “book” as in a novel or non-fiction tome, not as in a bound thing that contains pages which are effectively prints of your posts. I found some basic Lifehacker kind of stuff about how to print on large paper and make your own physical book at home which is crafty and all, but I wanted something more bookshelf worthy and something that would look somewhat professional.
I was excited to find in the search results a link to a service named Blog2Print. And even more excited when the “Get Started” section asked to point to your WordPress blog. And it even worked on my “self-hosted” blog (i.e. not being hosted at WordPress.com). All great news. I got hung up, though, when I discovered that no comments were included. (Although the service theoretically works with comments, it didn’t find any that were in the blog.) And no categories. (Not a feature of the service.) And you didn’t really have any control over how it appeared. (I wanted to tweak the colors a bit to make it look more like my blog.) I tried to make it work – I really did. But somewhere in the middle, I realized that Blog2Print was an additional service offered by something called SharedBook and when I tried to learn more about what other things SharedBook does, I discovered that it is now known as XanEdu and book printing isn’t really their thing any more. While I might need to revisit Blog2Print, I hoped I would find something better.
So I set out looking for other service options. Blurb said that it would print a blog too, but not WordPress. And frankly, Blurb seemed too geared toward people writing books – the novel and non-fiction type, not the archiving blog types of books. I couldn’t find any other services that offered to do what Blog2Print does – I was hoping for better and I found nothing that even did as much.
Disappointed, I went back to the original Google search and dug deeper looking for other ideas. I eventually hit on the solution I ended up using. Kalin’s PDF Creation Station is a plugin for WordPress that queries the post data from the database and produces a PDF from it. A Table Of Contents is included and links work throughout including links on the Table Of Contents to the pages in the doc as well as links that were in the posts to live web sites. In other words, the PDF it creates is pretty full featured. There’s also a lot of customizability. The author, Kalin, clearly thinks like me when writing code and created a massive set of shortcuts that help you tailor the output to your needs. And yet there were a number of things that didn’t work the way I needed or look the way I wanted in the PDF. Fortunately, this is plugin code, so there’s no black box – easy to modify, right?
Never having done PHP coding before, it took me an hour or so to orient myself in the code and figure out what pieces were doing what. I found the first thing I needed to change pretty quickly. The blog has a number of ghost draft posts lingering in the database and the PDF Creation was picking up all posts. I changed the call to get_posts to work with a post_status of “publish” instead of “any”. Much better.
Next up was the before and after post sections that contained info I didn’t want or info I did but not in the place I wanted it. It was easy to change but the original stuff kept coming back. It took a while before I found the template saving ability lower on the page. The trick to it is that you put what you want in all the various boxes and select the posts you want to write and then you click to save the template. I should mention that there are two pages to run the PDF writing – one from the Tools menu and one from the Settings menu. I used the one from the Tools menu because the ability to save templates was clearer to me there. Oh, and these sections are where I included the settings for font color for the post title and for the secondary string after the title. It isn’t good HTML practice to use inline style definitions but I didn’t see a CSS file to use instead, and this isn’t public HTML anyway – seems like the perfect time to cheat.
Now looking at my sample set of posts, I realized that the posts were writing to the PDF in the order I had selected them. That’s advantageous when you are selecting only a few and need a certain order. But I wasn’t sure I’d be able to select all the posts in exclusively the correct order so I added in a sort using “usort”, “strcmp”, and the post_date for the posts. (Thank you, Stack Overflow.) It works a treat.
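For anyone curious why a plain string comparison is enough here: WordPress stores post_date as a zero-padded "YYYY-MM-DD HH:MM:SS" string, so alphabetical order and chronological order are the same thing. Below is a minimal sketch of that idea in Python rather than the plugin's PHP. The sample posts and the function name are made up for illustration and are not taken from Kalin's code.

```python
from operator import itemgetter

# Hypothetical sample data; only the post_date format matches WordPress.
posts = [
    {"title": "Third", "post_date": "2012-07-04 09:15:00"},
    {"title": "First", "post_date": "2009-01-20 18:30:00"},
    {"title": "Second", "post_date": "2010-11-02 07:45:00"},
]


def sort_posts_by_date(posts):
    """Return a new list of posts, oldest first.

    The dates are compared as plain strings, which works because the
    format is zero-padded and runs from year down to second.
    """
    return sorted(posts, key=itemgetter("post_date"))


if __name__ == "__main__":
    for post in sort_posts_by_date(posts):
        print(post["post_date"], post["title"])
```

The same trick would fall apart with a format like "7/4/2012", which is why it is worth checking how the dates are stored before relying on it.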
The next issue for me was the page numbers at the bottom. It shouldn’t have bothered me, but I didn’t like the page numbers always being on the right when everything else on the page was left justified. I think it’s okay to have the page numbers on the outside of the page but since this print would be double sided, the page numbers would need to be on the left then the right. I thought it was easier to just center them. And I want it to be book like, which means no “/pagetotal”. Just the page number please. Unfortunately, none of this is exposed from Kalin’s tool because it isn’t an optional element of TCPDF, the PDF creation tool upon which Kalin’s tool is built. So I had to make some hack changes directly in TCPDF. I updated the definition of “pagenumtxt” and then commented out the block for deciding which side to put the page number on (depending on the status of RTL) and just created a new Cell with “C” as the alignment. Perfect.
Kalin’s tool does allow for comments but I didn’t get the look I was hoping for. And the ability to modify how comments are presented isn’t as robust as how the posts are. So I created my own code for doing comments, querying only those that had been approved. I added a horizontal rule and a couple of paragraphs for each post where the first contained the comment data and the second contained the comment content. It came out pretty nice for something quick and dirty.
At this point, I was feeling pretty good about things. So I expanded my post selection from just the handful of test posts to the entire set of posts. PDF creation failed with “undefined”. Ugh. Looking through the code, I didn’t see any time when the error should have returned with “undefined”. I modified the call to TCPDF in the tool PHP file again so I could confirm that was the line where the crash was occurring but the result after the error statement still was “undefined”. By deselecting posts blocks at a time, I was able to isolate a certain few posts as being culprits. I could get the “undefined” by simply choosing one of those posts. And those posts all had PNG files in them. I then Googled and discovered TCPDF is known to have problems with PNG files. Well that sucks! But fortunately, I only had a handful of posts that had PNG files so I modified those posts to contain JPG files instead (while retaining the links to the original PNG files). It was just a pain to dig through the uploads folder to find PNG files, and then figure out which posts used those files. With the PNG images replaced, I could confirm that the posts that caused the failure before were no longer causing a failure. Sweet.
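In hindsight, the hunt for PNG files could have gone the other way: scan the post content for image tags pointing at PNG files instead of digging through the uploads folder. Here is a rough sketch in Python, assuming the posts have already been exported as a list of dictionaries with "id" and "content" keys. That export step, the key names, and the function name are all hypothetical. It only looks at the src of img tags, so a link to the original PNG file does not count as a hit.

```python
import re

# Matches <img ... src="something.png"> with either quote style, any case.
IMG_PNG_PATTERN = re.compile(
    r"""<img\b[^>]*\bsrc=["']([^"']+\.png)["']""",
    re.IGNORECASE,
)


def find_png_images(posts):
    """Map each post id to the PNG image sources found in its HTML."""
    hits = {}
    for post in posts:
        sources = IMG_PNG_PATTERN.findall(post["content"])
        if sources:
            hits[post["id"]] = sources
    return hits


if __name__ == "__main__":
    # Hypothetical sample content for illustration.
    sample_posts = [
        {"id": 1, "content": '<p>Hi</p><img class="a" src="/uploads/map.PNG" />'},
        {"id": 2, "content": '<a href="/uploads/chart.png"><img src="/uploads/chart.jpg"></a>'},
    ]
    print(find_png_images(sample_posts))
```

A regular expression is a blunt tool for HTML, so treat the result as a list of posts to check by hand rather than a guarantee.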
Back to the whole set – now it works, right? Nope! The error is now at least clear: Allowed memory size of ### bytes exhausted where the # of bytes was some number I don’t remember. It was somewhere around 32M. I searched for how to resolve this one and found a number of ideas but none of them seemed like clear winners other than talking to your hosting provider. Oh, crap, like dealing with Yahoo/Aabaco is going to get me anywhere now. Fortunately, it was about this time the new hosting provider told me they had been able to move the files and database over. Unfortunately, the domain transfer was still in progress. But back to fortunately, I could hack my computer’s hosts file to skip domain resolution so that the new location would be the only one seen. That worked and I was able to port over all of my above changes (which I had made after the snapshot for the move) to the new hosting provider. And I am happy to report the memory error went away after switching to the new provider. So another reason to move from Yahoo: limited memory available to a PHP WordPress plugin.
(Actually, for the sake of completeness, I did add the line ‘ini_set("memory_limit","128M");’ to both kalins_pdf_create.php and tcpdf.php during debugging and I hadn’t removed them. So it’s possible that they are still useful for increasing the memory, or they may actually be unnecessarily restrictive in lowering the memory available from a theoretical max of what the hosting provider allows. Since I wasn’t trying to make this production worthy, I didn’t do lots of comparisons after I got it working – I just went with whatever worked!)
I was finally able to get the whole site saved off to a PDF. Very exciting. 439 pages. But I thought it was important now to go through all of the pages and confirm that I got what I wanted. I’m glad I did because about one third of the way in, a post abruptly cut off. Looking on the site, it looked fine. But when I checked out the post source code, I found a missing close quote in a URL target. So the browser was able to ignore the HTML error but TCPDF could not. Easy fix to the HTML and then the PDF export had no problem with that post – or any others.
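A quick way to hunt for this kind of mistake before exporting would be to look for tags with an odd number of double quotes. The sketch below is in Python and is only a heuristic of my own devising, not anything TCPDF or the plugin does. It will miss problems with single-quoted attributes and can be fooled by a ">" inside an attribute value.

```python
import re

# Anything that looks like a tag: "<", then no angle brackets, then ">".
TAG_PATTERN = re.compile(r"<[^<>]+>")


def find_suspect_tags(html):
    """Return the tags that contain an odd number of double quotes."""
    return [tag for tag in TAG_PATTERN.findall(html) if tag.count('"') % 2 == 1]


if __name__ == "__main__":
    # Hypothetical post content with a missing close quote on the URL.
    broken = '<p>See <a href="http://example.com/a>this page</a></p>'
    for tag in find_suspect_tags(broken):
        print("Check this tag:", tag)
```

Run against every post, something like this would likely have flagged the bad URL without reading all 439 pages, although reading them was worthwhile for other reasons.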
It was about this time that I started getting cocky about my capabilities. And that led to me being more picky about how the posts looked. One thing I noticed early on but hadn’t bothered me until now was when two pictures appeared in a post in vertical (portrait) orientation and were aligned side by side with the tops and bottoms matching and a space in between them, they came out in the PDF without the space between them and with the tops and bottoms offset a bit. I looked through the TCPDF code but the part that does the alignment gets too thick for me to follow without a proper debugging environment set up. So I went with the hack/trial and error route in tcpdf.php. Just start adding numbers in places and see what happens! I found that adding 4 to “imgh” took care of the image top alignment issue, and adding 5 to “this->x”, at the point where the code defines where to start the next object after the image, solved the missing space issue. Now that’s a nice looking PDF!
But where to print it!? Again, back to Blog2Print and there’s no facility for using your own PDF while the greater site XanEdu doesn’t seem to offer an option either. I did consider Blurb for this but the cost was pretty high and I still didn’t feel confident that the service was set up for printing the kind of thing I wanted. I ended up settling on Lulu. They too are pretty focused on printing books for reading rather than archive blog content. But they were much cheaper and the print setup process felt more flexible to suit books meant for something other than reading.
I ran into trouble right away, though, when uploading the PDF to Lulu. They wanted the document to be set up for 8.25×10.75 where Kalin’s PDF tool uses the TCPDF default of 8.5×11. So back to the coding! I modified kalins_pdf_create where it creates a new document to include a pagelayout (instead of the undefined default “PDF_PAGE_FORMAT”) that is an array of two values – 8.25 multiplied by 25.4 and 10.75 multiplied by 25.4. The default PDF unit is set up as “mm” so it was important to match that. And finally, for book printing, I realized that an even number of pages is important so I modified the code to add in a blank page if the page count was odd by the end of the process. (Which has to come after adding in the TOC because that could change the odd to even or even to odd.) And now I was able to finally print my final PDF. Awesome.
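The arithmetic is simple but easy to get wrong, so here it is spelled out: 8.25 inches is 209.55 mm and 10.75 inches is 273.05 mm. This is a small Python sketch of the two calculations, the page size and the even-page padding. The function names are mine, and the real change was made in the plugin's PHP.

```python
MM_PER_INCH = 25.4


def page_size_mm(width_in, height_in):
    """Convert a page size in inches to (width, height) in millimeters."""
    return (round(width_in * MM_PER_INCH, 2), round(height_in * MM_PER_INCH, 2))


def blank_pages_needed(page_count):
    """Return 1 if a blank page is needed to make the count even, else 0.

    Call this only after the table of contents has been added, because
    the TOC pages change the total.
    """
    return page_count % 2


if __name__ == "__main__":
    print("Page size in mm:", page_size_mm(8.25, 10.75))
    print("Blank pages to add for 439 pages:", blank_pages_needed(439))
```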
I need to mention here that even with the changes to the memory, running on the new hosting provider, and the removal of PNG images, I found that I would still get “undefined” as an error result instead of a success about half of the time. However, the PDF did generate correctly. My guess is that the large size of my PDF was simply causing the process to take longer and therefore, if the server was busy doing other things, the process might time out and be perceived as failure even though it actually was working fine. When server load was light, the process would run fast enough to not cause any timeout issues.
Okay, back to uploading for printing and… Another failure. This time, the issue was the fonts were not embedded. Oh, for the love of… Back I went to the TCPDF documentation and no clear answer as to how to embed fonts. By the way, the TCPDF documentation goes wide on scope but the details are seriously lacking. The whole documentation is essentially one HTML web page. Examples are listed on a different page and are brief. And formatting is atrocious – really hard to make heads or tails of what you are seeing. Anyway, the only description about embedding is that you can change the degree of embedding on the SetFont calls. I tried it but didn’t get anywhere. I also noticed that I could create a PDF/A which would have been good but got lots of other errors on that – seems like a whole ‘nother can of worms. And I found that I could just use the existing PDF I had, and “print” it to a PDF using the Acrobat creator I usually use to make PDFs. When you print using that, you can select the quality of document created and specify that it embed fonts. One thing to be careful of though is I had to define a new paper size because the document I had worked hard to create with 8.25×10.75 was going to end up printing on regular 8.5×11 virtual paper.
With this new PDF in hand (more than twice as big), I uploaded to Lulu and was able to pass the upload test! Whew. On to the next hurdle. Page count. Limited to 440 pages. My original print was 439 pages plus that extra page to bring the count to an even number and holy crap, I just made it! Good thing I didn’t have one more post on the blog! The last step was getting the cover sorted out. My original intent was that each page be printed with the blog’s theme so that it looked like each page in the book was simply printed from the blog. It would have been nice if each blog post began a page and for those that ran over, to have the style on the subsequent pages be as though the style from the bottom had simply continued. But I had given up on that desire early on. (I actually did look into trying to get background images written through TCPDF but that was a fast dead end.) Now, thinking about a cover, I returned to that idea and figured I’d use the theme as the cover. So I did a screen grab from any random post and pasted the content into Photoshop. I scaled the content to fit the document and deleted the post content. Then I filled around the edges and added in the extra half inch for bleed on top, bottom, and right side. (Not the left because there’s no bleed where the cover meets the bound edge.) For the bound edge and back, I just went with the basic color of the blog which transitions nicely from the front cover. And I got lucky with the cover font, discovering that Bembo, the third in the list, when bolded just happened to be a near exact match for the Theme’s title font.
And with that I clicked order. Can’t wait for the book to arrive!