Clean Manuscripts Using Markdown

Modern word processors follow the What You See Is What You Get (WYSIWYG) paradigm. While that promise mostly holds for simple documents, inconsistencies often creep into complex ones. These include, but are not limited to:

  • Formatting differences introduced by Copy and Paste operations.
  • Different styles applied throughout. E.g., some paragraphs use 1.2 line spacing, while others use 1.15.
  • Selection errors. E.g., the user selected more text than intended before applying formatting such as Italics.

Some of these inconsistencies may go undetected until the work is published. They also complicate fault-finding in an EPUB, as they introduce additional styles and tags into the code.

Markdown is a human-readable markup language that lets users apply formatting like Bold, Italics, Monospace, and Strikethrough without any knowledge of the underlying code. This simple syntax can be leveraged to generate clean manuscripts.

Sample Markdown Text

# Clean Manuscripts Using Markdown

Modern word processors often subscribe to the [What You See is
What You Get](http://en.wikipedia.org/wiki/WYSIWYG) paradigm.
While true for simple documents, inconsistencies are easy to
find in complex documents. These include but are not limited to:

* Formatting differences introduced by _Copy_ and _Paste_
  operations.
* Different styles applied throughout. E.g. Some paragraphs are
  1.2 line spaces, while others are set to 1.15 line spaces.
* Selection errors. E.g. The user selected more than expected
  prior to formatting text, such as _Italics_.

[Markdown](http://en.wikipedia.org/wiki/Markdown) is a human
markup language that permits users to use formatting like
**Bold**, _Italics_, `Monospace`, and ~~Strike Through~~ without
any knowledge of the underlying code. Markdown’s simple
syntax can be leveraged to generate _clean_ manuscripts.

Fortunately, Google Docs offers extensions that convert documents to and from Markdown on demand. The overall process is straightforward:

  • Convert the manuscript to Markdown.
  • Confirm the formatting.
  • Create a new document with the desired styles.
  • Import the Markdown manuscript into the new document.
  • Export to other formats as needed.

Note

A myriad of tools offer this functionality. There are even plug-ins for Microsoft Word that convert to and from Markdown. Given the ease of access, this tutorial focuses on a Google Docs-specific solution.

The Details

Markdown strips out complex formatting but keeps the formatting typically found in manuscripts. Once the text is imported back into Google Docs, the default styles are applied to create a clean document.

The following table compares the CSS and XHTML that Calibre generates from various sources.

Source              | CSS Styles | Style Types | HTML Tags
Microsoft Word      | 39         | 5           | 276
Google Docs (ODF)   | 26         | 4           | 300
Google Docs (DOCX)  | 34         | 4           | 302
Markdown            | 12         | 1           | 153

The more CSS styles and XHTML tags an EPUB contains, the more complex it becomes and the greater the chance of error. In addition:

  • The tags generated have no meaningful names; they are assigned in the order in which they were processed. 
  • Tags may overlap one another, making it difficult to determine which set of tags to correct.
  • Inconsistencies sometimes apply to a narrow area of the manuscript.
  • Some sources will naturally yield complex conversions.

These factors make altering the document, fault-finding, and debugging harder. Since Markdown generates a minimalist EPUB, you’ll end up with a consistent product that can be easily adapted to any style you choose.

Sample Code From an EPUB

<body class="c">

  <div id="id.3ygebqi" style="height:0pt"></div>

  <h1 class="c11" id="calibre_pb_124"><span class="c12">
   LICENCE</span></h1>

  <p class="c13"><span class="c30">The Portrait </span>
  <span class="c15">is a work of fiction. All characters,
  organisations and events appearing in this work are
  fictitious. Any resemblances to reality are purely
  coincidental.</span></p>

  <p class="c13"><span class="c15">All images and text herein
  are </span><span class="c38">copyrighted</span>
  <span class="c15">:</span></p>
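To get a rough feel for how much markup a converted file carries, you can tally its tags and class names. The following is a minimal sketch using Python's standard html.parser module; the names MarkupTally and tally_markup are made up for this illustration, and the counts will not necessarily match the figures Calibre reports in the table above.

```python
from collections import Counter
from html.parser import HTMLParser


class MarkupTally(HTMLParser):
    """Tally start tags and distinct class names in an XHTML fragment."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated names.
                self.classes.update(value.split())


def tally_markup(xhtml):
    """Return (number of start tags, sorted list of distinct class names)."""
    parser = MarkupTally()
    parser.feed(xhtml)
    parser.close()
    return sum(parser.tags.values()), sorted(parser.classes)


# A shortened fragment based on the sample above.
sample = (
    '<p class="c13"><span class="c30">The Portrait </span>'
    '<span class="c15">is a work of fiction.</span></p>'
)
total, classes = tally_markup(sample)
print(total, classes)
```

Running the same tally on files converted from different sources gives a quick, if approximate, way to compare them.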

Convert to Markdown

To begin, use the Docs to Markdown extension in Google Docs:

  • Open or import the document to clean.
  • Select Extensions, Docs to Markdown, and Convert. A side panel will appear.
  • Check Suppress info comment and Use reckless mode (no alerts) to reduce the number of generated comments.
  • Click Markdown to generate the markup.
  • Copy and Paste the output into a text editor like Notepad (Windows) or vi (BSD/Linux/Unix).
  • Save the file with a .md extension.

Confirm the Markup

Since the conversion process is rarely perfect, it’s recommended that you:

  • Open the file created with a .md extension.
  • Remove XHTML comments, i.e., anything between <!-- and -->.
  • Confirm header elements H1 (#), H2 (##), or H3 (###) are correct.
  • Remove additional line spacing.
  • Confirm the application of formatting:
    • Bold (**)
    • Italics (_)
    • Strikethrough (~~)
    • Monospace (`)
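Some of these checks can be partly automated. Here is a minimal sketch in Python, assuming the Markdown has already been read into a string; the function names sanitise and heading_levels are hypothetical. It removes XHTML comments, collapses extra blank lines, and lists the headers so their levels can be reviewed by eye. Formatting still needs a manual read-through.

```python
import re

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
BLANK_RUN = re.compile(r"\n{3,}")
HEADING = re.compile(r"^(#{1,6})[ \t]+(.*)$", re.MULTILINE)


def sanitise(markdown):
    """Drop XHTML comments and collapse runs of blank lines."""
    text = COMMENT.sub("", markdown)
    # Strip trailing whitespace so blank-looking lines are truly empty.
    text = "\n".join(line.rstrip() for line in text.split("\n"))
    # Three or more newlines in a row become a single blank line.
    text = BLANK_RUN.sub("\n\n", text)
    return text.strip() + "\n"


def heading_levels(markdown):
    """Return (level, title) pairs for every header line."""
    return [
        (len(match.group(1)), match.group(2).strip())
        for match in HEADING.finditer(markdown)
    ]


raw = (
    "<!-- note -->\n# Title\n\n\n\nText here.   \n"
    "<!-- a\nb -->\n## Chapter One\n"
)
clean = sanitise(raw)
print(clean)
print(heading_levels(clean))
```

Note that this sketch would also strip comments that appear inside code samples, so it is best suited to prose manuscripts.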

Note

Remember to save your sanitised Markdown document.

Here is an example of an inconsistency related to the use of Italics:

Inconsistencies with Italics

Clara expected that her acquaintance would surrender to its
power tonight. _You never know, she might prove to be a
valuable ally_.

“_In my day, werewolves were considered a disease_,” Ethereal said
in that distinctive voice that sucked the life from a room.
“_Remind you of anyone else?_”

I keep punctuation outside of italics. However, the italics on the last line include the question mark. This might be missed during the review process and only noticed by readers after release.
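A simple pattern search can surface this kind of slip. The sketch below is a heuristic only: it assumes italics are written with balanced underscores and that the house rule is to keep punctuation outside them. The function name italics_ending_in_punctuation is invented for this example.

```python
import re

# Pairs of underscores mark italics; assumes underscores are balanced.
ITALIC = re.compile(r"_([^_]+)_")
TRAILING_PUNCTUATION = ".,;:!?"


def italics_ending_in_punctuation(markdown):
    """Return italic spans whose last character is punctuation."""
    return [
        match.group(0)
        for match in ITALIC.finditer(markdown)
        if match.group(1)[-1] in TRAILING_PUNCTUATION
    ]


passage = (
    "power tonight. _You never know, she might prove to be a\n"
    "valuable ally_.\n"
    "\n"
    "“_In my day, werewolves were considered a disease_,” Ethereal said\n"
    "“_Remind you of anyone else?_”\n"
)
for span in italics_ending_in_punctuation(passage):
    print(span)
```

Underscores used for other purposes, such as in URLs or file names, can confuse the pairing, so treat the results as candidates to review rather than confirmed errors.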

Here is an example of hidden formatting using Bold:

Hidden Formatting

Evelyn Chartres** **(_Nom de plume_)

The space between Evelyn Chartres and (Nom de plume) is bolded. Any text later inserted between those elements could inherit the bold formatting.
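Bold markers wrapped around nothing but whitespace are easy to find programmatically. The following sketch assumes bold is written with double asterisks on a single line; the function names are hypothetical. It pairs the markers from left to right, so a gap between two genuinely bold words is not mistaken for an empty span.

```python
import re

BOLD = re.compile(r"\*\*(.+?)\*\*")


def whitespace_only_bold(line):
    """Return the start offsets of bold spans that contain only whitespace."""
    return [
        match.start()
        for match in BOLD.finditer(line)
        if match.group(1).strip() == ""
    ]


def drop_whitespace_only_bold(line):
    """Replace whitespace-only bold spans with their bare whitespace."""
    def unwrap(match):
        inner = match.group(1)
        return inner if inner.strip() == "" else match.group(0)

    return BOLD.sub(unwrap, line)


line = "Evelyn Chartres** **(_Nom de plume_)"
print(whitespace_only_bold(line))
print(drop_whitespace_only_bold(line))
```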

Subtitles are not supported by Markdown and show up as another header. It is recommended that you:

  • Give them a lower header level. For example:
    • Chapter Titles. H2 (##)
    • Subtitles. H3 (###)
  • Remove any bookmarking elements for H3. E.g. {#subtitle}
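Removing the bookmarking elements by hand gets tedious in a long manuscript. As a minimal sketch, assuming the bookmarks always appear as {#name} at the end of an H3 line, a regular expression can strip them. The function name strip_h3_anchors is made up for this example.

```python
import re

# Matches an H3 line that ends with a {#bookmark} element.
H3_ANCHOR = re.compile(
    r"^(###[ \t]+.*?)[ \t]*\{#[^}\n]*\}[ \t]*$", re.MULTILINE
)


def strip_h3_anchors(markdown):
    """Remove trailing {#...} bookmarks from H3 lines only."""
    return H3_ANCHOR.sub(r"\1", markdown)


chapter = (
    "## Chapter One {#chapter-one}\n"
    "### A Subtitle {#subtitle}\n"
    "\n"
    "Body text.\n"
)
print(strip_h3_anchors(chapter))
```

Bookmarks on H2 lines are left alone in this sketch, since only the H3 subtitles are targeted.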

Note

Remove H3 elements when the Table of Contents is generated.

Create a Document and Set Defaults

At this stage, you have four options:

  • Use the Google Docs defaults.
  • Use an existing Google Docs document with defaults in place.
  • Create a new Google Docs document from a predefined template using From template gallery.
  • Create a new Google Docs document and establish defaults.

To set up your own defaults, it’s easiest to create a skeleton of the document. From there you can:

  • Highlight the text matching your Style.
  • Select the Style you want to adjust from the Styles drop-down menu.
  • Click on Update ‘Style’ to Match.

This includes but is not limited to:

  • Fonts. Font type, size, and formatting.
  • Paragraph. Line spacing, indentation, and justification.

Additionally, you may wish to adjust:

  • Page attributes.
  • Default language.

Once the defaults are established, move on to the next step.

Import from Markdown

Importing from Markdown can be done in several ways.

This tutorial focuses on the Google Docs Markdown to Docs (GdocifyMd) add-on. This tool imports Markdown formatted documents and uses existing styles to ensure conformity.

Note

Google Docs has limited built-in support for Markdown formatting. However, markup that arrives through Copy and Paste operations is not converted; it only works if you type the markup yourself.

To import Markdown formatted text back into Google Docs, you must:

  • From the Extensions menu, select Markdown to Docs (GdocifyMd) and click on Gdocify Markdown to open a new window.
  • Copy and Paste the content into the area marked Markdown content here.
  • Click on the Gdocify! button.
  • Newly formatted content will appear in the background.
  • Close the GdocifyMd window.

Note

This add-on is limited to fifty (50) conversions per month. 

The document can be Downloaded in any format Google Docs supports. This includes Microsoft Word documents to prepare a manuscript for print, or for use by Kindle Create to generate a Kindle Ebook.

That’s it!

CC BY-SA 4.0 Clean Manuscripts Using Markdown by Evelyn Chartres is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


