Docxpresso, upon which the DXCloud REST API is built, is a general tool to generate online reports and business documentation in PDF, ODF, Word and RTF formats that, among other possibilities, allows for the merging, insertion and styling of its contents using HTML5 and CSS.
Although Docxpresso can generate fairly sophisticated documents from HTML5 code, it is not meant as a tool to exactly reproduce an arbitrary web page. If precise formatting is a must, you may need to carefully craft the HTML and CSS code so that you render a version of the web page better adapted to the limitations of paged media.
As described in the explanation of the general "replace" property, there are two main ways to merge data using a template variable, and both apply in the particular case of HTML content:
In order to better support some general document components not covered by the HTML5 standard, Docxpresso extends it in a simple way to include:
Moreover, we have adapted some of the standard HTML5 tags to what we consider their natural "paged document" equivalents, like:
This also applies to a few CSS properties that have been reinterpreted to better suit the needs of standard paged media.
Let us now go deeper into the details.
Before getting into the details, let us work through an example that covers much of the ground explained in what follows.
Let us first consider the following HTML code:
<html>
<head>
<style>
p {font-family: Verdana; font-size: 10pt}
h1 {color: #b70000; margin-bottom: 12pt; font-family: "Century Gothic"; page-break-before: always}
footnote {font-family: Verdana; font-size: 8pt}
.headerTable {width: 15cm; border: none}
.headerImage {width: 5cm}
.headerTitle {width: 10cm; vertical-align: middle;}
.headerTitle p {font-size: 12pt; font-weight: bold; color: #567896; font-family: "Century Gothic"}
.docFooter {border-top: 1px solid #777; color: #555; text-align: right; font-family: Verdana; font-size: 10pt}
.red {color: #b70000; font-size: 8pt; font-weight: bold}
</style>
</head>
<body>
<h1>Sample document generated with HTML5</h1>
<p>This example only aims to illustrate how <strong>HTML5PDF</strong> renders a sample
web page in different document formats.</p>
<p>We include a footnote<footnote>Just some random text with a
<span class="red"> little formatting</span>.</footnote>
and a simple pie chart so we get a little more sophisticated example:</p>
<chart type="pie" style="width: 15cm">
<legend />
<category name="First" value="30" />
<category name="Second" value="20" />
<category name="Third" value="25" />
<category name="Fourth" value="10" />
</chart>
<h1>Another page</h1>
<p>This is just to check how the header and footer are included in every
single page with the correct page numbering.</p>
</body>
</html>
This would look like ordinary HTML code were it not for a few unusual tags like chart, footnote or page.
Let us see what we can achieve with a basically empty template with just one variable {{content}} and the following JSON code:
{
"template": "insert here the given template base64 encoded",
"output": "odt",
"replace": [
{
"vars": [
{
"var": "content",
"value": ["insert here the previous HTML code with quotes escaped"],
"block-type": true
}
]
}
]
}
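The request body above can also be assembled programmatically. What follows is a minimal Python sketch, assuming the template is available as raw bytes. The function name `build_payload` is hypothetical and not part of the DXCloud API, and sending the request is left out because the endpoint is not described here. Note that `json.dumps` takes care of escaping the quotes in the HTML code.

```python
import base64
import json


def build_payload(template_bytes: bytes, html: str, output: str = "odt") -> str:
    """Return the JSON request body as a string (hypothetical helper)."""
    payload = {
        # The template must be sent base64 encoded.
        "template": base64.b64encode(template_bytes).decode("ascii"),
        "output": output,
        "replace": [
            {
                "vars": [
                    {
                        "var": "content",
                        # json.dumps escapes the quotes inside the HTML string.
                        "value": [html],
                        "block-type": True,
                    }
                ]
            }
        ],
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # Placeholder bytes stand in for a real template file.
    print(build_payload(b"dummy template bytes", '<p class="red">Hello</p>'))
```

Building the body from a dictionary, rather than by string concatenation, avoids escaping the HTML by hand.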
The parsed HTML5 tags include:
The parsed CSS properties include:
Here we include for your convenience a few commented Docxpresso scripts that mainly use HTML5 + CSS to generate standard document elements.
It is very simple to generate nice tables from plain HTML (this time we omit the explicit JSON code, which is exactly the same as in the first example):
<html>
<head>
<style>
body {font-family: Calibri; font-size: 11pt}
.niceTable {border-collapse: collapse}
.niceTable td {border: 1px solid #657899; padding: 2px 5px; width: 5cm; margin: 0}
.niceTable th {vertical-align: bottom; border-bottom: 1px solid #657899 !important; padding: 2px 5px; width: 5cm; font-weight: bold; margin: 0}
.niceTable th.firstCol {font-style: italic; border: none; text-align: right; background-color: white}
.niceTable td.firstCol {font-style: italic; border: none; border-bottom: 1px solid #ffffff !important; text-align: right; background-color: white}
.odd {background-color: #d5e0ff}
</style>
</head>
<body>
<p>Just a nicely formatted table:</p>
<table class="niceTable">
<tr>
<th class="firstCol">Table title</th>
<th>Column 1</th>
<th>Column 2</th>
</tr>
<tr class="odd">
<td class="firstCol">Row 1</td>
<td class="odd">Cell_1_1</td>
<td class="odd">Cell_1_2</td>
</tr>
<tr>
<td class="firstCol">Row 2</td>
<td>Cell_2_1</td>
<td>Cell_2_2</td>
</tr>
</table>
</body>
</html>
The table formatting is inspired by one of the typical Word table formats. Notice that some of the CSS styles are redundant; this is to ensure nice rendering in all possible formats. The .pdf, .odt and .doc formats do not require, for example, the repeated inclusion of the odd class attribute in table cells.
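When the table rows come from data, the redundant class attributes can be generated rather than typed by hand. The following Python sketch is only an illustration: the helper `nice_table` is hypothetical, and it simply reproduces the markup of the example above, repeating the odd class on every cell of a shaded row.

```python
from html import escape


def nice_table(header, rows):
    """Build the <table class="niceTable"> markup (hypothetical helper).

    header: list of column titles, the first one for the row-label column.
    rows: list of rows, each a list of cell texts starting with the row label.
    """
    lines = ['<table class="niceTable">', "<tr>"]
    for i, title in enumerate(header):
        cls = ' class="firstCol"' if i == 0 else ""
        lines.append(f"<th{cls}>{escape(title)}</th>")
    lines.append("</tr>")
    for n, row in enumerate(rows):
        odd = n % 2 == 0  # first, third, ... data rows are shaded
        lines.append('<tr class="odd">' if odd else "<tr>")
        for i, cell in enumerate(row):
            if i == 0:
                cls = ' class="firstCol"'
            elif odd:
                # Repeated on each cell so that every output format shades it.
                cls = ' class="odd"'
            else:
                cls = ""
            lines.append(f"<td{cls}>{escape(cell)}</td>")
        lines.append("</tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(nice_table(
        ["Table title", "Column 1", "Column 2"],
        [["Row 1", "Cell_1_1", "Cell_1_2"], ["Row 2", "Cell_2_1", "Cell_2_2"]],
    ))
```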
Although HTML5 is by all means an amazingly complete standard it has not been designed with paged media in mind, so it lacks certain elements that are common in standard documents:
The Docxpresso extensions of plain HTML5 are meant to remedy those deficiencies in the simplest possible way. What follows is the result of this goal.
Of all the currently included HTML5 extensions, this is without doubt the most extensive one.
Before getting into the details of the associated XML schema, let us run a simple example that will give you a clear taste of what is going on:
<style>
.centeredText {text-align: center; font-family: Georgia; font-size: 16pt; color: #5689dd}
</style>
<p style="margin-bottom: 20pt">A 3D column bar chart in HTML format:</p>
<p class="centeredText">Sales (K$)</p>
<p class="centeredText">
<chart type="3Dcolumn" stacked="true" style="width: 10cm;">
<legend legend-position="bottom"/>
<component type="floor" fill-color="#fff0f0" />
<series>
<ser name="Europe" />
<ser name="America" />
<ser name="Asia" />
</series>
<categories>
<category name="2010">
<data value="650" />
<data value="470" />
<data value="400" />
</category>
<category name="2011">
<data value="680" />
<data value="540" />
<data value="430" />
</category>
<category name="2012">
<data value="650" />
<data value="600" />
<data value="600" />
</category>
<category name="2013">
<data value="750" />
<data value="640" />
<data value="580" />
</category>
</categories>
</chart>
</p>
Let us see what we can achieve with a basically empty template with just one title and the variable {{content}} and the following JSON code:
{
"template": "insert here the given template base64 encoded",
"output": "pdf",
"replace": [
{
"vars": [
{
"var": "content",
"value": ["insert here the previous HTML code with quotes escaped"],
"block-type": true
}
]
}
]
}
We would like to point out that currently the support for HTML charts in Word format output is "incomplete". For that case we recommend using the replaceChartData option.
In principle one may control from HTML5 code all the customizable components of the Docxpresso chart public API, i.e.
We now explain these in further detail.
The <chart> element is the main chart wrapper that accepts the following child elements and attributes:
Child elements
Attributes
The <legend> element controls the legend display properties:
Child elements
Attributes
The <axis> element allows you to customize the x, y or z (only 3D charts) axis:
Child elements
Attributes
The <grid> element allows you to customize the x, y or z (only 3D charts) grid lines:
Child elements
Attributes
The <component> element allows for the customization of the wall and floor components of a chart:
Child elements
Attributes
The <transform3D> element allows you to rotate and change the default perspective in 3D charts:
Child elements
Attributes
The <series> element is a wrapper element for the different series that compose the chart data.
This element is not allowed for 2D and 3D pie and donut charts.
Child elements
Attributes
The <ser> element contains the relevant data for a particular chart series.
This element is not allowed for 2D and 3D pie and donut charts.
Child elements
Attributes
The <categories> element is a wrapper element for the different category elements that compose the chart data.
Child elements
Attributes
The <category> element contains the actual data. Its structure depends on the chart type.
Child elements
Attributes
The <data> element contains the actual value for each data point.
Child elements
Attributes
The general structure of a chart element may be summarized in this sample code:
<chart style="CSS styles"
type="column|bar|pie|donut|area|line|scatter|bubble|radar|filled-radar|column-line|3Dcolumn|3Dbar|3Dpie|3Ddonut|3Darea|3Dline|3Dscatter"
data-label-number="none|value|percentage"
label-position="avoid-overlap|center|top|top-right|right|bottom-right|bottom|bottom-left|left|top-left|inside|outside|near-origin"
label-position-negative="top-left|inside|outside|near-origin"
hole-size="integer"
pie-offset="integer"
angle-offset="integer"
stacked="boolean"
gap-width="integer"
overlap="integer"
percentage="boolean"
chart-interpolation="none|b-spline|cubic-spline"
spline-resolution="integer"
deep="boolean"
solid-type="cuboid|cylinder|cone|pyramid" >
<title color="hexadecimal color"
font-family="string"
font-size="float(pt|cm|in|mm)"
font-weight="normal|bold"
font-style="normal|italic"
stroke="solid|dash|none"
fill-color="hexadecimal color"
opacity="integer%"
stroke-width="float(pt|cm|in|mm)"
stroke-color="hexadecimal color"
stroke-opacity="integer%"
stroke-linejoin="round|bevel|middle|miter|none"
stroke-linecap="butt|round|square" >Title</title>
<legend name="" //only applies to bubble charts
legend-position="left|right|top|bottom"
color="hexadecimal color"
font-family="string"
font-size="float(pt|cm|in|mm)"
font-weight="normal|bold"
font-style="normal|italic"
stroke="solid|dash|none"
fill-color="hexadecimal color"
opacity="integer%"
stroke-width="float(pt|cm|in|mm)"
stroke-color="hexadecimal color"
stroke-opacity="integer%"
stroke-linejoin="round|bevel|middle|miter|none"
stroke-linecap="butt|round|square"/>
<grid dimension="x|y|z"
type="major|minor"
stroke="solid|dash|none"
stroke-width="float(pt|cm|in|mm)"
stroke-color="hexadecimal color"
stroke-opacity="integer%"
stroke-linejoin="round|bevel|middle|miter|none"
stroke-linecap="butt|round|square" />
<axis dimension="x|y|z"
visible="boolean"
logarithmic="boolean"
font-color="hexadecimal color"
font-size="float(pt|cm|in|mm)"
axis-position="start|end"
origin="float"
maximum="float"
minimum="float"
label-arrangement="side-by-side|stagger-even|stagger-odd"
display-level="boolean"
axis-label-position="near-axis|near-axis-other-side|outside-end|outside-start"
reverse-direction="boolean"
text-overlap="boolean"
line-break="boolean"
stroke="solid|dash|none"
stroke-width="float(pt|cm|in|mm)"
stroke-color="hexadecimal color"
stroke-opacity="integer%"
stroke-linejoin="round|bevel|middle|miter|none"
stroke-linecap="butt|round|square"
interval-major="float"
interval-minor-divisor="integer"
tick-marks-major-inner="boolean"
tick-marks-minor-inner="boolean"
tick-marks-major-outer="boolean"
tick-marks-minor-outer="boolean" />
<component type="wall|floor"
fill-color="hexadecimal color"
opacity="integer%"
stroke="solid|dash|none"
stroke-width="float(pt|cm|in|mm)"
stroke-color="hexadecimal color"
stroke-opacity="integer%"
stroke-linejoin="round|bevel|middle|miter|none"
stroke-linecap="butt|round|square" />
<transform3D rotate-x="integer"
rotate-y="integer"
rotate-z="integer"
right-angled-axes="true|false"
perspective="integer" />
<!-- for pie and donut charts -->
<categories>
<category name=""
value=""
fill-color="hexadecimal color"
opacity="integer%"
stroke="solid|dash|none"
stroke-width="float(pt|cm|in|mm)"
stroke-color="hexadecimal color"
stroke-opacity="integer%" />
<category name=""
value=""
fill-color="hexadecimal color"
opacity="integer%"
stroke="solid|dash|none"
stroke-width="float(pt|cm|in|mm)"
stroke-color="hexadecimal color"
stroke-opacity="integer%" />
<category name=""
value=""
fill-color="hexadecimal color"
opacity="integer%"
stroke="solid|dash|none"
stroke-width="float(pt|cm|in|mm)"
stroke-color="hexadecimal color"
stroke-opacity="integer%" />
</categories>
<!-- for bubble charts -->
<series>
<ser name=""
fill-color="hexadecimal color"
opacity="integer%"
stroke="solid|dash|none"
stroke-width="float(pt|cm|in|mm)"
stroke-color="hexadecimal color"
stroke-opacity="integer%" />
</series>
<categories>
<category>
<data value="" />
<data value="" />
<data value="" />
</category>
<category>
<data value="" />
<data value="" />
<data value="" />
</category>
<category>
<data value="" />
<data value="" />
<data value="" />
</category>
</categories>
<!-- for all other charts -->
<series>
<ser name=""
fill-color="hexadecimal color"
opacity="integer%"
stroke="solid|dash|none"
stroke-width="float(pt|cm|in|mm)"
stroke-color="hexadecimal color"
stroke-opacity="integer%" />
<ser name=""
fill-color="hexadecimal color"
opacity="integer%"
stroke="solid|dash|none"
stroke-width="float(pt|cm|in|mm)"
stroke-color="hexadecimal color"
stroke-opacity="integer%" />
</series>
<categories>
<category name="">
<data value="" />
<data value="" />
</category>
<category name="">
<data value="" />
<data value="" />
</category>
<category name="">
<data value="" />
<data value="" />
</category>
</categories>
</chart>
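For charts built from series and categories (that is, not pie or donut charts), the markup summarized above can be produced from plain data. The Python sketch below is only an illustration under that assumption: the helper `build_chart` is hypothetical, it emits just the type and style attributes plus the series and categories elements, and it checks that each category carries one value per series.

```python
import xml.etree.ElementTree as ET


def build_chart(chart_type, series_names, categories, style="width: 10cm;"):
    """Return <chart> markup for a series-based chart (hypothetical helper).

    categories: dict mapping a category name to one value per series.
    """
    chart = ET.Element("chart", {"type": chart_type, "style": style})
    series = ET.SubElement(chart, "series")
    for name in series_names:
        ET.SubElement(series, "ser", {"name": name})
    cats = ET.SubElement(chart, "categories")
    for cat_name, values in categories.items():
        if len(values) != len(series_names):
            raise ValueError(f"category {cat_name!r} needs one value per series")
        cat = ET.SubElement(cats, "category", {"name": cat_name})
        for value in values:
            ET.SubElement(cat, "data", {"value": str(value)})
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(chart, encoding="unicode")


if __name__ == "__main__":
    print(build_chart(
        "3Dcolumn",
        ["Europe", "America", "Asia"],
        {"2010": [650, 470, 400], "2011": [680, 540, 430]},
    ))
```

The resulting string can be placed inside the HTML that is passed as the value of the template variable.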
The <date> tag allows for the insertion of the current date into the document.
A simple example of use is given by the following HTML:
<p>Today is the <date format="('day', '/', 'month', '/', 'year')" />.</p>
<p>If the requested document output is Word or Open Document format the date can be updated by the user.
The actual procedure may depend on the program used to open the file (MS Word, Libre Office, etcetera).</p>
Its corresponding schema is very simple.
Child elements
Attributes
The <endnote> and <footnote> tags allow for the insertion of endnotes and footnotes in the current document.
A simple example of use is given by:
<p>This is a very beautiful<footnote>Beauty is <em style="color:red">in the eye</em> of the beholder.</footnote> document with a footnote.</p>
Child elements
Attributes
The <math> tag allows for the insertion of math equations into the current document in MathML 1.0 format.
A simple example of how to do so is given by the following HTML code:
<p>A document with some math inserted as extended HTML:</p>
<p style="text-align: center">
<math base-font-size="18">
<mrow>
<mrow>
<mstyle mathvariant="bold">
<mrow>
<mi>A</mi>
</mrow>
</mstyle>
<mo stretchy="false">=</mo>
<mfenced open="[" close="]">
<mrow>
<mtable>
<mtr>
<mtd>
<mrow>
<mi>a</mi>
</mrow>
</mtd>
<mtd>
<mrow>
<mi>b</mi>
</mrow>
</mtd>
</mtr>
<mtr>
<mtd>
<mrow>
<mi>c</mi>
</mrow>
</mtd>
<mtd>
<mrow>
<mi>d</mi>
</mrow>
</mtd>
</mtr>
</mtable>
</mrow>
</mfenced>
</mrow>
</mrow>
</math>
</p>
The support of Math in Word formats (.doc and .docx) is only partial because there are some limitations in the formatting of the equations.
Child elements
Attributes
The <page> tag allows for the insertion of page numbering into the document.
A simple example of use is given by:
<footer style="min-height: 1cm">
<p style="text-align: right; font-weight:bold; color: #777;"> Page <page/> of <page type="count"/></p>
</footer>
<p>A simple document with two pages.</p>
<p style="page-break-before: always">The second page.</p>
Its corresponding schema is given by:
Child elements
Attributes
The <tab> tag allows for the insertion of tabs in the document.
A simple example of use is given by:
<p>
First
<tab position="200" type="right" leader="dotted" />
Second
<tab position="200" type="right" leader="dotted"/>
Third.
</p>
Its corresponding schema is given by:
Child elements
Attributes
The <toc> tag allows for the insertion of a table of contents within the document.
A simple example of use is given by:
<style>
h1{font-family: Arial; font-size: 18pt; color: #b70000; page-break-before: always;}
h2{font-family: Arial; font-size: 16pt; color: #4456ff}
</style>
<p style="font-size: 16pt; font-weight: bold; color: #5566cc;">Table of Contents</p>
<toc>
<outline level="1" style="font-weight: bold" />
<outline level="2" style="color: #000077" />
</toc>
<h1>First title</h1>
<p>Sample text.</p>
<h2>Subtitle</h2>
<p>Another text.</p>
<h1>Second title</h1>
<p>Final text.</p>
Its corresponding schema is composed of two elements:
Child elements
Attributes
Child elements
Attributes