Technical Documentation AsposeRender

Overview

The Aspose Document Conversion Service is a serverless application built using AWS Lambda. It leverages Aspose libraries to convert various document formats into PDF and provides the page count of the resulting PDF files. The project is written in C# with .NET Core 3.1 and integrates with Amazon S3 for file storage and retrieval.

Key Features

  • Converts multiple document formats (Word, Excel, PowerPoint, images, SVG, EPUB, Visio, and others) into PDF.

  • Extracts and provides the page count of converted PDF files.

  • Runs as an AWS Lambda function, so no servers need to be provisioned or managed.

  • Utilizes Amazon S3 for input and output file handling.

Prerequisites

  1. AWS Lambda Setup:

    • AWS CLI configured with appropriate credentials.

    • An existing Lambda function with permissions to access S3.

  2. Dependencies:

    • Install .NET Core 3.1 SDK.

    • Ensure the required NuGet packages are installed (see dependencies below).

Project Structure

The project contains the following key files and folders:

  • Function.cs: The main entry point for the AWS Lambda function. It handles the input, processes the file using Aspose libraries, and saves the output to S3.

  • aws-lambda-tools-defaults.json: Default configuration for Lambda deployment, including function name, role, and runtime environment.

  • Tests Folder: Contains unit tests that validate the functionality of the application.

Core Components

1. AsposeRender.cs

This class encapsulates the logic for document conversion. It supports multiple formats and is responsible for:

  • Loading licenses for various Aspose libraries.

  • Converting input streams (files from S3) to PDFs.

  • Extracting page count information for the converted documents.

Key Methods:

  • AsposeLicenses(): Loads the license file for all Aspose libraries.

  • ConvertDocument(): Processes the input stream based on file extensions and converts them to PDF.

Supported formats include:

  • Word documents (.doc, .docx, .rtf, .txt, .xml, .odt).

  • Excel spreadsheets (.xls, .xlsx, .ods).

  • Presentations (.ppt, .pptx, .odp).

  • Images (.bmp, .gif, .jpeg, .jpg, .png, .tif, .tiff, .wmf).

  • PDF, HTML, SVG, EPUB, XPS, Publisher, and Visio files (see Supported File Formats for the extensions).

It outputs a RenderOutput object containing:

  • A memory stream of the converted PDF.

  • The page count.


2. Function.cs

This is the entry point for the AWS Lambda function. It orchestrates the entire process:

  • Event Input: Reads a CloudWatch Event input that triggers the function. The event contains metadata like:

    • Source S3 bucket and file key.

    • Destination key for the converted file.

    • File extension for processing.

  • Fetching Input from S3:

    • Retrieves the file from the specified S3 bucket.

    • Converts the file stream to a memory stream for processing.

  • Calling AsposeRender:

    • Utilizes the AsposeRender class for converting the document and extracting page count.

  • Saving Output to S3:

    • Uploads the converted PDF back to S3.

    • Adds custom metadata (x-amz-meta-pagecount) with the page count of the document.

  • Error Logging:

    • Captures exceptions and logs error details using LogErrorMessage.
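The S3 steps above can be sketched as follows. This is a minimal illustration using the AWSSDK.S3 package, not the project's actual code: the class and method names (S3RoundTripSketch, DownloadAsync, UploadPdfAsync) and the content type are assumptions made for the example.

```csharp
using System.Globalization;
using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

// Hypothetical helper; names are illustrative and not taken from Function.cs.
public static class S3RoundTripSketch
{
    // Downloads an object and copies it into a seekable MemoryStream.
    public static async Task<MemoryStream> DownloadAsync(IAmazonS3 s3, string bucket, string key)
    {
        using (GetObjectResponse response = await s3.GetObjectAsync(bucket, key))
        {
            var buffer = new MemoryStream();
            await response.ResponseStream.CopyToAsync(buffer);
            buffer.Position = 0; // rewind so the converter reads from the start
            return buffer;
        }
    }

    // Uploads the converted PDF and records the page count as user-defined metadata.
    public static async Task UploadPdfAsync(IAmazonS3 s3, string bucket, string key, Stream pdf, int pageCount)
    {
        pdf.Position = 0;
        var request = new PutObjectRequest
        {
            BucketName = bucket,
            Key = key,
            InputStream = pdf,
            ContentType = "application/pdf" // assumption: the source does not state the content type
        };
        request.Metadata.Add("x-amz-meta-pagecount", pageCount.ToString(CultureInfo.InvariantCulture));
        await s3.PutObjectAsync(request);
    }
}
```

Copying the response into a MemoryStream matters because the S3 response stream is not seekable, while document loaders generally need to seek.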

3. ImageManipulation.cs

  • Functionality:

    • Handles image resizing and format conversion.

    • Includes methods for converting images to PDF, resizing with aspect ratio preservation, and analyzing image metadata.

  • Notable Methods:

    • CheckImageSizeStream: Analyzes image dimensions, determines aspect ratio, and classifies properties such as orientation and size.

    • ResizeImageFromStream: Resizes images while maintaining aspect ratio and optionally handles scaling or cropping.

    • SaveTheImageToStream: Converts and saves images in different formats like JPEG, PNG, TIFF, and others.

    • ConvertImageToPdfStream: Uses Aspose.Words to embed an image into a PDF document.

  • Error Handling:

    • Basic try-catch blocks are used to handle exceptions, but logging could be enhanced for better monitoring.

  • Disposal:

    • Implements the IDisposable interface to ensure proper resource management and prevent memory leaks.
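To illustrate what resizing with aspect ratio preservation involves, here is a minimal sketch of the arithmetic only. ResizeMath and FitWithin are hypothetical names; the actual ResizeImageFromStream method may compute its target size differently.

```csharp
using System;

// Hypothetical helper showing only the size calculation, not the pixel work.
public static class ResizeMath
{
    // Returns the largest size that fits inside maxWidth x maxHeight
    // while keeping the source aspect ratio.
    public static (int Width, int Height) FitWithin(int srcWidth, int srcHeight, int maxWidth, int maxHeight)
    {
        if (srcWidth <= 0 || srcHeight <= 0 || maxWidth <= 0 || maxHeight <= 0)
            throw new ArgumentException("All dimensions must be positive.");

        // The smaller of the two ratios is the one that keeps both sides inside the box.
        double scale = Math.Min((double)maxWidth / srcWidth, (double)maxHeight / srcHeight);

        int width = Math.Max(1, (int)Math.Round(srcWidth * scale));
        int height = Math.Max(1, (int)Math.Round(srcHeight * scale));
        return (width, height);
    }
}
```

For example, FitWithin(4000, 3000, 800, 800) yields 800 x 600. Note that this version also scales small images up to the box; a caller that only wants to shrink would cap the scale at 1.0.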

4. Resources folder

This folder contains files essential for the operation of the project. Here's a breakdown of the files and their significance:

4.1 Aspose.Total.lic

This is the license file for the Aspose.Total for .NET library. It includes licensing details, usage rights, and metadata required for enabling the full functionality of the Aspose library without limitations (e.g., watermarks or restricted features).

  • Key Elements:

    • LicensedTo: Specifies the organization or entity licensed to use the library (e.g., Today's Business Solutions Inc).

    • LicenseType: Indicates the type of license (Developer OEM for unlimited deployment locations).

    • Products: Lists the Aspose product suite licensed for usage (in this case, Aspose.Total for .NET).

    • SerialNumber: A unique identifier for the license.

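    • SubscriptionExpiry: The subscription end date (2024-12-19). The license continues to work with library versions released before this date; using versions released afterwards requires renewing the subscription.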

    • OEM: Denotes that this is a redistributable license, allowing deployment in multiple locations.

    • This ensures the library operates without restrictions, enabling features like PDF generation or document conversion.

  • License File Security:

    • Avoid exposing this file publicly or in source control repositories. Keep it in secure locations, such as encrypted storage or secure configuration systems.
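As a rough sketch of how one Aspose.Total license file can be applied to several product libraries, the example below calls SetLicense on each product's License class. The class name LicenseSketch and the licensePath parameter are assumptions; the project's AsposeLicenses() method may locate and load the file differently and may cover more products.

```csharp
using System.IO;

// Hypothetical helper; the real AsposeLicenses() method is not shown in this document.
public static class LicenseSketch
{
    public static void ApplyLicenses(string licensePath)
    {
        // Read the file once; each product gets its own fresh stream over the same bytes.
        byte[] licenseBytes = File.ReadAllBytes(licensePath);

        using (var s = new MemoryStream(licenseBytes)) new Aspose.Words.License().SetLicense(s);
        using (var s = new MemoryStream(licenseBytes)) new Aspose.Cells.License().SetLicense(s);
        using (var s = new MemoryStream(licenseBytes)) new Aspose.Pdf.License().SetLicense(s);
        using (var s = new MemoryStream(licenseBytes)) new Aspose.Slides.License().SetLicense(s);
    }
}
```

Each Aspose product has its own License class, so the call has to be repeated per library. Doing this once per Lambda container (for example in a static constructor) avoids repeating the work on every invocation.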

4.2 Font Files

The folder contains three font files (sylfaen.ttf, symbol.ttf, and wingding.ttf) essential for rendering documents or images accurately when the Aspose library processes text or embeds content.

Font Details:

  1. sylfaen.ttf:

    • A TrueType font commonly used for multilingual text support, especially for languages like Georgian, Armenian, and others.

    • Likely included for rendering content in documents or PDFs requiring this font.

  2. symbol.ttf:

    • A standard font for symbolic characters (e.g., mathematical symbols, arrows).

    • Used when dealing with documents that include mathematical equations or specific symbols.

  3. wingding.ttf:

    • A font containing decorative symbols, shapes, and icons.

    • Useful for rendering documents with special bullet points, icons, or decorative text.

Usage:

  • These fonts are required for rendering documents or PDFs with consistent styling.

  • The Dockerfile installs these fonts into the system fonts directory (/usr/share/fonts/msttcore/) to ensure they are accessible during runtime.

  • They are automatically utilized by the Aspose library when rendering text using these specific fonts.

5. Models

Models Overview:

  1. RenderOutput.cs:

    • documentStream: Holds rendered document data in memory.

    • pageCount: Number of pages in the rendered document.

  2. RenderRequest.cs:

    • RenderRequest: Contains metadata (e.g., version, id, time) for a rendering request.

    • Detail: Holds file-specific details like Bucket, Key, Filename, File_extension, and ErrorMessage.

These models define the input and output contracts for rendering operations.
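The shapes described above might look roughly like this. Member names follow the descriptions in this section; the member types, the RenderRequestParser helper, and the use of System.Text.Json are assumptions. The real RenderRequest may instead build on the CloudWatchEvent<T> type from Amazon.Lambda.CloudWatchEvents.

```csharp
using System;
using System.IO;
using System.Text.Json;

public class RenderOutput
{
    public MemoryStream documentStream { get; set; } // rendered PDF held in memory
    public int pageCount { get; set; }               // number of pages in the PDF
}

public class Detail
{
    public string Bucket { get; set; }
    public string Key { get; set; }
    public string Filename { get; set; }
    public string File_extension { get; set; }
    public string ErrorMessage { get; set; }
}

public class RenderRequest
{
    public string Version { get; set; }
    public string Id { get; set; }
    public DateTime Time { get; set; }
    public Detail Detail { get; set; }
}

// Hypothetical helper: parses an event payload into the request model.
public static class RenderRequestParser
{
    private static readonly JsonSerializerOptions Options =
        new JsonSerializerOptions { PropertyNameCaseInsensitive = true };

    public static RenderRequest Parse(string json) =>
        JsonSerializer.Deserialize<RenderRequest>(json, Options);
}
```

Case-insensitive matching lets a lower-case payload key such as "file_extension" bind to the File_extension property without extra attributes.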

6. Dockerfile

The Dockerfile builds a .NET Core 3.1 Lambda function with the following key points:

  • Base Image:

    • Uses amazon/aws-lambda-dotnet:core3.1 for the Lambda runtime.

    • Installs libgdiplus to enable System.Drawing compatibility on Linux (a prerequisite for .NET Core's drawing capabilities).

  • Fonts:

    • Installs Microsoft core fonts using msttcore-fonts-installer.

  • Build and Deployment:

    • Restores, builds, and publishes the .NET application using a multi-stage Docker build.

    • Adds resources like fonts to the Docker container to support text rendering.

  • Run Command:

    • Executes the Lambda handler (AsposeTotal::AsposeTotal.Function::FunctionHandler).

Flow of Execution

  1. The Lambda function is triggered by an AWS event (e.g., an object uploaded to S3).

  2. FunctionHandler reads the event details and extracts the S3 bucket, file key, and metadata.

  3. The function fetches the document from S3 using the GetObjectAsync method.

  4. The document is processed by AsposeRender.ConvertDocument.

  5. The resulting PDF is saved back to an S3 bucket (e.g., <original-bucket-name>-converted).

Error Handling

The function employs:

  • Structured logging via LogErrorMessage for errors during conversion.

  • Exception handling to gracefully manage unsupported formats and S3 failures.

Supported File Formats

1. Word Documents

  • File Extensions: .doc, .docx, .rtf, .txt, .xml, .odt

  • Library Used: Aspose.Words

  • Conversion Logic:

    • The input stream is loaded into an Aspose.Words.Document instance.

    • The page count is fetched using the PageCount property.

    • The document is saved as a PDF using Save(outStream, SaveFormat.Pdf).
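The three steps above correspond to a short sequence of Aspose.Words calls. The following is a sketch under the assumption that the input is already a seekable stream; the class name WordToPdfSketch and the tuple return type are illustrative, not the project's RenderOutput.

```csharp
using System.IO;
using Aspose.Words;

// Hypothetical wrapper around the Aspose.Words calls named in this section.
public static class WordToPdfSketch
{
    public static (MemoryStream Pdf, int PageCount) Convert(Stream input)
    {
        var document = new Document(input);      // format is detected from the content
        int pageCount = document.PageCount;      // reading this lays out the document

        var output = new MemoryStream();
        document.Save(output, SaveFormat.Pdf);
        output.Position = 0;                     // rewind for the upload step
        return (output, pageCount);
    }
}
```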

2. HTML Documents

  • File Extensions: .htm, .html

  • Library Used: Aspose.PDF

  • Conversion Logic:

    • The input stream is loaded using HtmlLoadOptions.

    • A new Aspose.Pdf.Document is created from the stream and its page count is retrieved.

    • The document is saved as a PDF.
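A minimal sketch of this load-options pattern with Aspose.PDF (namespace Aspose.Pdf) is shown below. The class name PdfLoadSketch and the decision to pass the load options in as a parameter are assumptions made for the example.

```csharp
using System.IO;
using Aspose.Pdf;

// Hypothetical wrapper; the load options select the input format.
public static class PdfLoadSketch
{
    public static (MemoryStream Pdf, int PageCount) Convert(Stream input, LoadOptions options)
    {
        using (var document = new Document(input, options))
        {
            int pageCount = document.Pages.Count;

            var output = new MemoryStream();
            document.Save(output);   // Aspose.Pdf saves as PDF by default
            output.Position = 0;
            return (output, pageCount);
        }
    }
}
```

For HTML input the call would be PdfLoadSketch.Convert(stream, new HtmlLoadOptions()). The SVG, EPUB, and XPS formats described later in this section follow the same shape with SvgLoadOptions, EpubLoadOptions, and XpsLoadOptions.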

3. Presentation Files

  • File Extensions: .ppt, .pptx, .odp

  • Library Used: Aspose.Slides

  • Conversion Logic:

    • The input stream is loaded into a Presentation object.

    • The slide count is fetched using the Slides.Count property and used as the page count.

    • The presentation is saved as a PDF using Save(outStream, SaveFormat.Pdf).
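In Aspose.Slides terms, the steps above could look like the sketch below. The class name SlidesToPdfSketch and the tuple return type are illustrative assumptions.

```csharp
using System.IO;
using Aspose.Slides;
using Aspose.Slides.Export;

// Hypothetical wrapper around the Aspose.Slides calls named in this section.
public static class SlidesToPdfSketch
{
    public static (MemoryStream Pdf, int PageCount) Convert(Stream input)
    {
        using (var presentation = new Presentation(input))
        {
            // With default PDF export settings each slide becomes one page.
            int pageCount = presentation.Slides.Count;

            var output = new MemoryStream();
            presentation.Save(output, SaveFormat.Pdf);
            output.Position = 0;
            return (output, pageCount);
        }
    }
}
```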

4. Spreadsheet Files

  • File Extensions: .xls, .xlsx, .ods

  • Library Used: Aspose.Cells

  • Conversion Logic:

    • The input stream is loaded into a Workbook object.

    • Each worksheet is configured with a landscape orientation and Letter paper size.

    • The workbook is saved as a PDF.

    • The resulting PDF's page count is retrieved.
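The spreadsheet steps can be sketched as follows. How the project reads the page count back is not stated here, so counting pages by reopening the result with Aspose.Pdf is an assumption, as is the class name CellsToPdfSketch.

```csharp
using System.IO;
using Aspose.Cells;

// Hypothetical wrapper around the Aspose.Cells calls named in this section.
public static class CellsToPdfSketch
{
    public static (MemoryStream Pdf, int PageCount) Convert(Stream input)
    {
        var workbook = new Workbook(input);

        // Apply the same page setup to every sheet before rendering.
        foreach (Worksheet sheet in workbook.Worksheets)
        {
            sheet.PageSetup.Orientation = PageOrientationType.Landscape;
            sheet.PageSetup.PaperSize = PaperSizeType.PaperLetter;
        }

        var output = new MemoryStream();
        workbook.Save(output, SaveFormat.Pdf);

        // Assumption: count pages by reopening a copy of the PDF with Aspose.Pdf,
        // so the output stream itself is never handed to another owner.
        int pageCount;
        using (var copy = new MemoryStream(output.ToArray()))
        using (var pdf = new Aspose.Pdf.Document(copy))
        {
            pageCount = pdf.Pages.Count;
        }

        output.Position = 0;
        return (output, pageCount);
    }
}
```

The page count has to come from the rendered PDF because a single worksheet can paginate into many pages depending on its print area and scaling.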

5. PDF Files

  • File Extensions: .pdf

  • Library Used: Aspose.PDF

  • Conversion Logic:

    • The input stream is loaded into an Aspose.Pdf.Document.

    • The page count is retrieved using the Pages.Count property.

    • The original PDF is returned without modification.

6. Image Files

  • File Extensions: .bmp, .gif, .jpeg, .jpg, .png, .tif, .tiff, .wmf

  • Library Used: Aspose.Imaging

  • Conversion Logic:

    • The image is loaded using Aspose.Imaging.Image.Load.

    • The image is saved as a PDF using Image.Save(outStream, PdfOptions).

    • A default page count of 1 is returned for image files.
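A sketch of the image path with Aspose.Imaging is shown below. The class name ImageToPdfSketch is an assumption, and PdfOptions is used with its defaults because the project's settings are not documented here.

```csharp
using System.IO;
using Aspose.Imaging;
using Aspose.Imaging.ImageOptions;

// Hypothetical wrapper around the Aspose.Imaging calls named in this section.
public static class ImageToPdfSketch
{
    public static (MemoryStream Pdf, int PageCount) Convert(Stream input)
    {
        using (Image image = Image.Load(input))
        {
            var output = new MemoryStream();
            image.Save(output, new PdfOptions());
            output.Position = 0;

            // Mirrors the behavior described above: images are reported as one page.
            return (output, 1);
        }
    }
}
```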

7. Scalable Vector Graphics

  • File Extensions: .svg

  • Library Used: Aspose.PDF

  • Conversion Logic:

    • The SVG file is loaded using SvgLoadOptions.

    • The document is saved as a PDF and the page count is retrieved.

8. Electronic Publications

  • File Extensions: .epub

  • Library Used: Aspose.PDF

  • Conversion Logic:

    • The EPUB file is loaded using EpubLoadOptions.

    • The document is saved as a PDF and the page count is retrieved.

9. Microsoft Publisher Files

  • File Extensions: .pub

  • Library Used: Aspose.PUB

  • Conversion Logic:

    • The .pub file is parsed into a PDF using PubFactory.

    • The resulting PDF's page count is retrieved.

10. Visio Files

  • File Extensions: .vsd

  • Library Used: Aspose.Diagram

  • Conversion Logic:

    • The .vsd file is loaded into a Diagram object.

    • The file is saved as a PDF using Save(outStream, SaveFileFormat.Pdf).

    • The PDF's page count is retrieved.

11. XML Paper Specification Files

  • File Extensions: .xps, .oxps

  • Library Used: Aspose.PDF

  • Conversion Logic:

    • The .xps or .oxps file is loaded into an Aspose.Pdf.Document using XpsLoadOptions.

    • The document is saved as a PDF and the page count is retrieved.

Project Dependencies

The project leverages the following NuGet packages:

Package                                    | Version    | Description
-------------------------------------------|------------|--------------------------------------------------------
Amazon.Lambda.CloudWatchEvents             | 4.3.0      | AWS Lambda integration for handling CloudWatch events.
Amazon.Lambda.Core                         | 2.1.0      | AWS Lambda core libraries.
Amazon.Lambda.Serialization.SystemTextJson | 2.3.1      | JSON serialization for AWS Lambda.
Aspose.Cells                               | 24.11.0    | Excel/Worksheet manipulation and PDF conversion.
Aspose.Diagram                             | 24.2.8.0   | Visio document handling and PDF conversion.
Aspose.Imaging                             | 24.11.0    | Image file manipulation and conversion to PDF.
Aspose.PDF                                 | 24.2.0     | PDF creation, manipulation, and format conversions.
Aspose.PUB                                 | 23.12.0    | Microsoft Publisher document parsing and conversion.
Aspose.Slides.NET                          | 24.2.0     | Presentation file manipulation and PDF conversion.
Aspose.Words                               | 24.2.0     | Word document manipulation and conversion to PDF.
AWSSDK.S3                                  | 3.7.104.29 | AWS S3 integration for file storage and retrieval.

Deployment

The utility is designed for deployment on AWS Lambda and integrates with Amazon S3 for file operations.

Steps to Deploy

  1. Build the project in release mode using .NET Core 3.1.

  2. Deploy the compiled output to AWS Lambda as a .zip package.

  3. Configure the Lambda function with the necessary IAM permissions to access S3 and CloudWatch.

  4. Provide the required environment variables for S3 bucket details and other configurations.

Usage

  • Input files are uploaded to a designated S3 bucket.

  • The Lambda function processes the file, converts it to PDF, and calculates the page count.

  • The output PDF and page count metadata are stored back in the S3 bucket.

Error Handling

  • Each file type has specific error handling to ensure smooth operation and meaningful error reporting.

  • Unsupported file types trigger a graceful failure with appropriate logging.

Extensibility

This architecture allows the addition of new file formats or updates to existing logic by extending the switch statement or adding new handlers. The modular use of Aspose libraries ensures compatibility with a broad range of document types.
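The extension-based dispatch mentioned above can be illustrated with a small, dependency-free sketch. ConverterKind and FormatDispatch are hypothetical names, and the grouping simply mirrors the Supported File Formats section; the project's actual switch statement may be organized differently.

```csharp
using System;

// Hypothetical enum: one value per Aspose library used for conversion.
public enum ConverterKind { Words, Pdf, Slides, Cells, Imaging, Pub, Diagram }

public static class FormatDispatch
{
    // Maps a file extension (with or without a leading dot, any case) to a converter.
    public static ConverterKind Resolve(string fileExtension)
    {
        string ext = (fileExtension ?? string.Empty).Trim().TrimStart('.').ToLowerInvariant();
        switch (ext)
        {
            case "doc": case "docx": case "rtf": case "txt": case "xml": case "odt":
                return ConverterKind.Words;
            case "htm": case "html": case "pdf": case "svg": case "epub": case "xps": case "oxps":
                return ConverterKind.Pdf;
            case "ppt": case "pptx": case "odp":
                return ConverterKind.Slides;
            case "xls": case "xlsx": case "ods":
                return ConverterKind.Cells;
            case "bmp": case "gif": case "jpeg": case "jpg": case "png":
            case "tif": case "tiff": case "wmf":
                return ConverterKind.Imaging;
            case "pub":
                return ConverterKind.Pub;
            case "vsd":
                return ConverterKind.Diagram;
            default:
                // Unsupported types fail explicitly so the caller can log and stop.
                throw new NotSupportedException("Unsupported file extension: '" + fileExtension + "'");
        }
    }
}
```

Supporting a new format in this style means adding one case label, or one new enum value plus its handler when a different library is involved.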
