Technical Documentation AsposeRender
Overview
The Aspose Document Conversion Service is a serverless application built using AWS Lambda. It leverages Aspose libraries to convert various document formats into PDF and provides the page count of the resulting PDF files. The project is written in C# with .NET Core 3.1 and integrates with Amazon S3 for file storage and retrieval.
Key Features
Converts multiple document formats (e.g., Word, Excel, PowerPoint, images, SVG, EPUB, Visio, etc.) into PDF.
Extracts and provides the page count of converted PDF files.
Supports integration with AWS Lambda for seamless serverless deployment.
Utilizes Amazon S3 for input and output file handling.
Prerequisites
AWS Lambda Setup:
AWS CLI configured with appropriate credentials.
An existing Lambda function with permissions to access S3.
Dependencies:
Install .NET Core 3.1 SDK.
Ensure the required NuGet packages are installed (see dependencies below).
Project Structure
The project contains the following key files and folders:
Function.cs: The main entry point for the AWS Lambda function. It handles the input, processes the file using Aspose libraries, and saves the output to S3.
aws-lambda-tools-defaults.json: Default configuration for Lambda deployment, including function name, role, and runtime environment.
Tests Folder: Contains unit tests to validate the functionality of the application.
Core Components
1. AsposeRender.cs
This class encapsulates the logic for document conversion. It supports multiple formats and is responsible for:
Loading licenses for various Aspose libraries.
Converting input streams (files from S3) to PDFs.
Extracting page count information for the converted documents.
Key Methods:
AsposeLicenses(): Loads the license file for all Aspose libraries.
ConvertDocument(): Selects a converter based on the file extension and converts the input stream to PDF.
Supported formats include:
Word documents (.doc, .docx, .rtf, .txt, .xml, .odt).
Excel spreadsheets (.xls, .xlsx, .ods).
Presentations (.ppt, .pptx, .odp).
Images (.bmp, .gif, .jpeg, .jpg, .png, .tif, .tiff, .wmf).
PDFs, SVGs, EPUBs, and Visio files.
It outputs a RenderOutput object containing:
A memory stream of the converted PDF.
The page count.
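The extension-based dispatch described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the DocumentFamily enum and the ExtensionClassifier.Classify method are hypothetical names, and only the extension groups are taken from the list above.

```csharp
// Hypothetical grouping of the formats listed above; not part of the project.
public enum DocumentFamily { Word, Spreadsheet, Presentation, Image, Pdf, Unsupported }

public static class ExtensionClassifier
{
    // Maps a file extension (with or without the leading dot, any casing)
    // to the family of converter that would handle it.
    public static DocumentFamily Classify(string extension)
    {
        string ext = (extension ?? string.Empty).Trim().TrimStart('.').ToLowerInvariant();
        switch (ext)
        {
            case "doc": case "docx": case "rtf": case "txt": case "xml": case "odt":
                return DocumentFamily.Word;
            case "xls": case "xlsx": case "ods":
                return DocumentFamily.Spreadsheet;
            case "ppt": case "pptx": case "odp":
                return DocumentFamily.Presentation;
            case "bmp": case "gif": case "jpeg": case "jpg":
            case "png": case "tif": case "tiff": case "wmf":
                return DocumentFamily.Image;
            case "pdf":
                return DocumentFamily.Pdf;
            default:
                // Unknown extensions are reported rather than guessed at.
                return DocumentFamily.Unsupported;
        }
    }
}
```

Normalizing the extension before the switch keeps inputs such as ".DOCX" and "docx" on the same branch.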
2. Function.cs
This is the entry point for the AWS Lambda function. It orchestrates the entire process:
Event Input: Reads a CloudWatch Event input that triggers the function. The event contains metadata like:
Source S3 bucket and file key.
Destination key for the converted file.
File extension for processing.
Fetching Input from S3:
Retrieves the file from the specified S3 bucket.
Converts the file stream to a memory stream for processing.
Calling AsposeRender:
Utilizes the AsposeRender class for converting the document and extracting page count.
Saving Output to S3:
Uploads the converted PDF back to S3.
Adds custom metadata (x-amz-meta-pagecount) with the page count of the document.
Error Logging:
Captures exceptions and logs error details using LogErrorMessage.
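The S3 steps above could look roughly like the sketch below. It assumes the AWSSDK.S3 package listed under Project Dependencies; the S3RoundTrip class and its method names are hypothetical and are not taken from Function.cs.

```csharp
using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

// Hypothetical helper; the real Function.cs may structure this differently.
public static class S3RoundTrip
{
    // Downloads an object and buffers it in memory so the converter
    // receives a seekable stream.
    public static async Task<MemoryStream> DownloadAsync(IAmazonS3 s3, string bucket, string key)
    {
        using (GetObjectResponse response = await s3.GetObjectAsync(bucket, key))
        {
            var buffer = new MemoryStream();
            await response.ResponseStream.CopyToAsync(buffer);
            buffer.Position = 0;
            return buffer;
        }
    }

    // Uploads the converted PDF and records the page count as user-defined metadata.
    // The pdf stream is expected to be positioned at its start.
    public static async Task UploadPdfAsync(IAmazonS3 s3, string bucket, string key, Stream pdf, int pageCount)
    {
        var request = new PutObjectRequest
        {
            BucketName = bucket,
            Key = key,
            InputStream = pdf,
            ContentType = "application/pdf"
        };
        // S3 exposes user-defined metadata under the x-amz-meta- prefix.
        request.Metadata.Add("x-amz-meta-pagecount", pageCount.ToString());
        await s3.PutObjectAsync(request);
    }
}
```

Buffering into a MemoryStream matters because the S3 response stream is forward-only, while document loaders generally need to seek.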
3. ImageManipulation.cs
Functionality:
Handles image resizing and format conversion.
Includes methods for converting images to PDF, resizing with aspect ratio preservation, and analyzing image metadata.
Notable Methods:
CheckImageSizeStream: Analyzes image dimensions, determines aspect ratio, and classifies properties such as orientation and size.
ResizeImageFromStream: Resizes images while maintaining aspect ratio and optionally handles scaling or cropping.
SaveTheImageToStream: Converts and saves images in different formats like JPEG, PNG, TIFF, and others.
ConvertImageToPdfStream: Uses Aspose.Words to embed an image into a PDF document.
Error Handling:
Basic try-catch blocks are used to handle exceptions, but logging could be enhanced for better monitoring.
Disposal:
Implements the IDisposable interface to ensure proper resource management and prevent memory leaks.
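Resizing with aspect ratio preservation comes down to applying one scale factor to both dimensions. The sketch below shows that arithmetic only; the AspectRatio.Fit helper is a hypothetical name and is not the project's ResizeImageFromStream implementation.

```csharp
using System;

// Hypothetical helper illustrating the scale calculation only.
public static class AspectRatio
{
    // Returns the largest (width, height) that fits inside maxWidth x maxHeight
    // while keeping the source aspect ratio. Images already within the bounds
    // are returned unchanged (no upscaling).
    public static (int Width, int Height) Fit(int width, int height, int maxWidth, int maxHeight)
    {
        if (width <= 0 || height <= 0 || maxWidth <= 0 || maxHeight <= 0)
            throw new ArgumentException("All dimensions must be positive.");

        // The smaller of the two ratios is the one that keeps both sides in bounds.
        double scale = Math.Min(1.0, Math.Min((double)maxWidth / width, (double)maxHeight / height));

        int newWidth = Math.Max(1, (int)Math.Round(width * scale));
        int newHeight = Math.Max(1, (int)Math.Round(height * scale));
        return (newWidth, newHeight);
    }
}
```

For example, fitting a 4000 x 2000 image into a 1000 x 1000 box uses a scale of 0.25 and yields 1000 x 500.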
4. Resources folder
This folder contains files essential for the operation of the project. Here's a breakdown of the files and their significance:
4.1 Aspose.Total.lic
This is the license file for the Aspose.Total for .NET library. It includes licensing details, usage rights, and metadata required for enabling the full functionality of the Aspose library without limitations (e.g., watermarks or restricted features).
Key Elements:
LicensedTo: Specifies the organization or entity licensed to use the library (e.g., Today's Business Solutions Inc).
LicenseType: Indicates the type of license (Developer OEM for unlimited deployment locations).
Products: Lists the Aspose product suite licensed for usage (in this case, Aspose.Total for .NET).
SerialNumber: A unique identifier for the license.
SubscriptionExpiry: License validity until 2024-12-19, after which you may need to renew.
OEM: Denotes that this is a redistributable license, allowing deployment in multiple locations.
This ensures the library operates without restrictions, enabling features like PDF generation or document conversion.
License File Security:
Avoid exposing this file publicly or in source control repositories. Keep it in secure locations, such as encrypted storage or secure configuration systems.
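Applying the license at startup might look like the following sketch. Each Aspose product ships its own License class, so the same Aspose.Total file is applied once per library. The LicenseLoader class and the licensePath parameter are hypothetical; the project's AsposeLicenses() method may cover more libraries or load the file differently.

```csharp
// Hypothetical helper; requires the Aspose NuGet packages listed under Project Dependencies.
public static class LicenseLoader
{
    // Applies one Aspose.Total license file to several Aspose libraries.
    // licensePath is an assumed parameter, e.g. a path inside the Resources folder.
    public static void ApplyLicenses(string licensePath)
    {
        new Aspose.Words.License().SetLicense(licensePath);
        new Aspose.Cells.License().SetLicense(licensePath);
        new Aspose.Slides.License().SetLicense(licensePath);
        new Aspose.Pdf.License().SetLicense(licensePath);
    }
}
```

Calling this once per Lambda cold start, before any conversion, avoids evaluation-mode output such as watermarks.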
4.2 Font Files
The folder contains three font files (sylfaen.ttf, symbol.ttf, and wingding.ttf) essential for rendering documents or images accurately when the Aspose library processes text or embeds content.
Font Details:
sylfaen.ttf:
A TrueType font commonly used for multilingual text support, especially for languages like Georgian, Armenian, and others.
Likely included for rendering content in documents or PDFs requiring this font.
symbol.ttf:
A standard font for symbolic characters (e.g., mathematical symbols, arrows).
Used when dealing with documents that include mathematical equations or specific symbols.
wingding.ttf:
A font containing decorative symbols, shapes, and icons.
Useful for rendering documents with special bullet points, icons, or decorative text.
Usage:
These fonts are required for rendering documents or PDFs with consistent styling.
The Dockerfile installs these fonts into the system fonts directory (/usr/share/fonts/msttcore/) to ensure they are accessible during runtime.
They are automatically utilized by the Aspose library when rendering text using these specific fonts.
5. Models
Models Overview:
RenderOutput.cs:
documentStream: Holds rendered document data in memory.
pageCount: Number of pages in the rendered document.
RenderRequest.cs:
RenderRequest: Contains metadata (e.g., version, id, time) for a rendering request.
Detail: Holds file-specific details like Bucket, Key, Filename, File_extension, and ErrorMessage.
These models structure the input/output for rendering operations efficiently.
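A rough shape for these models is sketched below. The property names come from the description above; the property types, the nesting of Detail under a detail property, and the RenderRequestParser helper are assumptions made for illustration.

```csharp
using System.IO;
using System.Text.Json;

public class RenderOutput
{
    public MemoryStream documentStream { get; set; } // rendered PDF held in memory
    public int pageCount { get; set; }               // pages in the rendered PDF
}

public class Detail
{
    public string Bucket { get; set; }
    public string Key { get; set; }
    public string Filename { get; set; }
    public string File_extension { get; set; }
    public string ErrorMessage { get; set; }
}

public class RenderRequest
{
    public string version { get; set; }
    public string id { get; set; }
    public string time { get; set; }   // kept as text here; the real type is not documented
    public Detail detail { get; set; } // assumed nesting, mirroring an event's detail payload
}

// Hypothetical helper showing how an event payload could be bound to the models.
public static class RenderRequestParser
{
    public static RenderRequest Parse(string json)
    {
        // Case-insensitive matching tolerates "bucket" as well as "Bucket".
        var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
        return JsonSerializer.Deserialize<RenderRequest>(json, options);
    }
}
```

With this shape, a payload containing a detail object with bucket, key, and file_extension fields binds directly to Detail.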
6. Dockerfile
The Dockerfile builds a .NET Core 3.1 Lambda function with the following key points:
Base Image:
Uses amazon/aws-lambda-dotnet:core3.1 for the Lambda runtime.
Installs libgdiplus to enable System.Drawing compatibility on Linux (a prerequisite for .NET Core's drawing capabilities).
Fonts:
Installs Microsoft core fonts using msttcore-fonts-installer.
Build and Deployment:
Restores, builds, and publishes the .NET application using a multi-stage Docker build.
Adds resources like fonts to the Docker container to support text rendering.
Run Command:
Executes the Lambda handler (AsposeTotal::AsposeTotal.Function::FunctionHandler).
Flow of Execution
The Lambda function is triggered by an AWS event (e.g., an object uploaded to S3).
FunctionHandler reads the event details and extracts the S3 bucket, file key, and metadata.
The function fetches the document from S3 using the GetObjectAsync method.
The document is processed by AsposeRender.ConvertDocument.
The resulting PDF is saved back to an S3 bucket (e.g., <original-bucket-name>-converted).
Error Handling
The function employs:
Structured logging via LogErrorMessage for errors during conversion.
Exception handling to gracefully manage unsupported formats and S3 failures.
Supported File Formats
1. Word Documents
File Extensions: .doc, .docx, .rtf, .txt, .xml, .odt
Library Used: Aspose.Words
Conversion Logic:
The input stream is loaded into an Aspose.Words.Document instance.
The page count is fetched using the PageCount property.
The document is saved as a PDF using Save(outStream, SaveFormat.Pdf).
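The three steps above map onto a few Aspose.Words calls. The sketch below assumes the Aspose.Words package is referenced and licensed; the WordToPdf class name and the tuple return type are illustrative choices, not the project's API.

```csharp
using System.IO;

// Hypothetical wrapper around the Aspose.Words calls named above.
public static class WordToPdf
{
    // input must be a seekable stream, such as a MemoryStream.
    public static (MemoryStream Pdf, int PageCount) Convert(Stream input)
    {
        var document = new Aspose.Words.Document(input);

        // Reading PageCount lays out the document, so it reflects the rendered pages.
        int pageCount = document.PageCount;

        var output = new MemoryStream();
        document.Save(output, Aspose.Words.SaveFormat.Pdf);
        output.Position = 0; // rewind so the caller can read or upload from the start
        return (output, pageCount);
    }
}
```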
2. HTML Documents
File Extensions: .htm, .html
Library Used: Aspose.PDF
Conversion Logic:
The input stream is loaded using HtmlLoadOptions.
A new Aspose.Pdf.Document is created and the page count is retrieved.
The document is saved as a PDF.
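A minimal sketch of this load-options pattern follows, assuming the Aspose.PDF package (namespace Aspose.Pdf) is referenced and licensed. The HtmlToPdf class name is hypothetical.

```csharp
using System.IO;

// Hypothetical wrapper; passes HtmlLoadOptions so the input is parsed as HTML.
public static class HtmlToPdf
{
    public static (MemoryStream Pdf, int PageCount) Convert(Stream input)
    {
        using (var document = new Aspose.Pdf.Document(input, new Aspose.Pdf.HtmlLoadOptions()))
        {
            int pageCount = document.Pages.Count;

            var output = new MemoryStream();
            document.Save(output); // Document.Save(Stream) writes PDF
            output.Position = 0;
            return (output, pageCount);
        }
    }
}
```

HTML that references external images or stylesheets may render differently inside Lambda if those resources are unreachable, so results are worth checking with representative inputs.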
3. Presentation Files
File Extensions: .ppt, .pptx, .odp
Library Used: Aspose.Slides
Conversion Logic:
The input stream is loaded into a Presentation object.
The slide count is fetched using the Slides.Count property.
The presentation is saved as a PDF using Save(outStream, SaveFormat.Pdf).
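As a sketch, and assuming the Aspose.Slides package is referenced and licensed, the steps above could be written as shown here. The PresentationToPdf class name is hypothetical. The slide count stands in for the page count on the assumption that the default export produces one page per slide.

```csharp
using System.IO;

// Hypothetical wrapper around the Aspose.Slides calls named above.
public static class PresentationToPdf
{
    public static (MemoryStream Pdf, int PageCount) Convert(Stream input)
    {
        using (var presentation = new Aspose.Slides.Presentation(input))
        {
            // Assumption: one PDF page per slide with default export settings.
            int slideCount = presentation.Slides.Count;

            var output = new MemoryStream();
            presentation.Save(output, Aspose.Slides.Export.SaveFormat.Pdf);
            output.Position = 0;
            return (output, slideCount);
        }
    }
}
```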
4. Spreadsheet Files
File Extensions: .xls, .xlsx, .ods
Library Used: Aspose.Cells
Conversion Logic:
The input stream is loaded into a Workbook object.
Each worksheet is configured with a landscape orientation and Letter paper size.
The workbook is saved as a PDF.
The resulting PDF's page count is retrieved.
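The spreadsheet path differs from the others because the page count is only known after rendering. A sketch under the assumption that Aspose.Cells and Aspose.PDF are both referenced and licensed is shown below; the SpreadsheetToPdf class name is hypothetical.

```csharp
using System.IO;
using Aspose.Cells;

// Hypothetical wrapper combining Aspose.Cells (render) and Aspose.Pdf (count pages).
public static class SpreadsheetToPdf
{
    public static (MemoryStream Pdf, int PageCount) Convert(Stream input)
    {
        var workbook = new Workbook(input);

        // Apply the same page setup to every sheet before rendering.
        foreach (Worksheet sheet in workbook.Worksheets)
        {
            sheet.PageSetup.Orientation = PageOrientationType.Landscape;
            sheet.PageSetup.PaperSize = PaperSizeType.PaperLetter;
        }

        var output = new MemoryStream();
        workbook.Save(output, SaveFormat.Pdf);

        // Count pages on a copy so the returned stream stays open and untouched.
        int pageCount;
        using (var copy = new MemoryStream(output.ToArray()))
        using (var pdf = new Aspose.Pdf.Document(copy))
        {
            pageCount = pdf.Pages.Count;
        }

        output.Position = 0;
        return (output, pageCount);
    }
}
```

Copying the bytes costs extra memory for large workbooks; it is used here only to keep the sketch independent of how the PDF reader treats the stream it is given.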
5. PDF Files
File Extensions: .pdf
Library Used: Aspose.PDF
Conversion Logic:
The input stream is loaded into an Aspose.Pdf.Document.
The page count is retrieved using the Pages.Count property.
The original PDF is returned without modification.
6. Image Files
File Extensions: .bmp, .gif, .jpeg, .jpg, .png, .tif, .tiff, .wmf
Library Used: Aspose.Imaging
Conversion Logic:
The image is loaded using Aspose.Imaging.Image.Load.
The image is saved as a PDF using Image.Save(outStream, PdfOptions).
A default page count of 1 is returned for image files.
7. Scalable Vector Graphics
File Extensions:
.svg
Library Used: Aspose.PDF
Conversion Logic:
The SVG file is loaded using SvgLoadOptions.
The document is saved as a PDF and the page count is retrieved.
8. Electronic Publications
File Extensions:
.epub
Library Used: Aspose.PDF
Conversion Logic:
The EPUB file is loaded using EpubLoadOptions.
The document is saved as a PDF and the page count is retrieved.
9. Microsoft Publisher Files
File Extensions:
.pub
Library Used: Aspose.PUB
Conversion Logic:
The .pub file is parsed into a PDF using PubFactory.
The resulting PDF's page count is retrieved.
10. Visio Files
File Extensions:
.vsd
Library Used: Aspose.Diagram
Conversion Logic:
The .vsd file is loaded into a Diagram object.
The file is saved as a PDF using Save(outStream, SaveFileFormat.Pdf).
The PDF's page count is retrieved.
11. XML Paper Specification Files
File Extensions: .xps, .oxps
Library Used: Aspose.PDF
Conversion Logic:
The .xps file is loaded into an Aspose.Pdf.Document using XpsLoadOptions.
The document is saved as a PDF and the page count is retrieved.
Project Dependencies
The project leverages the following NuGet packages:
Package | Version | Description |
---|---|---|
Amazon.Lambda.CloudWatchEvents | 4.3.0 | AWS Lambda integration for handling CloudWatch events. |
Amazon.Lambda.Core | 2.1.0 | AWS Lambda core libraries. |
Amazon.Lambda.Serialization.SystemTextJson | 2.3.1 | JSON serialization for AWS Lambda. |
Aspose.Cells | 24.11.0 | Excel/Worksheet manipulation and PDF conversion. |
Aspose.Diagram | 24.2.8.0 | Visio document handling and PDF conversion. |
Aspose.Imaging | 24.11.0 | Image file manipulation and conversion to PDF. |
Aspose.PDF | 24.2.0 | PDF creation, manipulation, and format conversions. |
Aspose.PUB | 23.12.0 | Microsoft Publisher document parsing and conversion. |
Aspose.Slides | 24.2.0 | Presentation file manipulation and PDF conversion. |
Aspose.Words | 24.2.0 | Word document manipulation and conversion to PDF. |
AWSSDK.S3 | 3.7.104.29 | AWS S3 integration for file storage and retrieval. |
Deployment
The utility is designed for deployment on AWS Lambda and integrates with Amazon S3 for file operations.
Steps to Deploy
Build the project in release mode using .NET Core 3.1.
Deploy the compiled output to AWS Lambda as a .zip package.
Configure the Lambda function with the necessary IAM permissions to access S3 and CloudWatch.
Provide the required environment variables for S3 bucket details and other configurations.
Usage
Input files are uploaded to a designated S3 bucket.
The Lambda function processes the file, converts it to PDF, and calculates the page count.
The output PDF and page count metadata are stored back in the S3 bucket.
Error Handling
Each file type has specific error handling to ensure smooth operation and meaningful error reporting.
Unsupported file types trigger a graceful failure with appropriate logging.
Extensibility
This architecture allows the addition of new file formats or updates to existing logic by extending the switch statement or adding new handlers. The modular use of Aspose libraries ensures compatibility with a broad range of document types.
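One way to add handlers without touching a central switch is a small lookup table keyed by extension. The sketch below is an illustration of that idea only: ConverterRegistry, Register, and TryConvert are hypothetical names, and the project itself is described as using a switch statement.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical registry; each handler turns an input stream into a PDF stream plus page count.
public sealed class ConverterRegistry
{
    private readonly Dictionary<string, Func<Stream, (MemoryStream Pdf, int PageCount)>> _handlers =
        new Dictionary<string, Func<Stream, (MemoryStream Pdf, int PageCount)>>(StringComparer.OrdinalIgnoreCase);

    // Associates one handler with one or more extensions (leading dot optional).
    public void Register(Func<Stream, (MemoryStream Pdf, int PageCount)> handler, params string[] extensions)
    {
        foreach (string extension in extensions)
            _handlers[Normalize(extension)] = handler;
    }

    // Returns false for unregistered extensions so the caller can log and fail gracefully.
    public bool TryConvert(string extension, Stream input, out (MemoryStream Pdf, int PageCount) result)
    {
        if (_handlers.TryGetValue(Normalize(extension), out var handler))
        {
            result = handler(input);
            return true;
        }
        result = default;
        return false;
    }

    private static string Normalize(string extension) =>
        "." + (extension ?? string.Empty).Trim().TrimStart('.');
}
```

Supporting a new format then means one Register call with the new handler, and unsupported extensions surface as a false return instead of an exception.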