ImageMAKER Discovery Assistant Product Specification
Discovery Assistant is a software product designed to process email, electronic documents and image files to produce TIFF/PDF, metadata, and extracted text.
Processed documents can be exported from Discovery Assistant as TIFF/PDF with extracted text and metadata, and loaded into case management tools such as Summation or Concordance for review and production.
Discovery Assistant is capable of handling millions of documents and scaling across multiple machines.
The EDD Discovery Process
The goal of the electronic discovery process is to produce the smallest set of meaningful documents, in a usable format, that meets the required criteria.
The e-discovery process includes the following activities:
Product Features Overview
Discovery Assistant supports the following product features:
Discovery Assistant Downloads:
Review and search tools:
Discovery Assistant allows the following functions to be batched:
All processed data is stored within a project directory.
Files can be loaded into a Discovery Assistant project in one of 4 ways:
To batch load large file sets, download and install the TeraBite utility. This program allows the user to create an ordered file list for loading into Discovery Assistant.
Documents are categorized as follows:
During the conversion process, files move from the 'all files' category to the 'stamped' category through a series of steps. Each step is represented by a tab, which contains a subset of the 'all files' list showing which stage of the conversion process those files have reached.
Processing can be stopped and re-started at any time.
At any point during the batch process the user can click on a processed file to view:
Supported File Types
Discovery Assistant processes a wide variety of input file types. These include: Microsoft Word documents, Excel spreadsheets, PowerPoint presentations, Outlook email files (PST, MSG), Outlook Express files (EML), Lotus Notes files (NSF), WordPerfect documents, rich text format files (RTF), Microsoft Visio files, CorelDRAW files, CAD/CAM files, Lotus 1-2-3 spreadsheets, text files, HTML documents, Adobe Acrobat documents (PDF), compressed archives (ZIP), images (TIF, JPG, BMP, etc.), scanned files, and more.
Discovery Assistant uses the native application to petrify documents, so any document type that has a ‘print’ or ‘printto’ command on your system can be processed. At installation time, Discovery Assistant produces a list of every document type with a ‘print’ or ‘printto’ file association.
Discovery Assistant contains a conditional filter to exclude executable files, hidden files, and system files, plus an option to skip email attachments or sub-directories.
Common Supported file types:
De-duplication can be done at the file level, or message level (multiple attachments). Global De-duplication is supported across multiple projects.
Duplicates are identified by MD5 hash code, and every hash match is confirmed with a full binary comparison.
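The hash-then-confirm approach described above can be illustrated with a minimal Python sketch. It is not Discovery Assistant's implementation; the function names `md5_of` and `find_duplicates` are hypothetical, and the sketch assumes file-level de-duplication over a list of paths.

```python
import filecmp
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Map each duplicate path to the first-seen file it duplicates."""
    seen = {}        # MD5 digest -> unique files that produced it
    duplicates = {}  # duplicate path -> original path
    for path in paths:
        candidates = seen.setdefault(md5_of(path), [])
        for original in candidates:
            # Hash match: confirm with a full byte-for-byte comparison.
            if filecmp.cmp(original, path, shallow=False):
                duplicates[path] = original
                break
        else:
            # No confirmed match, so this file is unique so far.
            candidates.append(path)
    return duplicates
```

The binary confirmation step means that two different files that happen to share an MD5 digest are both kept.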
Extraction of Metadata and Text
Discovery Assistant supports full metadata extraction from source documents, including MS Office specific tags, Microsoft Outlook email specific tags, and Lotus Notes specific tags. Standard email tags include Date Sent, Time Sent, Subject, Text Body, Html Body, Filename, Author, File Size, File Date, File Time, email header information, To, From. Lotus Notes specific metadata includes IMLog, Appointment, Bookmark, Notice, Phone Message, Return Receipt, and Task Form.
Document text is extracted at time of printing, either from the document itself, or from the print stream. This ensures 100% accuracy. If the document text cannot be extracted (perhaps because the document is an image file) then the user has the option to OCR the document after conversion to TIFF (petrification). Discovery Assistant uses the Microsoft Office OCR engine to extract text from image files (99% accuracy). If the Word 2003 or Word 2007 OCR engine is not available, then Discovery Assistant uses its own built-in OCR engine.
The full set of extractable Metadata fields is:
Discovery Assistant supports a variety of export options including:
Supported Output resolutions and page size (determined at time of conversion):
Additional export file types:
Export File Naming Schemes:
Discovery Assistant generates reports for each type of batch process. Reports are stored with the project, and can be reviewed chronologically using Windows Explorer file manager.
Reports indicate any warnings, errors, time to complete the function, and any additional information that might be necessary to track the progress of one or more files through the process.
Reports are generated at time of:
Blank Page Removal (Page Deblanking)
By default, spreadsheet files are set to print the whole sheet, not the default print range. As a result, printing can generate a large number of blank pages.
Discovery Assistant can batch process files to detect and eliminate blank pages. The process is extremely fast, and can check thousands of pages per second for blank pages.
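The specification does not say how blank pages are detected. One common approach, shown here purely as an assumption, is to treat a page as blank when the share of black pixels falls below a small threshold, so that stray specks do not keep an otherwise empty page. The names `is_blank`, `remove_blank_pages`, and `max_ink_ratio` are hypothetical, and a page is modelled as rows of 0 (white) and 1 (black) values.

```python
def is_blank(page, max_ink_ratio=0.001):
    """Return True if the share of black pixels is at or below max_ink_ratio.

    `page` is a list of rows; each row is a list of 0 (white) or 1 (black).
    """
    total = sum(len(row) for row in page)
    if total == 0:
        return True  # a page with no pixels carries no content
    ink = sum(sum(row) for row in page)
    return ink / total <= max_ink_ratio


def remove_blank_pages(pages, max_ink_ratio=0.001):
    """Return only the pages that are not considered blank."""
    return [page for page in pages if not is_blank(page, max_ink_ratio)]
```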
Document IDs and Bates Numbers
Converted documents can be assigned both a Bates Number and a Document ID. Assigned IDs support up to 20 alphanumeric characters.
Bates Range values are automatically assigned to parent items when Bates Numbers are assigned in ‘child next’ order.
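A minimal sketch of how numbering in ‘child next’ order can yield a parent range, assuming one Bates number per page and that each parent is numbered immediately before its children. The `Doc` class, the `assign_bates` function, and the `ABC` prefix are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    name: str
    pages: int                      # assumed to be at least 1
    children: list = field(default_factory=list)
    bates_begin: str = ""
    bates_end: str = ""
    range_end: str = ""             # last Bates number in the whole family


def assign_bates(doc, next_number, prefix="ABC", width=7):
    """Number doc's pages, then each child's; return the next free number."""
    def fmt(number):
        return f"{prefix}{number:0{width}d}"

    doc.bates_begin = fmt(next_number)
    next_number += doc.pages
    doc.bates_end = fmt(next_number - 1)
    for child in doc.children:
        next_number = assign_bates(child, next_number, prefix, width)
    # The family range ends at the last page of the last descendant.
    doc.range_end = fmt(next_number - 1)
    return next_number
```

For a two-page email with a three-page and a one-page attachment, numbering from 1, the email itself covers ABC0000001 to ABC0000002, while its family range runs from ABC0000001 to ABC0000006.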
Discovery Assistant includes the ability to stamp converted files with specific information. The user is able to control the stamp content from the Stamp Options. Possible stamps are: Bates number, document ID, file name, file path, file type, page number, number of pages and many more. Discovery Assistant supports a user selectable option to shrink the document image so that stamps do not obscure image data.
Stamp fields can be placed in one of six positions: Top Left, Top Center, Top Right, Bottom Left, Bottom Center, or Bottom Right.
Stamp text fields can include a combination of the following data fields (plus user defined strings)
Any field type tracked and displayed by Discovery Assistant can also be sorted. Sort fields include status, filename, source path, size, date/time to name just a few.
Printing a Hard Copy
Stamped or unstamped processed TIFF / PostScript files can be spooled to a standard printer. Metadata pages can be printed as slip sheets.
Some files cannot be converted, perhaps because they are password protected, corrupt, or unprintable (such as WAV files). These files can be ‘passed through’ as a converted file by creating a placeholder file. The placeholder file includes the file’s metadata fields and a user configurable message.
Quality Control Review Module
The QC module allows users to review the conversion results, and to re-queue any incorrectly converted documents, and/or to manually replace TIFF files.
Functional features include:
Documents can be reviewed one page at a time. Text scrolls to match the current page.
Manual TIFF Replacement
Users can manually replace converted TIFF files using the ‘Replace TIFF’ function in the Quality Control Review Module. Users can create alternate TIFF renderings by printing source documents to the ‘Print To Tiff’ print driver supplied with Discovery Assistant. TIFF and Text files for the current document can then be manually replaced.
Native Text Extraction
Discovery Assistant extracts the text directly from the native document at time of import or, if the text is not easily accessible, at time of conversion. Foreign language character sets are handled as MBCS (Multi-Byte Character Set) characters.
OCR Text Extraction
Discovery Assistant extracts text at time of import (most email messages), or at time of conversion (most email attachments). If the file is an image file, then the text can only be extracted by OCR. By default, Discovery Assistant uses the OCR engine installed with Office 2003 or Office 2007 (Nuance OmniPage). If you have Office 2000, then a simpler OCR engine is used as the default; we recommend upgrading to Office 2003.
OLE Embedded Object Extraction
If you select ‘Extract OLE embedded objects’ under Options tab / MS Office, Discovery Assistant will identify and extract all OLE objects embedded in an Office document.
Typically OLE objects include embedded spreadsheets, or embedded pictures or documents. These embedded objects may be displayed as an icon, or a portion of the document only. By extracting OLE objects, you are assured of getting every searchable string in the document.
Support for Foreign Character Sets
Discovery Assistant supports MBCS (Multi-Byte Character Set) strings in extracted metadata. Printed images will retain the original font and font characteristics.
Intelligent Process Monitoring
Discovery Assistant monitors file conversion status in real time, and includes intelligence to 'auto-close' and 'auto-kill' stuck applications. This allows the conversion process to run unattended for hundreds of thousands of conversions.
Five user-configurable timeout values are continuously monitored to ensure that no print job gets stuck: Job Start, First Page, Next Page, Max Pages, and Total Job.
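The monitoring described above could be sketched as a check that runs on each polling cycle and reports which limit, if any, a print job has exceeded. Everything in this sketch is an assumption: the default values are invented, Max Pages is interpreted as a page-count cap, and the `Timeouts` class and `stuck_reason` function are hypothetical names rather than part of the product.

```python
from dataclasses import dataclass


@dataclass
class Timeouts:
    job_start: float = 60.0     # seconds allowed before the job starts
    first_page: float = 120.0   # seconds allowed from start to first page
    next_page: float = 60.0     # seconds allowed between pages
    max_pages: int = 5000       # assumed here to be a page-count cap
    total_job: float = 3600.0   # seconds allowed for the whole job


def stuck_reason(limits, now, queued_at, started_at=None,
                 last_page_at=None, pages=0):
    """Return the name of the exceeded limit, or None if the job is healthy."""
    if started_at is None:
        return "Job Start" if now - queued_at > limits.job_start else None
    if now - started_at > limits.total_job:
        return "Total Job"
    if pages > limits.max_pages:
        return "Max Pages"
    if pages == 0:
        return "First Page" if now - started_at > limits.first_page else None
    if now - last_page_at > limits.next_page:
        return "Next Page"
    return None
```

A supervising loop would call `stuck_reason` periodically and close or kill the native application whenever it returns a limit name.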
Parent Child Relationships
Parent-Child relationships are fully maintained throughout the process. The user interface includes ‘goto parent’ and ‘list children’ buttons that are accessible at all times. Unlimited nesting of parent/child relationships is fully supported.
Bates Number and Document ID number assignments track parent/child relationships.
De-duplication and Global De-duplication can be configured to skip a duplicate only if its parent item is also a skipped duplicate, so no document will be orphaned in the output set.
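The parent-aware rule can be sketched as follows. This is an illustrative assumption rather than the product's logic: `select_skipped` is a hypothetical function, items are processed in parent-before-child order, and a top-level item flagged as a duplicate is assumed to have been matched together with its attachments (message-level de-duplication).

```python
def select_skipped(items, parent_of, duplicate_ids):
    """Return the set of item ids to skip.

    items:         ids in parent-before-child order
    parent_of:     dict mapping a child id to its parent id
    duplicate_ids: ids that de-duplication flagged as duplicates
    """
    skipped = set()
    for item in items:
        if item not in duplicate_ids:
            continue
        parent = parent_of.get(item)
        # Top-level duplicates are skipped; a child duplicate is skipped
        # only when its parent has already been skipped.
        if parent is None or parent in skipped:
            skipped.add(item)
    return skipped
```

With this rule, an attachment that duplicates a file elsewhere in the project is still kept when the email that carries it is unique.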
Exported MetaData fields include multiple Parent/Child group range fields.
Advanced Email Handling
Discovery Assistant fully supports email and attachment extraction from Outlook PST and MSG files, Lotus Notes NSF databases, Outlook Express EML files, and iCalendar and vCalendar files (ICS/VCS).
Specialized code is in place to support:
Recognized Microsoft Outlook message types include:
Recognized Lotus Notes message types include:
Upgrade path to support multiple servers
For large batch jobs that are too big for a single machine, we provide an upgrade path that allows multiple servers to attach to the same conversion database to help process jobs and improve throughput.
Additional tools required are:
Microsoft Excel Spreadsheet formatting control
Discovery Assistant incorporates special spreadsheet controls to improve the quality of printed spreadsheets. These controls include unhiding cells, columns, and rows; printing formulas separately; automatic resizing; setting scaling to fit; defining relative scale; and suppressing headers/footers.
Microsoft Word formatting control
Specialized handling for Microsoft Word files includes:
Microsoft PowerPoint formatting control
Specialized handling for Microsoft PowerPoint files includes:
Built-in Review and Search Tool
ImageMAKER has developed a separate distributable HTML based Review Tool with support for indexed search. This tool must be licensed separately for distribution to clients.
Clients can review produced documents using a simple HTML browser interface. There is no need for clients to purchase and learn a case management tool in order to review data.
Indexed search can be done on the data by downloading and installing an inexpensive third party search tool that supports indexed searching. Indexed searching allows users to perform complex search queries with very fast response rates.
Discovery Assistant is rated at one gigabyte per day per machine. One gigabyte of data averages approximately 70,000 pages and requires about 5 gigabytes of storage space for the converted files.
Actual per page conversion speed is rated at 3,500 pages per hour of straight conversion (20 hours a day). We factor in 4 hours of overhead per day to handle the other house-keeping tasks, like file import / de-duplication / deblanking / bates labeling / exporting etc.
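The rated figures are consistent with each other: 3,500 pages per hour over 20 conversion hours gives 70,000 pages per day, which matches one gigabyte per machine per day. The short Python sketch below turns those figures into a rough planning estimate; the `estimate` function and its constant names are hypothetical, and real throughput will vary with the document mix.

```python
PAGES_PER_HOUR = 3_500
CONVERSION_HOURS_PER_DAY = 20    # 24 hours minus 4 hours of overhead
PAGES_PER_GIGABYTE = 70_000      # average quoted in the specification
OUTPUT_GB_PER_INPUT_GB = 5       # converted-file storage per input gigabyte


def estimate(input_gb, machines=1):
    """Estimate pages, elapsed days, and output storage for a job."""
    pages = input_gb * PAGES_PER_GIGABYTE
    pages_per_day = PAGES_PER_HOUR * CONVERSION_HOURS_PER_DAY * machines
    return {
        "pages": pages,
        "days": pages / pages_per_day,
        "output_gb": input_gb * OUTPUT_GB_PER_INPUT_GB,
    }
```

For example, `estimate(10, machines=2)` describes a 10 gigabyte job split across two machines: 700,000 pages, 5 days, and 50 gigabytes of converted files.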
In addition to straight conversion is the time required to:
Discovery Assistant comfortably handles the conversion of up to 100,000 files per project. If you are attempting to convert a million files, we recommend breaking the job down into 10 separate projects.
These projects can then be farmed out to up to 10 separate machines.
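Splitting a large file list into projects of at most 100,000 files can be sketched in a few lines of Python. The `split_into_projects` function is a hypothetical helper, not a Discovery Assistant feature, and it assumes the input is an ordered list of file paths.

```python
def split_into_projects(files, max_per_project=100_000):
    """Split an ordered file list into consecutive project-sized chunks."""
    return [
        files[start:start + max_per_project]
        for start in range(0, len(files), max_per_project)
    ]
```

A list of one million files yields 10 chunks, each of which can be loaded as its own project on a separate machine.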
Internal Database Format
The Discovery Assistant internal database format is XML. The XML Project File is loaded into memory for processing, and updated back to the disk after significant project changes, and regularly during the document conversion process.
Discovery Assistant provides an add-on Reporting Tool to convert from XML project files to XLS, MDB, SQL, or text for review and long-term documentation.
Discovery Assistant was developed using the latest Microsoft .NET software tools. Code is written in C# (main application), C++ (object modules), and C (print driver components).
Windows 7 note: The .NET Framework 1.1 is not supported by Microsoft on Windows 7, but in our experience, Discovery Assistant, which currently uses that framework, has worked flawlessly on both 32- and 64-bit versions.
ImageMAKER custom development services are designed to ensure that, if you are relying on our product to meet your customers’ needs, we are there to help you through any problems.
Our custom development services include hot fixes, custom features and modifications, and development of new custom modules and export formats.
Maintenance & Support Services
Our Maintenance Support Services ensure that you get the answers you need, when you need them. They include immediate access to senior development personnel, product fixes, and updates.
ImageMAKER Development Inc.
ImageMAKER Development Inc. consists of a team of experienced and highly qualified individuals that over the last 15 years has developed market leading OEM conversion software for fax, unified messaging, and document conversion, delivery, and storage applications, including Microsoft Mail, Microsoft Small Business Server, Lotus Domino FAX Server and products from T-Mobile, Tiscali, Eastman Kodak, FileNet, Kofax, Canon, EasyLink (Mail.com), MessageClick, MCI, IBM, Hewlett Packard, Cable & Wireless, Motorola AirCommunications, Nortel Networks, Cisco, Lucent Technologies & Octel (now Avaya), Nippon Telephone and Telegraph International, Telecom Finland (Telia Sonera), Alcatel and many more, with over 55 million client installs around the world.
ImageMAKER Development's current focus is on the delivery of innovative solutions for the next generation of Electronic Data Discovery applications by leveraging its market proven core document conversion technology.