AI Training

Tips and tricks to train the AI to answer text questions about your products and services. To upload visual data for the Visual Widget, go here.

How to train a customized AI

Step 1: Organize the data that you will use to train the AI

Although you can upload various types of files for AI training, understanding how the AI works will help you get better results. A key distinction in data types is between structured and unstructured data. Structured data, like databases and spreadsheets, is organized in a predefined format with clear relationships, while unstructured data, such as text or word processing documents, lacks a predefined structure. Vizaport uses OpenAI, the platform behind ChatGPT, which can learn from both structured and unstructured data. You may find that some questions are better answered with structured data, such as a spreadsheet of product information. Others are better answered with unstructured data, such as a text file of frequently asked questions. The key is testing and learning, so be sure to follow all three steps.

Structured Data

Spreadsheets, JSON files, XML files, etc.

Unstructured Data

Documents, PDFs, text files, etc.
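When organizing a large folder of files, it can help to sort them into these two groups before uploading. The following is a minimal Python sketch that does this by file extension. The extension lists mirror the file types covered later in this guide; the function names (classify, group_files) are hypothetical and are not part of Vizaport.

```python
from pathlib import Path

# Extensions grouped the way this guide groups them. The mapping is an
# illustration only; it is not a Vizaport API.
STRUCTURED = {".csv", ".xls", ".xlsx", ".xlsm", ".json", ".xml"}
UNSTRUCTURED = {".txt", ".rtf", ".doc", ".docx", ".docm",
                ".ppt", ".pptx", ".pptm", ".pdf"}


def classify(filename: str) -> str:
    """Return 'structured', 'unstructured', or 'unknown' for a file name."""
    ext = Path(filename).suffix.lower()
    if ext in STRUCTURED:
        return "structured"
    if ext in UNSTRUCTURED:
        return "unstructured"
    return "unknown"


def group_files(filenames):
    """Group file names by data type so each batch can be reviewed separately."""
    groups = {"structured": [], "unstructured": [], "unknown": []}
    for name in filenames:
        groups[classify(name)].append(name)
    return groups
```

For example, group_files(["products.csv", "faq.txt", "logo.png"]) places the first file under structured, the second under unstructured, and the third under unknown.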

Step 2: Upload documents or extract data from web pages. After logging in, use the AI Training data loader.

Select your files or drag and drop them into the AI Training step. If you are using the Visual Widget and have selected an industry, please note that you will have a previous step before AI Training. For web pages, enter a URL to extract the data from that page and train the AI on it. For files, you may upload one or more files at a time. For guidance and tips on preparing each file type, please see our help below for: PDF, DOC, XLS, PPT, TXT, RTF, CSV, JSON, XML. Help for extracting data from web pages is also below.

Step 3: Test the results. Return to Step 2 if you need to make changes or add new files.

In the last step of the process (Configure Widget), you’ll find a preview of your Chat Widget. Test the widget by asking questions. You’ll learn what works and what doesn’t work, and can iterate on your AI training by changing your data or adding new data.

Structured Data – File Types

Organized and formatted information that is stored in a predefined manner, such as spreadsheets

.csv

CSV

File Type: Comma-separated values, plain text tabular data. Exportable from spreadsheets.

AI Training Tips: CSV format is the preferred method for loading data from spreadsheets. If you have data in Excel or Google Sheets, you can export as CSV. Vizaport converts Excel files to CSV anyway, so you have greater control if you export to CSV in advance and check the results before uploading. Ensure that your data is a simple table with rows and columns, and your first row contains your column names.
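One way to check an exported CSV before uploading is a short script. The sketch below, written in Python with only the standard library, looks for the two things mentioned above: a first row of column names and a consistent number of cells in every row. The function name check_csv_table is hypothetical, and the checks are an assumption about what "a simple table" means in practice, not a description of Vizaport's own validation.

```python
import csv
import io


def check_csv_table(text: str) -> list[str]:
    """Return a list of problems found in CSV text; an empty list means none found."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]

    # The first row should contain one non-empty, unique name per column.
    if any(not name.strip() for name in header):
        problems.append("header row has an empty column name")
    if len(set(header)) != len(header):
        problems.append("header row has duplicate column names")

    # Every data row should have the same number of cells as the header.
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_no} has {len(row)} cells, expected {len(header)}"
            )
    return problems
```

To check a file on disk, read it first, for example with open("products.csv", newline="", encoding="utf-8").read(), and pass the result to check_csv_table. The file name here is only a placeholder.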

.xls, .xlsx, .xlsm

XLS

File Type: Excel spreadsheet files.

AI Training Tips: Ensure that your data is a simple table with rows and columns, and your first row contains your column names. Excel files are automatically converted to CSV when uploaded to Vizaport. For greater control, convert to CSV yourself before uploading so that you can check the input data.

.json

JSON

File Type: JSON (JavaScript Object Notation) is a lightweight data interchange format

AI Training Tips: JSON is ideal for AI training because it provides a simple, flexible, and human-readable format for organizing and exchanging structured data. It handles complex, nested datasets well and is easy to pass between the different components of AI systems. When possible, convert data to JSON for best results.
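As an illustration of converting tabular data to JSON, here is a minimal Python sketch that turns CSV text with a header row into a JSON array, one object per row. The function name csv_to_json is hypothetical, and the output layout (an array of flat objects) is an assumption chosen for readability rather than a format required by Vizaport.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each row becomes one object keyed by the column names in the header.
    records = [dict(row) for row in reader]
    return json.dumps(records, indent=2, ensure_ascii=False)
```

All values stay as strings, exactly as they appear in the CSV. If numeric fields such as prices should be JSON numbers, convert them explicitly before calling json.dumps.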

.xml

XML

File Type: eXtensible Markup Language. A versatile, text-based markup language.

AI Training Tips: XML offers a structured format for organizing diverse data types, providing clear human readability and easy data interchange. However, its verbosity and complexity can lead to larger file sizes, potentially slowing down data processing in AI applications. Where possible, convert XML to JSON first.
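A conversion from XML to JSON could look like the following Python sketch, which uses only the standard library. The mapping rules are assumptions made for this example: attributes and child elements become keys, repeated child tags become a list, and text that sits next to attributes or children is stored under a "#text" key. There is no single standard XML-to-JSON mapping, so adjust these rules to your data. The function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an XML element into plain dicts, lists, and strings."""
    children = list(elem)
    # A leaf element with no attributes is reduced to its text.
    if not children and not elem.attrib:
        return (elem.text or "").strip()

    node = dict(elem.attrib)
    for child in children:
        value = element_to_dict(child)
        if child.tag in node:
            # A key seen before is turned into a list of values.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value

    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    return node


def xml_to_json(xml_text: str) -> str:
    """Parse XML text and return the equivalent JSON text."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)}, indent=2)
```

Note that a tag appearing once becomes a single object while a tag appearing several times becomes a list, so check the output when some records have only one entry.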

Unstructured Data – File Types

Information that lacks a predefined format, organization, or clear data model, such as text docs

.txt

TXT

File Type: Unformatted text files

AI Training Tips: Using .txt files for AI training offers simplicity and universality, as plain text is a widely supported format. It enables easy preprocessing and is suitable for various natural language processing tasks. However, the lack of inherent structure and metadata in plain text can pose challenges for some types of data (for example, a list of products with details). In that case, choose a structured data format instead.

.rtf

RTF

File Type: Rich Text Format

AI Training Tips: RTF files support rich formatting, which allows diverse textual elements to be included. However, their verbosity and potential complexity may pose challenges in data processing and extraction for training purposes. To remove images and formatting that may be unnecessary for training, consider exporting files to TXT format.

.doc, .docx, .docm

DOC

File Type: Word documents

AI Training Tips: Word documents are universal, and your content is likely already available in this format. Vizaport supports Word documents, although they may contain extra formatting that is not required. To save space (if your documents are large), consider exporting to RTF or to simple text (TXT) format.

.ppt, .pptx, .pptm

PPT

File Type: PowerPoint presentations

AI Training Tips: PowerPoint presentations offer summarized textual content, facilitating the learning of associations between different elements by AI models. The organized format can aid in comprehending relationships within the text, although challenges may arise from inconsistencies and potential biases in the creator’s style, affecting the model’s adaptability.

.pdf

PDF

File Type: Portable Document Format (PDF) is used for presenting documents

AI Training Tips: PDFs are universal, and content for training is likely available in this format. However, it can be difficult for Vizaport’s pre-processor to fully understand and extract all text correctly to train the AI model. Whenever possible, convert the information in PDFs to one of the other formats. For example, a PDF that contains a table could be loaded as a spreadsheet. PDFs with text content could be converted to a Word doc or, even better, to a TXT file.

Web Page – Extract Data

Information from web pages can be easily added if the page can be accessed and parsed

.html

HTML

File Type: Web pages that return HTML

AI Training Tips: Test and learn. First, please note that Vizaport processes a single page when given a URL. You can continue to add URLs for separate pages to train. If you have many pages, please consider dumping the web site into a single file to upload (e.g. XML or CSV). Second, results vary from web site to web site. Try a single page and test the results. If it works, continue to add pages. Some web sites will have problems, either because of connection issues or because the pages are difficult to parse for text content. If this happens and you are not getting good results, extract the data from the web page yourself, save it in a format such as a text file (see above), and upload that file.
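If a page does not parse well, one way to extract its text yourself is sketched below in Python, using only the standard library's html.parser module. It collects the text of a page while skipping script and style content. The class and function names are hypothetical, and this is not how Vizaport's own extraction works; it is simply a starting point for producing a TXT file to upload.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text that is not inside <script> or <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the text of an HTML document, one text fragment per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

The function takes the HTML as a string, so the page has to be downloaded or saved first. Text split by inline tags such as bold or links ends up on separate lines, so review the output before saving it as a text file.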

docs.google.com

Google Apps (Docs, Sheets, Slides)

File Type: Google apps that are published

AI Training Tips: Google apps such as Docs, Sheets and Slides are supported but need to be published first. This does not mean sharing publicly. These steps may change, but are currently as follows:

  1. In the Google app, open the File menu in the top left
  2. Select Share -> Publish to Web
  3. Click on the Publish button and it will generate a link (ending in /pub or /pubhtml)
  4. Copy this link and insert it as the URL into your Vizaport data loader
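Before pasting a link into the data loader, a quick check that it is the published link rather than the normal editing link can save a failed attempt. The Python sketch below tests whether a URL is on docs.google.com and has a path ending in /pub or /pubhtml, as described in step 3. The function name looks_published is hypothetical, and the check is based only on the link shape described above, so it cannot confirm that the document is actually reachable.

```python
from urllib.parse import urlparse


def looks_published(url: str) -> bool:
    """Return True if the URL is on docs.google.com and its path ends in /pub or /pubhtml."""
    parts = urlparse(url)
    # Only the path is inspected, so any query string after it is ignored.
    path = parts.path.rstrip("/")
    return parts.hostname == "docs.google.com" and path.endswith(("/pub", "/pubhtml"))
```

For example, a link such as https://docs.google.com/document/d/e/EXAMPLE_ID/pub passes the check, while an editing link ending in /edit does not. EXAMPLE_ID is a placeholder, not a real document.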

.json, .xml

Structured Data (JSON, XML)

File Type: Web pages that return JSON or XML

AI Training Tips: Web sites that return JSON or XML as structured data are supported. You can upload the data as a file, or enter a URL and let Vizaport extract the data from the web site. The details of these file types are covered in the Structured Data section.