Text Importing and Exporting
Introduction
In creating JTLanguage, providing for importing and exporting text conveniently and flexibly was very important to me. Therefore, there are a variety of ways to get text into or out of JTLanguage. In this document, we will look at the text importing and exporting mechanisms.
Overview
When you display a textual content page (or a lesson, group, or course page, for that matter), you will find "Import" and "Export" button links at the bottom of the main content page. These take you to the Import or Export pages, which let you import or export the current item. Both are built on what I call the "Format" mechanism in JTLanguage. On the Import or Export page you select a particular format to use. At the time of this writing, there are three formats that pertain to importing or exporting textual content: "Patterned Line", "Patterned Block", and "Extract".
Each of these formats makes use of a "pattern" you provide, which tells JTLanguage about the format to expect when importing, or to use when exporting, the text. Because JTLanguage can’t anticipate all the kinds of formatting you might encounter, it uses a very flexible pattern mechanism that lets you specify a pattern string representing the formatting. We’ll talk more about this later in this document.
The "Patterned Line" format treats the input or output as individual lines in which all the desired languages for one item are grouped together, usually in one line, but possibly in separate lines, depending on the pattern. The pattern used indicates the format for these lines.
The "Patterned Block" format breaks up the text into blocks of lines, one block for each language. The pattern used indicates the format of both the blocks and the lines.
The "Extract" format adds to either the line or block patterned format the means to extract sentences, words, and/or characters from the text. For example, if you use the "Extract" format to export a Text content item, instead of outputting the text in the normal way, it will send the extracted sentences, words, or characters to separate Sentences, Words, or Characters content items, creating them if necessary. If you use the "Extract" format to import a text or lesson item, it will use the line or block pattern format to bring the text into the Text content item, and then send the extracted sentences, words, or characters to separate Sentences, Words, or Characters content items, again creating them if necessary.
These formats also have several other options, as depicted in the Import or Export pages. For example, if you want to break up long lists into smaller units, you can do so. For Export, you can select or deselect the items you want exported via checkboxes.
If you use the Patterned Line or Patterned Block formats with a Lesson, Group, or Course, you can import or export the text for entire lessons, groups, or courses. In the case of importing, if you are importing a hierarchy, you will need to use a comment directive mechanism to identify course, group, lesson, and content boundaries. This comment directive mechanism also lets you provide titles, descriptions, and annotations, and associate media with items.
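As a rough illustration of the comment mechanism, here is a small Python sketch that separates lines starting with a comment prefix from ordinary data lines. This document does not describe the directive syntax itself, so the sketch does not try to interpret the comments; the function name `split_comments` and the sample comment text are made up for the example, and the default "#" prefix is the one listed under the Import options.

```python
def split_comments(text: str, prefix: str = "#") -> tuple[list[str], list[str]]:
    """Split text into (comment lines, data lines) using a comment prefix.

    Hypothetical helper: it only separates the two kinds of lines and does
    not interpret any directive syntax.
    """
    comments: list[str] = []
    data: list[str] = []
    for line in text.splitlines():
        if line.startswith(prefix):
            # Keep the comment body without the prefix and outer spaces.
            comments.append(line[len(prefix):].strip())
        elif line.strip():
            # Any other non-blank line is treated as study item data.
            data.append(line)
    return comments, data


# The comment text below is arbitrary, not a real JTLanguage directive.
sample = "# Numbers\neins|one\n\n# More numbers\nzwei|two"
comments, data = split_comments(sample)
```

The data lines that come back could then be handed to whatever pattern matching you use for the study items themselves.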
First, let’s look at the pattern mechanism.
Patterns
Patterns are another mechanism in JTLanguage where you can become something of a programmer. A "pattern" is a cryptic string with special placeholders indicating where specific parts of a study item go. For example, this is the default pattern for the Patterned Line format:
%{t}\t%{h}
The "%{" and "}" delimit a special pattern token to be used in matching input or output. The content between the braces identifies the part of the study item used. For example, "%{t}" represents the first target language item, and "%{h}" represents the first host language item. Anything else in the pattern represents literal characters to be matched or output. The special "\" character marks a special character. For example, "\t" represents a tab spacing character, and "\n" a newline character.
Usually, I personally prefer to use a more visible character such as a "|" for separating language items in a line, so the pattern I typically use is:
%{t}|%{h}
Thus, if my target and host languages are German and English, and I’m using the Patterned Line format, I could use this pattern with a list of study items I’ve formatted like this:
eins|one
zwei|two
drei|three
JTLanguage would then be able to distinguish the target and host language items.
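To make the matching idea concrete, here is a minimal Python sketch of how a pattern such as `%{t}|%{h}` could be turned into a regular expression and applied line by line. This is not JTLanguage's actual code. The names `unescape`, `pattern_to_regex`, and `parse_lines` are invented for illustration, and only plain `%{name}` tokens plus the `\t` and `\n` escapes are handled.

```python
import re

TOKEN_RE = re.compile(r"%\{(\w+)\}")
ESCAPES = {"t": "\t", "n": "\n"}


def unescape(literal: str) -> str:
    """Turn the two-character sequences \\t and \\n into real characters."""
    return re.sub(r"\\([tn])", lambda m: ESCAPES[m.group(1)], literal)


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a pattern such as '%{t}|%{h}' into a compiled regex."""
    parts = []
    pos = 0
    for match in TOKEN_RE.finditer(pattern):
        # Text between tokens is literal and must match exactly.
        parts.append(re.escape(unescape(pattern[pos:match.start()])))
        # Each token becomes a named, non-greedy capture group.
        parts.append(f"(?P<{match.group(1)}>.+?)")
        pos = match.end()
    parts.append(re.escape(unescape(pattern[pos:])))
    return re.compile("".join(parts))


def parse_lines(pattern: str, text: str) -> list[dict[str, str]]:
    """Return one dict per line that matches the pattern; skip the rest."""
    regex = pattern_to_regex(pattern)
    items = []
    for line in text.splitlines():
        m = regex.fullmatch(line)
        if m:
            items.append(m.groupdict())
    return items


items = parse_lines("%{t}|%{h}", "eins|one\nzwei|two\ndrei|three")
```

Each resulting dict maps the token name to the text found in that position, so `items[0]["t"]` holds the German word and `items[0]["h"]` the English one.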
With the Patterned Line format, as long as the text uses a consistent and unambiguous line format, I could probably create a pattern that would match it.
For another example, suppose I have a vocabulary list I found in this format, where the target and host items are on separate lines:
eins
one
zwei
two
drei
three
I could change my pattern to the following:
%{t}\n%{h}
The "\n" token indicates that a line break is expected there. Note how the Patterned Line format implicitly assumes a newline character at the end of the pattern.
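The effect of a pattern like `%{t}\n%{h}` is that each study item consumes two consecutive lines. A simplified Python sketch of that grouping follows. It assumes each field occupies exactly one line and that the order of fields is target, then host; the function name `parse_stacked` is hypothetical.

```python
def parse_stacked(text: str, fields: tuple[str, ...] = ("t", "h")) -> list[dict[str, str]]:
    """Group consecutive lines into items, one line per field.

    With fields ("t", "h") this mirrors the pattern %{t}\\n%{h}: the first
    line of each pair is the target item, the second the host item.
    """
    lines = text.splitlines()
    step = len(fields)
    items = []
    # Walk the lines in steps of len(fields); an incomplete trailing
    # group is ignored in this sketch.
    for start in range(0, len(lines) - step + 1, step):
        items.append(dict(zip(fields, lines[start:start + step])))
    return items


items = parse_stacked("eins\none\nzwei\ntwo\ndrei\nthree\n")
```

The trailing newline in the sample text corresponds to the newline the Patterned Line format implicitly expects at the end of the pattern.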
For the "Patterned Block" format, the pattern is a little more complicated, as it has to represent both lines and blocks of lines. The default pattern is:
%p{%{t}}\n%p{%{h}}
In this pattern, the sub-pattern between a "%p{" and "}" describes the lines of a block. So in this case, input like the following would be expected, in which the first block of lines is in the target language, followed by a blank line (matched by the "\n" newline character), then followed by a second block of lines in the host language:
eins
zwei
drei

one
two
three
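Here is a small Python sketch of how block input of this kind could be paired up: the text is split into blocks at the blank line, and the lines of each block are matched by position. It is only an approximation of the default `%p{%{t}}\n%p{%{h}}` pattern, the name `parse_blocks` is made up, and it assumes every block has the same number of lines.

```python
def parse_blocks(text: str, fields: tuple[str, ...] = ("t", "h")) -> list[dict[str, str]]:
    """Pair up lines from blank-line-separated blocks, one block per field."""
    blocks = [block.splitlines() for block in text.strip().split("\n\n")]
    if len(blocks) != len(fields):
        raise ValueError(f"expected {len(fields)} blocks, found {len(blocks)}")
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks must have the same number of lines")
    # zip(*blocks) yields (first target, first host), (second target, ...), etc.
    return [dict(zip(fields, row)) for row in zip(*blocks)]


items = parse_blocks("eins\nzwei\ndrei\n\none\ntwo\nthree\n")
```

Raising an error on mismatched block lengths is a design choice for the sketch; silently pairing lines from uneven blocks would attach translations to the wrong words.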
There are also other matching or substitution tokens. Here is the complete list for both Patterned formats:
Token | Description |
---|---|
t | First target language item. |
t1-t3 | Additional target language items. |
h | First host language item. |
h1-h3 | Additional host language items. |
d | Skip a number. |
s | Skip a string. |
(num)s | Skip (num) characters of a string. |
o | Output an ordinal number which is incremented for each item. |
tag | For use with courses or plans, to select or set up a node or study list using the tag as the title. |
title | The node or content title. |
description | The node or content description. |
label | The node or content label. |
node | The node title. |
contentType | The content type. |
contentSubType | The content subtype. |
The following are just for the Patterned Block format:
Token | Description |
---|---|
p | Specify a block format. |
lt | Input or output target language name. |
lt1-lt3 | Input or output additional target language names. |
lh | Input or output host language name. |
lh1-lh3 | Input or output additional host language names. |
For example, if you are using output from another source that might include other stuff you don’t need, you could use the d or s tokens to ignore that stuff. Say your input has a number you don’t need, like this:
0,eins,one
1,zwei,two
2,drei,three
You could use the Patterned Line format with this pattern:
%{d},%{t},%{h}
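The skip tokens can be pictured as parts of the pattern that must match but whose text is thrown away. The Python sketch below shows one way to model that with regular expressions. The regex chosen for each token is my assumption about what "skip a number" and "skip a string" mean, the name `compile_pattern` is hypothetical, and escapes such as `\t` are left out to keep it short.

```python
import re

# Assumed regex fragments for the skip tokens; they match but capture nothing.
SKIP = {
    "d": r"\d+",  # skip a number
    "s": r".*?",  # skip a string
}


def compile_pattern(pattern: str) -> re.Pattern:
    """Build a regex from a pattern, treating d, s, and (num)s as skips."""
    # re.split with a capture group alternates literal text and token names.
    pieces = re.split(r"%\{(\w+)\}", pattern)
    out = []
    for index, piece in enumerate(pieces):
        if index % 2 == 0:
            out.append(re.escape(piece))             # literal text
        elif piece in SKIP:
            out.append(SKIP[piece])                  # skip, no capture
        elif (m := re.fullmatch(r"(\d+)s", piece)):
            out.append(".{%s}" % m.group(1))         # skip exactly num characters
        else:
            out.append(f"(?P<{piece}>.+?)")          # capture a language item
    return re.compile("".join(out))


regex = compile_pattern("%{d},%{t},%{h}")
rows = [regex.fullmatch(line).groupdict()
        for line in "0,eins,one\n1,zwei,two\n2,drei,three".splitlines()]
```

Only the `t` and `h` fields end up in each row; the leading number is matched and discarded.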
Import Options
We’ll take a look at the options for importing, for both the Patterned Line and Patterned Block formats. Note that some options only appear depending on the state of previous options, or might only appear based on the import/export target/source.
Label | Type | Description |
---|---|---|
Select format | drop-down | Select the format type. |
Delete before import | checkbox | Delete contents and children before importing. |
Pattern | string | Pattern string. |
Use comments | checkbox | Treat lines starting with the comment prefix as comments or directives. |
Comment prefix | string | The comment prefix, by default "#". |
Exclude prior items | checkbox | If checked, exclude items if they appear earlier in the study list or course. |
Merge | checkbox | Merge new items with existing items. |
Translate missing items | checkbox | Translate missing items using Google translate. |
Subdivide items into smaller groups | checkbox | Subdivide longer lists into smaller units. If not checked, the following 4 options will not appear. |
Subdivide to study lists only | checkbox | If checked, subdivide items to study list hierarchy only, not new lessons or groups. Otherwise, new lessons or groups will be created. |
Master name | drop-down | Select a lesson master to use for new lessons or groups. |
Study items per leaf | integer | How many study items for one study list. This is the subdivide threshold. |
Minor subdivide count | integer | How many lessons or immediate study list parents. |
Major subdivide count | integer | How many groups or study list grandparents. |
Make public | checkbox | If importing lesson, group, or course nodes, mark them public if checked, otherwise leave unchanged. |
Filter out duplicates | checkbox | If importing and Merge is not checked, if this option is checked, duplicate items will not be imported. This will make use of the anchor languages, which indicate which languages to use in checking for duplicates. |
Anchor languages | checkboxes | If Filter out duplicates is checked, and Merge is not checked, and any of these languages are checked, try to avoid duplicates by merging language items that are the same. |
Import type | drop-down | If "File", import from a file selected via the "Import File" field below. If "Text", import from the "Import Text" field that appears below. |
Import File | file select | Select a file to import. Displayed only if Import Type is "File". |
Import Text | text edit | Text to import. Displayed only if Import Type is "Text". |
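To give a feel for what the subdivide options do, here is a tiny Python sketch that cuts a long list into study lists of a fixed size, as "Study items per leaf" describes. The helper name `chunk` is made up. Grouping the study lists under parents is shown only as an assumption about how the minor and major counts might be applied, since the table above doesn't spell that out.

```python
def chunk(seq: list, size: int) -> list[list]:
    """Split seq into consecutive pieces of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [seq[i:i + size] for i in range(0, len(seq), size)]


items = [f"item{n}" for n in range(1, 8)]  # seven placeholder study items

# "Study items per leaf" = 3: each study list holds at most three items.
study_lists = chunk(items, 3)

# Assumption for illustration only: collect study lists two at a time
# under a parent, the way lessons might hold study lists.
lessons = chunk(study_lists, 2)
```

With seven items and three per leaf, the last study list is shorter than the others; the sketch makes no attempt to balance the sizes.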
I know that having so many options makes this complicated, and perhaps intimidating. The option settings are saved once you actually do the import or export, so you won't have to redo them the next time unless you want or need to change something. This is another case where I chose power and flexibility at the cost of added complexity.
Export Options
Now we’ll take a look at the options for exporting, for both the Patterned Line and Patterned Block formats.
Label | Type | Description |
---|---|---|
Select format | drop-down | Select the format type. |
Pattern | string | Pattern string. |
Use comments | checkbox | Treat lines starting with the comment prefix as comments or directives. |
Comment prefix | string | The comment prefix, by default "#". |
Ordinal | number | Starting value for ordinal field. |
Export type | drop-down | If "File", exports to a file that you select when the export is started. If "Text", exports directly to the browser page in the "Export output" field that appears below. |
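On the export side, a pattern works in the other direction: each token is replaced by the matching part of the study item, and the `o` token is replaced by a running number that starts at the "Ordinal" option. Here is a minimal Python sketch of that substitution. The function name `export_lines` is hypothetical, unknown tokens are simply output as empty text, and escapes such as `\t` are not handled.

```python
import re


def export_lines(pattern: str, items: list[dict[str, str]], ordinal_start: int = 1) -> str:
    """Render each item through the pattern, one line per item."""
    lines = []
    for ordinal, item in enumerate(items, start=ordinal_start):
        # Make the ordinal available as the "o" token next to the item fields.
        values = dict(item, o=str(ordinal))
        line = re.sub(r"%\{(\w+)\}", lambda m: values.get(m.group(1), ""), pattern)
        lines.append(line)
    # One newline after each item, matching the implicit end-of-pattern newline.
    return "".join(line + "\n" for line in lines)


output = export_lines(
    "%{o}. %{t}|%{h}",
    [{"t": "eins", "h": "one"}, {"t": "zwei", "h": "two"}],
    ordinal_start=1,
)
```

Changing `ordinal_start` shifts the numbering, which is handy when you export a long list in several pieces.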
Extract Options
The Extract format has some options in addition to those of the Patterned Line and Patterned Block formats:
Label | Type | Description |
---|---|---|
Text input format | drop-down | For importing, select the Line or Block format to use for the text. |
Delete before export | checkbox | If checked, delete any prior content from the targeted sentences, words, or characters content items. |
Translate missing items | checkbox | Translate sentences, words, or characters, if needed. |
Extract sentences | checkbox | If checked, extract sentences from the text. |
Sentences target key name | drop-down | Select the target content item to receive the sentences. |
Extract words | checkbox | If checked, extract words from the text. |
Words target key name | drop-down | Select the target content item to receive the words. |
Extract characters | checkbox | If checked, extract characters from the text. This only applies to character-based languages like Chinese. |
Characters target key name | drop-down | Select the target content item to receive the characters. |
Include media | checkbox | If checked, include media for the extracted items. |
Lookup dictionary media | checkbox | If no media present, look-up dictionary media. |
Synthesize missing audio | checkbox | If no audio present, synthesize it. |
Additional Notes About Extract Format
Note that the Extract format is normally used on either a Text or Sentences content item. If used on a Sentences content item, the "Extract sentences" option doesn’t apply.
For languages that don’t use spaces to separate words, the word extraction depends on the dictionary database. The extractor will look for the longest matching item in the dictionary, which might be a phrase. If the dictionary doesn’t have entries that correspond to the text, unfortunately the word extraction won’t work.
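The longest-match idea can be sketched in a few lines of Python. At each position the code tries the longest candidate first and shortens it until it finds a dictionary entry. This is a generic greedy longest-match segmenter, not JTLanguage's actual extractor: the name `segment` and the tiny sample dictionary are made up, and skipping a character when nothing matches is my own assumption.

```python
def segment(text: str, dictionary: set[str]) -> list[str]:
    """Greedy longest-match segmentation of text against a dictionary."""
    max_len = max(map(len, dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                words.append(candidate)
                i += length
                break
        else:
            # Assumption: with no dictionary entry here, skip one character.
            i += 1
    return words


# Tiny stand-in dictionary for the demo.
sample_dictionary = {"我", "喜欢", "学", "学习", "中文"}
words = segment("我喜欢学习中文", sample_dictionary)
```

Note that "学习" wins over the shorter entry "学" because longer candidates are tried first, which is also why a phrase in the dictionary can come back as a single "word".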
Note that the character extraction only applies to character-based languages like Chinese.
Tricks and Other Tidbits
If you are using the "Exclude prior items" option in importing, there is a little trick you can use to avoid importing commonly used words like "the" and "a" (and their equivalents in other languages). Create a lesson at the start of your course with a Words vocabulary study list and mark it private. Put in the Words study list all such words you don't want to see imported in later lessons. Then, when you import later lists, these words will be excluded. Later you could delete the private lesson, though perhaps first saving it for later use via the Export mechanism.
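In effect, the private lesson acts as a list of words to filter out. A small Python sketch of the idea follows. The function name `exclude_prior` is hypothetical, and comparing words without regard to case is an assumption of mine, not something JTLanguage is documented to do.

```python
def exclude_prior(new_words: list[str], prior_words: list[str]) -> list[str]:
    """Drop any new word that already appears among the prior words."""
    # casefold() gives a case-insensitive comparison (an assumption here).
    prior = {word.casefold() for word in prior_words}
    return [word for word in new_words if word.casefold() not in prior]


common_words = ["the", "a"]                   # contents of the private lesson
incoming = ["The", "cat", "a", "dog"]         # words from a later import
kept = exclude_prior(incoming, common_words)
```

The order of the remaining words is preserved, so the imported list still reads in the order the words appeared.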
Other Format Types
In addition to the Patterned Line, Patterned Block, and Extract text formats, there are other format types, mainly for importing and exporting courses, groups, lessons, content items, masters, markup templates, and other things. These could be textual or binary formats. They could be used for backing up and restoring items.
The JTLanguage Chunky format is the main format to use for backing up and restoring items. Its default file name extension is ".jtc".
The JTLanguage XML format represents JTLanguage objects in an XML format. XML is a standard textual markup language. This can be used to back up courses, groups, lessons, content items, masters, markup templates, and other things, but will not include any media files. Its default file name extension is ".xml".
The JTLanguage Package format predates the JTLanguage Chunky format and has been superseded by it. It uses the ZIP format to package up courses, groups, lessons, content items, masters, markup templates, and other things, using the JTLanguage XML format for the internal objects, and including the media objects directly in the ZIP file. Its default file name extension is ".jtp". I kept it around mainly for backward compatibility, but it might be useful if you want to see the media file hierarchy: rename the file to .zip and extract it with an archive extractor that supports the ZIP format. Note that there is a media download mechanism available at each level in the course hierarchy, if you just want to download media files.
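Because the Package format is a ZIP file, you can also peek inside one with a few lines of Python and the standard `zipfile` module, without renaming anything. The sketch below builds a stand-in archive in memory so that it runs on its own. The entry names in it are invented and do not reflect the real layout of a .jtp file, and `list_package_entries` is a hypothetical helper name.

```python
import io
import zipfile


def list_package_entries(source) -> list[str]:
    """Return the names of all entries in a ZIP-based archive.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()


# Stand-in archive built in memory; the entry names are invented for this
# demo and are not the actual structure of a JTLanguage package.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("objects.xml", "<placeholder/>")
    archive.writestr("media/example.mp3", b"")
buffer.seek(0)

entries = list_package_entries(buffer)
media_entries = [name for name in entries if name.startswith("media/")]
```

For a real package you would pass the path of your exported ".jtp" file to `list_package_entries` instead of the in-memory buffer.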