Yoshikoder 0.36 Tutorial (Incomplete)
Yoshikoder 0.36
Yoshikoder is a simple multilingual content analysis engine. Below is a short tutorial to get you
going. After the tutorial is a set of notes on particular aspects of the program. Finally there is a
frequently asked questions list.
The Opening Window
When Yoshikoder opens you should see a three-panel window. On Windows it looks like this:
1 de 7 24-03-2007 10:56
Yoshikoder 0.36 http://people.iq.harvard.edu/~wlowe/code/yk/userdocs/YoshikoderUse...
The top left panel headed 'Dictionary' will contain whatever content dictionary you are currently
working with. The name of the dictionary appears as the root node of a tree next to a blue book
icon in the panel. Yoshikoder starts by presenting an empty dictionary called 'Untitled' that
contains no categories.
The top right panel is the document panel, and contains the document you are currently working
with. Yoshikoder starts without a document, so the label says 'No Document'. When a document
is loaded this label will reflect your document's name.
Sometimes it is useful to make one of these panels, usually the concordance panel, larger. All
the divisions between panels are movable. Rearrange them to suit your data by dragging the dividers.
First we'll make a new dictionary. On the menu bar click on File-New-Dictionary, and enter a
new name e.g. 'Foo'.
A dictionary is not much use without categories. Now click on File-New-Category to add a
content category. You will be presented with a dialog box that looks like this:
You must choose a name for the new category and type it into the upper box at the prompt. It is
also possible, though not required, to associate this new category with a numerical score. The
purpose of the score is to place the category on a numerical scale. Every piece of text that falls
into this category will then be associated with this number. We shall see more of this feature
later when we come to the reporting functions. For the moment, you might leave the score
After you press 'OK', the dictionary panel changes to show your new category as a leaf node.
Feel free to add some more categories to the dictionary. You can add categories to the
dictionary root node, or to existing categories to make sub-categories, and sub-sub-categories.
Constantly pulling down menus to create new categories can be tiresome, so you can use two
other methods. Right-clicking (on Windows) or control-clicking (on Mac OS X) whilst inside the
dictionary panel will launch a popup menu you can use instead of the main menu. Alternatively
you can use the toolbar buttons located just below the menu bar. Pressing the first button will
launch the new category dialog.
Categories in your dictionary represent the conceptual structure of your content domain, but
they are not yet connected to text. To connect a category to some text, we need to define
patterns. To create a pattern, select a category node (not the root node because that is the
dictionary itself), and click on File-New-Pattern. Alternatively, you can simply press the second
button on the toolbar. You will be presented with a dialog that looks like this:
Now type in a word you think indicates the presence in text of the concept named by your
category. For example, if your category was called 'Positive Emotions' then 'love' might be a
suitable pattern. Feel free to add more patterns to any of the categories in the dictionary.
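The structure built so far (a dictionary root, nested categories with optional scores, and patterns attached to categories) can be pictured as a small tree. The Python sketch below is only an illustration of that idea; the class name `Category` and its methods are hypothetical and are not taken from Yoshikoder itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Category:
    """A content category; may hold patterns and nested sub-categories."""
    name: str
    score: Optional[float] = None  # the optional numerical score
    patterns: List[str] = field(default_factory=list)
    children: List["Category"] = field(default_factory=list)

    def add_category(self, name, score=None):
        """Create a sub-category beneath this node and return it."""
        child = Category(name, score)
        self.children.append(child)
        return child

    def add_pattern(self, pattern):
        """Attach a pattern to this category."""
        self.patterns.append(pattern)

    def all_patterns(self):
        """Patterns of this category and of every sub-category beneath it."""
        found = list(self.patterns)
        for child in self.children:
            found.extend(child.all_patterns())
        return found


# The root node stands for the dictionary itself. The tutorial attaches
# patterns to categories, not to the root; this sketch does not enforce that.
foo = Category("Foo")
emotions = foo.add_category("Emotions")
positive = emotions.add_category("Positive Emotions", score=1.0)
negative = emotions.add_category("Negative Emotions", score=-1.0)
positive.add_pattern("love")
positive.add_pattern("joy")
negative.add_pattern("hate")

print(foo.all_patterns())
```

Selecting a node higher in the tree gathers the patterns of everything beneath it, which is the behaviour the highlighting and concordance functions rely on.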
Sometimes Yoshikoder will complain that it 'cannot compile' what you have typed into the
pattern dialog. This is because it is expecting a 'regular expression', rather than just a simple
string. Regular expressions are elements of a linguistic pattern-matching language that allows
you to specify more than one word or phrase in a single pattern, rather like the Windows
'wildcard' characters, except much more powerful. Regular expressions are well worth taking
the time to learn about, and you can read about how to use regular expressions below. For now,
to avoid any 'cannot compile' problems, you can just avoid putting punctuation in your patterns.
You can create patterns in uppercase or lowercase and get the same matches in text: Yoshikoder
will treat "Content", "CONTENT", and "content" as matching all the same words in a document.
Now that there's some structure to your dictionary, we can load a document and analyse it
according to your new category system.
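The pattern behaviour described above (regular expressions, 'cannot compile' complaints, and case-insensitive matching) can be tried out in a few lines of Python. This is a sketch using Python's `re` module, whose syntax may differ in detail from the regular expressions Yoshikoder accepts; the helper name `compile_pattern` is made up for the example.

```python
import re


def compile_pattern(text):
    """Compile a pattern case-insensitively; return None if it is not a valid regex."""
    try:
        return re.compile(text, re.IGNORECASE)
    except re.error as err:
        print(f"cannot compile {text!r}: {err}")
        return None


sample = "Content analysis treats CONTENT as data; the content matters."

# Uppercase, lowercase and mixed-case patterns find the same matches.
for text in ("Content", "CONTENT", "content"):
    print(text, compile_pattern(text).findall(sample))

# Punctuation often has a special meaning, so an unbalanced bracket fails.
broken = compile_pattern("love(")

# One pattern can cover several word forms, much like a wildcard.
forms = compile_pattern(r"lov\w*")
print(forms.findall("Love, loving and lovely"))
```

Plain words with no punctuation always compile, which is why avoiding punctuation is a safe starting point.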
Loading a Document
From the menu bar click on Document-Open Document and pick a document from your files.
This document must be a text file. Yoshikoder cannot read proprietary document formats such
as Microsoft Word. However most software, including MS Word, can save your document as plain text.
After you've chosen a file an encoding window will appear so you can make sure it is in the
right form to be worked with. Below is an encoding dialog for a document containing a short
poem in Russian.
On the left you see the first section of the chosen document. On the right are settings for the
document encoding and a choice of fonts to display it.
For English-language documents, it will seldom matter which encoding you choose in the
right-hand list. However, if you are dealing with other languages you may at first see nonsense
characters in the preview screen. This is because Yoshikoder is set to expect the wrong
document encoding. You can correct this by clicking on the name of the correct encoding. You
can work through several possible encodings until the text looks right. Then press 'OK'.
Yoshikoder will now expect future documents to be in your chosen encoding.
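The effect of choosing the wrong encoding is easy to reproduce outside the program. The sketch below is an assumption-laden illustration in Python, not a description of Yoshikoder's internals: it stores a short Russian phrase as Windows-1251 bytes and previews it under three candidate encodings, much as you would click through the list in the encoding dialog.

```python
# A short Russian phrase, stored as bytes in the Windows-1251 encoding.
original = "Я помню чудное мгновенье"
raw = original.encode("cp1251")

# Candidate encodings to try, in the spirit of the dialog's encoding list.
candidates = ["latin-1", "koi8-r", "cp1251"]


def preview(data, encoding, length=40):
    """Decode the data and return its first characters, replacing bad bytes."""
    return data.decode(encoding, errors="replace")[:length]


for name in candidates:
    print(f"{name:8} {preview(raw, name)}")
```

Only the last candidate reproduces the original phrase; the other two decode the same bytes into nonsense characters, which is the symptom described above.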
The font list is present because not all fonts can display all characters. For example, not many
fonts can represent all of Simplified Chinese, and some even have trouble with German and
There is a longer discussion of document encodings in the sections below. For now we shall
assume that you are successfully viewing your document in the document window.
Now that you have a dictionary and document, you can see where your patterns occur. Select a
pattern and click on View-Highlight. You can also press the magnifying glass button on the
toolbar. Everywhere the pattern matches a piece of text in the document will be highlighted
in yellow.
If you select a category to highlight, all the patterns that appear underneath that category node
in the dictionary panel will be highlighted, including those patterns in subcategories.
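Highlighting a category therefore amounts to gathering every pattern beneath it and locating each match in the document. The following Python sketch shows one way this could work; the nested-dict layout, the function names, and the choice to match whole words only are all assumptions made for the example, not documented Yoshikoder behaviour.

```python
import re

# Hypothetical nested dictionary: each category has patterns and sub-categories.
dictionary = {
    "Emotions": {
        "patterns": [],
        "children": {
            "Positive Emotions": {"patterns": ["love", "joy"], "children": {}},
            "Negative Emotions": {"patterns": ["hate"], "children": {}},
        },
    }
}


def patterns_under(node):
    """Collect the patterns of a category and of all its sub-categories."""
    found = list(node["patterns"])
    for child in node["children"].values():
        found.extend(patterns_under(child))
    return found


def highlight_spans(text, patterns):
    """Return sorted (start, end, matched_text) tuples for whole-word matches."""
    spans = []
    for pattern in patterns:
        # Whole-word, case-insensitive matching is an assumption of this sketch.
        regex = re.compile(r"\b(?:" + pattern + r")\b", re.IGNORECASE)
        for m in regex.finditer(text):
            spans.append((m.start(), m.end(), m.group()))
    return sorted(spans)


text = "Love and hate are strong words; joy is quieter."
selected = dictionary["Emotions"]
spans = highlight_spans(text, patterns_under(selected))
for start, end, word in spans:
    print(start, end, word)
```

Selecting 'Emotions' picks up matches for patterns stored in both sub-categories, even though the category itself holds no patterns of its own.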
It is often useful when constructing a dictionary to be able to see the local context of particular
patterns. This is often called keyword-in-context. The list of patterns and their contexts is
called a concordance. To examine a concordance for a pattern, select that pattern and click on
View-Concordance. You can also press the rightmost button on the toolbar.
The concordance panel will then fill up with all the contexts of that pattern. Yoshikoder looks at a
fixed number of characters on either side of the pattern to create a concordance. You can change
this amount in the preferences panel by clicking View-Preferences.
If you select a category and create a concordance, you will get a concordance for all the patterns
beneath that category, including those in subcategories.
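A keyword-in-context listing with a fixed character window, as described above, can be sketched in a few lines of Python. The function name `concordance` and the `width` parameter are illustrative choices, and the window size used here is arbitrary rather than Yoshikoder's default.

```python
import re


def concordance(text, pattern, width=20):
    """Return (left, match, right) with up to `width` characters on each side."""
    lines = []
    for m in re.finditer(pattern, text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(), right))
    return lines


text = ("All you need is love. Love is all you need. "
        "They love the sea, and the sea loves them back.")

rows = concordance(text, r"\blove\w*", width=15)
for left, word, right in rows:
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>15} [{word}] {right}")
```

Changing `width` plays the same role as adjusting the context size in the preferences panel: a larger value shows more text around each match.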