Linguistics 570

HW #1a

 

Following is a set of Turkish words and their English glosses:

 

el                                              '(the) hand'

eller                                          'hands'

elim                                          'my hand'

eve                                           'to (the) house'

ellerimiz                                    'our hands'

evlerde                                     'in (the) houses'

evden                                       'from (the) house'

ellerim                                      'my hands'

ellerinize                                   'to your(pl.) hands'

evlerim                                      'my houses'

elin                                           'your(sing.) hand'

evimiz                                       'our house'

evde                                         'in (the) house'

evimde                                     'in my house'

evlerimiz                                   'our houses'

evlerimizden                              'from our houses'

evleriniz                                    'your(pl.) houses'

evim                                         'my house'

ellerimden                                 'from my hands'

evler                                         'houses'

eline                                         'to your(sing.) hand'

ellerin                                       'your(sing.) hands'

elimden                                      'from my hand'

evine                                        'to your(sing.) house'

 

Design a finite-state transducer (FST) that will minimally accept the Turkish words shown above and that will output the English gloss for each of the morphemes (the glosses do not need to be output in “English” order).  Be sure to design the transducer to be as efficient as possible (i.e., avoid redundant and empty arcs).  Test your FST against the following three strings and show the English output for each of them:  ev, evlerimde, elinize

 

(Please note:  the level of granularity for the FST can be at the level of the morpheme.  In other words, you do not have to have arcs for each of the sounds that compose a morpheme.)
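
To make the idea of morpheme-level arcs concrete, here is a minimal sketch in Python of how such a transducer could be represented and run. It covers only a small fragment of the data, and the state names (q0 to q4), the ARCS table, the gloss labels, and the transduce function are all hypothetical choices made for illustration; the fragment is not minimized and is not the expected answer.

```python
# Hypothetical fragment of a morpheme-level transducer.
# Each arc maps (state, Turkish morpheme) -> (next state, English gloss).
ARCS = {
    ("q0", "ev"): ("q1", "house"),
    ("q0", "el"): ("q1", "hand"),
    ("q1", "ler"): ("q2", "PLURAL"),
    ("q1", "im"): ("q3", "my"),
    ("q2", "im"): ("q3", "my"),
    ("q1", "de"): ("q4", "in"),
    ("q2", "de"): ("q4", "in"),
    ("q3", "de"): ("q4", "in"),
}

# States in which the transducer may stop (assumed for this fragment).
FINAL = {"q1", "q2", "q3", "q4"}


def transduce(word, state="q0"):
    """Return the list of glosses for word, or None if it is rejected."""
    if not word:
        return [] if state in FINAL else None
    # Try every arc leaving the current state whose morpheme is a prefix
    # of the remaining input; backtrack if the rest cannot be consumed.
    for (src, morpheme), (dst, gloss) in ARCS.items():
        if src == state and word.startswith(morpheme):
            rest = transduce(word[len(morpheme):], dst)
            if rest is not None:
                return [gloss] + rest
    return None


print(transduce("evimde"))  # ['house', 'my', 'in']
print(transduce("evx"))     # None
```

Each arc consumes one whole morpheme and emits one gloss, so the output order follows the Turkish morpheme order rather than English word order.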

 

due date:  5 p.m., Friday, September 26th

Submit a hard copy, including the output from your test, in Professor Lewis’s box in the Linguistics Office (Padelford A210).  If you are an online student or otherwise unable to come to campus, please scan your answer and submit it via CollectIt, or fax it to Professor Lewis’s attention at +1-206-685-7978.


Linguistics 570

HW #1b

 

 

For this homework assignment, you will find a website, tokenize the “words” contained on that website, and output a sorted list of the 30 most frequent words, ordered by frequency.

 

Here’s what you need to do for the assignment:

 

1.      Go to Literature.org and choose a chapter from one of the books listed there.  Whatever chapter you choose, be sure that it consists mostly of English text and has at least 3,000 words.

2.      Save the text for the page to a file.

3.      Write a program in Perl, Python, or Java that reads the file and generates output containing a list of all the word types on the page with their token counts.  The output should be tab-delimited, with one word type and its count on each line, e.g.,

 

                        able     5

                        the      325

                        to       250

                        look     10

 

4.      Take the output generated, sort it by count, and output only the 30 most frequent words.  The sorting and truncation should be done in a shell script external to your application.

5.      Submit a copy of your program using CollectIt.
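
As a rough illustration of step 3, here is a minimal sketch in Python. The tokenization rule (lowercased runs of letters, with an optional internal apostrophe) is only one possible assumption about what counts as a “word,” and the function names count_types and main are hypothetical.

```python
import re
import sys
from collections import Counter


def count_types(text):
    """Count tokens per word type, using a deliberately simple tokenizer."""
    # Assumption: a word is a run of letters, optionally followed by an
    # apostrophe and more letters (e.g. "don't"); case is ignored.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens)


def main(path):
    """Print one tab-delimited 'word<TAB>count' line per word type."""
    with open(path, encoding="utf-8") as handle:
        counts = count_types(handle.read())
    for word, count in counts.items():
        print(f"{word}\t{count}")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

The sketch leaves its output unsorted on purpose, since step 4 asks for sorting and truncation to happen in the shell script; a pipeline along the lines of `sort -k2,2nr | head -n 30` is one way that could be done.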

 

Although you are encouraged to work with other students on this assignment, and are welcome to ask for help and advice if anything is unclear, the code you turn in must be your own.

 

due date:  11:59 p.m., Sunday, September 28th

Submit the following via CollectIt (in a single tar, gz, or zip file):

  1. hw1b.pl or (hw1b.java and hw1b.class) or hw1b.py
  2. A copy of the Web page you downloaded
  3. A shell script named hw1b.sh that runs your program
  4. output1b.txt (containing the output as described)
  5. comments.txt – Any comments or explanations.  Can be left empty if none.