Yes, there are some: there is igerman98_all.xml.bz2, a German lemma list in XML format based on the ispell word list, from Niels Ott's BananaSplit.
Or you could generate the list from your own text with the TreeTagger tool. It assigns a tag to each text token, telling us that this one is a noun and that one is an article.
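
To give an idea of the TreeTagger route, here is a minimal sketch in Python. It assumes the tagger output is available as text with one token per line in the form token, tag, lemma, separated by tabs, and that nouns carry the STTS tag "NN". The function name `noun_lemmas` and the sample lines are made up for illustration; the sample is hand-written, not real tagger output.

```python
def noun_lemmas(tagged_text):
    """Collect the lemmas of all tokens tagged as common nouns ("NN")."""
    lemmas = set()
    for line in tagged_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        token, tag, lemma = parts
        # Assumption: the tagger writes "<unknown>" when it has no lemma.
        if tag == "NN" and lemma != "<unknown>":
            lemmas.add(lemma.lower())
    return sorted(lemmas)


# Hand-written sample in the assumed token<TAB>tag<TAB>lemma layout.
sample = (
    "Der\tART\tdie\n"
    "Hund\tNN\tHund\n"
    "jagt\tVVFIN\tjagen\n"
    "die\tART\tdie\n"
    "Katzen\tNN\tKatze\n"
)

print(noun_lemmas(sample))  # ['hund', 'katze']
```

A list built this way still contains every compound noun that occurs in the text, so it would need further cleaning before use.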

But the problem with dictionaries is that most of them also include compound words, which we don't want in our dictionary. The splitter needs the words in their most basic form. If you have compound words in your dictionary, the splitter doesn't break them up further ...
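
The effect can be shown with a small sketch in Python. This is not BananaSplit's actual algorithm, just a simplified dictionary lookup that I assume behaves similarly on this point: a word found in the dictionary is returned as-is. The function `split_compound` and the tiny word sets are hypothetical, and linking elements such as the "s" in many German compounds are ignored.

```python
def split_compound(word, dictionary):
    """Split `word` into dictionary words; return [word] if no split applies."""
    word = word.lower()
    if word in dictionary:
        return [word]  # a known word is never broken up further
    # Try the longest known head first, then split the remainder recursively.
    for i in range(len(word) - 1, 0, -1):
        head, tail = word[:i], word[i:]
        if head in dictionary:
            rest = split_compound(tail, dictionary)
            if all(part in dictionary for part in rest):
                return [head] + rest
    return [word]  # no complete split found


# Hypothetical mini dictionaries for illustration.
basic = {"tanz", "schule", "kalender"}
with_compound = basic | {"tanzschule"}

print(split_compound("Tanzschule", basic))          # ['tanz', 'schule']
print(split_compound("Tanzschule", with_compound))  # ['tanzschule']
```

With only basic forms in the dictionary the compound is split; as soon as the compound itself is listed, it comes back in one piece.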

So I decided to create a list myself. I started from the 500 most used search terms on my website and split them manually. It was easy and did not take long.
