University of Szeged, Faculty of Natural Sciences and Information Technology, Institute of Informatics

www.inf.u-szeged.hu/hlt

Core activities in language and speech technology

Speech recognition

▪ speech recognition with hidden Markov models

▪ isolated word and continuous speech recognition

▪ dictation systems

▪ medical dictation system

▪ preprocessing technologies

▪ speaker adaptation

▪ speaker normalization

▪ speech recognition using neural networks, in hybrid and tandem architectures

▪ programs for speech correction therapy (SpeechMaster)

Language processing

▪ building language resources, reference databases

▪ collection and processing of corpora based on different attributes; wordnet building (conceptual network utilities)

▪ segmentation (recognition of paragraph, sentence and token boundaries in various European languages)

▪ recognition and labeling of proper nouns and special tokens (language-independent, with the help of regular expressions and dictionaries)

▪ part-of-speech analysis and disambiguation (exception dictionaries; adaptation of open-source systems; bigram and trigram HMM disambiguators; hybrid (rule- and probability-based) and heuristic algorithms)

▪ syntactic parsing (learning algorithms, rule-based formal languages, methods based on attribute grammars)

▪ word-sense disambiguation (with heuristic and learning algorithms)

▪ information extraction (using semantic frames and generalizations of the semantic frame model)

In research

The Institute of Informatics conducts research in various fields of computing: theoretical computer science, formal languages and automata, optimization, software maintenance, analysis, and metrics, as well as computational linguistics.

Most of the computational linguistics research is carried out by members of the Artificial Intelligence Research Group, since conducting this type of research is one of the group's primary functions. The Language Technology Group consists of these members together with some colleagues from other departments.

The Language Technology Group was formed in 1999 and has since catalyzed and advanced developments in the field by participating in research and development projects.

In education

▪ Computer Economist BSc. (MSc. accreditation pending)

▪ Computer Engineer BSc. (MSc. accreditation pending)

▪ Computer Program Designer BSc. and MSc.

▪ Informatics teacher BSc. and MSc. (minor)

Approximately 500-600 full-time and 50-100 correspondence first-year informatics students start their BSc programs at the Institute each year.

Non-market based activities

▪ MTBA (Hungarian Telephone Speech Database)

▪ MRBA (Hungarian Reference Speech Database)

▪ co-developer of the Szeged Treebank: free for education and research purposes

▪ Hungarian Corpus of Proper Names (NER) – free

▪ Hungarian Wordnet BCS: will be free for education and research purposes

Major tendered and contracted work since 2005

▪ NKFP 6/074/2005 – Examination of National and Ethnic Identity by Means of Computerised Content-analysis of Narratives pertaining to Historic Events

▪ NKFP Jedlik Ányos 2007, TUDORKA7 – Development of an efficient knowledge management tool with the integration of linguistic and graph theory tools, for Hungarian and European Union customs institutions

▪ NKFP Jedlik Ányos 2007, TEXTREND – Automatic trend analysis based on textual sources from the Internet, with academic and economic applications

Number of staff: 50 people

Number of PhD students: 20 people