This is the second volume of the series "Usage-Based Linguistic Informatics", a product of the 21st century COE program held at Tokyo University of Foreign Studies (TUFS). The project aims to integrate theoretical and applied linguistics on the basis of computer science. With a view to applying the results of linguistic analysis to language education in practice, the promotion of research on individual languages has become a high-priority issue. The project intends to develop a new field of linguistic research by elucidating the state of linguistic usage through the analysis of large amounts of linguistic data. The volume thus consists mainly of language-specific, corpus-based analyses of sentence structures in ten languages: Nuuchahnulth, Korean, Chinese, Malay, Turkish, Arabic, Russian, French, English and Spanish. It also includes papers that deal with various theoretical issues in contrastive linguistics and typology.
UBLI has conducted field surveys since 2002 and built spoken language corpora for French, Spanish, Italian (Salentino dialect), Russian, Malaysian, Turkish, Japanese, and Canadian multilinguals. This volume features new research presented at the UBLI second workshop on Corpus Linguistics Research Domain, held on September 14, 2006. The first part, consisting of eleven presentations given at this workshop, covers a wide range of subjects within corpus-based research, such as dictionaries, linguistic atlases, dialects, translation, ancient texts, non-standard texts, sociolinguistics, second language acquisition, and natural language processing. The second part of this volume comprises ten additional contributions on both written and spoken corpora by the members and research assistants of UBLI.
The papers published in this volume were originally presented at the Third North American Symposium on Corpus Linguistics and Language Teaching, held on 23-25 March 2001 at the Park Plaza Hotel in Boston, Massachusetts. Each paper analyses some aspect of language use or structure in one or more of the many linguistic corpora now available. The number of different corpora investigated in the book is a real testament to the progress made in recent years in developing new corpora, particularly spoken corpora: over half of the papers deal either wholly or partially with the analysis of spoken data. This book will be of particular interest to undergraduate and graduate students and to scholars interested in corpus linguistics, sociolinguistics, applied linguistics, discourse analysis, pragmatics, and language teaching.
Language Arts & Disciplines by Stefan Thomas Gries
The volume adopts the methodological perspective of Corpus Linguistics, the rapidly evolving branch of linguistics based on the computerized analysis of language used in authentic settings. The chapters provide new data, methods, and insights to many classic topics of Cognitive Linguistics and pave the way for further integration of usage-based techniques of analysis within this exciting paradigm.
This book is organized in three sections. The three articles in Section one introduce the disciplines of Contrastive Linguistics (CL) and Translation Studies (TS), tracing their evolution in recent history and outlining the role played by the computer corpus in revitalising and redirecting research in each discipline. The six articles in Section two are a series of case studies, showing the range of variables that have to be taken into consideration in CL and TS. The four articles in Section three all deal with practical issues of corpus exploitation, both the software tools that can be used to support analysis and the ways in which multilingual and monolingual corpora can be used to improve teaching and translation materials. -- pref.
This work explores recent trends in cross-linguistic lexical studies. Topics include: lexis and contrastive linguistics; the revival of contrastive linguistics; multilingual corpora; theoretical and methodological issues; and types of cross-linguistic correspondence.
Grammar, Comparative and general by Wolfgang Viereck
This book describes an approach to lexis and grammar based on the concept of phraseology and of language patterning arising from work on large corpora. The notion of 'pattern' as a systematic way of dealing with the interface between lexis and grammar was used in Collins Cobuild English Dictionary (1995) and in the two books in the Collins Cobuild Grammar Patterns series (1996; 1998). This volume describes the research that led to these publications, and explores the theoretical and practical implications of the research. The first chapter sets the work in the context of work on phraseology. The next two chapters give several examples of patterns and how they are identified. Chapters 4 and 5 discuss and exemplify the association of pattern and meaning. Chapters 6, 7 and 8 relate the concept of pattern to traditional approaches to grammar and to discourse. Chapter 9 summarizes the book and adds to the theoretical discussion, as well as indicating the applications of this approach to language teaching. The volume is intended to contribute to the current debate concerning how corpora challenge existing linguistic theories, and as such will be of interest to researchers in the fields of grammar, lexis, discourse and corpus linguistics. It is written in an accessible style, however, and will be equally suitable for students taking courses in those areas.
Corpus-based methods will be found at the heart of many language and speech processing systems. This book provides an in-depth introduction to these technologies through chapters describing basic statistical modeling techniques for language and speech, the use of Hidden Markov Models in continuous speech recognition, the development of dialogue systems, part-of-speech tagging and partial parsing, data-oriented parsing, and n-gram language modeling. The book attempts to give both a clear overview of the main technologies used in language and speech processing and sufficient mathematics to understand the underlying principles. There is also an extensive bibliography to enable topics of interest to be pursued further. Overall, we believe that the book will give newcomers a solid introduction to the field and existing practitioners a concise review of the principal technologies used in state-of-the-art language and speech processing systems. Corpus-Based Methods in Language and Speech Processing is an initiative of ELSNET, the European Network in Language and Speech. In its activities, ELSNET attaches great importance to the integration of language and speech, both in research and in education. The need for and the potential of this integration are well demonstrated by this publication.