Lexical Computing
We provide large, high-quality word databases, lexical data, word lists and
lexicons in many languages. Our data are generated from large databases of
authentic text called text corpora. The largest corpora contain texts with a
total length of 60,000,000,000 words. Such data allow us to generate databases
of millions or even hundreds of millions of items while preserving accuracy and
reliability. Our customers are software developers, publishers of dictionaries and language teaching materials, and anyone who needs reliable language data. The
databases we supply can be enriched with related linguistic data such as
synonyms, collocations, example sentences and morphological and statistical
information. We also provide solutions in the area of full-text search,
terminology extraction, document classification and categorization, data mining
and information retrieval.

Data samples
Word frequency lists: English, Spanish, French, Arabic, Russian, Portuguese, Hindi.
Bigram databases: English, Spanish, German, Russian.
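As a simplified illustration of how such data can be derived from a plain-text corpus, the sketch below counts word and bigram frequencies. The corpus file name, tokenizer and frequency threshold are illustrative assumptions only; real pipelines additionally involve lemmatization, part-of-speech tagging and processing at a much larger scale.

```python
# Minimal sketch: building a word frequency list and a bigram database
# from a plain-text corpus. "corpus.txt" and MIN_FREQ are hypothetical.
import re
from collections import Counter

def tokenize(line: str):
    # Naive tokenizer for illustration; production pipelines use
    # language-specific tokenization and annotation.
    return re.findall(r"[^\W\d_]+", line.lower())

unigrams = Counter()
bigrams = Counter()

with open("corpus.txt", encoding="utf-8") as corpus:
    for line in corpus:
        tokens = tokenize(line)
        unigrams.update(tokens)
        # In this sketch, bigrams are counted within a line only.
        bigrams.update(zip(tokens, tokens[1:]))

MIN_FREQ = 5  # illustrative cut-off keeping only reasonably frequent items
freq_list = [(w, c) for w, c in unigrams.most_common() if c >= MIN_FREQ]
bigram_db = [(w1, w2, c) for (w1, w2), c in bigrams.most_common() if c >= MIN_FREQ]

# Print the ten most frequent words as tab-separated word/count pairs.
for word, count in freq_list[:10]:
    print(f"{word}\t{count}")
```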
Lexical Computing is a research company founded by Adam Kilgarriff in 2003. It works at the intersection of corpus and computational
linguistics and is committed to an empiricist approach to the study of language,
in which corpora play a central role: for a very wide range of linguistic
questions, if a suitable corpus is available, it will help us understand them.

The
flagship product of Lexical Computing is Sketch Engine, a leading corpus
management and corpus query tool used by linguists, lexicographers, translators
and publishers worldwide. Its unique feature, the Word Sketch, and the functionalities derived from it, together with its scalability, multilingual support and ability to handle the largest available corpora, make Sketch Engine stand out among corpus software.

Lexical Computing is a supplier of word databases,
lexicons, n-gram databases and similar language data for use in other software
or for lexicographic projects. Data provided by Sketch Engine and services from
Lexical Computing are based on a suite of more than 650 text corpora, with sizes of up to 60 billion words, covering over 90 languages.