Suppose we want to combine a BERT-based named entity recognition (NER) model with a rule-based NER model built on top of spaCy. Although BERT's NER is highly accurate, in practice it is often combined with rule-based approaches. A common difficulty is that spaCy and BERT tokenize the same input differently. For example, given the input sentence "John Johanson 's house", BERT produces ["john", "johan", "##son", "'", "s", "house"] while spaCy produces ["John", "Johanson", "'s", "house"]. To combine the two outputs, we need to compute the correspondence between the two token sequences. This correspondence is the "alignment".
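
One way to compute such an alignment, sketched below without relying on any particular library, is to map every token to the character span it covers in a normalized (lowercased, marker-free) version of the text and pair up tokens whose spans overlap. This assumes the two tokenizations cover the same underlying characters and differ only in how they split them:

```python
# A minimal sketch of span-based token alignment (not tied to any
# particular library): map each token to its character span in a
# normalized string, then pair up tokens whose spans overlap.

def normalize(token):
    # Strip WordPiece continuation markers and lowercase for comparison.
    return token.lstrip("#").lower()

def char_spans(tokens):
    # Assign each token the span it covers in the concatenation of all
    # normalized tokens (whitespace is ignored entirely).
    spans, offset = [], 0
    for tok in tokens:
        norm = normalize(tok)
        spans.append((offset, offset + len(norm)))
        offset += len(norm)
    return spans

def align(tokens_a, tokens_b):
    # For every token in tokens_a, collect the indices of tokens in
    # tokens_b whose character spans overlap with it.
    spans_a, spans_b = char_spans(tokens_a), char_spans(tokens_b)
    a2b = []
    for (a_start, a_end) in spans_a:
        a2b.append([j for j, (b_start, b_end) in enumerate(spans_b)
                    if a_start < b_end and b_start < a_end])
    return a2b

bert_tokens = ["john", "johan", "##son", "'", "s", "house"]
spacy_tokens = ["John", "Johanson", "'s", "house"]
print(align(bert_tokens, spacy_tokens))   # [[0], [1], [1], [2], [2], [3]]
```

For the example above this yields [[0], [1], [1], [2], [2], [3]]: both "johan" and "##son" map back to spaCy's "Johanson", and "'" plus "s" map to "'s".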

Free and Open Source Machine Translation API, entirely self-hosted. Unlike other APIs, it doesn't rely on proprietary providers such as Google or Azure to perform translations.
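
This blurb matches LibreTranslate's own description. Assuming a locally running instance with its documented /translate endpoint (the host, port, and example text below are placeholders), a call might look like this:

```python
# Hedged sketch of calling a self-hosted translation API of this kind.
# The endpoint path and parameters (q, source, target, format) follow
# LibreTranslate's documented REST API; adjust the URL to your instance.
import requests

response = requests.post(
    "http://localhost:5000/translate",
    json={
        "q": "The quick brown fox jumps over the lazy dog.",
        "source": "en",
        "target": "es",
        "format": "text",
    },
    timeout=10,
)
# The response is expected to contain a "translatedText" field.
print(response.json()["translatedText"])
```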

GPT-Neo is the code name for a series of transformer-based language models loosely styled around the GPT architecture that we plan to train and open source. Our primary goal is to replicate a GPT-3-sized model and open source it to the public, for free.
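
As an illustration only: once checkpoints are published, they could be loaded with the Hugging Face transformers library. The model name below refers to a GPT-Neo checkpoint later released by the project and is an assumption, not something promised in the description above:

```python
# Illustration only: loading a released GPT-Neo checkpoint through the
# Hugging Face transformers pipeline API. The checkpoint name is an
# assumption about what gets published, not part of the description above.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")
out = generator(
    "EleutherAI is building open language models because",
    max_length=40,
    do_sample=True,
    temperature=0.9,
)
print(out[0]["generated_text"])
```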

The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
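
One common way to exercise that single interface from Python is the tika-python client, which talks to a local Tika server it launches on demand; the file name below is a placeholder:

```python
# Sketch using the tika-python client library, which proxies to a local
# Apache Tika server it starts on demand. The same call works for PPT,
# XLS, PDF and the other supported formats.
from tika import parser

parsed = parser.from_file("report.pdf")         # placeholder path
print(parsed["metadata"])                       # e.g. Content-Type, author
print((parsed["content"] or "")[:500])          # first 500 chars of text
```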

This service uses linguistic analysis to detect and interpret emotions, social tendencies, and language style cues found in text.

Detect the language of text.

TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, translation, and more.
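
A minimal example of that API, using TextBlob's documented tags, noun_phrases, and sentiment properties:

```python
# The core TextBlob API in action: tags, noun_phrases and sentiment are
# properties of a TextBlob object. noun_phrases needs the bundled corpora,
# installed once via `python -m textblob.download_corpora`.
from textblob import TextBlob

blob = TextBlob("TextBlob makes common NLP tasks surprisingly easy to use.")
print(blob.tags)          # list of (word, POS tag) pairs
print(blob.noun_phrases)  # WordList of extracted noun phrases
print(blob.sentiment)     # Sentiment(polarity=..., subjectivity=...)
```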

Last week, while working on new features for our product, I had to find a quick and efficient way to extract the main topics/objects from a sentence. Since I'm using Python, I initially thought this would be an easy task to handle with NLTK. However, when I tried its default tools (POS tagger, parser…), I did get quite accurate results, but performance was poor. So I had to find a better way. Like I did in my previous post, I'll start with the bottom line: here you can find my code for extracting the main topics/noun phrases from a given sentence. It works well on real sentences (from a blog or news article). It's a bit less accurate than the default NLTK tools, but it runs much faster!
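
The author's actual code is only linked from the post, so it isn't reproduced here. As a rough illustration of the underlying technique (POS tagging followed by a regular-expression chunk grammar for noun phrases), here is a short NLTK sketch; it uses NLTK's default tagger, so it favours brevity over the speed gains the post is about:

```python
# Rough illustration of noun-phrase extraction via POS tagging plus a
# regular-expression chunk grammar. This is NOT the author's linked code,
# just the general technique the post builds on.
import nltk

# One-time downloads; resource names vary slightly across NLTK versions.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

GRAMMAR = "NP: {<DT>?<JJ>*<NN.*>+}"   # optional determiner, adjectives, nouns
chunker = nltk.RegexpParser(GRAMMAR)

def noun_phrases(sentence):
    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)
    tree = chunker.parse(tagged)
    return [" ".join(word for word, tag in subtree.leaves())
            for subtree in tree.subtrees()
            if subtree.label() == "NP"]

print(noun_phrases("The new language model extracts the main topics from a sentence."))
# ['The new language model', 'the main topics', 'a sentence']
```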

Google Translator Toolkit is a powerful and easy-to-use editor that helps translators work faster and better.

Down in the depths of your organisation, you have a treasure-trove of valuable data. But how hard is it for your users to retrieve it? Salvage your data with a natural language interface - ask your app English questions, get clear answers and reports back.
