Bookmark

Advanced NLP with spaCy · A free online course

https://course.spacy.io/en/, posted Sep '23 by peter in development free language learning nlp toread

spaCy is a modern Python library for industrial-strength Natural Language Processing. In this free and interactive online course, you'll learn how to use spaCy to build advanced natural language understanding systems, using both rule-based and machine learning approaches.

Bookmark

Damn Cool Algorithms: Levenshtein Automata

blog.notdot.net/2010/07/Damn-Cool-Algorithms-Levenshtein-Automata, posted 2022 by peter in development nlp reference text toread

The basic insight behind Levenshtein automata is that it's possible to construct a finite state automaton that recognizes exactly the set of strings within a given Levenshtein distance of a target word. We can then feed in any word, and the automaton will accept or reject it based on whether the Levenshtein distance to the target word is at most the distance specified when we constructed the automaton. Further, due to the nature of FSAs, it will do so in O(n) time, where n is the length of the string being tested. Compare this to the standard dynamic programming Levenshtein algorithm, which takes O(mn) time, where m and n are the lengths of the two input words! It's thus immediately apparent that Levenshtein automata provide, at a minimum, a faster way for us to check many words against a single target word and maximum distance - not a bad improvement to start with!

Of course, if that were the only benefit of Levenshtein automata, this would be a short article. There's much more to come, but first let's see what a Levenshtein automaton looks like, and how we can build one.
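The accept/reject behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the construction from the linked article: the class name LevenshteinAutomaton and its methods are hypothetical, and each "state" is simply one row of the edit-distance table, so a step costs O(m) in the length of the target rather than the O(1) of a precompiled automaton. Capping the row values at max_distance + 1 is what keeps the number of distinct states finite.

```python
from typing import Tuple

State = Tuple[int, ...]


class LevenshteinAutomaton:
    """Hypothetical sketch: simulates the automaton with one DP row per state."""

    def __init__(self, target: str, max_distance: int) -> None:
        self.target = target
        self.max_distance = max_distance
        # Any value above max_distance is equivalent, so cap it here.
        self._cap = max_distance + 1

    def start(self) -> State:
        # Distance from the empty input to each prefix of the target.
        return tuple(min(i, self._cap) for i in range(len(self.target) + 1))

    def step(self, state: State, char: str) -> State:
        # Consume one input character and compute the next DP row.
        row = [state[0] + 1]
        for i, target_char in enumerate(self.target):
            cost = 0 if target_char == char else 1
            row.append(min(row[i] + 1, state[i] + cost, state[i + 1] + 1))
        return tuple(min(value, self._cap) for value in row)

    def is_match(self, state: State) -> bool:
        # Accepting state: the whole target is within the allowed distance.
        return state[-1] <= self.max_distance

    def can_match(self, state: State) -> bool:
        # False once no continuation of the input can still be accepted.
        return min(state) <= self.max_distance

    def accepts(self, word: str) -> bool:
        state = self.start()
        for char in word:
            state = self.step(state, char)
            if not self.can_match(state):
                return False
        return self.is_match(state)


automaton = LevenshteinAutomaton("food", 1)
for candidate in ["fod", "flood", "good", "fd", "foodie"]:
    print(candidate, automaton.accepts(candidate))
```

Because the automaton is built once per target word, the same object can be reused to test many candidate words, which is the use case the excerpt highlights.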

Bookmark

A list of free data matching and record linkage software

https://github.com/J535D165/data-matching-software, posted 2021 by peter in development free list nlp opensource software

This is a list of (Fuzzy) Data Matching software. The software in this list is open source and/or freely available.

The term data matching is used to indicate the procedure of bringing together information from two or more records that are believed to belong to the same entity. Data matching has two applications: (1) to match data across multiple datasets (linkage) and (2) to match data within a dataset (deduplication). See the Wikipedia page about data matching for more information.

Similar terms: record linkage, data matching, deduplication, fuzzy matching, entity resolution

Bookmark

How to calculate the alignment between BERT and spaCy tokens effectively and robustly

https://gist.github.com/tamuhey/af6cbb44a703423556c32798e1e1b704, posted 2021 by peter in development free language nlp opensource software toread

Suppose we want to combine a BERT-based named entity recognition (NER) model with a rule-based NER model built on top of spaCy. Although BERT's NER exhibits extremely high performance, it is usually combined with rule-based approaches for practical purposes. In such cases, what often bothers us is that tokens of spaCy and BERT are different, even if the input sentences are the same. For example, let's say the input sentence is "John Johanson 's house"; BERT tokenizes this sentence like ["john", "johan", "##son", "'", "s", "house"] and spaCy tokenizes it like ["John", "Johanson", "'s", "house"]. To combine the outputs, we need to calculate the correspondence between the two different token sequences. This correspondence is the "alignment".
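A minimal sketch of such an alignment, using the example sentence above, is shown here. It is not the method from the linked gist: it assumes the two tokenizations differ only in casing and in the WordPiece "##" continuation marker, so that both reduce to the same character sequence. The function names normalize, char_owners and align are hypothetical. Each character is labelled with the index of the token it came from, and the two label sequences are then walked in parallel.

```python
from typing import List, Tuple


def normalize(token: str) -> str:
    # Assumption: "##" prefixes and casing are the only differences to undo.
    if token.startswith("##"):
        token = token[2:]
    return token.lower()


def char_owners(tokens: List[str]) -> Tuple[str, List[int]]:
    """Return the normalized text and, per character, the owning token index."""
    pieces: List[str] = []
    owners: List[int] = []
    for index, token in enumerate(tokens):
        norm = normalize(token)
        pieces.append(norm)
        owners.extend([index] * len(norm))
    return "".join(pieces), owners


def align(source: List[str], target: List[str]) -> List[List[int]]:
    """For each source token, list the target token indices it overlaps."""
    source_text, source_owners = char_owners(source)
    target_text, target_owners = char_owners(target)
    if source_text != target_text:
        # This sketch does not handle dropped or rewritten characters.
        raise ValueError("token sequences do not normalize to the same text")
    mapping: List[List[int]] = [[] for _ in source]
    for s_index, t_index in zip(source_owners, target_owners):
        if not mapping[s_index] or mapping[s_index][-1] != t_index:
            mapping[s_index].append(t_index)
    return mapping


bert_tokens = ["john", "johan", "##son", "'", "s", "house"]
spacy_tokens = ["John", "Johanson", "'s", "house"]

print(align(spacy_tokens, bert_tokens))  # [[0], [1, 2], [3, 4], [5]]
print(align(bert_tokens, spacy_tokens))  # [[0], [1], [1], [2], [2], [3]]
```

Real tokenizers can also strip accents, emit unknown-token placeholders, or drop characters, in which case the equality check above fails and a more robust, diff-based alignment is needed.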

Bookmark

LibreTranslate: Free and Open Source Machine Translation API

https://github.com/uav4geo/LibreTranslate, posted 2021 by peter in api free language nlp opensource software

Free and Open Source Machine Translation API, entirely self-hosted. Unlike other APIs, it doesn't rely on proprietary providers such as Google or Azure to perform translations.

Bookmark

EleutherAI - GPT-Neo

https://www.eleuther.ai/gpt-neo, posted 2021 by peter in ai free nlp opensource

GPT-Neo is the code name for a series of transformer-based language models loosely styled around the GPT architecture that we plan to train and open source. Our primary goal is to replicate a GPT-3 sized model and open source it to the public, for free.

Bookmark

Apache Tika

https://tika.apache.org/, posted 2020 by peter in free language nlp opensource search software

The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.

Bookmark

Tone Analyzer

https://tone-analyzer-demo.mybluemix.net/, posted 2016 by peter in ai demo language nlp online text writing

This service uses linguistic analysis to detect and interpret emotions, social tendencies, and language style cues found in text.

Bookmark

wooorm/franc

https://github.com/wooorm/franc, posted 2014 by peter in development free language nlp opensource python software

Detect the language of text.

Bookmark

TextBlob: Simplified Text Processing - TextBlob 0.5.0 documentation

https://textblob.readthedocs.org/en/latest/, posted 2013 by peter in development free language nlp python software toread

TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, translation, and more.
