Bookmark
An Efficient Way to Extract the Main Topics from a Sentence | The Tokenizer
thetokenizer.com/2013/05/09/efficient-way-to-extract-the-main-topics-of-a-sentence/, posted 2013 by peter in language nlp python toread
Last week, while working on new features for our product, I had to find a quick and efficient way to extract the main topics/objects from a sentence. Since I’m using Python, I initially thought it would be a very easy task to achieve with NLTK. However, when I tried its default tools (POS tagger, Parser…), I got quite accurate results, but performance was pretty bad, so I had to find a better way. Like I did in my previous post, I’ll start with the bottom line: here you can find my code for extracting the main topics/noun phrases from a given sentence. It works fine with real sentences (from a blog/news article). It’s a bit less accurate than the default NLTK tools, but it works much faster!
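The general idea of pulling noun phrases out of a sentence (tag each word, then merge runs of adjectives and nouns) can be sketched without NLTK. This is not the post's code: the tiny `LEXICON`, the default-to-noun rule, and the function names `tag` and `noun_phrases` are all hypothetical, chosen only to keep the example self-contained.

```python
import re

# Hypothetical toy lexicon; a real tagger would be trained on a corpus.
LEXICON = {
    "the": "DT", "a": "DT", "an": "DT",
    "quick": "JJ", "brown": "JJ", "lazy": "JJ",
    "fox": "NN", "dog": "NN",
    "jumps": "VB", "over": "IN",
}


def tag(tokens):
    """Tag each token from the lexicon, defaulting unknown words to NN."""
    return [(t, LEXICON.get(t.lower(), "NN")) for t in tokens]


def noun_phrases(sentence):
    """Return maximal runs of adjectives/nouns that end in a noun."""
    tokens = re.findall(r"[A-Za-z']+", sentence)
    phrases, current = [], []
    # A sentinel tag at the end flushes the last run.
    for word, pos in tag(tokens) + [("", "END")]:
        if pos in ("JJ", "NN"):
            current.append((word, pos))
            continue
        # Drop trailing adjectives so each phrase ends in a noun.
        while current and current[-1][1] != "NN":
            current.pop()
        if current:
            phrases.append(" ".join(w for w, _ in current))
        current = []
    return phrases


print(noun_phrases("The quick brown fox jumps over the lazy dog"))
```

A lookup-based tagger like this trades accuracy for speed, which is the same trade-off the excerpt describes.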
Bookmark
translate.google.com/toolkit, posted 2013 by peter in conversion free language nlp online
Google Translator Toolkit is a powerful and easy-to-use editor that helps translators work faster and better.
Bookmark
Delver - a natural language interface to your app
delver.io/, posted 2013 by peter in development language nlp software toread
Down in the depths of your organisation, you have a treasure-trove of valuable data. But how hard is it for your users to retrieve it? Salvage your data with a natural language interface - ask your app English questions, get clear answers and reports back.
Bookmark
High Scalability - DuckDuckGo Architecture - 1 Million Deep Searches a Day and Growing
highscalability.com/blog/2013/1/28/duckduckgo-architecture-1-million-deep-searches-a-day-and-gr.html, posted 2013 by peter in development nlp scalability search
This is an interview with Gabriel Weinberg, founder of Duck Duck Go and general all around startup guru, on what DDG’s architecture looks like in 2012.
Bookmark
BBC News - Phone call translator app to be offered by NTT Docomo
www.bbc.co.uk/news/technology-20004210, posted 2012 by peter in japan language mobile nlp voice
An app offering real-time translations is to allow people in Japan to speak to foreigners over the phone with both parties using their native tongue.
NTT Docomo - the country's biggest mobile network - will initially convert Japanese to English, Mandarin and Korean, with other languages to follow.
Even though the translations are bound to be hilariously bad sometimes, this may still be useful in some situations.
Bookmark
Is Writing Style Sufficient to Deanonymize Material Posted Online? « 33 Bits of Entropy
33bits.org/2012/02/20/is-writing-style-sufficient-to-deanonymize-material-posted-online/, posted 2012 by peter in language nlp privacy science
So what exactly did we achieve? Our research has dramatically increased the number of authors that can be distinguished using writing-style analysis: from about 300 to 100,000. More importantly, the accuracy of our algorithms drops off gently as the number of authors increases, so we can be confident that they will continue to perform well as we scale the problem even further. Our work is therefore the first time that stylometry has been shown to have serious implications for online anonymity.
Bookmark
Pattern | CLiPS
www.clips.ua.ac.be/pages/pattern, posted 2011 by peter in development free nlp python software
Pattern is a web mining module for the Python programming language.
It bundles tools for data retrieval (Google + Twitter + Wikipedia API, web spider, HTML DOM parser), text analysis (rule-based shallow parser, WordNet interface, syntactical + semantical n-gram search algorithm, tf-idf + cosine similarity + LSA metrics) and data visualization (graph networks).
The module is bundled with 30+ example scripts.
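For readers unfamiliar with the tf-idf and cosine similarity metrics mentioned above, here is a minimal pure-Python sketch. It does not use Pattern's API; the function names `tfidf_vectors` and `cosine` are hypothetical, and the smoothed idf formula is one common variant, assumed here for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Relative term frequency times a smoothed idf (assumed variant).
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * \
        math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


docs = ["web mining with python", "python web mining", "graph visualization tools"]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Documents that share no terms score 0, and the score approaches 1 as the weighted vocabularies overlap.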
Bookmark
The Easy Way to Extract Useful Text from Arbitrary HTML - AI Depot
ai-depot.com/articles/the-easy-way-to-extract-useful-text-from-arbitrary-html/, posted 2011 by peter in ai development nlp python scraping
This article shows you how to write a relatively simple script to extract text paragraphs from large chunks of HTML code, without knowing its structure or the tags used. It works on news articles and blog pages with worthwhile text content, among others…
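One way such a script could work, sketched under the assumption of a simple text-density heuristic: keep only the lines where visible text makes up most of the raw markup. This is not necessarily the article's exact method; the function name `dense_lines` and both thresholds are hypothetical.

```python
import re

# Matches any HTML tag; crude, but enough for a density estimate.
TAG = re.compile(r"<[^>]+>")


def dense_lines(html, min_density=0.5, min_length=20):
    """Keep lines whose visible text makes up most of the raw markup."""
    kept = []
    for line in html.splitlines():
        raw = line.strip()
        if not raw:
            continue
        text = TAG.sub("", raw).strip()
        density = len(text) / len(raw)
        if len(text) >= min_length and density >= min_density:
            kept.append(text)
    return kept


sample = """<div class="nav"><a href="/">Home</a> <a href="/about">About</a></div>
<p>This paragraph carries the actual article text that a reader wants.</p>
<div id="footer"><span class="c">(c) 2011</span></div>"""
print(dense_lines(sample))
```

Navigation and footer lines are mostly markup, so they fall below the thresholds, while paragraph lines pass. The thresholds would need tuning on real pages.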
Bookmark
Python Package Index : jellyfish 0.1.2
pypi.python.org/pypi/jellyfish/0.1.2, posted 2010 by peter in development free language math nlp python
Jellyfish is a Python library for doing approximate and phonetic matching of strings.
...
String comparison:
* Levenshtein Distance
* Damerau-Levenshtein Distance
* Jaro Distance
* Jaro-Winkler Distance
* Match Rating Approach Comparison
* Hamming Distance
Phonetic encoding:
* American Soundex
* Metaphone
* NYSIIS (New York State Identification and Intelligence System)
* Match Rating Codex
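To illustrate what two of the string comparison metrics listed above compute, here is a pure-Python sketch of Levenshtein and Hamming distance. These are textbook implementations, not jellyfish's own code; the function names are chosen for this example, and the library's handling of edge cases (such as unequal-length input to Hamming distance) may differ.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    # previous[j] holds the distance between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free on match)
            ))
        previous = current
    return previous[-1]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        # Assumption for this sketch: reject unequal lengths.
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


print(levenshtein("kitten", "sitting"))  # 3
print(hamming("karolin", "kathrin"))     # 3
```

Levenshtein distance tolerates insertions and deletions, so it suits misspellings of varying length; Hamming distance only counts substitutions between strings of the same length.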
Bookmark
Natural Language Toolkit
www.nltk.org/, posted 2010 by peter in ai development free language nlp python software
Open source Python modules, linguistic data and documentation for research and development in natural language processing and text analytics, with distributions for Windows, Mac OSX and Linux.