linguistics

Python NLP British English vs American English

我的梦境 posted on 2021-02-04 13:48:26
Question: I'm currently working on NLP in Python. My corpus contains both British and American English spellings (realize/realise), and I'd like to convert the British spellings to American ones. However, I haven't found a good tool or package for this. Any suggestions? Answer 1: I've not been able to find a package either, but try this: (Note that I've had to trim the us2gb dictionary substantially for it to fit within the Stack Overflow character limit - you'll have to rebuild this yourself). # Based on Shengy's code: #
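
The dictionary-lookup idea in the answer can be sketched roughly as follows. This is only an illustration, not the answer's actual code: the three-entry GB2US mapping and the function names are hypothetical placeholders, and a real mapping would need thousands of entries.

```python
import re

# Hypothetical, deliberately tiny mapping; a real one must be built separately.
GB2US = {"realise": "realize", "colour": "color", "centre": "center"}

_WORD = re.compile(r"[A-Za-z]+")


def _match_case(source, target):
    """Copy simple capitalisation (UPPER, Title, lower) from source to target."""
    if source.isupper():
        return target.upper()
    if source[0].isupper():
        return target.capitalize()
    return target


def americanize(text):
    """Replace every word found in GB2US with its American spelling."""
    def repl(match):
        word = match.group(0)
        us = GB2US.get(word.lower())
        return _match_case(word, us) if us else word

    return _WORD.sub(repl, text)
```

Matching whole alphabetic tokens rather than substrings avoids rewriting the inside of longer words that merely contain a British spelling.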

Extracting Related Date and Location from a sentence

a 夏天 posted on 2020-08-10 23:00:31
Question: I'm working with written text (paragraphs of articles and books) that includes both locations and dates. I want to extract from the text pairs of locations and dates that are associated with one another. For example, given the following phrase: "The man left Amsterdam on January and reached Nepal on October 21st" I would want output such as this: >>>[(Amsterdam, January), (Nepal, October 21st)] I tried splitting the text on "connecting words" (such as "and" for example) and

Stanford CoreNLP not working

北战南征 posted on 2020-01-22 15:13:45
Question: I'm using Windows 8 and running Python in Eclipse with PyDev. I installed Stanford CoreNLP (Python version) from this site: https://github.com/relwell/stanford-corenlp-python
When I try to import corenlp, I get the following error message:
Traceback (most recent call last):
  File "C:\Users\Ghantauke\workspace\PythonTest2\test.py", line 1, in <module>
    import corenlp
  File "C:\Python27\lib\site-packages\corenlp\__init__.py", line 13, in <module>
    from corenlp import StanfordCoreNLP, ParserError,

Duplicate elimination of similar company names

自古美人都是妖i posted on 2020-01-15 03:28:14
Question: I have a table of company names. It contains many duplicates caused by human input errors: people disagree on whether the subdivision should be included, there are typos, and so on. I want all of these duplicates to be marked as one company, "1c":
+------------------+
| company          |
+------------------+
| 1c               |
| 1c company       |
| 1c game studios  |
| 1c wireless      |
| 1c-avalon        |
| 1c-softclub      |
| 1c: maddox games |
| 1c:inoco         |
| 1cc games        |
+------------------+
I identified Levenshtein distance as a good way
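
For reference, the Levenshtein distance mentioned in the question can be computed with the standard dynamic-programming recurrence. The sketch below is a plain-Python illustration (the function name is made up here). It shows only the distance itself, not how the asker went on to use it.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb, or keep it
            ))
        previous = current
    return previous[-1]
```

Note that on the table above the raw distance between "1c" and "1cc games" is 7, larger than the length of "1c" itself, so an absolute threshold alone is unlikely to group these names.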

Snowball Stemming: defining Regions

眉间皱痕 posted on 2020-01-03 21:09:32
Question: I'm trying to understand the Snowball stemming algorithm. The algorithm uses two regions, R1 and R2, which are defined as follows: R1 is the region after the first non-vowel following a vowel, or is the null region at the end of the word if there is no such non-vowel. R2 is the region after the first non-vowel following a vowel in R1, or is the null region at the end of the word if there is no such non-vowel. http://snowball.tartarus.org/texts/r1r2.html Examples are b e a u t i f u l |<-
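
The two definitions quoted above translate fairly directly into code. The following is a minimal sketch, assuming the vowel set a, e, i, o, u, y and ignoring any language-specific exceptions a real stemmer applies; the function names are illustrative only.

```python
VOWELS = frozenset("aeiouy")  # assumed vowel set; stemmers for other languages differ


def region_start(word, start=0):
    """Index just past the first non-vowel that follows a vowel,
    looking only at vowels at or after position start.
    Returns len(word) (the null region) if there is no such non-vowel."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word):
    """Return the R1 and R2 regions of word as a pair of strings."""
    r1 = region_start(word)
    r2 = region_start(word, r1)  # same rule, applied inside R1
    return word[r1:], word[r2:]
```

For "beautiful" the first non-vowel after a vowel is the "t", so R1 is "iful"; applying the same rule inside R1 stops at the "f", so R2 is "ul".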

Qt Linguist - set translator for application Qt *.ui files

佐手、 posted on 2020-01-03 02:40:11
Question: I wrote a tiny, simple example that changes the application's language after a language is chosen in the menu. Although the connect call DOES work (qDebug() prints the expected messages), it doesn't change the text on my QLabel. I created the GUI using Qt Designer. NOTE: All of these files are in the same directory. I'm using Qt5. Here's my code:
*.pro:
    QT += core gui
    greaterThan(QT_MAJOR_VERSION, 4): QT += widgets
    TARGET = qt_pl_en
    TEMPLATE = app
    SOURCES += main.cpp\
               mainwindow.cpp
    HEADERS += mainwindow.h
    FORMS += mainwindow.ui

Algorithm to take a number and output its English word

依然范特西╮ posted on 2019-12-23 17:33:54
Question: I want to write a program in C that asks the user to input a number and then prints that number in English. For example: if(INPUT == 1) then print ONE, if(INPUT == 2) then print TWO, and so on. It can be done with switch-case or if-else, but that makes the code lengthy. For a few numbers it's fine, but if we have to go up to 100 it becomes too long. Is there a short algorithm or idea for this? Answer 1: You can use the code below, but it only prints numbers up to the thousands. I did this to solve
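
One common way to avoid a long switch-case is to keep lookup tables for the irregular words and build everything else by decomposition. The sketch below illustrates that idea in Python rather than C, and it is not the answer's code. It assumes the range 0 to 999, and all names are illustrative.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..999 in English."""
    if not 0 <= n < 1000:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        # 0..19 are irregular, so they come straight from the table.
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")
```

The same two tables carry over to C as arrays of strings, with the decomposition done by integer division and remainder.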

Tabulating characters with diacritics in R

十年热恋 posted on 2019-12-23 07:12:39
Question: I'm trying to tabulate occurrences of phones (characters) in a string, but diacritics are tabulated as characters in their own right. In practice, I have a word list in the International Phonetic Alphabet with a fair number of diacritics and several combinations of them with base characters. Here is a MWE with just one word, but the same applies to lists of words and to other types of combinations.
> word <- "n̥ana" # word made up of 4 phones: [n̥],[a],[n],[a]
> table(strsplit(word, ""))
 ̥ a n
 1 2 2
But the
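
To illustrate the underlying issue outside R, here is a rough Python sketch that attaches each combining mark to the base character before it and then counts the resulting units. It assumes every diacritic is a Unicode combining character. Spacing modifier letters and tie bars would need extra handling, and the function names are made up for this example.

```python
import unicodedata
from collections import Counter


def split_phones(word):
    """Split word into base characters, each with its trailing combining marks."""
    phones = []
    for ch in word:
        # unicodedata.combining() returns a non-zero class for combining marks.
        if unicodedata.combining(ch) and phones:
            phones[-1] += ch
        else:
            phones.append(ch)
    return phones


def tabulate_phones(word):
    """Count each phone (base character plus diacritics) in word."""
    return Counter(split_phones(word))
```

Calling `tabulate_phones("n\u0325ana")` counts the voiceless n with its ring below as one unit, separate from the plain n.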