Using machine translation, can I obtain a very compressed version of a sentence? E.g., "I would really like to have a delicious tasty cup of coffee" would be translated to "I want coffee". Do any of the NLP engines provide such functionality?
I found a few research papers that deal with paraphrase generation and sentence compression, but is there any library which has already implemented this?
If your intention is to make your sentences brief without losing the important ideas in them, then you can do that by extracting the subject-predicate-object triplet.
Talking about tools/engines, I recommend Stanford NLP. Its dependency parser output already provides the subject and object (if any), but you will still need to do some tuning to get the desired result.
You can download Stanford NLP and learn sample usage here.
I also found a paper related to your question. Have a look at Text Simplification using Typed Dependencies: A Comparison of the Robustness of Different Generation Strategies.
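To make the triplet idea concrete, here is a toy sketch. The dependency labels are hard-coded for illustration; in practice they would come from the Stanford dependency parser, which assigns relations such as `nsubj` (subject), `root` (main verb), and `obj` (object). This is an assumption-laden simplification, not the parser's actual API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy subject-predicate-object extraction: given a token -> dependency-label
// map (here hard-coded; normally produced by a dependency parser), keep only
// the subject, the root verb, and the object.
public class TripletSketch {
    public static String compress(Map<String, String> tokenToDepLabel) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : tokenToDepLabel.entrySet()) {
            String label = e.getValue();
            if (label.equals("nsubj") || label.equals("root") || label.equals("obj")) {
                if (out.length() > 0) out.append(' ');
                out.append(e.getKey());
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Labels a parser might assign to
        // "I would really like to have a delicious tasty cup of coffee"
        Map<String, String> deps = new LinkedHashMap<>();
        deps.put("I", "nsubj");
        deps.put("would", "aux");
        deps.put("really", "advmod");
        deps.put("like", "root");
        deps.put("coffee", "obj");
        System.out.println(compress(deps));  // prints "I like coffee"
    }
}
```

A real pipeline would walk the parser's dependency graph rather than a flat map, but the filtering principle is the same.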
Here is what I found:
A modified implementation of the model described in Clarke and Lapata, 2008, "Global Inference for Sentence Compression: An Integer Linear Programming Approach".
Paper: https://www.jair.org/media/2433/live-2433-3731-jair.pdf
Source: https://github.com/cnap/sentence-compression (written in JAVA)
Input: At the camp , the rebel troops were welcomed with a banner that read 'Welcome home' .
Output: At camp , the troops were welcomed.
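The core of the ILP formulation in that paper is a binary keep/drop decision per word, maximizing word-importance scores under a length budget. The real model adds grammaticality constraints and uses an ILP solver; the following hedged toy (with made-up scores) just enumerates all subsets, which is feasible only for short sentences:

```java
import java.util.ArrayList;
import java.util.List;

// Toy version of ILP-style compression: choose a subset of words, at most
// maxWords long, that maximizes the total importance score. Real systems
// use an ILP solver plus linguistic constraints; this brute-forces subsets.
public class CompressionSketch {
    public static List<String> compress(String[] words, double[] scores, int maxWords) {
        int n = words.length;
        double best = Double.NEGATIVE_INFINITY;
        int bestMask = 0;
        for (int mask = 0; mask < (1 << n); mask++) {
            if (Integer.bitCount(mask) > maxWords) continue;
            double total = 0;
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) total += scores[i];
            }
            if (total > best) {
                best = total;
                bestMask = mask;
            }
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if ((bestMask & (1 << i)) != 0) kept.add(words[i]);  // original order preserved
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] words = {"I", "would", "really", "like", "coffee"};
        double[] scores = {2.0, 0.1, 0.2, 1.5, 2.5};  // hypothetical importance scores
        System.out.println(compress(words, scores, 3));  // [I, like, coffee]
    }
}
```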
Update: Sequence-to-Sequence with Attention Model for Text Summarization.
To start with, try the Watson NaturalLanguageUnderstanding/Alchemy libraries. Using them, I was able to extract the important keywords from my statements. Example:
Input : Hey! I am having issues with my laptop screen
Output : laptop screen issues hardware.
Beyond rephrasing, with NLU you can get the following details about your input statement. For the statement above, you can get details for these categories:
Language (e.g. "en"); Entities; Concepts; Keywords like "laptop screen" and "issues", with details such as relevance, text, keyword emotion, and sentiment; Categories, with details like labels and relevance scores; SemanticRoles, with details like the sentence, its subject, action, and object.
Along with this, you can use the Tone Analyzer to get the prominent tone of the statement, such as fear, anger, joy, or disgust.
The following is a code sample for the Watson libraries. Note: the Watson libraries are not free, but they come with a one-month trial, so you can start with them and, once you have gotten hold of the concepts, switch to open-source libraries with similar functionality.
NaturalLanguageUnderstanding service = new NaturalLanguageUnderstanding(
        NaturalLanguageUnderstanding.VERSION_DATE_2017_02_27,
        WatsonConfiguration.getAlchemyUserName(),
        WatsonConfiguration.getAlchemyPassword());

// ConceptsOptions
ConceptsOptions conceptOptions = new ConceptsOptions.Builder()
        .limit(10)
        .build();

// CategoriesOptions
CategoriesOptions categoriesOptions = new CategoriesOptions();

// SemanticRolesOptions
SemanticRolesOptions semanticRoleOptions = new SemanticRolesOptions.Builder()
        .entities(true)
        .keywords(true)
        .limit(10)
        .build();

EntitiesOptions entitiesOptions = new EntitiesOptions.Builder()
        .emotion(true)
        .sentiment(true)
        .limit(10)
        .build();

KeywordsOptions keywordsOptions = new KeywordsOptions.Builder()
        .emotion(true)
        .sentiment(true)
        .limit(10)
        .build();

Features features = new Features.Builder()
        .entities(entitiesOptions)
        .keywords(keywordsOptions)
        .concepts(conceptOptions)
        .categories(categoriesOptions)
        .semanticRoles(semanticRoleOptions)
        .build();

AnalyzeOptions parameters = new AnalyzeOptions.Builder()
        .text(inputText)
        .features(features)
        .build();

AnalysisResults response = service
        .analyze(parameters)
        .execute();
System.out.println(response);
You can use a combination of stop-word removal and stemming/lemmatization. Stemming and lemmatization are processes that reduce all the words in a text to their basic root; you can find a full explanation here. I am using the Porter stemmer (look it up on Google). After stemming and lemmatization, stop-word removal is very easy. Here is my stop-removal method:
public static String[] stopwords ={"a", "about", "above", "across", "after", "afterwards", "again", "against", "all", "almost",
"alone", "along", "already", "also","although","always","am","among", "amongst", "amoungst", "amount", "an", "and",
"another", "any","anyhow","anyone","anything","anyway", "anywhere", "are", "around", "as", "at", "back","be","became",
"because","become","becomes", "becoming", "been", "before", "beforehand", "behind", "being", "below", "beside", "besides",
"between", "beyond", "bill", "both", "bottom","but", "by", "call", "can", "cannot", "cant", "co", "con", "could", "couldnt",
"cry", "de", "describe", "detail", "do", "done", "down", "due", "during", "each", "eg", "eight", "either", "eleven","else",
"elsewhere", "empty", "enough", "etc", "even", "ever", "every", "everyone", "everything", "everywhere", "except", "few",
"fifteen", "fify", "fill", "find", "fire", "first", "five", "for", "former", "formerly", "forty", "found", "four", "from",
"front", "full", "further", "get", "give", "go", "had", "has", "hasnt",
"have", "he", "hence", "her", "here", "hereafter", "hereby", "herein", "hereupon", "hers", "herself",
"him", "himself", "his", "how", "however", "hundred", "ie", "if", "in", "inc", "indeed", "interest", "into",
"is", "it", "its", "itself", "keep", "last", "latter", "latterly", "least", "less", "ltd", "made", "many",
"may", "me", "meanwhile", "might", "mill", "mine", "more", "moreover", "most", "mostly", "move", "much", "must",
"my", "myself", "name", "namely", "neither", "never", "nevertheless", "next", "nine", "no", "nobody", "none",
"noone", "nor", "not", "nothing", "now", "nowhere", "of", "off", "often", "on", "once", "one", "only", "onto",
"or", "other", "others", "otherwise", "our", "ours", "ourselves", "out", "over", "own","part", "per", "perhaps",
"please", "put", "rather", "re", "same", "see", "seem", "seemed", "seeming", "seems", "serious", "several", "she",
"should", "show", "side", "since", "sincere", "six", "sixty", "so", "some", "somehow", "someone", "something",
"sometime", "sometimes", "somewhere", "still", "such", "system", "take", "ten", "than", "that", "the", "their",
"them", "themselves", "then", "thence", "there", "thereafter", "thereby", "therefore", "therein", "thereupon",
"these", "they", "thickv", "thin", "third", "this", "those", "though", "three", "through", "throughout", "thru",
"thus", "to", "together", "too", "top", "toward", "towards", "twelve", "twenty", "two", "un", "under", "until",
"up", "upon", "us", "very", "via", "was", "we", "well", "were", "what", "whatever", "when", "whence", "whenever",
"where", "whereafter", "whereas", "whereby", "wherein", "whereupon", "wherever", "whether", "which", "while",
"whither", "who", "whoever", "whole", "whom", "whose", "why", "will", "with", "within", "without", "would", "yet",
"you", "your", "yours", "yourself", "yourselves","1","2","3","4","5","6","7","8","9","10","1.","2.","3.","4.","5.","6.","11",
"7.","8.","9.","12","13","14","A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z",
"terms","CONDITIONS","conditions","values","interested.","care","sure","!","@","#","$","%","^","&","*","(",")","{","}","[","]",":",";",",","<",">","/","?","_","-","+","=",
"a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z",
"contact","grounds","buyers","tried","said,","plan","value","principle.","forces","sent:","is,","was","like",
"discussion","tmus","diffrent.","layout","area.","thanks","thankyou","hello","bye","rise","fell","fall","psqft.","http://","km","miles"};
In my project I used paragraph as my text input:
public static String removeStopWords(String paragraph) {
    Scanner words = new Scanner(paragraph);
    StringBuilder newText = new StringBuilder();
    // Word-frequency map (used elsewhere in my project; not part of the return value)
    Map<String, Integer> frequencies = new TreeMap<>();
    while (words.hasNext()) {
        String word = words.next().toLowerCase();
        boolean isStopWord = false;
        for (String stopword : stopwords) {
            if (word.equals(stopword)) {
                isStopWord = true;
                break;
            }
        }
        if (!isStopWord) {
            newText.append(word).append(" ");
        }
        if (!word.isEmpty()) {
            frequencies.merge(word, 1, Integer::sum);
        }
    }
    words.close();
    return newText.toString();
}
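The answer mentions the Porter stemmer but shows no stemming code. Below is a crude suffix-stripping sketch to illustrate the idea only; it is not the real Porter algorithm, and its rules and length thresholds are ad-hoc assumptions. For real use, pick up an existing Porter stemmer implementation.

```java
// Crude suffix-stripping stemmer sketch -- NOT the Porter algorithm.
// Length checks guard against mangling very short words (e.g. "is", "ring").
public class StemSketch {
    public static String stem(String word) {
        if (word.endsWith("ies") && word.length() > 4) {
            return word.substring(0, word.length() - 3) + "y";   // studies -> study
        }
        if (word.endsWith("ing") && word.length() > 5) {
            return word.substring(0, word.length() - 3);         // walking -> walk
        }
        if (word.endsWith("ed") && word.length() > 4) {
            return word.substring(0, word.length() - 2);         // jumped -> jump
        }
        if (word.endsWith("s") && !word.endsWith("ss") && word.length() > 3) {
            return word.substring(0, word.length() - 1);         // cups -> cup
        }
        return word;
    }

    public static void main(String[] args) {
        System.out.println(stem("studies"));  // study
        System.out.println(stem("walking"));  // walk
        System.out.println(stem("cups"));     // cup
        System.out.println(stem("class"));    // class (double-s left alone)
    }
}
```

Running the stemmer before the stop-word check above would also let variants like "becomes"/"becoming" collapse onto entries already in the stop-word list.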
I have used the Stanford NLP library; you can download it from here. I hope that I have helped you in some way.
Source: https://stackoverflow.com/questions/7857648/sentence-compression-using-nlp