How to abbreviate long names in a dataframe for R?

Question

I'm working with a data frame of organization names, many of which are longer than 25 characters. I'm trying to make a bar graph (with plotly) of all these organization names, but the names get cut off because they're so long. I've already tried increasing the left margin, like this:

library(plotly)

plot_ly(dataframe.name, x = ~number, y = ~org_name, type = 'bar', orientation = 'h') %>% 
  layout(margin = list(l = 150))

It works, but the graph doesn't look good, so instead I'm trying to abbreviate any organization name longer than 25 characters, and I'm having a hard time doing so. One approach I tried was to create a new column called abbrv that holds the first 25 characters of the organization name followed by "...". For names that are 25 characters or shorter, I put NA in the abbrv column:

for(i in dataframe.name$org_name){
  if(nchar(i) > 25){
    dataframe.name$abbrv <- paste0(substring(i, 0, 25), "...")
  }
  else{
    dataframe.name$abbrv <- "NA"
  }
}

Even if this worked, I'm not sure how I would then make plotly display abbrv for names longer than 25 characters and the original organization name otherwise.

In any case, this approach doesn't work: the abbrv column ends up as "NA" for ALL rows, no matter how long the organization names are. Another approach I tried was a replace function:

for(i in dataframe.name$org_name){
  if(nchar(i) > 25){
    # .replace(to_replace=, value=) is pandas (Python) syntax; R data frames have no such method
    dataframe.name[i].replace(
      to_replace=i,
      value= abbreviate(i)
    )
  }
}

But I get errors with that one as well. At this point I'm really lost: how do I abbreviate the long names in my data frame? Any help would be great. Thanks.

*******Edit*******

So now I'm using this code:

for(i in 1:nrow(dfname)){
 if(nchar(dfname$orgname[i]) > 25){
   dfname$abbrv.column <- substring(dfname$orgname[i], 0, 25)
 }  
 else{
   dfname$abbrv.column <- dfname$orgname
 }
}

This still isn't working, though: every entry in abbrv.column ends up as the same organization name.

Answer 1:

dataframe.name$abbrv is the entire column (a vector of all the abbreviations in the data frame), not a single entry, so every assignment inside the loop overwrites every row.

That is why all entries in dataframe.name$abbrv end up as "NA": the last name in the data frame is 25 characters or fewer, so the final iteration assigns "NA" to the whole column.
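To make the per-row loop from the question work, the assignment has to target element i of the column rather than the whole column. A minimal sketch, assuming a data frame with an org_name column; the toy data below is made up for illustration and uses a real NA instead of the string "NA":

```r
# Hypothetical toy data standing in for the question's data frame
dataframe.name <- data.frame(
  org_name = c("International Association of Widget Makers", "Short Org"),
  stringsAsFactors = FALSE
)

# Pre-allocate the column, then fill it one row at a time
dataframe.name$abbrv <- NA_character_
for (i in seq_len(nrow(dataframe.name))) {
  name <- dataframe.name$org_name[i]
  if (nchar(name) > 25) {
    # Indexing with [i] writes only this row, not the whole column
    dataframe.name$abbrv[i] <- paste0(substring(name, 1, 25), "...")
  } else {
    dataframe.name$abbrv[i] <- NA
  }
}
print(dataframe.name)
```

The same row-indexing fix applies to the loop in the edit (dfname$abbrv.column[i] instead of dfname$abbrv.column).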

@brettljausn has a decent suggestion: just do away with the NAs completely and only truncate where the character count exceeds 25.

Something like this should work a treat:

dataframe.name$abbrv <- substring( dataframe.name$org_name, 0, 25 )
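substring() on its own truncates but drops the "..." marker the question asked for. A vectorized sketch that appends "..." only to names longer than 25 characters and leaves shorter names untouched, so a single column can serve as the plot label (the helper name truncate_label is hypothetical):

```r
# Hypothetical helper: keep short names, truncate long ones and mark them with "..."
truncate_label <- function(x, width = 25) {
  x <- as.character(x)
  ifelse(nchar(x) > width, paste0(substring(x, 1, width), "..."), x)
}

orgs <- c("International Association of Widget Makers", "Short Org")
labels <- truncate_label(orgs)
print(labels)
```

Because every row gets a value, there is no need to choose between abbrv and org_name at plot time; the resulting column could be passed to plot_ly as y (e.g. y = ~abbrv) in place of org_name.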

I would try to use abbreviate first though:

dataframe.name$abbrv <- abbreviate( dataframe.name$org_name )
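abbreviate() defaults to minlength = 4, which is usually much shorter than a readable axis label. A sketch passing a larger minlength (25 here is an assumption chosen to match the question); note that abbreviate() shortens by dropping lower-case vowels and other characters rather than cutting at a fixed width, keeps the results unique when the inputs are unique, and returns a vector named by the original strings:

```r
orgs <- c("International Association of Widget Makers",
          "International Association of Gadget Makers",
          "Short Org")

# Strings longer than minlength are shortened; shorter ones are left as they are
short <- abbreviate(orgs, minlength = 25)
print(short)

# Drop the names if a plain character vector is wanted for a data frame column
short_plain <- unname(short)
```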

Source: https://stackoverflow.com/questions/47568080/how-to-abbreviate-long-names-in-a-dataframe-for-r

Tags

dataframe

abbreviation