How to abbreviate long names in a dataframe for R?

烈酒焚心 提交于 2019-12-18 08:58:35

问题


I'm working with a dataframe that has really long names that is more than 25 characters. I'm trying to make a bar graph (with plotly) with all of these organizations name, but the names get cut off because they're super long. I've already tried to the margins like the following:

plot_ly(x = number, y = org_name, type = 'bar') %>% 
layout(margin = list(l = 150))

It works but the bar graph doesn't look nice so the alternative I'm trying to do is abbreviate any organization's name that are longer than 25 characters. However, I'm having a hard time doing so. One way I tried to abbreviate it is to create a new column called abbrv, use substring to get the first 25 characters of the organization name and then do "...", and then put it in the column. While for the organization's name that isn't greater than 25, I would just put an NA in the abbrv column like the following:

for(i in dataframe.name$org_name){
 if(nchar(i) > 25){
 dataframe.name$abbrv <- paste0(substring(i, 0, 25), "...")
 }
 else{
  dataframe.name$abbrv <- "NA"
}

The only thing with this way is now that I have the abbrv column (if it works), how will I make sure that plotly displays the abbrv column if the organization name is greater than 25 characters and if it doesn't then it displays the normal organization name.

Anyways, I talked enough about that, but that was one approach I tried to do, but it doesn't quite work since the abbrv column puts "NA" for ALL of the rows in the column, no matter how long the organization's names are. Another approach I was trying to do is use the replace function such as:

for(i in dataframe.name$org_name){
 if(nchar(i) > 25){
   dataframe.name[i].replace(
     to_replace=i,
     value= abbreviate(i)
   )
}

But I get errors for that one as well. At this point, I'm not even sure what to do and how to abbreviate the long names in my dataframe? I'm really lost and confused on what to do and how to exactly abbreviate the long names. If anyone can help me out, that'll be great! Thanks.

*******Edit*******

So now I'm using this code:

for(i in 1:nrow(dfname)){
 if(nchar(dfname$orgname[i]) > 25){
   dfname$abbrv.column <- substring(dfname$orgname[i], 0, 25)
 }  
 else{
   dfname$abbrv.column <- dfname$orgname
 }
}

This isn't quite working though because all of the entries are the same organization name


回答1:


dataframe.name$abbr is a vector of all abbreviations in the dataframe, not just a single name.

It is the reason all entries in dataframe.name$abbr are being set to NA; the last name is in the dataframe is 25 characters or less, so all entries in dataframe.name$abbr are assigned NA.

@brettljausn has a decent suggestion: just do away with the NAs completely and only truncate where the character count exceeds 25.

Something like this should work a treat:

dataframe.name$abbrv <- substring( dataframe.name$org_name, 0, 25 )

I would try to use abbreviate first though:

dataframe.name$abbrv <- abbreviate( dataframe.name$org_name )


来源:https://stackoverflow.com/questions/47568080/how-to-abbreviate-long-names-in-a-dataframe-for-r

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!