Is there a way to count the number of characters per word for a string, returning values separated by a comma?

﹥>﹥吖頭↗ 提交于 2020-01-23 01:35:09

问题


I have a list of strings in cells - 1000s of them - and I need to work out the characters per word but separated by word - preferably in 1 swift formula...

For Example: 1. "Black Cup With Handle" > Formula I need > 5,3,4,6

  1. "Giant Bear Statue" > Formula I need > 5,4,6

I need this for a recurring task which has been macro'd in a very inefficient way to count words into columns (of which we need to use up to 20 for the just encase) but this needs to be tackled.

Usually, we count the spaces and layer nested serach() formulas to piggyback onto one and other to break down the structure then character counts the individual words...

I could alternatively the macro to substitute the spaces for commas and used text to columns but that still leaves me with a prolonged counting process for what im looking for

we obviously use =LEN(A1)-LEN(SUBSTITUTE(A1," ","")) to count the spaces in the word

we currently then use =SEACRH() function combined with =MID() functions (and some bizarre numbers) to reveal each word into its own individual cell

then =LEN once again bu on all individual words - very long-winded

Im hoping to find a shorter way to do this but feeling there may not be a dynamic enough way to do it with formula alone, hoping someone can prove me wrong!


回答1:


You'll have different options depending on your Excel version.


OPTION 1: TEXTJOIN

I think you are looking for a TEXTJOIN function. Just bare in mind that you can only use this the more later versions of Excel (see link to documentation) and it could work like this:

Formula in B1:

=TEXTJOIN(",",TRUE,LEN(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s")))

NOTE: It's an array formula and you need to enter it using CtrlShiftEnter

To make it so that you won't need to use the above key-combo, we can include an INDEX:

=TEXTJOIN(",",TRUE,INDEX(LEN(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s")),))

Additional Information:

FILTERXML

This function takes (as per documentation) two required arguments:

  • A string in valid XML
  • A string in valid XPath

Because we want to return an array of elements (words) from the cell, we need to SUBSTITUTE the spaces for end-tags (</..>) and concatenate that with a start-tag (<..>) at the start of the string and another end-tag at the end.

I'll have to rely on an XML explaination on the tags as to why <?><?> works and it's meaning, because as far as my testing goes I could swap the letters around or replace by another letter with the same results as long as the final Xpath would resemble the same character. It would be great if someone would be able to complement this answer with a better explanation on this matter.


TEXTJOIN

If you are a Office 365 subscriber or own Excel 2019 you can make use of this function. There are (as per documentation) at least 3 required arguments:

  • A delimiter which must be a text string, either empty, or one or more characters enclosed by double quotes, or a reference to a valid text string. If a number is supplied, it will be treated as text.
  • The second argument can hold either TRUE or FALSE and determines whether or not you want to exclude/include empty values
  • The third argument is the text item to be joined. A text string, or array of strings, such as a range of cells.

Now this is where we can join the two functions together, FILTERXML returning an array which we can use in TEXTJOIN.


INDEX + LEN

I'll have to explain the use of these functions together. I don't think LEN and INDEX will need much of an introduction on their own, but together they work quite nicely. Natively there will be a force called implicit intersection that will prevent LEN from returning an array of values when you pass an array of values to the function, in this case through our FILTERXML.

Normally you would disable this mechanism using a key combination of: CtrlShiftEnter, better known as CSE.

Now what INDEX does is disabling this implicit intersection making LEN able to return an array, removing the need to CSE the formula. INDEX is one of the functions that has this "power". A more in depth explanation on implicit intersection can be found here


OPTION 2: UDF

Without access to TEXTJOIN I think you'll need to have a look at using an UDF, possibly looking like below:

Function TEXTJOIN(rng As Range) As String
    TEXTJOIN = Join(Application.Evaluate("LEN({""" & Join(Split(rng, " "), """,""") & """})"), ",")
End Function

You can call this in B1 like so: =TEXTJOIN(A1)


Additional Information:

The UDF consists out of three main mechanisms that work together:

JOIN

This funciton takes two parameters, where the first one is required:

  • First parameter is a one-dimensional array containing substrings
  • The second (optional) parameter is a string character used to separate the substrings in the returned string. If omitted, the space character (" ") is used. If delimiter is a zero-length string (""), all items in the list are concatenated with no delimiters.

The function returns a string value


SPLIT

This function takes a string and delimits it by a specified character/substring. It takes the following arguments:

  • 1st: A required string expression containing substrings and delimiters. If expression is a zero-length string(""), Split returns an empty array, that is, an array with no elements and no data.
  • 2nd: The optional delimiter which is a string character used to identify substring limits. If omitted, the space character (" ") is assumed to be the delimiter. If delimiter is a zero-length string, a single-element array containing the entire expression string is returned.
  • 3rd: An optional limit, a number of substrings to be returned; -1 indicates that all substrings are returned.
  • 4th: Compare, also optional, is a numeric value indicating the kind of comparison to use when evaluating substrings. See Settings section for values.

In this case we would only need the first two arguments.


Application.Evaluate

This is IMO one of the most handy mechanisms you can use to pull of a returned array of values without having to loop through items/cells. It may get slow when you feed the function a large array formula, but in this case it will be fine. The funtion converts a Microsoft Excel name into an object or value, and when we pass it an formula, it thus will return the results. In this particular case it will return an array.




回答2:


I'm not totally clear on what end-result you're looking for, or whether you're okay with a VBA solution, but this is my interpretation:

Function lengths(txt As String) As String
  Dim wrd
  For Each wrd In Split(txt)
    If lengths <> "" Then lengths = lengths & ","
    lengths = lengths & Len(wrd)
  Next wrd
End Function

Paste the code into a VBA module and then, for example, if A1 contained Black Cup With Handle then in another cell you could use =length(A1) which would return 5,3,4,6.




回答3:


If you want to use a VBA code you could try this:

Option Explicit

Sub test()

    Dim arr As Variant
    Dim str As String
    Dim i As Long, j As Long, LastRow As Long

    With ThisWorkbook.Worksheets("Sheet1")

        LastRow = .Cells(.Rows.Count, "A").End(xlUp).Row

        For i = 1 To LastRow

            str = .Range("A" & i).Value

            arr = Split(str, " ")

            For j = LBound(arr) To UBound(arr)
                .Cells(i, j + 3).Value = Len(arr(j))
            Next j

        Next i

    End With

End Sub

Results:



来源:https://stackoverflow.com/questions/58673369/is-there-a-way-to-count-the-number-of-characters-per-word-for-a-string-returnin

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!