Question
I am trying to write a unittest for a function that utilizes a generator. Below is my code:
import re

def extract_data(body):
    for i in body:
        a = re.sub('<[^<]+?>', '', str(i))  # strip HTML tags
        b = re.sub('view\xc2\xa0book\xc2\xa0info', '', str(a))  # drop the link text
        c = re.sub('key', '', str(b))
        d = re.sub('\xc2', ' ', str(c))  # replace a stray \xc2 with a space
        e = re.sub('\xa0', '', str(d))  # drop a stray \xa0
        yield e
My unittest code:
def test_extract_data(self):
    sample_input = ['<tr><h1>keyThis</h1><h2>\xc2</h2><h3>\xa0</h3><h4>view\xc2\xa0book\xc2\xa0info</h4><h5>Test Passes</h5></tr>']
    expected_res = 'This Test Passes'
    res = extract_data(sample_input)
    self.assertEqual(expected_res, res)
This test passes without issue if the extract_data function uses a return instead of yield. How do I write the test for the generator?
Answer 1:
I figured out what I needed to do: convert res into a list. That was it, a lot simpler than I expected. This is what it looks like now:
import unittest

class TestScrapePage(unittest.TestCase):
    def test_extract_data(self):
        sample_input = ['<tr><h1>keyThis</h1><h2>\xc2</h2><h3>\xa0</h3><h4>view\xc2\xa0book\xc2\xa0info</h4><h5>Test Passes</h5></tr>']
        expected_res = ['This Test Passes']
        # Drain the generator into a list so it can be compared directly.
        res = list(extract_data(sample_input))
        self.assertEqual(expected_res, res)

if __name__ == '__main__':
    unittest.main()
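One thing that may be worth keeping in mind with the list() approach: a generator can only be consumed once, so it is usually safest to convert it a single time and run every assertion against the resulting list. A minimal sketch, using a hypothetical letters() generator that is not part of the original code:

```python
def letters():
    """Hypothetical generator, used only for illustration."""
    yield "a"
    yield "b"

gen = letters()
first_pass = list(gen)   # consumes the generator
second_pass = list(gen)  # the generator is already exhausted

print(first_pass)   # ['a', 'b']
print(second_pass)  # []
```

If a test needs the values more than once, keeping the list around avoids silently comparing against an empty result.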
Answer 2:
Your code, slightly altered to not require unittest:
import re

def extract_data(body):
    for i in body:
        a = re.sub('<[^<]+?>', '', str(i))
        b = re.sub('view\xc2\xa0book\xc2\xa0info', '', str(a))
        c = re.sub('key', '', str(b))
        d = re.sub('\xc2', ' ', str(c))
        e = re.sub('\xa0', '', str(d))
        yield e

def test_extract_data():
    sample_input = ['<tr><h1>keyThis</h1><h2>\xc2</h2><h3>\xa0</h3><h4>view\xc2\xa0book\xc2\xa0info</h4><h5>Test Passes</h5></tr>']
    expected_res = 'This Test Passes'
    res = extract_data(sample_input)
    return expected_res == res

print(test_extract_data())
This prints False.
The problem is that when you use return, the function, in your case, returns a str. However, when you use yield, calling the function returns a generator object, and calling next() on that object returns a str. So, for example:
import re

def extract_data(body):
    for i in body:
        a = re.sub('<[^<]+?>', '', str(i))
        b = re.sub('view\xc2\xa0book\xc2\xa0info', '', str(a))
        c = re.sub('key', '', str(b))
        d = re.sub('\xc2', ' ', str(c))
        e = re.sub('\xa0', '', str(d))
        yield e

def test_extract_data():
    sample_input = ['<tr><h1>keyThis</h1><h2>\xc2</h2><h3>\xa0</h3><h4>view\xc2\xa0book\xc2\xa0info</h4><h5>Test Passes</h5></tr>']
    expected_res = 'This Test Passes'
    res = extract_data(sample_input)
    return expected_res == next(res)

print(test_extract_data())
This prints True.
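Note that next() only pulls one value at a time, so a test built on a single next() call checks just the first item. A small sketch of how next() behaves as a generator runs out, using a hypothetical greetings() generator rather than the code from the question:

```python
def greetings():
    """Hypothetical generator, used only for illustration."""
    yield "hello"
    yield "world"

gen = greetings()
first = next(gen)           # first yielded value
second = next(gen)          # second yielded value
leftover = next(gen, None)  # exhausted: the default is returned instead

# Without a default, next() on an exhausted generator raises StopIteration.
try:
    next(gen)
    raised = False
except StopIteration:
    raised = True

print(first, second, leftover, raised)
```

In a test, passing a default to next() can make the "no more items" case explicit instead of surfacing as an unexpected StopIteration.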
To illustrate, at the Python command prompt:
>>> type("hello")
<class 'str'>
>>> def gen():
...     yield "hello"
...
>>> type(gen())
<class 'generator'>
Your other option (possibly better, depending on your use case) is to test that all of the results of the generator are correct by converting the generator object's results into a list or tuple and then comparing for equality:
import re

def extract_data(body):
    for i in body:
        a = re.sub('<[^<]+?>', '', str(i))
        b = re.sub('view\xc2\xa0book\xc2\xa0info', '', str(a))
        c = re.sub('key', '', str(b))
        d = re.sub('\xc2', ' ', str(c))
        e = re.sub('\xa0', '', str(d))
        yield e

def test_extract_data():
    sample_input = [
        '<tr><h1>keyThis</h1><h2>\xc2</h2><h3>\xa0</h3><h4>view\xc2\xa0book\xc2\xa0info</h4><h5>Test Passes</h5></tr>',
        '<tr><h1>keyThis</h1><h2>\xc2</h2><h3>\xa0</h3><h4>view\xc2\xa0book\xc2\xa0info</h4><h5>Test Passes Too!</h5></tr>',
    ]
    expected_res = ['This Test Passes', 'This Test Passes Too!']
    res = extract_data(sample_input)
    return expected_res == list(res)

print(test_extract_data())
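Converting to a list assumes the generator finishes. If the generator under test is very long or never ends, one possible variation is to compare only a fixed-size prefix with itertools.islice. The squares() generator below is a hypothetical example, not part of the original code:

```python
from itertools import islice

def squares():
    """Hypothetical unbounded generator, used only for illustration."""
    n = 0
    while True:
        yield n * n
        n += 1

# list(squares()) would never finish, so take only the first five items.
first_five = list(islice(squares(), 5))
print(first_five == [0, 1, 4, 9, 16])
```

For a finite generator like extract_data, plain list() as shown above is simpler and checks every result.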
Source: https://stackoverflow.com/questions/37956435/write-unittest-for-function-with-yield