I have this code
site = hxs.select("//h1[@class='state']")
log.msg(str(site[0].extract()),level=log.ERROR)
The output is
//h1[@class='state']
Your XPath selects the h1 element whose class attribute is state,
so extract() returns the whole h1 element, including everything inside it.
If you only want the text of the h1 tag itself, use
//h1[@class='state']/text()
If you want the text of the h1 tag as well as the text of the tags nested inside it, use
//h1[@class='state']//text()
So the difference is: /text() selects only the text directly inside the tag, while //text() selects the text of the tag and of every tag nested inside it.
The code below should work for you:
site = ''.join(hxs.select("//h1[@class='state']/text()").extract()).strip()
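To see the difference between /text() and //text() without running a spider, here is a rough standard-library sketch. It is an analogy only: ElementTree's limited XPath has no text() node test, so the two helpers (direct_text and all_text, both made-up names) imitate what each expression would return. The sample markup is also invented, since the question does not show the real h1.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the real page's h1 is not shown in the question.
SAMPLE = "<div><h1 class='state'>Open <strong>since Monday</strong> (verified)</h1></div>"

root = ET.fromstring(SAMPLE)
h1 = root.find(".//h1[@class='state']")


def direct_text(elem):
    """Text nodes that are immediate children of elem (roughly /text())."""
    parts = [elem.text or ""]
    # Text that follows a child tag still belongs directly to elem.
    parts += [child.tail or "" for child in elem]
    return [p for p in parts if p]


def all_text(elem):
    """Text nodes of elem and all of its descendants (roughly //text())."""
    return list(elem.itertext())


direct = direct_text(h1)
everything = all_text(h1)

print(direct)       # text sitting directly inside the h1
print(everything)   # also includes the text inside <strong>
print("".join(everything).strip())
```

Joining and stripping the pieces, as in the last line, mirrors the ''.join(...).strip() step in the answer above.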