Beautiful Soup: Relational Selection Methods

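Beautiful Soup is a Python library for parsing web pages. It is quick to use, supports several parsers, and is feature-rich. This tutorial covers advanced use of Beautiful Soup, node selectors, CSS selectors, and the method selectors of Beautiful Soup 4, and serves as a foundation for learning web scraping.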

Learning objectives

Learn how to use the relational selection attributes

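1. Relational selection

Sometimes the node you want cannot be selected in a single step. One example is the second a node in the sample document used below: soup.a only ever returns the first a node.

In such cases, first select a node you can reach directly, then use it as a reference point to select its child nodes, parent node, sibling nodes, and so on. The following sections show how to select each of these.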
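As a minimal sketch of this idea, the snippet below reaches the second a node by starting from the first one and walking its following siblings. The shortened HTML string and the variable names are illustrative assumptions, and Python's built-in html.parser is used instead of lxml so that the snippet runs without extra dependencies.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Shortened, hypothetical sample document
html = "<p>names: <a>Elsie</a>, <a>Lacie</a> and <a>Tillie</a>.</p>"
soup = BeautifulSoup(html, "html.parser")

first_a = soup.a  # attribute access only returns the first match

# Walk the nodes that follow first_a and keep the first one that is an <a> tag,
# skipping the text nodes in between.
second_a = next(
    node for node in first_a.next_siblings
    if isinstance(node, Tag) and node.name == "a"
)
print(second_a.get_text())  # Lacie
```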

1. Child nodes

Format: soup.tag.contents
Return value: list

Example:

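# Sample document (markup reconstructed from the printed output; attributes omitted)
html = '''
<html>
<head><title>The Dormouse's story</title></head>
<body>
<p>Once upon a time there were three little sisters; and their names were
<a><span>Elsie</span></a>,
<a>Lacie</a> and
<a>Tillie</a>;
and they lived at the bottom of a well.</p>
<p>...</p>
</body>
</html>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
# Get the direct children of the first p node
print(soup.p.contents)

# Output
['Once upon a time there were three little sisters; and their names were\n', <a><span>Elsie</span></a>, ',\n', <a>Lacie</a>, ' and\n', <a>Tillie</a>, ';\nand they lived at the bottom of a well.']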

Format: soup.tag.children
Return value: iterator

Example:

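# Sample document (markup reconstructed from the printed output; attributes omitted)
html = '''
<html>
<head><title>The Dormouse's story</title></head>
<body>
<p>Once upon a time there were three little sisters; and their names were
<a><span>Elsie</span></a>,
<a>Lacie</a> and
<a>Tillie</a>;
and they lived at the bottom of a well.</p>
<p>...</p>
</body>
</html>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
# Iterate over each direct child of the first p node
print(soup.p.children)
for i, child in enumerate(soup.p.children):
    print(i, child)

# Output
# (the first line is the iterator object itself; its exact repr depends on the bs4 version)
0 Once upon a time there were three little sisters; and their names were

1 <a><span>Elsie</span></a>
2 ,

3 <a>Lacie</a>
4  and

5 <a>Tillie</a>
6 ;
and they lived at the bottom of a well.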
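The difference between contents and children can be sketched as follows: contents is a list that supports len() and indexing, while children is consumed by iteration. The shortened HTML string and the variable names are illustrative assumptions, and html.parser is used here so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Shortened, hypothetical sample document
html = "<p>Once <a>Elsie</a>, <a>Lacie</a> and <a>Tillie</a>;</p>"
soup = BeautifulSoup(html, "html.parser")
p = soup.p

# contents is a real list: four text nodes and three <a> tags.
print(len(p.contents))  # 7

# children yields the same nodes, one at a time.
same_nodes = list(p.children) == p.contents
print(same_nodes)  # True

# Keep only element children, dropping the text nodes between them.
tag_children = [child for child in p.children if isinstance(child, Tag)]
print([t.get_text() for t in tag_children])  # ['Elsie', 'Lacie', 'Tillie']
```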

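2. Descendant nodes

The attributes above return only the direct children of the p node. To get every node nested anywhere inside it, use the descendants attribute.

Format: soup.tag.descendants
Return value: generator

Example: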

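# Sample document (markup reconstructed from the printed output; attributes omitted)
html = '''
<html>
<head><title>The Dormouse's story</title></head>
<body>
<p>Once upon a time there were three little sisters; and their names were
<a><span>Elsie</span></a>,
<a>Lacie</a> and
<a>Tillie</a>;
and they lived at the bottom of a well.</p>
<p>...</p>
</body>
</html>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
# Iterate over every descendant of the first p node
print(soup.p.descendants)
for i, child in enumerate(soup.p.descendants):
    print(i, child)

# Output
<generator object ...>
0 Once upon a time there were three little sisters; and their names were

1 <a><span>Elsie</span></a>
2 <span>Elsie</span>
3 Elsie
4 ,

5 <a>Lacie</a>
6 Lacie
7  and

8 <a>Tillie</a>
9 Tillie
10 ;
and they lived at the bottom of a well.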
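To make the contrast with children concrete, the following sketch counts both for the same node. The shortened HTML string and the variable names are illustrative assumptions; html.parser stands in for lxml so the snippet has no extra dependencies.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Shortened, hypothetical sample document with one nested <span>
html = "<p>Hi <a><span>Elsie</span></a>, <a>Lacie</a></p>"
soup = BeautifulSoup(html, "html.parser")
p = soup.p

# Direct children only: 'Hi ', <a>, ', ', <a>
children = list(p.children)
print(len(children))  # 4

# Descendants also include everything nested inside those children,
# visited in document order.
descendants = list(p.descendants)
print(len(descendants))  # 7

# Names of the element descendants, skipping text nodes.
tag_names = [node.name for node in descendants if isinstance(node, Tag)]
print(tag_names)  # ['a', 'span', 'a']
```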

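3. Parent node

So far we have selected child and descendant nodes. Next, we use the parent attribute to get the parent of a node.

Format: soup.tag.parent
Return value: node element

Example: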

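# Sample document (markup reconstructed from the printed output; attributes omitted)
html = '''
<html>
<head><title>The Dormouse's story</title></head>
<body>
<p>Once upon a time there were three little sisters; and their names were
<a><span>Elsie</span></a>,
<a>Lacie</a> and
<a>Tillie</a>;
and they lived at the bottom of a well.</p>
<p>...</p>
</body>
</html>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
# Print the selected a node
print(soup.a)
# Get the parent of the a node
print(soup.a.parent)

# Output
# The selected a node
<a><span>Elsie</span></a>
# The parent of the selected a node
<p>Once upon a time there were three little sisters; and their names were
<a><span>Elsie</span></a>,
<a>Lacie</a> and
<a>Tillie</a>;
and they lived at the bottom of a well.</p>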
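Because parent returns an ordinary node, it can be chained to climb several levels. The sketch below assumes a shortened, hypothetical document and uses html.parser; it also shows that the top-level BeautifulSoup object has no parent.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample document
html = "<html><body><p><a><span>Elsie</span></a></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

span = soup.span
print(span.parent.name)          # a
print(span.parent.parent.name)   # p

# The outermost tag's parent is the BeautifulSoup object itself,
# and that object's own parent is None.
print(soup.html.parent is soup)  # True
print(soup.parent)               # None
```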

4. Ancestor nodes

To get all ancestors of a node, not just its direct parent, use the parents attribute.

Format: soup.tag.parents
Return value: generator

Example:

# Sample document (markup reconstructed from the printed output; attributes omitted)
html = '''
<html>
<head><title>The Dormouse's story</title></head>
<body>
<p>Once upon a time there were three little sisters; and their names were
<a><span>Elsie</span></a>,
<a>Lacie</a> and
<a>Tillie</a>;
and they lived at the bottom of a well.</p>
<p>...</p>
</body>
</html>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
# Get all ancestors of the a node
print(soup.a.parents)
# Print the type of the returned object
print(type(soup.a.parents))
# Print the content of every ancestor
print(list(enumerate(soup.a.parents)))

# Output (the list is abbreviated here; each ancestor is actually printed in full)
<generator object ...>
<class 'generator'>
[(0, <p>Once upon a time there were three little sisters; and their names were
<a><span>Elsie</span></a>,
<a>Lacie</a> and
<a>Tillie</a>;
and they lived at the bottom of a well.</p>),
 (1, <body> ... </body>),
 (2, <html> ... </html>),
 (3, <html> ... </html>)]

The entries run from the nearest ancestor outward: the p node, the body node, the html node, and finally the BeautifulSoup object that represents the whole document.
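Printing whole ancestors quickly becomes hard to read, so a common alternative is to print only their tag names. The following sketch assumes a shortened, hypothetical document and uses html.parser; find_parent is shown as one way to stop at a specific ancestor.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample document
html = "<html><body><p>Once <a>Elsie</a></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Collect just the names, from the nearest ancestor outward.
# The BeautifulSoup object itself reports the name '[document]'.
ancestor_names = [ancestor.name for ancestor in soup.a.parents]
print(ancestor_names)  # ['p', 'body', 'html', '[document]']

# find_parent returns the closest ancestor that matches a tag name.
body = soup.a.find_parent("body")
print(body.name)  # body
```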

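5. Sibling nodes

The sections above covered child and parent nodes. To get nodes at the same level, use the sibling attributes described below.

Get the next node
Format: soup.tag.next_sibling
Return value: a node (a tag or a text node)

Example: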

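# Sample document (markup reconstructed from the printed output; attributes omitted)
html = '''
<html>
<head><title>The Dormouse's story</title></head>
<body>
<p>Once upon a time there were three little sisters; and their names were
<a><span>Elsie</span></a>,
<a>Lacie</a> and
<a>Tillie</a>;
and they lived at the bottom of a well.</p>
<p>...</p>
</body>
</html>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
# Get the node immediately after the a node
print(soup.a.next_sibling)
# Print its type
print(type(soup.a.next_sibling))

# Output
,

<class 'bs4.element.NavigableString'>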

Get all following nodes
Format: soup.tag.next_siblings
Return value: generator

Example:

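# Get all nodes after the a node
print(soup.a.next_siblings)
# Print the type
print(type(soup.a.next_siblings))
# Print all of their contents
print(list(enumerate(soup.a.next_siblings)))

# Output
<generator object ...>
<class 'generator'>
[(0, ',\n'),
 (1, <a>Lacie</a>),
 (2, ' and\n'),
 (3, <a>Tillie</a>),
 (4, ';\nand they lived at the bottom of a well.')]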

Get the previous node
Format: soup.tag.previous_sibling
Return value: a node (a tag or a text node)

Example:

# Get the node immediately before the a node
print(soup.a.previous_sibling)
# Print its type
print(type(soup.a.previous_sibling))

# Output
Once upon a time there were three little sisters; and their names were

<class 'bs4.element.NavigableString'>

Get all preceding nodes
Format: soup.tag.previous_siblings
Return value: generator

Example:

# Get all nodes before the a node
print(soup.a.previous_siblings)
# Print the type
print(type(soup.a.previous_siblings))
# Print all of their contents
print(list(enumerate(soup.a.previous_siblings)))

# Output
<generator object ...>
<class 'generator'>
[(0, 'Once upon a time there were three little sisters; and their names were\n')]
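As the outputs above show, the sibling attributes return text nodes as well as tags. When only element siblings are wanted, one option is to filter by type, or to use the find_next_siblings and find_previous_siblings methods. The sketch below assumes a shortened, hypothetical document and uses html.parser.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Shortened, hypothetical sample document
html = "<p>Once <a>Elsie</a>, <a>Lacie</a> and <a>Tillie</a>;</p>"
soup = BeautifulSoup(html, "html.parser")
first_a = soup.a

# Filter the generator by hand, keeping only tags.
next_tags = [node for node in first_a.next_siblings if isinstance(node, Tag)]
print([t.get_text() for t in next_tags])  # ['Lacie', 'Tillie']

# find_next_siblings("a") performs the same filtering by tag name.
same_result = first_a.find_next_siblings("a") == next_tags
print(same_result)  # True

# Going backwards from the last <a>, the nearest sibling comes first.
last_a = next_tags[-1]
previous_names = [t.get_text() for t in last_a.find_previous_siblings("a")]
print(previous_names)  # ['Lacie', 'Elsie']
```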

2. Summary

Relational selection attributes of the node selector:

Child nodes
- soup.tag.contents
- soup.tag.children
Descendant nodes
- soup.tag.descendants
Parent node
- soup.tag.parent
Ancestor nodes
- soup.tag.parents
Sibling nodes
- soup.tag.next_sibling
- soup.tag.next_siblings
- soup.tag.previous_sibling
- soup.tag.previous_siblings
