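A Word document contains formatted text wrapped in three levels of objects: run objects at the lowest level, paragraph objects in the middle, and the document object at the top. Because of this structure, a plain text editor cannot work with these files. We can, however, manipulate Word documents in Python with the python-docx module.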
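1. The first step is to install the third-party module python-docx. You can install it with pip ("pip install python-docx") or download it from the project's GitHub repository.
2. After installing, import "docx", NOT "python-docx".
3. Use the "docx.Document" class to start working with a Word document.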
Code #1:
    # import docx, NOT python-docx
    import docx

    # create an instance of a Word document
    doc = docx.Document()

    # add a heading of level 0 (the largest heading)
    doc.add_heading('Heading for the document', 0)

    # add a paragraph and store the object in a variable
    doc_para = doc.add_paragraph('Your paragraph goes here, ')

    # add runs, i.e. stretches of text with their own style
    # (bold, italic, underline, etc.)
    doc_para.add_run('hey there, bold here').bold = True
    doc_para.add_run(', and ')
    doc_para.add_run('these words are italic').italic = True

    # add a page break to start a new page
    doc.add_page_break()

    # add a heading of level 2
    doc.add_heading('Heading level 2', 2)

    # pictures can also be added to the Word document;
    # the width argument is optional
    doc.add_picture('path_to_picture')

    # now save the document to a location
    doc.save('path_to_document')
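Output:
Note the page break before the second page.

Code #2: To open an existing Word document, create a Document instance and pass it the path to the file.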
    # import the Document class from the docx module
    from docx import Document

    # create an instance of the Word document we want to open
    doc = Document('path_to_the_document')

    # print the list of paragraphs in the document
    print('List of paragraph objects:->>>')
    print(doc.paragraphs)

    # print the list of runs in a specified paragraph
    print('List of runs objects in 1st paragraph:->>>')
    print(doc.paragraphs[0].runs)

    # print the text in a paragraph
    print('Text in the 1st paragraph:->>>')
    print(doc.paragraphs[0].text)

    # print the complete document
    print('The whole content of the document:->>>')
    for para in doc.paragraphs:
        print(para.text)
Output:
    List of paragraph objects:->>>
    [<docx.text.paragraph.Paragraph object at 0x7f45b22dc128>, <docx.text.paragraph.Paragraph object at 0x7f45b22dc5c0>, <docx.text.paragraph.Paragraph object at 0x7f45b22dc0b8>, <docx.text.paragraph.Paragraph object at 0x7f45b22dc198>, <docx.text.paragraph.Paragraph object at 0x7f45b22dc0f0>]
    List of runs objects in 1st paragraph:->>>
    [<docx.text.run.Run object at 0x7f45b22dc198>]
    Text in the 1st paragraph:->>>
    Heading for the document
    The whole content of the document:->>>
    Heading for the document
    Your paragraph goes here, hey there, bold here, and these words are italic
    Heading level 2
Reference: https://python-docx.readthedocs.io/en/latest/#user-guide