Generating epub files with Python: converting 'epub' files to text

Date: 2021-02-19 04:21:36

*** IMPORTANT ***

No further development will occur in this package, as it has been superseded by the actively maintained (and quite spiffy!) epubr package.

pubcrawl

Convert ‘epub’ Files to Text

Description

The ‘epub’ file format is really just a structured ‘ZIP’ archive with metadata, graphics and (usually) ‘HTML’ text. Tools are provided to turn an ‘epub’ file into a tidy data frame.
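Because an epub is a ZIP archive, its contents can be inspected with base R alone. The sketch below is illustrative and not part of pubcrawl: `is_html_entry()` and `list_epub_html()` are hypothetical helper names, and the file-extension pattern is an assumption about how chapter files are usually named.

```r
# Hypothetical helpers (not part of pubcrawl): peek inside an epub using
# base R's ZIP support.

# TRUE for archive entries whose names look like HTML/XHTML documents.
# Assumption: chapter files end in .html, .htm, .xhtml or .xhtm.
is_html_entry <- function(entry_names) {
  grepl("\\.x?html?$", entry_names, ignore.case = TRUE)
}

# List the HTML-like entries of an epub without extracting anything.
list_epub_html <- function(epub_path) {
  entries <- utils::unzip(epub_path, list = TRUE)  # columns: Name, Length, Date
  entries[is_html_entry(entries$Name), , drop = FALSE]
}

# Usage (the path is a placeholder):
# list_epub_html("path/to/book.epub")
```

Listing with `list = TRUE` reads only the archive index, so it is a cheap way to check what a given epub contains before converting it.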

What’s Inside The Tin

The following functions are implemented:

epub_to_text: Convert an epub file into a data frame of plaintext chapters

NOTE

There are edge cases I’ve totally not covered yet. Feel free to jump in and make this a real, useful package!

TODO

Refactor so there aren’t so many heavy dependencies

Try to get hgr on CRAN so it’s not a GH dep. Moved the cleaner code into here

Better docs

Embed some epubs for examples and tests

Set up Travis, Appveyor, code coverage

Installation

devtools::install_github("hrbrmstr/pubcrawl")

Usage

library(pubcrawl)

library(tidyverse)

# current version

packageVersion("pubcrawl")

## [1] '0.1.0'

An O’Reilly epub

epub_to_text("~/Data/R Packages.epub")

## # A tibble: 26 x 4

## path size date content

##

## 1 OEBPS/cover.html 315 -03-24 21:49:16 Cover

## 2 OEBPS/titlepage01.html 466 -03-24 21:49:16 "R Packages\n\nHadley Wickham"

## 3 OEBPS/copyright-page01.html 3286 -03-24 21:49:16 "R Packages\n\nby Hadley Wickham\n\n\n\nPrinted in the Unite…

## 4 OEBPS/toc01.html 17557 -03-24 21:49:16 "navPrefaceIn This Book\n\nConventions Used in This Book\n\nU…

## 5 OEBPS/preface01.html 17784 -03-24 21:49:16 "Preface\n\n\nIn This Book\n\nThis book will guide you from b…

## 6 OEBPS/part01.html 444 -03-24 21:49:16 Getting Started

## 7 OEBPS/ch01.html 1 -03-24 21:49:16 "Introduction\n\nIn R, the fundamental unit of shareable code…

## 8 OEBPS/ch02.html 28633 -03-24 21:49:18 "Package Structure\n\nThis chapter will start you on the road…

## 9 OEBPS/part02.html 454 -03-24 21:49:18 Package Components

## 10 OEBPS/ch03.html 28629 -03-24 21:49:18 "R Code\n\nThe first principle of using a package is that all…

## # ... with 16 more rows

A Project Gutenberg epub that comes with the package

epub_to_text(system.file("extdat", "augustine.epub", package="pubcrawl")) %>%
  mutate(path = abbreviate(path))

## # A tibble: 10 x 4

## path size date content

##

## 1 OEBPS/@@@@@@@3296@3296-@3296--0 63804 -10-02 07:00:00 "THE CONFESSIONS\nOF\nSAINT AUGUSTINE\n\nBy Saint Augusti…

## 2 OEBPS/@@@@@@@3296@3296-@3296--1 68504 -10-02 07:00:00 "BOOK III\nTo Carthage I came, where there sang all aroun…

## 3 OEBPS/@@@@@@@3296@3296-@3296--2 80192 -10-02 07:00:00 "BOOK V\nAccept the sacrifice of my confessions from the …

## 4 OEBPS/@@@@@@@3296@3296-@3296--3 51898 -10-02 07:00:00 "O crooked paths! Woe to the audacious soul, which hoped,…

## 5 OEBPS/@@@@@@@3296@3296-@3296--4 80194 -10-02 07:00:00 "Anubis, barking Deity, and all The monster Gods …

## 6 OEBPS/@@@@@@@3296@3296-@3296--5 80718 -10-02 07:00:00 "The boy then being stilled from weeping, Euodius took up…

## 7 OEBPS/@@@@@@@3296@3296-@3296--6 65956 -10-02 07:00:00 "And Thou knowest how far Thou hast already changed me, w…

## 8 OEBPS/@@@@@@@3296@3296-@3296--7 57022 -10-02 07:00:00 "BOOK XII\nMy heart, O Lord, touched with the words of Th…

## 9 OEBPS/@@@@@@@3296@3296-@3296--8 69513 -10-02 07:00:00 "BOOK XIII\nI call upon Thee, O my God, my mercy, Who cre…

## 10 OEBPS/@@@@@@@3296@3296-@3296--9 21223 -10-02 07:00:00 "The Confessions of Saint Augustine, by Saint Augustine\n…
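Since the result is an ordinary data frame with one row per file, follow-up analysis is plain data-frame work. A minimal sketch, assuming only the `path` and `content` columns shown in the output above; the `chapters` data frame is a hand-made stand-in copied from the first two rows of the O’Reilly example (not a real call to the package), and `count_words()` is a hypothetical helper.

```r
# Stand-in for an epub_to_text() result (hand-made, for illustration only).
chapters <- data.frame(
  path = c("OEBPS/cover.html", "OEBPS/titlepage01.html"),
  content = c("Cover", "R Packages\n\nHadley Wickham"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count whitespace-separated words in each string.
count_words <- function(text) {
  lengths(strsplit(trimws(text), "\\s+"))
}

# Add a per-file word count and show it next to the path.
chapters$words <- count_words(chapters$content)
chapters[, c("path", "words")]
```

With a real result, the same two lines at the end should apply unchanged, since they rely only on the `content` column.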

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
