Metadata-Version: 2.1
Name: striprtf
Version: 0.0.28
Summary: A simple library to convert rtf to text
Home-page: https://github.com/joshy/striprtf
Author: Joshy Cyriac
Author-email: joshy@posteo.ch
License: BSD-3-Clause
Download-URL: https://github.com/joshy/striprtf/archive/v0.0.28.tar.gz
Keywords: rtf
Platform: UNKNOWN
Classifier: License :: OSI Approved :: BSD License
Description-Content-Type: text/markdown
License-File: LICENSE
# striprtf

## Purpose
This is a simple library to convert Rich Text Format (RTF) files to Python strings.
Many medical documents are written in RTF, which is not ideal for parsing
and further processing. This library converts it to plain old text.
## How to use it
```python
from striprtf.striprtf import rtf_to_text
rtf = "some rtf encoded string"
text = rtf_to_text(rtf)
print(text)
```
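To give an idea of what "converting to text" involves, here is a toy sketch. The function `toy_rtf_to_text` is hypothetical and is not striprtf's implementation. It drops control words such as `\b` and group braces, keeps literal text, and turns `\par` into a newline. It deliberately ignores hex escapes, Unicode escapes and destination groups such as font tables, all of which a real converter has to handle.

```python
import re

# One token per match: a control word (\b, \b0, \par, plus its optional
# delimiting space), an escaped symbol (\{ \} \\), a brace, or a run of text.
TOKEN = re.compile(r"\\([a-z]+)(-?\d+)? ?|\\([{}\\])|([{}])|([^\\{}]+)")


def toy_rtf_to_text(rtf: str) -> str:
    """Hypothetical toy converter for illustration; not striprtf's code."""
    out = []
    for word, _arg, escaped, _brace, text in TOKEN.findall(rtf):
        if word == "par":
            out.append("\n")  # paragraph break becomes a newline
        elif escaped:
            out.append(escaped)  # \{, \} and \\ stand for literal characters
        elif text:
            # bare CR/LF in the RTF source are not part of the document text
            out.append(text.replace("\r", "").replace("\n", ""))
        # all other control words and the braces only carry structure: dropped
    return "".join(out)


print(toy_rtf_to_text(r"{\rtf1\ansi Hello \b world\b0 !\par}"))  # Hello world!
```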
If you want to use an encoding other than the default `cp1252`, pass it via the `encoding`
parameter. It is only taken into account if the document does not set an explicit codepage (in RTF, the `\ansicpg` control word).
```python
from striprtf.striprtf import rtf_to_text
rtf = "some rtf encoded string in latin1"
text = rtf_to_text(rtf, encoding="latin-1")
print(text)
```
Sometimes a `UnicodeDecodeError` is raised, for example when the document contains bytes that are not valid in the selected codepage.
In this case you can try to relax decoding like this:
```python
from striprtf.striprtf import rtf_to_text
rtf = "some rtf encoded string"
text = rtf_to_text(rtf, errors="ignore")
print(text)
```
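What "relaxing" means can be seen with Python's own codecs. This sketch assumes that striprtf hands its `errors` argument on to `bytes.decode` (an assumption, not confirmed here), in which case `errors="ignore"` silently drops bytes the codepage cannot map instead of raising. Python's `cp1252` codec leaves byte `0x81` undefined, which makes a convenient test case. Under the same assumption, Python's `"replace"` handler would keep a visible U+FFFD marker instead of losing the character without a trace.

```python
raw = b"caf\xe9\x81"  # 0xE9 is "é" in cp1252; 0x81 is undefined there

try:
    raw.decode("cp1252")  # the default errors="strict" raises
except UnicodeDecodeError as exc:
    print("strict failed:", exc)

print(raw.decode("cp1252", errors="ignore"))   # café
print(raw.decode("cp1252", errors="replace"))  # café followed by U+FFFD
```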
## Online version
If you don't want to install it, or just want to try it out, there is an [online version](https://striprtf.dev) available.
## PostgreSQL
There is also a [PostgreSQL version](https://github.com/MnhnL/pg_striprtf) available from [Raffael Mancini](https://github.com/raffael-mnhn).
## History
[Pyth](https://github.com/brendonh/pyth) did not work for the RTF files I
had. The next best thing was this gist:
https://gist.github.com/gilsondev/7c1d2d753ddb522e7bc22511cfb08676
~~Very few additions were made, e.g. better formatting of tables.~~
In the meantime some encoding bugs have been fixed. :-)