polyglot.tokenize package

Submodules

polyglot.tokenize.base module

Basic text segmenters.

class polyglot.tokenize.base.Breaker(locale)

Bases: object

Base class to segment text.

transform(sequence)

class polyglot.tokenize.base.SentenceTokenizer(locale='en')

Bases: polyglot.tokenize.base.Breaker

Segment text into sentences.

class polyglot.tokenize.base.WordTokenizer(locale='en')

Bases: polyglot.tokenize.base.Breaker

Segment text into words or tokens.
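
The class layout above (one base class exposing transform, with a sentence-level and a word-level subclass) can be illustrated with a small stand-alone sketch. This is not polyglot's implementation and does not import polyglot: every name prefixed with Sketch is hypothetical, the regular expressions are simplistic assumptions, and the real transform may accept and return a different type than the plain strings and lists used here.

```python
import re


class SketchBreaker:
    """Hypothetical stand-in for a Breaker-style base class."""

    # Subclasses supply a compiled regex that matches one segment.
    pattern = None

    def __init__(self, locale):
        # Stored only to mirror the documented constructor; unused in this sketch.
        self.locale = locale

    def transform(self, text):
        """Return the non-empty segments of ``text`` as a list of strings."""
        if self.pattern is None:
            raise NotImplementedError("subclasses must define `pattern`")
        segments = (m.group().strip() for m in self.pattern.finditer(text))
        return [s for s in segments if s]


class SketchSentenceTokenizer(SketchBreaker):
    # Assumption: a sentence is a run of non-terminators plus trailing . ! or ?
    pattern = re.compile(r"[^.!?]+[.!?]*")

    def __init__(self, locale="en"):
        super().__init__(locale)


class SketchWordTokenizer(SketchBreaker):
    # Assumption: a token is a run of word characters or one punctuation mark.
    pattern = re.compile(r"\w+|[^\w\s]")

    def __init__(self, locale="en"):
        super().__init__(locale)


text = "Hello world. How are you?"
sentences = SketchSentenceTokenizer().transform(text)
words = SketchWordTokenizer().transform(text)
print(sentences)
print(words)
```

Keeping the segmentation rule as the only thing a subclass changes means both tokenizers share one transform code path, which is the design the documented inheritance from Breaker suggests.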

Module contents

class polyglot.tokenize.WordTokenizer(locale='en')

Bases: polyglot.tokenize.base.Breaker

Segment text into words or tokens.

class polyglot.tokenize.SentenceTokenizer(locale='en')

Bases: polyglot.tokenize.base.Breaker

Segment text into sentences.