Lingualytics
lingualytics.preprocessing.
remove_lessthan
Removes words shorter than a specified length.
s (pd.Series) – A pandas series.
length (int) – The minimum length a word should have.
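The filtering rule described above can be sketched in plain Python. This is only an illustration of the behavior, not the library's implementation: the helper name remove_short_words is hypothetical, it works on a single string rather than a pandas series, and it assumes words are separated by whitespace and that a word of exactly `length` characters is kept.

```python
def remove_short_words(text: str, length: int) -> str:
    """Keep only whitespace-separated words with at least `length` characters."""
    # Assumption: "minimum length" means words of exactly `length` are kept.
    return " ".join(word for word in text.split() if len(word) >= length)


# Apply the rule to each text in a small sample.
samples = ["ye to bahut accha hai", "I am ok"]
cleaned = [remove_short_words(text, 3) for text in samples]
```

With a pandas series, the same rule could presumably be applied element-wise, for example through Series.apply.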
remove_links
Removes links from the text.
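One common way to remove links is a regular expression that matches URL-like tokens. The sketch below is an assumption about how such a step might look, not the pattern the library actually uses; the names LINK_PATTERN and strip_links are hypothetical, and only tokens starting with http://, https://, or www. are treated as links.

```python
import re

# Assumption: a link is any token starting with http://, https:// or www.
LINK_PATTERN = re.compile(r"https?://\S+|www\.\S+")


def strip_links(text: str) -> str:
    """Replace link-like tokens with a space, then collapse extra whitespace."""
    without_links = LINK_PATTERN.sub(" ", text)
    return " ".join(without_links.split())


example = strip_links("check https://example.com/page now")
```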
remove_punctuation
Removes punctuation from the text.
punctuation (str) – All the punctuation characters you want to remove.
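As a rough illustration of the punctuation parameter, the sketch below deletes every character that appears in a given string. It is not the library's code: the helper name strip_punctuation is hypothetical, and defaulting to Python's string.punctuation is an assumption made for the example.

```python
import string


def strip_punctuation(text: str, punctuation: str = string.punctuation) -> str:
    """Delete every character of `punctuation` from `text`."""
    # str.maketrans with a third argument maps those characters to None.
    return text.translate(str.maketrans("", "", punctuation))


default_result = strip_punctuation("Hello, world!")
custom_result = strip_punctuation("a.b,c", punctuation=".")
```

Passing a custom string restricts removal to those characters only, which is why the comma survives in the second call.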
remove_stopwords
Removes stopwords from the text.
stopwords (list of str) – A list of stopwords you want to remove.
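Stopword removal can be sketched as a simple membership test per word. The helper below is hypothetical and operates on one string instead of a pandas series; treating the comparison as case-insensitive and splitting on whitespace are both assumptions, not documented behavior.

```python
from typing import Iterable


def strip_stopwords(text: str, stopwords: Iterable[str]) -> str:
    """Drop whitespace-separated words that appear in `stopwords`."""
    # Assumption: matching ignores case; a set gives fast lookups.
    blocked = {word.lower() for word in stopwords}
    return " ".join(word for word in text.split() if word.lower() not in blocked)


filtered = strip_stopwords("This is a good movie", ["this", "is", "a"])
```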
lingualytics.representation.
get_ngrams
Returns a list of n-grams in descending order of their occurrences.
n (int) – The number of words in each n-gram.
delimiter (str) – The delimiter which separates any two words.
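The idea of counting n-grams and ordering them by frequency can be sketched with collections.Counter. This is an illustrative stand-in, not the library's implementation: the name count_ngrams is hypothetical, the input is a plain list of strings rather than a pandas series, and returning (n-gram, count) pairs is an assumption about the output shape.

```python
from collections import Counter
from typing import List, Tuple


def count_ngrams(texts: List[str], n: int, delimiter: str = " ") -> List[Tuple[str, int]]:
    """Count word n-grams across all texts, most frequent first."""
    counts: Counter = Counter()
    for text in texts:
        # Split on the delimiter and ignore empty pieces from repeated delimiters.
        words = [word for word in text.split(delimiter) if word]
        for start in range(len(words) - n + 1):
            counts[delimiter.join(words[start:start + n])] += 1
    # most_common() sorts by count in descending order.
    return counts.most_common()


bigrams = count_ngrams(["good movie", "good movie good"], n=2)
```

Texts with fewer than n words contribute no n-grams, since the inner range is empty.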
lingualytics.learner.
CustomDataset
Learner
data_dir (str) – Path of the dataset.
output_dir (str) – Path where the trained model and predictions will be saved.
dataset (str) – The dataset to use from the list of available datasets. Set to None to use your own dataset.
lr (float) – The learning rate for training.
num_train_epochs (int) – Number of epochs to train.
train_bs (int) – Batch size for training.
eval_bs (int) – Batch size while evaluating.
model_type (str) – The type of model to use from Huggingface.
model_name (str) – The name of the model to use from Huggingface.
save_steps (int) – Number of epochs to wait before saving the model again.
seed (int) – The random seed, applied everywhere randomness is used.
max_seq_length (int) – The maximum sequence length.
weight_decay (float) – Weight decay for training.
adam_epsilon (float) – Adam epsilon for training.
max_grad_norm (float) – Maximum gradient norm.
device (str) – Forces tensors onto the given device, either ‘cpu’ or ‘gpu’.
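To show how the parameters listed above fit together, the sketch below gathers them into a small dataclass. LearnerSettings is a hypothetical container written for this example, not a class provided by the library, and every value shown is illustrative rather than a documented default.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class LearnerSettings:
    """Hypothetical bundle of the Learner parameters; values are illustrative."""
    data_dir: str
    output_dir: str
    dataset: Optional[str] = None      # None means "use your own dataset"
    lr: float = 5e-5
    num_train_epochs: int = 3
    train_bs: int = 16
    eval_bs: int = 16
    model_type: str = "bert"
    model_name: str = "bert-base-multilingual-cased"
    save_steps: int = 1
    seed: int = 42
    max_seq_length: int = 128
    weight_decay: float = 0.0
    adam_epsilon: float = 1e-8
    max_grad_norm: float = 1.0
    device: Optional[str] = None       # 'cpu' or 'gpu' to force a device


settings = LearnerSettings(data_dir="data", output_dir="output")
keyword_arguments = asdict(settings)   # plain dict of parameter names to values
```

If the Learner constructor accepts these names as keyword arguments, as the parameter list suggests, a dictionary like keyword_arguments could be unpacked into it; that calling convention is an assumption here.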
acc_and_f1
collate
convert_examples_to_features
download_dataset
evaluate
fit
Downloads and fine-tunes the model on the dataset.
get_labels
load_and_cache_examples
read_examples_from_file
set_seed
setup_model
simple_accuracy
train