[Python/AI/ASR] Kospeech - Error Fixes

Run the following command from kospeech_env/kospeech:

python ./bin/main.py model=conformer-small train=conformer_small_train train.dataset_path=/home/donghwan/kospeech_env/KsponSpeech

From here on, it is an endless round of error fixing.

1. Hydra version compatibility issue (warning)

In main.py, change the following line

@hydra.main(config_path=os.path.join('..', "configs"), config_name="train")

to this:

@hydra.main(config_path=os.path.join('..', "configs"), config_name="train", version_base="1.1")

2. ConfigStore schema issue (warning)

To avoid a schema conflict for audio/fbank and model/conformer-small, register the configs under more specific names.

In main.py, change these two lines

cs.store(group="audio", name="fbank", node=FilterBankConfig, package="audio")
cs.store(group="model", name="conformer-small", node=ConformerSmallConfig, package="model")

to the following:

cs.store(group="audio", name="audio_fbank", node=FilterBankConfig, package="audio")
cs.store(group="model", name="conformer-small_model", node=ConformerSmallConfig, package="model")

3. Error message: feed_forward_expansion_factor

Error merging 'model/conformer-small' with schema
Value 'int = 4' of type 'str' could not be converted to Integer
full_key: feed_forward_expansion_factor

This error is raised because the value is parsed as a string rather than an int.

In /home/donghwan/kospeech_env/kospeech/configs/model/conformer-small.yaml, changing the feed_forward_expansion_factor value from int = 4 to 4 fixes it:

architecture: conformer
teacher_forcing_step: 0.0
min_teacher_forcing_ratio: 1.0
joint_ctc_attention: false
feed_forward_expansion_factor: 4
conv_expansion_factor: 2
input_dropout_p: 0.1
feed_forward_dropout_p: 0.1
attention_dropout_p: 0.1
conv_dropout_p: 0.1
decoder_dropout_p: 0.1
conv_kernel_size: 31
half_step_residual: True
encoder_dim: 144
decoder_dim: 320
num_encoder_layers: 16
num_decoder_layers: 1
num_attention_heads: 4
decoder: None
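The same kind of leftover type annotation could be hiding in other YAML files under configs. As a minimal sketch (the helper name find_annotation_leftovers and the regular expression are my own illustration, not part of Kospeech), the following scans YAML text for values that look like `int = 4`:

```python
import re

# Matches "key: <type> = <value>", where <type> is a Python type name that
# was probably copied from a dataclass definition by mistake.
LEFTOVER = re.compile(r"^\s*([\w-]+):\s*(int|float|str|bool)\s*=\s*(.+?)\s*$")


def find_annotation_leftovers(yaml_text):
    """Return (line_number, key, suggested_value) for each suspicious line."""
    hits = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        match = LEFTOVER.match(line)
        if match:
            key, _type_name, value = match.groups()
            hits.append((number, key, value))
    return hits


# Small made-up sample containing the broken line from the error message.
sample = """architecture: conformer
feed_forward_expansion_factor: int = 4
conv_expansion_factor: 2
"""
print(find_annotation_leftovers(sample))
```

To check a real file, read it with open(path, encoding="utf-8").read() and pass the text to the helper.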

4. Moving the transcripts.txt file

[2024-10-14 15:39:49,306][kospeech.utils][INFO] - Operating System : Linux 5.4.0-150-generic
[2024-10-14 15:39:49,307][kospeech.utils][INFO] - Processor : x86_64
[2024-10-14 15:39:49,319][kospeech.utils][INFO] - device : NVIDIA GeForce RTX 2070 SUPER
[2024-10-14 15:39:49,319][kospeech.utils][INFO] - CUDA is available : True
[2024-10-14 15:39:49,319][kospeech.utils][INFO] - CUDA version : 12.1
[2024-10-14 15:39:49,319][kospeech.utils][INFO] - PyTorch version : 2.4.1+cu121
[2024-10-14 15:39:49,339][kospeech.utils][INFO] - split dataset start !!
Error executing job with overrides: ['model=conformer-small', 'train=conformer_small_train', 'train.dataset_path=/home/donghwan/kospeech_env/KsponSpeech']
Traceback (most recent call last):
  File "/home/donghwan/kospeech_env/kospeech/./bin/main.py", line 161, in main
    last_model_checkpoint = train(config)
                            ^^^^^^^^^^^^^
  File "/home/donghwan/kospeech_env/kospeech/./bin/main.py", line 89, in train
    epoch_time_step, trainset_list, validset = split_dataset(config, config.train.transcripts_path, vocab)
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/donghwan/kospeech_env/kospeech/kospeech/data/data_loader.py", line 274, in split_dataset
    audio_paths, transcripts = load_dataset(transcripts_path)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/donghwan/kospeech_env/kospeech/kospeech/data/label_loader.py", line 31, in load_dataset
    with open(transcripts_path) as f:
         ^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '../../../data/transcripts.txt'

The error above occurs because transcripts.txt is not where the config expects it. The file is probably in /home/donghwan/kospeech_env/kospeech/dataset/kspon; moving it into the /home/donghwan/kospeech_env/kospeech/data folder resolves the error.
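The move can also be scripted. The sketch below is one possible way to do it, with a hypothetical helper named place_transcripts; note that it copies the file instead of moving it, so the original stays in place. The source path is the guess stated above and may differ on your machine.

```python
import shutil
from pathlib import Path


def place_transcripts(src_file, dst_dir):
    """Copy the transcripts file into dst_dir unless it is already there."""
    src = Path(src_file)
    dst = Path(dst_dir) / src.name
    if dst.exists():
        return dst  # nothing to do
    if not src.is_file():
        raise FileNotFoundError(f"transcripts file not found: {src}")
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return dst


if __name__ == "__main__":
    root = Path("/home/donghwan/kospeech_env/kospeech")
    try:
        # Source location as guessed in the post; adjust if yours differs.
        print(place_transcripts(root / "dataset/kspon/transcripts.txt", root / "data"))
    except FileNotFoundError as err:
        print(err)
```

Since transcripts_path is a field of the train config, passing train.transcripts_path=... on the command line in the same way as train.dataset_path should presumably work as well, which would avoid touching the files at all.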

5. data_loader error

Error executing job with overrides: ['model=conformer-small', 'train=conformer_small_train', 'train.dataset_path=/home/donghwan/kospeech_env/KsponSpeech']
Traceback (most recent call last):
  File "/home/donghwan/kospeech_env/kospeech/./bin/main.py", line 161, in main
    last_model_checkpoint = train(config)
                            ^^^^^^^^^^^^^
  File "/home/donghwan/kospeech_env/kospeech/./bin/main.py", line 89, in train
    epoch_time_step, trainset_list, validset = split_dataset(config, config.train.transcripts_path, vocab)
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/donghwan/kospeech_env/kospeech/kospeech/data/data_loader.py", line 303, in split_dataset
    SpectrogramDataset(
  File "/home/donghwan/kospeech_env/kospeech/kospeech/data/data_loader.py", line 67, in __init__
    self.shuffle()
  File "/home/donghwan/kospeech_env/kospeech/kospeech/data/data_loader.py", line 106, in shuffle
    self.audio_paths, self.transcripts, self.augment_methods = zip(*tmp)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: not enough values to unpack (expected 3, got 0)

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

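The traceback points at data_loader.py. The message means that zip(*tmp) had nothing to unpack, that is, the dataset slice handed to SpectrogramDataset was empty. This is what one would expect when the hardcoded train_num is larger than the number of samples actually available.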
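The failure is easy to reproduce outside Kospeech. This is a small sketch with a hypothetical function unzip_triples that mimics only the unpacking line from shuffle(); the sample rows are made up.

```python
def unzip_triples(rows):
    """Mimic the unpacking in shuffle(): split rows into three tuples."""
    paths, transcripts, methods = zip(*rows)
    return paths, transcripts, methods


# A non-empty slice unpacks without trouble.
print(unzip_triples([("a.pcm", "1 2", 0), ("b.pcm", "3 4", 0)]))

# An empty slice raises the same ValueError as in the traceback.
try:
    unzip_triples([])
except ValueError as err:
    print(err)
```

With an empty list, zip() yields nothing, so the three names on the left have no values to bind to.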

Go to /home/donghwan/kospeech_env/kospeech/kospeech/data and open data_loader.py (line 265).

    if config.train.dataset == 'kspon':
        train_num = 620000
        valid_num = 2545

Reduce train_num and valid_num here to values that fit your own dataset. For now I proceeded with:

train_num = 10000
valid_num = 2000
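Instead of guessing, the two numbers can be derived from the size of the dataset. The following is only a sketch: choose_split_sizes and count_transcripts are hypothetical helpers, and it assumes transcripts.txt holds one utterance per line.

```python
def choose_split_sizes(num_samples, valid_ratio=0.1):
    """Return (train_num, valid_num) that together never exceed num_samples."""
    if num_samples < 2:
        raise ValueError("need at least two samples to split")
    valid_num = max(1, int(num_samples * valid_ratio))
    train_num = num_samples - valid_num
    return train_num, valid_num


def count_transcripts(path):
    """Count non-empty lines, assuming one utterance per line."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


# Example with a made-up dataset size of 10 and an even split.
print(choose_split_sizes(10, valid_ratio=0.5))
```

In practice one would call choose_split_sizes(count_transcripts(".../data/transcripts.txt")) and copy the result into data_loader.py.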

5-1. Disabling data augmentation

Data augmentation normally helps training, but I disabled it to keep training time down.

Go to /home/donghwan/kospeech_env/kospeech/kospeech/data and open data_loader.py (line 92).

    def _augment(self, spec_augment):
        """ Spec Augmentation """
        if spec_augment:
            logger.info("Applying Spec Augmentation...")

            for idx in range(self.dataset_size):
                self.augment_methods.append(self.SPEC_AUGMENT)
                self.audio_paths.append(self.audio_paths[idx])
                self.transcripts.append(self.transcripts[idx])

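Change it as shown below. Putting not in front of spec_augment inverts the check, so augmentation is skipped whenever spec_augment is passed as True (and would be applied if it were passed as False).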

    def _augment(self, spec_augment):
        """ Spec Augmentation """
        if not spec_augment:
            logger.info("Applying Spec Augmentation...")

            for idx in range(self.dataset_size):
                self.augment_methods.append(self.SPEC_AUGMENT)
                self.audio_paths.append(self.audio_paths[idx])
                self.transcripts.append(self.transcripts[idx])
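To make the effect of the inverted check concrete, here is a stripped-down stand-in. TinyDataset and its inverted parameter are hypothetical and keep only the list bookkeeping of the snippet above; this is not the real SpectrogramDataset.

```python
class TinyDataset:
    """Stand-in that keeps only the augmentation bookkeeping."""

    VANILLA = 0
    SPEC_AUGMENT = 1

    def __init__(self, audio_paths, transcripts, spec_augment, inverted=False):
        self.audio_paths = list(audio_paths)
        self.transcripts = list(transcripts)
        self.dataset_size = len(self.audio_paths)
        self.augment_methods = [self.VANILLA] * self.dataset_size
        # inverted=True mimics the edited "if not spec_augment" check.
        should_augment = (not spec_augment) if inverted else spec_augment
        if should_augment:
            for idx in range(self.dataset_size):
                self.augment_methods.append(self.SPEC_AUGMENT)
                self.audio_paths.append(self.audio_paths[idx])
                self.transcripts.append(self.transcripts[idx])


paths, texts = ["a.pcm", "b.pcm"], ["1 2", "3 4"]
original = TinyDataset(paths, texts, spec_augment=True)
edited = TinyDataset(paths, texts, spec_augment=True, inverted=True)
print(len(original.audio_paths), len(edited.audio_paths))
```

The original check doubles the list by appending an augmented copy of every sample, while the edited check leaves it at its original length as long as the flag stays True.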

6. Decoder configuration error in the conformer model

[2024-10-15 22:59:12,774][kospeech.utils][INFO] - Operating System : Linux 5.15.0-122-generic
[2024-10-15 22:59:12,775][kospeech.utils][INFO] - Processor : x86_64
[2024-10-15 22:59:12,789][kospeech.utils][INFO] - device : NVIDIA GeForce RTX 3080 Ti
[2024-10-15 22:59:12,789][kospeech.utils][INFO] - CUDA is available : True
[2024-10-15 22:59:12,789][kospeech.utils][INFO] - CUDA version : 12.1
[2024-10-15 22:59:12,789][kospeech.utils][INFO] - PyTorch version : 2.4.1+cu121
[2024-10-15 22:59:12,805][kospeech.utils][INFO] - split dataset start !!
[2024-10-15 22:59:13,760][kospeech.utils][INFO] - Applying Spec Augmentation...
[2024-10-15 22:59:13,941][kospeech.utils][INFO] - Applying Spec Augmentation...
[2024-10-15 22:59:14,230][kospeech.utils][INFO] - Applying Spec Augmentation...
[2024-10-15 22:59:14,419][kospeech.utils][INFO] - Applying Spec Augmentation...
[2024-10-15 22:59:14,872][kospeech.utils][INFO] - split dataset complete !!
Error executing job with overrides: ['model=conformer-small', 'train=conformer_small_train', 'train.dataset_path=/home/ssel/kospeech_env/KsponSpeech']
Traceback (most recent call last):
  File "./bin/main.py", line 162, in main
    last_model_checkpoint = train(config)
  File "./bin/main.py", line 90, in train
    model = build_model(config, vocab, device)
  File "/home/ssel/kospeech_env/kospeech/kospeech/model_builder.py", line 100, in build_model
    decoder_rnn_type=config.model.decoder_rnn_type,
omegaconf.errors.ConfigAttributeError: Key 'decoder_rnn_type' is not in struct
    full_key: model.decoder_rnn_type
    object_type=dict

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

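The conformer model is normally built with a decoder, and build_model reads config.model.decoder_rnn_type for it. Because conformer-small.yaml does not define that key, the error above is raised.

Edit kospeech_env/kospeech/configs/model/conformer-small.yaml as follows.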

architecture: conformer
teacher_forcing_step: 0.0
min_teacher_forcing_ratio: 1.0
joint_ctc_attention: false
feed_forward_expansion_factor: 4
conv_expansion_factor: 2
input_dropout_p: 0.1
feed_forward_dropout_p: 0.1
attention_dropout_p: 0.1
conv_dropout_p: 0.1
decoder_dropout_p: 0.1
conv_kernel_size: 31
half_step_residual: True
encoder_dim: 144
decoder_dim: 320
num_encoder_layers: 16
num_decoder_layers: 1
num_attention_heads: 4
decoder_rnn_type: lstm  # or gru (added line)
decoder: lstm  # modified line
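A quick pre-flight check can report such missing keys before training starts. The sketch below uses plain dictionaries; missing_keys and REQUIRED_FOR_DECODER are hypothetical names, and the required list contains only the keys mentioned in this section.

```python
def missing_keys(model_config, required):
    """Return the required keys that the model config does not define."""
    return [key for key in required if key not in model_config]


# Keys taken from the traceback and the YAML above; extend as needed.
REQUIRED_FOR_DECODER = ["decoder_rnn_type", "decoder"]

before = {"architecture": "conformer", "decoder": "None"}
after = {"architecture": "conformer", "decoder": "lstm", "decoder_rnn_type": "lstm"}

print(missing_keys(before, REQUIRED_FOR_DECODER))
print(missing_keys(after, REQUIRED_FOR_DECODER))
```

The first call reports decoder_rnn_type as missing; the second returns an empty list.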

7. Changing the character vocabulary

The vocabulary has to be switched to my_character_vocabs.csv, the character vocabulary created in /home/donghwan/kospeech_env/kospeech during preprocessing.

Move it to /home/donghwan/kospeech_env/kospeech/data/vocab.

The files that need to be changed are:

/home/donghwan/kospeech_env/kospeech/bin/main.py (line 83, the vocab = KsponSpeechVocabulary call)

/home/donghwan/kospeech_env/kospeech/bin/eval.py

/home/donghwan/kospeech_env/kospeech/bin/inference.py
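Before pointing the three scripts at the new file, it may help to confirm that it parses. This sketch rests on an assumption about the file layout: a header row followed by rows whose first two columns are the id and the character. load_char_vocab is a hypothetical helper, and the sample contents are made up.

```python
import csv
import io


def load_char_vocab(csv_file):
    """Read rows of (id, char, ...) after a header line into two lookup dicts."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    char2id, id2char = {}, {}
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        char2id[row[1]] = int(row[0])
        id2char[int(row[0])] = row[1]
    return char2id, id2char


# Tiny in-memory stand-in for my_character_vocabs.csv (contents are made up).
sample = io.StringIO("id,char,freq\n0,<pad>,0\n1,<sos>,0\n2,<eos>,0\n3,가,10\n")
char2id, id2char = load_char_vocab(sample)
print(len(char2id), id2char[3])
```

For the real file, open it with open(path, encoding="utf-8", newline="") and pass the file object to the helper.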

8. Setting the batch size and number of epochs

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 72.00 MiB. GPU 0 has a total capacity of 7.78 GiB of which 77.44 MiB is free. Including non-PyTorch memory, this process has 6.90 GiB memory in use. Of the allocated memory 6.70 GiB is allocated by PyTorch, and 66.71 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

As shown above, training fails because the GPU runs out of memory.

Go to /home/donghwan/kospeech_env/kospeech/kospeech/trainer and find the following code in __init__.py.

@dataclass
class TrainConfig:
    dataset: str = "kspon"
    dataset_path: str = "???"
    transcripts_path: str = "../../../data/transcripts.txt"
    output_unit: str = "character"

    batch_size: int = 32
    save_result_every: int = 1000
    checkpoint_every: int = 5000
    print_every: int = 10
    mode: str = "train"

    num_workers: int = 4
    use_cuda: bool = True
    num_threads: int = 2

    init_lr_scale: float = 0.01
    final_lr_scale: float = 0.05
    max_grad_norm: int = 400
    weight_decay: float = 1e-05
    total_steps: int = 200000

    seed: int = 777
    resume: bool = False

Adjust batch_size here; a smaller value reduces GPU memory use.

@dataclass
class ConformerTrainConfig(TrainConfig):
    optimizer: str = "adam"
    reduction: str = "mean"
    lr_scheduler: str = 'transformer_lr_scheduler'
    optimizer_betas: tuple = (0.9, 0.98)
    optimizer_eps: float = 1e-09
    warmup_steps: int = 10000
    decay_steps: int = 80000
    weight_decay: float = 1e-06
    peak_lr: float = 0.05 / math.sqrt(512)
    final_lr: float = 1e-07
    final_lr_scale = 0.001
    num_epochs: int = 20

Adjust num_epochs here to set the number of training epochs.
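Lowering batch_size increases the number of steps per epoch, so the two settings are worth looking at together. The following is a rough estimate only, using a hypothetical helper steps_per_epoch; it ignores augmentation and uses the train_num chosen in section 5.

```python
import math


def steps_per_epoch(train_num, batch_size):
    """Number of batches needed to see train_num samples once."""
    return math.ceil(train_num / batch_size)


train_num = 10000   # value chosen in section 5
num_epochs = 20     # default in ConformerTrainConfig

for batch_size in (32, 16, 8):
    per_epoch = steps_per_epoch(train_num, batch_size)
    print(batch_size, per_epoch, per_epoch * num_epochs)
```

It may be worth comparing the resulting totals with warmup_steps and decay_steps in the same config, since those are expressed in steps as well.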

hwanikim
