keras – How to load data in tensorflow from subdirectories

You can use tf.keras.preprocessing.image_dataset_from_directory().

Your directory structure would be something like this but with many more classes:

main_directory/
...class_a/
......a_image_1.jpg
......a_image_2.jpg
...class_b/
......b_image_1.jpg
......b_image_2.jpg

I would suggest splitting the dataset before this step. As far as I can tell, the split done by validation_split is random rather than stratified, and the documentation does not describe how it is done. If your dataset is imbalanced, do the split yourself first instead of relying on validation_split.
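One way to do such a split yourself is sketched below. It is only an illustration, not part of the TensorFlow API: the function name stratified_split, the 20% validation fraction, and the example file names are all assumptions. Each class is shuffled and cut separately, so both subsets keep roughly the same class ratios.

```python
import random

def stratified_split(files_by_class, val_fraction=0.2, seed=123):
    """Split each class's file list separately so both subsets keep the class ratios."""
    rng = random.Random(seed)
    train, val = {}, {}
    for class_name, files in files_by_class.items():
        shuffled = sorted(files)  # sort first so the result depends only on the seed
        rng.shuffle(shuffled)
        n_val = int(round(len(shuffled) * val_fraction))
        val[class_name] = shuffled[:n_val]
        train[class_name] = shuffled[n_val:]
    return train, val

# Hypothetical imbalanced dataset: 10 images in class_a, 100 in class_b
files_by_class = {
    "class_a": [f"a_image_{i}.jpg" for i in range(10)],
    "class_b": [f"b_image_{i}.jpg" for i in range(100)],
}
train, val = stratified_split(files_by_class)
```

You would then copy or move the files into separate training and validation directories and load each one without validation_split.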

Example:

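from tensorflow.keras.preprocessing import image_dataset_from_directory

# TRAIN_DIR, SIZE and SEED are placeholders that you define yourself,
# e.g. SIZE = (224, 224)
train_dataset = image_dataset_from_directory(
    directory=TRAIN_DIR,
    labels="inferred",
    label_mode="categorical",
    class_names=["0", "10", "5"],  # must match the subdirectory names
    image_size=SIZE,
    seed=SEED,
    subset=None,
    interpolation="bilinear",
    follow_links=False,
)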

Important things you have to set:

  1. Labels must be set to inferred, so that the labels of the images are generated from the directory structure and follow the order of the classes.

  2. Label mode has to be set to categorical, which encodes each label as a categorical (one-hot) vector.

  3. Class names can be set explicitly by listing the folder names in the order you want. Otherwise the order is alphanumeric. As you have lots of folders, you can use os.walk(directory) to get the list of directories (sort it if you need a predictable order).

  4. Image size resizes all images to the same size. Choose it according to the model you are using, e.g. MobileNet takes in (224,224), so you can set this to (224,224).
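As a rough sketch of points 2 and 3, the helpers below collect the class folder names in alphanumeric order and show which position in the categorical vector each class ends up at. Both function names are made up for this example, and os.scandir is used here in place of os.walk because only the top-level folders are needed.

```python
import os

def list_class_names(directory):
    """Return the immediate subdirectory names of `directory`, sorted alphanumerically."""
    return sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())

def one_hot(class_name, class_names):
    """Categorical (one-hot) vector for `class_name` given the class order."""
    vector = [0.0] * len(class_names)
    vector[class_names.index(class_name)] = 1.0
    return vector
```

The returned list can be passed as class_names. Note that the names are sorted as strings, so folders named 0, 5 and 10 come out in the order 0, 10, 5.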

More information is available in the documentation for image_dataset_from_directory.

You can use ImageDataGenerator.flow_from_directory; see its documentation for details.
Assume your subdirectories reside in a directory called main_dir. Set the size of the images you want to process; below I used 224 x 224 and specified color images. class_mode is set to categorical, so when you compile your model use categorical cross-entropy as the loss. Then use the code below.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# one ImageDataGenerator holds the split and rescaling settings for both subsets
data_gen=ImageDataGenerator(validation_split=.2,rescale=1/255)
train_gen=data_gen.flow_from_directory(main_dir,  target_size=(224, 224),
    color_mode='rgb', class_mode='categorical', batch_size=32, shuffle=True,
    seed=123, subset='training')
valid_gen=data_gen.flow_from_directory(main_dir,  target_size=(224, 224),
    color_mode='rgb', class_mode='categorical', batch_size=32, shuffle=False,
    seed=123, subset='validation')
# make and compile your model then fit the model per below
history=model.fit(x=train_gen,  epochs=20, verbose=1, validation_data=valid_gen,
                 shuffle=True,  initial_epoch=0)
