7. Transfer learning of Faster RCNN object recognition model using TorchVision
The notebook (exercise) for this section (open in another tab).
Video (27 min)
Explanation of the General Object Recognition Model PDF
Implementation by YOLOv8 is here: by YOLOv8
In this notebook, we use underwater tuna images to perform transfer learning of a trained Faster R-CNN model provided by TorchVision, a library in PyTorch.
References:
[4]:
# Import modules
import datetime
import numpy as np
import matplotlib.pyplot as plt
import cv2 # cv2 is an image processing library
[5]:
# Settings for displaying images
plt.rcParams['axes.grid'] = False
plt.rcParams['xtick.labelsize'] = False
plt.rcParams['ytick.labelsize'] = False
plt.rcParams['xtick.top'] = False
plt.rcParams['xtick.bottom'] = False
plt.rcParams['ytick.left'] = False
plt.rcParams['ytick.right'] = False
plt.rcParams['figure.figsize'] = [10, 5]
7.1. Preparing a dataset
First, download a video of a tuna swimming. This video is from the NHK Creative Library. For the terms of use, please refer to this page.
[6]:
!wget -O tuna.mp4 https://www9.nhk.or.jp/das/movie/D0002031/D0002031658_00000_V_000.mp4
--2025-01-22 05:41:41-- https://www9.nhk.or.jp/das/movie/D0002031/D0002031658_00000_V_000.mp4
Resolving www9.nhk.or.jp (www9.nhk.or.jp)... 202.79.241.203, 101.102.235.203, 202.79.241.44, ...
Connecting to www9.nhk.or.jp (www9.nhk.or.jp)|202.79.241.203|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3628291 (3.5M) [video/mp4]
Saving to: ‘tuna.mp4’
tuna.mp4 100%[===================>] 3.46M 6.13MB/s in 0.6s
2025-01-22 05:41:42 (6.13 MB/s) - ‘tuna.mp4’ saved [3628291/3628291]
Eight frames from the video are converted to a NumPy array and stored as a list images
. In this case, the frames are randomly selected, but in practice, it is better to use the frames that are most clearly shown in the video.
[7]:
num_images = 8 # number of images exracted
np.random.seed(0) # Fix the seed of random numbers to reproduce results
# Video capture
vcapture = cv2.VideoCapture('tuna.mp4')
# Number of frames
num_frame = int(vcapture.get(cv2.CAP_PROP_FRAME_COUNT))
# Frame number to be retrieved
frames = np.sort(np.random.choice(num_frame, num_images, replace=False))
print('Frame number:', frames)
# If the output is not [above], run the following
# frames = np.array([198, 395, 511, 708, 885, 922, 1016, 1040])
Frame number: [ 198 395 511 708 885 922 1016 1040]
[8]:
# images: List of NumPy arrays of images
images = []
for frame in frames:
vcapture.set(cv2.CAP_PROP_POS_FRAMES, frame) #frameから読み込み read from "frame"
success, image = vcapture.read()
image = image[...,::-1] # cv2 colors are BGR, so revert to RGB
height, width = image.shape[:2]
# Align the height to 640
height, width = 640, 640 * width // height
image = cv2.resize(image, (width,height))
# Append to images
images.append(image)
# Display images
plt.imshow(image)
plt.show()








Creating Correct Bounding Boxes
Download TensorFlow’s useful functions for creating bounding boxes in Colab from github.
[9]:
!wget "https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/utils/colab_utils.py"
import colab_utils
--2025-01-22 05:41:46-- https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/utils/colab_utils.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19472 (19K) [text/plain]
Saving to: ‘colab_utils.py’
colab_utils.py 100%[===================>] 19.02K --.-KB/s in 0s
2025-01-22 05:41:46 (90.3 MB/s) - ‘colab_utils.py’ saved [19472/19472]
Creating bounding boxes for tunas
Drag on the image to create bounding boxes around tunas, then click the 「submit」 and 「next image」 buttons.
Note: In this task, there are only two classes of objects: tuna and other. If you want to distinguish between objects of multiple classes, you must generate separate bounding boxes for them.
[10]:
gt_boxes = []
colab_utils.annotate(images, box_storage_pointer=gt_boxes)
'--boxes array populated--'
'--boxes array populated--'
'--boxes array populated--'
'--boxes array populated--'
'--boxes array populated--'
The created gt_boxes
is a list of NumPy arrays of type (number of boxes, 4), where each row is
[ymin,xmin,ymax,xmax]
Each number is a relative position of a 1.0 x 1.0 square of the whole image.
[11]:
gt_boxes
[11]:
[]
Run the following cell if you don’t want to do it yourself.
[33]:
#gt_boxes will be overwritten if this cell is executed.
gt_boxes = [
np.array([[7.73333549e-03, 2.91996482e-01, 5.37733335e-01, 4.01934916e-01],
[2.07733335e-01, 1.84696570e-01, 3.51066669e-01, 4.50307828e-01],
[1.77733335e-01, 3.79947230e-01, 4.66066669e-01, 4.98680739e-01],
[4.51066669e-01, 6.32365875e-01, 6.11066669e-01, 8.97097625e-01],
[7.02733335e-01, 9.23482850e-02, 8.86066669e-01, 4.66138962e-01],
[5.61079543e-01, 8.79507476e-04, 7.19412877e-01, 1.02022867e-01],
[6.32746210e-01, 8.24098505e-01, 7.67746210e-01, 1.00000000e+00],
[9.27746210e-01, 3.87862797e-01, 9.99412877e-01, 6.48197010e-01],
[8.34412877e-01, 5.00439754e-01, 9.64412877e-01, 7.46701847e-01]]),
np.array([[0.25441288, 0.16622691, 0.35441288, 0.34652595],
[0.41774621, 0.00879507, 0.54107954, 0.21987687],
[0.40274621, 0.14423923, 0.63107954, 0.47141601],
[0.38774621, 0.25769569, 0.63607954, 0.66314864],
[0.28941288, 0.66226913, 0.60941288, 0.87247142],
[0.63441288, 0.80914688, 0.76441288, 0.99912049],
[0.71441288, 0.24538259, 0.92441288, 0.42832014],
[0.88941288, 0.23658751, 0.99607954, 0.46701847]]),
np.array([[0.25274621, 0. , 0.36607954, 0.19085312],
[0.25441288, 0.2823219 , 0.36941288, 0.51187335],
[0.39274621, 0.03869833, 0.65941288, 0.61477573],
[0.47107954, 0.52594547, 0.69607954, 0.88566403],
[0.68441288, 0.75373791, 0.88441288, 0.98592788]]),
np.array([[0.25274621, 0.15655233, 0.40441288, 0.52418646],
[0.40441288, 0.39753738, 0.60774621, 0.65435356],
[0.40274621, 0.76077397, 0.54274621, 0.92612137],
[0.44941288, 0.83377309, 0.78107954, 0.99736148],
[0.79274621, 0.86455585, 0.99441288, 0.99824099]]),
np.array([[0.25441288, 0.32014072, 0.29774621, 0.38346526],
[0.25607954, 0.3878628 , 0.29941288, 0.46437995],
[0.29274621, 0.39841689, 0.34274621, 0.49868074],
[0.27774621, 0.54265611, 0.36774621, 0.63324538],
[0.31607954, 0.64643799, 0.42774621, 0.69305189],
[0.41774621, 0.61301671, 0.47107954, 0.69832894],
[0.39607954, 0.57519789, 0.46274621, 0.63588391],
[0.44441288, 0.58399296, 0.54607954, 0.67194371],
[0.48441288, 0.69041337, 0.55774621, 0.73526825],
[0.58274621, 0.73702726, 0.66607954, 0.77572559],
[0.57274621, 0.64995602, 0.61107954, 0.72647318],
[0.56107954, 0.61565523, 0.67941288, 0.7299912 ],
[0.63607954, 0.58487247, 0.68107954, 0.66842568],
[0.60107954, 0.48988566, 0.68441288, 0.60598065],
[0.57607954, 0.5180299 , 0.60441288, 0.59806508],
[0.42274621, 0.3649956 , 0.56607954, 0.49868074],
[0.38774621, 0.43887423, 0.46274621, 0.54177661],
[0.38441288, 0.35708004, 0.44441288, 0.43007916],
[0.56607954, 0.21811785, 0.60774621, 0.2823219 ],
[0.56107954, 0.17502199, 0.61274621, 0.22691293],
[0.52941288, 0.07475814, 0.58441288, 0.16534741],
[0.43607954, 0.10905893, 0.50774621, 0.19525066],
[0.47107954, 0.33861038, 0.52107954, 0.38698329]]),
np.array([[0.27607954, 0.32717678, 0.36774621, 0.41248901],
[0.32441288, 0.43271768, 0.38274621, 0.52066843],
[0.25441288, 0.53034301, 0.32607954, 0.57783641],
[0.26941288, 0.57871592, 0.34441288, 0.64028144],
[0.36774621, 0.60422164, 0.45941288, 0.68161829],
[0.42274621, 0.62796834, 0.53607954, 0.68337731],
[0.50774621, 0.64291996, 0.57774621, 0.68601583],
[0.54107954, 0.70360598, 0.60941288, 0.74230431],
[0.63441288, 0.69305189, 0.70774621, 0.76077397],
[0.67607954, 0.70360598, 0.76107954, 0.75989446],
[0.66107954, 0.59014952, 0.75941288, 0.70360598],
[0.59441288, 0.5760774 , 0.65774621, 0.67370273],
[0.53441288, 0.45030783, 0.66941288, 0.58487247],
[0.43107954, 0.4819701 , 0.50441288, 0.59894459],
[0.46441288, 0.54177661, 0.53274621, 0.62884785],
[0.44774621, 0.38170624, 0.52107954, 0.46437995],
[0.48774621, 0.34036939, 0.55774621, 0.40369393],
[0.59941288, 0.26121372, 0.68441288, 0.33421284],
[0.62607954, 0.45030783, 0.67941288, 0.50307828],
[0.56107954, 0.35532102, 0.63607954, 0.46086192],
[0.52607954, 0.16007036, 0.59274621, 0.235708 ],
[0.60607954, 0.12137203, 0.66607954, 0.1882146 ],
[0.60941288, 0.05980651, 0.67441288, 0.12313105],
[0.52107954, 0.01319261, 0.57774621, 0.09234828],
[0.60941288, 0.21899736, 0.67274621, 0.28496042]]),
np.array([[0.35607954, 0.11433597, 0.45441288, 0.19788918],
[0.28607954, 0.2348285 , 0.38441288, 0.34476693],
[0.28107954, 0.38610378, 0.40941288, 0.47229551],
[0.26774621, 0.50043975, 0.37441288, 0.56992084],
[0.35607954, 0.48460862, 0.42774621, 0.5408971 ],
[0.36607954, 0.52682498, 0.45274621, 0.57783641],
[0.31107954, 0.58751099, 0.41441288, 0.63940193],
[0.41941288, 0.64643799, 0.58607954, 0.71328056],
[0.48441288, 0.55936675, 0.60274621, 0.6473175 ],
[0.50774621, 0.5171504 , 0.60441288, 0.59894459],
[0.61774621, 0.64116095, 0.72441288, 0.70888303],
[0.66774621, 0.56904134, 0.74441288, 0.65699208],
[0.61274621, 0.43711522, 0.71607954, 0.54969217],
[0.48607954, 0.45030783, 0.57107954, 0.53210202],
[0.55441288, 0.36323659, 0.63607954, 0.46086192],
[0.42274621, 0.32365875, 0.49774621, 0.43095866],
[0.46107954, 0.37291117, 0.51107954, 0.44327177],
[0.43607954, 0.17326297, 0.50274621, 0.26561126],
[0.52441288, 0.19349164, 0.61941288, 0.29639402]]),
np.array([[0.35607954, 0.14863676, 0.44107954, 0.21899736],
[0.32941288, 0.34564644, 0.41441288, 0.45822339],
[0.34441288, 0.4819701 , 0.45274621, 0.54969217],
[0.39607954, 0.51011434, 0.47107954, 0.56904134],
[0.32941288, 0.59630607, 0.42274621, 0.63764292],
[0.32107954, 0.58223395, 0.41107954, 0.62005277],
[0.40274621, 0.59806508, 0.51107954, 0.64643799],
[0.51774621, 0.75021988, 0.65774621, 0.79419525],
[0.55107954, 0.69129288, 0.67941288, 0.74318382],
[0.56607954, 0.59102902, 0.63441288, 0.67458223],
[0.53607954, 0.52418646, 0.64107954, 0.62445031],
[0.51274621, 0.45470536, 0.60274621, 0.54793316],
[0.65441288, 0.44327177, 0.76607954, 0.54969217],
[0.69441288, 0.58135444, 0.78607954, 0.69041337],
[0.57441288, 0.37994723, 0.65607954, 0.46525945],
[0.44941288, 0.34212841, 0.54441288, 0.44766931],
[0.54274621, 0.21723835, 0.61941288, 0.29639402],
[0.60107954, 0.11257696, 0.67941288, 0.21459982],
[0.63774621, 0.00351803, 0.70607954, 0.10905893]])]
Let’s overlay an image and a rectangle.
[34]:
# A function to display an image and a rectangle simultaneously
import matplotlib.pyplot as plt
def image_box(idx):
img = images[idx]
height, width = img.shape[:2]
boxes = gt_boxes[idx]
plt.figure(figsize=(10, 10))
plt.imshow(img)
# Drawing rectangles
currentAxis = plt.gca()
for _, box in enumerate(boxes):
ymin, xmin, ymax, xmax = box * np.array([height,width,height,width])
coords = (xmin, ymin), xmax-xmin+1, ymax-ymin+1
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='yellow', linewidth=2))
plt.show()
[35]:
image_box(0)

Preparing the dataset class
Rewrite the constructor of the class torch.utils.data.Dataset
and the two methods __len__
and __getitem__
to fit our dataset.
Define __getitem__
to return the following items.
image: a PIL Image of size (H, W)
target: a dict containing the following fields
boxes
(FloatTensor[N, 4]
): the coordinates of theN
bounding boxes in[x0, y0, x1, y1]
format, ranging from0
toW
and0
toH
labels
(Int64Tensor[N]
): the label for each bounding boximage_id
(Int64Tensor[1]
): an image identifier.area
(Tensor[N]
): The area of the bounding box. This is used during evaluation with the COCO metric, to separate the metric scores between small, medium and large boxes.iscrowd
(UInt8Tensor[N]
): instances withiscrowd=True
will be ignored during evaluation.
[36]:
import os
import numpy as np
import torch
import torch.utils.data
from PIL import Image
class TunaDataset(torch.utils.data.Dataset):
def __init__(self, images, gt_boxes, transforms=None):
self.imgs = images
self.annotations = gt_boxes
self.transforms = transforms
def __getitem__(self, idx):
img = self.imgs[idx]
height, width = img.shape[:2]
# Restore the boxes that are standardized to [0,1].
boxes = self.annotations[idx] * np.array([height, width, height, width])
boxes = boxes[:,[1, 0, 3, 2]] # y0 x0 y1 x1 -> x0 y0 x1 y1
num_objs = len(boxes)
# Convert to PIL for using torchvisoin
img = Image.fromarray(img)
# Convert boxes to a Torch tensor
boxes = torch.as_tensor(boxes, dtype=torch.float32)
# Class labels: there is only one now
labels = torch.ones((num_objs,), dtype=torch.int64)
image_id = torch.tensor([idx])
area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
# Whether or not the objects overlap: Assumed not now.
iscrowd = torch.zeros((num_objs,), dtype=torch.int64)
# Create a dictionary of correct data
target = {}
target["boxes"] = boxes
target["labels"] = labels
target["image_id"] = image_id
target["area"] = area
target["iscrowd"] = iscrowd
# Data augmentation
if self.transforms is not None:
img, target = self.transforms(img, target)
return img, target # images and bounding boxes
def __len__(self):
return len(self.imgs)
To see how this dataset class works, let’s apply it to the first few datasets, where transforms are not used.
As shown below, dataset[i]
outputs PIL image and annotation data in dictionary format for the \(i\)th image.
[37]:
dataset = TunaDataset(images, gt_boxes,)
for i in range(8):
print(i)
print(dataset[i])
plt.imshow(dataset[i][0])
plt.show()
0
(<PIL.Image.Image image mode=RGB size=1137x640 at 0x783DE0F62850>, {'boxes': tensor([[3.3200e+02, 4.9493e+00, 4.5700e+02, 3.4415e+02],
[2.1000e+02, 1.3295e+02, 5.1200e+02, 2.2468e+02],
[4.3200e+02, 1.1375e+02, 5.6700e+02, 2.9828e+02],
[7.1900e+02, 2.8868e+02, 1.0200e+03, 3.9108e+02],
[1.0500e+02, 4.4975e+02, 5.3000e+02, 5.6708e+02],
[1.0000e+00, 3.5909e+02, 1.1600e+02, 4.6042e+02],
[9.3700e+02, 4.0496e+02, 1.1370e+03, 4.9136e+02],
[4.4100e+02, 5.9376e+02, 7.3700e+02, 6.3962e+02],
[5.6900e+02, 5.3402e+02, 8.4900e+02, 6.1722e+02]]), 'labels': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1]), 'image_id': tensor([0]), 'area': tensor([42399.9961, 27703.4629, 24911.9980, 30822.3984, 49866.6562, 11653.3350,
17279.9980, 13576.5430, 23296.0039]), 'iscrowd': tensor([0, 0, 0, 0, 0, 0, 0, 0, 0])})

1
(<PIL.Image.Image image mode=RGB size=1137x640 at 0x783DDA6B0A90>, {'boxes': tensor([[ 189.0000, 162.8242, 394.0000, 226.8242],
[ 10.0000, 267.3576, 250.0000, 346.2909],
[ 164.0000, 257.7576, 536.0000, 403.8909],
[ 293.0000, 248.1576, 754.0000, 407.0909],
[ 753.0000, 185.2242, 992.0000, 390.0242],
[ 920.0000, 406.0242, 1136.0000, 489.2242],
[ 279.0000, 457.2242, 487.0000, 591.6243],
[ 269.0000, 569.2242, 531.0000, 637.4909]]), 'labels': tensor([1, 1, 1, 1, 1, 1, 1, 1]), 'image_id': tensor([1]), 'area': tensor([13120.0000, 18943.9961, 54361.5977, 73268.2656, 48947.1953, 17971.2031,
27955.2051, 17885.8652]), 'iscrowd': tensor([0, 0, 0, 0, 0, 0, 0, 0])})

2
(<PIL.Image.Image image mode=RGB size=1137x640 at 0x783DE0627A90>, {'boxes': tensor([[ 0.0000, 161.7576, 217.0000, 234.2909],
[ 321.0000, 162.8242, 582.0000, 236.4242],
[ 44.0000, 251.3576, 699.0000, 422.0242],
[ 598.0000, 301.4909, 1007.0000, 445.4909],
[ 857.0000, 438.0242, 1121.0000, 566.0242]]), 'labels': tensor([1, 1, 1, 1, 1]), 'image_id': tensor([2]), 'area': tensor([ 15739.7354, 19209.5977, 111786.6562, 58896.0000, 33792.0000]), 'iscrowd': tensor([0, 0, 0, 0, 0])})

3
(<PIL.Image.Image image mode=RGB size=1137x640 at 0x783DE064D250>, {'boxes': tensor([[ 178.0000, 161.7576, 596.0000, 258.8242],
[ 452.0000, 258.8242, 744.0000, 388.9576],
[ 865.0000, 257.7576, 1053.0000, 347.3576],
[ 948.0000, 287.6242, 1134.0000, 499.8909],
[ 983.0000, 507.3576, 1135.0000, 636.4243]]), 'labels': tensor([1, 1, 1, 1, 1]), 'image_id': tensor([3]), 'area': tensor([40573.8711, 37998.9336, 16844.8008, 39481.5977, 19618.1348]), 'iscrowd': tensor([0, 0, 0, 0, 0])})

4
(<PIL.Image.Image image mode=RGB size=1137x640 at 0x783DDA66EE10>, {'boxes': tensor([[364.0000, 162.8242, 436.0000, 190.5576],
[441.0000, 163.8909, 528.0000, 191.6242],
[453.0000, 187.3576, 567.0000, 219.3576],
[617.0000, 177.7576, 720.0000, 235.3576],
[735.0000, 202.2909, 788.0000, 273.7576],
[697.0000, 267.3576, 794.0000, 301.4909],
[654.0000, 253.4909, 723.0000, 296.1576],
[664.0000, 284.4243, 764.0000, 349.4909],
[785.0000, 310.0242, 836.0000, 356.9576],
[838.0000, 372.9576, 882.0000, 426.2909],
[739.0000, 366.5576, 826.0000, 391.0909],
[700.0000, 359.0909, 830.0000, 434.8242],
[665.0000, 407.0909, 760.0000, 435.8909],
[557.0000, 384.6909, 689.0000, 438.0242],
[589.0000, 368.6909, 680.0000, 386.8242],
[415.0000, 270.5576, 567.0000, 362.2909],
[499.0000, 248.1576, 616.0000, 296.1576],
[406.0000, 246.0242, 489.0000, 284.4243],
[248.0000, 362.2909, 321.0000, 388.9576],
[199.0000, 359.0909, 258.0000, 392.1576],
[ 85.0000, 338.8242, 188.0000, 374.0242],
[124.0000, 279.0909, 222.0000, 324.9576],
[385.0000, 301.4909, 440.0000, 333.4909]]), 'labels': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), 'image_id': tensor([4]), 'area': tensor([ 1996.7992, 2412.8003, 3648.0000, 5932.8008, 3787.7329, 3310.9331,
2943.9993, 6506.6650, 2393.6008, 2346.6658, 2134.3994, 9845.3340,
2735.9988, 7039.9971, 1650.1332, 13943.4629, 5615.9980, 3187.2007,
1946.6682, 1950.9324, 3625.5979, 4494.9336, 1760.0000]), 'iscrowd': tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])})

5
(<PIL.Image.Image image mode=RGB size=1137x640 at 0x783DDA734CD0>, {'boxes': tensor([[372.0000, 176.6909, 469.0000, 235.3576],
[492.0000, 207.6242, 592.0000, 244.9576],
[603.0000, 162.8242, 657.0000, 208.6909],
[658.0000, 172.4242, 728.0000, 220.4242],
[687.0000, 235.3576, 775.0000, 294.0242],
[714.0000, 270.5576, 777.0000, 343.0909],
[731.0000, 324.9576, 780.0000, 369.7576],
[800.0000, 346.2909, 844.0000, 390.0242],
[788.0000, 406.0242, 865.0000, 452.9576],
[800.0000, 432.6909, 864.0000, 487.0909],
[671.0000, 423.0909, 800.0000, 486.0242],
[655.0000, 380.4243, 766.0000, 420.9576],
[512.0000, 342.0242, 665.0000, 428.4243],
[548.0000, 275.8909, 681.0000, 322.8242],
[616.0000, 297.2242, 715.0000, 340.9576],
[434.0000, 286.5576, 528.0000, 333.4909],
[387.0000, 312.1576, 459.0000, 356.9576],
[297.0000, 383.6242, 380.0000, 438.0242],
[512.0000, 400.6909, 572.0000, 434.8242],
[404.0000, 359.0909, 524.0000, 407.0909],
[182.0000, 336.6909, 268.0000, 379.3576],
[138.0000, 387.8909, 214.0000, 426.2909],
[ 68.0000, 390.0242, 140.0000, 431.6242],
[ 15.0000, 333.4909, 105.0000, 369.7576],
[249.0000, 390.0242, 324.0000, 430.5576]]), 'labels': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1]), 'image_id': tensor([5]), 'area': tensor([ 5690.6670, 3733.3345, 2476.7993, 3360.0000, 5162.6660, 4569.5996,
2195.1995, 1924.2668, 3613.8679, 3481.5996, 8118.3979, 4499.1992,
13219.2041, 6242.1357, 4329.6006, 4411.7319, 3225.6013, 4515.1997,
2047.9999, 5760.0000, 3669.3325, 2918.3994, 2995.2004, 3263.9993,
3040.0017]), 'iscrowd': tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0])})

6
(<PIL.Image.Image image mode=RGB size=1137x640 at 0x783DE0F296D0>, {'boxes': tensor([[130.0000, 227.8909, 225.0000, 290.8242],
[267.0000, 183.0909, 392.0000, 246.0242],
[439.0000, 179.8909, 537.0000, 262.0242],
[569.0000, 171.3576, 648.0000, 239.6242],
[551.0000, 227.8909, 615.0000, 273.7576],
[599.0000, 234.2909, 657.0000, 289.7576],
[668.0000, 199.0909, 727.0000, 265.2242],
[735.0000, 268.4243, 811.0000, 375.0909],
[636.0000, 310.0242, 736.0000, 385.7576],
[588.0000, 324.9576, 681.0000, 386.8242],
[729.0000, 395.3576, 806.0000, 463.6242],
[647.0000, 427.3576, 747.0000, 476.4243],
[497.0000, 392.1576, 625.0000, 458.2909],
[512.0000, 311.0909, 605.0000, 365.4909],
[413.0000, 354.8242, 524.0000, 407.0909],
[368.0000, 270.5576, 490.0000, 318.5576],
[424.0000, 295.0909, 504.0000, 327.0909],
[197.0000, 279.0909, 302.0000, 321.7576],
[220.0000, 335.6242, 337.0000, 396.4243]]), 'labels': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), 'image_id': tensor([6]), 'area': tensor([5978.6680, 7866.6670, 8049.0664, 5393.0664, 2935.4668, 3217.0662,
3901.8665, 8106.6660, 7573.3340, 5753.6001, 5256.5332, 4906.6680,
8465.0664, 5059.1992, 5801.5996, 5856.0000, 2560.0000, 4479.9990,
7113.6021]), 'iscrowd': tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])})

7
(<PIL.Image.Image image mode=RGB size=1137x640 at 0x783DE1AE9AD0>, {'boxes': tensor([[169.0000, 227.8909, 249.0000, 282.2909],
[393.0000, 210.8242, 521.0000, 265.2242],
[548.0000, 220.4242, 625.0000, 289.7576],
[580.0000, 253.4909, 647.0000, 301.4909],
[678.0000, 210.8242, 725.0000, 270.5576],
[662.0000, 205.4909, 705.0000, 263.0909],
[680.0000, 257.7576, 735.0000, 327.0909],
[853.0000, 331.3576, 903.0000, 420.9576],
[786.0000, 352.6909, 845.0000, 434.8242],
[672.0000, 362.2909, 767.0000, 406.0242],
[596.0000, 343.0909, 710.0000, 410.2909],
[517.0000, 328.1576, 623.0000, 385.7576],
[504.0000, 418.8242, 625.0000, 490.2909],
[661.0000, 444.4243, 785.0000, 503.0909],
[432.0000, 367.6242, 529.0000, 419.8909],
[389.0000, 287.6242, 509.0000, 348.4243],
[247.0000, 347.3576, 337.0000, 396.4243],
[128.0000, 384.6909, 244.0000, 434.8242],
[ 4.0000, 408.1576, 124.0000, 451.8909]]), 'labels': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), 'image_id': tensor([7]), 'area': tensor([4351.9995, 6963.1992, 5338.6665, 3216.0000, 2807.4668, 2476.8003,
3813.3340, 4480.0005, 4845.8667, 4154.6670, 7660.7979, 6105.6006,
8647.4639, 7274.6655, 5069.8662, 7296.0020, 4416.0015, 5815.4663,
5248.0005]), 'iscrowd': tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])})

Datasets using transforms
Nest, we create Datasets with transforms. Downloadying fnctions for training and evaluation from https://github.com/pytorch/vision.git
[38]:
%cd /content
!wget https://raw.githubusercontent.com/pytorch/vision/v0.15.2/references/detection/utils.py -O utils.py
!wget https://raw.githubusercontent.com/pytorch/vision/v0.15.2/references/detection/transforms.py -O transforms.py
!wget https://raw.githubusercontent.com/pytorch/vision/v0.15.2/references/detection/engine.py -O engine.py
!wget https://raw.githubusercontent.com/pytorch/vision/v0.15.2/references/detection/coco_eval.py -O coco_eval.py
!wget https://raw.githubusercontent.com/pytorch/vision/v0.15.2/references/detection/coco_utils.py -O coco_utils.py
/content
--2025-01-22 05:46:17-- https://raw.githubusercontent.com/pytorch/vision/v0.15.2/references/detection/utils.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8388 (8.2K) [text/plain]
Saving to: ‘utils.py’
utils.py 100%[===================>] 8.19K --.-KB/s in 0s
2025-01-22 05:46:17 (69.5 MB/s) - ‘utils.py’ saved [8388/8388]
--2025-01-22 05:46:17-- https://raw.githubusercontent.com/pytorch/vision/v0.15.2/references/detection/transforms.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 23337 (23K) [text/plain]
Saving to: ‘transforms.py’
transforms.py 100%[===================>] 22.79K --.-KB/s in 0s
2025-01-22 05:46:17 (130 MB/s) - ‘transforms.py’ saved [23337/23337]
--2025-01-22 05:46:18-- https://raw.githubusercontent.com/pytorch/vision/v0.15.2/references/detection/engine.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4032 (3.9K) [text/plain]
Saving to: ‘engine.py’
engine.py 100%[===================>] 3.94K --.-KB/s in 0s
2025-01-22 05:46:18 (69.2 MB/s) - ‘engine.py’ saved [4032/4032]
--2025-01-22 05:46:18-- https://raw.githubusercontent.com/pytorch/vision/v0.15.2/references/detection/coco_eval.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6447 (6.3K) [text/plain]
Saving to: ‘coco_eval.py’
coco_eval.py 100%[===================>] 6.30K --.-KB/s in 0s
2025-01-22 05:46:18 (95.5 MB/s) - ‘coco_eval.py’ saved [6447/6447]
--2025-01-22 05:46:18-- https://raw.githubusercontent.com/pytorch/vision/v0.15.2/references/detection/coco_utils.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8893 (8.7K) [text/plain]
Saving to: ‘coco_utils.py’
coco_utils.py 100%[===================>] 8.68K --.-KB/s in 0s
2025-01-22 05:46:18 (103 MB/s) - ‘coco_utils.py’ saved [8893/8893]
Define a function for data augmentation using the downloaded refereces/detection
functions. Left-right flipping and Zoom out are performed here.
[39]:
from engine import train_one_epoch, evaluate
import utils
import torch.nn as nn
import transforms as T
# torchvision v0.15.2 does not provide ToTensor in transformes, so create your own.
class ToTensor(nn.Module):
def forward(self, image,target):
image = torchvision.transforms.functional.to_tensor(image)
return image, target
def get_transform(train):
transforms = []
# converts the image, a PIL image, into a PyTorch Tensor
transforms.append(ToTensor())
if train:
# during training, randomly flip the training images
# and ground-truth for data augmentation
transforms.append(T.RandomHorizontalFlip(0.5))
transforms.append(T.RandomZoomOut(side_range = (1.0,2.0),p=0.5))
return T.Compose(transforms)
Note: Sizing and normalization of the image data is included in the Faster R-CNN model and is not necessary here.
Creating Datasets and DataLoaders
Creating Datasets and DataLoaders with the first 6 out of 8 images as training data and the last 2 as evaluation data (=test data).
[40]:
# create datasets using differnt transformations
dataset = TunaDataset(images, gt_boxes, get_transform(train=True))
dataset_test = TunaDataset(images, gt_boxes, get_transform(train=False))
# Separate data into training data and evaluation data
torch.manual_seed(1)
indices = torch.randperm(len(dataset)).tolist()
dataset = torch.utils.data.Subset(dataset, indices[2:]) # Note that the taking out order is in reverse order.
dataset_test = torch.utils.data.Subset(dataset_test, indices[:2])
# Create a training data loader and an evaluation data loader
data_loader = torch.utils.data.DataLoader(
dataset, batch_size=2, shuffle=True, num_workers=2,
collate_fn=utils.collate_fn)
data_loader_test = torch.utils.data.DataLoader(
dataset_test, batch_size=1, shuffle=False, num_workers=2,
collate_fn=utils.collate_fn)
7.2. Loading and configuring a Faster R-CNN model
The Faster R-CNN used in this notebook is a model that outputs a set of rectangles in an image that are likely to contain the object being classified and the probability of the object being contained in the rectangle. (For example, the red rectangle in the image below has a 0.9 probability of being a cowboy, a 0.1 probability of being a jockey, etc.).

Here, let’s retrain the model for the tuna dataset previously trained on the COCO dataset. Since the number of classes to be classified is different, the last output layer should be redefined and trained.
[41]:
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
# Loading the pretrained model
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
The model architecture is as follows.
Input
|
(backbone): Residual Network + Feature Pyramid Network
|
(rpn): RegionProposalNetwork: Proposing bounding boxes
|
(roi_heads): RoIHeads:
(box_head): TwoMLPHead: two affine layers
|
(box_predictor): FastRCNNPredictor
|-(cls_score): predicting the class (output 1)
--(bbox_pred): predicting bounding boxes (output 2)
[42]:
model
[42]:
FasterRCNN(
(transform): GeneralizedRCNNTransform(
Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
Resize(min_size=(800,), max_size=1333, mode='bilinear')
)
(backbone): BackboneWithFPN(
(body): IntermediateLayerGetter(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): FrozenBatchNorm2d(64, eps=0.0)
(relu): ReLU(inplace=True)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): Sequential(
(0): Bottleneck(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(64, eps=0.0)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(64, eps=0.0)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(256, eps=0.0)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): FrozenBatchNorm2d(256, eps=0.0)
)
)
(1): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(64, eps=0.0)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(64, eps=0.0)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(256, eps=0.0)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(64, eps=0.0)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(64, eps=0.0)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(256, eps=0.0)
(relu): ReLU(inplace=True)
)
)
(layer2): Sequential(
(0): Bottleneck(
(conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(128, eps=0.0)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(128, eps=0.0)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(512, eps=0.0)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): FrozenBatchNorm2d(512, eps=0.0)
)
)
(1): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(128, eps=0.0)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(128, eps=0.0)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(512, eps=0.0)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(128, eps=0.0)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(128, eps=0.0)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(512, eps=0.0)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(128, eps=0.0)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(128, eps=0.0)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(512, eps=0.0)
(relu): ReLU(inplace=True)
)
)
(layer3): Sequential(
(0): Bottleneck(
(conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256, eps=0.0)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256, eps=0.0)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024, eps=0.0)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): FrozenBatchNorm2d(1024, eps=0.0)
)
)
(1): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256, eps=0.0)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256, eps=0.0)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024, eps=0.0)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256, eps=0.0)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256, eps=0.0)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024, eps=0.0)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256, eps=0.0)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256, eps=0.0)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024, eps=0.0)
(relu): ReLU(inplace=True)
)
(4): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256, eps=0.0)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256, eps=0.0)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024, eps=0.0)
(relu): ReLU(inplace=True)
)
(5): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256, eps=0.0)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256, eps=0.0)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024, eps=0.0)
(relu): ReLU(inplace=True)
)
)
(layer4): Sequential(
(0): Bottleneck(
(conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(512, eps=0.0)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(512, eps=0.0)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(2048, eps=0.0)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): FrozenBatchNorm2d(2048, eps=0.0)
)
)
(1): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(512, eps=0.0)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(512, eps=0.0)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(2048, eps=0.0)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(512, eps=0.0)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(512, eps=0.0)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(2048, eps=0.0)
(relu): ReLU(inplace=True)
)
)
)
(fpn): FeaturePyramidNetwork(
(inner_blocks): ModuleList(
(0): Conv2dNormActivation(
(0): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
)
(1): Conv2dNormActivation(
(0): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
)
(2): Conv2dNormActivation(
(0): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
)
(3): Conv2dNormActivation(
(0): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
)
)
(layer_blocks): ModuleList(
(0-3): 4 x Conv2dNormActivation(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
)
(extra_blocks): LastLevelMaxPool()
)
)
(rpn): RegionProposalNetwork(
(anchor_generator): AnchorGenerator()
(head): RPNHead(
(conv): Sequential(
(0): Conv2dNormActivation(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
)
)
(cls_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
(bbox_pred): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
)
)
(roi_heads): RoIHeads(
(box_roi_pool): MultiScaleRoIAlign(featmap_names=['0', '1', '2', '3'], output_size=(7, 7), sampling_ratio=2)
(box_head): TwoMLPHead(
(fc6): Linear(in_features=12544, out_features=1024, bias=True)
(fc7): Linear(in_features=1024, out_features=1024, bias=True)
)
(box_predictor): FastRCNNPredictor(
(cls_score): Linear(in_features=1024, out_features=91, bias=True)
(bbox_pred): Linear(in_features=1024, out_features=364, bias=True)
)
)
)
The last (box_predictor) is the output part of the model, called the header.
The output dimension of (cls_score) is the number of predicted classes
(bbox_pred) is the predicted bounding boxes (number of classes x 4)
We rewrite the number of predictor classes in roi_heads.box_predictor
to 2.
[43]:
# Number of classes to predict (counting background as well)
num_classes = 2 # 1 class (Tuna) + background
# get number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features #1024
# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
[44]:
model.roi_heads.box_predictor
[44]:
FastRCNNPredictor(
(cls_score): Linear(in_features=1024, out_features=2, bias=True)
(bbox_pred): Linear(in_features=1024, out_features=8, bias=True)
)
Enables the use of GPUs and defines the optimization function.
[45]:
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)
# construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)
# learning rate scheduler which decreases the learning rate by * 0.2 every 20 epochs
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
step_size=20,
gamma=0.2)
Training with fixed backbone parameters
The Faster RCNN model consists of three parts: backbone, rpn, and roi_heads.
Fix the parameters of the backbone and do not train it.
[46]:
for parameter in model.backbone.parameters():
parameter.requires_grad=False
Training the model for 60 epochs.
[26]:
# let's train it for 60 epochs
num_epochs = 60
for epoch in range(num_epochs):
# train for one epoch, printing every 10 iterations
train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=3)
# update the learning rate
lr_scheduler.step()
# evaluate on the test dataset
if (epoch+1) % 10 == 0:
evaluate(model, data_loader_test, device=device)
/content/engine.py:30: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with torch.cuda.amp.autocast(enabled=scaler is not None):
Epoch: [0] [0/3] eta: 0:00:10 lr: 0.000500 loss: 1.8902 (1.8902) loss_classifier: 0.5949 (0.5949) loss_box_reg: 0.8558 (0.8558) loss_objectness: 0.3664 (0.3664) loss_rpn_box_reg: 0.0731 (0.0731) time: 3.6491 data: 0.1697 max mem: 1080
Epoch: [0] [2/3] eta: 0:00:01 lr: 0.001000 loss: 1.8902 (1.8451) loss_classifier: 0.5949 (0.8078) loss_box_reg: 0.6996 (0.7164) loss_objectness: 0.2163 (0.2536) loss_rpn_box_reg: 0.0722 (0.0674) time: 1.4319 data: 0.0638 max mem: 1206
Epoch: [0] Total time: 0:00:04 (1.4423 s / it)
Epoch: [1] [0/3] eta: 0:00:01 lr: 0.001000 loss: 0.8244 (0.8244) loss_classifier: 0.2032 (0.2032) loss_box_reg: 0.5002 (0.5002) loss_objectness: 0.0602 (0.0602) loss_rpn_box_reg: 0.0608 (0.0608) time: 0.5173 data: 0.2388 max mem: 1227
Epoch: [1] [2/3] eta: 0:00:00 lr: 0.001000 loss: 1.0763 (1.0429) loss_classifier: 0.3084 (0.2802) loss_box_reg: 0.5706 (0.5516) loss_objectness: 0.1251 (0.1371) loss_rpn_box_reg: 0.0608 (0.0740) time: 0.3628 data: 0.0885 max mem: 1227
Epoch: [1] Total time: 0:00:01 (0.3755 s / it)
Epoch: [2] [0/3] eta: 0:00:01 lr: 0.001000 loss: 0.9332 (0.9332) loss_classifier: 0.2537 (0.2537) loss_box_reg: 0.4144 (0.4144) loss_objectness: 0.2111 (0.2111) loss_rpn_box_reg: 0.0540 (0.0540) time: 0.4445 data: 0.1683 max mem: 1227
Epoch: [2] [2/3] eta: 0:00:00 lr: 0.001000 loss: 1.2907 (1.1945) loss_classifier: 0.3852 (0.3543) loss_box_reg: 0.6988 (0.6375) loss_objectness: 0.0847 (0.1209) loss_rpn_box_reg: 0.0692 (0.0818) time: 0.3332 data: 0.0604 max mem: 1227
Epoch: [2] Total time: 0:00:01 (0.3472 s / it)
Epoch: [3] [0/3] eta: 0:00:01 lr: 0.001000 loss: 1.1568 (1.1568) loss_classifier: 0.2954 (0.2954) loss_box_reg: 0.6887 (0.6887) loss_objectness: 0.0618 (0.0618) loss_rpn_box_reg: 0.1108 (0.1108) time: 0.5185 data: 0.2402 max mem: 1227
Epoch: [3] [2/3] eta: 0:00:00 lr: 0.001000 loss: 1.1568 (1.1078) loss_classifier: 0.2954 (0.2858) loss_box_reg: 0.6887 (0.6319) loss_objectness: 0.1080 (0.1109) loss_rpn_box_reg: 0.1007 (0.0792) time: 0.3599 data: 0.0879 max mem: 1227
Epoch: [3] Total time: 0:00:01 (0.3738 s / it)
Epoch: [4] [0/3] eta: 0:00:01 lr: 0.001000 loss: 1.0901 (1.0901) loss_classifier: 0.3181 (0.3181) loss_box_reg: 0.6547 (0.6547) loss_objectness: 0.0668 (0.0668) loss_rpn_box_reg: 0.0505 (0.0505) time: 0.4451 data: 0.1682 max mem: 1227
Epoch: [4] [2/3] eta: 0:00:00 lr: 0.001000 loss: 1.0901 (1.1176) loss_classifier: 0.3181 (0.3182) loss_box_reg: 0.6547 (0.6810) loss_objectness: 0.0443 (0.0483) loss_rpn_box_reg: 0.0505 (0.0701) time: 0.3342 data: 0.0604 max mem: 1227
Epoch: [4] Total time: 0:00:01 (0.3479 s / it)
Epoch: [5] [0/3] eta: 0:00:01 lr: 0.001000 loss: 0.9763 (0.9763) loss_classifier: 0.3232 (0.3232) loss_box_reg: 0.5184 (0.5184) loss_objectness: 0.0649 (0.0649) loss_rpn_box_reg: 0.0698 (0.0698) time: 0.4878 data: 0.2035 max mem: 1227
Epoch: [5] [2/3] eta: 0:00:00 lr: 0.001000 loss: 0.9785 (1.0504) loss_classifier: 0.3094 (0.2981) loss_box_reg: 0.5665 (0.6149) loss_objectness: 0.0649 (0.0713) loss_rpn_box_reg: 0.0696 (0.0660) time: 0.3528 data: 0.0764 max mem: 1229
Epoch: [5] Total time: 0:00:01 (0.3656 s / it)
Epoch: [6] [0/3] eta: 0:00:01 lr: 0.001000 loss: 1.0587 (1.0587) loss_classifier: 0.3190 (0.3190) loss_box_reg: 0.6667 (0.6667) loss_objectness: 0.0226 (0.0226) loss_rpn_box_reg: 0.0505 (0.0505) time: 0.4497 data: 0.1610 max mem: 1229
Epoch: [6] [2/3] eta: 0:00:00 lr: 0.001000 loss: 1.0587 (0.9518) loss_classifier: 0.3102 (0.2870) loss_box_reg: 0.6667 (0.5828) loss_objectness: 0.0226 (0.0388) loss_rpn_box_reg: 0.0505 (0.0431) time: 0.3406 data: 0.0613 max mem: 1229
Epoch: [6] Total time: 0:00:01 (0.3531 s / it)
Epoch: [7] [0/3] eta: 0:00:01 lr: 0.001000 loss: 1.0202 (1.0202) loss_classifier: 0.2977 (0.2977) loss_box_reg: 0.6502 (0.6502) loss_objectness: 0.0235 (0.0235) loss_rpn_box_reg: 0.0488 (0.0488) time: 0.4964 data: 0.2107 max mem: 1229
Epoch: [7] [2/3] eta: 0:00:00 lr: 0.001000 loss: 0.9203 (0.9387) loss_classifier: 0.2828 (0.2819) loss_box_reg: 0.5471 (0.5808) loss_objectness: 0.0235 (0.0316) loss_rpn_box_reg: 0.0453 (0.0444) time: 0.3521 data: 0.0765 max mem: 1229
Epoch: [7] Total time: 0:00:01 (0.3651 s / it)
Epoch: [8] [0/3] eta: 0:00:01 lr: 0.001000 loss: 0.9887 (0.9887) loss_classifier: 0.2585 (0.2585) loss_box_reg: 0.6257 (0.6257) loss_objectness: 0.0181 (0.0181) loss_rpn_box_reg: 0.0864 (0.0864) time: 0.4385 data: 0.1531 max mem: 1229
Epoch: [8] [2/3] eta: 0:00:00 lr: 0.001000 loss: 0.8732 (0.8746) loss_classifier: 0.2528 (0.2499) loss_box_reg: 0.5375 (0.5476) loss_objectness: 0.0181 (0.0306) loss_rpn_box_reg: 0.0288 (0.0465) time: 0.3343 data: 0.0586 max mem: 1229
Epoch: [8] Total time: 0:00:01 (0.3474 s / it)
Epoch: [9] [0/3] eta: 0:00:01 lr: 0.001000 loss: 1.0244 (1.0244) loss_classifier: 0.2691 (0.2691) loss_box_reg: 0.6695 (0.6695) loss_objectness: 0.0098 (0.0098) loss_rpn_box_reg: 0.0760 (0.0760) time: 0.4352 data: 0.1578 max mem: 1229
Epoch: [9] [2/3] eta: 0:00:00 lr: 0.001000 loss: 0.9754 (0.8399) loss_classifier: 0.2691 (0.2463) loss_box_reg: 0.6156 (0.5294) loss_objectness: 0.0154 (0.0178) loss_rpn_box_reg: 0.0490 (0.0464) time: 0.3346 data: 0.0589 max mem: 1229
Epoch: [9] Total time: 0:00:01 (0.3558 s / it)
creating index...
index created!
Test: [0/2] eta: 0:00:00 model_time: 0.1457 (0.1457) evaluator_time: 0.0793 (0.0793) time: 0.3673 data: 0.1389 max mem: 1229
Test: [1/2] eta: 0:00:00 model_time: 0.1108 (0.1283) evaluator_time: 0.0699 (0.0746) time: 0.2777 data: 0.0717 max mem: 1229
Test: Total time: 0:00:00 (0.3089 s / it)
Averaged stats: model_time: 0.1108 (0.1283) evaluator_time: 0.0699 (0.0746)
Accumulating evaluation results...
DONE (t=0.01s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.192
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.495
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.127
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.212
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.259
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.017
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.127
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.362
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.351
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.533
Epoch: [10] [0/3] eta: 0:00:01 lr: 0.001000 loss: 0.7365 (0.7365) loss_classifier: 0.2058 (0.2058) loss_box_reg: 0.4802 (0.4802) loss_objectness: 0.0237 (0.0237) loss_rpn_box_reg: 0.0268 (0.0268) time: 0.5833 data: 0.2989 max mem: 1229
Epoch: [10] [2/3] eta: 0:00:00 lr: 0.001000 loss: 0.8215 (0.8197) loss_classifier: 0.2505 (0.2417) loss_box_reg: 0.4828 (0.5104) loss_objectness: 0.0237 (0.0259) loss_rpn_box_reg: 0.0402 (0.0417) time: 0.3920 data: 0.1095 max mem: 1229
Epoch: [10] Total time: 0:00:01 (0.4121 s / it)
Epoch: [11] [0/3] eta: 0:00:01 lr: 0.001000 loss: 0.9869 (0.9869) loss_classifier: 0.2965 (0.2965) loss_box_reg: 0.6094 (0.6094) loss_objectness: 0.0216 (0.0216) loss_rpn_box_reg: 0.0595 (0.0595) time: 0.5604 data: 0.2807 max mem: 1229
Epoch: [11] [2/3] eta: 0:00:00 lr: 0.001000 loss: 0.8405 (0.7693) loss_classifier: 0.2176 (0.2239) loss_box_reg: 0.5587 (0.4852) loss_objectness: 0.0216 (0.0235) loss_rpn_box_reg: 0.0370 (0.0367) time: 0.3780 data: 0.1010 max mem: 1229
Epoch: [11] Total time: 0:00:01 (0.3914 s / it)
Epoch: [12] [0/3] eta: 0:00:01 lr: 0.001000 loss: 0.9518 (0.9518) loss_classifier: 0.2963 (0.2963) loss_box_reg: 0.6003 (0.6003) loss_objectness: 0.0071 (0.0071) loss_rpn_box_reg: 0.0481 (0.0481) time: 0.4466 data: 0.1647 max mem: 1229
Epoch: [12] [2/3] eta: 0:00:00 lr: 0.001000 loss: 0.7904 (0.7366) loss_classifier: 0.2150 (0.2177) loss_box_reg: 0.5211 (0.4717) loss_objectness: 0.0071 (0.0112) loss_rpn_box_reg: 0.0481 (0.0361) time: 0.3342 data: 0.0594 max mem: 1229
Epoch: [12] Total time: 0:00:01 (0.3472 s / it)
Epoch: [13] [0/3] eta: 0:00:01 lr: 0.001000 loss: 0.8051 (0.8051) loss_classifier: 0.2494 (0.2494) loss_box_reg: 0.4565 (0.4565) loss_objectness: 0.0586 (0.0586) loss_rpn_box_reg: 0.0406 (0.0406) time: 0.4817 data: 0.1978 max mem: 1229
Epoch: [13] [2/3] eta: 0:00:00 lr: 0.001000 loss: 0.8051 (0.7961) loss_classifier: 0.2494 (0.2369) loss_box_reg: 0.5011 (0.4872) loss_objectness: 0.0268 (0.0363) loss_rpn_box_reg: 0.0349 (0.0357) time: 0.3461 data: 0.0718 max mem: 1229
Epoch: [13] Total time: 0:00:01 (0.3599 s / it)
Epoch: [14] [0/3] eta: 0:00:01 lr: 0.001000 loss: 0.7793 (0.7793) loss_classifier: 0.2161 (0.2161) loss_box_reg: 0.4290 (0.4290) loss_objectness: 0.1121 (0.1121) loss_rpn_box_reg: 0.0221 (0.0221) time: 0.4840 data: 0.1951 max mem: 1229
Epoch: [14] [2/3] eta: 0:00:00 lr: 0.001000 loss: 0.7793 (0.7687) loss_classifier: 0.2161 (0.2209) loss_box_reg: 0.4336 (0.4639) loss_objectness: 0.0272 (0.0484) loss_rpn_box_reg: 0.0246 (0.0354) time: 0.3485 data: 0.0702 max mem: 1229
Epoch: [14] Total time: 0:00:01 (0.3619 s / it)
Epoch: [15] [0/3] eta: 0:00:01 lr: 0.001000 loss: 0.8790 (0.8790) loss_classifier: 0.2291 (0.2291) loss_box_reg: 0.5508 (0.5508) loss_objectness: 0.0379 (0.0379) loss_rpn_box_reg: 0.0612 (0.0612) time: 0.4715 data: 0.1911 max mem: 1229
Epoch: [15] [2/3] eta: 0:00:00 lr: 0.001000 loss: 0.7833 (0.7218) loss_classifier: 0.1974 (0.1948) loss_box_reg: 0.5091 (0.4547) loss_objectness: 0.0379 (0.0338) loss_rpn_box_reg: 0.0372 (0.0384) time: 0.3469 data: 0.0723 max mem: 1229
Epoch: [15] Total time: 0:00:01 (0.3609 s / it)
Epoch: [16] [0/3] eta: 0:00:01 lr: 0.001000 loss: 0.7364 (0.7364) loss_classifier: 0.2080 (0.2080) loss_box_reg: 0.4769 (0.4769) loss_objectness: 0.0258 (0.0258) loss_rpn_box_reg: 0.0257 (0.0257) time: 0.4415 data: 0.1553 max mem: 1229
Epoch: [16] [2/3] eta: 0:00:00 lr: 0.001000 loss: 0.6977 (0.6790) loss_classifier: 0.1807 (0.1832) loss_box_reg: 0.4671 (0.4407) loss_objectness: 0.0202 (0.0217) loss_rpn_box_reg: 0.0308 (0.0334) time: 0.3357 data: 0.0597 max mem: 1229
Epoch: [16] Total time: 0:00:01 (0.3499 s / it)
Epoch: [17] [0/3] eta: 0:00:01 lr: 0.001000 loss: 0.7422 (0.7422) loss_classifier: 0.1905 (0.1905) loss_box_reg: 0.5147 (0.5147) loss_objectness: 0.0153 (0.0153) loss_rpn_box_reg: 0.0218 (0.0218) time: 0.4396 data: 0.1594 max mem: 1229
Epoch: [17] [2/3] eta: 0:00:00 lr: 0.001000 loss: 0.6315 (0.6534) loss_classifier: 0.1869 (0.1850) loss_box_reg: 0.3986 (0.4201) loss_objectness: 0.0153 (0.0149) loss_rpn_box_reg: 0.0238 (0.0334) time: 0.3367 data: 0.0595 max mem: 1229
Epoch: [17] Total time: 0:00:01 (0.3630 s / it)
Epoch: [18] [0/3] eta: 0:00:01 lr: 0.001000 loss: 0.5843 (0.5843) loss_classifier: 0.1814 (0.1814) loss_box_reg: 0.3520 (0.3520) loss_objectness: 0.0247 (0.0247) loss_rpn_box_reg: 0.0263 (0.0263) time: 0.6021 data: 0.3093 max mem: 1229
Epoch: [18] [2/3] eta: 0:00:00 lr: 0.001000 loss: 0.6493 (0.6501) loss_classifier: 0.1980 (0.1931) loss_box_reg: 0.4117 (0.4128) loss_objectness: 0.0154 (0.0146) loss_rpn_box_reg: 0.0263 (0.0295) time: 0.3890 data: 0.1090 max mem: 1229
Epoch: [18] Total time: 0:00:01 (0.4035 s / it)
Epoch: [19] [0/3] eta: 0:00:01 lr: 0.001000 loss: 0.7670 (0.7670) loss_classifier: 0.2409 (0.2409) loss_box_reg: 0.4776 (0.4776) loss_objectness: 0.0189 (0.0189) loss_rpn_box_reg: 0.0296 (0.0296) time: 0.4333 data: 0.1542 max mem: 1229
Epoch: [19] [2/3] eta: 0:00:00 lr: 0.001000 loss: 0.5790 (0.6416) loss_classifier: 0.1520 (0.1754) loss_box_reg: 0.3947 (0.4216) loss_objectness: 0.0189 (0.0230) loss_rpn_box_reg: 0.0221 (0.0215) time: 0.3304 data: 0.0556 max mem: 1229
Epoch: [19] Total time: 0:00:01 (0.3437 s / it)
creating index...
index created!
Test: [0/2] eta: 0:00:00 model_time: 0.1138 (0.1138) evaluator_time: 0.0222 (0.0222) time: 0.2286 data: 0.0892 max mem: 1229
Test: [1/2] eta: 0:00:00 model_time: 0.1125 (0.1131) evaluator_time: 0.0222 (0.0299) time: 0.1933 data: 0.0472 max mem: 1229
Test: Total time: 0:00:00 (0.2165 s / it)
Averaged stats: model_time: 0.1125 (0.1131) evaluator_time: 0.0222 (0.0299)
Accumulating evaluation results...
DONE (t=0.01s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.162
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.486
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.022
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.169
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.429
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.019
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.127
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.327
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.309
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.600
Epoch: [20] [0/3] eta: 0:00:01 lr: 0.000200 loss: 0.7972 (0.7972) loss_classifier: 0.2144 (0.2144) loss_box_reg: 0.4914 (0.4914) loss_objectness: 0.0168 (0.0168) loss_rpn_box_reg: 0.0746 (0.0746) time: 0.6242 data: 0.3028 max mem: 1229
Epoch: [20] [2/3] eta: 0:00:00 lr: 0.000200 loss: 0.7920 (0.6892) loss_classifier: 0.2144 (0.2046) loss_box_reg: 0.4462 (0.4137) loss_objectness: 0.0386 (0.0315) loss_rpn_box_reg: 0.0305 (0.0393) time: 0.4049 data: 0.1134 max mem: 1229
Epoch: [20] Total time: 0:00:01 (0.4285 s / it)
Epoch: [21] [0/3] eta: 0:00:01 lr: 0.000200 loss: 0.5600 (0.5600) loss_classifier: 0.1859 (0.1859) loss_box_reg: 0.3341 (0.3341) loss_objectness: 0.0176 (0.0176) loss_rpn_box_reg: 0.0224 (0.0224) time: 0.5890 data: 0.2837 max mem: 1229
Epoch: [21] [2/3] eta: 0:00:00 lr: 0.000200 loss: 0.5656 (0.5958) loss_classifier: 0.1823 (0.1790) loss_box_reg: 0.3359 (0.3705) loss_objectness: 0.0174 (0.0169) loss_rpn_box_reg: 0.0317 (0.0294) time: 0.3900 data: 0.1015 max mem: 1229
Epoch: [21] Total time: 0:00:01 (0.4166 s / it)
Epoch: [22] [0/3] eta: 0:00:01 lr: 0.000200 loss: 0.6212 (0.6212) loss_classifier: 0.2074 (0.2074) loss_box_reg: 0.3868 (0.3868) loss_objectness: 0.0057 (0.0057) loss_rpn_box_reg: 0.0213 (0.0213) time: 0.5261 data: 0.2464 max mem: 1229
Epoch: [22] [2/3] eta: 0:00:00 lr: 0.000200 loss: 0.6212 (0.5568) loss_classifier: 0.1984 (0.1714) loss_box_reg: 0.3868 (0.3518) loss_objectness: 0.0115 (0.0158) loss_rpn_box_reg: 0.0213 (0.0178) time: 0.3614 data: 0.0872 max mem: 1229
Epoch: [22] Total time: 0:00:01 (0.3737 s / it)
Epoch: [23] [0/3] eta: 0:00:01 lr: 0.000200 loss: 0.7325 (0.7325) loss_classifier: 0.2596 (0.2596) loss_box_reg: 0.4308 (0.4308) loss_objectness: 0.0097 (0.0097) loss_rpn_box_reg: 0.0323 (0.0323) time: 0.5438 data: 0.2588 max mem: 1237
Epoch: [23] [2/3] eta: 0:00:00 lr: 0.000200 loss: 0.6265 (0.6230) loss_classifier: 0.1891 (0.1958) loss_box_reg: 0.3961 (0.3889) loss_objectness: 0.0097 (0.0111) loss_rpn_box_reg: 0.0273 (0.0272) time: 0.3745 data: 0.0951 max mem: 1237
Epoch: [23] Total time: 0:00:01 (0.3878 s / it)
Epoch: [24] [0/3] eta: 0:00:01 lr: 0.000200 loss: 0.4957 (0.4957) loss_classifier: 0.1469 (0.1469) loss_box_reg: 0.3258 (0.3258) loss_objectness: 0.0061 (0.0061) loss_rpn_box_reg: 0.0168 (0.0168) time: 0.4337 data: 0.1553 max mem: 1237
Epoch: [24] [2/3] eta: 0:00:00 lr: 0.000200 loss: 0.5271 (0.6090) loss_classifier: 0.1688 (0.1813) loss_box_reg: 0.3264 (0.3816) loss_objectness: 0.0078 (0.0167) loss_rpn_box_reg: 0.0241 (0.0294) time: 0.3324 data: 0.0562 max mem: 1237
Epoch: [24] Total time: 0:00:01 (0.3459 s / it)
Epoch: [25] [0/3] eta: 0:00:01 lr: 0.000200 loss: 0.4198 (0.4198) loss_classifier: 0.1238 (0.1238) loss_box_reg: 0.2560 (0.2560) loss_objectness: 0.0204 (0.0204) loss_rpn_box_reg: 0.0195 (0.0195) time: 0.4338 data: 0.1520 max mem: 1237
Epoch: [25] [2/3] eta: 0:00:00 lr: 0.000200 loss: 0.5124 (0.5617) loss_classifier: 0.1688 (0.1849) loss_box_reg: 0.3099 (0.3427) loss_objectness: 0.0167 (0.0145) loss_rpn_box_reg: 0.0195 (0.0196) time: 0.3344 data: 0.0576 max mem: 1237
Epoch: [25] Total time: 0:00:01 (0.3479 s / it)
Epoch: [26] [0/3] eta: 0:00:01 lr: 0.000200 loss: 0.5882 (0.5882) loss_classifier: 0.1748 (0.1748) loss_box_reg: 0.3733 (0.3733) loss_objectness: 0.0100 (0.0100) loss_rpn_box_reg: 0.0301 (0.0301) time: 0.4719 data: 0.1889 max mem: 1237
Epoch: [26] [2/3] eta: 0:00:00 lr: 0.000200 loss: 0.5882 (0.5756) loss_classifier: 0.1748 (0.1774) loss_box_reg: 0.3733 (0.3542) loss_objectness: 0.0153 (0.0155) loss_rpn_box_reg: 0.0301 (0.0286) time: 0.3437 data: 0.0682 max mem: 1237
Epoch: [26] Total time: 0:00:01 (0.3581 s / it)
Epoch: [27] [0/3] eta: 0:00:01 lr: 0.000200 loss: 0.4331 (0.4331) loss_classifier: 0.1291 (0.1291) loss_box_reg: 0.2540 (0.2540) loss_objectness: 0.0200 (0.0200) loss_rpn_box_reg: 0.0300 (0.0300) time: 0.4378 data: 0.1598 max mem: 1237
Epoch: [27] [2/3] eta: 0:00:00 lr: 0.000200 loss: 0.5343 (0.5311) loss_classifier: 0.1522 (0.1524) loss_box_reg: 0.3354 (0.3280) loss_objectness: 0.0200 (0.0242) loss_rpn_box_reg: 0.0297 (0.0266) time: 0.3314 data: 0.0575 max mem: 1237
Epoch: [27] Total time: 0:00:01 (0.3453 s / it)
Epoch: [28] [0/3] eta: 0:00:01 lr: 0.000200 loss: 0.3621 (0.3621) loss_classifier: 0.1114 (0.1114) loss_box_reg: 0.2179 (0.2179) loss_objectness: 0.0064 (0.0064) loss_rpn_box_reg: 0.0264 (0.0264) time: 0.4696 data: 0.1881 max mem: 1237
Epoch: [28] [2/3] eta: 0:00:00 lr: 0.000200 loss: 0.6419 (0.5496) loss_classifier: 0.1833 (0.1612) loss_box_reg: 0.3990 (0.3418) loss_objectness: 0.0176 (0.0159) loss_rpn_box_reg: 0.0292 (0.0306) time: 0.3461 data: 0.0687 max mem: 1237
Epoch: [28] Total time: 0:00:01 (0.3599 s / it)
Epoch: [29] [0/3] eta: 0:00:01 lr: 0.000200 loss: 0.7010 (0.7010) loss_classifier: 0.2021 (0.2021) loss_box_reg: 0.4513 (0.4513) loss_objectness: 0.0088 (0.0088) loss_rpn_box_reg: 0.0388 (0.0388) time: 0.4849 data: 0.1949 max mem: 1237
Epoch: [29] [2/3] eta: 0:00:00 lr: 0.000200 loss: 0.7010 (0.5999) loss_classifier: 0.2021 (0.1793) loss_box_reg: 0.4513 (0.3833) loss_objectness: 0.0088 (0.0071) loss_rpn_box_reg: 0.0340 (0.0301) time: 0.3545 data: 0.0744 max mem: 1237
Epoch: [29] Total time: 0:00:01 (0.3676 s / it)
creating index...
index created!
Test: [0/2] eta: 0:00:00 model_time: 0.1136 (0.1136) evaluator_time: 0.0246 (0.0246) time: 0.2247 data: 0.0830 max mem: 1237
Test: [1/2] eta: 0:00:00 model_time: 0.1093 (0.1115) evaluator_time: 0.0246 (0.0247) time: 0.1839 data: 0.0446 max mem: 1237
Test: Total time: 0:00:00 (0.2018 s / it)
Averaged stats: model_time: 0.1093 (0.1115) evaluator_time: 0.0246 (0.0247)
Accumulating evaluation results...
DONE (t=0.00s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.267
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.599
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.186
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.259
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.550
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.017
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.169
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.421
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.404
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.667
Epoch: [30] [0/3] eta: 0:00:01 lr: 0.000200 loss: 0.6413 (0.6413) loss_classifier: 0.2077 (0.2077) loss_box_reg: 0.4060 (0.4060) loss_objectness: 0.0085 (0.0085) loss_rpn_box_reg: 0.0191 (0.0191) time: 0.4280 data: 0.1448 max mem: 1237
Epoch: [30] [2/3] eta: 0:00:00 lr: 0.000200 loss: 0.6413 (0.5688) loss_classifier: 0.2077 (0.1876) loss_box_reg: 0.4060 (0.3426) loss_objectness: 0.0151 (0.0135) loss_rpn_box_reg: 0.0191 (0.0252) time: 0.3331 data: 0.0543 max mem: 1238
Epoch: [30] Total time: 0:00:01 (0.3494 s / it)
Epoch: [31] [0/3] eta: 0:00:01 lr: 0.000200 loss: 0.6128 (0.6128) loss_classifier: 0.1956 (0.1956) loss_box_reg: 0.3813 (0.3813) loss_objectness: 0.0108 (0.0108) loss_rpn_box_reg: 0.0251 (0.0251) time: 0.5254 data: 0.2264 max mem: 1238
Epoch: [31] [2/3] eta: 0:00:00 lr: 0.000200 loss: 0.5440 (0.5112) loss_classifier: 0.1631 (0.1527) loss_box_reg: 0.3529 (0.3252) loss_objectness: 0.0108 (0.0124) loss_rpn_box_reg: 0.0214 (0.0209) time: 0.3662 data: 0.0837 max mem: 1238
Epoch: [31] Total time: 0:00:01 (0.3918 s / it)
Epoch: [32] [0/3] eta: 0:00:01 lr: 0.000200 loss: 0.5958 (0.5958) loss_classifier: 0.1851 (0.1851) loss_box_reg: 0.3799 (0.3799) loss_objectness: 0.0098 (0.0098) loss_rpn_box_reg: 0.0210 (0.0210) time: 0.5763 data: 0.2368 max mem: 1238
Epoch: [32] [2/3] eta: 0:00:00 lr: 0.000200 loss: 0.5883 (0.5623) loss_classifier: 0.1851 (0.1731) loss_box_reg: 0.3563 (0.3558) loss_objectness: 0.0098 (0.0089) loss_rpn_box_reg: 0.0210 (0.0246) time: 0.3983 data: 0.0906 max mem: 1238
Epoch: [32] Total time: 0:00:01 (0.4274 s / it)
Epoch: [33] [0/3] eta: 0:00:01 lr: 0.000200 loss: 0.5317 (0.5317) loss_classifier: 0.1647 (0.1647) loss_box_reg: 0.3355 (0.3355) loss_objectness: 0.0126 (0.0126) loss_rpn_box_reg: 0.0189 (0.0189) time: 0.6093 data: 0.2953 max mem: 1238
Epoch: [33] [2/3] eta: 0:00:00 lr: 0.000200 loss: 0.5076 (0.4714) loss_classifier: 0.1508 (0.1391) loss_box_reg: 0.3196 (0.3012) loss_objectness: 0.0126 (0.0108) loss_rpn_box_reg: 0.0189 (0.0203) time: 0.4013 data: 0.1095 max mem: 1238
Epoch: [33] Total time: 0:00:01 (0.4309 s / it)
Epoch: [34] [0/3] eta: 0:00:02 lr: 0.000200 loss: 0.5235 (0.5235) loss_classifier: 0.1646 (0.1646) loss_box_reg: 0.3169 (0.3169) loss_objectness: 0.0100 (0.0100) loss_rpn_box_reg: 0.0319 (0.0319) time: 0.7185 data: 0.4121 max mem: 1238
Epoch: [34] [2/3] eta: 0:00:00 lr: 0.000200 loss: 0.5609 (0.5615) loss_classifier: 0.1722 (0.1785) loss_box_reg: 0.3561 (0.3434) loss_objectness: 0.0154 (0.0147) loss_rpn_box_reg: 0.0255 (0.0249) time: 0.4377 data: 0.1462 max mem: 1238
Epoch: [34] Total time: 0:00:01 (0.4608 s / it)
Epoch: [35] [0/3] eta: 0:00:01 lr: 0.000200 loss: 0.4128 (0.4128) loss_classifier: 0.1226 (0.1226) loss_box_reg: 0.2733 (0.2733) loss_objectness: 0.0071 (0.0071) loss_rpn_box_reg: 0.0098 (0.0098) time: 0.4955 data: 0.2121 max mem: 1238
Epoch: [35] [2/3] eta: 0:00:00 lr: 0.000200 loss: 0.4128 (0.5081) loss_classifier: 0.1226 (0.1548) loss_box_reg: 0.2733 (0.3266) loss_objectness: 0.0071 (0.0067) loss_rpn_box_reg: 0.0144 (0.0200) time: 0.3541 data: 0.0760 max mem: 1238
Epoch: [35] Total time: 0:00:01 (0.3672 s / it)
Epoch: [36] [0/3] eta: 0:00:01 lr: 0.000200 loss: 0.4695 (0.4695) loss_classifier: 0.1668 (0.1668) loss_box_reg: 0.2726 (0.2726) loss_objectness: 0.0039 (0.0039) loss_rpn_box_reg: 0.0261 (0.0261) time: 0.4676 data: 0.1824 max mem: 1238
Epoch: [36] [2/3] eta: 0:00:00 lr: 0.000200 loss: 0.4752 (0.5531) loss_classifier: 0.1668 (0.1875) loss_box_reg: 0.2876 (0.3293) loss_objectness: 0.0039 (0.0106) loss_rpn_box_reg: 0.0261 (0.0257) time: 0.3444 data: 0.0673 max mem: 1238
Epoch: [36] Total time: 0:00:01 (0.3568 s / it)
Epoch: [37] [0/3] eta: 0:00:01 lr: 0.000200 loss: 0.4593 (0.4593) loss_classifier: 0.1449 (0.1449) loss_box_reg: 0.2867 (0.2867) loss_objectness: 0.0109 (0.0109) loss_rpn_box_reg: 0.0167 (0.0167) time: 0.4213 data: 0.1445 max mem: 1238
Epoch: [37] [2/3] eta: 0:00:00 lr: 0.000200 loss: 0.4593 (0.4766) loss_classifier: 0.1449 (0.1466) loss_box_reg: 0.2867 (0.3001) loss_objectness: 0.0109 (0.0093) loss_rpn_box_reg: 0.0167 (0.0205) time: 0.3289 data: 0.0552 max mem: 1238
Epoch: [37] Total time: 0:00:01 (0.3418 s / it)
Epoch: [38] [0/3] eta: 0:00:01 lr: 0.000200 loss: 0.6817 (0.6817) loss_classifier: 0.2169 (0.2169) loss_box_reg: 0.4037 (0.4037) loss_objectness: 0.0213 (0.0213) loss_rpn_box_reg: 0.0397 (0.0397) time: 0.4610 data: 0.1762 max mem: 1238
Epoch: [38] [2/3] eta: 0:00:00 lr: 0.000200 loss: 0.4829 (0.5004) loss_classifier: 0.1554 (0.1566) loss_box_reg: 0.2870 (0.3032) loss_objectness: 0.0213 (0.0172) loss_rpn_box_reg: 0.0186 (0.0233) time: 0.3413 data: 0.0634 max mem: 1238
Epoch: [38] Total time: 0:00:01 (0.3546 s / it)
Epoch: [39] [0/3] eta: 0:00:01 lr: 0.000200 loss: 0.5351 (0.5351) loss_classifier: 0.1620 (0.1620) loss_box_reg: 0.3342 (0.3342) loss_objectness: 0.0063 (0.0063) loss_rpn_box_reg: 0.0327 (0.0327) time: 0.4904 data: 0.2083 max mem: 1238
Epoch: [39] [2/3] eta: 0:00:00 lr: 0.000200 loss: 0.5351 (0.5223) loss_classifier: 0.1532 (0.1531) loss_box_reg: 0.3342 (0.3373) loss_objectness: 0.0055 (0.0049) loss_rpn_box_reg: 0.0310 (0.0271) time: 0.3547 data: 0.0758 max mem: 1238
Epoch: [39] Total time: 0:00:01 (0.3683 s / it)
creating index...
index created!
Test: [0/2] eta: 0:00:00 model_time: 0.1160 (0.1160) evaluator_time: 0.0220 (0.0220) time: 0.2289 data: 0.0871 max mem: 1238
Test: [1/2] eta: 0:00:00 model_time: 0.1117 (0.1139) evaluator_time: 0.0220 (0.0226) time: 0.1865 data: 0.0467 max mem: 1238
Test: Total time: 0:00:00 (0.2046 s / it)
Averaged stats: model_time: 0.1117 (0.1139) evaluator_time: 0.0220 (0.0226)
Accumulating evaluation results...
DONE (t=0.00s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.240
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.587
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.130
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.233
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.545
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.023
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.171
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.383
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.371
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.567
Epoch: [40] [0/3] eta: 0:00:01 lr: 0.000040 loss: 0.6574 (0.6574) loss_classifier: 0.2129 (0.2129) loss_box_reg: 0.4124 (0.4124) loss_objectness: 0.0031 (0.0031) loss_rpn_box_reg: 0.0291 (0.0291) time: 0.4781 data: 0.1942 max mem: 1238
Epoch: [40] [2/3] eta: 0:00:00 lr: 0.000040 loss: 0.3934 (0.4812) loss_classifier: 0.1219 (0.1484) loss_box_reg: 0.2541 (0.3058) loss_objectness: 0.0058 (0.0064) loss_rpn_box_reg: 0.0184 (0.0206) time: 0.3471 data: 0.0700 max mem: 1238
Epoch: [40] Total time: 0:00:01 (0.3602 s / it)
Epoch: [41] [0/3] eta: 0:00:01 lr: 0.000040 loss: 0.7990 (0.7990) loss_classifier: 0.2617 (0.2617) loss_box_reg: 0.4865 (0.4865) loss_objectness: 0.0125 (0.0125) loss_rpn_box_reg: 0.0383 (0.0383) time: 0.5432 data: 0.2521 max mem: 1238
Epoch: [41] [2/3] eta: 0:00:00 lr: 0.000040 loss: 0.4659 (0.5342) loss_classifier: 0.1314 (0.1646) loss_box_reg: 0.2924 (0.3297) loss_objectness: 0.0160 (0.0154) loss_rpn_box_reg: 0.0260 (0.0244) time: 0.3713 data: 0.0916 max mem: 1238
Epoch: [41] Total time: 0:00:01 (0.3843 s / it)
Epoch: [42] [0/3] eta: 0:00:01 lr: 0.000040 loss: 0.5306 (0.5306) loss_classifier: 0.1665 (0.1665) loss_box_reg: 0.3368 (0.3368) loss_objectness: 0.0046 (0.0046) loss_rpn_box_reg: 0.0228 (0.0228) time: 0.4605 data: 0.1745 max mem: 1238
Epoch: [42] [2/3] eta: 0:00:00 lr: 0.000040 loss: 0.4387 (0.4577) loss_classifier: 0.1368 (0.1441) loss_box_reg: 0.2815 (0.2881) loss_objectness: 0.0071 (0.0066) loss_rpn_box_reg: 0.0211 (0.0188) time: 0.3449 data: 0.0638 max mem: 1238
Epoch: [42] Total time: 0:00:01 (0.3682 s / it)
Epoch: [43] [0/3] eta: 0:00:01 lr: 0.000040 loss: 0.6965 (0.6965) loss_classifier: 0.2337 (0.2337) loss_box_reg: 0.4102 (0.4102) loss_objectness: 0.0187 (0.0187) loss_rpn_box_reg: 0.0339 (0.0339) time: 0.5506 data: 0.2600 max mem: 1238
Epoch: [43] [2/3] eta: 0:00:00 lr: 0.000040 loss: 0.3886 (0.4898) loss_classifier: 0.1143 (0.1526) loss_box_reg: 0.2337 (0.2924) loss_objectness: 0.0242 (0.0227) loss_rpn_box_reg: 0.0165 (0.0220) time: 0.3762 data: 0.0934 max mem: 1238
Epoch: [43] Total time: 0:00:01 (0.3957 s / it)
Epoch: [44] [0/3] eta: 0:00:01 lr: 0.000040 loss: 0.4588 (0.4588) loss_classifier: 0.1400 (0.1400) loss_box_reg: 0.2912 (0.2912) loss_objectness: 0.0097 (0.0097) loss_rpn_box_reg: 0.0179 (0.0179) time: 0.5428 data: 0.2385 max mem: 1238
Epoch: [44] [2/3] eta: 0:00:00 lr: 0.000040 loss: 0.4701 (0.4935) loss_classifier: 0.1447 (0.1449) loss_box_reg: 0.2923 (0.3195) loss_objectness: 0.0097 (0.0076) loss_rpn_box_reg: 0.0230 (0.0215) time: 0.3821 data: 0.0905 max mem: 1238
Epoch: [44] Total time: 0:00:01 (0.4056 s / it)
Epoch: [45] [0/3] eta: 0:00:01 lr: 0.000040 loss: 0.5754 (0.5754) loss_classifier: 0.1756 (0.1756) loss_box_reg: 0.3672 (0.3672) loss_objectness: 0.0118 (0.0118) loss_rpn_box_reg: 0.0209 (0.0209) time: 0.5397 data: 0.2566 max mem: 1238
Epoch: [45] [2/3] eta: 0:00:00 lr: 0.000040 loss: 0.4711 (0.4494) loss_classifier: 0.1470 (0.1391) loss_box_reg: 0.2865 (0.2784) loss_objectness: 0.0118 (0.0148) loss_rpn_box_reg: 0.0172 (0.0171) time: 0.3687 data: 0.0907 max mem: 1238
Epoch: [45] Total time: 0:00:01 (0.3802 s / it)
Epoch: [46] [0/3] eta: 0:00:01 lr: 0.000040 loss: 0.5908 (0.5908) loss_classifier: 0.1661 (0.1661) loss_box_reg: 0.3840 (0.3840) loss_objectness: 0.0183 (0.0183) loss_rpn_box_reg: 0.0224 (0.0224) time: 0.4726 data: 0.1859 max mem: 1238
Epoch: [46] [2/3] eta: 0:00:00 lr: 0.000040 loss: 0.4913 (0.5042) loss_classifier: 0.1642 (0.1548) loss_box_reg: 0.3021 (0.3180) loss_objectness: 0.0078 (0.0090) loss_rpn_box_reg: 0.0224 (0.0224) time: 0.3471 data: 0.0678 max mem: 1238
Epoch: [46] Total time: 0:00:01 (0.3613 s / it)
Epoch: [47] [0/3] eta: 0:00:01 lr: 0.000040 loss: 0.3532 (0.3532) loss_classifier: 0.0960 (0.0960) loss_box_reg: 0.2351 (0.2351) loss_objectness: 0.0024 (0.0024) loss_rpn_box_reg: 0.0196 (0.0196) time: 0.4486 data: 0.1670 max mem: 1238
Epoch: [47] [2/3] eta: 0:00:00 lr: 0.000040 loss: 0.5930 (0.5247) loss_classifier: 0.1881 (0.1587) loss_box_reg: 0.3605 (0.3308) loss_objectness: 0.0093 (0.0110) loss_rpn_box_reg: 0.0232 (0.0243) time: 0.3459 data: 0.0653 max mem: 1238
Epoch: [47] Total time: 0:00:01 (0.3596 s / it)
Epoch: [48] [0/3] eta: 0:00:01 lr: 0.000040 loss: 0.4339 (0.4339) loss_classifier: 0.1421 (0.1421) loss_box_reg: 0.2744 (0.2744) loss_objectness: 0.0028 (0.0028) loss_rpn_box_reg: 0.0145 (0.0145) time: 0.4299 data: 0.1482 max mem: 1238
Epoch: [48] [2/3] eta: 0:00:00 lr: 0.000040 loss: 0.4339 (0.4410) loss_classifier: 0.1421 (0.1397) loss_box_reg: 0.2744 (0.2752) loss_objectness: 0.0049 (0.0091) loss_rpn_box_reg: 0.0154 (0.0170) time: 0.3319 data: 0.0539 max mem: 1238
Epoch: [48] Total time: 0:00:01 (0.3458 s / it)
Epoch: [49] [0/3] eta: 0:00:01 lr: 0.000040 loss: 0.5952 (0.5952) loss_classifier: 0.1497 (0.1497) loss_box_reg: 0.3691 (0.3691) loss_objectness: 0.0265 (0.0265) loss_rpn_box_reg: 0.0500 (0.0500) time: 0.5417 data: 0.2492 max mem: 1238
Epoch: [49] [2/3] eta: 0:00:00 lr: 0.000040 loss: 0.5952 (0.5496) loss_classifier: 0.1497 (0.1625) loss_box_reg: 0.3691 (0.3399) loss_objectness: 0.0116 (0.0160) loss_rpn_box_reg: 0.0334 (0.0311) time: 0.3749 data: 0.0916 max mem: 1238
Epoch: [49] Total time: 0:00:01 (0.3879 s / it)
creating index...
index created!
Test: [0/2] eta: 0:00:00 model_time: 0.1185 (0.1185) evaluator_time: 0.0211 (0.0211) time: 0.2301 data: 0.0871 max mem: 1238
Test: [1/2] eta: 0:00:00 model_time: 0.1130 (0.1158) evaluator_time: 0.0211 (0.0228) time: 0.1883 data: 0.0466 max mem: 1238
Test: Total time: 0:00:00 (0.2060 s / it)
Averaged stats: model_time: 0.1130 (0.1158) evaluator_time: 0.0211 (0.0228)
Accumulating evaluation results...
DONE (t=0.00s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.230
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.545
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.126
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.226
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.534
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.019
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.162
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.385
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.376
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.533
Epoch: [50] [0/3] eta: 0:00:01 lr: 0.000040 loss: 0.4961 (0.4961) loss_classifier: 0.1517 (0.1517) loss_box_reg: 0.2979 (0.2979) loss_objectness: 0.0171 (0.0171) loss_rpn_box_reg: 0.0293 (0.0293) time: 0.4946 data: 0.2104 max mem: 1238
Epoch: [50] [2/3] eta: 0:00:00 lr: 0.000040 loss: 0.4961 (0.4826) loss_classifier: 0.1517 (0.1463) loss_box_reg: 0.2979 (0.2976) loss_objectness: 0.0132 (0.0144) loss_rpn_box_reg: 0.0229 (0.0243) time: 0.3575 data: 0.0778 max mem: 1238
Epoch: [50] Total time: 0:00:01 (0.3707 s / it)
Epoch: [51] [0/3] eta: 0:00:01 lr: 0.000040 loss: 0.4982 (0.4982) loss_classifier: 0.1735 (0.1735) loss_box_reg: 0.2870 (0.2870) loss_objectness: 0.0233 (0.0233) loss_rpn_box_reg: 0.0144 (0.0144) time: 0.4373 data: 0.1513 max mem: 1238
Epoch: [51] [2/3] eta: 0:00:00 lr: 0.000040 loss: 0.4594 (0.4574) loss_classifier: 0.1478 (0.1458) loss_box_reg: 0.2870 (0.2729) loss_objectness: 0.0164 (0.0173) loss_rpn_box_reg: 0.0144 (0.0214) time: 0.3351 data: 0.0555 max mem: 1238
Epoch: [51] Total time: 0:00:01 (0.3479 s / it)
Epoch: [52] [0/3] eta: 0:00:01 lr: 0.000040 loss: 0.3007 (0.3007) loss_classifier: 0.0951 (0.0951) loss_box_reg: 0.1943 (0.1943) loss_objectness: 0.0034 (0.0034) loss_rpn_box_reg: 0.0079 (0.0079) time: 0.4511 data: 0.1638 max mem: 1238
Epoch: [52] [2/3] eta: 0:00:00 lr: 0.000040 loss: 0.4014 (0.4742) loss_classifier: 0.1063 (0.1493) loss_box_reg: 0.2493 (0.2806) loss_objectness: 0.0140 (0.0183) loss_rpn_box_reg: 0.0317 (0.0260) time: 0.3432 data: 0.0620 max mem: 1238
Epoch: [52] Total time: 0:00:01 (0.3563 s / it)
Epoch: [53] [0/3] eta: 0:00:01 lr: 0.000040 loss: 0.5603 (0.5603) loss_classifier: 0.1681 (0.1681) loss_box_reg: 0.3426 (0.3426) loss_objectness: 0.0230 (0.0230) loss_rpn_box_reg: 0.0266 (0.0266) time: 0.4938 data: 0.2057 max mem: 1238
Epoch: [53] [2/3] eta: 0:00:00 lr: 0.000040 loss: 0.5071 (0.4723) loss_classifier: 0.1600 (0.1423) loss_box_reg: 0.3171 (0.2965) loss_objectness: 0.0031 (0.0091) loss_rpn_box_reg: 0.0266 (0.0243) time: 0.3613 data: 0.0776 max mem: 1238
Epoch: [53] Total time: 0:00:01 (0.3771 s / it)
Epoch: [54] [0/3] eta: 0:00:01 lr: 0.000040 loss: 0.5108 (0.5108) loss_classifier: 0.1704 (0.1704) loss_box_reg: 0.3022 (0.3022) loss_objectness: 0.0056 (0.0056) loss_rpn_box_reg: 0.0325 (0.0325) time: 0.5865 data: 0.2800 max mem: 1238
Epoch: [54] [2/3] eta: 0:00:00 lr: 0.000040 loss: 0.5079 (0.4879) loss_classifier: 0.1665 (0.1539) loss_box_reg: 0.3022 (0.2980) loss_objectness: 0.0056 (0.0082) loss_rpn_box_reg: 0.0287 (0.0278) time: 0.3904 data: 0.1017 max mem: 1238
Epoch: [54] Total time: 0:00:01 (0.4120 s / it)
Epoch: [55] [0/3] eta: 0:00:01 lr: 0.000040 loss: 0.3227 (0.3227) loss_classifier: 0.0939 (0.0939) loss_box_reg: 0.1990 (0.1990) loss_objectness: 0.0178 (0.0178) loss_rpn_box_reg: 0.0120 (0.0120) time: 0.5063 data: 0.2108 max mem: 1238
Epoch: [55] [2/3] eta: 0:00:00 lr: 0.000040 loss: 0.3227 (0.4314) loss_classifier: 0.0939 (0.1314) loss_box_reg: 0.2032 (0.2720) loss_objectness: 0.0114 (0.0117) loss_rpn_box_reg: 0.0120 (0.0163) time: 0.3667 data: 0.0787 max mem: 1238
Epoch: [55] Total time: 0:00:01 (0.3906 s / it)
Epoch: [56] [0/3] eta: 0:00:01 lr: 0.000040 loss: 0.2767 (0.2767) loss_classifier: 0.0961 (0.0961) loss_box_reg: 0.1577 (0.1577) loss_objectness: 0.0161 (0.0161) loss_rpn_box_reg: 0.0068 (0.0068) time: 0.5617 data: 0.2776 max mem: 1238
Epoch: [56] [2/3] eta: 0:00:00 lr: 0.000040 loss: 0.3857 (0.4321) loss_classifier: 0.1056 (0.1315) loss_box_reg: 0.2492 (0.2691) loss_objectness: 0.0129 (0.0120) loss_rpn_box_reg: 0.0180 (0.0195) time: 0.3783 data: 0.0981 max mem: 1238
Epoch: [56] Total time: 0:00:01 (0.3923 s / it)
Epoch: [57] [0/3] eta: 0:00:01 lr: 0.000040 loss: 0.7346 (0.7346) loss_classifier: 0.2029 (0.2029) loss_box_reg: 0.4753 (0.4753) loss_objectness: 0.0092 (0.0092) loss_rpn_box_reg: 0.0472 (0.0472) time: 0.5355 data: 0.2438 max mem: 1238
Epoch: [57] [2/3] eta: 0:00:00 lr: 0.000040 loss: 0.4264 (0.5288) loss_classifier: 0.1447 (0.1585) loss_box_reg: 0.2711 (0.3315) loss_objectness: 0.0092 (0.0106) loss_rpn_box_reg: 0.0259 (0.0282) time: 0.3727 data: 0.0898 max mem: 1238
Epoch: [57] Total time: 0:00:01 (0.3859 s / it)
Epoch: [58] [0/3] eta: 0:00:01 lr: 0.000040 loss: 0.2747 (0.2747) loss_classifier: 0.0759 (0.0759) loss_box_reg: 0.1832 (0.1832) loss_objectness: 0.0043 (0.0043) loss_rpn_box_reg: 0.0112 (0.0112) time: 0.4584 data: 0.1701 max mem: 1238
Epoch: [58] [2/3] eta: 0:00:00 lr: 0.000040 loss: 0.4412 (0.4260) loss_classifier: 0.1377 (0.1278) loss_box_reg: 0.2709 (0.2649) loss_objectness: 0.0132 (0.0156) loss_rpn_box_reg: 0.0194 (0.0177) time: 0.3427 data: 0.0620 max mem: 1238
Epoch: [58] Total time: 0:00:01 (0.3556 s / it)
Epoch: [59] [0/3] eta: 0:00:01 lr: 0.000040 loss: 0.3892 (0.3892) loss_classifier: 0.1280 (0.1280) loss_box_reg: 0.2235 (0.2235) loss_objectness: 0.0232 (0.0232) loss_rpn_box_reg: 0.0144 (0.0144) time: 0.4715 data: 0.1809 max mem: 1238
Epoch: [59] [2/3] eta: 0:00:00 lr: 0.000040 loss: 0.4376 (0.4895) loss_classifier: 0.1410 (0.1561) loss_box_reg: 0.2663 (0.2905) loss_objectness: 0.0172 (0.0183) loss_rpn_box_reg: 0.0157 (0.0246) time: 0.3485 data: 0.0666 max mem: 1238
Epoch: [59] Total time: 0:00:01 (0.3615 s / it)
creating index...
index created!
Test: [0/2] eta: 0:00:00 model_time: 0.1148 (0.1148) evaluator_time: 0.0199 (0.0199) time: 0.2331 data: 0.0951 max mem: 1238
Test: [1/2] eta: 0:00:00 model_time: 0.1114 (0.1131) evaluator_time: 0.0197 (0.0198) time: 0.1850 data: 0.0492 max mem: 1238
Test: Total time: 0:00:00 (0.2029 s / it)
Averaged stats: model_time: 0.1114 (0.1131) evaluator_time: 0.0197 (0.0198)
Accumulating evaluation results...
DONE (t=0.01s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.228
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.561
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.137
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.221
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.511
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.017
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.169
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.369
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.358
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.533
Further learning for 60 epochs with all the layers available for training.
[27]:
for parameter in model.backbone.parameters():
parameter.requires_grad=True
# construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(params, lr=0.00005, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)
# train it for 60 epochs
num_epochs = 60
for epoch in range(num_epochs):
# train for one epoch, printing every 10 iterations
train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=3)
# update the learning rate
lr_scheduler.step()
# evaluate on the test dataset
if (epoch+1) % 10 == 0:
evaluate(model, data_loader_test, device=device)
Epoch: [0] [0/3] eta: 0:00:02 lr: 0.000025 loss: 0.3134 (0.3134) loss_classifier: 0.1008 (0.1008) loss_box_reg: 0.1957 (0.1957) loss_objectness: 0.0046 (0.0046) loss_rpn_box_reg: 0.0124 (0.0124) time: 0.7888 data: 0.1499 max mem: 3283
Epoch: [0] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.4157 (0.5806) loss_classifier: 0.1338 (0.1999) loss_box_reg: 0.2498 (0.3290) loss_objectness: 0.0175 (0.0269) loss_rpn_box_reg: 0.0146 (0.0248) time: 0.6823 data: 0.0541 max mem: 3636
Epoch: [0] Total time: 0:00:02 (0.6969 s / it)
Epoch: [1] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.5490 (0.5490) loss_classifier: 0.1915 (0.1915) loss_box_reg: 0.3284 (0.3284) loss_objectness: 0.0025 (0.0025) loss_rpn_box_reg: 0.0266 (0.0266) time: 0.8109 data: 0.1886 max mem: 3636
Epoch: [1] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.5490 (0.6025) loss_classifier: 0.1915 (0.2071) loss_box_reg: 0.3284 (0.3639) loss_objectness: 0.0061 (0.0062) loss_rpn_box_reg: 0.0263 (0.0253) time: 0.6900 data: 0.0688 max mem: 3636
Epoch: [1] Total time: 0:00:02 (0.7036 s / it)
Epoch: [2] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.5391 (0.5391) loss_classifier: 0.1777 (0.1777) loss_box_reg: 0.3254 (0.3254) loss_objectness: 0.0093 (0.0093) loss_rpn_box_reg: 0.0268 (0.0268) time: 0.8511 data: 0.2252 max mem: 3636
Epoch: [2] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.5391 (0.6493) loss_classifier: 0.1777 (0.2262) loss_box_reg: 0.3323 (0.3772) loss_objectness: 0.0141 (0.0182) loss_rpn_box_reg: 0.0280 (0.0277) time: 0.7112 data: 0.0842 max mem: 3636
Epoch: [2] Total time: 0:00:02 (0.7316 s / it)
Epoch: [3] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.3754 (0.3754) loss_classifier: 0.1305 (0.1305) loss_box_reg: 0.2192 (0.2192) loss_objectness: 0.0085 (0.0085) loss_rpn_box_reg: 0.0171 (0.0171) time: 0.9628 data: 0.3159 max mem: 3636
Epoch: [3] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.6100 (0.5492) loss_classifier: 0.1911 (0.1764) loss_box_reg: 0.3574 (0.3318) loss_objectness: 0.0085 (0.0149) loss_rpn_box_reg: 0.0286 (0.0260) time: 0.7483 data: 0.1135 max mem: 3636
Epoch: [3] Total time: 0:00:02 (0.7629 s / it)
Epoch: [4] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.5492 (0.5492) loss_classifier: 0.1965 (0.1965) loss_box_reg: 0.3107 (0.3107) loss_objectness: 0.0185 (0.0185) loss_rpn_box_reg: 0.0236 (0.0236) time: 0.7902 data: 0.1652 max mem: 3636
Epoch: [4] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.5492 (0.5660) loss_classifier: 0.1965 (0.1845) loss_box_reg: 0.3107 (0.3332) loss_objectness: 0.0185 (0.0211) loss_rpn_box_reg: 0.0242 (0.0271) time: 0.6837 data: 0.0602 max mem: 3636
Epoch: [4] Total time: 0:00:02 (0.6974 s / it)
Epoch: [5] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.3871 (0.3871) loss_classifier: 0.1101 (0.1101) loss_box_reg: 0.2484 (0.2484) loss_objectness: 0.0080 (0.0080) loss_rpn_box_reg: 0.0206 (0.0206) time: 0.8154 data: 0.1943 max mem: 3636
Epoch: [5] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.5679 (0.5342) loss_classifier: 0.1836 (0.1644) loss_box_reg: 0.3257 (0.3289) loss_objectness: 0.0082 (0.0167) loss_rpn_box_reg: 0.0249 (0.0243) time: 0.6955 data: 0.0728 max mem: 3636
Epoch: [5] Total time: 0:00:02 (0.7083 s / it)
Epoch: [6] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.5063 (0.5063) loss_classifier: 0.1683 (0.1683) loss_box_reg: 0.3106 (0.3106) loss_objectness: 0.0009 (0.0009) loss_rpn_box_reg: 0.0265 (0.0265) time: 0.7769 data: 0.1487 max mem: 3636
Epoch: [6] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.4918 (0.4741) loss_classifier: 0.1404 (0.1466) loss_box_reg: 0.3027 (0.2963) loss_objectness: 0.0010 (0.0098) loss_rpn_box_reg: 0.0212 (0.0214) time: 0.6838 data: 0.0565 max mem: 3636
Epoch: [6] Total time: 0:00:02 (0.6972 s / it)
Epoch: [7] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.4842 (0.4842) loss_classifier: 0.1611 (0.1611) loss_box_reg: 0.2814 (0.2814) loss_objectness: 0.0090 (0.0090) loss_rpn_box_reg: 0.0327 (0.0327) time: 0.8350 data: 0.2037 max mem: 3636
Epoch: [7] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.5536 (0.5409) loss_classifier: 0.1718 (0.1709) loss_box_reg: 0.3076 (0.3210) loss_objectness: 0.0090 (0.0191) loss_rpn_box_reg: 0.0304 (0.0300) time: 0.7039 data: 0.0749 max mem: 3641
Epoch: [7] Total time: 0:00:02 (0.7168 s / it)
Epoch: [8] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.3884 (0.3884) loss_classifier: 0.1249 (0.1249) loss_box_reg: 0.2321 (0.2321) loss_objectness: 0.0097 (0.0097) loss_rpn_box_reg: 0.0217 (0.0217) time: 0.8446 data: 0.2159 max mem: 3641
Epoch: [8] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3884 (0.4633) loss_classifier: 0.1249 (0.1319) loss_box_reg: 0.2575 (0.3003) loss_objectness: 0.0097 (0.0085) loss_rpn_box_reg: 0.0217 (0.0226) time: 0.7140 data: 0.0817 max mem: 3641
Epoch: [8] Total time: 0:00:02 (0.7351 s / it)
Epoch: [9] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.5802 (0.5802) loss_classifier: 0.1757 (0.1757) loss_box_reg: 0.3779 (0.3779) loss_objectness: 0.0039 (0.0039) loss_rpn_box_reg: 0.0227 (0.0227) time: 0.9909 data: 0.3204 max mem: 3641
Epoch: [9] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.5439 (0.5036) loss_classifier: 0.1757 (0.1676) loss_box_reg: 0.3016 (0.3070) loss_objectness: 0.0052 (0.0055) loss_rpn_box_reg: 0.0227 (0.0235) time: 0.7649 data: 0.1179 max mem: 3641
Epoch: [9] Total time: 0:00:02 (0.7794 s / it)
creating index...
index created!
Test: [0/2] eta: 0:00:00 model_time: 0.1209 (0.1209) evaluator_time: 0.0190 (0.0190) time: 0.2349 data: 0.0914 max mem: 3641
Test: [1/2] eta: 0:00:00 model_time: 0.1134 (0.1172) evaluator_time: 0.0189 (0.0190) time: 0.1876 data: 0.0483 max mem: 3641
Test: Total time: 0:00:00 (0.2072 s / it)
Averaged stats: model_time: 0.1134 (0.1172) evaluator_time: 0.0189 (0.0190)
Accumulating evaluation results...
DONE (t=0.01s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.283
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.615
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.196
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.285
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.609
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.019
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.181
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.429
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.411
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.700
Epoch: [10] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.4905 (0.4905) loss_classifier: 0.1592 (0.1592) loss_box_reg: 0.3059 (0.3059) loss_objectness: 0.0041 (0.0041) loss_rpn_box_reg: 0.0213 (0.0213) time: 0.8512 data: 0.2209 max mem: 3641
Epoch: [10] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.4637 (0.4468) loss_classifier: 0.1428 (0.1388) loss_box_reg: 0.2817 (0.2744) loss_objectness: 0.0159 (0.0129) loss_rpn_box_reg: 0.0213 (0.0206) time: 0.7130 data: 0.0818 max mem: 3641
Epoch: [10] Total time: 0:00:02 (0.7263 s / it)
Epoch: [11] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.4851 (0.4851) loss_classifier: 0.1390 (0.1390) loss_box_reg: 0.3123 (0.3123) loss_objectness: 0.0142 (0.0142) loss_rpn_box_reg: 0.0196 (0.0196) time: 0.9054 data: 0.2711 max mem: 3647
Epoch: [11] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.4851 (0.4622) loss_classifier: 0.1390 (0.1396) loss_box_reg: 0.3123 (0.2922) loss_objectness: 0.0089 (0.0107) loss_rpn_box_reg: 0.0196 (0.0198) time: 0.7351 data: 0.1002 max mem: 3647
Epoch: [11] Total time: 0:00:02 (0.7488 s / it)
Epoch: [12] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.4356 (0.4356) loss_classifier: 0.1286 (0.1286) loss_box_reg: 0.2291 (0.2291) loss_objectness: 0.0635 (0.0635) loss_rpn_box_reg: 0.0143 (0.0143) time: 0.7840 data: 0.1385 max mem: 3647
Epoch: [12] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.4356 (0.4069) loss_classifier: 0.1286 (0.1235) loss_box_reg: 0.2291 (0.2466) loss_objectness: 0.0041 (0.0228) loss_rpn_box_reg: 0.0143 (0.0141) time: 0.6883 data: 0.0515 max mem: 3647
Epoch: [12] Total time: 0:00:02 (0.7008 s / it)
Epoch: [13] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.3630 (0.3630) loss_classifier: 0.1245 (0.1245) loss_box_reg: 0.2233 (0.2233) loss_objectness: 0.0004 (0.0004) loss_rpn_box_reg: 0.0148 (0.0148) time: 0.8219 data: 0.1819 max mem: 3647
Epoch: [13] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3630 (0.3766) loss_classifier: 0.1242 (0.1193) loss_box_reg: 0.2233 (0.2329) loss_objectness: 0.0026 (0.0048) loss_rpn_box_reg: 0.0148 (0.0197) time: 0.7264 data: 0.0720 max mem: 3647
Epoch: [13] Total time: 0:00:02 (0.7496 s / it)
Epoch: [14] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.3824 (0.3824) loss_classifier: 0.1159 (0.1159) loss_box_reg: 0.2036 (0.2036) loss_objectness: 0.0231 (0.0231) loss_rpn_box_reg: 0.0398 (0.0398) time: 0.9853 data: 0.3290 max mem: 3647
Epoch: [14] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3824 (0.4436) loss_classifier: 0.1159 (0.1341) loss_box_reg: 0.2036 (0.2694) loss_objectness: 0.0137 (0.0127) loss_rpn_box_reg: 0.0346 (0.0274) time: 0.7665 data: 0.1220 max mem: 3647
Epoch: [14] Total time: 0:00:02 (0.7882 s / it)
Epoch: [15] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.4284 (0.4284) loss_classifier: 0.1325 (0.1325) loss_box_reg: 0.2674 (0.2674) loss_objectness: 0.0108 (0.0108) loss_rpn_box_reg: 0.0176 (0.0176) time: 0.8003 data: 0.1694 max mem: 3647
Epoch: [15] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.4284 (0.4289) loss_classifier: 0.1325 (0.1312) loss_box_reg: 0.2674 (0.2671) loss_objectness: 0.0108 (0.0096) loss_rpn_box_reg: 0.0176 (0.0210) time: 0.6966 data: 0.0634 max mem: 3647
Epoch: [15] Total time: 0:00:02 (0.7095 s / it)
Epoch: [16] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.3933 (0.3933) loss_classifier: 0.1102 (0.1102) loss_box_reg: 0.2501 (0.2501) loss_objectness: 0.0047 (0.0047) loss_rpn_box_reg: 0.0283 (0.0283) time: 0.8162 data: 0.1807 max mem: 3647
Epoch: [16] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3933 (0.4079) loss_classifier: 0.1102 (0.1269) loss_box_reg: 0.2501 (0.2560) loss_objectness: 0.0047 (0.0049) loss_rpn_box_reg: 0.0226 (0.0201) time: 0.7011 data: 0.0665 max mem: 3647
Epoch: [16] Total time: 0:00:02 (0.7136 s / it)
Epoch: [17] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.5419 (0.5419) loss_classifier: 0.1607 (0.1607) loss_box_reg: 0.3427 (0.3427) loss_objectness: 0.0177 (0.0177) loss_rpn_box_reg: 0.0208 (0.0208) time: 0.8044 data: 0.1694 max mem: 3647
Epoch: [17] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3268 (0.3873) loss_classifier: 0.1017 (0.1199) loss_box_reg: 0.1967 (0.2412) loss_objectness: 0.0028 (0.0072) loss_rpn_box_reg: 0.0208 (0.0189) time: 0.6996 data: 0.0632 max mem: 3647
Epoch: [17] Total time: 0:00:02 (0.7124 s / it)
Epoch: [18] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.3381 (0.3381) loss_classifier: 0.1171 (0.1171) loss_box_reg: 0.1895 (0.1895) loss_objectness: 0.0037 (0.0037) loss_rpn_box_reg: 0.0278 (0.0278) time: 0.8751 data: 0.2381 max mem: 3647
Epoch: [18] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3381 (0.4130) loss_classifier: 0.1171 (0.1426) loss_box_reg: 0.1988 (0.2437) loss_objectness: 0.0037 (0.0047) loss_rpn_box_reg: 0.0276 (0.0219) time: 0.7251 data: 0.0869 max mem: 3647
Epoch: [18] Total time: 0:00:02 (0.7379 s / it)
Epoch: [19] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.4803 (0.4803) loss_classifier: 0.1520 (0.1520) loss_box_reg: 0.3009 (0.3009) loss_objectness: 0.0037 (0.0037) loss_rpn_box_reg: 0.0236 (0.0236) time: 0.8165 data: 0.1790 max mem: 3647
Epoch: [19] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3666 (0.3815) loss_classifier: 0.1100 (0.1186) loss_box_reg: 0.2332 (0.2391) loss_objectness: 0.0111 (0.0092) loss_rpn_box_reg: 0.0123 (0.0146) time: 0.7064 data: 0.0664 max mem: 3647
Epoch: [19] Total time: 0:00:02 (0.7264 s / it)
creating index...
index created!
Test: [0/2] eta: 0:00:00 model_time: 0.1199 (0.1199) evaluator_time: 0.0368 (0.0368) time: 0.3025 data: 0.1426 max mem: 3647
Test: [1/2] eta: 0:00:00 model_time: 0.1180 (0.1189) evaluator_time: 0.0356 (0.0362) time: 0.2312 data: 0.0729 max mem: 3647
Test: Total time: 0:00:00 (0.2662 s / it)
Averaged stats: model_time: 0.1180 (0.1189) evaluator_time: 0.0356 (0.0362)
Accumulating evaluation results...
DONE (t=0.01s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.311
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.696
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.234
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.302
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.557
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.015
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.183
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.450
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.436
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.667
Epoch: [20] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.3622 (0.3622) loss_classifier: 0.1028 (0.1028) loss_box_reg: 0.2164 (0.2164) loss_objectness: 0.0122 (0.0122) loss_rpn_box_reg: 0.0307 (0.0307) time: 0.9996 data: 0.3228 max mem: 3647
Epoch: [20] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3938 (0.4088) loss_classifier: 0.1069 (0.1203) loss_box_reg: 0.2496 (0.2549) loss_objectness: 0.0122 (0.0113) loss_rpn_box_reg: 0.0205 (0.0222) time: 0.7734 data: 0.1208 max mem: 3647
Epoch: [20] Total time: 0:00:02 (0.7869 s / it)
Epoch: [21] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.2972 (0.2972) loss_classifier: 0.0825 (0.0825) loss_box_reg: 0.2059 (0.2059) loss_objectness: 0.0015 (0.0015) loss_rpn_box_reg: 0.0073 (0.0073) time: 0.8171 data: 0.1793 max mem: 3647
Epoch: [21] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3166 (0.3658) loss_classifier: 0.0959 (0.1067) loss_box_reg: 0.2084 (0.2418) loss_objectness: 0.0024 (0.0024) loss_rpn_box_reg: 0.0099 (0.0149) time: 0.7042 data: 0.0651 max mem: 3647
Epoch: [21] Total time: 0:00:02 (0.7172 s / it)
Epoch: [22] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.3829 (0.3829) loss_classifier: 0.1235 (0.1235) loss_box_reg: 0.2459 (0.2459) loss_objectness: 0.0022 (0.0022) loss_rpn_box_reg: 0.0114 (0.0114) time: 0.7839 data: 0.1426 max mem: 3647
Epoch: [22] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.2935 (0.3108) loss_classifier: 0.0916 (0.0985) loss_box_reg: 0.1806 (0.1988) loss_objectness: 0.0023 (0.0027) loss_rpn_box_reg: 0.0114 (0.0108) time: 0.6922 data: 0.0519 max mem: 3647
Epoch: [22] Total time: 0:00:02 (0.7048 s / it)
Epoch: [23] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.5032 (0.5032) loss_classifier: 0.1444 (0.1444) loss_box_reg: 0.3322 (0.3322) loss_objectness: 0.0063 (0.0063) loss_rpn_box_reg: 0.0203 (0.0203) time: 0.8381 data: 0.1931 max mem: 3647
Epoch: [23] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.2974 (0.3615) loss_classifier: 0.0981 (0.1125) loss_box_reg: 0.1874 (0.2311) loss_objectness: 0.0022 (0.0033) loss_rpn_box_reg: 0.0134 (0.0146) time: 0.7134 data: 0.0703 max mem: 3647
Epoch: [23] Total time: 0:00:02 (0.7266 s / it)
Epoch: [24] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.3383 (0.3383) loss_classifier: 0.0925 (0.0925) loss_box_reg: 0.2269 (0.2269) loss_objectness: 0.0042 (0.0042) loss_rpn_box_reg: 0.0147 (0.0147) time: 0.8238 data: 0.1845 max mem: 3647
Epoch: [24] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3383 (0.3512) loss_classifier: 0.1069 (0.1058) loss_box_reg: 0.2269 (0.2227) loss_objectness: 0.0042 (0.0069) loss_rpn_box_reg: 0.0147 (0.0158) time: 0.7068 data: 0.0669 max mem: 3647
Epoch: [24] Total time: 0:00:02 (0.7202 s / it)
Epoch: [25] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.4217 (0.4217) loss_classifier: 0.1144 (0.1144) loss_box_reg: 0.2835 (0.2835) loss_objectness: 0.0044 (0.0044) loss_rpn_box_reg: 0.0194 (0.0194) time: 0.8239 data: 0.1775 max mem: 3647
Epoch: [25] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3688 (0.3382) loss_classifier: 0.1124 (0.0941) loss_box_reg: 0.2447 (0.2275) loss_objectness: 0.0037 (0.0034) loss_rpn_box_reg: 0.0104 (0.0132) time: 0.7101 data: 0.0659 max mem: 3647
Epoch: [25] Total time: 0:00:02 (0.7293 s / it)
Epoch: [26] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.2384 (0.2384) loss_classifier: 0.0700 (0.0700) loss_box_reg: 0.1633 (0.1633) loss_objectness: 0.0001 (0.0001) loss_rpn_box_reg: 0.0049 (0.0049) time: 0.8794 data: 0.2124 max mem: 3647
Epoch: [26] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.2847 (0.3667) loss_classifier: 0.0863 (0.1093) loss_box_reg: 0.1878 (0.2312) loss_objectness: 0.0007 (0.0117) loss_rpn_box_reg: 0.0100 (0.0145) time: 0.7304 data: 0.0792 max mem: 3647
Epoch: [26] Total time: 0:00:02 (0.7435 s / it)
Epoch: [27] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.4884 (0.4884) loss_classifier: 0.1323 (0.1323) loss_box_reg: 0.2999 (0.2999) loss_objectness: 0.0347 (0.0347) loss_rpn_box_reg: 0.0215 (0.0215) time: 0.8430 data: 0.2047 max mem: 3647
Epoch: [27] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.4798 (0.4131) loss_classifier: 0.1323 (0.1153) loss_box_reg: 0.2999 (0.2636) loss_objectness: 0.0107 (0.0153) loss_rpn_box_reg: 0.0215 (0.0189) time: 0.7128 data: 0.0754 max mem: 3647
Epoch: [27] Total time: 0:00:02 (0.7255 s / it)
Epoch: [28] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.3465 (0.3465) loss_classifier: 0.1084 (0.1084) loss_box_reg: 0.2124 (0.2124) loss_objectness: 0.0057 (0.0057) loss_rpn_box_reg: 0.0201 (0.0201) time: 0.8584 data: 0.2170 max mem: 3647
Epoch: [28] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3465 (0.3669) loss_classifier: 0.1130 (0.1211) loss_box_reg: 0.2124 (0.2262) loss_objectness: 0.0057 (0.0046) loss_rpn_box_reg: 0.0143 (0.0150) time: 0.7199 data: 0.0793 max mem: 3647
Epoch: [28] Total time: 0:00:02 (0.7318 s / it)
Epoch: [29] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.2974 (0.2974) loss_classifier: 0.0815 (0.0815) loss_box_reg: 0.2002 (0.2002) loss_objectness: 0.0030 (0.0030) loss_rpn_box_reg: 0.0128 (0.0128) time: 0.8199 data: 0.1771 max mem: 3647
Epoch: [29] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.2974 (0.3444) loss_classifier: 0.0816 (0.0951) loss_box_reg: 0.2002 (0.2213) loss_objectness: 0.0038 (0.0145) loss_rpn_box_reg: 0.0128 (0.0136) time: 0.7077 data: 0.0648 max mem: 3647
Epoch: [29] Total time: 0:00:02 (0.7208 s / it)
creating index...
index created!
Test: [0/2] eta: 0:00:00 model_time: 0.1199 (0.1199) evaluator_time: 0.0154 (0.0154) time: 0.2268 data: 0.0877 max mem: 3647
Test: [1/2] eta: 0:00:00 model_time: 0.1165 (0.1182) evaluator_time: 0.0151 (0.0153) time: 0.1838 data: 0.0470 max mem: 3647
Test: Total time: 0:00:00 (0.2030 s / it)
Averaged stats: model_time: 0.1165 (0.1182) evaluator_time: 0.0151 (0.0153)
Accumulating evaluation results...
DONE (t=0.00s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.255
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.617
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.142
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.243
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.481
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.015
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.167
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.375
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.358
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.633
Epoch: [30] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.3005 (0.3005) loss_classifier: 0.0663 (0.0663) loss_box_reg: 0.2156 (0.2156) loss_objectness: 0.0055 (0.0055) loss_rpn_box_reg: 0.0131 (0.0131) time: 0.8615 data: 0.2218 max mem: 3647
Epoch: [30] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3005 (0.3167) loss_classifier: 0.0669 (0.0917) loss_box_reg: 0.2156 (0.2095) loss_objectness: 0.0025 (0.0029) loss_rpn_box_reg: 0.0131 (0.0127) time: 0.7251 data: 0.0799 max mem: 3647
Epoch: [30] Total time: 0:00:02 (0.7485 s / it)
Epoch: [31] [0/3] eta: 0:00:03 lr: 0.000050 loss: 0.4379 (0.4379) loss_classifier: 0.1157 (0.1157) loss_box_reg: 0.3021 (0.3021) loss_objectness: 0.0015 (0.0015) loss_rpn_box_reg: 0.0187 (0.0187) time: 1.0022 data: 0.3289 max mem: 3647
Epoch: [31] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3240 (0.3516) loss_classifier: 0.1057 (0.1073) loss_box_reg: 0.1974 (0.2258) loss_objectness: 0.0015 (0.0057) loss_rpn_box_reg: 0.0110 (0.0129) time: 0.7771 data: 0.1178 max mem: 3647
Epoch: [31] Total time: 0:00:02 (0.7926 s / it)
Epoch: [32] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.2672 (0.2672) loss_classifier: 0.0791 (0.0791) loss_box_reg: 0.1704 (0.1704) loss_objectness: 0.0064 (0.0064) loss_rpn_box_reg: 0.0113 (0.0113) time: 0.8175 data: 0.1689 max mem: 3647
Epoch: [32] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.2672 (0.2922) loss_classifier: 0.0791 (0.0889) loss_box_reg: 0.1704 (0.1883) loss_objectness: 0.0033 (0.0039) loss_rpn_box_reg: 0.0113 (0.0111) time: 0.7111 data: 0.0636 max mem: 3647
Epoch: [32] Total time: 0:00:02 (0.7251 s / it)
Epoch: [33] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.4819 (0.4819) loss_classifier: 0.1314 (0.1314) loss_box_reg: 0.3099 (0.3099) loss_objectness: 0.0058 (0.0058) loss_rpn_box_reg: 0.0347 (0.0347) time: 0.8582 data: 0.2105 max mem: 3647
Epoch: [33] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3551 (0.3385) loss_classifier: 0.1178 (0.1037) loss_box_reg: 0.2275 (0.2154) loss_objectness: 0.0025 (0.0029) loss_rpn_box_reg: 0.0093 (0.0164) time: 0.7232 data: 0.0764 max mem: 3647
Epoch: [33] Total time: 0:00:02 (0.7376 s / it)
Epoch: [34] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.2516 (0.2516) loss_classifier: 0.0747 (0.0747) loss_box_reg: 0.1578 (0.1578) loss_objectness: 0.0133 (0.0133) loss_rpn_box_reg: 0.0058 (0.0058) time: 0.8055 data: 0.1578 max mem: 3647
Epoch: [34] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.2702 (0.2885) loss_classifier: 0.0780 (0.0819) loss_box_reg: 0.1691 (0.1870) loss_objectness: 0.0104 (0.0084) loss_rpn_box_reg: 0.0127 (0.0112) time: 0.7072 data: 0.0596 max mem: 3647
Epoch: [34] Total time: 0:00:02 (0.7210 s / it)
Epoch: [35] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.5490 (0.5490) loss_classifier: 0.1747 (0.1747) loss_box_reg: 0.3409 (0.3409) loss_objectness: 0.0049 (0.0049) loss_rpn_box_reg: 0.0284 (0.0284) time: 0.8806 data: 0.2265 max mem: 3647
Epoch: [35] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3028 (0.3548) loss_classifier: 0.0918 (0.1096) loss_box_reg: 0.1928 (0.2253) loss_objectness: 0.0038 (0.0040) loss_rpn_box_reg: 0.0151 (0.0160) time: 0.7326 data: 0.0837 max mem: 3647
Epoch: [35] Total time: 0:00:02 (0.7466 s / it)
Epoch: [36] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.2617 (0.2617) loss_classifier: 0.0762 (0.0762) loss_box_reg: 0.1760 (0.1760) loss_objectness: 0.0002 (0.0002) loss_rpn_box_reg: 0.0092 (0.0092) time: 0.8579 data: 0.2125 max mem: 3647
Epoch: [36] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.2894 (0.3318) loss_classifier: 0.0762 (0.0972) loss_box_reg: 0.1955 (0.2166) loss_objectness: 0.0017 (0.0056) loss_rpn_box_reg: 0.0092 (0.0124) time: 0.7254 data: 0.0774 max mem: 3647
Epoch: [36] Total time: 0:00:02 (0.7454 s / it)
Epoch: [37] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.2559 (0.2559) loss_classifier: 0.0794 (0.0794) loss_box_reg: 0.1659 (0.1659) loss_objectness: 0.0007 (0.0007) loss_rpn_box_reg: 0.0099 (0.0099) time: 0.9276 data: 0.2557 max mem: 3647
Epoch: [37] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.2850 (0.3091) loss_classifier: 0.0794 (0.0933) loss_box_reg: 0.1896 (0.1964) loss_objectness: 0.0030 (0.0026) loss_rpn_box_reg: 0.0181 (0.0168) time: 0.7574 data: 0.0957 max mem: 3647
Epoch: [37] Total time: 0:00:02 (0.7717 s / it)
Epoch: [38] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.3397 (0.3397) loss_classifier: 0.1174 (0.1174) loss_box_reg: 0.2036 (0.2036) loss_objectness: 0.0046 (0.0046) loss_rpn_box_reg: 0.0141 (0.0141) time: 0.8152 data: 0.1569 max mem: 3647
Epoch: [38] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3099 (0.3183) loss_classifier: 0.1028 (0.1019) loss_box_reg: 0.2026 (0.1997) loss_objectness: 0.0048 (0.0049) loss_rpn_box_reg: 0.0122 (0.0117) time: 0.7138 data: 0.0610 max mem: 3647
Epoch: [38] Total time: 0:00:02 (0.7272 s / it)
Epoch: [39] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.2082 (0.2082) loss_classifier: 0.0588 (0.0588) loss_box_reg: 0.1393 (0.1393) loss_objectness: 0.0063 (0.0063) loss_rpn_box_reg: 0.0038 (0.0038) time: 0.8077 data: 0.1528 max mem: 3647
Epoch: [39] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3133 (0.2963) loss_classifier: 0.0920 (0.0895) loss_box_reg: 0.1932 (0.1872) loss_objectness: 0.0039 (0.0035) loss_rpn_box_reg: 0.0167 (0.0161) time: 0.7095 data: 0.0582 max mem: 3647
Epoch: [39] Total time: 0:00:02 (0.7232 s / it)
creating index...
index created!
Test: [0/2] eta: 0:00:00 model_time: 0.1210 (0.1210) evaluator_time: 0.0116 (0.0116) time: 0.2269 data: 0.0905 max mem: 3647
Test: [1/2] eta: 0:00:00 model_time: 0.1179 (0.1195) evaluator_time: 0.0116 (0.0146) time: 0.1856 data: 0.0484 max mem: 3647
Test: Total time: 0:00:00 (0.2078 s / it)
Averaged stats: model_time: 0.1179 (0.1195) evaluator_time: 0.0116 (0.0146)
Accumulating evaluation results...
DONE (t=0.01s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.251
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.605
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.143
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.240
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.578
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.015
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.165
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.354
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.336
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.633
Epoch: [40] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.2210 (0.2210) loss_classifier: 0.0574 (0.0574) loss_box_reg: 0.1558 (0.1558) loss_objectness: 0.0021 (0.0021) loss_rpn_box_reg: 0.0057 (0.0057) time: 0.8345 data: 0.1841 max mem: 3647
Epoch: [40] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.2716 (0.2826) loss_classifier: 0.0964 (0.0922) loss_box_reg: 0.1558 (0.1739) loss_objectness: 0.0021 (0.0050) loss_rpn_box_reg: 0.0083 (0.0114) time: 0.7180 data: 0.0666 max mem: 3647
Epoch: [40] Total time: 0:00:02 (0.7327 s / it)
Epoch: [41] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.1619 (0.1619) loss_classifier: 0.0563 (0.0563) loss_box_reg: 0.0912 (0.0912) loss_objectness: 0.0025 (0.0025) loss_rpn_box_reg: 0.0119 (0.0119) time: 0.8362 data: 0.1817 max mem: 3647
Epoch: [41] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3029 (0.2620) loss_classifier: 0.0984 (0.0877) loss_box_reg: 0.1921 (0.1591) loss_objectness: 0.0025 (0.0019) loss_rpn_box_reg: 0.0119 (0.0133) time: 0.7199 data: 0.0660 max mem: 3647
Epoch: [41] Total time: 0:00:02 (0.7356 s / it)
Epoch: [42] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.1882 (0.1882) loss_classifier: 0.0562 (0.0562) loss_box_reg: 0.1220 (0.1220) loss_objectness: 0.0029 (0.0029) loss_rpn_box_reg: 0.0071 (0.0071) time: 0.9894 data: 0.3164 max mem: 3647
Epoch: [42] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3239 (0.2916) loss_classifier: 0.1068 (0.0925) loss_box_reg: 0.1941 (0.1833) loss_objectness: 0.0035 (0.0034) loss_rpn_box_reg: 0.0112 (0.0123) time: 0.7823 data: 0.1192 max mem: 3647
Epoch: [42] Total time: 0:00:02 (0.8035 s / it)
Epoch: [43] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.2215 (0.2215) loss_classifier: 0.0764 (0.0764) loss_box_reg: 0.1288 (0.1288) loss_objectness: 0.0002 (0.0002) loss_rpn_box_reg: 0.0161 (0.0161) time: 0.8807 data: 0.2291 max mem: 3647
Epoch: [43] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.2215 (0.3100) loss_classifier: 0.0764 (0.0896) loss_box_reg: 0.1612 (0.2006) loss_objectness: 0.0002 (0.0034) loss_rpn_box_reg: 0.0161 (0.0165) time: 0.7370 data: 0.0812 max mem: 3647
Epoch: [43] Total time: 0:00:02 (0.7499 s / it)
Epoch: [44] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.3605 (0.3605) loss_classifier: 0.1084 (0.1084) loss_box_reg: 0.2204 (0.2204) loss_objectness: 0.0003 (0.0003) loss_rpn_box_reg: 0.0313 (0.0313) time: 0.8611 data: 0.2020 max mem: 3647
Epoch: [44] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3109 (0.2763) loss_classifier: 0.1009 (0.0871) loss_box_reg: 0.1943 (0.1721) loss_objectness: 0.0003 (0.0018) loss_rpn_box_reg: 0.0108 (0.0152) time: 0.7341 data: 0.0731 max mem: 3647
Epoch: [44] Total time: 0:00:02 (0.7499 s / it)
Epoch: [45] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.4685 (0.4685) loss_classifier: 0.1695 (0.1695) loss_box_reg: 0.2786 (0.2786) loss_objectness: 0.0024 (0.0024) loss_rpn_box_reg: 0.0180 (0.0180) time: 0.9209 data: 0.2610 max mem: 3647
Epoch: [45] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.4193 (0.3608) loss_classifier: 0.1126 (0.1147) loss_box_reg: 0.2786 (0.2287) loss_objectness: 0.0007 (0.0011) loss_rpn_box_reg: 0.0180 (0.0163) time: 0.7520 data: 0.0957 max mem: 3647
Epoch: [45] Total time: 0:00:02 (0.7646 s / it)
Epoch: [46] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.3659 (0.3659) loss_classifier: 0.1045 (0.1045) loss_box_reg: 0.2429 (0.2429) loss_objectness: 0.0007 (0.0007) loss_rpn_box_reg: 0.0177 (0.0177) time: 0.8620 data: 0.2095 max mem: 3647
Epoch: [46] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3620 (0.3318) loss_classifier: 0.1045 (0.0997) loss_box_reg: 0.2302 (0.2169) loss_objectness: 0.0007 (0.0016) loss_rpn_box_reg: 0.0116 (0.0136) time: 0.7350 data: 0.0767 max mem: 3647
Epoch: [46] Total time: 0:00:02 (0.7475 s / it)
Epoch: [47] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.3012 (0.3012) loss_classifier: 0.0959 (0.0959) loss_box_reg: 0.1878 (0.1878) loss_objectness: 0.0038 (0.0038) loss_rpn_box_reg: 0.0137 (0.0137) time: 0.9080 data: 0.2505 max mem: 3647
Epoch: [47] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3214 (0.3260) loss_classifier: 0.1025 (0.1031) loss_box_reg: 0.2062 (0.2075) loss_objectness: 0.0010 (0.0016) loss_rpn_box_reg: 0.0137 (0.0137) time: 0.7571 data: 0.0951 max mem: 3647
Epoch: [47] Total time: 0:00:02 (0.7773 s / it)
Epoch: [48] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.3281 (0.3281) loss_classifier: 0.0989 (0.0989) loss_box_reg: 0.2130 (0.2130) loss_objectness: 0.0061 (0.0061) loss_rpn_box_reg: 0.0100 (0.0100) time: 0.9032 data: 0.2268 max mem: 3647
Epoch: [48] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3281 (0.2958) loss_classifier: 0.0989 (0.0947) loss_box_reg: 0.2130 (0.1859) loss_objectness: 0.0061 (0.0055) loss_rpn_box_reg: 0.0100 (0.0096) time: 0.7540 data: 0.0842 max mem: 3647
Epoch: [48] Total time: 0:00:02 (0.7687 s / it)
Epoch: [49] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.3644 (0.3644) loss_classifier: 0.1112 (0.1112) loss_box_reg: 0.2324 (0.2324) loss_objectness: 0.0064 (0.0064) loss_rpn_box_reg: 0.0144 (0.0144) time: 0.8705 data: 0.2137 max mem: 3647
Epoch: [49] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.2680 (0.2842) loss_classifier: 0.0849 (0.0895) loss_box_reg: 0.1737 (0.1798) loss_objectness: 0.0064 (0.0058) loss_rpn_box_reg: 0.0065 (0.0091) time: 0.7366 data: 0.0777 max mem: 3647
Epoch: [49] Total time: 0:00:02 (0.7493 s / it)
creating index...
index created!
Test: [0/2] eta: 0:00:00 model_time: 0.1219 (0.1219) evaluator_time: 0.0149 (0.0149) time: 0.2292 data: 0.0888 max mem: 3647
Test: [1/2] eta: 0:00:00 model_time: 0.1171 (0.1195) evaluator_time: 0.0149 (0.0149) time: 0.1851 data: 0.0476 max mem: 3647
Test: Total time: 0:00:00 (0.2061 s / it)
Averaged stats: model_time: 0.1171 (0.1195) evaluator_time: 0.0149 (0.0149)
Accumulating evaluation results...
DONE (t=0.00s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.286
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.649
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.185
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.277
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.437
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.017
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.173
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.394
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.380
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.600
Epoch: [50] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.3796 (0.3796) loss_classifier: 0.0987 (0.0987) loss_box_reg: 0.2554 (0.2554) loss_objectness: 0.0029 (0.0029) loss_rpn_box_reg: 0.0226 (0.0226) time: 0.8663 data: 0.2066 max mem: 3647
Epoch: [50] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.2387 (0.2848) loss_classifier: 0.0987 (0.0890) loss_box_reg: 0.1551 (0.1808) loss_objectness: 0.0007 (0.0014) loss_rpn_box_reg: 0.0112 (0.0136) time: 0.7383 data: 0.0757 max mem: 3647
Epoch: [50] Total time: 0:00:02 (0.7524 s / it)
Epoch: [51] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.1809 (0.1809) loss_classifier: 0.0516 (0.0516) loss_box_reg: 0.1246 (0.1246) loss_objectness: 0.0012 (0.0012) loss_rpn_box_reg: 0.0035 (0.0035) time: 0.8105 data: 0.1486 max mem: 3647
Epoch: [51] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.2681 (0.2486) loss_classifier: 0.0838 (0.0764) loss_box_reg: 0.1742 (0.1618) loss_objectness: 0.0012 (0.0021) loss_rpn_box_reg: 0.0097 (0.0083) time: 0.7212 data: 0.0541 max mem: 3647
Epoch: [51] Total time: 0:00:02 (0.7340 s / it)
Epoch: [52] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.3794 (0.3794) loss_classifier: 0.1246 (0.1246) loss_box_reg: 0.2422 (0.2422) loss_objectness: 0.0002 (0.0002) loss_rpn_box_reg: 0.0124 (0.0124) time: 0.8769 data: 0.2158 max mem: 3647
Epoch: [52] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3556 (0.3305) loss_classifier: 0.1120 (0.1031) loss_box_reg: 0.2256 (0.2071) loss_objectness: 0.0005 (0.0063) loss_rpn_box_reg: 0.0124 (0.0140) time: 0.7454 data: 0.0799 max mem: 3647
Epoch: [52] Total time: 0:00:02 (0.7653 s / it)
Epoch: [53] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.1895 (0.1895) loss_classifier: 0.0684 (0.0684) loss_box_reg: 0.1133 (0.1133) loss_objectness: 0.0004 (0.0004) loss_rpn_box_reg: 0.0074 (0.0074) time: 0.9137 data: 0.2358 max mem: 3647
Epoch: [53] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.2840 (0.2671) loss_classifier: 0.0931 (0.0891) loss_box_reg: 0.1707 (0.1639) loss_objectness: 0.0020 (0.0027) loss_rpn_box_reg: 0.0119 (0.0113) time: 0.7626 data: 0.0876 max mem: 3647
Epoch: [53] Total time: 0:00:02 (0.7815 s / it)
Epoch: [54] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.2575 (0.2575) loss_classifier: 0.0771 (0.0771) loss_box_reg: 0.1609 (0.1609) loss_objectness: 0.0036 (0.0036) loss_rpn_box_reg: 0.0159 (0.0159) time: 0.8459 data: 0.1804 max mem: 3647
Epoch: [54] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.2575 (0.2482) loss_classifier: 0.0766 (0.0747) loss_box_reg: 0.1609 (0.1615) loss_objectness: 0.0010 (0.0017) loss_rpn_box_reg: 0.0087 (0.0104) time: 0.7361 data: 0.0658 max mem: 3647
Epoch: [54] Total time: 0:00:02 (0.7496 s / it)
Epoch: [55] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.2437 (0.2437) loss_classifier: 0.0689 (0.0689) loss_box_reg: 0.1619 (0.1619) loss_objectness: 0.0031 (0.0031) loss_rpn_box_reg: 0.0099 (0.0099) time: 0.8780 data: 0.2078 max mem: 3647
Epoch: [55] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.2437 (0.2808) loss_classifier: 0.0689 (0.0839) loss_box_reg: 0.1619 (0.1804) loss_objectness: 0.0031 (0.0039) loss_rpn_box_reg: 0.0099 (0.0126) time: 0.7473 data: 0.0786 max mem: 3647
Epoch: [55] Total time: 0:00:02 (0.7606 s / it)
Epoch: [56] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.2796 (0.2796) loss_classifier: 0.0983 (0.0983) loss_box_reg: 0.1648 (0.1648) loss_objectness: 0.0033 (0.0033) loss_rpn_box_reg: 0.0133 (0.0133) time: 0.8376 data: 0.1709 max mem: 3647
Epoch: [56] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.2733 (0.2596) loss_classifier: 0.0901 (0.0816) loss_box_reg: 0.1648 (0.1631) loss_objectness: 0.0033 (0.0043) loss_rpn_box_reg: 0.0097 (0.0105) time: 0.7325 data: 0.0620 max mem: 3647
Epoch: [56] Total time: 0:00:02 (0.7492 s / it)
Epoch: [57] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.3704 (0.3704) loss_classifier: 0.1550 (0.1550) loss_box_reg: 0.2029 (0.2029) loss_objectness: 0.0006 (0.0006) loss_rpn_box_reg: 0.0120 (0.0120) time: 0.8709 data: 0.2071 max mem: 3647
Epoch: [57] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.3100 (0.3019) loss_classifier: 0.0915 (0.1122) loss_box_reg: 0.2029 (0.1782) loss_objectness: 0.0006 (0.0005) loss_rpn_box_reg: 0.0120 (0.0110) time: 0.7432 data: 0.0754 max mem: 3647
Epoch: [57] Total time: 0:00:02 (0.7563 s / it)
Epoch: [58] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.1926 (0.1926) loss_classifier: 0.0694 (0.0694) loss_box_reg: 0.1172 (0.1172) loss_objectness: 0.0002 (0.0002) loss_rpn_box_reg: 0.0058 (0.0058) time: 0.8239 data: 0.1602 max mem: 3647
Epoch: [58] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.2117 (0.2344) loss_classifier: 0.0694 (0.0671) loss_box_reg: 0.1625 (0.1544) loss_objectness: 0.0002 (0.0040) loss_rpn_box_reg: 0.0064 (0.0089) time: 0.7339 data: 0.0643 max mem: 3647
Epoch: [58] Total time: 0:00:02 (0.7536 s / it)
Epoch: [59] [0/3] eta: 0:00:02 lr: 0.000050 loss: 0.1862 (0.1862) loss_classifier: 0.0611 (0.0611) loss_box_reg: 0.1191 (0.1191) loss_objectness: 0.0004 (0.0004) loss_rpn_box_reg: 0.0055 (0.0055) time: 0.9154 data: 0.2197 max mem: 3647
Epoch: [59] [2/3] eta: 0:00:00 lr: 0.000050 loss: 0.2972 (0.2875) loss_classifier: 0.0843 (0.0881) loss_box_reg: 0.1983 (0.1832) loss_objectness: 0.0052 (0.0047) loss_rpn_box_reg: 0.0060 (0.0115) time: 0.7626 data: 0.0829 max mem: 3647
Epoch: [59] Total time: 0:00:02 (0.7781 s / it)
creating index...
index created!
Test: [0/2] eta: 0:00:00 model_time: 0.1221 (0.1221) evaluator_time: 0.0247 (0.0247) time: 0.2410 data: 0.0908 max mem: 3647
Test: [1/2] eta: 0:00:00 model_time: 0.1191 (0.1206) evaluator_time: 0.0131 (0.0189) time: 0.1912 data: 0.0486 max mem: 3647
Test: Total time: 0:00:00 (0.2093 s / it)
Averaged stats: model_time: 0.1191 (0.1206) evaluator_time: 0.0131 (0.0189)
Accumulating evaluation results...
DONE (t=0.01s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.278
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.622
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.217
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.267
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.480
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.015
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.188
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.394
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.378
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.633
7.3. Prediction on a test image
First, we create a function to combine the image and the prediction result.
[28]:
import matplotlib.pyplot as plt
labelmap = ['others', 'tuna']
def plot_image_box(img, minimum_score=0.8):
# Evaluation
model.eval()
with torch.no_grad():
prediction = model([img.to(device)])
# Display the predicted result
plt.figure(figsize=(15, 20))
plt.imshow(img.numpy().transpose(1,2,0))
currentAxis = plt.gca()
for i, box in enumerate(prediction[0]['boxes']):
xmin, ymin, xmax, ymax = box.cpu().numpy().astype('int64')
label = prediction[0]['labels'][i].cpu().numpy()
score = prediction[0]['scores'][i].cpu().numpy()
print(xmin, ymin, xmax, ymax,label,score)
if score > minimum_score:
coords = (xmin, ymin), xmax-xmin+1, ymax-ymin+1
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='yellow', linewidth=2))
display_txt = '{} {:0.2f}'.format(labelmap[label], score)
currentAxis.text(xmin, ymin, display_txt, bbox={'facecolor':'yellow', 'alpha':0.5})
plt.show()
[29]:
# Select one image from the evaluation data and display the result.
img, _ = dataset_test[0]
plot_image_box(img, minimum_score=0.8)
364 177 464 237 1 0.99954814
434 286 524 336 1 0.9993298
542 272 670 327 1 0.9990403
470 198 587 249 1 0.9988784
653 162 723 221 1 0.9985947
618 303 724 350 1 0.9981419
682 232 764 295 1 0.9970728
535 131 637 189 1 0.9965342
510 350 674 432 1 0.9916945
172 335 266 376 1 0.99080545
649 417 805 484 1 0.98945236
790 401 865 456 1 0.9811653
687 225 774 332 1 0.9450546
666 385 765 417 1 0.94444937
413 359 466 411 1 0.9423702
59 305 148 341 1 0.92740476
57 390 218 435 1 0.917934
228 387 399 440 1 0.8833836
660 374 711 415 1 0.83972514
239 391 324 428 1 0.8168078
267 279 322 333 1 0.794034
585 155 650 205 1 0.75297713
794 343 842 390 1 0.6317944
794 375 861 496 1 0.5017062
306 319 354 377 1 0.3824813
492 392 574 426 1 0.302111
569 286 717 341 1 0.25219503
628 248 770 349 1 0.23413819
349 331 415 364 1 0.23021588
546 145 657 202 1 0.21353845
378 317 440 345 1 0.17509954
406 365 471 393 1 0.1725177
524 263 752 343 1 0.11276494
532 388 673 424 1 0.06586922
351 315 433 360 1 0.056577183
69 391 160 425 1 0.056155566
401 356 498 404 1 0.05515057

7.4. Application to Video
Let’s process a video using the trained model.
First, we use cv2 to create a function that processes images at the NumPy level. When inputting the model, we need to change the (height, width, ch) in cv2 to (width, height, ch), which fits the PIL.
[30]:
import cv2
def bounding(image, boxes, labels, scores, min_score):
"""Add bounding boxes
image: RGB image [height, width, 3]
boxes: [num_instance, xmin, ymin, xmax, ymax]
Returns result image.
"""
# Bounding box
N = boxes.shape[0]
for i in range(N):
label = labels[i]
color = (255,255,255)
if label == 1: color = (255,255,0)
xmin, ymin, xmax, ymax = boxes[i].astype('int64')
if (label in {1}) and (scores[i] > min_score):
cv2.rectangle(image, (xmin,ymin), (xmax,ymax), color, thickness=2)
return image.astype(np.uint8)
image = images[7]
print(image.shape)
img = torch.as_tensor(image.transpose(2,0,1)/255.0, dtype=torch.float32) #モデルの入力の形式に直す
with torch.no_grad():
prediction = model([img.to(device)])
new_image = bounding(image.copy(),
prediction[0]['boxes'].cpu().numpy(),
prediction[0]['labels'].cpu().numpy().astype(np.uint32),
prediction[0]['scores'].cpu().numpy(),
min_score=0.7)
plt.imshow(new_image)
(640, 1137, 3)
[30]:
<matplotlib.image.AxesImage at 0x783de1a76290>

Next, create a function that processes the video frame by frame and recreates a new video.
[31]:
def bounding_video(video_path, max_frame=100):
import datetime
import cv2
# Video capture
vcapture = cv2.VideoCapture(video_path)
width = int(vcapture.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(vcapture.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = vcapture.get(cv2.CAP_PROP_FPS)
# Define codec and create video writer
file_name = "bounding_{:%Y%m%dT%H%M%S}.mp4".format(datetime.datetime.now())
vwriter = cv2.VideoWriter(file_name,
cv2.VideoWriter_fourcc(*'mp4v'),
fps, (width, height))
count = 0
success = True
model.eval()
while success and (count<max_frame):
if count % 10 ==0: print("frame: ", count)
# Read next image
success, image = vcapture.read()
if success:
# OpenCV returns images as BGR, convert to RGB
image = image[..., ::-1]
img = torch.as_tensor(image.transpose(2,0,1)/255.0, dtype=torch.float32)
# Detect objects
with torch.no_grad():
prediction = model([img.to(device)])
new_image = bounding(image.copy(),
prediction[0]['boxes'].cpu().numpy(),
prediction[0]['labels'].cpu().numpy().astype(np.uint32),
prediction[0]['scores'].cpu().numpy(),
min_score=0.7)
# Add image to video writer
vwriter.write(np.uint8(new_image[:,:,::-1]))#RGBが逆になる
count += 1
plt.imshow(new_image)
vwriter.release()
print("Saved to ", file_name)
Let’s create a movie. mp4 file will be saved, so please download and watch it from the folder symbol on the left.
[32]:
bounding_video('/content/tuna.mp4', max_frame=200) # execute to the last frame if max_frame is large as max_frame=10000
frame: 0
frame: 10
frame: 20
frame: 30
frame: 40
frame: 50
frame: 60
frame: 70
frame: 80
frame: 90
frame: 100
frame: 110
frame: 120
frame: 130
frame: 140
frame: 150
frame: 160
frame: 170
frame: 180
frame: 190
Saved to bounding_20250122T054541.mp4

Copy right: 竹縄 知之