...
...
Introduction
This guide describes the tools and provides a few key elements to get your phyBOARD-Pollux AI kit started. These include:
- creating a celebrity look-alike demo and running inference on an NPU
- quantizing the deep learning model to run on an NPU
Orientation
If you want to build our BSP, please continue reading here: Building the PHYTEC BSP
...
Note:
For further information on the phyBOARD-Pollux, go to https://www.phytec.de/produkte/system-on-modules/phycore-imx-8m-plus/#downloads/. There you will find the following documentation:
On top of these standard manuals and guides, PHYTEC also provides Product Change Notifications, Application Notes, and Technical Notes on a case-by-case basis. For more information or details regarding the phyCORE-i.MX 8M Plus / phyBOARD-Pollux, please go to our phyCORE-i.MX 8M Plus product page or contact the PHYTEC Sales department.
phyBOARD-Pollux Quickstart
For instructions on how to connect, boot up, and begin the demo on your kit, head to: L-1016e.A0 phyCORE-i.MX 8M Plus AI Kit Quickstart.
Prerequisites
Before you begin with this manual, there are a few requirements that must be met.
Software
Your PC will need:
- Python 3.6+ environment (we recommend Anaconda with a virtual environment)
- TensorFlow 2.x
- numpy
- opencv-python
- pandas (for preparation)
- tflite_runtime
- and the tf.keras-vggface model
- If you use conda, you can clone the environment with this conda environment-file: here (latest: TF2.3envfile.yml)
- To install the tflite_runtime, download this wheel file and install via pip install path_to_file. For example go to file directory where you downloaded your wheel file and run "pip install tflite_runtime-2.5.0-cp36-cp36m-linux_x86_64.whl".
- Install the MTCNN detector version 0.1.1 (instead of 0.1.0) if you want to filter your images by comparing embeddings.
Downloads
There are a few files that can be downloaded to help with running the phyBOARD-Pollux AI kit:
- The environment file to clone the environment can be found here (latest: TF2.3envfile.yml). You can easily install all required packages by first cloning the git repository and running "conda env create -f <path to yml file>".
- If you intend to use the latest environment version, make sure to comment out line 205 (keras-vggface-tf==0.7) and line 227 (tflite-runtime==2.5.0) in the yml file. Install the keras-vggface-tf model and the tflite-runtime from the links above.
- You can find the model and installation instructions here.
- The demo code, as described here, can be found here.
- The demo as preinstalled on the device, running optimized in a GUI, can be found here.
Building the PHYTEC BSP
The BSP shipped with the phyCORE-i.MX 8M Plus AI Kit is based on the standard PHYTEC BSP of the phyCORE-i.MX 8M Plus. This means building the BSP with the help of our phyLinux script is quite similar to the standard BSP.
Note:
This kit is provided with a special SD card image, which might differ from our general-purpose evaluation kit SD card image based on the standard PHYTEC BSP. So if you want to use any of the other manuals provided for this kit, you might need to download the standard PHYTEC image in order to ensure the functions described in those manuals are available.
Get the PHYTEC BSP
- Create a fresh project directory:
...
Code Block
host$ cd ~/yocto
host$ wget https://download.phytec.de/Software/Linux/Yocto/Tools/phyLinux
host$ chmod +x phyLinux
host$ MACHINE=phyboard-pollux-imx8mp-1 DISTRO=yogurt-vendor-xwayland ./phyLinux init -p topic -r PD-BSP-Yocto-CelebrityFaceMatch-i.MX8MP-v0.2
Start the Build
After you have downloaded all the metadata with phyLinux init, you have to set up the shell environment variables. This needs to be done every time you open a new shell for starting builds. We use the shell script provided by Poky in its default configuration. From the root of your project directory, type:
...
For more information and documentation of the BSP and our Yocto Distribution please take a look at the following two manuals:
- Yocto Manual: L-813e.A13 Yocto Reference Manual (kirkstone)
- i.MX 8M Plus BSP Manual: L-1017e.Ax phyCORE-i.MX 8M Plus BSP Manual (NPU section)
The Celebrity Face Match Demo
...
For more general information on the process we used, please check the Further Reading section at the end of this guide.
Preparation
As shown in the block diagram above, we are using a pre-trained network, since the task of facial recognition has already been solved very well by different research groups. We used the network from Refik Can Malli (rcmalli), which was originally trained by Q. Cao on the VGGFace2 dataset. As rcmalli's model was written with TensorFlow 1.14.0 and Keras 2.2.4, we updated it to TensorFlow version 2.2.0. You can find the updated model here. However, we are still using the weights from the original model.
What are Embeddings
To identify a human face, we need to identify specific facial features such as the length of the nose, the distance between the eyes, the angle between nose and mouth, etc.
...
More information about embeddings can be found here.
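To make this concrete, here is a minimal sketch (not part of the demo code) of how two such embedding vectors can be compared; the 2048-value length matches the ResNet50 average-pooling output used in the implementation below, and the random vectors are only placeholders:
Code Block
import numpy as np

# Two placeholder embeddings; in the demo these come from the truncated network
embedding_a = np.random.rand(2048).astype(np.float32)
embedding_b = np.random.rand(2048).astype(np.float32)

# Euclidean (L2) distance: the smaller the distance, the more similar the two faces
distance = np.linalg.norm(embedding_a - embedding_b)
print('L2 distance between the two faces:', distance)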
Creating Embeddings
The pre-trained network we are using gives us output values for 8631 classes, as it was trained on that number of classes.
...
As you can see, the information gets more detailed the deeper you go into the network. Each layer's inputs are composed of combinations of the previous layer's outputs. In the last layers (layer 170 of 176), you can see that the information has reached almost pixel-level detail.
...
With this truncated network, we can now create a library of embeddings of celebrity faces or faces of our known employees and compare them to new faces later.
Implementation
You can install the rcmalli model as described on their git. It will work the same way as the following method; however, you would have to work with a TensorFlow version < 1.15.3 and install Keras v2.2.4. We will continue with our updated version.
...
Code Block
from keras_vggface_TF.vggfaceTF import VGGFace
from keras_vggface_TF import utils

# Load the pre-trained ResNet50 backbone without the classification top; global average pooling yields the embedding vector
pretrained_model = VGGFace(model='resnet50', include_top=False, input_shape=(224, 224, 3), pooling='avg')  # pooling: None, avg or max
Gist File
Quantize your Model to int8
Tip:
If you are not planning to run your model on an embedded device, you can proceed to the Create a Database section. More details on quantization can be found in the section Quantize Your Deep Learning Model to Run on an NPU.
For our model, we are using the NPU of NXP's i.MX 8M Plus. The NPU requires the model to be a TFLite or PyTorch model that is fully quantized to int8.
...
For more details on quantization, on using different TensorFlow versions, or on which inputs and outputs are quantized, go to the section Quantize Your Deep Learning Model to Run on an NPU.
Create a Database
Note:
If you are looking to use this commercially, please read this article to learn how to tune a license-free dataset to perform better.
...
Code Block
# Remove images whose mean embedding value exceeds the threshold
lowembeddings = []
removelist = []
for idx in EMBEDDINGS_temp.index:
    if EMBEDDINGS_temp.MeanEmb[idx] > 100:
        print(EMBEDDINGS_temp.MeanEmb[idx])
        print('removing: ' + str(path_crops / EMBEDDINGS_temp.Name[idx] / EMBEDDINGS_temp.File[idx]))
        os.remove(str(path_crops / EMBEDDINGS_temp.Name[idx] / EMBEDDINGS_temp.File[idx]))  # remove the image file
        removelist.append(idx)  # remember the row so it can be dropped from the embeddings afterwards
        lowembeddings.append(str(path_crops / EMBEDDINGS_temp.Name[idx] / EMBEDDINGS_temp.File[idx]))

EMBEDDINGS_modified = EMBEDDINGS_temp.drop(removelist)  # remove the corresponding lines from the embeddings

# Write the cleaned-up database back to disk
filename_csv = path_crops / 'modified.csv'
filename_json = path_crops / 'modified.json'
EMBEDDINGS_modified.to_csv(Path(Path.cwd() / filename_csv), index=False)
EMBEDDINGS_modified.to_json(Path(Path.cwd() / filename_json))
print(time.time() - time1)
Prepare the Dataset and Create an "Only Faces Dataset"
The model expects 224x224 sized images. We are also looking for facial embeddings, so we first extract the faces from our dataset and resize them to 224x224. You can use any facial detection algorithm; a good example is MTCNN. As we will later use OpenCV for facial detection, we will also use an OpenCV variant with a Haar cascade classifier here. You can find and download the classifier here.
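As an illustration, the following sketch shows how such a face-extraction step could look with the OpenCV Haar cascade; the classifier file name and the 224x224 output size follow the description above, while all paths and variable names are only assumptions:
Code Block
import cv2

# Hypothetical paths; adjust them to where you stored the classifier and your images
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_alt.xml')

img = cv2.imread('dataset/some_celebrity/image_0001.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5)

for i, (x, y, w, h) in enumerate(faces):
    crop = img[y:y + h, x:x + w]                                           # cut out the detected face
    crop = cv2.resize(crop, (224, 224), interpolation=cv2.INTER_AREA)      # resize to the model input size
    cv2.imwrite('crops/image_0001_face{}.jpg'.format(i), crop)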
...
Now we have a dataset of cropped faces with dimensions of 224x224. This dataset can also be used as the representative_dataset() for the quantization method mentioned earlier.
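For reference, a representative_dataset() generator over this crops folder could look roughly like the sketch below; the folder path, sample count, and file pattern are assumptions, and the same pre-processing you use for inference should be applied before yielding, if required:
Code Block
import cv2
import numpy as np
from pathlib import Path

def representative_dataset():
    # Yield a few hundred face crops so the TFLite converter can calibrate the int8 value ranges
    for image_path in sorted(Path('crops').rglob('*.jpg'))[:300]:
        img = cv2.imread(str(image_path))
        img = cv2.resize(img, (224, 224))
        sample = np.expand_dims(img.astype(np.float32), axis=0)  # add batch dimension
        yield [sample]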
Create Embeddings of Each Image in Your Database
For the final part of the preparation, we need to create an embedding for each of the 10k facial images. We can use the quantized model we created earlier so that the embeddings are calculated with the same model we will use later for the live stream analysis.
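A condensed sketch of this step could look as follows: it runs the quantized TFLite interpreter over every cropped face and stores name, file, and embedding. The model file name, folder layout, and column names are assumptions chosen to mirror the snippets elsewhere in this guide:
Code Block
import cv2
import numpy as np
import pandas as pd
import tflite_runtime.interpreter as tflite
from pathlib import Path

interpreter = tflite.Interpreter(model_path='facenet_quantized.tflite')  # hypothetical file name
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

rows = []
for image_path in Path('crops').rglob('*.jpg'):
    img = cv2.imread(str(image_path))
    img = cv2.resize(img, (224, 224)).astype(np.float32)  # cast to int8 instead if the model expects int8 inputs
    sample = np.expand_dims(img, axis=0)                  # add batch dimension; apply preprocess_input() here as well
    interpreter.set_tensor(input_details[0]['index'], sample)
    interpreter.invoke()
    embedding = interpreter.get_tensor(output_details[0]['index'])[0]
    rows.append({'Name': image_path.parent.name, 'File': image_path.name, 'Embedding': embedding.tolist()})

embeddings = pd.DataFrame(rows)
embeddings.to_csv('embeddings.csv', index=False)
embeddings.to_json('embeddings.json')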
...
- Boot your device.
- Connect a monitor, mouse, and keyboard.
- Open a console and get your IP address with the Linux command:
Code Block
ip a
- Use scp (over ssh) to copy the data to your device. Go into the folder where your files are stored, then use the scp command to copy the files to the device; ip_address is the address you retrieved.
Code Block
host$ scp -r <folder_with_your_files> user@ip_address:./
This will copy the entire folder (due to -r) to the device's home directory.
Live Stream Analysis
...
- Load the embeddings from JSON
- Load the model
- Load the cascade classifier
- Define the pre-processing function
- Split your data
- Set the video pipeline
Load the Embeddings from JSON
We use pandas to read the CSV on the PC; however, pandas is more difficult to integrate via Yocto Linux on the embedded system, while the JSON library is already available:
Code Block
import json

# Read the embeddings database and rebuild the embedding, name, and file lists
f = open((embeddingpath + embeddingsfile), 'r')
ImportedData = json.load(f)
dataE = [np.array(ImportedData['Embedding'][str(i)]) for i in range(len(ImportedData['Name']))]
dataN = [np.array(ImportedData['Name'][str(i)]) for i in range(len(ImportedData['Name']))]
dataF = [np.array(ImportedData['File'][str(i)]) for i in range(len(ImportedData['Name']))]
Read JSON
Load the Model
Code Block
try:
    interpreter = tf.lite.Interpreter(model_path)
except ValueError as e:
    print("Error: Modelfile could not be found. Check if you are in the correct workdirectory. Errormessage: " + str(e))
    # Depending on the version of TF running, check where lite is set:
    if tf.__version__.startswith('1.'):
        print('lite in dir(tf.contrib)? ' + str('lite' in dir(tf.contrib)))
    elif tf.__version__.startswith('2.'):
        print('lite in dir(tf)? ' + str('lite' in dir(tf)))

interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
Load the Model
Point to the Cascade Classifier
Code Block
#face_cascade = cv2.CascadeClassifier(cascaderpath + 'haarcascade_frontalface_alt.xml')
face_cascade = cv2.CascadeClassifier(cascaderpath + 'lbpcascade_frontalface_improved.xml')
...
We use the local binary pattern (LBP) classifier. If you stay with OpenCV, you can also choose the Haar classifier, which gives better detection performance but demands more resources. For an embedded device, we recommend the LBP version; however, an optimized Haar cascade can also run smoothly on our system.
Set Pre-Processing Function
The pre-processing function used by rcmalli performs channel-wise mean subtraction (mean centering). On your PC, you can simply import this function. However, if you are planning to run this on an embedded device, we recommend writing the function out so that there is less trouble including it in your board support package (BSP).
...
Code Block
def preprocess_input(x, data_format):  # Choose the same version as in "2-Create embeddings database.py or jupyter"
    x_temp = np.copy(x)
    if data_format is None:
        data_format = tf.keras.backend.image_data_format()
    assert data_format in {'channels_last', 'channels_first'}

    if data_format == 'channels_first':
        x_temp = x_temp[:, ::-1, ...]
        x_temp[:, 0, :, :] -= 91.4953
        x_temp[:, 1, :, :] -= 103.8827
        x_temp[:, 2, :, :] -= 131.0912
    else:
        x_temp = x_temp[..., ::-1]
        x_temp[..., 0] -= 91.4953
        x_temp[..., 1] -= 103.8827
        x_temp[..., 2] -= 131.0912

    return x_temp
Pre-processing
Split Your Data
To speed up the comparison of our embedding against the celebrity embeddings, we can split the data into several chunks and compare them in parallel threads. The results from those threads then have to be collected.
Code Block
def splitDataFrameIntoSmaller(df, chunkSize):
    listOfDf = list()
    numberChunks = len(df) // chunkSize + 1
    for i in range(numberChunks):
        listOfDf.append(df[i*chunkSize:(i+1)*chunkSize])
    return listOfDf

def faceembedding(YourFace, CelebDaten):
    Dist = []
    for i in range(len(CelebDaten.File)):
        Celebs = np.array(CelebDaten.Embedding[i])
        Dist.append(np.linalg.norm(YourFace - Celebs))
    return Dist

def faceembeddingNP(YourFace, CelebDaten):
    Dist = []
    for i in range(len(CelebDaten)):
        Celebs = np.array(CelebDaten[i])
        Dist.append(np.linalg.norm(YourFace - Celebs))
    return Dist

# Split data for threading
# -----------------------------------------------------------------------------
celeb_embeddings = splitDataFrameIntoSmaller(dataE, int(np.ceil(len(dataE)/4)))
Split Embeddings
Setting the Video Pipeline
If you are not on an embedded system, you can read in your webcam stream via OpenCV.
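On a PC, this can be as simple as the following sketch (camera index 0 is an assumption):
Code Block
import cv2

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) == 27:  # Esc quits
        break
cap.release()
cv2.destroyAllWindows()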
...
Code Block
videodev = 'video0'
buildinfo = cv2.getBuildInformation()
if buildinfo.find("GStreamer") < 0:
    print('no GStreamer support in OpenCV')
    exit(0)

# This snippet sits inside a setup function in the full script, hence the early return
path = os.path.join('/sys/bus/i2c/devices', '2-0010', 'driver')
if not os.path.exists(path):
    return None

cmd = 'media-ctl -V "31:0[fmt:SGRBG8_1X8/1280x800 (4,4)/1280x800]"'
ret = subprocess.call(cmd, shell=True)
cmd = 'media-ctl -V "22:0[fmt:SGRBG8_1X8/1280x800]"'
ret = subprocess.call(cmd, shell=True)
cmd = 'v4l2-ctl -d0 -v width=1280,height=800,pixelformat=GRBG'  # set size and format
ret = subprocess.call(cmd, shell=True)
cmd = 'v4l2-ctl -d0 -c vertical_flip=1'  # if the image is flipped
ret = subprocess.call(cmd, shell=True)
cmd = 'v4l2-ctl -c horizontal_blanking=2500'
ret = subprocess.call(cmd, shell=True)
cmd = 'v4l2-ctl -c digital_gain_red=1400'  # color corrections, if needed
ret = subprocess.call(cmd, shell=True)
cmd = 'v4l2-ctl -c digital_gain_blue=1700'  # color corrections, if needed
ret = subprocess.call(cmd, shell=True)

pipeline = 'v4l2src device=/dev/{video} ! appsink'.format(video=videodev)
Gist File1
Start the Live Stream
Call the Video Pipeline
We can now call the video pipeline with OpenCV and have a constant video stream. As the video stream from the MIPI camera is in the Bayer format, it has to be converted to RGB. You can also use a GStreamer pipeline for this; however, we have seen that OpenCV is much faster.
Code Block
cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)

while(True):
    # CAPTURE FRAME BY FRAME
    ret, frame = cap.read()
    frame = cv2.cvtColor(frame, cv2.COLOR_BAYER_GB2RGB)
    cv2.namedWindow('frame', cv2.WND_PROP_FULLSCREEN)
    cv2.setWindowProperty('frame', cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN)
    cv2.imshow('frame', frame)
Get Live Stream
Find Faces in the Live Stream
Each frame is then analyzed for faces:
Code Block
# DETECT FACE IN VIDEO CONTINUOUSLY
faces_detected = face_cascade.detectMultiScale(frame, scaleFactor=1.2, minNeighbors=5)  #, Size(50,50))
for (x, y, w, h) in faces_detected:
    # p is a pixel padding around the detected face, defined earlier in the full script
    rechteck = cv2.rectangle(frame, (x-p, y-p+2), (x+w+p, y+h+p+2), (0, 255, 0), 2)
    #rechteck = cv2.rectangle(frame, (x-p, y-p+2), (x+int(np.ceil(height))+p, y+int(np.ceil(height))+p+2), (0, 0, 100), 2)
    cv2.imshow('frame', rechteck)
Analyzed Faces
After Button Press
Finding and Cropping the Most Centered Face
As soon as a button is pressed, we find the most centered face and crop it to 224x224:
Code Block
# DETECT KEY INPUT - ESC OR FIND MOST CENTERED FACE
key = cv2.waitKey(1)
if key == 27:  # Esc key
    cap.release()
    cv2.destroyAllWindows()
    break
if key == 32:  # Space key
    mittleres_Gesicht_X = ()
    mitte = ()
    if len(faces_detected) != 0:  # only if the cascader detected a face, otherwise error
        start1 = time()
        # FIND MOST MIDDLE FACE
        for (x, y, w, h) in faces_detected:
            mitte = np.append(mitte, (x + w/2))
        mittleres_Gesicht_X = (np.abs(mitte - framemitte)).argmin()
        print('detect middle face ', time()-start1)
        # FRAME THE DETECTED FACE
        start2 = time()
        #print(faces_detected[mittleres_Gesicht_X])
        (x, y, w, h) = faces_detected[mittleres_Gesicht_X]
        img = frame[y-p+2:y+h+p-2, x-p+2:x+w+p-2]  # use only the detected face; crop it +2 to remove the frame

        # CHECK IF IMAGE EMPTY (OUT OF IMAGE = EMPTY)
        if len(img) != 0:  # if the face is out of the frame, img=[] would throw an error
            print('detect face ', time()-start2)

            # CROP IMAGE
            start3 = time()
            if img.shape > (width, height):  # downsampling
                img_small = cv2.resize(img, (width, height), interpolation=cv2.INTER_AREA)   # resize to the desired dimensions
            elif img.shape < (width, height):  # upsampling
                img_small = cv2.resize(img, (width, height), interpolation=cv2.INTER_CUBIC)  # resize to the desired dimensions
            cv2.imshow('frame', img_small)
            cv2.waitKey(1)  # hit any key
            end3 = time()
            print('face crop', end3-start3)
Face Crop
Further Pre-Processing of the Found Face
After we crop the face, we have to apply the same pre-processing that was used on the training data and then feed the result to the model to create the embedding:
Code Block
# IMAGE PREPROCESSING
start4 = time()
if inputtype == 'int':
    samples = np.expand_dims(img_small, axis=0)
    samples = preprocess_input(samples, data_format=None, version=3).astype('int8')  # data_format = None, 'channels_last', 'channels_first'. If None, it is determined automatically from the backend
else:
    pixels = img_small.astype('float32')
    samples = np.expand_dims(pixels, axis=0)
    samples = preprocess_input(samples, data_format=None, version=2)  # data_format = None, 'channels_last', 'channels_first'. If None, it is determined automatically from the backend
# now using the tflite model
print('preprocess data for model', time()-start4)

# CREATE FACE EMBEDDINGS
if Loadtype == 'armNN':
    prep = time()
    input_tensors = ann.make_input_tensors([input_binding_info], [samples])
    # Get output binding information for an output layer by using the layer name.
    output_binding_info = parser.GetNetworkOutputBindingInfo(0, 'model/output')
    output_tensors = ann.make_output_tensors([output_binding_info])
    runtime.EnqueueWorkload(0, input_tensors, output_tensors)
    print('ANN preparation ', time()-prep)
    start42 = time()
    EMBEDDINGS = ann.workload_tensors_to_ndarray(output_tensors)
elif Loadtype == 'TL':
    prep = time()
    input_shape = input_details[0]['shape']
    input_data = samples
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()
    print('ANN preparation ', time()-prep)
    start42 = time()
    EMBEDDINGS = interpreter.get_tensor(output_details[0]['index'])
print('create face embeddings', time()-start42)
Preprocess 2
Compare Embeddings
With the resulting embedding, we can now compare it to the existing celebrity embeddings. We do this using threading, which uses all four cores of our device and speeds up the process. This is one of the most demanding tasks in this demo.
Code Block
# READ CELEB EMBEDDINGS AND COMPARE
start_EU = time()
EuDist = []
# One worker per chunk keeps all four cores busy
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    ergebniss_1 = executor.submit(faceembeddingNP, EMBEDDINGS, np.array(celeb_embeddings[0]))
    ergebniss_2 = executor.submit(faceembeddingNP, EMBEDDINGS, np.array(celeb_embeddings[1]))
    ergebniss_3 = executor.submit(faceembeddingNP, EMBEDDINGS, np.array(celeb_embeddings[2]))
    ergebniss_4 = executor.submit(faceembeddingNP, EMBEDDINGS, np.array(celeb_embeddings[3]))

    # result() blocks until the corresponding chunk has been compared
    EuDist.extend(ergebniss_1.result())
    EuDist.extend(ergebniss_2.result())
    EuDist.extend(ergebniss_3.result())
    EuDist.extend(ergebniss_4.result())
print('Create_EuDist', time()-start_EU)

start_Min = time()
idx = np.argpartition(EuDist, 5)  # idx[:5] hold the five closest matches (not sorted)
folder_idx = dataN[idx[0]]
image_idx = dataF[idx[0]]
print('find minimum for facematch', time()-start_Min)
Get Minimum
Plot Results
Finally, we stitch our face together with the best matches and plot it. You could also implement a GUI here. For simplicity and better understanding, we have done a more "raw" version:
...
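A bare-bones version of this stitching could look like the sketch below; the variable names (img_small, folder_idx, image_idx, path_crops, width, height) follow the snippets above, and loading the matched image from the crops folder is an assumption:
Code Block
import cv2
import numpy as np
from pathlib import Path

# Load the best-matching celebrity crop found above and show it next to the captured face
match_img = cv2.imread(str(Path(path_crops) / str(folder_idx) / str(image_idx)))
match_img = cv2.resize(match_img, (width, height))

stitched = np.hstack((img_small, match_img))  # your face on the left, the best match on the right
cv2.imshow('frame', stitched)
cv2.waitKey(0)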
This finishes the demo description. On the platform itself, this demo is implemented with a GUI and the use of object-oriented programming. However, the basics are exactly the same.
Porting to Embedded Hardware
If you use a phyBOARD-Pollux kit, the needed software from NXP is already included in the BSP. NXP created eIQ, which facilitates the connection between the onboard NPU and the peripheral components. At its core, this is done with a tuned Google NNAPI, which is capable of understanding TensorFlow Lite and PyTorch models. Therefore, after converting your model to TFLite and quantizing it, eIQ takes over.
If you have created your own AI model, you only have to copy the model and the other files required by your application onto the board. Here we suggest using the ssh protocol.
If you want to include your AI application or specific libraries into our BSP using Yocto Linux, that is of course also possible.
...
Quantize Your Deep Learning Model to Run on an NPU
In this section, we explain which steps you have to take to transform and quantize your model with different TensorFlow versions. We are only looking into post-training quantization. The phyBOARD-Pollux incorporates the i.MX 8M Plus, which features a dedicated neural network accelerator IP from VeriSilicon (Vivante VIP8000).
As the neural processing unit (NPU) from NXP needs a fully int8-quantized model, we have to look into the full int8 quantization of a TensorFlow Lite or PyTorch model. Both libraries are supported by the eIQ library from NXP. This manual only covers the TensorFlow variant. A general overview of how to do the post-training quantization can be found on the TensorFlow website.
Why does the NPU utilize int8 when most ANNs are trained in float32?
Floating-point operations are more complex than integer operations (arithmetic, overflow handling, etc.). Quantizing to int8 therefore allows the NPU to use much simpler and smaller arithmetic units instead of the larger floating-point units; a short numeric example of the int8 mapping follows the list below.
...
- Lower power consumption
- Less heat development
- More calculation units fit on the chip and can work in parallel, which decreases inference time
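To make the int8 mapping concrete, here is a small numeric sketch of the affine quantization scheme used by TFLite (float ≈ scale * (int8 - zero_point)); the scale and zero-point values are made up for the example:
Code Block
import numpy as np

scale, zero_point = 0.05, -3  # example quantization parameters
x = np.array([-1.2, 0.0, 0.7, 5.9], dtype=np.float32)

# Quantize: float32 -> int8
q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)

# Dequantize: int8 -> approximate float32
x_hat = scale * (q.astype(np.float32) - zero_point)
print(q)      # [-27  -3  11 115]
print(x_hat)  # values close to x, within the quantization error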
Post-training Quantization with TensorFlow Version 2.x
Tip:
Before you begin, make sure that you have met all of the Prerequisites listed at the beginning of this guide.
After you have created and trained a model via tf.keras, there are three possible ways of quantizing the model.
Method One - Directly Quantizing a Trained Model
The trained TensorFlow model has to be converted into a TFLite model and can be directly quantized as described in the following code block. For the trained model, we explicitly used the updated tf.keras_vggface model based on the work of rcmalli. The transformation starts at line 28.
...
However, if we do not set the inference_input_type and inference_output_type, our model changes to:
...
The effect is that you can determine which input data type the model accepts and returns. This can be important if you work with an embedded camera, such as the one included with your phyBOARD-Pollux AI kit. The MIPI camera returns 8-bit values, so if you want to spare a conversion to float32, an int8 input can be useful. Be aware that if you use a model without prediction layers to obtain embeddings, an int8 output will result in very poor performance. We recommend an output of float32. This shows that each problem needs a specific solution.
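For reference, the relevant converter settings in recent TF 2.x versions could look like this sketch; the model variable, the representative_dataset function, and the output file name are assumptions carried over from the earlier examples:
Code Block
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(pretrained_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset  # calibration data, see above
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

# int8 camera frames in, float32 embeddings out (as recommended above)
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.float32

tflite_model = converter.convert()
open('facenet_quantized.tflite', 'wb').write(tflite_model)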
Method Two and Three - Quantize a Saved Model from *.h5 or *.pb Files
If you already have your model, you most likely have it saved somewhere either as a Keras .h5 file or a TensorFlow protocol buffer .pb. We can quickly save our model using TF2.3:
...
The conversion and quantization are very similar to Method One. The only difference is how we load the model into the converter. Either load the model (see code block below) or continue as in Method One.
Code Block
# Load the h5 model
pretrained_model = tf.keras.models.load_model('my_model.h5')
# Then use the converter as before
converter = tf.lite.TFLiteConverter.from_keras_model(pretrained_model)
...
Code Block
# Or load the pb file from the model folder
# TensorFlow version > 2.x
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)  # the folder contains the pb file and an assets and variables folder
Converting from .pb
Converting with TensorFlow Versions Below 2.0
It is possible to convert a model written in TensorFlow version < 1.15.3 using Keras. However, not all options are available for TFLite conversion and quantization. The best way is to save the model with the TensorFlow version it was created in (for example, rcmalli keras-vggface was trained in TF 1.13.2). We suggest not using the "saving and freeze graph" method to create a .pb file, as the .pb files differ between TF1 and TF2; TFLiteConverter.from_saved_model then does not work, creating several problems for quantization.
A better approach is to follow Method Two or Three using Keras:
Code Block
import keras
...
pretrained_model.save('my_model.h5')
Then, convert and quantize your model with a TensorFlow version of 1.15.3 or newer, which has many functions that were added in preparation for TF2. We suggest using the latest version, which will result in the same models that were presented earlier.
Next Steps
PHYTEC provides several guides when it comes to customizing your software:
...
All PHYTEC manuals, as well as other information, can be found at https://www.phytec.de/produkte/development-kits/phyboard-pollux-ki-kit/#downloads/
Further Reading
How to Perform Face Recognition With VGGFace2 in Keras - Jason Brownlee
...
Tutorials from the TensorFlow website.
Revision History
Date | Version Numbers | Changes in this Manual |
---|---|---|
26.11.2020 | Manual L-1015e.A0 | Preliminary Edition |
29.08.2022 | Manual L-1015e.A1 | Upgraded to Regular Version |
...