
PGI compiler TIPS
Do AMD64 and EM64T Require Different Internal Optimizations?

AMD64 and EM64T CPUs: Compatible, but Optimized Differently

(November 19, 2005)

Intel's EM64T processors are compatible with AMD's AMD64 processors (Opteron, Athlon64) at the CPU instruction level, thanks to the AMD64 ABI. Their microarchitectures differ, however, so the appropriate optimization techniques (particularly around memory access) differ as well, and code that is not optimized for the characteristics of the CPU it runs on can lose performance. The PGI compiler is a commercial compiler that can optimize for the respective microarchitectures of AMD64 and EM64T processors. A commercial compiler needs to optimize performance, not merely provide compatibility, yet many other compilers make no clear claim on this point.

To get the true performance of a given processor, it is not enough for an executable to be in AMD64 format; binaries built for EM64T and binaries built for AMD64 have to be distinguished and used accordingly. That is clearly a heavy burden for ISV developers and for the users of their software. To solve this problem, PGI version 6.1, released in January 2006, introduced "PGI Unified Binary support" across AMD64 and EM64T, an industry first. Unified Binary(TM) is a feature that generates a single executable optimized so that it incurs no performance penalty on either platform. Because program performance carries over transparently to both platforms, the cost spent on optimization during development is reduced.
This article uses the PGI 6.0 compiler to demonstrate that binaries built for AMD64 and binaries built for EM64T perform differently, and to show that PGI optimizes appropriately for each processor.

Beyond optimizing for the processor characteristics of AMD64 and EM64T, the PGI compiler can also optimize for AMD's NUMA architecture as well as for Intel's conventional UMA architecture.

Related links


  See also: PGI compiler speed is also demonstrated on an Intel(R) dual-core Pentium(R) D processor system
            Evaluating an AMD dual-core Athlon64 X2 processor system with the PGI compiler


Verification Using the Cross-Compilation Feature of PGI 6.0


By default, the compiler optimizes for the CPU type of the system on which the compilation is performed. For example, a module compiled on an Opteron (AMD64) is optimized for AMD64 CPUs. Running that module on an AMD64-compatible Intel EM64T CPU (Pentium, Xeon) and measuring how much performance is affected shows whether CPU-dependent optimization has been applied. The same holds in the opposite direction.
The example below illustrates the performance difference in such cases, and shows that the PGI compiler optimizes for each CPU. First, the compilation methods:

[Default: no -tp option]

    pgf95 -fastsse -Minfo xx.f    (the CPU of the compiling machine is the default optimization target)

[Cross-compiling: optimizing for a different CPU]

    pgf95 -fastsse -Minfo -tp p7-64 xx.f    (the -tp option generates code optimized for Intel EM64T)
  • Compiling on AMD64 (Opteron) with -tp p7-64 generates code optimized for EM64T. Conversely, compiling on a Pentium(R) 4/Xeon(R) EM64T system with -tp k8-64 generates code for AMD64.
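
The three build variants described here can be written down as a small script. The following Python sketch is illustrative only: the function name pgf95_command and the TARGET_FLAGS table are hypothetical helpers, and the only compiler flags used are the ones shown in this article (-fastsse, -Minfo, -tp p7-64, -tp k8-64). It assembles the command lines and prints them; it does not invoke the compiler.

```python
# Hypothetical helper: assembles the pgf95 command lines used in this article.
# Only flags that appear in the text are used; nothing is executed.

# Descriptive target name -> value passed to -tp (both values are from the article).
TARGET_FLAGS = {
    "amd64": "k8-64",   # AMD64 (Opteron, Athlon64)
    "em64t": "p7-64",   # Intel EM64T (Pentium 4, Xeon)
}


def pgf95_command(source, target=None):
    """Return the argument list for compiling `source`.

    With target=None no -tp option is added, so the compiler falls back to
    the CPU type of the machine doing the compilation.
    """
    cmd = ["pgf95", "-fastsse", "-Minfo"]
    if target is not None:
        cmd += ["-tp", TARGET_FLAGS[target]]
    cmd.append(source)
    return cmd


if __name__ == "__main__":
    for target in (None, "em64t", "amd64"):
        print(" ".join(pgf95_command("xx.f", target)))
```

Keeping the target as an explicit parameter makes it harder to forget which CPU a given binary was tuned for, which is the distinction this article is about.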

On each of the two machines below, we build an executable for the CPU type that differs from the machine's own and compare execution times. The test program is the Himeno benchmark, which demands full memory bandwidth, and we use it to check how much performance differs.

(1) AMD64 machine: Athlon64 X2 (2.2 GHz) + slower PC3200 memory
(2) EM64T machine: Pentium D (2.8 GHz) + faster DDR2-667 (PC2-5300) memory

 Both machines run SUSE 10.0 (kernel 2.6.13).

(1) Verification on the AMD64 machine

    Running EM64T-optimized code on the AMD64 machine, instead of code optimized for that machine, degrades performance slightly (from 1306.499 to 1223.252 MFLOPS, about 6%).

[Generate and run AMD64 code]

amd64 $ pgf95 -fastsse -O3 -Mprefetch=distance:8,nta -Minfo himenoBMTxp_s.f
amd64 $ ./a.out
  mimax=          129  mjmax=           65  mkmax=           65
  imax=          128  jmax=           64  kmax=           64
Loop executed for 4500 times Gosa : 2.3161024E-06 MFLOPS: 1306.499 time(s): 56.72000

[Generate and run EM64T code]

amd64 $ pgf95 -fastsse -O3 -Mprefetch=distance:8,nta -Minfo -tp p7-64 himenoBMTxp_s.f
amd64 $ ./a.out
  mimax=          129  mjmax=           65  mkmax=           65
  imax=          128  jmax=           64  kmax=           64
Loop executed for 4500 times Gosa : 2.3161024E-06 MFLOPS: 1223.252 time(s): 60.58000

(2) Verification on the EM64T machine

    Running AMD64-optimized code on the EM64T machine, instead of code optimized for that machine, degrades performance (from 1523.220 to 1343.692 MFLOPS, about 12%).

[Generate and run EM64T code]

em64t $ pgf95 -fastsse -O3 -Mprefetch=distance:8,nta -Minfo himenoBMTxp_s.f
em64t $ ./a.out
  mimax=          129  mjmax=           65  mkmax=           65
  imax=          128  jmax=           64  kmax=           64
Loop executed for 4500 times Gosa : 2.3161024E-06 MFLOPS: 1523.220 time(s): 48.65000

[Generate and run AMD64 code]

em64t $ pgf95 -fastsse -O3 -Mprefetch=distance:8,nta -Minfo -tp k8-64 himenoBMTxp_s.f
em64t $ ./a.out
  mimax=          129  mjmax=           65  mkmax=           65
  imax=          128  jmax=           64  kmax=           64
Loop executed for 4500 times Gosa : 2.3161024E-06 MFLOPS: 1343.692 time(s): 55.15000
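
To put the four runs side by side, the relative loss from running non-native code can be computed directly from the reported MFLOPS. The Python sketch below is only an illustration: the RESULTS table and the function name mflops_loss_percent are hypothetical, and the figures are copied from the benchmark output in this article.

```python
# Figures copied from the benchmark output in this article:
# (machine, code target) -> (MFLOPS, elapsed seconds)
RESULTS = {
    ("amd64", "amd64"): (1306.499, 56.72),
    ("amd64", "em64t"): (1223.252, 60.58),
    ("em64t", "em64t"): (1523.220, 48.65),
    ("em64t", "amd64"): (1343.692, 55.15),
}


def mflops_loss_percent(machine):
    """Percentage of MFLOPS lost on `machine` when it runs code built for the other CPU."""
    other = "em64t" if machine == "amd64" else "amd64"
    native = RESULTS[(machine, machine)][0]
    cross = RESULTS[(machine, other)][0]
    return (native - cross) / native * 100.0


if __name__ == "__main__":
    for machine in ("amd64", "em64t"):
        loss = mflops_loss_percent(machine)
        print(f"{machine}: {loss:.1f}% lower MFLOPS with non-native code")
```

With these figures the loss is about 6.4% on the AMD64 machine and about 11.8% on the EM64T machine. Each comparison stays within one machine, so the difference in memory speed between the two systems does not enter into it.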

The listing below is an excerpt of the assembler output produced when optimizing for each CPU. The comparison shows that the optimization takes each CPU's characteristics into account, including its memory architecture.

            [AMD64 generated code]                                   [EM64T generated code]
# lineno: 206                                              # lineno: 206

   prefetcht0      256(%rdi,%rcx)                   |         movups  516(%rsi,%rdx), %xmm2
   prefetcht0      2180356(%rdi,%rcx)               |         movups  -524(%rsi,%rdx), %xmm3
   prefetcht0      4360456(%rdi,%rcx)               |         movups  2180100(%rdi,%rdx), %xmm4
   prefetcht0      8720656(%rdi,%rcx)               |         movups  512(%rsi,%rdx), %xmm5
   prefetcht0      764(%rsi,%rcx)                   <
   prefetcht0      -268(%rsi,%rcx)                  <
   prefetcht0      10900756(%rdi,%rcx)              <
   prefetcht0      -32772(%rsi,%rcx)                <
   movlps  516(%rsi,%rcx), %xmm2                    <
   movlps  -524(%rsi,%rcx), %xmm3                   <
   movlps  2180100(%rdi,%rcx), %xmm4                <
   movlps  512(%rsi,%rcx), %xmm5                    <
   subl    $8, %eax                                           subl    $8, %eax
   movhps  524(%rsi,%rcx), %xmm2                    |         subps   -516(%rsi,%rdx), %xmm2
   movhps  -516(%rsi,%rcx), %xmm3                   <
   movhps  2180108(%rdi,%rcx), %xmm4                <
   movhps  520(%rsi,%rcx), %xmm5                    <
   subps   -516(%rsi,%rcx), %xmm2                   <
   mulps   %xmm4, %xmm5                                       mulps   %xmm4, %xmm5
   subps   508(%rsi,%rcx), %xmm2                    |         subps   508(%rsi,%rdx), %xmm2
   movlps  4360200(%rdi,%rcx), %xmm4                |         movups  4360200(%rdi,%rdx), %xmm4
   addps   %xmm2, %xmm3                                       addps   %xmm2, %xmm3
   movhps  4360208(%rdi,%rcx), %xmm4                |         movups  (%rsi,%rdx), %xmm2
   movlps  (%rsi,%rcx), %xmm2                       |         mulps   8720400(%rdi,%rdx), %xmm3
   mulps   8720400(%rdi,%rcx), %xmm3                |         mulps   (%rdi,%rdx), %xmm2
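
One way to summarize the listing is to tally the instruction mnemonics in each column. The Python sketch below hard-codes the mnemonics transcribed from the excerpt above, with operands omitted; the variable names are illustrative. It counts only what is visible in this excerpt, not the complete loop.

```python
from collections import Counter

# Mnemonics transcribed, in order, from the two columns of the excerpt
# (operands dropped). This covers only the lines shown in the article.
AMD64 = (
    ["prefetcht0"] * 8
    + ["movlps"] * 4
    + ["subl"]
    + ["movhps"] * 4
    + ["subps", "mulps", "subps", "movlps", "addps", "movhps", "movlps", "mulps"]
)
EM64T = (
    ["movups"] * 4
    + ["subl", "subps", "mulps", "subps", "movups", "addps", "movups", "mulps", "mulps"]
)

amd64_counts = Counter(AMD64)
em64t_counts = Counter(EM64T)

if __name__ == "__main__":
    # Print one row per mnemonic: name, AMD64 count, EM64T count.
    for name in sorted(set(AMD64) | set(EM64T)):
        print(f"{name:<12}{amd64_counts[name]:>4}{em64t_counts[name]:>4}")
```

In this excerpt the AMD64 code issues eight explicit prefetcht0 instructions and loads each 128-bit operand as a movlps/movhps pair, while the EM64T code uses single movups loads and shows no software prefetch.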



SofTek is an authorized distributor of PGI products.

Copyright 2006 SofTek Systems Inc. All Rights Reserved.