
Hudi Data Lake Technology Leading the New Big Data Wave (3): Resolving Spark Module Dependency Conflicts

Maynor996



  • 2023-07-26

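Resolving Spark module dependency conflicts

The Hive version was changed to 3.1.2 earlier. Hive 3.1.2 brings in Jetty 9.3.x, while Hudi itself uses Jetty 9.4.x, so the two conflict.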
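To see the clash rather than take it on faith, Maven's `dependency:tree` goal can list the Jetty artifacts a module resolves. The sketch below is a hypothetical helper (`jetty_versions` is not part of Hudi or Maven) that reads that output on stdin and prints each distinct Jetty version. The module path and profile flags in the usage comment are assumptions taken from the build command used later in this article.

```shell
# Print the distinct Jetty versions found in `mvn dependency:tree` output.
# Hypothetical helper: reads the tree on stdin, where each artifact appears as
# groupId:artifactId:type:version:scope, so the version is the second-to-last field.
jetty_versions() {
  grep -o 'org\.eclipse\.jetty:[^ ]*' | awk -F: '{ print $(NF-1) }' | sort -u
}

# Usage (assumed invocation, run from the Hudi source root; a prior
# `mvn install` of the sibling modules may be needed for -pl to resolve):
#   mvn dependency:tree -Dincludes=org.eclipse.jetty \
#       -pl packaging/hudi-spark-bundle -Dspark3.2 -Dscala-2.12 | jetty_versions
```

If more than one version is printed, that suggests the bundle would mix Jetty artifacts from different releases.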

1) Edit the pom file of hudi-spark-bundle to exclude the older Jetty and add the Jetty version specified by Hudi:

vim /opt/software/hudi-0.12.0/packaging/hudi-spark-bundle/pom.xml

Around line 382, edit the Hive dependencies and append the Jetty dependencies as follows:

  <!-- Hive -->
  <dependency>
    <groupId>${hive.groupid}</groupId>
    <artifactId>hive-service</artifactId>
    <version>${hive.version}</version>
    <scope>${spark.bundle.hive.scope}</scope>
    <exclusions>
      <exclusion>
        <artifactId>guava</artifactId>
        <groupId>com.google.guava</groupId>
      </exclusion>
      <exclusion>
        <groupId>org.eclipse.jetty</groupId>
        <artifactId>*</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.pentaho</groupId>
        <artifactId>*</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

  <dependency>
    <groupId>${hive.groupid}</groupId>
    <artifactId>hive-service-rpc</artifactId>
    <version>${hive.version}</version>
    <scope>${spark.bundle.hive.scope}</scope>
  </dependency>

  <dependency>
    <groupId>${hive.groupid}</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>${hive.version}</version>
    <scope>${spark.bundle.hive.scope}</scope>
    <exclusions>
      <exclusion>
        <groupId>javax.servlet</groupId>
        <artifactId>*</artifactId>
      </exclusion>
      <exclusion>
        <groupId>javax.servlet.jsp</groupId>
        <artifactId>*</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.eclipse.jetty</groupId>
        <artifactId>*</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

  <dependency>
    <groupId>${hive.groupid}</groupId>
    <artifactId>hive-metastore</artifactId>
    <version>${hive.version}</version>
    <scope>${spark.bundle.hive.scope}</scope>
    <exclusions>
      <exclusion>
        <groupId>javax.servlet</groupId>
        <artifactId>*</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.datanucleus</groupId>
        <artifactId>datanucleus-core</artifactId>
      </exclusion>
      <exclusion>
        <groupId>javax.servlet.jsp</groupId>
        <artifactId>*</artifactId>
      </exclusion>
      <exclusion>
        <artifactId>guava</artifactId>
        <groupId>com.google.guava</groupId>
      </exclusion>
    </exclusions>
  </dependency>

  <dependency>
    <groupId>${hive.groupid}</groupId>
    <artifactId>hive-common</artifactId>
    <version>${hive.version}</version>
    <scope>${spark.bundle.hive.scope}</scope>
    <exclusions>
      <exclusion>
        <groupId>org.eclipse.jetty.orbit</groupId>
        <artifactId>javax.servlet</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.eclipse.jetty</groupId>
        <artifactId>*</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

  <!-- Add the Jetty version configured by Hudi -->
  <dependency>
    <groupId>org.eclipse.jetty</groupId>
    <artifactId>jetty-server</artifactId>
    <version>${jetty.version}</version>
  </dependency>
  <dependency>
    <groupId>org.eclipse.jetty</groupId>
    <artifactId>jetty-util</artifactId>
    <version>${jetty.version}</version>
  </dependency>
  <dependency>
    <groupId>org.eclipse.jetty</groupId>
    <artifactId>jetty-webapp</artifactId>
    <version>${jetty.version}</version>
  </dependency>
  <dependency>
    <groupId>org.eclipse.jetty</groupId>
    <artifactId>jetty-http</artifactId>
    <version>${jetty.version}</version>
  </dependency>

Otherwise, inserting data into a Hudi table with Spark fails with the following error:

java.lang.NoSuchMethodError: org.apache.hudi.org.apache.jetty.server.session.SessionHandler.setHttpOnly(Z)V
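Because a missed exclusion only shows up at runtime as the error above, it can help to check the edited pom before rebuilding. The following is a rough sketch, not an official Hudi tool: `jetty_exclusion_report` is a hypothetical helper that scans a pom line by line with awk (it assumes one XML element per line, as in the snippet above) and reports, for each `<dependency>`, whether it carries an `org.eclipse.jetty` exclusion.

```shell
# Report, per <dependency> in a pom file, whether it excludes org.eclipse.jetty.
# Hypothetical helper; $1 is the path to the pom. It relies on line-oriented
# matching, so it assumes each XML element sits on its own line.
jetty_exclusion_report() {
  awk '
    # A new dependency starts: reset the per-dependency state.
    /<dependency>/   { in_dep = 1; name = ""; excl = 0; in_excl = 0 }
    /<exclusions>/   { in_excl = 1 }
    /<\/exclusions>/ { in_excl = 0 }
    # The artifactId outside <exclusions> names the dependency itself.
    in_dep && !in_excl && /<artifactId>/ {
      name = $0
      gsub(/.*<artifactId>|<\/artifactId>.*/, "", name)
    }
    # A Jetty groupId inside <exclusions> means Jetty is excluded.
    in_dep && in_excl && /<groupId>org\.eclipse\.jetty<\/groupId>/ { excl = 1 }
    /<\/dependency>/ {
      if (in_dep) print name, (excl ? "excludes-jetty" : "no-jetty-exclusion")
      in_dep = 0
    }
  ' "$1"
}

# Usage (path taken from this article):
#   jetty_exclusion_report /opt/software/hudi-0.12.0/packaging/hudi-spark-bundle/pom.xml
```

With the edit above applied, hive-service, hive-jdbc, and hive-common should be reported as excluding Jetty. The explicitly added jetty-* dependencies are listed as having no exclusion, which is expected.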


2) Edit the pom file of hudi-utilities-bundle to exclude the older Jetty and add the Jetty version specified by Hudi:

vim /opt/software/hudi-0.12.0/packaging/hudi-utilities-bundle/pom.xml

Around line 405, edit the Hoodie and Hive dependencies and append the Jetty dependencies as follows:

  <!-- Hoodie -->
  <dependency>
    <groupId>org.apache.hudi</groupId>
    <artifactId>hudi-common</artifactId>
    <version>${project.version}</version>
    <exclusions>
      <exclusion>
        <groupId>org.eclipse.jetty</groupId>
        <artifactId>*</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <dependency>
    <groupId>org.apache.hudi</groupId>
    <artifactId>hudi-client-common</artifactId>
    <version>${project.version}</version>
    <exclusions>
      <exclusion>
        <groupId>org.eclipse.jetty</groupId>
        <artifactId>*</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

  <!-- Hive -->
  <dependency>
    <groupId>${hive.groupid}</groupId>
    <artifactId>hive-service</artifactId>
    <version>${hive.version}</version>
    <scope>${utilities.bundle.hive.scope}</scope>
    <exclusions>
      <exclusion>
        <artifactId>servlet-api</artifactId>
        <groupId>javax.servlet</groupId>
      </exclusion>
      <exclusion>
        <artifactId>guava</artifactId>
        <groupId>com.google.guava</groupId>
      </exclusion>
      <exclusion>
        <groupId>org.eclipse.jetty</groupId>
        <artifactId>*</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.pentaho</groupId>
        <artifactId>*</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

  <dependency>
    <groupId>${hive.groupid}</groupId>
    <artifactId>hive-service-rpc</artifactId>
    <version>${hive.version}</version>
    <scope>${utilities.bundle.hive.scope}</scope>
  </dependency>

  <dependency>
    <groupId>${hive.groupid}</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>${hive.version}</version>
    <scope>${utilities.bundle.hive.scope}</scope>
    <exclusions>
      <exclusion>
        <groupId>javax.servlet</groupId>
        <artifactId>*</artifactId>
      </exclusion>
      <exclusion>
        <groupId>javax.servlet.jsp</groupId>
        <artifactId>*</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.eclipse.jetty</groupId>
        <artifactId>*</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

  <dependency>
    <groupId>${hive.groupid}</groupId>
    <artifactId>hive-metastore</artifactId>
    <version>${hive.version}</version>
    <scope>${utilities.bundle.hive.scope}</scope>
    <exclusions>
      <exclusion>
        <groupId>javax.servlet</groupId>
        <artifactId>*</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.datanucleus</groupId>
        <artifactId>datanucleus-core</artifactId>
      </exclusion>
      <exclusion>
        <groupId>javax.servlet.jsp</groupId>
        <artifactId>*</artifactId>
      </exclusion>
      <exclusion>
        <artifactId>guava</artifactId>
        <groupId>com.google.guava</groupId>
      </exclusion>
    </exclusions>
  </dependency>

  <dependency>
    <groupId>${hive.groupid}</groupId>
    <artifactId>hive-common</artifactId>
    <version>${hive.version}</version>
    <scope>${utilities.bundle.hive.scope}</scope>
    <exclusions>
      <exclusion>
        <groupId>org.eclipse.jetty.orbit</groupId>
        <artifactId>javax.servlet</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.eclipse.jetty</groupId>
        <artifactId>*</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

  <!-- Add the Jetty version configured by Hudi -->
  <dependency>
    <groupId>org.eclipse.jetty</groupId>
    <artifactId>jetty-server</artifactId>
    <version>${jetty.version}</version>
  </dependency>
  <dependency>
    <groupId>org.eclipse.jetty</groupId>
    <artifactId>jetty-util</artifactId>
    <version>${jetty.version}</version>
  </dependency>
  <dependency>
    <groupId>org.eclipse.jetty</groupId>
    <artifactId>jetty-webapp</artifactId>
    <version>${jetty.version}</version>
  </dependency>
  <dependency>
    <groupId>org.eclipse.jetty</groupId>
    <artifactId>jetty-http</artifactId>
    <version>${jetty.version}</version>
  </dependency>

Otherwise, inserting data into a Hudi table with the DeltaStreamer tool also fails with a Jetty error.

2.2.6 Run the build command

mvn clean package -DskipTests -Dspark3.2 -Dflink1.13 -Dscala-2.12 -Dhadoop.version=3.1.3 -Pflink-bundle-shade-hive3

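2.2.7 Build succeeded

Once the build finishes, being able to start hudi-cli confirms that it succeeded.

The resulting packages are in the individual modules under the packaging directory, for example the Flink bundle for Hudi.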
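To locate the built artifacts without browsing each module by hand, a small `find` wrapper is enough. This is a sketch under assumptions: `list_bundle_jars` is a hypothetical helper, and the `hudi-*bundle*.jar` pattern is a guess at how the bundle jars are named, so adjust it if your build names them differently.

```shell
# List the bundle jars produced under packaging/*/target.
# Hypothetical helper; $1 is the Hudi source root (for example
# /opt/software/hudi-0.12.0, the path used earlier in this article).
list_bundle_jars() {
  find "$1/packaging" -path '*/target/*' -name 'hudi-*bundle*.jar' | sort
}

# Usage:
#   list_bundle_jars /opt/software/hudi-0.12.0
```

Because the name must start with `hudi-`, the unshaded `original-*.jar` copies that the Maven Shade plugin leaves next to each bundle are skipped.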

Next chapter: Core Concepts

Postscript

📢 Blog home: https://manor.blog.csdn.net

📢 This article is an original work by Maynor, first published on the CSDN blog.
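📢 The data lake column is updated regularly; subscribe at: https://blog.csdn.net/xianyu120/category_12388063.html

Copyright notice:

This article was written by [Maynor996]. Please include a link to the original when reposting. Thank you.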

https://blog.csdn.net/xianyu120/article/details/131910413

