7. Flink Source: Collection-Based / File-Based / Socket-Based / Custom Source -- Random Order Data / Custom Source -- MySQL

Published: 2023-10-28 13:00

7. Source
7.1. Collection-Based
7.2. File-Based
7.3. Socket-Based
7.4. Custom Source -- Random Order Data
7.4.1. Custom Source
7.5. Custom Source -- MySQL

7. Source


7.1. Collection-Based

Collection-based sources are generally used to fabricate data for learning and testing.
env.fromElements(varargs);
env.fromCollection(any Collection);
env.generateSequence(from, to);
env.fromSequence(from, to);
(Note: generateSequence has been deprecated in recent Flink releases in favor of fromSequence.)

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.Arrays;

/**
 * Collection-based sources: fromElements, fromCollection, generateSequence, fromSequence.
 *
 * @author tuzuoquan
 * @date 2022/4/1 21:52
 */
public class SourceDemo01_Collection {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.AUTOMATIC);

        //TODO 1.source
        DataStream<String> ds1 = env.fromElements("hadoop spark flink", "hadoop spark flink");
        DataStream<String> ds2 = env.fromCollection(Arrays.asList("hadoop spark flink", "hadoop spark flink"));
        DataStream<Long> ds3 = env.generateSequence(1, 100);
        DataStream<Long> ds4 = env.fromSequence(1, 100);

        //TODO 2.transformation

        //TODO 3.sink
        ds1.print();
        ds2.print();
        ds3.print();
        ds4.print();

        //TODO 4.execute
        env.execute();

    }

}

7.2. File-Based

File-based sources are likewise generally used to fabricate data for learning and testing.
env.readTextFile(local/HDFS file or directory); //compressed files are supported as well

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SourceDemo02_File {
    public static void main(String[] args) throws Exception {
        //TODO 0.env
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.AUTOMATIC);

        //TODO 1.source
        DataStream<String> ds1 = env.readTextFile("data/input/words.txt");
        DataStream<String> ds2 = env.readTextFile("data/input/dir");
        DataStream<String> ds3 = env.readTextFile("data/input/wordcount.txt.gz");


        //TODO 2.transformation

        //TODO 3.sink
        ds1.print();
        ds2.print();
        ds3.print();

        //TODO 4.execute
        env.execute();
    }
}

7.3. Socket-Based

Requirements:
1. On node1, use nc -lk 9999 to send data to the given port.
nc is short for netcat. It was originally a tool for configuring routers; here we use it to send data to a port.
If the command is not available, install it first:

yum install -y nc

2. Write a Flink streaming application that counts words in real time.

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

/**
 * Socket-based source: real-time word count over a socket text stream.
 *
 * @author tuzuoquan
 * @date 2022/4/1 23:49
 */
public class SourceDemo03_Socket {

    public static void main(String[] args) throws Exception {
        //TODO 0.env
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.AUTOMATIC);

        //TODO 1.source
        DataStream<String> line = env.socketTextStream("node1", 9999);

        //TODO 2.transformation
        /*SingleOutputStreamOperator<String> words = line.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public void flatMap(String value, Collector<String> out) throws Exception {
                String[] arr = value.split(" ");
                for (String word : arr) {
                    out.collect(word);
                }
            }
        });

        words.map(new MapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public Tuple2<String, Integer> map(String value) throws Exception {
                return Tuple2.of(value, 1);
            }
        });*/

        //Note: the operation below merges the two steps above into one: split each line into words and emit each word with a count of 1
        SingleOutputStreamOperator<Tuple2<String,Integer>> wordAndOne =
                line.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
                        String[] arr = value.split(" ");
                        for (String word : arr) {
                            out.collect(Tuple2.of(word, 1));
                        }
                    }
                });

        SingleOutputStreamOperator<Tuple2<String, Integer>> result =
                wordAndOne.keyBy(t -> t.f0).sum(1);

        // TODO 3.sink
        result.print();

        // TODO 4.execute
        env.execute();

    }

}

7.4. Custom Source -- Random Order Data

Note: this example uses Lombok.

<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <version>1.18.2</version>
    <scope>provided</scope>
</dependency>

package demo3;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

/**
 * Order POJO; Lombok generates getters/setters, toString, equals/hashCode, and constructors.
 *
 * @author tuzuoquan
 * @date 2022/4/2 0:02
 */
@Data
@AllArgsConstructor
@NoArgsConstructor
public class Order {
    private String id;
    private Integer userId;
    private Integer money;
    private Long createTime;
}

7.4.1. Custom Source

Randomly generated data.

Flink also provides source interfaces; implementing one of them gives us a custom data source. The interfaces differ in capability, classified as follows:
SourceFunction: non-parallel source (parallelism must be 1)
RichSourceFunction: non-parallel source with rich lifecycle methods (parallelism must be 1)
ParallelSourceFunction: parallel source (parallelism can be >= 1)
RichParallelSourceFunction: parallel source with rich lifecycle methods (parallelism can be >= 1)

Requirement:
Generate one random order record (order ID, user ID, order amount, timestamp) every second.

Specifically:

  • randomly generate the order ID (UUID)
  • randomly generate the user ID (0 - 2)
  • randomly generate the order amount (0 - 100)
  • use the current system time as the timestamp
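A minimal sketch of such a source, assuming the Order POJO above; the class names SourceDemo04_Customer and MyOrderSource are illustrative, not taken from the original:

package demo3;

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

import java.util.Random;
import java.util.UUID;

public class SourceDemo04_Customer {

    public static void main(String[] args) throws Exception {
        //TODO 0.env
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.AUTOMATIC);

        //TODO 1.source
        DataStream<Order> orderDS = env.addSource(new MyOrderSource()).setParallelism(1);

        //TODO 2.transformation

        //TODO 3.sink
        orderDS.print();

        //TODO 4.execute
        env.execute();
    }

    public static class MyOrderSource extends RichParallelSourceFunction<Order> {

        //volatile so that cancel(), called from another thread, is seen by run()
        private volatile boolean flag = true;

        //emit one random order per second until the job is cancelled
        @Override
        public void run(SourceContext<Order> ctx) throws Exception {
            Random random = new Random();
            while (flag) {
                String id = UUID.randomUUID().toString();
                int userId = random.nextInt(3);      //0 - 2
                int money = random.nextInt(101);     //0 - 100
                long createTime = System.currentTimeMillis();
                ctx.collect(new Order(id, userId, money, createTime));
                Thread.sleep(1000);
            }
        }

        //called when the job is cancelled: stop the generation loop
        @Override
        public void cancel() {
            flag = false;
        }
    }
}

Because the source extends RichParallelSourceFunction, it could run with parallelism >= 1; setParallelism(1) above simply keeps the printed output easy to read.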

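7.5. Custom Source -- MySQL

The same pattern extends to loading data from MySQL: open the JDBC connection in open(), query it periodically in run(), and release resources in close(). A minimal sketch, assuming a hypothetical test.t_student table (id INT, name VARCHAR), root/root credentials, and the mysql-connector-java driver on the classpath; the table, credentials, and 5-second interval are illustrative assumptions, not taken from the original:

package demo3;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class SourceDemo05_MySQL {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.addSource(new MySQLSource()).setParallelism(1).print();
        env.execute();
    }

    @Data
    @AllArgsConstructor
    @NoArgsConstructor
    public static class Student {
        private Integer id;
        private String name;
    }

    public static class MySQLSource extends RichParallelSourceFunction<Student> {

        private volatile boolean flag = true;
        private Connection conn;
        private PreparedStatement ps;

        //open() runs once per parallel instance: set up the JDBC connection here
        @Override
        public void open(Configuration parameters) throws Exception {
            conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/test", "root", "root");
            ps = conn.prepareStatement("SELECT id, name FROM t_student");
        }

        //re-query the table every 5 seconds and emit each row
        @Override
        public void run(SourceContext<Student> ctx) throws Exception {
            while (flag) {
                ResultSet rs = ps.executeQuery();
                while (rs.next()) {
                    ctx.collect(new Student(rs.getInt("id"), rs.getString("name")));
                }
                rs.close();
                Thread.sleep(5000);
            }
        }

        @Override
        public void cancel() {
            flag = false;
        }

        //close() runs on shutdown: release JDBC resources
        @Override
        public void close() throws Exception {
            if (ps != null) { ps.close(); }
            if (conn != null) { conn.close(); }
        }
    }
}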